The Occasional Occurrence

Customizing the Python Import System

July 31, 2008 at 10:39 PM | categories: Python, Software, work, computing, General

So I've been programming with Python since 2001 and I've never had the need to do anything that the standard import system didn't provide - until this week. We are planning a little code reorganization for a project at work in preparation for collaboration from more developers. I wrote a simple custom importer/loader that lets a developer write

from application.widgets import foobar

instead of the longer

from application.widgets.foobar.widget import foobar

and the class foobar winds up in globals().

It's not groundbreaking functionality, but it actually does add a little clarity in our situation. The whole task was made quite simple by the features introduced in PEP 302 (that document is a great reference). Now, before anyone suggests that we could have just pulled the classes in via an __init__.py in the application/widgets directory, note that some components might depend on others which have not been imported yet, and thus their imports would fail.
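I won't reproduce our actual importer here, but the idea can be sketched roughly like this. Note this is my own illustration using the modern importlib API rather than the raw PEP 302 hooks, and the names (WidgetAliasFinder, the widget module layout) are made up; the fake application.widgets.foobar.widget module at the bottom just stands in for a real widget file on disk:

```python
import importlib
import importlib.abc
import importlib.machinery
import sys
import types

class WidgetAliasFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve a synthetic application.widgets package whose attributes
    are lazily pulled from application.widgets.<name>.widget."""

    prefix = "application.widgets"

    def find_spec(self, fullname, path=None, target=None):
        # Only claim the synthetic parent packages; everything else
        # is left to the normal import machinery.
        if fullname in ("application", self.prefix):
            return importlib.machinery.ModuleSpec(
                fullname, self, is_package=True)
        return None

    def create_module(self, spec):
        return None  # default module creation is fine

    def exec_module(self, module):
        if module.__name__ == self.prefix:
            def __getattr__(name):
                # "from application.widgets import foobar" lands here
                # and is redirected to the class at
                # application.widgets.<name>.widget.<name>
                try:
                    real = importlib.import_module(
                        "%s.%s.widget" % (self.prefix, name))
                except ImportError:
                    raise AttributeError(name)
                cls = getattr(real, name)
                setattr(module, name, cls)  # cache for next time
                return cls
            module.__getattr__ = __getattr__

sys.meta_path.insert(0, WidgetAliasFinder())

# Stand-in for a real application/widgets/foobar/widget.py on disk:
_pkg = types.ModuleType("application.widgets.foobar")
_pkg.__path__ = []
_widget = types.ModuleType("application.widgets.foobar.widget")
class foobar(object):
    pass
_widget.foobar = foobar
sys.modules["application.widgets.foobar"] = _pkg
sys.modules["application.widgets.foobar.widget"] = _widget

from application.widgets import foobar
print(foobar)  # the class, reached without the long dotted path
```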

Anyhow, like I said, it isn't groundbreaking, but the very fact that you can customize Python's import system is neat. I got to thinking about what other ways I could hack the import system and came up with a little web importer. I'll post the code below, only because I think it is a clever trick, not because it is something to use in the development of a Real Application.

"""
Stupid Python Trick - import modules over the web.
Author: Christian Wyglendowski
License: MIT (http://dowski.com/mit.txt)
"""

import httplib
import imp
import sys

def register_domain(name):
    """Register a domain and create fake parent packages for its parts."""
    WebImporter.registered_domains.add(name)
    # "dowski.com" -> fake modules "com" and "com.dowski"
    parts = reversed(name.split('.'))
    whole = []
    for part in parts:
        whole.append(part)
        WebImporter.domain_modules.add(".".join(whole))

class WebImporter(object):
    domain_modules = set()
    registered_domains = set()

    def find_module(self, fullname, path=None):
        # Fake package modules (e.g. "com", "com.dowski") are handled
        # by this importer directly.
        if fullname in self.domain_modules:
            return self
        # Ignore imports that don't start with a registered domain part.
        if fullname.split('.', 1)[0] not in self.domain_modules:
            return None
        try:
            r = self._do_request(fullname, method="HEAD")
        except ValueError:
            return None
        else:
            r.close()
            if r.status == 200:
                return self
        return None

    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        mod = imp.new_module(fullname)
        mod.__loader__ = self
        sys.modules[fullname] = mod
        try:
            if fullname not in self.domain_modules:
                url = "http://%s%s" % self._get_host_and_path(fullname)
                mod.__file__ = url
                r = self._do_request(fullname)
                assert r.status == 200
                code = r.read()
                exec code in mod.__dict__
            else:
                # Fake package module standing in for a domain part.
                mod.__file__ = "[fake module %r]" % fullname
                mod.__path__ = []
        except Exception:
            # Per PEP 302, remove the partial module on failure.
            del sys.modules[fullname]
            raise
        return mod

    def _do_request(self, fullname, method="GET"):
        host, path = self._get_host_and_path(fullname)
        c = httplib.HTTPConnection(host)
        c.request(method, path)
        return c.getresponse()

    def _get_host_and_path(self, fullname):
        tld, domain, rest = fullname.split('.', 2)
        path = "/%s.py" % rest.replace('.', '/')
        return ".".join([domain, tld]), path

# Appending preserves any other meta_path hooks that may be installed.
sys.meta_path.append(WebImporter())

You can use it like so:

import webimport
webimport.register_domain('dowski.com')
from com.dowski import test

That would fetch and import http://dowski.com/test.py.
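To make that mapping concrete, here is the _get_host_and_path logic pulled out as a standalone function (Python 3 syntax here, purely for illustration) - the reversed-domain module path becomes a host plus a .py URL:

```python
def name_to_url(fullname):
    # "com.dowski.test" -> host "dowski.com", path "/test.py"
    tld, domain, rest = fullname.split('.', 2)
    path = "/%s.py" % rest.replace('.', '/')
    return "http://%s.%s%s" % (domain, tld, path)

print(name_to_url("com.dowski.test"))     # http://dowski.com/test.py
print(name_to_url("com.dowski.pkg.mod"))  # http://dowski.com/pkg/mod.py
```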

There may be other Python libraries out there that do this better - I couldn't find any with a quick Google search. I can think of a number of features that would be needed for a serious implementation of something like this (caching, HTTP auth, signatures, remote package support, etc.). For now though I'm just throwing this out there because I think it is neat.

Anyone else doing neat tricks with the import hooks that Python exposes?

cw

Converting docx Files

April 16, 2008 at 01:34 PM | categories: Python, Software, work, computing, General

I'm working on an OOXML implementation in Python and found this handy utility for converting docx files to rtf.

Docx2Rtf

It seems to open docx files that Word complains about, but at least it lets me know that I am on the right track. Also, it runs under Wine on Linux, so there is no need for a virtual or non-virtual machine running Windows.

cw

Command History

April 12, 2008 at 10:52 AM | categories: Python, work, computing, General

Since all the cool kids are doing it ...

Work laptop-----------

christian@yga-dowski:~$ history|awk '{a[$2]++ } END{for(i in a){print a[i] " " i}}' |sort -rn|head
82 sudo
68 vim
51 ls
49 cd
48 exit
20 hg
16 rm
16 ipython
14 py.test
10 ping

Apparently I do a lot of exiting. I just started using Mercurial for local revision control, hence the presence of hg.

Dev server---------- (where I actually do most of my work)

cmw@watson:~/svn/g2wc$ history|awk '{a[$2]++ } END{for(i in a){print a[i] " " i}}' |sort -rn|head
158 vim
81 make
70 svn
55 rm
34 ls
33 cd
15 sudo
11 exit
7 kinit
7 htop

The make commands encapsulate many calls to python and py.test.

Reading Chunked HTTP/1.1 Responses

April 02, 2008 at 12:35 AM | categories: Python, work, computing, cherrypy, General

For work today I wanted a way to iterate over an HTTP response with chunked transfer-coding on a chunk-for-chunk basis. I didn't see a builtin way to do that with httplib. It supports chunked reads but you have to specify the amount that you want to read if you don't want it to buffer. I just wanted it to read and yield each chunk that it received from the server.

For my first crack at it I really just tried to use the httplib basics:

import httplib

conn = httplib.HTTPConnection('localhost:8080')
conn.request('GET', '/')
r = conn.getresponse()
data = r.read(10)
while data:
    print data
    data = r.read(10)

That worked, but since I won't know the chunk size in real life, I would probably get output similar to this:

Chunk 0
Ch
unk 1
Chun
k 2
Chunk
3
Chunk 4
...

I really wanted that chunk-for-chunk iteration. After taking a look at the very readable httplib source this evening, I found it wasn't very hard to accomplish. I basically took the httplib.HTTPResponse._read_chunked method and modified it to be a generator, subclassed HTTPResponse, and stuck my generator in an __iter__ method. Behold: now you can do this sort of thing:

if __name__ == "__main__":
    import httplib
    import iresponse
    conn = httplib.HTTPConnection('localhost:8080')
    conn.response_class = iresponse.IterableResponse
    conn.request('GET', '/')
    r = conn.getresponse()
    for chunk in r:
        print chunk

With nice results like this:

Chunk 0
Chunk 1
Chunk 2
Chunk 3
Chunk 4
...

You can download the iresponse module from my projects site. There is also a small CherryPy application that serves some data with chunked transfer-coding in case any of you want to fiddle with it.
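If you just want the gist without downloading anything, the generator at the core of the idea can be sketched like this. This is my own reconstruction in modern Python 3, not the actual iresponse code - it parses the chunked transfer-coding framing from any file-like object rather than subclassing HTTPResponse:

```python
import io

def iter_chunks(fp):
    # Parse HTTP/1.1 chunked transfer-coding from a file-like object,
    # yielding one chunk body at a time.
    while True:
        line = fp.readline()
        # the chunk-size line may carry extensions after a semicolon
        size = int(line.split(b';', 1)[0], 16)
        if size == 0:
            break  # a zero-length chunk marks the end of the body
        yield fp.read(size)
        fp.read(2)  # discard the CRLF that trails each chunk

raw = io.BytesIO(b"7\r\nChunk 0\r\n7\r\nChunk 1\r\n0\r\n\r\n")
for chunk in iter_chunks(raw):
    print(chunk.decode())  # Chunk 0, then Chunk 1
```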

cw

SimpleParse Plug

December 19, 2007 at 01:37 AM | categories: Python, work, General

I've been doing more parsing stuff at work lately. For my latest project I've been using the SimpleParse library. It has quickly overtaken PLY as my Python parsing library of choice.

Here's a simple calculator example using SimpleParse. It does basic arithmetic and allows you to store values in single letter variable names. It basically just validates the line that you enter and then either evals or execs it as Python code.

#!/usr/bin/env python
from simpleparse.parser import Parser
from simpleparse.common import numbers
from simpleparse.error import ParserSyntaxError

grammar = '''
command     := !, assign/expr
assign      := varname, ts, '=', !, ts, expr
expr        := (lpar?,
                ts, operand, ts, (op, !, ts, operand, ts)*, ts,
               rpar?)+
lpar        := '('
rpar        := ')'
operand     := lpar?, ts, number/varname, ts, rpar?
varname     := [a-z]
op          := [+-*/]

<ts>        := [ \t]*
'''

class Calculator(object):
    def __init__(self):
        self.scanner = Parser(grammar, 'command')

    def parse(self, command):
        try:
            success, subtags, nextchar = self.scanner.parse(command)
        except ParserSyntaxError, e:
            return str(e)
        else:
            if subtags[0][0] == 'expr':
                try:
                    return eval(command)
                except Exception, e:
                    print "Error:", e.args[0]
            elif subtags[0][0] == 'assign':
                exec command in globals()

if __name__ == '__main__':
    calc = Calculator()
    def prompt():
        return raw_input('> ')

    command = prompt()
    while command != "quit":
        result = calc.parse(command)
        if result:
            print result
        command = prompt()

There it is, in all of its uncommented glory.

Here's an example session with the calculator:

> 2 + 2
4
> x = 4
> 8 * x
32
> y = 8 * x
> y
32
> 5 +
ParserSyntaxError: Failed parsing production "expr" @pos 3 (~line 1:3).
Expected syntax: operand
Got text: ''
> import shutil
Error: invalid syntax
> quit

Here are some of my favorite things about SimpleParse.

The entire grammar is declared in EBNF (http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form). This is a breath of fresh air coming from PLY, where the grammar rules are scattered amongst the action code (in docstrings).

It has a clean API. The API is very succinct, and thus you get to know it fairly quickly. Creating Processors for parsed text is quite straightforward.

It's Fast. It's built on the fastfastfast mx.TextTools library from eGenix. It converts your EBNF grammar to mx.TextTools tag tables and pushes the heavy lifting off to the mx.TextTools tagging engine.

Some stuff that I found odd includes...

No Simple Tutorial. Maybe I'll do something about that using the calculator example above. The library is really great, but there is a fairly steep learning curve at this point due to the lack of a basic tutorial.

No Separate Lexing Stage. Unlike PLY and other traditional parser generators, you don't generate a token stream and then parse that according to your grammar. SimpleParse generates an mx.TextTools result tree which is then processed (typically) by a SimpleParse DispatchProcessor subclass.

Error Handling. SimpleParse seems to just stop parsing when it reaches a syntax item that it can't parse. You need to use a special token (an exclamation point, !) in your grammar to denote where you want to get picky about syntax. Although it is a bit tricky, it seems quite flexible so far.

Whitespace Handling. There is no way (that I know of) to completely ignore certain characters. PLY let me define characters that were simply ignored, as if they weren't even there. In the calculator above, I had to insert a ts production wherever I anticipated a potential series of tab or space characters in the input. However, by putting the production name in angle brackets, the matched text is kept out of the result tree.

I'll end this now by encouraging anyone interested in parsing a little (or big!) language using Python to check out SimpleParse. Its clean API and speed make it very nice to work with.

cw
