The Occasional Occurence
Customizing the Python Import System
July 31, 2008 at 10:39 PM | categories: Python, Software, work, computing, General
So I've been programming with Python since 2001 and I've never had the need to do anything that the standard import system didn't provide - until this week. We are planning a little code reorganization for a project at work in preparation for collaboration from more developers. I wrote a simple custom importer/loader that lets a developer write
from application.widgets import foobar
instead of the longer
from application.widgets.foobar.widget import foobar
and the class foobar winds up in globals().
It's not groundbreaking functionality but it actually does add a little clarity in our situation. The whole task was made quite simple by the features introduced in PEP 302 (that document is a great reference). Now, before anyone suggests that we could have just pulled the classes in via an __init__.py in the application/components directory, note that some components might depend on others which have not been imported yet, and thus their imports would fail.
Anyhow, like I said, it isn't groundbreaking, but the very fact that you can customize Python's import system is neat. I got to thinking about what other ways I could hack the import system, and came up with a little web importer. I'll post the code below, only because I think it is a clever trick, not because it is something to use in development of a Real Application.
""" Stupid Python Trick - import modules over the web. Author: Christian Wyglendowski License: MIT (http://dowski.com/mit.txt) """ import httplib import imp import sys def register_domain(name): WebImporter.registered_domains.add(name) parts = reversed(name.split('.')) whole = [] for part in parts: whole.append(part) WebImporter.domain_modules.add(".".join(whole)) class WebImporter(object): domain_modules = set() registered_domains = set() def find_module(self, fullname, path=None): if fullname in self.domain_modules: return self if fullname.rsplit('.')[0] not in self.domain_modules: return None try: r = self._do_request(fullname, method="HEAD") except ValueError: return None else: r.close() if r.status == 200: return self return None def load_module(self, fullname): if fullname in sys.modules: return sys.modules[fullname] mod = imp.new_module(fullname) mod.__loader__ = self sys.modules[fullname] = mod if fullname not in self.domain_modules: url = "http://%s%s" % self._get_host_and_path(fullname) mod.__file__ = url r = self._do_request(fullname) code = r.read() assert r.status == 200 exec code in mod.__dict__ else: mod.__file__ = "[fake module %r]" % fullname mod.__path__ = [] return mod def _do_request(self, fullname, method="GET"): host, path = self._get_host_and_path(fullname) c = httplib.HTTPConnection(host) c.request(method, path) return c.getresponse() def _get_host_and_path(self, fullname): tld, domain, rest = fullname.split('.', 2) path = "/%s.py" % rest.replace('.', '/') return ".".join([domain, tld]), path sys.meta_path = [WebImporter()]
You can use it like so:
import webimport
webimport.register_domain('dowski.com')
from com.dowski import test
That would fetch and import http://dowski.com/test.py.
There may be other Python libraries out there that do this better - I couldn't find any with a quick Google search. I can think of a number of features that would be needed for a serious implementation of something like this (caching, HTTP-AUTH, signatures, remote package support, etc). For now though I'm just throwing this out there because I think it is neat.
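For readers on Python 3, where httplib and imp are gone and find_module/load_module have given way to find_spec/exec_module, the same idea might look roughly like the sketch below. It is not a port of the importer above: the names SourceFetchingFinder, FAKE_SERVER and webdemo are made up for illustration, and the "fetch" step is a plain dictionary lookup standing in for an HTTP request so that the example runs without a network.

```python
import importlib.abc
import importlib.util
import sys


class SourceFetchingFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical meta path hook: builds modules from source text
    returned by a `fetch` callable (a stand-in for an HTTP GET)."""

    def __init__(self, prefix, fetch):
        self.prefix = prefix  # top-level package name this hook owns
        self.fetch = fetch    # callable: module name -> source text or None

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.prefix:
            # Empty placeholder package so "prefix.something" is importable.
            return importlib.util.spec_from_loader(
                fullname, self, is_package=True)
        if not fullname.startswith(self.prefix + "."):
            return None  # not ours; leave it to the normal machinery
        if self.fetch(fullname) is None:
            return None  # nothing to load under that name
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # let the import system create a default module object

    def exec_module(self, module):
        name = module.__spec__.name
        if name == self.prefix:
            return  # the placeholder package has no code to run
        source = self.fetch(name)
        exec(compile(source, "<fetched %s>" % name, "exec"), module.__dict__)


# Assumption for the demo: module sources live in a dict, not on a server.
FAKE_SERVER = {
    "webdemo.hello": "def greet():\n    return 'hello from afar'\n",
}

# Append rather than replace, so regular imports keep working.
sys.meta_path.append(SourceFetchingFinder("webdemo", FAKE_SERVER.get))

from webdemo import hello
print(hello.greet())
```

Swapping FAKE_SERVER.get for a function that really performs an HTTP request would bring back the original trick, along with all of the caveats about running code fetched from the network.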
Anyone else doing neat tricks with the import hooks that Python exposes?
cw
Converting docx Files
April 16, 2008 at 01:34 PM | categories: Python, Software, work, computing, General
I'm working on an OOXML implementation in Python and found this handy utility for converting docx files to rtf.
It seems to open docx files that Word complains about, but at least it lets me know that I am on the right track. Also, it runs under Wine on Linux, so there is no need for a virtual or non-virtual machine running Windows.
cw
Command History
April 12, 2008 at 10:52 AM | categories: Python, work, computing, General
Since all the cool kids are doing it ...
Work laptop
-----------
christian@yga-dowski:~$ history|awk '{a[$2]++ } END{for(i in a){print a[i] " " i}}' |sort -rn|head
82 sudo
68 vim
51 ls
49 cd
48 exit
20 hg
16 rm
16 ipython
14 py.test
10 ping
Apparently I do a lot of exiting. I just started using Mercurial for local revision control, hence the presence of hg.
Dev server
----------
(where I actually do most of my work)
cmw@watson:~/svn/g2wc$ history|awk '{a[$2]++ } END{for(i in a){print a[i] " " i}}' |sort -rn|head
158 vim
81 make
70 svn
55 rm
34 ls
33 cd
15 sudo
11 exit
7 kinit
7 htop
The make commands encapsulate many calls to python and py.test.
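For anyone who would rather not squint at awk, here is a rough Python equivalent of that pipeline. It is only a sketch: top_commands and the sample lines are invented for illustration, and it assumes each history line looks like "  42  command args", with the history number in the first field. Unlike the awk version, it skips lines that have no command field at all.

```python
from collections import Counter


def top_commands(history_lines, n=10):
    """Return the n most frequent commands as (command, count) pairs."""
    counts = Counter()
    for line in history_lines:
        fields = line.split()
        if len(fields) >= 2:  # fields[0] is the history number
            counts[fields[1]] += 1
    return counts.most_common(n)


# Made-up sample input, in the shape that `history` prints.
sample = [
    "  1  sudo apt-get update",
    "  2  vim notes.txt",
    "  3  sudo reboot",
    "  4  ls",
]

for command, count in top_commands(sample):
    print(count, command)
```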
Reading Chunked HTTP/1.1 Responses
April 02, 2008 at 12:35 AM | categories: Python, work, computing, cherrypy, General
For work today I wanted a way to iterate over an HTTP response with chunked transfer-coding on a chunk-for-chunk basis. I didn't see a built-in way to do that with httplib. It supports chunked reads but you have to specify the amount that you want to read if you don't want it to buffer. I just wanted it to read and yield each chunk that it received from the server.
For my first crack at it I really just tried to use the httplib basics:
import httplib

conn = httplib.HTTPConnection('localhost:8080')
conn.request('GET', '/')
r = conn.getresponse()
data = r.read(10)
while data:
    print data
    data = r.read(10)
That worked but since I won't know the chunk size in real-life, I would probably get output similar to this:
Chunk 0
Ch
unk 1
Chun
k 2
Chunk 3
Chunk 4
...
I really wanted that chunk-for-chunk iteration. After taking a look at the very readable httplib source this evening, it wasn't very hard to accomplish. I basically just took the httplib.HTTPResponse._read_chunked method and modified it to be a generator. I subclassed HTTPResponse and stuck my generator in an __iter__ method. Behold; now you can do this sort of thing:
if __name__ == "__main__":
    import httplib
    import iresponse

    conn = httplib.HTTPConnection('localhost:8080')
    conn.response_class = iresponse.IterableResponse
    conn.request('GET', '/')
    r = conn.getresponse()
    for chunk in r:
        print chunk
With nice results like this:
Chunk 0
Chunk 1
Chunk 2
Chunk 3
Chunk 4
...
You can download the iresponse module from my projects site. There is also a small CherryPy application that serves some data with chunked transfer-coding in case any of you want to fiddle with it.
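To give a feel for what such a generator has to do, here is a minimal standalone sketch in Python 3. It is not the iresponse module and does not touch httplib internals; iter_chunks is a hypothetical helper that reads a chunked body from any binary file-like object: read the hex size line, read that many bytes, discard the trailing CRLF, and stop at the zero-size chunk.

```python
import io


def iter_chunks(stream):
    """Yield the payload of each chunk from a chunked-encoded body."""
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("stream ended before the terminating chunk")
        # Chunk extensions, if any, follow a semicolon and are ignored here.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("chunk shorter than its declared size")
        stream.read(2)  # discard the CRLF that ends the chunk
        yield data
    # Skip optional trailer headers, up to the blank line or end of stream.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass


# Made-up body: two 8-byte chunks followed by the terminating chunk.
body = io.BytesIO(b"8\r\nChunk 0\n\r\n8\r\nChunk 1\n\r\n0\r\n\r\n")
for chunk in iter_chunks(body):
    print(chunk)
```

Because it is a generator, each chunk is handed to the caller as soon as it has been read, which is the chunk-for-chunk behavior described above.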
cw
SimpleParse Plug
December 19, 2007 at 01:37 AM | categories: Python, work, General
I've been doing more parsing stuff at work lately. For my latest project I've been using the SimpleParse library. It has quickly overtaken PLY as my Python parsing library of choice.
Here's a simple calculator example using SimpleParse. It does basic arithmetic and allows you to store values in single letter variable names. It basically just validates the line that you enter and then either evals or execs it as Python code.
#!/usr/bin/env python

from simpleparse.parser import Parser
from simpleparse.common import numbers
from simpleparse.error import ParserSyntaxError

grammar = '''
command := !, assign/expr
assign := varname, ts, '=', !, ts, expr
expr := (lpar?, ts, operand, ts, (op, !, ts, operand, ts)*, ts, rpar?)+
lpar := '('
rpar := ')'
operand := lpar?, ts, number/varname, ts, rpar?
varname := [a-z]
op := [+-*/]
<ts> := [ \t]*
'''

class Calculator(object):
    def __init__(self):
        self.scanner = Parser(grammar, 'command')

    def parse(self, command):
        try:
            success, subtags, nextchar = self.scanner.parse(command)
        except ParserSyntaxError, e:
            return str(e)
        else:
            if subtags[0][0] == 'expr':
                try:
                    return eval(command)
                except Exception, e:
                    print "Error:", e.args[0]
            elif subtags[0][0] == 'assign':
                exec command in globals()

if __name__ == '__main__':
    calc = Calculator()

    def prompt():
        return raw_input('> ')

    command = prompt()
    while command != "quit":
        result = calc.parse(command)
        # Compare against None so that a result of 0 is still printed.
        if result is not None:
            print result
        command = prompt()
There it is, in all of its uncommented glory.
Here's an example session with the calculator:
> 2 + 2
4
> x = 4
> 8 * x
32
> y = 8 * x
> y
32
> 5 +
ParserSyntaxError: Failed parsing production "expr" @pos 3 (~line 1:3).
Expected syntax: operand
Got text: ''
> import shutil
Error: invalid syntax
> quit
Here are some of my favorite things about SimpleParse.
The entire grammar is declared in EBNF (http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form). This is a breath of fresh air coming from PLY, where the grammar rules are scattered amongst the action-code (in docstrings).
It has a clean API. The API is very succinct, and thus you get to know it fairly quickly. Creating Processors for parsed text is quite straightforward.
It's Fast. It's built on the fastfastfast mx.TextTools library from eGenix. It converts your EBNF grammar to mx.TextTools tag tables and pushes the heavy lifting off to the mx.TextTools tagging engine.
Some stuff that I found odd includes...
No Simple Tutorial. Maybe I'll do something about that using the calculator example above. The library is really great, but there is a fairly steep learning curve at this point due to the lack of a basic tutorial.
No Separate Lexing Stage. Unlike PLY and other traditional parser generators, you don't generate a token stream and then parse that according to your grammar. SimpleParse generates an mx.TextTools result tree which is then processed (typically) by a SimpleParse DispatchProcessor subclass.
Error Handling. SimpleParse seems to like to just stop parsing when it reaches a syntax item that it can't parse. You need to use a special token (an exclamation point, !) in your grammar to denote where you want to get picky about syntax. Although it is a bit tricky, it seems quite flexible so far.
Whitespace Handling. There is no way (that I know of) to completely ignore certain characters. PLY let me define characters that were simply ignored, as if they weren't even there. In the calculator above, I had to insert a ts production wherever I anticipate a potential series of tab or space characters in the input. However, by putting the production name in angle brackets the matched text is kept out of the result tree.
I'll end this now by encouraging anyone interested in parsing a little (or big!) language using Python to check out SimpleParse. Its clean API and speed make it very nice to work with.
cw