The Occasional Occurence

SimpleParse Plug

December 19, 2007 at 01:37 AM | categories: Python, work, General

I've been doing more parsing stuff at work lately. For my latest project I've been using the SimpleParse library. It has quickly overtaken PLY as my Python parsing library of choice.

Here's a simple calculator example using SimpleParse. It does basic arithmetic and lets you store values in single-letter variable names. It validates each line you enter against the grammar and then either evals it (for an expression) or execs it (for an assignment) as Python code.

```python
#!/usr/bin/env python
from simpleparse.parser import Parser
from simpleparse.common import numbers  # provides the "number" production
from simpleparse.error import ParserSyntaxError

grammar = '''
command     := !, assign/expr
assign      := varname, ts, '=', !, ts, expr
expr        := (lpar?,
                ts, operand, ts, (op, !, ts, operand, ts)*, ts,
               rpar?)+
lpar        := '('
rpar        := ')'
operand     := lpar?, ts, number/varname, ts, rpar?
varname     := [a-z]
op          := [-+*/]

<ts>        := [ \t]*
'''

class Calculator(object):
    def __init__(self):
        self.scanner = Parser(grammar, 'command')
        # Variables assigned at the prompt live here rather than in globals().
        self.namespace = {}

    def parse(self, command):
        try:
            success, subtags, nextchar = self.scanner.parse(command)
        except ParserSyntaxError as e:
            return str(e)
        # The parser may match a valid prefix and stop early; refuse anything
        # with leftover text so it never reaches eval()/exec().
        if not success or nextchar != len(command):
            return "Error: invalid syntax"
        try:
            if subtags[0][0] == 'expr':
                return eval(command, self.namespace)
            elif subtags[0][0] == 'assign':
                exec(command, self.namespace)
        except Exception as e:
            return "Error: %s" % e

if __name__ == '__main__':
    calc = Calculator()
    def prompt():
        return input('> ')

    command = prompt()
    while command != "quit":
        result = calc.parse(command)
        if result is not None:  # so that a result of 0 is still printed
            print(result)
        command = prompt()
```

There it is, in all of its uncommented glory.

Here's an example session with the calculator:

```text
> 2 + 2
4
> x = 4
> 8 * x
32
> y = 8 * x
> y
32
> 5 +
ParserSyntaxError: Failed parsing production "expr" @pos 3 (~line 1:3).
Expected syntax: operand
Got text: ''
> import shutil
Error: invalid syntax
> quit
```

Here are some of my favorite things about SimpleParse.

The entire grammar is declared in one place, in `EBNF <http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>`_. This is a breath of fresh air coming from PLY, where the grammar rules are scattered among the action code as function docstrings.

It has a clean API. The API is small, so you get to know it quickly, and creating processors for parsed text is straightforward.
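SimpleParse processors are usually subclasses of `DispatchProcessor` (in `simpleparse.dispatchprocessor`), which hands each node of the result tree to a method named after its production. The sketch below imitates that dispatch-by-name idea in plain Python so it runs without SimpleParse installed. `MiniDispatcher`, `CalcEvaluator`, and the hand-written tree of `(tag, start, stop, children)` tuples are assumptions for illustration, not SimpleParse's own classes or its exact output.

```python
import operator

# Hypothetical operator table; evaluation below is strictly left to right
# (no operator precedence), which is a simplification.
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}


class MiniDispatcher:
    """Hypothetical stand-in for DispatchProcessor: each node is handed to
    the method named after its tag."""

    def dispatch(self, node, buffer):
        return getattr(self, node[0])(node, buffer)

    def dispatch_list(self, nodes, buffer):
        return [self.dispatch(node, buffer) for node in nodes or []]


class CalcEvaluator(MiniDispatcher):
    """Evaluates calculator result trees without calling eval()."""

    def __init__(self, variables):
        self.variables = variables

    def expr(self, node, buffer):
        # Children alternate: operand, op, operand, op, operand, ...
        values = self.dispatch_list(node[3], buffer)
        result = values[0]
        for symbol, value in zip(values[1::2], values[2::2]):
            result = OPS[symbol](result, value)
        return result

    def operand(self, node, buffer):
        # Assumes no parentheses: the operand wraps one number/varname child.
        return self.dispatch(node[3][0], buffer)

    def number(self, node, buffer):
        return int(buffer[node[1]:node[2]])

    def varname(self, node, buffer):
        return self.variables[buffer[node[1]:node[2]]]

    def op(self, node, buffer):
        return buffer[node[1]:node[2]]


# Hand-written tree shaped like a parse of "2 + x" (illustrative, not real output).
buffer = "2 + x"
tree = [('expr', 0, 5, [
    ('operand', 0, 2, [('number', 0, 1, None)]),
    ('op', 2, 3, None),
    ('operand', 4, 5, [('varname', 4, 5, None)]),
])]
print(CalcEvaluator({'x': 4}).dispatch(tree[0], buffer))  # 6
```

Keeping one method per production is what makes this style pleasant: adding a new construct to the grammar means adding one more method, and nothing ever has to go through `eval`.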

It's Fast. It's built on the fastfastfast mx.TextTools library from eGenix. SimpleParse converts your EBNF grammar into mx.TextTools tag tables and hands the heavy lifting to the mx.TextTools tagging engine, which is written in C.

Some stuff that I found odd includes...

No Simple Tutorial. Maybe I'll do something about that using the calculator example above. The library is really great, but there is a fairly steep learning curve at this point due to the lack of a basic tutorial.

No Separate Lexing Stage. Unlike PLY and other traditional parser generators, you don't generate a token stream and then parse that according to your grammar. SimpleParse matches the grammar directly against the input text and produces an mx.TextTools result tree, which is then typically processed by a subclass of SimpleParse's DispatchProcessor.
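To make the difference concrete, here is roughly what a result tree looks like for `2 + x` with the calculator grammar: nested `(tag, start, stop, children)` tuples that point back into the input string, rather than a flat list of tokens. The exact tree is an assumption (for instance, the real `number` production may report nested children of its own), and the `leaves` helper is hypothetical; it just walks the tree to show that the "tokens" come from slicing the input at the recorded offsets.

```python
# Illustrative only: roughly the tree SimpleParse could return for "2 + x".
# Each node is (tag, start, stop, children); childless nodes use None here.
buffer = "2 + x"
tree = [
    ('expr', 0, 5, [
        ('operand', 0, 2, [('number', 0, 1, None)]),
        ('op', 2, 3, None),
        ('operand', 4, 5, [('varname', 4, 5, None)]),
    ]),
]


def leaves(nodes, buffer):
    """Yield (tag, matched_text) for each childless node, depth first."""
    for tag, start, stop, children in nodes:
        if children:  # handles both None and an empty list
            yield from leaves(children, buffer)
        else:
            yield tag, buffer[start:stop]


for tag, text in leaves(tree, buffer):
    print(tag, repr(text))
```

Note that the spaces appear in no node at all: they fall inside the `operand` and `expr` spans, but because the whitespace production is unreported, it never shows up as its own entry.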

Error Handling. SimpleParse tends to just stop parsing when it reaches input it can't match, reporting how far it got rather than raising an error. You need to use a special token (an exclamation point, !) in your grammar to mark where you want it to get picky about syntax; past that point, a failed match raises a ParserSyntaxError. Although it is a bit tricky, it seems quite flexible so far.
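The effect of `!` is similar to a "cut" in other parsing tools: once the parser has committed past that point, a failure becomes a hard error instead of a quiet stop. The hand-rolled matcher below is a hypothetical sketch using Python's `re` module, not SimpleParse itself; it shows the two behaviours for `operand (op ! operand)*` on the `5 +` input from the session above.

```python
import re

# Hypothetical token patterns for the calculator's operands and operators.
OPERAND = re.compile(r'\s*([0-9]+|[a-z])\s*')
OPERATOR = re.compile(r'\s*[-+*/]')


def match_expr(text, cut=True):
    """Match `operand (op ! operand)*` from the start of text.

    Returns the position where matching stopped. A leading operand is always
    required. With cut=True a missing operand after an operator raises
    SyntaxError (like SimpleParse's '!'); with cut=False the matcher quietly
    stops before the dangling operator.
    """
    m = OPERAND.match(text)
    if m is None:
        raise SyntaxError("expected operand at position 0")
    pos = m.end()
    while True:
        op = OPERATOR.match(text, pos)
        if op is None:
            return pos
        operand = OPERAND.match(text, op.end())
        if operand is None:
            if cut:
                raise SyntaxError("expected operand at position %d" % op.end())
            return pos
        pos = operand.end()


print(match_expr("2 + x"))           # 5: the whole input matched
print(match_expr("5 +", cut=False))  # 2: stops quietly before the "+"
try:
    match_expr("5 +")
except SyntaxError as e:
    print("SyntaxError:", e)         # SyntaxError: expected operand at position 3
```

Without the cut, the caller has to compare the returned position with the input length to notice that anything went wrong, which is easy to forget.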

Whitespace Handling. There is no way (that I know of) to completely ignore certain characters. PLY let me define characters that were simply ignored, as if they weren't even there. In the calculator above, I had to insert a ts production wherever I expected a run of tabs or spaces in the input. On the plus side, putting the production name in angle brackets (<ts> in the grammar) keeps the matched whitespace out of the result tree.
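For contrast, here is the kind of separate lexing step PLY offers: characters in an ignore set are dropped between tokens, so the grammar rules never mention whitespace. This is a plain-Python sketch using `re`, not PLY itself; `IGNORE` stands in for PLY's `t_ignore` setting, and the token kinds are made up for illustration.

```python
import re

# Analogous to PLY's t_ignore: characters skipped between tokens.
IGNORE = ' \t'
# Hypothetical token kinds for the calculator language.
TOKEN = re.compile(r'(?P<NUMBER>[0-9]+)|(?P<NAME>[a-z])|(?P<OP>[-+*/=()])')


def tokenize(text):
    """Yield (kind, value) pairs, skipping any character listed in IGNORE."""
    pos = 0
    while pos < len(text):
        if text[pos] in IGNORE:
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError("unexpected %r at position %d" % (text[pos], pos))
        yield m.lastgroup, m.group()
        pos = m.end()


print(list(tokenize("y = 8 *\tx")))
```

A parser consuming these tokens never sees the spaces or the tab, which is exactly the work the ts production has to do by hand in the SimpleParse grammar.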

I'll end this now by encouraging anyone interested in parsing a little (or big!) language using Python to check out SimpleParse. Its clean API and speed make it very nice to work with.

cw