The Occasional Occurence

A JSON Parser Using SimpleParse

July 21, 2010 at 02:31 PM | categories: Python, Software, computing, General

I've been reading the recent posts on CodeTalker with interest. I've written a handful of parsers using two different parser generators for Python: PLY and SimpleParse. My most recent parsing work has had me gravitating toward SimpleParse, so I thought I'd see how it stacks up against CodeTalker.

First, I checked the web to see whether someone had already written a JSON parser using SimpleParse. I found Rob Lanphier's JsonOrder, which at least had a grammar I could borrow as a jumping-off point.

The result, after about an hour of coding and benchmarking, is spjson.py. At first I tried to adapt Rob's version, but I switched back to SimpleParse's dispatch processor model. Pretty much the only thing that remains from JsonOrder is the tweaked grammar.

How does it measure up to CodeTalker's JSON parser?

It's a bit slower, taking roughly 29% longer in my test. I added a simple timeit benchmark to the spjson.py file, using the same JSON file that Jared (the CodeTalker author) used in his benchmarks. Here are the timings (lower is better) from running it against the latest version of CodeTalker at the time:

CodeTalker 0.0484498786926
SimpleParse 0.0623928356171
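
For anyone who wants to run a similar comparison, here is a minimal sketch of a timeit harness. It is not the benchmark from spjson.py: the sample document, the benchmark helper, and the use of the standard library's json.loads as a stand-in parser are all assumptions made for illustration. Swap in whichever parse functions you want to compare.

```python
import json
import timeit

# Hypothetical stand-in document; the original benchmark used a JSON file
# supplied by the CodeTalker author, which is not reproduced here.
SAMPLE = json.dumps({
    "name": "example",
    "values": list(range(100)),
    "nested": {"ok": True, "nothing": None},
})


def benchmark(parse, text, repeat=3, number=50):
    """Return the best observed time in seconds for one parse(text) call."""
    timer = timeit.Timer(lambda: parse(text))
    # repeat() returns one total per round; take the fastest round and
    # divide by the number of calls in that round.
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Map a label to any callable that takes a JSON string (assumed interface).
parsers = {"stdlib json": json.loads}

for label, parse in parsers.items():
    print(label, benchmark(parse, SAMPLE))
```

Taking the minimum over several rounds is the usual way to reduce noise from other processes on the machine.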

In terms of lines of code, they come out the same: 55 non-blank lines each. I didn't do anything fancy to omit docstrings or comments (neither module has many of either).

$ # use 'head' to strip the 'if __name__ ...' section
$ head -n 71 spjson.py | grep -v '^\s*$' | wc -l
55
$ cat src/codetalker/codetalker/contrib/json.py | grep -v '^\s*$' | wc -l
55
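
The same count can be done in Python if you'd rather not rely on shell tools. This is just a sketch of the pipeline above; count_nonblank is a hypothetical helper name, and first_n plays the role of head's -n option.

```python
def count_nonblank(text, first_n=None):
    """Count lines that contain something other than whitespace.

    If first_n is given, only the first first_n lines are considered.
    """
    lines = text.splitlines()
    if first_n is not None:
        lines = lines[:first_n]
    # A line is blank if nothing is left once whitespace is stripped.
    return sum(1 for line in lines if line.strip())


sample = "import json\n\n   \ndef parse(text):\n    return json.loads(text)\n"
print(count_nonblank(sample))
print(count_nonblank(sample, first_n=3))
```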

The style is the biggest difference. SimpleParse builds the parser from an EBNF grammar written in a string, while CodeTalker defines its grammar in Python code.

Both libraries let you write specialized processors for the grammar. This is one thing that I really value in SimpleParse over PLY. Your grammar and the code to act on the parse tree (or token stream) are neatly separated.
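
To make that separation concrete, here is a small sketch of the dispatch idea in plain Python. It deliberately does not import SimpleParse: the tree is written by hand, using (tag, start, stop, children) tuples in the style of SimpleParse's result trees, and ToyProcessor is a hypothetical class, not the one from spjson.py. Each method is named after a grammar production and only knows how to turn that one kind of node into a Python value.

```python
TEXT = '[1, 2, "hi"]'

# Hand-built parse tree for TEXT. Each node is (tag, start, stop, children),
# where start and stop are offsets into TEXT.
TREE = ("array", 0, 12, [
    ("number", 1, 2, []),
    ("number", 4, 5, []),
    ("string", 7, 11, []),
])


class ToyProcessor:
    """Turn a parse tree into Python values, one method per production."""

    def dispatch(self, node, text):
        # Look up the handler method whose name matches the node's tag.
        return getattr(self, node[0])(node, text)

    def array(self, node, text):
        return [self.dispatch(child, text) for child in node[3]]

    def number(self, node, text):
        # Integers only, to keep the sketch short.
        return int(text[node[1]:node[2]])

    def string(self, node, text):
        # Drop the surrounding quotes; escape handling is omitted.
        return text[node[1] + 1:node[2] - 1]


print(ToyProcessor().dispatch(TREE, TEXT))
```

The grammar can change shape without touching these methods, as long as the production names stay the same.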

I'm still quite happy with SimpleParse. I like having the grammar in a nice, contained EBNF rather than defined with Python plus syntactic sugar. I'll probably give CodeTalker a look the next time a new parsing task comes up, though. It's great to see how the options for parsing in Python are expanding.