The Occasional Occurence

HTTP Utilities with CherryPy

January 08, 2009 at 12:56 PM | categories: Python, Software, cherrypy, General

Eric Florenzano posted a detailed blog entry on creating fast web utilities with bare WSGI. In it he argued that the larger Python web frameworks are overkill for small utility-like applications. He then proceeded to build a small utility app that conforms to the WSGI spec.

I like to use CherryPy to write HTTP utility-style applications. It lets you write RESTful, WSGI-compliant HTTP applications without even knowing that you are doing so. Here is Eric's song counter application rewritten with CherryPy and its built-in MethodDispatcher.

from collections import defaultdict

import cherrypy

counts = defaultdict(int)

class SongCounts(object):
    exposed = True

    def GET(self, id):
        return str(counts[id])

    def POST(self, id):
        counts[id] += 1
        return str(counts[id])


class CounterApp(object):
    exposed = True
    song = SongCounts()

    def GET(self):
        return ','.join('%s=%s' % (k, v) for k, v in counts.items())

    def DELETE(self):
        counts.clear()
        return 'OK'

sc_conf = {
    '/': {
        'request.dispatch': cherrypy.dispatch.MethodDispatcher(),
    },
}

application = cherrypy.tree.mount(CounterApp(), config=sc_conf)

It might just be personal preference and experience with CherryPy, but that code is much more expressive and readable to me than a raw WSGI callable. There is also a different kind of scalability to consider: how far your skills carry over. The CherryPy you learn building small utilities like this is the same CherryPy you can combine with a template engine and a database to write full-scale web applications.
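For comparison, a raw WSGI callable for the same job might look roughly like the sketch below. This is not Eric's code (see his post for that); it is my own minimal version, assuming Python 3 and the same URL layout as the CherryPy app, and the 404 fallback is my addition.

```python
from collections import defaultdict

counts = defaultdict(int)  # song id -> hit count, kept in memory only


def application(environ, start_response):
    """Hypothetical raw WSGI counterpart to the CherryPy app (a sketch)."""
    method = environ['REQUEST_METHOD']
    # Split "/song/1" into ['song', '1']; "/" becomes an empty list.
    parts = [p for p in environ.get('PATH_INFO', '').split('/') if p]

    if len(parts) == 2 and parts[0] == 'song' and method in ('GET', 'POST'):
        song_id = parts[1]
        if method == 'POST':
            counts[song_id] += 1
        status, body = '200 OK', str(counts[song_id])
    elif not parts and method == 'GET':
        status = '200 OK'
        body = ','.join('%s=%s' % (k, v) for k, v in counts.items())
    elif not parts and method == 'DELETE':
        counts.clear()
        status, body = '200 OK', 'OK'
    else:
        # Assumption: anything else is simply reported as not found.
        status, body = '404 Not Found', 'Not Found'

    payload = body.encode('utf-8')  # WSGI bodies must be bytes on Python 3
    start_response(status, [('Content-Type', 'text/plain'),
                            ('Content-Length', str(len(payload)))])
    return [payload]
```

All of the routing and method handling that MethodDispatcher does for you ends up as hand-written branches here, which is the readability trade-off I am talking about.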

Of course, the main thrust of Eric's argument is that raw WSGI is faster, not more readable. Here are some benchmarks from my machine running Eric's raw WSGI app and my CherryPy WSGI app.

The specs on my machine are: IBM Thinkpad T61, Intel(R) Core(TM)2 Duo CPU T8300 @ 2.40GHz, 4GB 667 MHz DDR2 SDRAM

First Eric's app:

$ spawn -t 0 -p 8080 counter.application
$ curl -X POST -H "Content-Length:0" http://127.0.0.1:8080/song/1
$ ab -n 10000 http://127.0.0.1:8080/song/1
...
Concurrency Level:      1
Time taken for tests:   7.038927 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      1020000 bytes
HTML transferred:       10000 bytes
Requests per second:    1420.67 [#/sec] (mean)
Time per request:       0.704 [ms] (mean)
Time per request:       0.704 [ms] (mean, across all concurrent requests)
Transfer rate:          141.50 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       7
Processing:     0    0   0.5      0      11
Waiting:        0    0   0.3      0      11
Total:          0    0   0.5      0      11

Now my CherryPy app:

$ spawn -t 0 -p 8080 songcounter.application
$ curl -X POST -H "Content-Length:0" http://127.0.0.1:8080/song/1
$ ab -n 10000 http://127.0.0.1:8080/song/1
...
Concurrency Level:      1
Time taken for tests:   14.529259 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      1680000 bytes
HTML transferred:       10000 bytes
Requests per second:    688.27 [#/sec] (mean)
Time per request:       1.453 [ms] (mean)
Time per request:       1.453 [ms] (mean, across all concurrent requests)
Transfer rate:          112.88 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       8
Processing:     0    1   0.8      1      31
Waiting:        0    0   1.1      1      14
Total:          0    1   0.8      1      31

Wow! Eric's raw WSGI app is over 2x faster in req/s. Of course the measly ~688 req/s from my CherryPy application translates to over 59 million req/24hrs*. Not too shabby either. ;-)

Looking at the benchmarks side by side, raw WSGI is going to get you the most bang for your buck out of your hardware. But just as I'd rather write things in Python before dropping down to a lower-level language for speed, I'd rather write my HTTP utilities in CherryPy and only fall back to raw WSGI when the need arises.

* Eric's app would service over 122 million requests in the same time period. :P
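The per-day figures are just the measured ab rates multiplied by the 86,400 seconds in a day. Here is a quick sketch of that arithmetic, using the two "Requests per second" values reported above (the variable names are mine):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Mean requests per second, as reported by ab in the two runs above.
rates = {
    'raw WSGI': 1420.67,
    'CherryPy': 688.27,
}

# Extrapolate each rate to a full day of sustained load.
per_day = {name: rps * SECONDS_PER_DAY for name, rps in rates.items()}
for name, total in sorted(per_day.items()):
    print('%-9s %6.1f million requests/day' % (name, total / 1e6))

# How much faster the raw WSGI app was, as a ratio of the two rates.
speedup = rates['raw WSGI'] / rates['CherryPy']
print('speedup: %.2fx' % speedup)
```

That works out to roughly 59.5 million requests a day for the CherryPy app and 122.7 million for the raw WSGI app, a ratio of about 2.06. Keep in mind this extrapolates a single-client, 10,000-request run, so treat it as a back-of-the-envelope number.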

UPDATE:

OK, after taking Robert's suggestion to profile, I realized that my CherryPy application was logging to stdout on top of the stdout logging that Spawning already does. Adding the following line to my application bumped the speed up to 842.34 req/s.

cherrypy.config.update({'log.screen': False})

That should bring the speed closer to what Tim Parkin measured using his restish framework.