HTTP Utilities with CherryPy
Eric Florenzano posted a detailed blog entry on creating fast web utilities with bare WSGI. In the post he made the case that the larger Python web frameworks are overkill for small utility-like applications. He then proceeded to build a small utility app that conforms to the WSGI spec.
I like to use CherryPy to write HTTP utility-style applications. It lets you write RESTful, WSGI-compliant HTTP applications without even knowing that you are doing so. Here is Eric’s song counter application rewritten using CherryPy and its built-in MethodDispatcher.
from collections import defaultdict

import cherrypy

counts = defaultdict(int)


class SongCounts(object):
    exposed = True

    def GET(self, id):
        return str(counts[id])

    def POST(self, id):
        counts[id] += 1
        return str(counts[id])


class CounterApp(object):
    exposed = True
    song = SongCounts()

    def GET(self):
        return ','.join(['%s=%s' % (k, v) for k, v in counts.iteritems()])

    def DELETE(self):
        counts.clear()
        return 'OK'


sc_conf = {
    '/': {
        'request.dispatch': cherrypy.dispatch.MethodDispatcher(),
    },
}

application = cherrypy.tree.mount(CounterApp(), config=sc_conf)
It might just be personal preference and experience with CherryPy, but that code is much more expressive and readable to me than a raw WSGI callable. There is also the matter of skill scalability: the CherryPy knowledge you use to build small utilities like this carries over to full-scale web applications, where CherryPy combines well with a template engine and a database.
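For comparison, a raw WSGI callable with roughly the same behavior might look like the sketch below. This is not Eric’s code. It is a hypothetical approximation written for Python 3 using only the standard library, and the routing logic and names such as song_id are my own assumptions.

```python
from collections import defaultdict

counts = defaultdict(int)


def application(environ, start_response):
    """Hypothetical raw WSGI song counter (an illustration, not Eric's code)."""
    method = environ['REQUEST_METHOD']
    # Split "/song/1" into ['song', '1']; "/" becomes an empty list.
    parts = [p for p in environ.get('PATH_INFO', '').split('/') if p]

    status, body = '404 Not Found', 'Not Found'
    if not parts:
        # Root resource: list all counts or clear them.
        if method == 'GET':
            status = '200 OK'
            body = ','.join('%s=%s' % (k, v) for k, v in counts.items())
        elif method == 'DELETE':
            counts.clear()
            status, body = '200 OK', 'OK'
        else:
            status, body = '405 Method Not Allowed', 'Method Not Allowed'
    elif len(parts) == 2 and parts[0] == 'song':
        # Per-song resource: read or increment one counter.
        song_id = parts[1]
        if method == 'GET':
            status, body = '200 OK', str(counts[song_id])
        elif method == 'POST':
            counts[song_id] += 1
            status, body = '200 OK', str(counts[song_id])
        else:
            status, body = '405 Method Not Allowed', 'Method Not Allowed'

    payload = body.encode('utf-8')
    start_response(status, [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(payload))),
    ])
    return [payload]
```

All of the method and path handling that MethodDispatcher does implicitly has to be spelled out by hand here, which is the readability trade-off I am talking about.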
Of course, the main thrust of Eric’s argument is that raw WSGI is faster, not more readable. Here are some benchmarks from my machine running Eric’s raw WSGI app and my CherryPy WSGI app.
The specs on my machine are:
IBM Thinkpad T61, Intel(R) Core(TM)2 Duo CPU T8300 @ 2.40GHz, 4GB 667 MHz DDR2 SDRAM
First Eric’s app:
$ spawn -t 0 -p 8080 counter.application
$ curl -X POST -H "Content-Length:0" http://127.0.0.1:8080/song/1
$ ab -n 10000 http://127.0.0.1:8080/song/1
...
Concurrency Level: 1
Time taken for tests: 7.38927 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 1020000 bytes
HTML transferred: 10000 bytes
Requests per second: 1420.67 [#/sec] (mean)
Time per request: 0.704 [ms] (mean)
Time per request: 0.704 [ms] (mean, across all concurrent requests)
Transfer rate: 141.50 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       7
Processing:     0    0   0.5      0      11
Waiting:        0    0   0.3      0      11
Total:          0    0   0.5      0      11
Now my CherryPy app:
$ spawn -t 0 -p 8080 songcounter.application
$ curl -X POST -H "Content-Length:0" http://127.0.0.1:8080/song/1
$ ab -n 10000 http://127.0.0.1:8080/song/1
...
Concurrency Level: 1
Time taken for tests: 14.529259 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 1680000 bytes
HTML transferred: 10000 bytes
Requests per second: 688.27 [#/sec] (mean)
Time per request: 1.453 [ms] (mean)
Time per request: 1.453 [ms] (mean, across all concurrent requests)
Transfer rate: 112.88 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       8
Processing:     0    1   0.8      1      31
Waiting:        0    0   1.1      1      14
Total:          0    1   0.8      1      31
Wow! Eric’s raw WSGI app is over 2x faster in req/s. Of course the measly ~688 req/s from my CherryPy application translates to over 59 million req/24hrs*. Not too shabby either.
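The arithmetic behind these figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the two rates are copied from the ab output above, the variable names are mine, and the daily totals assume the single-client rate would hold steady for a full 24 hours.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# "Requests per second" values taken from the two ab runs.
raw_wsgi_rps = 1420.67
cherrypy_rps = 688.27

speedup = raw_wsgi_rps / cherrypy_rps
raw_wsgi_per_day = raw_wsgi_rps * SECONDS_PER_DAY
cherrypy_per_day = cherrypy_rps * SECONDS_PER_DAY

print('raw WSGI vs CherryPy: %.2fx' % speedup)
print('CherryPy: %.1f million requests/day' % (cherrypy_per_day / 1e6))
print('raw WSGI: %.1f million requests/day' % (raw_wsgi_per_day / 1e6))
```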
Looking at the benchmarks side by side, raw WSGI is going to get you the most bang for your buck out of your hardware. But just as I’d rather write things in Python before dropping down to a lower-level language for speed, I’d rather write my HTTP utilities in CherryPy and only fall back to raw WSGI when the need arises.
cw
* Eric’s app would service about 122.7 million requests in the same time period (1420.67 req/s × 86,400 s).
UPDATE:
OK, after taking Robert’s suggestion to profile, I realized that my CherryPy application was logging to stdout in addition to Spawning doing its own stdout logging. Adding the following line to my application bumped the speed up to 842.34 req/s.
cherrypy.config.update({'log.screen':False})
That should bring the speed closer to what Tim Parkin measured using his restish framework.
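For anyone who wants to try the same kind of profiling, here is one possible approach using only the standard library. It is a sketch, not the exact procedure I followed: profile_wsgi and sample_app are hypothetical names, the code is written for Python 3, and because it calls the WSGI callable in-process it only sees overhead inside the application itself, not anything the HTTP server adds.

```python
import cProfile
import io
import pstats
from wsgiref.util import setup_testing_defaults


def sample_app(environ, start_response):
    # Hypothetical stand-in; substitute the WSGI application under test.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'1']


def profile_wsgi(app, path='/song/1', requests=1000, top=10):
    """Call `app` in-process `requests` times and return a cProfile report."""
    def run():
        for _ in range(requests):
            environ = {}
            setup_testing_defaults(environ)  # fills in a minimal GET environ
            environ['PATH_INFO'] = path
            # Consume the response iterable so body generation is measured too.
            for _chunk in app(environ, lambda status, headers: None):
                pass

    profiler = cProfile.Profile()
    profiler.runcall(run)

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats('cumulative').print_stats(top)
    return out.getvalue()


if __name__ == '__main__':
    print(profile_wsgi(sample_app))
```

Sorting by cumulative time puts the most expensive call paths at the top of the report, which is where unexpected work such as logging on every request would tend to show up.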
January 8th, 2009 at 2:42 pm
I usually use straight WSGI as well, but as Kevin Dangoor said in the comments of Eric’s blog, I use webob and Selector for my URL dispatching. There are definitely degrees of how to use these tools, but I’d wonder how much the benchmarks change based on what sorts of helpers you write. For example, automatically passing in a request and response object, or simply adding dispatching based on Selector vs. Routes.
January 8th, 2009 at 8:19 pm
Hi Christian! I like the blog post. I didn’t realise CherryPy let you get away with typing so little! I’ve responded with a similar app written using our newish restish app at http://dev.timparkin.co.uk/2009/01/happy-medium-from-wsgi-to-cherrypy.html
I agree with you not liking benchmarking but it’s contagious! I want to know where our app is bottlenecking now and I wasn’t bothered in the slightest before! Out of interest, where do you think CherryPy is spending most of its time?
January 9th, 2009 at 1:44 am
Thanks for the comments guys.
@Eric - Yeah, the more nice stuff you add, the slower it gets I’m sure. Probably mostly due to Python object initialization and function/method call overhead.
@Tim - Glad you liked the post. ‘restish’ looks interesting. One of the CP devs (lakin) just introduced some subcontroller dispatch that looks similar to your resource.child feature.
As far as bottlenecks go, I think for CherryPy they are probably in some of the additional features it provides, and it boils down to what I mentioned to Eric: object/method/function overhead. Just Python being Python - not the fastest but definitely prettier than the competition.
January 9th, 2009 at 3:37 am
/me invites Tim to apply all that knowledge and effort on the CherryPy dev team, now that he’s written one alone. It’s what I did.
January 9th, 2009 at 1:14 pm
Thanks for the fantastic response! I think everyone will agree that your solution is a *lot* more readable than mine, just as Tim Parkin’s is. We should do some sort of shootout that looks at a bunch of different ways of implementing this, lets you look at them side-by-side, and compares performance. That would be quite cool.