The Occasional Occurence

Odd Old-Style vs. New-Style Class Behavior

May 21, 2009 at 10:55 AM | categories: Python, Software, work, computing, General

So we have some older Python code at work that uses old-style classes. We usually try to bring those up to date when we encounter them.

The other day one of the developers did that and one of our tests started failing. A simple change from:

class Foo:
    # stuff here

to:

class Foo(object):
    # stuff here

was all that happened.

Here is some code that encapsulates the problem and runs with some interesting results:

"""Two nearly identical classes with quite different behavior."""

class LameContainerOld:
    def __init__(self):
        self._items = {'bar':'test'}

    def __getitem__(self, name):
        return self._items[name]

    def __getattr__(self, attr):
        return getattr(self._items, attr)

class LameContainerNew(object):
    def __init__(self):
        self._items = {'bar':'test'}

    def __getitem__(self, name):
        return self._items[name]

    def __getattr__(self, attr):
        return getattr(self._items, attr)

if __name__ == '__main__':
    for cls in [LameContainerOld, LameContainerNew]:
        container = cls()
        print "Testing", cls
        try:
            'foo' in container
        except Exception, e:
            print "\tMembership in %s raised %r!" % (container.__class__, e)
        else:
            print "\tMembership in %s worked!" % container.__class__
        print "\t%s" % container.__getitem__

Here is some output from running that:

Testing __main__.LameContainerOld
        Membership in __main__.LameContainerOld worked!
        <bound method LameContainerOld.__getitem__ of {'bar': 'test'}>
Testing <class '__main__.LameContainerNew'>
        Membership in <class '__main__.LameContainerNew'> raised KeyError(0,)!
        <bound method LameContainerNew.__getitem__ of <__main__.LameContainerNew object at 0xb7d1c1ac>>

From what I can tell, when the membership test happens on the old-style instance, the test is delegated to self._items and returns False. When it happens on the new-style instance, Python treats the instance like a sequence and calls the __getitem__ method with index 0.

Does that seem like a correct analysis? Does anyone know why the behavior is different there?

Also look at the output for printing container.__getitem__. Isn't __getattr__ only supposed to be called when the attribute isn't present on the instance? Why does it return the __getitem__ method of self._items for the old-style instance then?
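For reference, the two lookup paths can be seen side by side with just the new-style class (a stripped-down sketch; exact reprs will vary by Python version):

```python
class LameContainer(object):
    def __init__(self):
        self._items = {'bar': 'test'}

    def __getitem__(self, name):
        return self._items[name]

    def __getattr__(self, attr):
        # Only consulted for explicit attribute access that fails normally.
        return getattr(self._items, attr)

c = LameContainer()

# Explicit lookup reaches __getattr__, which hands back the dict's
# bound __contains__ method -- so this works.
print(c.__contains__('bar'))    # True

# The `in` operator does an implicit lookup on type(c) instead; no
# __contains__ (or __iter__) is found there, so Python falls back to
# the sequence protocol and calls __getitem__ with index 0.
try:
    'bar' in c
except KeyError as e:
    print('membership raised %r' % e)
```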

Very puzzling.

cw

Caching HTTP Responses with CherryPy

February 25, 2009 at 10:53 AM | categories: Python, Software, computing, cherrypy, General

The most basic case is very simple.

import time
import cherrypy

class WebSvc(object):
    @cherrypy.tools.caching(delay=300)
    @cherrypy.expose
    def quadruple(self, number):
        time.sleep(1) # make the real call somewhat costly
        return str(int(number) * 4)

cherrypy.quickstart(WebSvc())

That uses an in-memory cache, with items expiring from the cache after 300 seconds (5 minutes), per the delay argument. If you want to tweak that setting or others, you can configure the caching tool to your liking.
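For instance, the same settings can be supplied through config keys instead of decorator arguments, since CherryPy tool parameters map to 'tools.<name>.<param>' entries (a sketch; adjust to taste):

```python
# Config-style equivalent of @cherrypy.tools.caching(delay=300).
conf = {
    '/': {
        'tools.caching.on': True,     # enable the caching tool for this path
        'tools.caching.delay': 300,   # expire cached entries after 300s
    },
}
# cherrypy.quickstart(WebSvc(), config=conf)
```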

cw

This is in response to a post that asks if setting up caching in other web frameworks is as easy as in `Ruby with Sinatra <http://slightlynew.blogspot.com/2009/02/full-web-service-with-http-caching-in-7.html>`_.

HTTP Utilities with CherryPy

January 08, 2009 at 12:56 PM | categories: Python, Software, cherrypy, General

Eric Florenzano posted a detailed blog entry on creating fast web utilities with bare WSGI. In it he argues that the larger Python web frameworks are overkill for small utility-like applications, and then builds a small utility app that conforms to the WSGI spec.

I like to use CherryPy to write HTTP utility-style applications. It lets you write RESTful WSGI-compliant HTTP applications without even knowing that you are doing so. Here is Eric's song counter application rewritten using CherryPy and using the builtin MethodDispatcher.

from collections import defaultdict

import cherrypy

counts = defaultdict(int)

class SongCounts(object):
    exposed = True

    def GET(self, id):
        return str(counts[id])

    def POST(self, id):
        counts[id] += 1
        return str(counts[id])


class CounterApp(object):
    exposed = True
    song = SongCounts()

    def GET(self):
        return ','.join(['%s=%s' % (k, v) for k, v in counts.iteritems()])

    def DELETE(self):
        counts.clear()
        return 'OK'

sc_conf = {
    '/': {
        'request.dispatch':cherrypy.dispatch.MethodDispatcher(),
    },
}

application = cherrypy.tree.mount(CounterApp(), config=sc_conf)
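The dispatch model behind that config is simple enough to sketch in a few lines. This toy version (an illustration, not CherryPy's actual implementation) just maps the HTTP verb to a same-named method on an exposed resource:

```python
class MiniMethodDispatcher(object):
    """Toy illustration of verb-based dispatch, not CherryPy's real code."""

    def dispatch(self, resource, verb, *args):
        if not getattr(resource, 'exposed', False):
            raise LookupError('404 Not Found')
        method = getattr(resource, verb.upper(), None)
        if method is None:
            raise LookupError('405 Method Not Allowed')
        return method(*args)

class Counter(object):
    exposed = True

    def __init__(self):
        self.count = 0

    def GET(self):
        return str(self.count)

    def POST(self):
        self.count += 1
        return str(self.count)

d = MiniMethodDispatcher()
c = Counter()
print(d.dispatch(c, 'post'))   # '1'
print(d.dispatch(c, 'get'))    # '1'
```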

It might just be personal preference and experience with CherryPy, but that code is much more expressive and readable to me than a raw WSGI callable. Another kind of scalability matters here too: skill scalability. While you can use CherryPy to build small utilities like this, the same skills carry over when you combine it with a template engine and a database to write full-scale web applications.

Of course, the main thrust of Eric's argument is that raw WSGI is faster, not more readable. Here are some benchmarks from my machine running Eric's raw WSGI app and my CherryPy WSGI app.

The specs on my machine are: IBM Thinkpad T61, Intel(R) Core(TM)2 Duo CPU T8300 @ 2.40GHz, 4GB 667 MHz DDR2 SDRAM

First Eric's app:

$ spawn -t 0 -p 8080 counter.application
$ curl -X POST -H "Content-Length:0" http://127.0.0.1:8080/song/1
$ ab -n 10000 http://127.0.0.1:8080/song/1
...
Concurrency Level:      1
Time taken for tests:   7.38927 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      1020000 bytes
HTML transferred:       10000 bytes
Requests per second:    1420.67 [#/sec] (mean)
Time per request:       0.704 [ms] (mean)
Time per request:       0.704 [ms] (mean, across all concurrent requests)
Transfer rate:          141.50 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       7
Processing:     0    0   0.5      0      11
Waiting:        0    0   0.3      0      11
Total:          0    0   0.5      0      11

Now my CherryPy app:

$ spawn -t 0 -p 8080 songcounter.application
$ curl -X POST -H "Content-Length:0" http://127.0.0.1:8080/song/1
$ ab -n 10000 http://127.0.0.1:8080/song/1
...
Concurrency Level:      1
Time taken for tests:   14.529259 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      1680000 bytes
HTML transferred:       10000 bytes
Requests per second:    688.27 [#/sec] (mean)
Time per request:       1.453 [ms] (mean)
Time per request:       1.453 [ms] (mean, across all concurrent requests)
Transfer rate:          112.88 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       8
Processing:     0    1   0.8      1      31
Waiting:        0    0   1.1      1      14
Total:          0    1   0.8      1      31

Wow! Eric's raw WSGI app is over 2x faster in req/s. Of course the measly ~688 req/s from my CherryPy application translates to over 59 million req/24hrs*. Not too shabby either. ;-)
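The daily figures fall straight out of the ab numbers:

```python
# Scale the benchmarked requests/sec up to a 24-hour day.
seconds_per_day = 60 * 60 * 24          # 86400

cherrypy_daily = 688.27 * seconds_per_day    # ~59.5 million
raw_wsgi_daily = 1420.67 * seconds_per_day   # ~122.7 million

print(int(cherrypy_daily))
print(int(raw_wsgi_daily))
```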

Looking at the benchmarks side by side, raw WSGI is going to get you the most bang for your buck out of your hardware. But just as I'd rather write things in Python and only drop down to a lower-level language when speed demands it, I'd rather write my HTTP utilities in CherryPy and only fall back to raw WSGI when the need arises.

cw

* Eric's app would service over 123 million requests in the same time period. :P

UPDATE:

Ok, after taking Robert's suggestion to profile, I realized that my CherryPy application was logging to stdout in addition to spawning's own stdout logging. Adding the following line to my application bumped the speed up to 842.34 req/s.

cherrypy.config.update({'log.screen':False})

That should bring the speed closer to what Tim Parkin measured using his restish framework.

My Solution for the Auto Industry (and the Economy In General): Smaller, Faster, Leaner

December 20, 2008 at 03:40 PM | categories: butt kickings, Software, computing, General

Ok, this post is on a controversial issue, and I'm entitled to my own ridiculous opinion. Therefore ...

The fact that we have 3 major automakers in the US that our economy apparently hinges upon is a problem. From the news, the talk is that a failure of any of them would be a disaster.

The problem with that is stuff fails. It just does.

It sounds like each of the Big Three has become a single point of failure. We try to avoid these in the technology world - we plan for failure in hardware and software.

def oversimplification():
    try:
        return make_cars()
    except ForeignCompetition:
        return make_better_cars()

We're human. We make stuff that breaks. We are flawed and fail a lot. It seems we have some economic single points of failure.

So now we are in a bad place. It reminds me of a monopoly situation. I realize these companies are three separate entities, but it seems like the public is in a position similar to a monopoly - the entities in question are so large and all encompassing that we need them whether or not we want them.

I want a US auto industry.

But why, oh why, OH WHY does everything in America have to be so BIG?

Grow, grow, grow. More, more, more. I'm tired of it.

So here's my solution (the culmination of my own ridiculous opinion) - many smaller automakers that adhere to open standards. I don't know how something like that gets regulated or started or anything, but I think that would be a much healthier situation for our nation. Lean companies that are making cars that they would like to drive for people like them.

There would need to be some sort of growth-cap or production-cap too. Think of it as a salary cap for the auto industry. You can only get so big.

Man, that doesn't sound very capitalist-like. I'm so back and forth on that sort of thing. I guess for that to be fair there would need to be caps on all industries. Moving on ...

Come on now! - we regulate against monopolies - we should regulate against (loaded language alert) cancerous big-business that entraps our children and furry woodland friends! I kid ... sort of.

Cancerous cells grow out of control until they form a mass that harms the vital functioning of the body, right? Isn't this a real-life economic cancer? Companies that have grown out of control and are now threatening harm to the vital functioning of the economy?

I've been picking on the auto industry because they are the ones in deep doo right now and are getting the headlines. Know that I was even more incensed at the Wall Street situation, and even wrote my representatives in Congress, rather than just idly blogging about it.

So there you have it. My solution for the life, the universe and everything, or at least the economy. Stop trying to get so big. You're only going to crash harder.

So, now, in the words of Captain Jean Luc Picard: "Make it so" (because everything that gets blogged about magically happens).

cw

Useful Diagramming Web Application

October 17, 2008 at 09:06 AM | categories: Python, Software, computing, General

I stumbled across Web Sequence Diagrams the other day.

It's a web-based diagramming application that uses a simple syntax to generate UML diagrams. No clicking around and positioning little boxes all over the place - just type in text that describes the diagram, and voila, there it is. He even provides a number of styles that you can apply to the diagrams. All in all, very slick.

Here's an example of some code that describes a request to a database backed website.

Client->HTTP Server: GET /foo
HTTP Server-->Database: SQL query
Database-->HTTP Server: Query results
alt resource exists
  HTTP Server->Client: HTTP 200 OK
else not found
  HTTP Server->Client: HTTP 404 Not Found
end

Here is the resulting diagram:

[resulting sequence diagram image]

I read his blog a bit and it sounds like he is using Python, PHP and the Cairo graphics library. I wish the source was available, but it is not. He does have an open HTTP API though.
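To give a flavor of what calling that HTTP API might look like, here is a sketch of building a render request for it. The parameter names (message, style, apiVersion) and the form-encoded POST body are assumptions on my part; check his API docs for the real interface:

```python
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

def build_render_request(diagram_text, style='default'):
    """Build the form body for a hypothetical render call to the
    Web Sequence Diagrams HTTP API (parameter names are assumed)."""
    params = [
        ('apiVersion', '1'),
        ('message', diagram_text),
        ('style', style),
    ]
    return urlencode(params)

body = build_render_request('Client->HTTP Server: GET /foo')
print(body)
```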

I think this might be an interesting solution to the images-in-docstrings discussion on comp.lang.python the other day, but only for including diagrams, of course.

cw
