Posts

March 03, 11:18 AM

It has been some time now since the last posts, but I’ve been very busy with this new thing I am working on; about a year ago I started working on a new web-service named SiteSupport.

SiteSupport will solve the problems many website owners currently face:

  • It is impossible to view the stuff users do on your website
  • It is difficult to explain users how to navigate your website

Our service solves this by enabling remote-desktop like collaboration on any web application, without the need for plugins. We achieve this by synchronizing the website-state over a websocket connection. This requirement of websocket connections is the reason why I  benchmarked various event-driven servers about a year ago. The benchmark results showed that the servers which used the lightweight thread model (coroutines) where among the top performers. I thought that was great news, as I prefer their clean approach compared to callbacks.

Not long after I published the post, I got in contact with Denis Bilenko, the author of Gevent and it soon became clear that we both shared the same passions: programming in Python and reading Hacker News.

And thus, I am happy to announce SiteSupport, a joint-venture between Denis and me.

Check out our demo

(and keep watching for at least 30sec,  it’ll become less boring)



June 23, 04:50 AM

ZeroMQ is a messaging library, which allows you to design a complex communication system without much effort. It has been wrestling with how to effectively describe itself in the recent years. In the beginning it was introduced as ‘messaging middleware’ later they moved to ‘TCP on steroids’ and right now it is a ‘new layer on the networking stack’.

I had some trouble understanding ZeroMQ at first and really had to reset my brain. First of all, it is not a complete messaging system such as RabbitMQ or ActiveMQ. I know the guys of Linden Research compared them, but it is apples and oranges. A full fledged messaging system gives you an out of the box experience. Unwrap it, configure it, start it up and you’re good to go once you have figured out all its complexities.

ZeroMQ is not such a system at all; it is a simple messaging library to be used programmatically. It basically gives you a pimped socket interface allowing you to quickly build your own messaging system.

Float like a butterfly, sting like a bee

But why use ZeroMQ and not just use the low level Berkeley socket interface or a high level messaging system? I think the answer is balance. You probably want the flexibility and performance of the low level while still having the ease of implementation of the high level. However, maintaining raw sockets is difficult and cumbersome when you want to implement a scalable system. A high level system often works perfect if you use it for the situation it was designed for, but it can be difficult to change core elements of the system and its ease of use often comes with a cost in performance. This isn’t a problem that is limited to messaging systems only. We can see the previous dilemma also in web frameworks; it could very well be that this is exactly the reason why ‘Micro Frameworks’ gain in popularity.

I believe that ZeroMQ perfectly fits this gap between the high and the low level, so what are its features?

Performance

ZeroMQ is blazing fast. It is orders of magnitude faster than most AMQP messaging systems and it can obtain this high performance because of the following techniques:

Simplicity

The API is deceptively simple, and it makes sending messages really simple compared with a raw socket implementation where you have to continuously ‘feed’ the socket buffer. In ZeroMQ you can just fire off an async send call, it will queue the message in a separate thread and do all the work for you. Because of this async nature, your application does not have to waste time waiting until the message has been flushed. The async nature of 0MQ makes it a perfect companion for an event-based framework.

ZeroMQ’s simple wire protocol fits perfectly in the current time setting where we have lots of different transport protocols. With AMQP it always felt a bit weird to use an extra protocol layer on top. 0MQ gives you complete freedom on how you encode your message, as it will just interpret it as a blob. So you can send simple JSON messages, go the binary route with for example BSON, Protocol Buffers or Thrift and all this without feeling guilty.

Scalability

While ZeroMQ sockets look low level they provide lots of features. A single ZeroMQ socket can for example connect to multiple end points and automatically load balance messages over them. Or it can work as some sort of Fan-In, collecting messages from multiple sources through a single socket.

ZeroMQ follows a brokerless design so that there is no single point of failure. Combine this with its simplicity and performance and you get something that you can use to make your application distributed.

Implementing a messaging layer with ZeroMQ

In the next section I will show how to design and implement a messaging layer with ZeroMQ. For the code example I will use Brian Granger’s PyZMQ, which is the excellent Python binding to ZeroMQ.

Implementing a ZeroMQ messaging layer is a three-step approach:

  1. Choose a transport
  2. Set up the infrastructure
  3. Select a messaging pattern

Choosing a transport

The first step is to choosing a transport. ZeroMQ provides 4 different transports:

  1. INPROC an In-Process communication model
  2. IPC an Inter-Process communication model
  3. MULTICAST multicast via PGM, possibly encapsulated in UDP
  4. TCP a network based transport

The TCP transport is often the best choice, it is very performant and robust. However, when there is no need to cross the machine border it can be interesting to look at the IPC or INPROC protocol to lower the latency even more. The MULTICAST transport can be interesting in special cases. But personally, I am a bit careful with applying multicast, as it is difficult to understand how it will behave when scaling up. Think of issues such as figuring out how many multicast groups you can create with this or that hardware and how much stress it is going to put on the different switches in your network. If you want to be sure that your code runs cross platforms it is probably best to go with TCP as the other transports are not guaranteed to be available on the different platforms.

Setting up the infrastructure

When you have decided upon your transport you will have to think about how the different components are connected to each other. It is simply answering the question: “Who connects to whom?” You probably want the most stable part of the network to BIND on a specific port and have the more dynamic parts CONNECT to that. In the image below we have depicted how a server binds to a certain port and how a client connects to it.

It is possible that both ends of the networks are relatively dynamic so that it is difficult to have a single stable connection point. If this is the case, you could make use of the forwarding devices that ZeroMQ provides. These devices can bind to 2 different ports and forward messages from one end to the other. By doing so, the forwarding device can become the stable point in your network where each component can connect to. ZeroMQ provides three kinds of devices:

  1. QUEUE, a forwarder for the request/response messaging pattern
  2. FORWARDER, a forwarder for the publish/subscribe messaging pattern
  3. STREAMER, a forwarder for the pipelined messaging pattern

In the image below we can see such a device being used, in this situation both the client and the server initialize a connection to the forwarder, which binds to two different ports. Using such a device will remove the need of extra application logic, as you will not need to maintain a list of connected peers.

Selecting a message pattern

The previous steps build the infrastructure but did not specify the message flow. The next step is to think carefully about the message pattern each component should follow. The patterns that 0MQ supports are:

  1. REQUEST/REPLY, bidirectional, load balanced and state based
  2. PUBLISH/SUBSCRIBE, publish to multiple recipients at once
  3. UPSTREAM / DOWNSTREAM, distribute data to nodes arranged in a pipeline
  4. PAIR, communication exclusively between peers

I will explain them a bit more below.



Request Reply

The request reply paradigm is very common and can be found in most type of servers. For example: HTTP, POP or IMAP. This pattern has a certain state associated with it as a request has to be followed by a reply. The client uses a socket of type REQ as it will initiate the request by performing a .send() on the socket. The server uses a socket of type REP, and it will start by performing a .recv() to read the incoming request, after which it can send its reply.

ZeroMQ greatly simplifies this pattern by allowing you to have a single socket connect to multiple end points. ZeroMQ will automatically balance requests over the different peers.

The Python code below will create an echo server that listens on port 5000 with a REP socket. It will then loop an alternation of performing .recv() for incoming requests and then .send() a reply to them.

import zmq
context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://127.0.0.1:5000")

while True:
    msg = socket.recv()
    print "Got", msg
    socket.send(msg)

When you have multiple clients connected to this server the ZMQ socket will fair queue between all incoming requests. Now, if you want your client to be able to connect to multiple servers as well, you can take the above code, change port 5000 to 6000 and use it to run an extra server. The following client code will then be able to use both of the servers:

import zmq
context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://127.0.0.1:5000")
socket.connect("tcp://127.0.0.1:6000")

for i in range(10):
    msg = "msg %s" % i
    socket.send(msg)
    print "Sending", msg
    msg_in = socket.recv()

The above sends 10 requests in total but since we are connected to 2 different servers, each server only has to handle 5 requests. Isn’t that great? With only a few lines of code we were able to create a distributed client/server model.

Now, if we want to add an extra server to handle our requests we will have to adjust our code. This can be cumbersome as we need to do this for all our clients to let them know it can now balance the requests over an extra server.

This is exactly where the ZeroMQ devices fit in. Instead of having the clients connect directly to multiple servers it can connect to a single forwarding device. The forwarding device will then reroute all messages to the connected servers.

Example client output:

Sending msg 0
Sending msg 1
Sending msg 2
Sending msg 3
Sending msg 4
Sending msg 5
Sending msg 6
Sending msg 7
Sending msg 8
Sending msg 9

Example output server 1 at port 5000:

Got msg 0
Got msg 2
Got msg 4
Got msg 6
Got msg 8

Example output server 2 at port 6000:

Got msg 1
Got msg 3
Got msg 5
Got msg 7
Got msg 9



Publish Subscribe

The Pub/Sub paradigm has gained lots of interest the last few years. You can think of things such as message pushing, XMPP or webhooks. In a pub/sub pattern the components are loosely coupled. This will greatly help you to scale out as there is no need to worry about the subscribers. However, this loose coupling can also lead to unexpected behavior when not fully understood. A nice metaphor for the Pub/Sub paradigm is thinking of it is a radio station. When you publish messages you send something over a certain frequency, only listeners that have subscribed to that frequency will receive the signal. But also, just as with a radio, if you tuned in to the station after the broadcast you will miss the show.

It is good to stress that the various message patterns have no coupling with the infrastructure. It is thus possible to bind to a port and publish to the peers that connect to it. But it is also possible to do it the other way around, connect to multiple peers and broadcast to them. The first example resembles the radio metaphor (everybody can tune in), while the second one more resembles yelling at your peers through a megaphone (a selected group). In both situations your peers can decide not to listen to your messages by not subscribing to them.

The following code shows how you could create a broadcasting server for live soccer events:

import zmq
from random import choice
context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://127.0.0.1:5000")

countries = ['netherlands','brazil','germany','portugal']
events = ['yellow card', 'red card', 'goal', 'corner', 'foul']

while True:
    msg = choice( countries ) +" "+ choice( events )
    print "->",msg
    socket.send( msg )

The server will generate an unlimited amount of events for the different countries and pushes them over a socket of type PUB. Below you can find some example output:

-> portugal corner
-> portugal yellow card
-> portugal goal
-> netherlands yellow card
-> germany yellow card
-> brazil yellow card
-> portugal goal
-> germany corner

Now if we are only interested in events concerning The Netherlands and Germany we can create a client that subscribes to those specific messages:

import zmq

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://127.0.0.1:5000")
socket.setsockopt(zmq.SUBSCRIBE, "netherlands")
socket.setsockopt(zmq.SUBSCRIBE, "germany")

while True:
    print  socket.recv()

The client will create a SUB socket, connect to our broadcast server at port 5000 and subscribe to messages starting with ‘netherlands’ or ‘germany’. The output will look something like this:

netherlands red card
netherlands goal
netherlands red card
germany foul
netherlands yellow card
germany foul
netherlands goal
netherlands corner
germany foul
netherlands corner



Pipelining

The pipeline pattern looks remarkably similar to the Rep/Req pattern, the difference is that instead of requiring a reply being sent to the requester the reply can be pushed down the pipe. This is a paradigm commonly seen when there is a need to process data in parallel. For example, lets say we have some sort of system that does face recognition. We have a job server that pushes the images to one of the workers, which will then process it, once finished it will then push it down the stream again towards some sort of collector.

In the design at the left we can see that a worker will receive its message from an UPSTREAM socket and once they are processed sends them DOWNSTREAM. It routes messages from two different socket types.

The jobserver can just keep pushing tasks DOWNSTREAM through a single socket but with multiple endpoints. ZeroMQ and recently also PyZMQ can send the messages in a zero-copy manner. This is great if you need to push large messages around and you don’t want to waste IO cycles.



Paired sockets

Paired sockets are very similar to regular sockets as the communication is bidirectional, there is no specific state stored within the socket and there can only be one connected peer. Most real life problems can be captured in one of the previously explained patterns and I want to recommend that you look at them first before applying this one as it will simplify your problem.

The figure at the left depicts the infrastructure of a paired socket, the server listens on a certain port and a client connects to it. The red lines indicate the flow of messages, in this pattern both endpoints use a socket of type PAIR and as you can see the messages can flow bidirectional.

The following code shows how to implement such a thing. We will bind to a port on one end:

import zmq
context = zmq.Context()
socket = context.socket(zmq.PAIR)
socket.bind("tcp://127.0.0.1:5555")

And on the other end where we will connect to it.

import zmq
context = zmq.Context()
socket = context.socket(zmq.PAIR)
socket.connect("tcp://127.0.0.1:5555")


ZeroMQ and the future

In this post I have given a short introduction to ZeroMQ, I hope that at this point you will now share my ideas about what a great little library it is. But while the library may feel small it has a grand vision of being the new messaging layer. And really, it is not that weird when you come to think of it. Scalability issues are mostly just communication and portability issues, ZeroMQ can solve these problems for you.

Lets say you want to create some new sort of database because Redis, Cassandra, TokyoTyrant, Postgres, MongoDB, DabbleDB, CouchDB, HBase, etc. just don’t serve your needs that well. You create an amazing in memory tree representation for your data and have a blazing fast indexer. Now all you need is some sort of messaging layer such that different clients can talk to your server. Preferably implemented in different programming language and with clustering capabilities. You could of course create such a messaging framework all by yourself, but that is a lot of hard work.

A simple solution is to just implement your database as a ZeroMQ server and pick a message protocol (fe JSON). As you have seen by now, implementing such functionality with ZeroMQ is really easy and on top of this you will get almost instant scalability because of the way ZeroMQ can route messages. It will also make it incredibly easy to implement different clients that will communicate with your server. Basically all you need to do is pick one of the 15 available language bindings, use the same message protocol and you’re done. Currently the following languages have a ZeroMQ binding: Ada, C, C++, Common Lisp, Erlang, Go, Haskell, Java, Lua, .NET, OOC, Perl, PHP, Python and Ruby.

ZeroMQ could very well be the new way in how we connect our components. A good example of someone who understands the possibilities of ZeroMQ is Zed Shaw as can be seen with his recent project Mongrel2. You can use Mongrel2 to bridge the gap between a regular HTTP client and a ZeroMQ component. If you don’t immediately see how awesome this is you probably have never worked with websockets, comet or flash based sockets. Another way to look at the great possibilities of such an implementation is to think of Facebook’s BigPipe where each Pagelet can transparantly be generated by a different component connected with 0MQ.

March 15, 10:29 AM

It has been a while since the Socket Benchmark of Asynchronous server. That benchmark looked specifically at the raw socket performance of various frameworks, which was being benchmarked by doing a regular HTTP request against the TCP server. The server itself was dumb and did not actually understand the headers being send to it. In this benchmark I will be looking at how different WSGI servers perform at exactly that task; the handling of a full HTTP request.

I should immediately start with a word of caution. I tried my best to present an objective benchmark of the different WSGI servers. And I truly believe that a benchmark is one of the best methods to present an unbiased comparison. However, a benchmark measures the performance on a very specific domain and it could very well be that this domain is slanted towards certain frameworks. But, if we keep that in mind we can actually put some measurements behind all those ‘faster than’ or ‘lighter than’ claims you will find everywhere. It is my opinion that such comparison claims without any detailed description of how they are measured are worse than a biased but detailed benchmark. The specific domain of this benchmark is, yet again, the PingPong benchmark as used earlier in my Async Socket Benchmark. However, there are some differences:

  • We will fire multiple requests over a single connection, when possible, by using a HTTP 1.1 keepalive connection
  • It is a distributed benchmark with multiple clients
  • We will use an identical WSGI application for all servers instead of specially crafted code to return the reply
  • We expect the server to understand our HTTP request and reply with the correct error codes

This benchmark is a conceptually simple one and you could claim that this is not representable for most common web application which rely heavily on blocking database connections. I agree with that to some extent as this is mostly the case. However, the push towards HTML5’s websockets and highly interactive web applications will require servers that are capable to serve lots of concurrent connections with low latency.

The benchmark

We will run the following WSGI application ‘pong.py’ on all servers.

def application(environ, start_response):
    status = '200 OK'
    output = 'Pong!'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]

We will also tune both client and server by running the following commands. This basically enables the server to open LOTS of concurrent connections.

echo “10152 65535″ > /proc/sys/net/ipv4/ip_local_port_range
sysctl -w fs.file-max=128000
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.core.somaxconn=250000
sysctl -w net.ipv4.tcp_max_syn_backlog=2500
sysctl -w net.core.netdev_max_backlog=2500
ulimit -n 10240

The server is a virtual machine with only one assigned processor. I have explicitly limited the amount of available processors to make sure that it is a fair comparison. Whether or not the server scales over multiple processors is an interesting and useful feature but this is not something I will measure in this benchmark. The reason for this is that it isn’t that difficult to scale up your application to multiple processors by using a reverse proxy and multiple server processes (this can even be managed for you by special applications such as Spawning or Grainbows). The server and clients run Debian Lenny with Python 2.6.4 on the amd64 architecture. I made sure that all WSGI servers have a backlog set of at least 500 and that (connection/error) logging is disabled, when this was not directly possible from the callable I modified the library. The server and the clients have 1GB of ram.

I benchmarked the HTTP/1.0 request rate of all server and the HTTP/1.1 request rate on the subset of servers that support pipelining multiple requests over a single connection. While the lack of HTTP 1.1 keepalive support is most likely a non issue in current deployment situations I expect it to become an important feature in applications that depend heavily on low latency connections. You should think about comet-style web applications or applications that use HTML5 websockets.

I categorize a server as HTTP/1.1 capable by its behaviour, not by its specs. For example the Paster server says that it has some support for HTTP 1.1 keep alives but I was unable to pipeline multiple requests. This reported bug might be relevant to this situation and might apply to some of the other “HTTP 1.0 Servers”.

The benchmark will be performed by running a recompiled httperf (which bypasses the static compiled file limit in the debian package) on 3 different specially setup client machines. To initialize the different request rates and aggregate the results I will use a tool called autobench. Note: this is not ApacheBench (ab).

The command to benchmark HTTP/1.0 WSGI servers is:

httperf –hog –timeout=5 –client=0/1 –server=tsung1 –port=8000 –uri=/ –rate=<RATE> –send-buffer=4096 –recv-buffer=16384 –num-conns=400 –num-calls=1

And the command for HTTP/1.1 WSGI servers is:

httperf –hog –timeout=5 –client=0/1 –server=tsung1 –port=8000 –uri=/ –rate=<RATE> –send-buffer=4096 –recv-buffer=16384 –num-conns=400 –num-calls=10

The Contestants

Python is really rich with WSGI servers, i have made a selection of different servers which are listed below.

NameVersionhttp 1.1FlavourRepo.BlogCommunity
Gunicorn0.6.4Noprocessor/threadGIT?#gunicorn
uWSGITrunk (253)Yesprocessor/threadrepo?Mailing List
FAPWS30.3.1Noprocessor/threadGITWilliam Os4yGoogle Groups
Aspen0.8Noprocessor/threadSVNChad WhitacreGoogle Groups
Mod_WSGI3.1Yesprocessor/threadSVNGraham DumpletonGoogle Groups
wsgirefPy 2.6.4Noprocessor/threadSVNNoneMailing List
CherryPy3.1.2Yesprocessor/threadSVNPlanet CherryPyPlanet, IRC
Magnum Py0.2Noprocessor/threadSVNMatt GattisGoogle Groups
Twisted 10.0.0Yesprocessor/threadSVNPlanet Twisted Community
Cogen 0.2.1Yescallback/generatorSVN Maries Ionel Google Groups
GEvent 0.12.2Yeslightweight threadsMercurialDenis BilenkoGoogle Groups
Tornado0.2Yescallback/generatorGITFacebookGoogle Groups
Eventlet0.9.6Yeslightweight threadsMercurialEventletMailinglist
ConcurrencetipYeslightweight threadsGITNoneGoogle Groups

Most of the information in this table should be rather straightforward, I specify the version benchmarked and whether or not the server has been found capable of HTTP 1.1. The flavour of the server specifies the concurrency model the server uses and I identify 3 different flavours:

Processor / Thread model

The p/t model is the most common flavour. Every requests runs in its own cleanly separated thread. A blocking request (such as a synchronous database call or a function call in a C extension) will not influence other requests. This is convenient as you do not need to worry about how everything is implemented, but it does come at a price. The maximum amount of concurrent connections is limited by your number of workers or threads and this is known to scale badly when you have the need for lots of concurrent users.

Callback / Generator model

The callback/generator model handles multiple concurrent connections in a single thread, thereby removing the thread barrier. A single blocking call will block the whole event loop however and has to be prevented. The servers that have this flavour usually provide a threadpool to integrate blocking calls in their async framework or provide alternative non-blocking database connectors. In order to provide flow control this flavour uses callbacks or generators. Some think that this is a beautiful way to create a form of event driven programming others think that it is snake pit that quickly changes your clean code to an entangled mess of callbacks or yield statements.

Lightweight Threads

The lightweight flavour uses greenlets to provide concurrency. This also works by providing concurrency from a single thread but in a less obtrusive way then with the callbacks or generator approach. But of course one has to be careful with blocking connections as this will stop the event loop. To prevent this from happening, Eventlet and Gevent can monkeypatch the socket module to stop it from blocking so when you are using a pure python database connector this should never block the loop. Concurrence provides an asynchronous database adapter.

Implementation specifics for each WSGI server

Aspen

Ruby might be full with all kinds of rockstar programmers (whatever that might mean) but if i have to nominate just one Python programmer with some sort of ‘rockstar award’ i would definitely nominate Chad Whitacre. Its not only the great tools he created; Testosterone, Aspen, Stephane. But mostly how he promotes them with the most awesome screencasts i have ever seen.

Anyway, Aspen is a neat little Web server which is also able to serve WSGI applications. It can be easily installed with ‘pip install aspen’ and uses a special directory structure for configuration and if you want more information i am going to point you to his screencasts.

CherryPy

CherryPy is actually an object oriented Python framework but features an excellent WSGI server. Installation can be done with a simple ‘pip install cherrypy’. I ran the following script to test out the performance of the WSGI server:

from cherrypy import wsgiserver
from pong import application

# Here we set our application to the script_name '/'
wsgi_apps = [('/', application)]

server = wsgiserver.CherryPyWSGIServer(('0.0.0.0', 8070), wsgi_apps, request_queue_size=500,     server_name='localhost')

if __name__ == '__main__':
    try:
        server.start()
    except KeyboardInterrupt:
        server.stop()

Cogen

The code to have Cogen run a WSGI application is as follows:

from cogen.web import wsgi
from cogen.common import *
from pong import application

m = Scheduler(default_priority=priority.LAST, default_timeout=15)
server = wsgi.WSGIServer(
            ('0.0.0.0', 8070),
            application,
            m,
            server_name='pongserver')
m.add(server.serve)
try:
    m.run()
except (KeyboardInterrupt, SystemExit):
    pass

Concurrence

Concurrence is an asynchronous framework under development by Hyves (you might call it the Dutch Facebook) built upon Libevent (I used the latest stable version 1.4.13), I fired up the pong application as follows:

from concurrence import dispatch
from concurrence.http import WSGIServer
from pong import application
server = WSGIServer(application)
# Concurrence has a default backlog of 512
dispatch(server.serve(('0.0.0.0', 8080)))

Eventlet

Eventlet is a full featured asynchronous framework which also provides WSGI server functionality. It is in development by Linden Labs (makers of Second Life). To run the application I used the following code:

import eventlet
from eventlet import wsgi
from pong import application
wsgi.server(eventlet.listen(('', 8090), backlog=500), application, max_size=8000)

FAPWS3

FAPWS3 is a WSGI server build around the LibEV library (I used version 3.43-1.1). When LibEV has been installed, FAPWS can be easily installed with pip. The philosophy behind FAPWS3 is to stay the simplest and fastest webserver. The script I used to start up the WSGI application is as follows:

import fapws._evwsgi as evwsgi
from fapws import base
from pong import application

def start():
    evwsgi.start("0.0.0.0", 8080)
    evwsgi.set_base_module(base)

    evwsgi.wsgi_cb(("/", application))

    evwsgi.set_debug(0)
    evwsgi.run()

if __name__=="__main__":
    start()

Gevent

Gevent is one of the best performing Async frameworks in my previous socket benchmark. Gevent extends Libevent and uses its HTTP server functionality extensively. To install Gevent you need Libevent installed after which you can pull in Gevent with PIP.

from gevent import wsgi
from pong import application
wsgi.WSGIServer(('', 8088), application, spawn=None).serve_forever()

The above code will run the pong application without spawning a Greenlet on every request. If you leave out the argument ’spawn=None’ Gevent will spawn a Greenlet for every new request.

Gunicorn

Gunicorn stands for ‘Green Unicorn’, everybody knows that a unicorn is a mix of the the awesome narwhal and the magnificent pony the green does however have nothing to do with the great greenlets as it really has a threaded flavour. Installation is easy and can be done with a simple ‘pip install gunicorn’ Gunicorn provides you with a simple command to run wsgi applications, all I had to do was:

gunicorn -b :8000 -w 1 pong:application

Update: I had some suggestions in the comment section that using a single worker and having a client connect  to the naked server is not the correct way to work with Gunicorn. So I took their suggestions and moved Gunicorn behind NGINX and increased the worker count to the suggested number of workers, 2*N+1 where N is 1 which makes 3. The result of this is depicted in the graphs as gunicorn-3w.

The run Gunicorn with more workers can be done such as:

gunicorn -b unix:/var/nginx/uwsgi.sock -w 3 pong:application

MagnumPy

MagnumPy has to be the server with the most awesome name. This is still a very young project but its homepage is making some strong statements about its performance so it is worth testing out. It does not feel as polished as the other contestants and installing is basically pushing the ‘magnum’ directory on your PYTHONPATH edit ‘./magnum/config.py’ after which you can start the server by running ‘./magnum/serve.py start’

#config.py
import magnum
import magnum.http
import magnum.http.wsgi
from pong import application

WORKER_PROCESSES = 1
WORKER_THREADS_PER_PROCESS = 1000
HOST = ('', 8050)
HANDLER_CLASS = magnum.http.wsgi.WSGIWrapper(application)
DEBUG = False
PID_FILE = '/tmp/magnum.pid'

Mod_WSGI

Mod_WSGI is the successor of Mod_Python, it allows you to easily integrate Python code with the Apache server. My first python web app experience was with mod_python and PSP templates, WSGI and cool frameworks such as Pylons have really made life a lot easier.

Mod_WSGI is a great way to get your application deployed quickly. Installing ‘Mod_WSGI’ is with most Linux distributions really easy. For example:

aptitude install libapache2-mod-wsgi

Is all you need to do on a pristine Debian distro to get a working Apache (MPM-Worker) server with Mod_WSGI enabled. To point Apache to your WSGI app just add a single line to ‘/etc/apache2/httpd.conf’:

WSGIScriptAlias / /home/nicholas/benchmark/wsgibench/pong.py

The problem is, that most people already have Apache installed and that they are using it for *shudder* serving PHP. PHP is not thread safe, meaning that you are forced to use a pre-forking Apache server. In this benchmark I am using the threaded Apache version and use mod_wsgi in embedded mode (as it gave me the best performance).

I disabled all unnecessary modules and configured Apache to provide me with a single worker, lots of threads and disabled logging (note: i tried various settings):

<IfModule mpm_worker_module>
    ServerLimit         1
    ThreadLimit         1000
    StartServers          1
    MaxClients          1000
    MinSpareThreads     25
    MaxSpareThreads     75
    ThreadsPerChild     1000
    MaxRequestsPerChild   0
</IfModule>
CustomLog /dev/null combined
ErrorLog /dev/null

Paster

The Paster webserver is the webserver provided with Python Paste it is Pylons default webserver. You can run a WSGI application as follows:

from pong import application
from paste import httpserver
httpserver.serve(application, '0.0.0.0', request_queue_size=500)

Tornado

Tornado is the non-blocking webserver that powers FriendFeed. It provides some WSGI server functionality which can be used as described below. In the previous benchmark I have shown that it provides excellent raw-socket performance.

import os
import tornado.httpserver
import tornado.ioloop
import tornado.wsgi
import sys
from pong import application
sys.path.append('/home/nicholas/benchmark/wsgibench/')
def main():
    container = tornado.wsgi.WSGIContainer(application)
    http_server = tornado.httpserver.HTTPServer(container)
    http_server.listen(8000)
    tornado.ioloop.IOLoop.instance().start()
if __name__ == "__main__":
    main()

Twisted

After installing Twisted with PIP you get a tool ‘twistd’ which allows you to easily serve WSGI applications fe:

wistd –pidfile=/tmp/twisted.pid -no web –wsgi=pong.application –logfile=/dev/null

But you can also run a WSGI application as follows:

from twisted.web.server import Site
from twisted.web.wsgi import WSGIResource
from twisted.internet import reactor
from pong import application

resource = WSGIResource(reactor, reactor.getThreadPool(), application)
reactor.listenTCP(8000,Site(resource))
reactor.run()

uWSGI

uWSGI is a server written in C, it is not meant to run stand-alone but has to be placed behind a webserver. It provides modules for Apache, NGINX, Cherokee and Lighttpd. I have placed it behind NGINX which i configured as follows:

worker_processes  1;

events {
    worker_connections  30000;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    keepalive_timeout  65;

    upstream pingpong {
        ip_hash;
        server unix:/var/nginx/uwsgi.sock;
    }

    server {
        listen       9090;
        server_name  localhost;

        location / {
            uwsgi_pass  pingpong;
            include     uwsgi_params;
        }

        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

    }

}

This made NGINX listen on a unix socket, now all i needed to do was have uWSGI connect to that same unix socket, which i did with the following command:

./uwsgi -s /var/nginx/uwsgi.sock -i -H /home/nicholas/benchmark/wsgibench/ -M -p 1 -w pong -z 30 -l 500 -L

WsgiRef

WsgiRef is the default WSGI server included with Python since version 2.5. To have this server run my application I use the following code which disables logging and increases the backlog.

from pong import application
from wsgiref import simple_server

class PimpedWSGIServer(simple_server.WSGIServer):
    # To increase the backlog
    request_queue_size = 500

class PimpedHandler(simple_server.WSGIRequestHandler):
    # to disable logging
    def log_message(self, *args):
        pass

httpd = PimpedWSGIServer(('',8000), PimpedHandler)
httpd.set_app(application)
httpd.serve_forever()

Results

Below you will find the results as plotted with Highcharts, the line will thicken when hovered over and you can easily enable or disable plotted results by clicking on the legend.

HTTP 1.0 Server results

Disqualified servers

From the above graph it should be clear that some of the web servers are missing, the reason is that I was unable to have them completely benchmarked as they stopped replying when the request rate passed a certain critical value. The servers that are missing are:

  • MagnumPy, i was able to obtain a reply rate of 500 RPS, but when the request rate passed the 700 RPS mark, MagnumPy crashed
  • Concurrence, I was able to obtain a successful reply rate of 700 RPS, but it stopped replying when we fired more than 800 requests a second at the server. However, since Concurrence does support HTTP/1.1 keep alive connections and behaves correctly when benchmarked under a lower connection rate but higher request rate you can find its results in the HTTP/1.1 benchmark
  • Cogen, was able to obtain a reply rate of 800 per second but stopped replying when the request rate was above 1500 per second. It does have a complete benchmark under the HTTP/1.1 test though.
  • WSGIRef, I obtained a reply rate of 352 but it stopped reacting when we passed the 1900 RPS mark
  • Paster, obtained a reply rate of 500 but it failed when we passed the 2000 RPS mark

Interpretation

From the servers that passed the benchmark we can see that they all have an admirable performance. At the bottom we have Twisted and Gunicorn, the performance of Twisted is somewhat expected as well it isn’t really tuned for WSGI performance. I find the performance of Gunicorn somewhat disappointing, also because for example Aspen which is a pure Python from a few years back, shows a significant better performance.  We can see however, that  increasing the worker count does in fact improve the performance as it is able to obtain a reply rate competitive with Aspen.

The other pure python servers, CherryPy  and Tornado seem to be performing on par with ModWSGI. It looks that CherryPy has a slight performance edge over Tornado. So, if you are thinking to change from ModWSGI or CherryPy to Tornado because of increased performance you should think again. Not only does this benchmark show that there isn’t that much to gain. But you will also abandon the process/thread model meaning that you should be cautious with code blocking your interpreter.

The top performers are clearly FAPWS3, uWSGI and Gevent. FAPWS3 has been designed to be fast and lives up the expectations, this has been noted by others as well as it looks like it is being used in production at Ebay. uWSGI is used successfully in production at (and in development by) the Italian ISP Unbit. Gevent is a relatively young project but already very successful. Not only did it perform great in the previous async server benchmark but its reliance on the Libevent HTTP server gives it a performance beyond the other asynchronous frameworks.

I should note that the difference between these top 3 is too small to declare a clear winner of the ‘reply rate contest’. However, I want to stress that with almost all servers I had to be careful to keep the amount of concurrent connections low since threaded servers aren’t that fond of lots concurrent connections. The async servers (Gevent, Eventlet, and Tornado) were happy to work on whatever was being thrown at them. This really gives a great feeling of stability as you do not have to worry about settings such as poolsize, worker count etc..

Most of the servers have an acceptable response time. Twisted and Eventlet are somewhat on the slow side but Gunicorn shows, unfortunately, a dramatic increase in latency when the request rate passes the 1000 RPS mark. Increasing the Gunicorn worker count lowers this latency by a lot but it still on the high side compared with for example Aspen or CherryPy.

The low error rates for CherryPy, ModWSGI, Tornado, uWSGI should give everybody confidence in their suitability for a production environment.

HTTP 1.1 Server results

In the HTTP/1.1 benchmark we have a different list of contestants as not all servers were able to pipeline multiple requests over a single connection. In this test the connection rate is relatively low, for example a request rate of 8000 per second is about 800 connections per second with 10 requests per connection. This means that some servers that were not able to complete the HTTP/1.0 benchmark (with connection rates up to 5000 per second) are able to complete the HTTP/1.1 benchmark (Cogen and Concurrence for example).

This graph shows the achieved request rate of the servers and we can clearly see that the achieved request rate is higher than in the HTTP/1.0 test. We could increase the total request rate even more by increasing the number of pipelined requests but this would then lower the connection rate. I think that 10 pipelined requests is a ok generalization of a webbrowser opening an average page.

The graph shows a huge gap in performance difference, with the fastest server Gevent we are able to obtain about 9000 replies per second, with Twisted, Concurrence and Cogen we get about 1000. In the middle we have CherryPy and ModWSGI with them we are able to obtain a reply rate around the 4000. It is interesting that Tornado while being close to CherryPy and ModWSGI seems to have an edge in this benchmark compared to the edge CherryPy had in the HTTP/1.0 benchmark. This is along the lines of our expectations as pipelined requests in Tornado are cheaper (since it is Async) then in ModWSGI or CherryPy. We expect this gap to widen if we increase the number of pipelined requests. However, it falls to be seen how much of a performance boost this would provide in a deployment setup as Tornado and CherryPy will then probably be sitting behind a reverse proxy, for example NGINX. In such a setting the connection type between the upstream and the proxy is usually limited to HTTP/1.0, NGINX for example does not even support HTTP/1.1 keep alive connections to its upstreams.

The best performers are clearly uWSGI and Gevent. I benchmarked Gevent with the ’spawn=none’ option to prevent Gevent from spawning a Greenlet, this seems fair in a benchmark like this. However, when you want to do something interesting with lots of concurrent connections you want each request to have its own Greentlet as this allows you to have thread like flow control. Thus I also benchmarked that version which can be seen in the Graph under the name ‘Gevent-Spawn’, from its results we can see that performance penalty is small.

Cogen is getting a high latency after about 2000 requests per second, Eventlet and Twisted show an increased latency fairly early as well.

The error rate shows that Twisted, Concurrence and Cogen have some trouble keeping up, I think all other error rates are acceptable.

Memory Usage

I also monitored the memory usage of the different frameworks during the benchmark. The benchmark noted below is the peak memory usage of all accumulated processes. As this benchmark does not really benefit from additional processes (as there is only one available processor) I limited the amount of workers when possible.

From these results there is one thing that really stands out and that is the absolutely low memory usage of uWSGI, Gevent and FAPWS3. Especially if we take their performance into account. It looks like Cogen is leaking memory, but I haven’t really looked into that. Gunicorn-3w shows compared with Gunicorn a relatively high memory usage. But it should be noted that this is mainly caused by the switch from the naked deployment to the deployment after NGINX as we now also have to add the memory usage of NGINX. A single Gunicorn worker only takes about 7.5Mb of memory.

Let’s Kick it up a notch

The first part of this post focussed purely on the RPS performance of the different frameworks under a high load. When the WSGI server was working hard enough it could simply answer all requests from a certain user and move on to the next user. This keeps the amount of concurrent connections relatively low making such a benchmark suitable for threaded web servers.

However, if we are going to increase the amount of concurrent connections we will quickly run into system limits as explained in the introduction. This is commonly known as the C10K problem. Asynchronous servers use a single thread to handle multiple connections and when efficiently implemented with for example EPoll or KQueue are perfectly able to handle a large amount of concurrent connections.

So that is what we are going to do, we are going to take the top-3 performing WSGI servers namely Tornado, Gevent and uWSGI (FAPWS3 lack of HTTP/1.1 support made it unsuitable for this benchmark) and give them 5 minutes of ping-pong mayhem.

You see, ping-pong is a simple game and it isn’t really the complexity that makes it interesting it is the speed and the reaction of the players. Now, what is 5 minutes of pingpong mayhem? Imagine that for 5 minutes long every second an Airbus loaded with ping-pong players lands (500 clients) and each of those players is going to slam you exactly 12 balls (with a 5 second interval). This would mean that after 5 seconds you would already have to return the volleys of 2000 different players at once.

Tsung Benchmark Setup

To perform this benchmark I am going to use Tsung, which is a multi-protocol distributed load testing tool written in Erlang. I will then have 3 different machines simulating the ping-pong rampage. I used the following Tsung script.

<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd" []>
<tsung loglevel="warning">

    <clients>
        <client host="tsung2" use_controller_vm="false" maxusers="800"/>
        <client host="tsung3" use_controller_vm="false" maxusers="800"/>
        <client host="bastet" use_controller_vm="false" maxusers="800"/>
    </clients>
    <servers>
        <server host="tsung1" port="8000" type="tcp"/>
    </servers>
    <monitoring>
        <monitor host="tsung1" type="erlang"/>
    </monitoring>

    <load>
        <arrivalphase phase="1" duration="5" unit="minute">
            <users interarrival="0.002" unit="second"/>
        </arrivalphase>
    </load>

    <sessions>
        <session name='wsgitest' probability='100'  type='ts_http'>
            <for from="0" to="12" incr="1" var="counter">
                <request>
                    <http url='http://tsung1:8000/' version='1.1' method='GET'/>
                </request>
                <thinktime random='false' value='5'/>
            </for>
        </session>
    </sessions>

</tsung>

Tsung Benchmark Results

Let me first state that all the three frameworks are perfectly capable to handle this kind of load, none of the frameworks dropped connection or ignored requests. Which I must say is already quite an achievement, considering that they had to handle about 2 million requests each.

Below the concurrent connection graph we can see the system load, the cpu usage and the free memory on the system during the benchmark. We can clearly see that Gevent put less strain on the system as the CPU and Load graph indicate. In the memory graph we can see that all frameworks used a consistent amount of memory.

The readers that still pay close attention to this article should note that the memory graph displays 4 lines instead of 3. The fourth line is Gevent compiled against Libevent 2.0.4a, the new release of Libevent has been said to show considerable performance improvements in its HTTP server. But it is still an alpha version and the memory graph shows that this version is leaking memory. Not something you want on your production site.

The final graph shows the latency of the 3 frameworks we can see a clear difference between Tornado and its competitors as Tornado’s response time hovers around 100ms, uWSGI around 5ms and gevent around 3ms. This is quite a difference and I am really amazed by the low latency of both Gevent and uWSGI during this onslaught.

Summary and Remarks

The above results show that as a Python web developer we have lots of different methods to deploy our applications. Some of these seem to perform better than others but by focussing only on server performance I will not justify most of the tested servers as they differ greatly in functionality. Also, if you are going to take some stock web framework and won’t do any optimizations or caching, the performance of your webserver is not going to matter as this will not be the bottleneck. If there is one thing which made this benchmark clear is that most Python Web servers offer great performance and if you feel things are slow the first thing to look at is really your own application.

When you are just interested in quickly hosting your threaded application you really can’t go wrong with Apache ModWSGI. Even though Apache ModWSGI might put a little more strain on your memory requirements there is a lot to go for in terms of functionality. For example, protecting part of your website by using a LDAP server is as easy as enabling a module. Standalone CherryPy also shows great performance and functionality and is really a viable (fully Python) alternative which can lower memory requirements.

When you are a little more adventurous you can look at uWSGI and FAPWS3, they are relatively new compared to CherryPy and ModWSGI but they show a significant performance increase and do have lower memory requirements.

Concerning Tornado and performance, I do not think Tornado is an alternative for CherryPy or even ModWSGI. Not only does it hardly show any increases in performance but it also requires you to rethink your code. But Tornado can be a great option if you do not have any code using blocking connections or are just wanting to look at something new.

And then there is Gevent, it really showed amazing performance at a low memory footprint, it might need some adjustments to your legacy code but then again the monkey patching of the socket module could help and I really love the cleanness of Greenlets. There has already been some reports of deploying Gevent successfully even with SQLAlchemy.

And if you want to dive into high performance websockets with lots of concurrent connections you really have to go with an asynchronous framework. Gevent seems like the perfect companion for that, at least that is what we are going to use.

December 22, 03:33 AM

There has already been written a lot on the C10K problem and it is known that the only viable option to handle LOTS of concurrent connections is to handle them asynchronously. This also shows that for massively concurrent problems, such as lots of parallel comet connections, the GIL in Python is a non-issue as we handle the concurrent connections in a single thread.

In this post i am going to look at a selection of asynchronous servers implemented in Python.

Asynchronous Server Specs

Since Python is really rich with (asynchronous) frameworks, I collected a few and looked at the following features:

  • What License does the framework have?
  • Does it provide documentation?
  • Does the documentation contain examples?
  • Is it used in production somewhere?
  • Does it have some sort of community (mailinglist, irc, etc..)?
  • Is there any recent activity?
  • Does it have a blog (from the owner)?
  • Does it have a twitter account?
  • Where can i find the repository?
  • Does it have a Thread Pool?
  • Does it provide access to a TCP Socket?
  • Does it have any Comet features?
  • Is it using EPOLL?
  • What kind of server is it? (greenlets, callbacks, generators etc..)

This gave me the following table.

NameLic.DocEx.Prod.Com.Act.BlogTwtRep.PoolWsgiScketCmetEpollTestStyle
TwistedMITYesYesYesHugeYesLotsNoTracYesYesYesNoYesYesCallback
TornadoApacheYesYesF.FeedYesYesFBYesGHubNoLim. YesNoYesNoAsync
OrbitedMITYesYesYesYesYesYesNoTracNoNoYesYesYesYesCallback
DieselWebBSDYesYesSTalkYesYes YesYesBitB.NoLim.YesYesYesNoGenerator
MultiTaskMITSomeNoNoNoNoYesNoBzrNoNoNoNoNoNoGenerator
ChiralGPL2APINoNoIRCNoNoNoTracNoYesYesYesYesYes Coroutine
EventletMITYesYesS. Life YesYesYesNoBitB.YesYes YesNoYesYesGreenlet
FriendlyFlowGPL2Some OneNoNoNoNoYesGgleNoNoYesNoNoYesGenerator
WeightlessGPL2YesNoYesNoNoNoYesSFNoNoYesNoNoYesGenerator
FibraMITNoNoNoNoNoYesNoGgleNoNoYesNoNoNoGenerator
ConcurrenceMITYesYeshyvesYesYesNoNoGHubNoYesYesNoYesYes Tasklet
CircuitsMITYesYesYesYesYesYesYesTracNo YesYesNoNoYesAsync
GeventMITYesYesYesYesYesYesYesBitB.NoYesYesNoYesYesGreenlet
CogenMITYesYesYesNoYesYesYesGgleNoYesYesNoYesYesGenerator

This is quite a list and i probably still missed a few. The main reasons for using a framework and not implementing something your self is that you hope to be able to accelerate your own development process by standing on the shoulders of other developers. I think it therefore is important that there is documentation, some sort of developers community (mailinglist fe)  and that it is still active. If we take this as a requirement we are left with the following solutions:

  • Orbited / Twisted (callbacks)
  • Tornado (async)
  • Dieselweb (generator)
  • Eventlet (greenlet)
  • Concurrence (stackless)
  • Circuits (async)
  • Gevent (greenlet)
  • Cogen (generator)

To quickly summarize this list; Twisted has been the de-facto standard to async programming with Python. It has an immense community, a wealth of tools, protocols and features. It has grown big and some say it reminds them of shirtless men drinking Jager-bombs complex. This is also one of the biggest reasons why people are looking elsewhere. Recently Facebook released the code of their async. approach called Tornado which is also using callbacks and recent benchmark show that it outperforms Twisted.

A common heard argument against programming with callbacks is that it can get overly complex. A programmatically cleaner approach is to use light-weight threads (imho). This can be achieved by using a different Python implementation; Stackless (such as Concurrence is using) or a plugin for regular python Greenlet (such as Eventlet and Gevent are using). Another approach is to simulate these light-weight threads with Python generators, such as Dieselweb and Cogen are doing.

This should already show that while all these frameworks provide you asynchronous concurrency they do this in each of their own ways. I want to invite you to look at these frameworks as they all have their own code gems. For example, Concurrence has a non-blocking interface to MySQL. Eventlet has a neat thread-pool, Tornado can pre-fork over CPU’s, Gevent offloads HTTP header parsing and DNS lookups to Libevent, Cogen has sendfile support and Twisted probably already has a factory doing exactly what you are planning to do next.

The Ping Pong Benchmark

In this benchmark i am going to focus on the performance of the framework to listen on a socket and write to incoming connections. The client pings the socket by opening it, the server responds with a ‘Pong!’ and closes the socket. This should be really simple but it is a pain to create something that does this in an asynchronous and non-blocking way from scratch and that is exactly the reason why we are looking at these frameworks. It is all about making our lives easier.

Ok, for this benchmark i am going to use httperf,  a high performance tool that understands the HTTP protocol. If we want httperf to play along in our Ping-Pong benchmark we have to make it understand the ‘PONG!’ response. We can do this by mimicking a HTTP server and have our server respond with:

HTTP/1.0 200 OK
Content-Length: 5

Pong!

instead of just ‘Pong!’. Also, since most default server configurations are not set up to handle a large amount of concurrent requests, we need to make a few adjustments:

  • Raise the per-process file limit by compiling httperf after some adjustments.
  • Raise the per-user file limit, set ‘ulimit -n 10000‘ on both server and client.
  • Raise kernel limit on file handles: ‘echo “128000″ > /proc/sys/fs/file-max’.
  • Increase the connection backlog, ‘sysctl -w net.core.netdev_max_backlog = 2500
  • Raise the maximum connections with ’sysctl -w net.core.somaxconn = 250000

With these settings my Debian Lenny system was ready to hammer the different servers up to rates far beyond the capacity of the frameworks. I used the following command

httperf –hog –timeout=60 –client=0/1 –server=localhost –port=10000 –uri=/ –rate=400 –send-buffer=4096 –recv-buffer=16384 –num-conns=40000 –num-calls=1

And increased the rate with an interval of 100 from 400 up to 9000 requests per second for a total of 40.000 requests at each interval.

Code

What will now follow, is the implementation of the server side in the different frameworks. It should show the different approaches the frameworks take.

Twisted

Gentlemen start your reactor!

from twisted.internet import epollreactor epollreactor.install()
from twisted.internet.protocol import Protocol, Factory
from twisted.internet import reactor

class Pong(Protocol):
 def connectionMade(self):
 self.transport.write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
 self.transport.loseConnection()

# Start the reactor
factory = Factory()
factory.protocol = Pong
reactor.listenTCP(8000, factory)
reactor.run()

Tornado

Tornado, does not hide the raw socket interface, which makes this example more lengthy then the others.


import errno
import functools
import socket
from tornado import ioloop, iostream

def connection_ready(sock, fd, events):
    while True:
        try:
            connection, address = sock.accept()
        except socket.error, e:
            if e[0] not in (errno.EWOULDBLOCK, errno.EAGAIN):
                raise
            return
        connection.setblocking(0)
        stream = iostream.IOStream(connection)
        stream.write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n", stream.close)

if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setblocking(0)
    sock.bind(("", 8010))
    sock.listen(5000)

    io_loop = ioloop.IOLoop.instance()
    callback = functools.partial(connection_ready, sock)
    io_loop.add_handler(sock.fileno(), callback, io_loop.READ)
    try:
        io_loop.start()
    except KeyboardInterrupt:
        io_loop.stop()
        print "exited cleanly"

Dieselweb

While this example is beautifully small, i do not really enjoy the generator approach which sprinkles ‘yield’ all over the place.

from diesel import Application, Service

def server_pong(addr):
    yield "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"

app = Application()
app.add_service(Service(server_pong, 8020))
app.run()

Circuits

I think the Circuit code is the most beautiful of them all, very elegent.

from circuits.net.sockets import TCPServer

class PongServer(TCPServer):

    def connect(self, sock, host, port):
        self.write(sock, 'HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n')
        self.close(sock)

PongServer(('localhost', 8050)).run()

Eventlet

The Eventlet uses a Greenlet approach.

from eventlet import api

def handle_socket(sock):
    sock.makefile('w').write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    sock.close()

server = api.tcp_listener(('localhost', 8030))
while True:
    try:
        new_sock, address = server.accept()
    except KeyboardInterrupt:
        break
    # handle every new connection with a new coroutine
    api.spawn(handle_socket, new_sock)

Gevent

Gevent is presented as a rewrite of eventlet focussing on performance.

import gevent
from gevent import socket

def handle_socket(sock):
    sock.sendall("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    sock.close()

server = socket.socket()
server.bind(('localhost', 8070))
server.listen(500)
while True:
    try:
        new_sock, address = server.accept()
    except KeyboardInterrupt:
        break
    # handle every new connection with a new coroutine
    gevent.spawn(handle_socket, new_sock)

Concurrence

Concurrence uses the Tasklet approach, it can be run under Greenlet and under Stackless Python. In this benchmark there was not really any performance difference between the two different engines.

from concurrence import dispatch, Tasklet
from concurrence.io import BufferedStream, Socket

def handler(client_socket):
    stream = BufferedStream(client_socket)
    writer = stream.writer
    writer.write_bytes("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    writer.flush()
    stream.close()

def server():
    server_socket = Socket.new()
    server_socket.bind(('localhost', 8040))
    server_socket.listen()

    while True:
        client_socket = server_socket.accept()
        Tasklet.new(handler)(client_socket)

if __name__ == '__main__':
    dispatch(server)

Cogen

Cogen, uses the generator approach as well.

import sys

from cogen.core import sockets
from cogen.core import schedulers
from cogen.core.coroutines import coroutine

@coroutine
def server():
    srv = sockets.Socket()
    adr = ('0.0.0.0', len(sys.argv)>1 and int(sys.argv[1]) or 1200)
    srv.bind(adr)
    srv.listen(500)
    while 1:
        conn, addr = yield srv.accept()
        fh = conn.makefile()
        yield fh.write("HTTP/1.0 200 OK\r\nContent-Length: 12\r\n\r\nHello World!\r\n")
        yield fh.flush()
        conn.close()

m = schedulers.Scheduler()
m.add(server)
m.run()

Results

The first graph clearly shows at which connection rate (on the horizontal axis) the successful connection rate starts to degrade. It shows a huge difference between the best performer; Tornado with 7400 requests per second and the worst, Circuits with 1400 requests per second (which doesn’t use EPOLL). This connection rate was sustained for at least 40.000 requests. We can see that, when the hammering of the server continues beyond rates the server can handle, the performance drops. This is caused by connection errors or timeouts.

This graph shows the response time, it is clearly visible that once the maximum connection rate has been reached the overal response time starts to increase.

The last graph shows the amount of errors, ie no return of a 200 detected by httperf. We can see a correlation between the performance of the server and the returned errors at a given request rate. The performing servers return less overall errors. There is however, one exception. Cogen was able to return ALL its requests successfully no matter how hard it was hammered. It is therefore not visible in this graph. This is interesting, at 9000 requests per second it was still able to answer all requests. However, the average connection time (from socket open till socket close) was about 7 seconds meaning that Cogen was serving about 28000 concurrent connections somewhat at reduced performance but not dropping them.

Notes

This post should make it clear that Python has a rich set of options toward asynchronous programming. All tested frameworks show great performance. I mean, even Circuits results with 1300 requests per second isn’t too bad. Tornado really blew me away with its performance at 7400 requests per second. But if i had to choose a favorite i would  probably go with Gevent, i am really digging its greenlet style.

The clean Greentlet / Stackless style is really cool, especially since Stackless Python is keeping up nowadays with CPython. There was some talk on a mailing list about Gevent running on Stackless. The concurrence framework already runs on Stackless and can thus be a great option already if you are looking for specific features of Stackless Python such as tasklet-pickling.

I want to make clear that this test only shows  how these frameworks perform at a relatively simple task. It could be that when more stuff is going on in the background the results will change. However, I feel that this benchmark is a great indicator of how each frameworks handles a socket connection.

In the coming days I plan to investigate this some more. I will also check out  how these Python frameworks stack up against its equivalents in different languages, fe Ape, CometD, NodeJS. Stay tuned!

December 21, 05:17 PM

For my Msc thesis I have developed a system build in Python which does person recognition and have shown that it is possible to obtain a better recognition rate with this system than by using Google’s Picasa. I have put the source code online and will hereby announce that I will try my best to spend some time explaining how to do person and face recognition with Python.  I hope that a public announcements such as this will instantly create some public debt forcing me to actually complete this task. We shall see.

The approach described in my thesis uses a combination of pictorial cues (Eigenfaces, SIFT points, color histogram) and contextual cues (co-occurence with other persons or background). The idea behind this is really simple, in order to recognize a person we don’t even need to see the face of that specific person in most cases if we have more contextual information about the setting in which it was taken. For example, lets say you are looking at some pictures from ‘Sesame street’, when you detect Ernie there is a high probability that that other person on that same photo will be Bert. Even more so if we detect that the main component of that other persons color histogram is yellow.

The approach can be divided in three different subtasks:

  1. Detecting the person and segmenting its specific region
  2. Extracting the features
  3. Clustering over these features

In the first task, we will detect the different people by using a face detection technique based on haarcascades. I plan on showing how to use OpenCV with Python and how you can improve its performance by combining the result of multiple haar cascades.

With the detected face we only have a certain square region within a picture which is very likely to contain a face. In order to detect the rest of the body i used a graph based segmentation technique and highly optimized the segmentation algorithm by using implementations in NumPy, Fortran and finally PyCUDA

From this segmented region we will then extract pictorial features such as a color histogram and SIFT features. With this information we can then try to extract and use our contextual information.

In the schematic overview we can see the different steps, we start with some images and segment it in regions of interest. From these regions we will then extract features to build our person models, over which we can then cluster by using  pictorial features (SIFT points and Color Histograms) and contextual features (ie, co-occurence with detected background or other persons). The code for this can already be found on BitBucket but is a bit rough, but as already said I promised to do some explaining. So keep an eye on this blog if you’re interested.

For now, I present you my collected list of references (also in BibTex format) regarding person recognition. It could be a nice starting point for anyone interested in this domain.

References

davis2006rbp
The relationship between precision-recall and ROC curves
J. Davis and M. Goadrich
233--240  (2006)
raghavan1989cir
A critical investigation of recall and precision as measures of retrieval system performance
V. Raghavan and P. Bollmann and G. S. Jung
ACM Transactions on Information Systems (TOIS)  7  205--229  (1989)
wilson2006ffd
Facial feature detection using Haar classifiers
P. I. Wilson and J. Fernandez
Journal of Computing Sciences in Colleges  21  127--133  (2006)
freund1997dtg
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
Y. Freund and R. E. Schapire
Journal of Computer and System Sciences  55  119--139  (1997)
graham2002tep
Time as essence for photo browsing through personal digital libraries
A. Graham and H. Garcia-Molina and A. Paepcke and T. Winograd
326--335  (2002)
cooper2005tec
Temporal event clustering for digital photo collections
M. Cooper and J. Foote and A. Girgensohn and L. Wilcox
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)  1  269--288  (2005)
moon2001cap
Computational and performance aspects of PCA-based face-recognition algorithms
H. Moon and P. J. Phillips
Perception-London  30  303--322  (2001)
otoole2005fra
Face Recognition Algorithms Surpass Humans Matching Faces over Changes in Illumination
A. O'Toole and P. J. Phillips and F. Jiang and J. Ayyad and N. Pénard and H. Abdi
IEEE Transactions on Pattern Analysis and Machine Intelligence  1642--1646  (2007)
liu2006cdi
Capitalize on Dimensionality Increasing Techniques for Improving Face Recognition Grand Challenge Performance
C. Liu
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE  725--737  (2006)
beis1997siu
Shape Indexing Using Approximate Nearest-Neighbour Search in High-Dimensional Spaces
J. Beis and D. Lowe
1000--1006  (1997)
bay2006ssu
SURF: Speeded Up Robust Features
H. Bay and T. Tuytelaars and L. Van Gool
Lecture Notes in Computer Science  3951  404  (2006)
ke2004psm
PCA-SIFT: A More Distinctive Representation for Local Image Descriptors
Y. Ke and R. Sukthankar
2  (2004)
kryszczuk:ccf
Color correction for face detection based on human visual perception metaphor
K. Kryszczuk and A. Drygajlo
138--143  (2007)
gevers1999cbo
Color-based object recognition
T. Gevers and A. W. M. Smeulders
Pattern Recognition  32  453--464  (1999)
swain1991ci
Color indexing
M. J. Swain and D. H. Ballard
International Journal of Computer Vision  7  11--32  (1991)
schmid2000eip
Evaluation of Interest Point Detectors
C. Schmid and R. Mohr and C. Bauckhage
International Journal of Computer Vision  37  151--172  (2000)
lowe2004dif
Distinctive Image Features from Scale-Invariant Keypoints
D. G. Lowe
International Journal of Computer Vision  60  91--110  (2004)
mikolajczyk2004sai
Scale & Affine Invariant Interest Point Detectors
K. Mikolajczyk and C. Schmid
International Journal of Computer Vision  60  63--86  (2004)
mikolajczyk2005pel
A Performance Evaluation of Local Descriptors
K. Mikolajczyk and C. Schmid
IEEE Transactions on Pattern Analysis and Machine Intelligence  1615--1630  (2005)
comaniciu2002msr
Mean Shift: A Robust Approach Toward Feature Space Analysis
D. Comaniciu and P. Meer
IEEE Transactions on Pattern Analysis and Machine Intelligence  603--619  (2002)
grabner2006fas
Fast Approximated SIFT
M. Grabner and H. Grabner and H. Bischof
Lecture Notes in Computer Science  3851  918  (2006)
castrillonsantana2008faf
Face and Facial Feature Detection Evaluation
M. Castrillón-Santana and L. Déniz-Suárez and L. Antón-Canalís and J. Lorenzo-Navarro
7  (2008)
lienhart2002esh
LAn Extended Set of Haar-like Features for Rapid Object Detection
R. Lienhart and J. Maydt
1  900--903  (2002)
turk1991er
Eigenfaces for Recognition
M. Turk and A. Pentland
Journal of Cognitive Neuroscience  3  71--86  (1991)
felzenszwalb2004egb
Efficient Graph-Based Image Segmentation
P. F. Felzenszwalb and D. P. Huttenlocher
International Journal of Computer Vision  59  167--181  (2004)
Hae-sang:2006rz
A K-means-like Algorithm for K-medoids Clustering and Its Performance
P. Hae-sang and L. Jong-seok and J. Chi-hyuck
1222-1231  (2006)
cui2007eip
EasyAlbum: an interactive photo annotation system based on face clustering and re-ranking
J. Cui and F. Wen and R. Xiao and Y. Tian and X. Tang
367--376  (2007)
elgammal2001pfs
Probabilistic framework for segmenting people under occlusion
A. Elgammal and L. Davis
2  (2001)
2008_garcia_cvgpu
Fast k nearest neighbor search using GPU
V. Garcia and E. Debreuve and M. Barlaud
(2008)
VanDeSandeCVPR2008
Evaluation of Color Descriptors for Object and Scene Recognition
K. E. A. van de Sande and T. Gevers and C. G. M. Snoek
(2008)
Image category recognition is important to access visual information on the level of objects and scene types. So far, intensity-based descriptors have been widely used. To increase illumination invariance and discriminative power, color descriptors have been proposed only recently. As many descriptors exist, a structured overview of color invariant descriptors in the context of image category recognition is required. Therefore, this paper studies the invariance properties and the distinctiveness of color descriptors in a structured way. The invariance properties of color descriptors are shown analytically using a taxonomy based on invariance properties with respect to photometric transformations. The distinctiveness of color descriptors is assessed experimentally using two benchmarks from the image domain and the video domain. From the theoretical and experimental results, it can be derived that invariance to light intensity changes and light color changes affects category recognition. The results reveal further that, for light intensity changes, the usefulness of invariance is category-specific.
wagstaff2001ckm
Constrained k-means clustering with background knowledge
K. Wagstaff and C. Cardie and S. Rogers and S. Schroedl
577--584  (2001)
gallagher_cvpr_08_clothing
Clothing Cosegmentation for Recognizing People
A. Gallagher and T. Chen
(2008)
Lepetit:cr
Keypoint Recognition using Randomized Trees
V. Lepetit and P. Fua
IEEE Transactions on Pattern Analysis and Machine Intelligence  (2006)
Indyk:1999dq
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
P. Indyk and R. Motwani
Proceedings of the thirtieth annual ACM symposium on Theory of computing  605--613  (1998)
Gionis:1999bh
Similarity Search in High Dimensions via Hashing
A. Gionis and P. Indyk and R. Motwani
???  518-529  (1999)
Sivic:2004qf
Efficient Visual Content Retrieval and Mining in Videos
J. Sivic and A. Zisserman
???  (2004)
Clayton:2007vn
A learning framework for nearest neighbor search
L. Clayton and S. Dasgupta
Advances in Neural Information Processing Systems 20  (2007)
ozuysal2007fkr
Fast keypoint recognition in ten lines of code
M. Ozuysal and P. Fua and V. Lepetit
Proc. IEEE Conference on Computing Vision and Pattern Recognition  (2007)
shi2000nca
Normalized cuts and image segmentation
J. Shi and J. Malik
IEEE Transactions on Pattern Analysis and Machine Intelligence  22  888--905  (2000)
marfil2006psa
Pyramid segmentation algorithms revisited
R. Marfil and L. Molina-Tanco and A. Bandera and J. Rodríguez and F. Sandoval
Pattern Recognition  39  1430--1451  (2006)
Zhang:2007pd
Local features and kernels for classification of texture and ob ject categories
J. Zhang and M. Marszalek and S. Lazebnik and C. Schmid
International Journal of Computer Vision  73  213-238  (2007)
lowe1999orl
Object recognition from local scale-invariant features
D. G. Lowe
International Conference on Computer Vision  2  1150--1157  (1999)
SandeMSC07
Coloring Concept Detection in Video using Interest Regions
K. E. A. v. d. Sande
(2007)
Video concept detection aims to detect high-level semantic information present in video. State-of-the-art systems are based on visual features and use machine learning to build concept detectors from annotated examples. The choice of features and machine learning algorithms is of great influence on the accuracy of the concept detector. So far, intensity-based SIFT features based on interest regions have been applied with great success in image retrieval. Features based on interest regions, also known as local features, consist of an interest region detector and a region descriptor. In contrast to using intensity information only, we will extend both interest region detection and region description with color information in this thesis. We hypothesize that automated concept detection using interest region features benefits from the addition of color information. Our experiments, using the Mediamill Challenge benchmark, show that the combination of intensity features with color features improves significantly over intensity features alone.
ramanan2003fat
Finding and tracking people from the bottom up
D. Ramanan and D. Forsyth
Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on  2  (2003)
felzenswalb
Efficient Matching of Pictorial Structures
P. Felzenszwalb and D. Huttenlocher
66-73  (2000)
girgensohn2004lfr
Leveraging face recognition technology to find and organize photos
A. Girgensohn and J. Adcock and L. Wilcox
Proceedings of the 6th ACM SIGMM international workshop on Multimedia information retrieval  99--106  (2004)
naaman2005lcr
Leveraging context to resolve identity in photo albums
M. Naaman and R. B. Yeh and H. Garcia-Molina and A. Paepcke
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries  178--187  (2005)
berg2007naf
Names and Faces
T. L. Berg and A. C. Berg and J. Edwards and M. Maire and R. White and Y. W. Teh and E. Learned-Miller and D. Forsyth
University of California Berkeley. Technical report  (2007)
tian2007faf
A Face Annotation Framework with Partial Clustering and Interactive Labeling
Y. Tian and W. Liu and R. Xiao and F. Wen and X. Tang
Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on  1--8  (2007)
zhao2006apa
Automatic Person Annotation of Family Photo Album
M. Zhao and Y. Teo and S. Liu and T. Chua and R. Jain
International Conference on Image and Video Retrieval  163--172  (2006)
zhang2005rfa
Robust Face Alignment Based on Local Texture Classifiers
L. Zhang and H. Ai and S. Xin and C. Huang and S. Tsukiji and S. Lao
The IEEE International Conference on Image Processing  354--357  (2005)
arandjelovic2006acl
Automatic Cast Listing in Feature-Length Films with Anisotropic Manifold Space
O. Arandjelovic and R. Cipolla
2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition  2  1513--1520
jaffre11ipl
JImprovement of a person labelling method using extracted knowledge on costume
G. Jaffre and P. Joly
jaffre:cnf
Costume: A New Feature for Automatic Video Content Indexing
G. Jaffre and P. Joly
Coupling approaches, coupling media and coupling languages for information retrieval (RIAO)  314--325  (2004)
anguelov2007cir
Contextual Identity Recognition in Personal Photo Albums
D. Anguelov and K. Lee and S. B. Gokturk and B. Sumengen
Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on  1--7  (2007)
yacoob2005daa
Detection, Analysis and Matching of Hair
Y. Yacoob and L. Davis
Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on  1  (2005)
song2006cah
Context-Aided Human Recognition--Clustering
Y. Song and T. Leung
European Conference on Computer Vision  (2006)
sivic2006fpr
Finding people in repeated shots of the same scene
J. Sivic and C. L. Zitnick and R. Szeliski
British Machine Vision Conference  (2006)
yilmaz2006ots
Object tracking: A survey
A. YILMAZ and O. JAVED and M. SHAH
ACM computing surveys  38  1--45  (2006)
dalal:hdu
Human Detection Using Oriented Histograms of Flow and Appearance
N. Dalal and B. Triggs and C. Schmid
kpalma:oap
An Overview of Advances of Pattern Recognition Systems in Computer Vision
K. Kpalma and J. Ronsin
gavrila1999vah
Visual analysis of human movement: A survey
D. M. Gavrila
Computer Vision and Image Understanding  73  82--98  (1999)
The ability to recognize humans and their activities by vision is key for a machine to interact intelligently and effortlessly with a human-inhabited environment. Because of many potentially impor- tant applications, ``looking at people'' is currently one of the most active application domains in computer vision. This survey identi- fies a number of promising applications and provides an overview of recent developments in this domain. The scope of this survey is limited to work on whole-body or hand motion; it does not include work on human faces. The emphasis is on discussing the various methodologies; they are grouped in 2-D approaches with or without explicit shape models and 3-D approaches. Where appropriate, sys- tems are reviewed. We conclude with some thoughts about future directions.
jones2002scm
Statistical Color Models with Application to Skin Detection
M. J. Jones and J. M. Rehg
International Journal of Computer Vision  46  81--96  (2002)
The existence of large image datasets such as the set of photos on the World Wide Web make it possible to build powerful generic models for low-level image attributes like color using simple histogram learning techniques. We describe the construction of color models for skin and non-skin classes from a dataset of nearly 1 billion labelled pixels. These classes exhibit a surprising degree of separability which we exploit by building a skin pixel detector achieving a detection rate of 80% with 8.5% false positives. We compare the performance of histogram and mixture models in skin detection and find histogram models to be superior in accuracy and computational cost. Using aggregate features computed from the skin pixel detector we build a surprisingly effective detector for naked people. Our results suggest that color can be a more powerful cue for detecting people in unconstrained imagery than was previously suspected. We believe this work is the most comprehensive and detailed exploration of skin color models to date.
diplaros2004sdu
Skin detection using the EM algorithm with spatial constraints
A. Diplaros and T. Gevers and N. Vlassis
Systems, Man and Cybernetics, 2004 IEEE International Conference on  4  (2004)
Abstract -- In this paper, we propose a color-based method for skin detection and segmentation, which also takes into account the spatial coherence of the skin pixels. We treat the problem of skin detection as an inference problem. We as- sume that each pixel in an image has a hidden binary label associated with it, that specifies if it is skin or not. In order to solve the inference problem ,we use a variational EM al- gorithm which incorporates the spatial constraints with just a small computational overhead in the E-step. Finally, we show that our method provides better results than the stan- dard EM algorithm and a state-of-art skin-detection method from the literature [9].
gavrila07eb
A Bayesian, Exemplar-Based Approach to Hierarchical Shape Matching
D. M. Gavrila
29  (2007)
gavrila2007mcp
Multi-cue Pedestrian Detection and Tracking from a Moving Vehicle
D. M. Gavrila and S. Munder
International Journal of Computer Vision  73  41--59  (2007)
This paper presents a multi-cue vision system for the real-time detection and tracking of pedestrians from a moving vehicle. The detection component involves a cascade of modules, each utilizing complementary visual criteria to successively narrow down the image search space, balancing robustness and efficiency considerations. Novel is the tight integration of the consecutive modules: (sparse) stereo-based ROI generation, shape-based detection, texture-based classification and (dense) stereo-based verification. For example, shape-based detection activates a weighted combination of texture-based classifiers, each attuned to a particular body pose.Performance of individual modules and their interaction is analyzed by means of Receiver Operator Characteristics (ROCs). A sequential optimization technique allows the successive combination of individual ROCs, providing optimized system parameter settings in a systematic fashion, avoiding ad-hoc parameter tuning. Application-dependent processing constraints can be incorporated in the optimization procedure. Results from extensive field tests in difficult urban traffic conditions suggest system performance is at the leading edge.
bowyer2006saa
A survey of approaches and challenges in 3D and multi-modal 3D+ 2D face recognition
K. W. Bowyer and K. Chang and P. Flynn
Computer Vision and Image Understanding  101  1--15  (2006)
This survey focuses on recognition performed by matching models of the three-dimensional shape of the face, either alone or in combination with matching corresponding two-dimensional intensity images. Research trends to date are summarized, and challenges confronting the development of more accurate three-dimensional face recognition are identified. These challenges include the need for better sensors, improved recognition algorithms, and more rigorous experimental methodology.
phillips2005ofr
Overview of the face recognition grand challenge
P. J. Phillips and P. J. Flynn and T. Scruggs and K. W. Bowyer and J. Chang and K. Hoffman and J. Marques and J. Min and W. Worek
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition  1  947--954  (2005)
Over the last couple of years, face recognition researchers have been developing new techniques. These developments are being fueled by advances in computer vision techniques, computer design, sensor design, and interest in fielding face recognition systems. Such advances hold the promise of reducing the error rate in face recognition systems by an order of magnitude over Face Recognition Vendor Test (FRVT) 2002 results. The Face Recognition Grand Challenge (FRGC) is designed to achieve this performance goal by presenting to researchers a six-experiment challenge problem along with data corpus of 50,000 images. The data consists of 3D scans and high resolution still imagery taken under controlled and uncontrolled conditions. This paper describes the challenge problem, data corpus, and presents baseline performance and preliminary results on natural statistics of facial imagery.
zhu2006fhd
Fast Human Detection Using a Cascade of Histograms of Oriented Gradients
Q. Zhu and S. Avidan and M. C. Yeh and K. T. Cheng
Computer Vision and Pattern Recognition  1  4  (2006)
We integrate the cascade-of-rejectors approach with the Histograms of Oriented Gradients (HoG) features to achieve a fast and accurate human detection system. The features used in our system are HoGs of variable-size blocks that capture salient features of humans automatically. Using AdaBoost for feature selection, we identify the appropriate set of blocks, from a large set of possible blocks. In our system, we use the integral image representation and a rejection cascade which significantly speed up the computation. For a 320 × 280 image, the system can process 5 to 30 frames per second depending on the density in which we scan the image, while maintaining an accuracy level similar to existing methods.
dalai2005hog
Histograms of oriented gradients for human detection
N. Dalai and B. Triggs and I. Rhone-Alps and F. Montbonnot
Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on  1  (2005)
We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
schneiderman2004odu
Object Detection Using the Statistics of Parts
H. Schneiderman and T. Kanade
International Journal of Computer Vision  56  151--177  (2004)
In this paper we describe a trainable object detector and its instantiations for detecting faces and cars at any size, location, and pose. To cope with variation in object orientation, the detector uses multiple classifiers, each spanning a different range of orientation. Each of these classifiers determines whether the object is present at a specified size within a fixed-size image window. To find the object at any location and size, these classifiers scan the image exhaustively. Each classifier is based on the statistics of localized parts. Each part is a transform from a subset of wavelet coefficients to a discrete set of values. Such parts are designed to capture various combinations of locality in space, frequency, and orientation. In building each classifier, we gathered the class-conditional statistics of these part values from representative samples of object and non-object images. We trained each classifier to minimize classification error on the training set by using Adaboost with Confidence-Weighted Predictions (Shapire and Singer, 1999). In detection, each classifier computes the part values within the image window and looks up their associated class-conditional probabilities. The classifier then makes a decision by applying a likelihood ratio test. For efficiency, the classifier evaluates this likelihood ratio in stages. At each stage, the classifier compares the partial likelihood ratio to a threshold and makes a decision about whether to cease evaluation---labeling the input as non-object---or to continue further evaluation. The detector orders these stages of evaluation from a low-resolution to a high-resolution search of the image. Our trainable object detector achieves reliable and efficient detection of human faces and passenger cars with out-of-plane rotation.
zhao2003frl
Face Recognition: A Literature Survey
W. Zhao and R. Chellappa and P. Phillips and A. Rosenfeld
ACM Computing Surveys  35  399--458  (2003)
As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. For example, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains a largely unsolved problem. In other words, current systems are still far away from the capability of the human perception system. This paper provides an up-to-date critical survey of still- and video-based face recognition research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of machine recognition of faces. To provide a comprehensive survey, we not only categorize existing recognition techniques but also present detailed descriptions of representative methods within each category. In addition, relevant topics such as psychophysical studies, system evaluation, and issues of illumination and pose variation are covered.
viola2001rod
Rapid object detection using a boosted cascade of simple features
P. Viola and M. Jones
Proc. CVPR  1  511--518  (2001)
yang2002dfi
Detecting faces in images: a survey
M. H. Yang and D. Kriegman and N. Ahuja
Pattern Analysis and Machine Intelligence, IEEE Transactions on  24  34--58  (2002)
osuna1997tsv
Training support vector machines: an application to face detection
E. Osuna and R. Freund and F. Girosi and others
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition  24  (1997)
rowley1998nnb
Neural network-based face detection
H. Rowley and S. Baluja and T. Kanade
Pattern Analysis and Machine Intelligence, IEEE Transactions on  20  23--38  (1998)
vezhnevets2003spb
A survey on pixel-based skin color detection techniques
V. Vezhnevets and V. Sazonov and A. Andreeva
Proc. Graphicon  85--92  (2003)
Skin color has proven to be a useful and robust cue for face de- tection, localization and tracking. Image content filtering, content- aware video compression and image color balancing applications can also benefit from automatic detection of skin in images. Numer- ous techniques for skin color modelling and recognition have been proposed during several past years. A few papers comparing differ- ent approaches have been published [Zarit et al. 1999], [Terrillon et al. 2000], [Brand and Mason 2000]. However, a comprehensive survey on the topic is still missing. We try to fill this vacuum by reviewing most widely used methods and techniques and collecting their numerical evaluation results.
terrillon1998adh
Automatic detection of human faces in natural scene images by use of a skin color model and of invariant moments
J. C. Terrillon and M. David and S. Akamatsu
Proc. of the Third International Conference on Automatic Face and Gesture Recognition  112--117  (1998)
yang1997scm
Skin-color Modeling and Adaptation
J. Yang and W. Lu and A. Waibel
(1997)
sigal2000eap
Estimation and prediction of evolving color distributions for skin segmentation under varying illumination
L. Sigal and S. Sclaroff and V. Athitsos
PROC IEEE COMPUT SOC CONF COMPUT VISION PATTERN RECOGNIT  2  152--159  (2000)
A novel approach for real-time skin segmentation in video sequences is described. The approach enables reliable skin segmentation despite wide variation in illumination during tracking. An explicit second order Markov model is used to predict evolution of the skin color (HSV) histogram over time. Histograms are dynamically updated based on feed- back from the current segmentation and based on predic- tions of the Markov model. The evolution of the skin color distribution at each frame is parameterized by translation, scaling and rotation in color space. Consequent changes in geometric parameterization of the distribution are prop- agated by warping and re-sampling the histogram. The parameters of the discrete-time dynamic Markov model are estimated using Maximum Likelihood Estimation, and also evolve over time. Quantitative evaluation of the method was conducted on labeled ground-truth video sequences taken from popular movies.
raja1998tas
Tracking and segmenting people in varying lighting conditions using colour
Y. Raja and S. J. McKenna and S. Gong
Third International Conference on Automatic Face and Gesture Recognition, Nara, Japan, IEEE Computer Society Press  228--233  (1998)
Colour cues were used to obtain robust detection and tracking of people in relatively unconstrained dynamic scenes. Gaussian mixture models were used to estimate probability densities of colour for skin, clothing and back- ground. These models were used to detect, track and seg- ment people, faces and hands. A technique for dynamically updating the models to accommodate changes in apparent colour due to varying lighting conditions was used. Two applications are highlighted: (1) actor segmentation for vir- tual studios, and (2) focus of attention for face and gesture recognition systems. A system implemented on a 200MHz PC tracks multiple objects in real-time.
drew1998iic
Illumination-invariant color object recognition via compressedchromaticity histograms of color-channel-normalized images
M. Drew and J. Wei and Z. N. Li
Computer Vision, 1998. Sixth International Conference on  533--540  (1998)
chang1996cts
Color texture segmentation for clothing in a computer-aided fashion design system
C. C. Chang and L. L. Wang
Image and Vision Computing  14  685--702  (1996)
A traditional fashion designer has to draw a large number of drafts in order to accomplish an ideal style. Better performance can be achieved if these operations are done on computers, because the designer can easily make changes for various patterns and colors. To develop a computer-aided fashion design system, one of the most difficult tasks is to automatically separate the clothing from the background so that a new item can be `put on'. One difficulty of the segmentation work arises from the diverse patterns on the clothing, especially with folds or shadows. In this study, circular histograms are first utilized to quantize color and to reduce shadow/highlight effects. Then a color co-occurrence matrix and a color occurrence vector are proposed to characterize the color spatial dependence and color occurrence frequency of the clothing's texture. Next, based on the two color features blocks on the clothing are found by a region growing method. Finally, post-processing is applied to obtain a smooth clothing boundary. Experimental results are presented to show the feasibility of the proposed approach.
darrell2000ipt
Integrated Person Tracking Using Stereo, Color, and Pattern Detection
T. Darrell and G. Gordon and M. Harville and J. Woodfill
International Journal of Computer Vision  37  175--185  (2000)
We present an approach to real-time person tracking in crowded and/or unknown environments using integration of multiple visual modalities. We combine stereo, color, and face detection modules into a single robust system, and show an initial application in an interactive, face-responsive display. Dense, real-time stereo processing is used to isolate users from other objects and people in the background. Skin-hue classification identifies and tracks likely body parts within the silhouette of a user. Face pattern detection discriminates and localizes the face within the identified body parts. Faces and bodies of users are tracked over several temporal scales: short-term (user stays within the field of view), medium-term (user exits/reenters within minutes), and long term (user returns after hours or days). Short-term tracking is performed using simple region position and size correspondences, while medium and long-term tracking are based on statistics of user appearance. We discuss the failure modes of each individual module, describe our integration method, and report results with the complete system in trials with thousands of users.
arandjelovic2005afr
Automatic face recognition for film character retrieval in feature-length films
O. Arandjelovic and A. Zisserman
Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on  1  (2005)
everingham2005iiv
Identifying individuals in video by combining generative and discriminative head models
M. Everingham and A. Zisserman
Proc. ICCV  1103--1110  (2005)
zhang2003aah
Automated annotation of human faces in family albums
L. Zhang and L. Chen and M. Li and H. Zhang
Proceedings of the eleventh ACM international conference on Multimedia  355--358  (2003)
Automatic annotation of photographs is one of the most desirable needs in family photograph management systems. In this paper, we present a learning framework to automate the face annotation in family photograph albums. Firstly, methodologies of content-based image retrieval and face recognition are seamlessly integrated to achieve automated annotation. Secondly, face annotation is formulated in a Bayesian framework, in which the face similarity measure is defined as maximum a posteriori (MAP) estimation. Thirdly, to deal with the missing features, marginal probability is used so that samples which have missing features are compared with those having the full feature set to ensure a non-biased decision. The experimental evaluation has been conducted within a family album of few thousands of photographs and the results show that the proposed approach is effective and efficient in automated face annotation in family albums.
apostof07
Who Are You? realtime person identification
N. Apostoloff and A. Zisserman
(2007)
Everingham06a
Hello! My name is... Buffy -- Automatic Naming of Characters in TV Video
M. Everingham and J. Sivic and A. Zisserman
(2006)
kruppa
Fast and Robust Face Finding via Local Context
H. Kruppa and M. Costrillon-Santana and B. Schiele
Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance  (2003)
santana
ENCARA2: Real-time Detection of Multiple Faces at Different Resolutions in Video Streams
M. Castrillón Santana and O. Déniz Suárez and M. Hernández Tejera and C. Guerra Artal
Journal of Visual Communication and Image Representation  130-140  (2007)
sanne
Classifying the Head-shoulder region and orientation in pedestrians
S. Korzec
(2007)
mori
Recovering Human body configurations: Combining segmentation and recognition
G. Mori and X. Ren and A. A. Efros and J. Malik
IEEE Computer Vision and Pattern Recognition  326-333  (2004)
partassembly
Detection and tracking of Humans by Probalistic Body Part Assembly
A. Micilotta
This paper presents a probabilistic framework of assembling detected hu- man body parts into a full 2D human configuration. The face, torso, legs and hands are detected in cluttered scenes using boosted body part detectors trained by AdaBoost. Body configurations are assembled from the detected parts using RANSAC, and a coarse heuristic is applied to eliminate obvious outliers. An a priori mixture model of upper-body configurations is used to provide a pose likelihood for each configuration. A joint-likelihood model is then determined by combining the pose, part detector and corresponding skin model likelihoods. The assembly with the highest likelihood is selected by RANSAC, and the elbow positions are inferred. This paper also illustrates the combination of skin colour likelihood and detection likelihood to further reduce false hand and face detections.
viola
Robust real-time face detection
P. Viola and M. J. Jones
International Journal of Computer Vision  57  137-154  (2004)
December 11, 09:56 AM

Now that the dust has somewhat settled after climategate, the consensus seems to be that it has been overblown. If you look at the timeline of events this isn’t surprising. Between the public appearance of the report and the first damning articles on the 20th there was less then a single day.  It is not that difficult to question how thorough the review of 160mb of data was.  It simply wasn’t.

It was as if some people thought they had hit gold and where aggressively searching for that specific quote within the leaked emails which would make them famous instantly. But all in all it was a bit disappointing if you where hoping to find exciting revelations. The thing that could be distilled from the e-mails was that most researchers are having strong opinions and big ego’s, but this shouldn’t really be a surprise.

It is naive to think that scientists are unbiassed, they simply aren’t. However, they are expected to backup up their views with unbiassed facts. The main argument thats left if we ignore all personal slander seems to be focused around a quote in one of the emails concerning the WMO Statement of the status of the global climate in 1999. The front page of this report shows the picture below and indicates that 1990-1999 has been the hottest decade on the record. So yes, it is an argument about a 10 year old report. It might be worth noting that a few days ago (8 dec 2009), the World Meteorological Institute came with a new press release that our current decade is the warmest on record. That information got probably lost in the heated debate.

From the leaked emails conservative news sources state that the following quote is a clear sign of manipulation of evidence:

“I’ve just completed Mike’s Nature trick of adding in the real temps to each series for the last 20 years (ie from 1981 onwards) amd from 1961 for Keith’s to hide the decline.”

But is it? In a rebuttal by the Climate Research Unit they state the following:

This email referred to a “trick” of adding recent instrumental data to the end of temperature reconstructions that were based on proxy data.

Phil Jones comments further: “One of the three temperature reconstructions was based entirely on a particular set of tree-ring data that shows a strong correlation with temperature from the 19th century through to the mid-20th century, but does not show a realistic trend of temperature after 1960. This is well known and is called the ‘decline’ or ‘divergence’. The use of the term ‘hiding the decline’ was in an email written in haste. CRU has not sought to hide the decline. Indeed, CRU has published a number of articles that both illustrate, and discuss the implications of, this recent tree-ring decline, including the article that is listed in the legend of the WMO Statement figure.

They also provide an extra graph where they show the climate reconstruction and the recent instrumental data seperately:

So, as you can see there isn’t really anything shocking to report.

It seems that our viewpoint concerning climate change seems closely linked to our position on the political spectrum. In the red corner, we have the conservatives who consider any idea where they might need to change their way of living threatening. In the blue corner we have the progressives, those who feel that change is a goal not just a method. During the first round of the climate gate boxing match we mainly heard the conservative viewpoint represented by the TelegraphFOX news, Washington Times and lots of infuriated bloggers but now that that the round is over i think the focus will shift to a more progressive point of view. You see, wether or not climate change is happening we will have to think about how we manage our environment. We are running out of resources and we are polluting our environment . When we do not act accordingly we will end up like the easter islands.

Round 2: The need for data sharing

A positive result of this climate battle is the renewed focus on the public availability of data and methodologies. CRU claims that 95% of their data is already open to the public and that they will make the remaining 5% publicly available, which is great news.  This movement of ‘data freeing’ is a great initiative, certainly in this time of collective sharing. John Wilbanks of Science Commons says the following:

“the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery…..we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge”.

Making our research widely available is a great way to catalyze progress in the broadest sense, this is probably better illustrated with the next video by  Jesse Dylan.

The importance of data sharing is already recognized by the government of the USA, they have created data.gov with the purpose to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government. It currently has over 118000 different datasets which really makes it a dataminers wetdream.

My efforts on data sharing

In an effort to not just stand along the sideline but participate in this ‘release your data’ party, I have decided to put my master thesis and its results in the public domain. For my master thesis i have implemented a system in mostly Python code which does person recognition on static images. You can compare it with what Google’s Picasa does. However, i was able to outperform Picasa in recognition rate on a few datasets. I have already released some of the source code on BitBucket and you can find a little bit more information on the Projects page.

In the next few months i am going to explain this approach in more detail and will put up my collected resources and a bibtex file. I think this will be a great start for anyone interested in machine vision and person recognition. If your interested just follow me on twitter!

November 30, 10:37 AM

Using a Content Delivery Network (CDN) is a method  to improve the performance of your website. Some of the reasons for using a CDN are:

  • Placing content geographically close to the end user and thus lowering latency and increasing bandwidth.
  • Increasing the amount of parallel downloads at the client by distributing over different domains
  • Offload the burden on your servers
  • Facilitate long term caching by using a robust source for libraries

Especially this last point is why I looked at Google’s CDN for Ajax libraries. It is a beautiful idea. When more people are using the same CDN, the cost of downloading an Ajax library can be ignored because it is very likely that the web browser will already have the library in its belly. Wonderfull!

For example, when I try to download the Prototype library everything goes well and the Google’s CDN spits the following back:

Content-Type:text/javascript; charset=UTF-8
Date:Mon, 30 Nov 2009 14:31:50 GMT
Expires:Tue, 30 Nov 2010 13:56:34 GMT

As you can see, Google tells your browser to cache the file for a full year as it should. Now, lets look at what happens when trying this with JQuery 1.3.2:

Content-Type:text/javascript; charset=UTF-8
Date:Mon, 30 Nov 2009 14:40:15 GMT
Expires:Tue, 30 Nov 2010 14:40:15 GMT

Again, everything is ok. Now, lets try a different version, JQuery 1.3:

Content-Type:text/javascript; charset=UTF-8
Date:Mon, 30 Nov 2009 14:41:42 GMT
Expires:Mon, 30 Nov 2009 15:41:42 GMT

Huh? When requesting the 1.3 version, Google is basically telling us to ‘remember it only for one hour’. This is wrong imho. When you specify 1.3, you are telling Google you want ‘the latest version in the 1.3 series‘. On the jquery.com site they are linking to the 1.3 version as well. This means that for a pagehit on the jquery website you will re download 60k of minified Jquery goodness when this file is not in your cache. A better approach would be if Google just let the client do its cache  revalidation  (which can do so by using the ‘if-modified since’ header).

But wait there is more. Wordpress, for example, adds an extra version argument to the file (?ver=<bla>). This can be handy when you want to generate a certain script or css file dynamically. And really should not be a problem with Google’s CDN, should it? Well lets see what happens when we request http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js?ver=1.3.2

Content-Type:text/javascript; charset=UTF-8
Date:Mon, 30 Nov 2009 15:00:00 GMT
Expires:Fri, 01 Jan 1990 00:00:00 GMT
Last-Modified:Mon, 23 Nov 2009 18:54:21 GMT

Holy cow, Google invented time travel!  The implications of this are pretty big, this may affect a lot of people with Wordpress blogs who where ’smart enough’ to use the Google CDN but without really testing if it worked.

Basically what this means is that http://ajax.googleapis.com isn’t really your performance safety net. You need to know exactly what you’re doing otherwise it will bite you back and you might be better off just hosting those libraries on your own site. Thus my recommendation would be to use the Google CDN but specify exactly which version of the library you are going to need and make sure you do not provide any arguments.

November 16, 03:26 PM

Yes, “Hello world!” , the first default post on a Wordpress blog. After much consideration i decided to simply use the best piece of software there is for blogging purposes. Yeah, i know it is PHP code and not Python and trust me, i tried really hard to use some Python alternative and even started working (doesn’t everybody else?) on my own blogging software written on top of Pylons, Mako and STORM, but it just doesn’t compare with WP.

Well, i must admit Zine (Python blog software) looks nice, but still, WP integrates so nice with everything else. Choosing for WP isn’t just choosing a software package, it is being part of a huge community. It is hard to compete with the zillions of PHP coders that extend Wordpress or integrate it with other platforms, no matter how much nicer your language of choice is.

The things that i where looking for, at minimum:

  • Easy editing of posts, the WP webinterface rocks!
  • RSS Support
  • Edit with XML-RPC
  • Pingbacks
  • Spamfiltering, WP integrates nicely with Akismet

But with WP Plugins you easily get so much more:

So, here it is my Wordpress blog, hosted behind a NGINX proxy which handles all static data and some Apache workers that handle the PHP scripts.  I must say, I feel confident that it can handle a Digg. At least, i benchmarked it to handle 3000 reqs/second without breaking a sweat.

Posts

RT @openQRM: openQRM 4.8 released - much more than “just” Cloud Computing - http://bit.ly/iatiQa, http://bit.ly/7dy0HF, http://bit.ly/hgz060

RT @mikkohypponen: As it turns out, mysql.com is vulnerable to - wait for it - SQL injection.

“Silly me, I thought the ‘sellable resource’ lawyers had was their law expertise, not their hours in the day.” by @bramcohen

RT @hsl: Heb je een passie voor Social Media, zoek je nog een stageplek en wil je community manager worden bij @XS4ALL? DM me voor meer info

RT @jgelens: Launched website for a charity concert #sosjapan #charity #earthquake #tsunami http://t.co/zK7ASBY please buy a ticket!

Even when not understanding Japanese http://bit.ly/9ZhkUw makes a nice point of the complexity of Plone vs TG / Pylons

Last night I watched a drunk guy try to dial a phone number using Calculator on his iPhone.His look of confusion was priceless/@timmolendijk

.@getify We are not going to loose Flash at all. Flash is going to have a great future as a canvas composer. ( http://bit.ly/dm6v8l )

Twitter is awesome for checking if a site is down/slow. This time bitbucket, yup its not just me.

Hmm latency between Amsterdam and San Francisco has an absolute minimum of 0.03 seconds, at least if Einstein is right.

Now if we take the refractive index of glass fiber in account the absolute minimum is around 0.05 seconds.

Experiment have seen collisions!!!!!!!!!!! (via @CERN)

News Flash: world sees a huge productivity spike. In other news: parts of youtube are down.

RT @ambroff: Just released Greenlet 0.3: http://bit.ly/bI22W1 (via @gevent)

work on gevent during google summer of code: http://blog.gevent.org/2010/03/16/google-summer-of-code/ (via @gevent)

RT @zzzeek: stackoverflow: PICK ME! PICK ME! MY ANSWER IS THE BEST! THE MOST PRACTICAL! THE ONE YOU CAN UNDERSTAND ! NOOOO NOT ALEX MAR …

RT @zzzeek: Martelli nipping at my heels again ! Common isn’t 57K points enough man ? http://bit.ly/67nZeU

Geez, stackoverflow has LOTS of badges. 2010 startup tip. A social community badges aggregation site, show your awesomeness.

Proposed website name: karmaccumulation.com (it is still FREE!)

New post: Asynchronous Servers in Python http://bit.ly/8UHKhK #in

abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz