Lorenzo Bolla

Yet another personal homepage.

Posts

March 21, 11:57 AM

Postgres supports a very interesting feature, LISTEN/NOTIFY, which allows sending asynchronous messages to clients connected to the database.

As in a normal “chat”, a client “subscribed” (LISTEN) to a channel receives all the messages that other clients “sent” (NOTIFY) on that channel.

Since version 9.0, a notification message can have a payload string as long as 8000 bytes.

In order to experiment with this feature, I've implemented a simple chat based on Tornado's IOLoop. Each client subscribes to a channel (or “room” in chat jargon) and listens to it by adding a callback that reacts to new notifications. Meanwhile, in another thread, the client is free to write and submit messages to the “room”. Here is a screenshot of the chat in action:

This is the code, available also on gist:
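The gist itself is not reproduced on this page. As a rough sketch of the two halves of such a chat (not the original code), a psycopg2 client can LISTEN on a channel and wait on the connection's socket, while another client sends messages with pg_notify(). This version uses a plain select() loop instead of Tornado's IOLoop, and the helpers drain_notifications, listen and say, as well as any DSN or room name you pass them, are hypothetical.

```python
import select


def drain_notifications(conn, on_message):
    """Read pending NOTIFY messages from a psycopg2 connection and pass each
    (channel, payload) pair to on_message."""
    conn.poll()  # moves notifications waiting on the socket into conn.notifies
    while conn.notifies:
        notify = conn.notifies.pop(0)
        on_message(notify.channel, notify.payload)


def listen(dsn, room, on_message, timeout=5.0):
    """Hypothetical helper: LISTEN on `room` and dispatch messages forever."""
    # psycopg2 is imported here so drain_notifications can be tried without it
    import psycopg2
    import psycopg2.extensions
    from psycopg2 import sql

    conn = psycopg2.connect(dsn)
    # LISTEN only takes effect at commit, so run in autocommit mode
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    with conn.cursor() as cur:
        # the channel is an identifier, so quote it rather than format it in
        cur.execute(sql.SQL("LISTEN {}").format(sql.Identifier(room)))
    while True:
        # the connection exposes fileno(), so select() can wait on it
        readable, _, _ = select.select([conn], [], [], timeout)
        if readable:
            drain_notifications(conn, on_message)


def say(conn, room, text):
    """Hypothetical helper: notify everyone listening on `room`."""
    with conn.cursor() as cur:
        # pg_notify() takes the channel and payload as ordinary parameters
        cur.execute("SELECT pg_notify(%s, %s)", (room, text))
    conn.commit()  # listeners only receive the message once this commits
```

With Tornado, the select() loop could be replaced by registering conn.fileno() with IOLoop.add_handler and calling drain_notifications from the handler; that wiring is left out of this sketch.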

March 06, 07:36 AM

Postgres has a lot of useful builtin data types, but only some of them are mapped to Python types when accessing the DB using psycopg2.

Extending the support to other types is not straightforward, and involves the following steps:

  • Create a Python class to store the data, e.g. class Point
  • Write a function to convert a Point to its SQL string representation, e.g. adapt_point
  • Write the inverse function to parse the SQL string representation of a Point and return an instance of Point, e.g. cast_point
  • Finally bind all these functions and types, see register_point_type

The complete code is as follows, also available as a gist:
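The gist is not included on this page. A minimal sketch of the four steps, modeled on the type-casting approach described in psycopg2's documentation, might look like the following. The Point fields, the regular expression and the point_literal helper are assumptions, and psycopg2 is imported inside the functions only so that the conversion helpers can be tried without it.

```python
import re
from dataclasses import dataclass


@dataclass
class Point:
    """Plain Python container for a Postgres `point`."""
    x: float
    y: float


def point_literal(point):
    """SQL text for a Point, e.g. '(1.0,2.0)'::point (hypothetical helper)."""
    # repr() of a float is a valid numeric literal for Postgres
    return "'(%r,%r)'::point" % (float(point.x), float(point.y))


def adapt_point(point):
    """Python -> SQL: wrap the literal so psycopg2 inserts it verbatim."""
    from psycopg2.extensions import AsIs
    return AsIs(point_literal(point))


def cast_point(value, cur):
    """SQL -> Python: parse Postgres' text output, e.g. '(1.5,-2)'."""
    if value is None:
        return None
    m = re.match(r"\(([^,]+),([^)]+)\)$", value)
    if m is None:
        raise ValueError("bad point representation: %r" % value)
    return Point(float(m.group(1)), float(m.group(2)))


def register_point_type(conn):
    """Bind the adapter and the typecaster for the `point` type."""
    from psycopg2.extensions import new_type, register_adapter, register_type
    cur = conn.cursor()
    cur.execute("SELECT NULL::point")
    point_oid = cur.description[0][1]  # type_code (OID) of the result column
    register_type(new_type((point_oid,), "POINT", cast_point))
    register_adapter(Point, adapt_point)
```

Looking the OID up with a query, rather than hard-coding it, keeps register_point_type usable for other types whose OIDs are not fixed.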

March 06, 07:31 AM

redis is often described as an “in-memory persistent key-value store”, but it's much more than that. One of its nicest features is its support for the Publish/Subscribe messaging paradigm, which makes it easy to implement, for example, a chat server.

In order to learn how to use it, I decided to implement a chat server using Redis and Tornado. This is a classic exercise, and others have done the same, but their solutions have some pitfalls that I tried to fix.

The code is forked from pelletier's, with some improvements:

  • Support for the latest version (2.6.9) of redis-py, the Python Redis client
  • Thread-safety: using add_callback, the only method in Tornado's IOLoop that is thread-safe
  • Tested with Python 3.3

This is the code, available also on gist:
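The gist is not reproduced here. A minimal sketch of the thread-safety idea, assuming redis-py's pubsub()/listen() interface and a Tornado version whose add_callback accepts extra positional arguments: a background thread reads from the Redis subscription and hands each message to the IOLoop through add_callback, so that handlers only touch Tornado objects from the IOLoop's own thread. The helper names are hypothetical, and this is not the forked code itself.

```python
import threading


def forward_messages(messages, deliver):
    """Pass on only real chat messages: redis-py's listen() also yields
    subscription confirmations, whose "type" is "subscribe"."""
    for msg in messages:
        if msg["type"] == "message":
            deliver(msg["channel"], msg["data"])


def start_listener(redis_client, room, ioloop, on_message):
    """Subscribe to `room` in a background thread and run `on_message`
    on the IOLoop thread for every message (hypothetical helper)."""
    pubsub = redis_client.pubsub()
    pubsub.subscribe(room)

    def deliver(channel, data):
        # add_callback is the IOLoop method that is safe to call from
        # another thread; on_message itself runs on the IOLoop thread
        ioloop.add_callback(on_message, channel, data)

    thread = threading.Thread(
        target=forward_messages, args=(pubsub.listen(), deliver), daemon=True)
    thread.start()
    return thread
```

Sending a message is then just redis_client.publish(room, text) from any handler.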

January 24, 04:33 AM

Every now and then a new discussion is raised on Tornado's mailing list about the best way to execute blocking tasks. It turns out that there are 3 feasible options, in order of increasing complexity:

  • Optimize blocking calls. Often, a slow DB query or an overly complicated template is the blocking bottleneck. Rather than complicating the web server, the first thing to try is to speed them up. This is sufficient 99% of the time.
  • Execute the slow task in a separate thread or process. This means off-loading the task to a thread (or process) other than the one running the IOLoop, which is then free to accept other requests.
  • Use an asynchronous driver/library to run the task, for example gevent, Motor and the like.

This blog post is about the second option, in particular using Python's concurrent.futures package.

For example, consider this simple web server, with a blocking “SleepHandler” handler:

import time

import tornado.ioloop
import tornado.web


class MainHandler(tornado.web.RequestHandler):

    def get(self):
        self.write("Hello, world %s" % time.time())


class SleepHandler(tornado.web.RequestHandler):

    def get(self, n):
        time.sleep(float(n))
        self.write("Awake! %s" % time.time())


application = tornado.web.Application([
    (r"/", MainHandler),
    (r"/sleep/(\d+)", SleepHandler),
])


if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()

Try to visit http://localhost:8888/sleep/10 in one tab and http://localhost:8888/ in another: you'll see that “Hello, world” is not printed in the second tab until the first one has finished, after 10 seconds. Effectively, the first call is blocking the IOLoop, which cannot serve the second tab.

You can make the “SleepHandler” Tornado-friendly by executing it in another thread. Below is a decorator that can be used to “unblock” it:

from concurrent.futures import ThreadPoolExecutor
from functools import partial, wraps
import time

import tornado.ioloop
import tornado.web


EXECUTOR = ThreadPoolExecutor(max_workers=4)


def unblock(f):

    @tornado.web.asynchronous
    @wraps(f)
    def wrapper(*args, **kwargs):
        self = args[0]

        def callback(future):
            self.write(future.result())
            self.finish()

        EXECUTOR.submit(
            partial(f, *args, **kwargs)
        ).add_done_callback(
            lambda future: tornado.ioloop.IOLoop.instance().add_callback(
                partial(callback, future)))

    return wrapper


class SleepHandler(tornado.web.RequestHandler):

    @unblock
    def get(self, n):
        time.sleep(float(n))
        return "Awake! %s" % time.time()

Very simply, the unblock decorator submits the decorated function to the thread pool, which returns a future; a callback is added to this future to return control to the IOLoop, by calling add_callback, which eventually will call self.finish and conclude the request.

Note that the function returned by unblock must itself be decorated with tornado.web.asynchronous, so that Tornado does not call self.finish too soon! Moreover, self.write is not thread-safe (thanks mrjoes!), therefore it must be called in the IOLoop's thread, with the future's result as its parameter.
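The off-loading idea can be checked without a browser. The sketch below swaps Tornado's IOLoop for Python's own asyncio event loop (an assumption, not the decorator above) and uses loop.run_in_executor with a concurrent.futures.ThreadPoolExecutor: while sleepy runs in a worker thread, the event loop is still free to serve quick_task, just as “/” should stay responsive while “/sleep/10” runs. The task names are hypothetical.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

EXECUTOR = ThreadPoolExecutor(max_workers=4)


def sleepy(seconds):
    """A blocking function, standing in for SleepHandler.get."""
    time.sleep(seconds)
    return "Awake! %s" % time.time()


async def slow_task(log):
    loop = asyncio.get_running_loop()
    # The blocking call runs in a worker thread; the event loop keeps going
    result = await loop.run_in_executor(EXECUTOR, sleepy, 0.2)
    log.append("slow")
    return result


async def quick_task(log):
    await asyncio.sleep(0)  # yield once, like a second request arriving
    log.append("quick")


async def main():
    log = []
    await asyncio.gather(slow_task(log), quick_task(log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))  # the quick task finishes first
```

Newer Tornado releases (5.0 and later) expose the same pattern directly as IOLoop.run_in_executor.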

Full code is below, available on gist.

December 12, 06:49 AM

My last weekend project was to write something similar to WeHasLinks. Actually, WeHasLinks is a file sharing website, but I misread its name as “We-Hash-Links”, and the funny thing is that they do indeed hash their links (for obvious reasons…). Anyway, WeHasLinks's links are hashed so that only the user who visited the page is allowed to use them.

I liked the idea very much, and I decided to implement it in go, as an exercise! You can find the code on github. A demo is available at unshareme.lbolla.info.

The links are encrypted using AES-256 and validated using HMAC, which is the standard way to encrypt secure cookies in web apps. In fact, gorilla provides a library to do just that. The code looks pretty much like this:

var hashKey = securecookie.GenerateRandomKey(32)
var blockKey = securecookie.GenerateRandomKey(32)
var encodeName = "encodeName"
var sc = securecookie.New(hashKey, blockKey)
...

func encode(msg PersonalURL) (string, error) {
    enc, err := sc.Encode(encodeName, msg)
    ...

“Personalization” of links is done by coupling each link with the remote IP of the client visiting the page.

// Store URI and IP together
type PersonalURL struct {
    URI string
    IP string
}

When a link is visited, the web app decodes it and verifies that the remote IP visiting it is the same as the IP that requested the link in the first place; if so, it redirects to the real URL. Otherwise, it responds with a 400 (Bad Request).
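For comparison, the same “personalization” check can be sketched with Python's standard library alone. This is a swapped-in technique, not a port of gorilla's securecookie: the pair is only signed with HMAC-SHA256, not encrypted with AES, so the URI and IP remain readable inside the token. The encode/decode names mirror the Go code; the JSON layout and the key handling are assumptions.

```python
import base64
import hashlib
import hmac
import json
import os

# Random key generated at startup (like GenerateRandomKey): links die on restart
SECRET = os.urandom(32)


def encode(uri, ip, key=SECRET):
    """Bind a URI to an IP and sign the pair (signed, not encrypted)."""
    body = json.dumps({"URI": uri, "IP": ip}).encode()
    mac = hmac.new(key, body, hashlib.sha256).digest()  # 32 bytes
    return base64.urlsafe_b64encode(mac + body).decode()


def decode(token, ip, key=SECRET):
    """Return the URI if the signature is valid and the IP matches, else None."""
    try:
        raw = base64.urlsafe_b64decode(token.encode())
    except ValueError:  # binascii.Error on malformed base64
        return None
    mac, body = raw[:32], raw[32:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return None  # tampered with, or signed with another key
    msg = json.loads(body)
    return msg["URI"] if msg["IP"] == ip else None
```

A None result corresponds to the 400 response in the Go handler; a valid result is the URL to redirect to.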

Per se, the app is very simple, but I learnt a lot about Go while implementing it: in particular, that in terms of speed of development it's very close to a scripting language, that Go's standard library is amazing, and that gorilla is a very nice complement for web apps.

One thing I didn't like is how templates are handled: it's overly complicated to specify a relative path for the templates directory, and templates are not compiled into the executable automatically. The easiest solution I found was to specify the path on the command line. In this respect, Yesod has a better solution.

Full code, for reference:

package main

import (
    "encoding/base64"
    "flag"
    "fmt"
    "github.com/gorilla/securecookie"
    "github.com/gorilla/mux"
    "html/template"
    "log"
    "net/http"
    "net/url"
    "path/filepath"
    "strings"
)

// Random stuff for encoding
var hashKey = securecookie.GenerateRandomKey(32)
var blockKey = securecookie.GenerateRandomKey(32)
var encodeName = "encodeName"
var sc = securecookie.New(hashKey, blockKey)

// Router for handlers
var router = mux.NewRouter()

// Store URI and IP together
type PersonalURL struct {
    URI string
    IP string
}

// Flags
var templates_path = flag.String("t", "src/unshareme/tmpl/", "Path to the templates")
var templates = template.New("")

func encode(msg PersonalURL) (string, error) {
    enc, err := sc.Encode(encodeName, msg)
    if err != nil {
        return "", err
    }

    b64enc := base64.URLEncoding.EncodeToString([]byte(enc))

    return b64enc, nil
}

func decode(enc string) (msg PersonalURL, err error) {
    b64enc, err := base64.URLEncoding.DecodeString(enc)
    if err != nil {
        return
    }

    err = sc.Decode(encodeName, string(b64enc), &msg)
    if err != nil {
        return
    }

    return
}

// Only works for IPv4, like 127.0.0.1:12345, not IPv6 like [::1]:12345
func remoteIP(r *http.Request) string {
    // Get it from headers, as set by nginx
    ip := r.Header.Get("X-Real-IP")
    if ip == "" {
        // Strips port number
        ip = strings.Split(r.RemoteAddr, ":")[0]
    }
//         log.Print("IP:", ip)
    return ip
}

func MainHandler(w http.ResponseWriter, r *http.Request) {
    err := templates.ExecuteTemplate(w, "index.html", nil)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
    }
}

func EncodeHandler(w http.ResponseWriter, r *http.Request) {
    u, err := url.Parse(r.URL.Query().Get("u"))
    if err != nil {
        log.Print(err.Error())
        http.Error(w, "", http.StatusBadRequest)
        return
    }

    if u.Scheme == "" {
        http.Error(w, "Invalid scheme", http.StatusBadRequest)
        return
    }

    msg := PersonalURL{URI: u.String(), IP: remoteIP(r)}
    enc, err := encode(msg)
    if err != nil {
        log.Print(err.Error())
        http.Error(w, "", http.StatusBadRequest)
        return
    }

    link, _ := router.Get("Decode").URL("enc", enc)
    fmt.Fprint(w, link.String())
}

func DecodeHandler(w http.ResponseWriter, r *http.Request) {
    vars := mux.Vars(r)
    dec, err := decode(vars["enc"])
    if err != nil {
        log.Print(err.Error())
        http.Error(w, "", http.StatusBadRequest)
        return
    }

    if rip := remoteIP(r); dec.IP != rip {
        log.Print(dec.IP, rip)
        http.Error(w, "", http.StatusBadRequest)
        return
    }

    http.Redirect(w, r, dec.URI, http.StatusFound)
    return
}

func main() {
    flag.Parse()
    templates = template.Must(template.ParseFiles(filepath.Join(*templates_path, "index.html")))
    router.Handle("/favicon.ico", http.NotFoundHandler())
    router.HandleFunc("/", MainHandler).Methods("GET")
    router.HandleFunc("/enc", EncodeHandler).Methods("GET")
    router.HandleFunc("/dec/{enc}", DecodeHandler).Methods("GET").Name("Decode")
    http.Handle("/", router)
    log.Fatal(http.ListenAndServe(":7001", nil))
}
November 30, 09:07 AM

This is the fifth post of a series describing simple scripts that I wrote to ease my life as a programmer.

They are available on github: fork & hack at will!

Watch reacts to changes in a directory by executing a command provided by the user. It can be used, for example, to monitor a directory and run some unittests as soon as files in it change. This is exactly how I am using Watch in acme.

Watch is based on the pyinotify library, a very slim, single-file library that I included in my repo for simplicity. pyinotify relies on inotify, an event-driven notifier merged into the Linux kernel in version 2.6.13: given a directory to watch, it raises events that users can process by defining handlers in a subclass of ProcessEvent.

Note that Watch refuses to run its command more often than once every 3 seconds. This prevents multiple events, raised on the same directory in quick succession, from queuing up too many processes.

Here is the code:

#!/usr/bin/env python

# Watch for modified files in localdir (.) and react.
# ./Watch <cmd>
# i.e.: ./Watch flake8 .

from pylib.pyinotify import WatchManager, EventsCodes, ProcessEvent, Notifier
from subprocess import call
import sys
import time


class ProcessManager(ProcessEvent):

    LAST_TIME = None

    def __init__(self, cmds):
        super(ProcessManager, self).__init__()
        self.cmds = cmds

    def is_too_soon(self):
        return self.LAST_TIME and time.time() - self.LAST_TIME < 3

    def process_IN_CLOSE_WRITE(self, event):
        # For some reason, this event is triggered twice
        if not self.is_too_soon():
            call(self.cmds)
            self.LAST_TIME = time.time()


def main():

    dir = '.'
    cmds = sys.argv[1:]

    wm = WatchManager()

    mask = EventsCodes.ALL_FLAGS['IN_CLOSE_WRITE']

    notifier = Notifier(wm, ProcessManager(cmds))
    wm.add_watch(dir, mask, rec=True)

    while True:
        try:
            notifier.process_events()
            if notifier.check_events():
                notifier.read_events()
        except KeyboardInterrupt:
            notifier.stop()
            break


if __name__ == '__main__':
    main()
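The 3-second guard in ProcessManager can be isolated into a small, testable helper. This is a hypothetical refactoring, not part of the script: it uses time.monotonic (immune to wall-clock changes) and an injectable clock. Unlike the script, which stores LAST_TIME after call() returns, it records the time when the action is allowed; either way, the duplicate IN_CLOSE_WRITE events collapse into a single run.

```python
import time


class Throttle:
    """Allow an action at most once every `interval` seconds."""

    def __init__(self, interval=3.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock  # injectable so the behaviour can be tested
        self.last = None

    def ready(self):
        """Return True (and remember the time) if enough time has passed."""
        now = self.clock()
        if self.last is not None and now - self.last < self.interval:
            return False
        self.last = now
        return True
```

In process_IN_CLOSE_WRITE, `if self.throttle.ready(): call(self.cmds)` would then replace the is_too_soon check.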
November 23, 10:03 AM

This is the fourth post of a series describing simple scripts that I wrote to ease my life as a programmer.

In this post I'll describe 2 simple scripts to nicely indent HTML and XML files. I use them primarily with acme, to pipe selected text through them and get back nicely formatted output.

Code is available here: htmlind and xmlind. Both programs are written in Python and make use of specialized libraries freely available online. In particular, xmlind uses xml.dom.minidom, included in Python's standard library, and htmlind uses a modified version of BeautifulSoup.

The most interesting part of these scripts is the modification to BeautifulSoup, which adds support for a variable tabstop width in pretty printing. The patch is here: it basically allows the user to set the tabstop width through an environment variable ($tabstop), which defaults to “4”.

For example:

% echo '<a><b>text text</b><c>more text</c></a>' | htmlind
<a>
    <b>
        text text
    </b>
    <c>
        more text
    </c>
</a>

% echo '<a><b>text text</b><c>more text</c></a>' | tabstop=1 htmlind
<a>
 <b>
  text text
 </b>
 <c>
  more text
 </c>
</a>
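xmlind itself is not listed in this post. A minimal sketch of the idea, assuming it simply re-serializes the document with xml.dom.minidom and reads the same $tabstop variable (the indent_xml helper is hypothetical), might look like this:

```python
import os
import sys
import xml.dom.minidom


def indent_xml(text, tabstop=4):
    """Re-indent an XML string, using `tabstop` spaces per nesting level."""
    dom = xml.dom.minidom.parseString(text)
    # documentElement skips the <?xml ...?> declaration toprettyxml would add
    return dom.documentElement.toprettyxml(indent=" " * tabstop)


def main():
    tabstop = int(os.environ.get("tabstop", "4"))
    sys.stdout.write(indent_xml(sys.stdin.read(), tabstop))


if __name__ == "__main__":
    main()
```

Unlike the BeautifulSoup output above, recent versions of minidom keep an element's lone text child on the same line (e.g. `<b>text text</b>`).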
November 16, 08:46 AM

This is the third post of a series describing simple scripts that I wrote to ease my life as a programmer.

In this post, I'll describe 3 scripts to “pretty print” some common file types, to improve readability: csvfmt, xmlfmt and jsonfmt.

csvfmt takes a CSV (“Comma Separated Values”) file from stdin, parses it and pretty prints each record as a Python dictionary.

#!/usr/bin/env python

import csv
import sys
import pprint

for row in csv.DictReader(sys.stdin):
    pprint.pprint(row)

Output looks like this:

% echo 'a,b,c
1,2,3
4,5,6
' | csvfmt
{'a': '1', 'b': '2', 'c': '3'}
{'a': '4', 'b': '5', 'c': '6'}

xmlfmt takes an XML file from either stdin or a file (specified on the command line) and extracts all the text from it. This script is meant for reading the text embedded in XML tags, and it's analogous to htmlfmt. If you want to format an XML file while keeping its tags, use xmllint --format, or my xmlind.

#!/usr/bin/env python

import xml.dom.minidom
from pylib.xmlutil import getText, getInput

dom = xml.dom.minidom.parse(getInput())
print(getText(dom))

For example:

% echo '<a>a text<b>b text</b>more a text</a>' | xmlfmt
a textb textmore a text
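pylib.xmlutil is not shown in this post. A hypothetical stand-in for its getText and getInput helpers, written only from the behaviour shown above (all text nodes concatenated in document order, input from a file argument or stdin), could be:

```python
import sys
import xml.dom.minidom


def get_input(argv=None):
    """File name from the command line if given, otherwise stdin."""
    argv = sys.argv if argv is None else argv
    return argv[1] if len(argv) > 1 else sys.stdin


def get_text(node):
    """Concatenate all text below `node`, in document order (tags dropped)."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(get_text(child) for child in node.childNodes)


if __name__ == "__main__":
    # minidom.parse accepts either a file name or an open file object
    print(get_text(xml.dom.minidom.parse(get_input())))
```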

jsonfmt takes a JSON file from stdin and pretty prints it as a Python object.

#!/usr/bin/env python

import json
import sys
import pprint

pprint.pprint(json.load(sys.stdin))

Try it out:

$> curl 'http://search.twitter.com/search.json?q=lorenzo' | jsonfmt
{u'completed_in': 0.035,
 u'max_id': 267982040698351617L,
 u'max_id_str': u'267982040698351617',
 u'next_page': u'?page=2&max_id=267982040698351617&q=lorenzo',
 u'page': 1,
 u'query': u'lorenzo',
 u'refresh_url': u'?since_id=267982040698351617&q=lorenzo',
 u'results': [{u'created_at': u'Mon, 12 Nov 2012 13:27:52 +0000',
               u'from_user': u'michael_174',
               u'from_user_id': 234373960,
               u'from_user_id_str': u'234373960',
               u'from_user_name': u'Michael Adhiyatama',
               u'geo': None,
               u'id': 267982040698351617L,
               u'id_str': u'267982040698351617',
               u'iso_language_code': u'in',
 etc. etc.

All three scripts are written in Python and available here.

November 12, 10:33 AM

Recently, I moved away from Wordpress. I did it primarily because Wordpress is so much more than just a blogging platform and what I needed was just a simple way of publishing posts with embedded code, links and images. Moreover, writing blogs using Wordpress's web editor is less than ideal…

The biggest problem to solve when moving away from Wordpress is how to not lose all your posts. Luckily, Wordpress allows you to export all your stuff in XML, but you also need a way to import them in whatever other blogging platform you are going to use.

After some research, I decided to choose a static site generator. Out of all the available alternatives, I picked Felix Felicis (aka “liquidluck”): it's written in Python, very simple to customize and extend, and with some pleasing themes. Other solutions, like jekyll, public-static, etc. are way too “powerful” (read “complicated”) for my taste.

Unfortunately, unlike other more popular alternatives, Felix Felicis does not come with an “importer” of Wordpress's XML file. So, I decided to fork one of the existing solutions and adapt it to my needs.

I also forked liquidluck's default theme and created my own.

If you want to do as I did, i.e. migrate away from Wordpress and use Felix Felicis as your static site generator, do the following:

  1. Export your posts from Wordpress in an XML file
  2. git clone my fork of wp2md and run it over the XML file
  3. Manually check that all your links and posts have been properly exported: mine needed almost zero editing!
November 30, 09:07 AM

This is the second post of a series describing simple scripts that I wrote to ease my life as a programmer.

They are available on github: fork & hack at will!

c+/c-

In this post I'll describe a very simple script, c+, and its counterpart c-.

c+ prepends every line of stdin with #. c- strips # from the beginning of each line of stdin. I use these scripts to comment/uncomment lines in Python scripts when using acme.

Here is the code:

c+

#!/usr/bin/env rc

sed 's/^/#/'

c-

#!/usr/bin/env rc

sed 's/^#//'
November 12, 09:07 AM

This is the first post of a series describing simple scripts that I wrote to ease my life as a programmer.

They are implemented in various languages (Python, bash, Go) and meant to be used on Linux. Some of them are “general purpose”, while others are specifically designed to interface with other tools I use (for example, acme).

All of them tend to have the following properties:

  • Input from stdin, output to stdout, errors to stderr
  • Return zero on success, non-zero on failure
  • Do one thing only
  • Not too customizable

These properties allow the scripts to remain very simple, be composable and easy to remember.

They are available on github: fork & hack at will!

a+/a-

In this post I'll describe a very simple script, a+, and its counterpart a-. They are the first I wrote when I started using acme.

a+ indents every line of stdin by 4 spaces. a- “de-indents” it by the same amount. The amount of spaces (4) is fixed (to resist the temptation to change it), and indentation is done with spaces and not tabs.

The code is trivial: it uses sed and rc, Plan 9's shell ported to *nix (although, in this case, any shell would do). Here it is:

a+

#!/usr/bin/env rc

sed 's/^/    /'

a-

#!/usr/bin/env rc

sed 's/^    //'
November 06, 09:01 AM

After having tinkered with Haskell for quite a bit, I decided that I needed some rest from theory and esoteric concepts, and a more pragmatic programming language to explore.

I've spent the last few days refreshing my memory of Go: I hadn't touched it for almost 2 years, and I must say I found it changed, for the better.

Here is a short tutorial on how to write a simple web application in Go, and publish it on Google App Engine. The application is not a mere exercise, but scratches an itch I recently had: it counts how many times each of its handlers is hit. So, for example, visiting: go-count-urls.appspot.com/hello returns how many times the /hello handler has been visited. You can use it as a trivial real-time tracker.

For example, I used it to verify that an email I sent to someone was actually opened (and presumably read). I just picked a random URL path (like go-count-urls.appspot.com/random-string-here) and created an html email with an empty img tag pointing to it: <img src="http://go-count-urls.appspot.com/random-string-here" width=0 height=0 />. Every time the email client opens the email, it requests that URL and the hit is recorded. I admit that this use is pretty lame, and that there are other services doing this, but I needed a real-world problem to work on!

So here we go!

Set up your development environment

First of all, download and install the App Engine Go software development kit. Then create the following directory structure:

go-count-urls/
    app.yaml
    app/
        counter.go

Show me the code!

The whole application is made of just one file, counter.go. Here it is, comments inline:

package counter
import (
    "appengine"
    "appengine/datastore"
    "fmt"
    "net/http"
    "time"
)

// Object to store in Google's Datastore. Keeps track of how many times a
// URL was hit and when.
type Counter struct {
    Path      string
    Count     int
    Timestamp time.Time
}

// Return a brand new Counter
func getEmptyCounter(path string) Counter {
    return Counter{Path: path, Count: 0, Timestamp: time.Now()}
}

// Increment the counter for a URL. If it's the first time this URL is
// visited, create a brand new Counter before incrementing it.
// On error, return an empty counter and an error.
func inc(c appengine.Context, key *datastore.Key, path string) (Counter, error) {
    var x Counter

    // ErrNoSuchEntity just means this URL was never visited: x stays zero
    if err := datastore.Get(c, key, &x); err != nil && err != datastore.ErrNoSuchEntity {
        return getEmptyCounter(path), err
    }

    // Increment it, and update the last modified time
    x.Path = path
    x.Count++
    x.Timestamp = time.Now()

    // Save the counter
    if _, err := datastore.Put(c, key, &x); err != nil {
        return getEmptyCounter(path), err
    }

    return x, nil
}

// This is the only handler. It just takes the path, removes the leading
// slash and stores it in the Datastore. The path itself is used to build
// the Datastore key.
func handle(w http.ResponseWriter, r *http.Request) {

    key := r.URL.Path[1:]
    if key == "" {
        // Return 404 on the root handler (we might want a splash page here...)
        http.NotFound(w, r)
        return
    } else if key == "favicon.ico" {
        // We are not interested in tracking favicon.ico
        w.WriteHeader(http.StatusNoContent)
        return
    }

    c := appengine.NewContext(r)

    // For how to use the Datastore see
    // https://developers.google.com/appengine/docs/go/datastore/overview
    count, err := inc(c, datastore.NewKey(c, key, "singleton", 0, nil), r.URL.Path)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    // Write something
    w.Header().Set("Content-Type", "text/plain; charset=utf-8")
    fmt.Fprintf(w, "Path=%s, Count=%d, When=%s", count.Path, count.Count, count.Timestamp)
}

// Initialize the application, binding URLS to handlers.
func init() {
    http.HandleFunc("/", handle)
}

Try it out!

Launch the application using the SDK; from go-count-urls directory type:

$> $GAE_PATH/dev_appserver.py .

Now visit localhost:8080/hello. Refresh. Refresh again. And again…

Publish

Publishing the application on Google infrastructure is a matter of seconds:

$> $GAE_PATH/appcfg.py update .

You can visit it at: go-count-urls.appspot.com/hello. The code is available here: github.com/lbolla/go-count-urls.

December 01, 10:43 AM

Asynchronous programming can be tricky for beginners, therefore I think it's useful to iron out some basic concepts to avoid common pitfalls. For an explanation of asynchronous programming in general, I recommend one of the many resources online. I will focus solely on asynchronous programming in Tornado.

From Tornado's homepage:

FriendFeed's web server is a relatively simple, non-blocking web server written in Python. The FriendFeed application is written using a web framework that looks a bit like web.py or Google's webapp, but with additional tools and optimizations to take advantage of the non-blocking web server and tools. Tornado is an open source version of this web server and some of the tools we use most often at FriendFeed. The framework is distinct from most mainstream web server frameworks (and certainly most Python frameworks) because it is non-blocking and reasonably fast. Because it is non-blocking and uses epoll or kqueue, it can handle thousands of simultaneous standing connections, which means the framework is ideal for real-time web services. We built the web server specifically to handle FriendFeed's real-time features: every active user of FriendFeed maintains an open connection to the FriendFeed servers. (For more information on scaling servers to support thousands of clients, see The C10K problem.)

The first step as a beginner is to figure out if you really need to go asynchronous. Asynchronous programming is more complicated than synchronous programming because, as someone put it, it does not fit the human brain nicely.

You should use asynchronous programming when your application needs to monitor some resources and react to changes in their state. For example, a web server sitting idle until a request arrives through a socket is an ideal candidate. Or an application that has to execute tasks periodically or delay their execution after some time. The alternative is to use multiple threads (or processes) to control multiple tasks, and this model quickly becomes complicated.

The second step is to figure out if you can go asynchronous. Unfortunately in Tornado, not all the tasks can be executed asynchronously.

Tornado is single-threaded (in its common usage, although it supports multiple threads in advanced configurations), therefore any “blocking” task will block the whole server. This means that a blocking task will not allow the framework to pick the next task waiting to be processed. The selection of tasks is done by the IOLoop, which, like everything else, runs in the only available thread.

For example, this is a wrong way of using IOLoop:

Note that blocking_call is called correctly, but, being blocking (time.sleep blocks!), it prevents the execution of the following task (the second call to the same function). Only when the first call ends will IOLoop run the second one. Therefore, the output in the console is sequential (“sleeping”, “awake!”, “sleeping”, “awake!”).

Compare the same “algorithm”, but using an “asynchronous version” of time.sleep, i.e. add_timeout:

In this case, the first task is called, prints “sleeping” and then asks IOLoop to schedule the execution of the rest of the routine after 1 second. IOLoop, having control again, fires the second call to the function, which prints “sleeping” again and returns control to IOLoop. After 1 second, IOLoop carries on where it left off with the first function, and “awake!” is printed. Finally, the second “awake!” is printed, too. So, the sequence of prints is: “sleeping”, “sleeping”, “awake!”, “awake!”. The two function calls have been executed concurrently (not in parallel, though!).
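The code for these two examples is missing from this page. As a stand-in, the sketch below reproduces both behaviours with Python's asyncio instead of Tornado's IOLoop (a swapped-in event loop, not the original code): time.sleep plays the blocking call, and asyncio.sleep plays the role of add_timeout by handing control back to the loop. The delay is shortened to 0.05 seconds.

```python
import asyncio
import time


async def blocking_call(log):
    log.append("sleeping")
    time.sleep(0.05)  # blocks the whole event loop
    log.append("awake!")


async def non_blocking_call(log):
    log.append("sleeping")
    await asyncio.sleep(0.05)  # hands control back to the loop, like add_timeout
    log.append("awake!")


async def run_twice(func):
    """Start two calls of `func` concurrently and return the order of prints."""
    log = []
    await asyncio.gather(func(log), func(log))
    return log


if __name__ == "__main__":
    print(asyncio.run(run_twice(blocking_call)))
    # ['sleeping', 'awake!', 'sleeping', 'awake!']
    print(asyncio.run(run_twice(non_blocking_call)))
    # ['sleeping', 'sleeping', 'awake!', 'awake!']
```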

So, I hear you asking, “how do I create functions that can be executed asynchronously?” In Tornado, every function that has a “callback” argument can be used with gen.Task. Beware, though: being able to use Task does not make the execution asynchronous! There is no magic going on: the function is simply scheduled for execution, executed, and whatever is passed to callback becomes the return value of Task. See below:
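Here is an illustration of the point, again using asyncio rather than Tornado (an assumption, not gen.Task itself): task() is a hypothetical helper that, like Task, passes a callback to the function and returns a future resolved with whatever the callback receives. Wrapping blocking_add does not help, because time.sleep still runs before control returns; async_add only schedules its callback, so control comes back at once.

```python
import asyncio
import time


def task(func, *args):
    """Rough asyncio analogue of gen.Task: call func with a `callback`
    keyword and return a future resolved with what it receives."""
    future = asyncio.get_running_loop().create_future()
    func(*args, callback=future.set_result)
    return future


def blocking_add(a, b, callback):
    time.sleep(0.2)  # still blocks: task() adds no magic
    callback(a + b)


def async_add(a, b, callback):
    # Schedule the callback instead of waiting (like IOLoop.add_timeout)
    asyncio.get_running_loop().call_later(0.2, callback, a + b)


async def time_to_schedule(func):
    """Seconds spent before task() hands back control, and the final result."""
    start = time.monotonic()
    future = task(func, 1, 2)
    scheduled = time.monotonic() - start
    return scheduled, await future
```

Both versions eventually produce 3, but only async_add leaves the loop free to run other tasks in the meantime.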

Most beginners expect to be able to just write: Task(my_func), and automagically execute my_func asynchronously. This is not how Tornado works. This is how Go works! And this is my last remark:

In a function that is going to be used “asynchronously”, only asynchronous libraries should be used.

By this, I mean that blocking calls like time.sleep or urllib2.urlopen or db.query will need to be substituted by their equivalent asynchronous version. For example, IOLoop.add_timeout instead of time.sleep, AsyncHTTPClient.fetch instead of urllib2.urlopen etc. For DB queries, the situation is more complicated and specific asynchronous drivers to talk to the DB are needed. For example: Motor for MongoDB.

November 04, 12:32 PM

For a non-designer, Ext JS is kind of a blessing. It is a self-contained, fully-fledged JavaScript framework, with loads of fancy, reusable, browser-compatible, professional-looking widgets. Its only weakness is the documentation: finding your way through the API documentation is daunting at best.

So, I bought this, and while working my way through it, I decided to share some experiments. You can find them here. These are the ones I prefer:

November 04, 12:32 PM

This is another of those posts to not forget. If printing a PDF file with lp prints a blank page with error messages like:

ERROR: configurationerror OFFENDING COMMAND: setpagedevice STACK: --nostringval-- …

the problem is probably that your PDF has a certain page size (let's say letter) but your printer expects another (let's say A4).

Check your printer settings and your PDF (for example with pdfinfo pdffile) to verify. If this is the case, print with this command instead:

lp -o fit-to-page pdffile
December 01, 10:43 AM

Today I tried to benchmark 3 web servers that I've used recently:

  1. Tornado
  2. Warp
  3. Yesod

In fact, these are only 2 web servers, because Yesod runs on top of Warp and is a fully fledged web framework rather than a web server: but this was also the point of the benchmark, i.e. to measure how much slower all its goodies make Yesod compared to Warp.

Tornado and Warp are obviously very different web servers (async vs. threaded, interpreted vs. compiled, etc.) but, who cares?

The benchmark is very simple: a single handler returning “Hello World” (very original). Obviously, this is hardly a real-world example, but it can give some indication, if only to within an order of magnitude.

Nonetheless, the results were very interesting. First of all, here is the code.

Tornado

Warp

Yesod

And the results, obtained using httperf:

$> httperf --hog --client=0/1 --server=localhost --port=8080 --uri=/ --rate=1000 --send-buffer=4096 --recv-buffer=16384 --num-conns=100 --num-calls=100 --burst-length=20
Tornado 518 req/s
Warp 10079 req/s
Yesod 929 req/s
Yesod w/o session management 7924 req/s

Wait! What?! Yesod is 10 times slower than Warp!?

I asked the Yesod developers for an explanation and they tracked down the issue: the work of these guys is an example worth studying of how to benchmark and debug code! Anyway, it looks like the issue is that serializing timestamps is incredibly inefficient: I hope a patch will be ready soon! In the meantime, I strongly suggest disabling session management in Yesod if you want high performance. (In the code shown, I've also disabled Hamlet, Yesod's templating system, but it turned out that it didn't make much difference: code using Hamlet is in a gist.)

Overall, though, even on my crappy old single-core laptop, the result is amazing: Warp is ~20 times faster (10079 vs. 518 req/s), and Yesod without session management ~15 times faster, than one of the fastest Python web servers.

November 04, 12:32 PM

If you are using Gerrit for code review and project management of git-based projects, you might find yourself manually adding the same bunch of reviewers to your patches every single time.

In the past, I alleviated the problem with a simple Javascript bookmarklet: add it to your browser and click it while viewing the patch in Gerrit.

But there's a better method: do it from the command line, when pushing your local commits to Gerrit. Just add these lines to your .git/config:

[remote "review"]
    pushurl = ssh://user@gerrit:29418/project
    push = HEAD:refs/for/master
    receivepack = git receive-pack --reviewer reviewer1 --reviewer reviewer2

Now, when you want to push a change for review, just run git push review, and “reviewer1” and “reviewer2” will be added to your patchset.
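
If you set this up in many repositories, the same three settings can be written with git config instead of editing the file by hand. The following is only a sketch: the helper names review_remote_config and apply_config are hypothetical, and it assumes the remote is called “review” as above.

```python
import shlex
import subprocess


def review_remote_config(push_url, reviewers, branch="master", remote="review"):
    """Return `git config` argument lists equivalent to the .git/config lines above."""
    # Gerrit's receive-pack takes one --reviewer option per reviewer.
    receivepack = " ".join(
        ["git receive-pack"] + [f"--reviewer {shlex.quote(r)}" for r in reviewers]
    )
    key = f"remote.{remote}."
    return [
        ["git", "config", key + "pushurl", push_url],
        ["git", "config", key + "push", f"HEAD:refs/for/{branch}"],
        ["git", "config", key + "receivepack", receivepack],
    ]


def apply_config(commands):
    """Run the commands in the current repository (this writes .git/config)."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Running apply_config(review_remote_config("ssh://user@gerrit:29418/project", ["reviewer1", "reviewer2"])) inside a checkout should produce the same [remote "review"] section.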

November 04, 12:32 PM

This is a vintage post to remind me how to install b43 drivers on Arch Linux for my “shiny” Belkin PCMCIA card (dated 2002…).

  • First install the firmware extractor: $> pacman -S b43-fwcutter.
  • Then install the firmware itself: $> yaourt -S b43-firmware.

Check dmesg. You should see something like:

Broadcom 43xx driver loaded [ Features: PMNLS ]

and in the output of lsmod | grep b43:

b43        330774  0
bcma        19281  1 b43
mac80211   341044  1 b43
cfg80211   147429  2 b43,mac80211
ssb         42167  2 b43,b44
pcmcia      31182  2 b43,ssb
mmc_core    72742  2 b43,ssb
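
To check this without eyeballing the columns, a small parser of the lsmod output can list which modules b43 pulls in: a module depends on the modules whose “Used by” column mentions it. This is a hypothetical helper of my own, assuming the usual three or four whitespace-separated columns (name, size, use count, users).

```python
def parse_lsmod(text):
    """Parse `lsmod` output into {module: (size, use_count, users)}."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "Module":  # skip blanks and the header
            continue
        name, size, count = fields[0], int(fields[1]), int(fields[2])
        users = fields[3].split(",") if len(fields) > 3 else []
        modules[name] = (size, count, users)
    return modules


def dependencies(modules, name):
    """Modules whose 'Used by' column lists `name`, i.e. what `name` depends on."""
    return sorted(m for m, (_, _, users) in modules.items() if name in users)
```

On the machine itself, the text would come from subprocess.run(["lsmod"], capture_output=True, text=True).stdout; for the output above, dependencies(parse_lsmod(text), "b43") lists bcma, cfg80211, mac80211, mmc_core, pcmcia and ssb.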

Finally, try to connect:

$> pacman -S wifi-select
$> wifi-select

December 01, 10:43 AM

EMpy has moved to GitHub!

November 05, 07:26 AM

Generators (PEP 255 “Simple Generators”) and Coroutines (PEP 342 “Coroutines via Enhanced Generators”) are the cleanest way I've come across so far to implement the concept of a “pipeline” in Python.

First approximation

A pipeline is made of:

  • a Producer, that generates data;
  • many Stages, that receive data from the previous stage and send it to the next;
  • a Consumer, that receives data from the last stage.

The Producer is a coroutine that only sends data, generated internally from some initial state. Stages are coroutines that both receive and send messages. The Consumer only receives data. Chaining is done in the function pipeline: each argument but the last is instantiated with an instance of the next stage. The full pipeline is started by issuing a next (or send(None)) to the Producer.

In the following example, a stream of integers is produced and pushed down the pipeline: each stage adds 1 and finally the result is printed in the consumer.
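
A minimal Python 3 sketch of this first approximation could look like the following. It is my own reconstruction, not the original gist: the signature pipeline(*factories) and the use of functools.partial to pass extra arguments are assumptions.

```python
from functools import partial


def producer(target, n=5):
    """Produce the integers 0..n-1, push them to `target`, then close it."""
    for i in range(n):
        target.send(i)
    target.close()
    yield  # only here to make this a generator: next() runs the loop above


def stage(target):
    """Receive a value, add 1 and send the result to `target`."""
    try:
        while True:
            value = yield
            target.send(value + 1)
    except GeneratorExit:
        target.close()  # propagate the shutdown downstream


def consumer(results):
    """Receive values from the last stage, print them and keep them."""
    while True:
        value = yield
        print(value)
        results.append(value)


def pipeline(*factories):
    """Instantiate each factory (but the last) with an instance of the next one.

    Every coroutine downstream of the producer is primed with next() so it
    can accept send(); the producer itself is returned unstarted.
    """
    target = factories[-1]()
    next(target)
    for factory in reversed(factories[1:-1]):
        target = factory(target)
        next(target)
    return factories[0](target)


results = []
p = pipeline(partial(producer, n=5), stage, stage, partial(consumer, results))
next(p)  # start the whole pipeline: prints 2, 3, 4, 5, 6
```

The producer closes its target when it runs out of data, and each stage forwards the close() downstream, so the whole chain shuts down cleanly.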

Wrapping it up

A pattern emerges, so we'd better wrap it up in a class. Moreover, let's split the “architecture” of the pipeline from the behavior of each stage.
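
One way to do that, sketched here under my own assumptions (the Pipeline class, its run method and the private helpers are hypothetical names), is to let the class own all the coroutine wiring, while each stage is just a plain function value -> value and the producer is any iterable.

```python
def _sink(func):
    """Terminal coroutine: call `func` on every value it receives."""
    while True:
        value = yield
        func(value)


def _stage(func, target):
    """Intermediate coroutine: apply `func` and forward the result to `target`."""
    try:
        while True:
            value = yield
            target.send(func(value))
    except GeneratorExit:
        target.close()  # shut down the rest of the chain


class Pipeline:
    """Wire plain functions into a chain of coroutines.

    The wiring (the "architecture") lives here; each stage's behaviour is
    just a function value -> value, and the sink is any callable.
    """

    def __init__(self, source, stages, sink):
        self.source = source  # any iterable plays the role of the producer
        self.stages = list(stages)
        self.sink = sink

    def _build(self):
        # Build from the end of the chain backwards, priming each coroutine.
        target = _sink(self.sink)
        next(target)
        for func in reversed(self.stages):
            target = _stage(func, target)
            next(target)
        return target

    def run(self):
        head = self._build()
        for item in self.source:
            head.send(item)
        head.close()
```

For example, with results = [], calling Pipeline(range(5), [lambda x: x + 1, lambda x: x + 1], results.append).run() fills results with [2, 3, 4, 5, 6]. Stages no longer know anything about coroutines, which makes them trivial to test on their own.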

More useful example

As a more interesting application, here is how to use a pipeline to implement a simple crawler that downloads the links from news.ycombinator.com/ and finds all the posts where the word “Python” is mentioned.
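
The following is a rough Python 3 reconstruction of such a crawler using only the standard library (urllib.request and html.parser), which may well differ from what the original used. All names are hypothetical, pages are fetched one at a time, and only absolute http(s) links are followed. The fetch function is a parameter so the stages can be exercised without network access.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href and href.startswith(("http://", "https://")):
            self.links.append(href)


def fetch(url):
    """Download `url` and return the body decoded as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(target, fetch):
    """Stage: receive a page URL, send every absolute link found on it."""
    while True:
        page_url = yield
        parser = LinkParser()
        parser.feed(fetch(page_url))
        parser.close()
        for link in parser.links:
            target.send(link)


def grep(pattern, target, fetch):
    """Stage: receive a URL, forward it only if the page matches `pattern`."""
    regex = re.compile(pattern)
    while True:
        url = yield
        try:
            text = fetch(url)
        except OSError:  # network errors: skip this page and keep going
            continue
        if regex.search(text):
            target.send(url)


def collect(results):
    """Consumer: append every URL it receives to `results`."""
    while True:
        results.append((yield))


def build_crawler(results, pattern=r"\bPython\b", fetch=fetch):
    """Chain extract_links -> grep -> collect, priming each coroutine."""
    sink = collect(results)
    next(sink)
    matcher = grep(pattern, sink, fetch)
    next(matcher)
    crawler = extract_links(matcher, fetch)
    next(crawler)
    return crawler
```

Against the live site, something like crawler = build_crawler(matches) followed by crawler.send("https://news.ycombinator.com/") should leave the URLs of the matching pages in matches.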

Cleaning things up

Things are still far from clean and bulletproof. One step in the right direction is to follow the suggestions found in David Beazley's presentation on coroutines.
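
Two of the ideas in Beazley's coroutine material are easy to adopt: a decorator that primes every coroutine automatically, so nobody forgets the initial next(), and handling GeneratorExit so that close() tears down the whole chain. A minimal sketch of both follows; add_one and collect are illustrative names.

```python
import functools


def coroutine(func):
    """Decorator: create the generator and advance it to its first yield."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # prime it, so callers can send() straight away
        return gen
    return start


@coroutine
def add_one(target):
    """Stage: add 1 to each value; closing it also closes `target`."""
    try:
        while True:
            value = yield
            target.send(value + 1)
    except GeneratorExit:
        target.close()


@coroutine
def collect(results):
    """Consumer: keep every value it receives."""
    while True:
        results.append((yield))


results = []
head = add_one(add_one(collect(results)))  # no manual next() needed
for i in range(3):
    head.send(i)
head.close()  # shuts down the whole chain
```

After this runs, results holds [2, 3, 4].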

The previous examples are by no means “production ready”, but maybe someone will find some good ideas to apply to real-world problems.

Recent tracks

  • The Vespertine Park by Gavin Bryars
  • De profundis by Sofia Gubaidulina
  • Theme Of The Uprooting by Eleni Karaindrou
  • Hibiki-Hana-Ma by Iannis Xenakis
  • Veni creator spiritus by The Hilliard Ensemble
  • Fratres by Arvo Pärt
  • VII. Galamb borong by György Ligeti
  • Lento - Cantabile-Semplice by Henryk Górecki
  • Summa by Arvo Pärt
  • Diverge by Peter Broderick

Profile

Senior Software Architect
Computer Software | London, United Kingdom, GB

Summary

As a Telecommunication Engineer with a PhD in numerical methods, I have a strong background in mathematics and physics, especially classical optics, and broad experience in the mathematical modeling of physical problems.

For more than 5 years I worked on the implementation of numerical algorithms for electromagnetic problems, with a strong focus on efficiency for computationally intensive simulations.

5+ years ago, I shifted my attention to the World Wide Web, and I've been working on high-performance applications since 2007. My main interests are High Availability / High Throughput services and Artificial Intelligence.

I love programming and I am always studying new languages and techniques (Go, Haskell, and related libraries) to learn new approaches to problem solving.

Specialties: computer programming, problem solving, numerical modelling.

Experience

  • May 2012 - Present
    Senior Software Architect / RAID Research Services LLP
    Design and implementation of high performance data analysis tools. Implementation of a very efficient in-memory database to query and analyse tabular data. Marshaling of incoming data from different sources (emails, xml/json feeds, online folders), processing of unstructured information using machine-learning techniques. Automatic deployment and scaling on AWS.
  • Mar 2012 - Present
    Director / Networkscale Ltd.
  • Apr 2012 - Present
    Senior Software Engineer / Artirix
    Implementation of massively parallel web search crawlers using Python and Twisted.
  • Mar 2012 - Present
    Software developer / The Social Gaming Company
    Worked as contractor to develop back-end web servers in Python, Flask and MongoDB.
  • Mar 2011 - Present
    Senior Python Developer / Zugo Services Ltd
    Python web application development, utilising Tornado, nginx, memcached, sphinx, MySQL, MongoDB, fabric and git. Focus on scalability using AWS solutions. Development of a massively distributed user tracking engine based on Disco and MongoDB. User behaviour modelling.
  • Aug 2008 - Present
    Software Developer / Geneity Limited
    High-performance e-gaming web application developer. Programming languages used: Python (2.5, 2.6), C, PL/SQL (Oracle 10g-11) for the back-end; HTML/Javascript for the front-end; REST and SOAP web services.
  • Jun 2005 - Present
    Optical Designer / Pirelli Labs
    R&D in Photonic Integrated Circuits mainly based on Silicon-on-Insulator technology (SOI-PICs). Responsible for the design of optical devices for metro and access networks, including optical filters, fiber-to-chip couplers, polarization splitters and rotators. Broad experience in numerical computing and mathematical modeling of physical problems. Patenting.
  • Mar 2005 - Present
    Engineer / Telesystem
    Hardware and software consultant for digital television broadcasting. Java programming of DVB applications.
  • Apr 2004 - Present
    Software Engineer / Photon Design
    Visiting PhD student and software programmer of numerical methods for electromagnetism.
  • May 2001 - Present
    Technical Support / Progetto Mantegna
    Technical assistance in the virtual reconstruction of Mantegna's paintings in the

Education

  • 2003 - 2005
    University of Udine
    Ph.D in Numerical Methods
  • 1996 - 2001
    Università degli Studi di Padova
    Laurea in Engineering in Telecommunication, Electronics, Computer Science.
  • 1991 - 1996
    Liceo Scientifico Statale Fracastoro
  • University of Padua
    Telecommunication Engineering
