Lorenzo Bolla
Yet another personal homepage.
Posts
Since version 8.4, Postgres has supported a very interesting feature: LISTEN/NOTIFY, which allows sending asynchronous messages to clients connected to the database.
As in a normal “chat”, a client “subscribed” (LISTEN) to a channel receives
all the messages that other clients “sent” (NOTIFY) on that channel.
Since version 9.0, a notification message can carry a payload string of up to 8000 bytes.
In order to experiment with this feature, I've implemented a simple chat based on Tornado's IOLoop. Each client subscribes to a channel (or “room” in chat jargon) and listens to it, adding a callback to react to each new notification. In the meantime, in another thread, the client is free to write and submit messages to the “room”. Here is a screenshot of the chat in action:
This is the code, available also on gist:
Postgres has a lot of useful builtin data types, but only some of them are mapped to Python types when accessing the DB using psycopg2.
Extending the support to other types is not straightforward, and involves the following steps:
- Create a Python class to store the data, e.g. class Point
- Write a function to convert a Point to its SQL string representation, e.g. adapt_point
- Write the inverse function to parse the SQL string representation of a Point and return an instance of a Point, e.g. cast_point
- Finally bind all these functions and types, see register_point_type
The complete code is as follows, also available as a gist:
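As a rough illustration of the four steps (a minimal sketch written for this page, not the code from the gist), the conversions can be kept as plain functions and bound to psycopg2 in a separate step. The names `point_to_sql` and the `conn` argument are assumptions; the psycopg2 calls used (`register_adapter`, `AsIs`, `new_type`, `register_type`) are the ones from its `extensions` module, and psycopg2 is only imported when `register_point_type` is actually called.

```python
import re
from dataclasses import dataclass


@dataclass
class Point:
    """Step 1: a Python class to store the data."""
    x: float
    y: float


def point_to_sql(point):
    """Step 2: render a Point as a quoted SQL literal for a point column."""
    return "'(%s, %s)'" % (float(point.x), float(point.y))


def cast_point(value, cur):
    """Step 3: parse the string returned by Postgres, e.g. '(1.5,2)'."""
    if value is None:
        return None
    match = re.match(r"\(([^,]+),([^)]+)\)", value)
    if match is None:
        raise ValueError("bad point representation: %r" % value)
    return Point(float(match.group(1)), float(match.group(2)))


def register_point_type(conn):
    """Step 4: bind everything to psycopg2 (conn is an open connection)."""
    import psycopg2.extensions as ext  # imported lazily: needs psycopg2 installed

    ext.register_adapter(Point, lambda p: ext.AsIs(point_to_sql(p)))
    with conn.cursor() as cur:
        # Ask the server for the OID of the point type instead of hard-coding it
        cur.execute("SELECT NULL::point")
        point_oid = cur.description[0][1]
    ext.register_type(ext.new_type((point_oid,), "POINT", cast_point))
```

After calling `register_point_type(conn)`, a `Point` passed as a query parameter should be rendered through `point_to_sql`, and point columns should come back as `Point` instances.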
redis is often described as an “in-memory persistent key-value store”, but
it's much more than that. One of its nicest features is its support for
the Publish/Subscribe messaging paradigm, which makes it easy to
implement, for example, a chat server.
In order to learn how to use it, I decided to implement a chat server using Redis and Tornado. This is a classical exercise, and others have done the same: but their solution has some pitfalls that I tried to fix.
The code is forked from pelletier's, with some improvements:
- Support for the latest version (2.6.9) of redis-py, the Python Redis client
- Thread-safety: using add_callback, the only method in Tornado's IOLoop that is thread-safe
- Tested with Python 3.3
This is the code, available also on gist:
Every now and then a new discussion is raised on Tornado's mailing list about the best way to execute blocking tasks. It turns out that there are 3 feasible options, in order of increasing complexity:
- Optimize blocking calls. Often, a slow DB query or an overly complicated template is the blocking bottleneck. Rather than complicating the webserver, the first thing to try is to speed them up. This is sufficient 99% of the time.
- Execute the slow task in a separate thread or process. This means off-loading the task to a different thread (or process) from the one running the IOLoop, which is then free to accept other requests.
- Use an asynchronous driver/library to run the task. For example, something like gevent, motor and the like.
This blog post is about the second option, in particular using Python's concurrent.futures package.
For example, consider this simple web server, with a blocking “SleepHandler” handler:
import time

import tornado.ioloop
import tornado.web


class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world %s" % time.time())


class SleepHandler(tornado.web.RequestHandler):
    def get(self, n):
        time.sleep(float(n))
        self.write("Awake! %s" % time.time())


application = tornado.web.Application([
    (r"/", MainHandler),
    (r"/sleep/(\d+)", SleepHandler),
])


if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
Try to visit http://localhost:8888/sleep/10 in one tab and http://localhost:8888/ in another: you'll see that “Hello, world” is not printed in the second tab until the first one has finished, after 10 seconds. Effectively, the first call is blocking the IOLoop, which cannot serve the second tab.
You can make the “SleepHandler” Tornado-friendly by executing it in another thread. Below is a decorator that can be used to “unblock” it:
from concurrent.futures import ThreadPoolExecutor
from functools import partial, wraps
import time

import tornado.ioloop
import tornado.web


EXECUTOR = ThreadPoolExecutor(max_workers=4)


def unblock(f):

    @tornado.web.asynchronous
    @wraps(f)
    def wrapper(*args, **kwargs):
        self = args[0]

        def callback(future):
            # Runs in the IOLoop thread: safe to write and finish here
            self.write(future.result())
            self.finish()

        EXECUTOR.submit(
            partial(f, *args, **kwargs)
        ).add_done_callback(
            lambda future: tornado.ioloop.IOLoop.instance().add_callback(
                partial(callback, future)))

    return wrapper


class SleepHandler(tornado.web.RequestHandler):

    @unblock
    def get(self, n):
        time.sleep(float(n))
        return "Awake! %s" % time.time()
Very simply, the unblock decorator submits the decorated function to the thread pool, which returns a future; a callback is added to this future to return control to the IOLoop, by calling add_callback, which eventually will call self.finish and conclude the request.
Note that the decorated function must itself be decorated with tornado.web.asynchronous, in order not to call self.finish too soon! Moreover, self.write is not thread-safe (thanks mrjoes!), therefore it must be called in the main thread with the future's result as parameter.
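The hand-off between the worker thread and the main thread can be seen in isolation with the standard library only. This is a minimal sketch, not part of the web server: a `queue.Queue` (the name `MAIN_THREAD_QUEUE` is made up here) stands in for `IOLoop.add_callback`, and `slow_task`, `submit` and `run_demo` are hypothetical helpers.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor

EXECUTOR = ThreadPoolExecutor(max_workers=4)
# Assumption: a thread-safe queue plays the role of IOLoop.add_callback
MAIN_THREAD_QUEUE = queue.Queue()


def slow_task(n):
    """A blocking call, executed in a worker thread."""
    time.sleep(n)
    return "Awake after %s" % n


def submit(n):
    future = EXECUTOR.submit(slow_task, n)
    # The done-callback normally runs in the worker thread, so it only
    # enqueues the result; the main thread does the actual "write".
    future.add_done_callback(lambda f: MAIN_THREAD_QUEUE.put(f.result()))


def run_demo():
    submit(0.2)
    submit(0.1)
    # The main thread is free while the tasks sleep; here it just waits.
    return [MAIN_THREAD_QUEUE.get(timeout=5) for _ in range(2)]
```

Both tasks sleep at the same time in the pool, and their results are consumed only by the thread calling `run_demo`, which mirrors why `self.write` is deferred to the IOLoop thread.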
Full code is below, available on gist.
My project last weekend was to write something similar to WeHasLinks. In fact, WeHasLinks is a file sharing website, but I misread it as “We-Hash-Links” and the funny thing is that they indeed hash their links (for obvious reasons…). Anyway, WeHasLinks's links are hashed so that only the user who visited the page is allowed to follow them.
I liked the idea very much, and I decided to implement it in go, as an exercise! You can find the code on github. A demo is available at unshareme.lbolla.info.
The links are encrypted using AES-256 and validated using HMAC, which is the standard way to encrypt secure cookies in web apps. In fact, gorilla provides a library to do just that. The code looks pretty much like this:
var hashKey = securecookie.GenerateRandomKey(32)
var blockKey = securecookie.GenerateRandomKey(32)
var encodeName = "encodeName"
var sc = securecookie.New(hashKey, blockKey)
...
func encode(msg PersonalURL) (string, error) {
    enc, err := sc.Encode(encodeName, msg)
    ...
“Personalization” of links is done by coupling each link with the remote IP visiting the page.
// Store URI and IP together
type PersonalURL struct {
    URI string
    IP  string
}
When visited, the web app will decode the link, verify that the remote IP visiting it is the same as the IP that requested the link in the first place and redirect to the real URL. Otherwise, a 400 will be returned.
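The same idea can be sketched in a few lines of Python with the standard library. This is only an illustration under stated assumptions: it covers the HMAC validation and the IP check, but not the AES encryption that securecookie adds, so the payload is readable by whoever holds the link. `SECRET_KEY`, `encode` and `decode` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import os

# Assumption: one random key per process, like hashKey in the Go code
SECRET_KEY = os.urandom(32)


def encode(uri, ip, key=SECRET_KEY):
    """Couple a URI with an IP and sign the pair with HMAC-SHA256."""
    payload = json.dumps({"URI": uri, "IP": ip}, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + payload).decode("ascii")


def decode(token, remote_ip, key=SECRET_KEY):
    """Return the URI if the signature is valid and the IP matches."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    tag, payload = raw[:32], raw[32:]  # SHA-256 digests are 32 bytes long
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("invalid signature")
    msg = json.loads(payload.decode("utf-8"))
    if msg["IP"] != remote_ip:
        raise ValueError("link was issued to a different IP")
    return msg["URI"]
```

A handler would call `decode` with the visitor's IP and answer with a 400 whenever it raises `ValueError`.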
Per se, the app is very simple, but I learnt a lot about go while implementing it: in particular, that in terms of speed of development it's very close to a scripting language, that go's standard library is amazing, and that gorilla is a very nice complement for web apps.
One thing I didn't like is how templates are handled: it's overly complicated to specify a relative path for the templates directory, and templates are not compiled into the source code automatically. The easiest solution I found was to specify the path on the command line. In this case, yesod has a better solution.
Full code, for reference:
package main

import (
    "encoding/base64"
    "flag"
    "fmt"
    "github.com/gorilla/securecookie"
    "github.com/gorilla/mux"
    "html/template"
    "log"
    "net/http"
    "net/url"
    "path/filepath"
    "strings"
)

// Random stuff for encoding
var hashKey = securecookie.GenerateRandomKey(32)
var blockKey = securecookie.GenerateRandomKey(32)
var encodeName = "encodeName"
var sc = securecookie.New(hashKey, blockKey)

// Router for handlers
var router = mux.NewRouter()

// Store URI and IP together
type PersonalURL struct {
    URI string
    IP  string
}

// Flags
var templates_path = flag.String("t", "src/unshareme/tmpl/", "Path to the templates")
var templates = template.New("")

func encode(msg PersonalURL) (string, error) {
    enc, err := sc.Encode(encodeName, msg)
    if err != nil {
        return "", err
    }
    b64enc := base64.URLEncoding.EncodeToString([]byte(enc))
    return b64enc, nil
}

func decode(enc string) (msg PersonalURL, err error) {
    b64enc, err := base64.URLEncoding.DecodeString(enc)
    if err != nil {
        return
    }
    err = sc.Decode(encodeName, string(b64enc), &msg)
    if err != nil {
        return
    }
    return
}

// Only works for IPv4, like 127.0.0.1:12345, not IPv6 like [::1]:12345
func remoteIP(r *http.Request) string {
    // Get it from headers, as set by nginx
    ip := r.Header.Get("X-Real-IP")
    if ip == "" {
        // Strips port number
        ip = strings.Split(r.RemoteAddr, ":")[0]
    }
    // log.Print("IP:", ip)
    return ip
}

func MainHandler(w http.ResponseWriter, r *http.Request) {
    err := templates.ExecuteTemplate(w, "index.html", nil)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
    }
}

func EncodeHandler(w http.ResponseWriter, r *http.Request) {
    u, err := url.Parse(r.URL.Query().Get("u"))
    if err != nil {
        log.Print(err.Error())
        http.Error(w, "", http.StatusBadRequest)
        return
    }
    if u.Scheme == "" {
        http.Error(w, "Invalid scheme", http.StatusBadRequest)
        return
    }
    msg := PersonalURL{URI: u.String(), IP: remoteIP(r)}
    enc, err := encode(msg)
    if err != nil {
        log.Print(err.Error())
        http.Error(w, "", http.StatusBadRequest)
        return
    }
    link, _ := router.Get("Decode").URL("enc", enc)
    fmt.Fprint(w, link.String())
}

func DecodeHandler(w http.ResponseWriter, r *http.Request) {
    vars := mux.Vars(r)
    dec, err := decode(vars["enc"])
    if err != nil {
        log.Print(err.Error())
        http.Error(w, "", http.StatusBadRequest)
        return
    }
    if rip := remoteIP(r); dec.IP != rip {
        log.Print(dec.IP, rip)
        http.Error(w, "", http.StatusBadRequest)
        return
    }
    http.Redirect(w, r, dec.URI, http.StatusFound)
    return
}

func main() {
    flag.Parse()
    templates = template.Must(template.ParseFiles(filepath.Join(*templates_path, "index.html")))
    router.Handle("/favicon.ico", http.NotFoundHandler())
    router.HandleFunc("/", MainHandler).Methods("GET")
    router.HandleFunc("/enc", EncodeHandler).Methods("GET")
    router.HandleFunc("/dec/{enc}", DecodeHandler).Methods("GET").Name("Decode")
    http.Handle("/", router)
    log.Fatal(http.ListenAndServe(":7001", nil))
}
This is the fifth post of a series describing simple scripts that I wrote to ease my life as a programmer.
They are available on github: fork & hack at will!
Watch reacts to changes in a directory by executing a command provided by the
user. It can be used, for example, to monitor a directory and run some
unittests as soon as files in it change. This is exactly how I am using Watch
in acme.
Watch is based on the pyinotify library, a very slim, one-file library
that I included in my repo for simplicity. Basically, pyinotify relies on
inotify, an event-driven notifier merged into the Linux kernel in version
2.6.13: given a directory to watch, it raises events that users can process
by defining handlers in the ProcessEvent class.
Note that Watch refuses to run its command more often than once every
3 seconds. This prevents multiple events, raised on the same directory in
quick succession, from queuing up too many processes.
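The 3-second rule can be looked at on its own. Below is a minimal sketch of such a throttle, with an injectable clock so that it can be exercised without waiting; the `Throttle` class and its `ready` method are hypothetical names. Unlike Watch, which records the time after the command has finished, this sketch records it when the call is allowed.

```python
import time


class Throttle(object):
    """Allow an action at most once every `min_interval` seconds."""

    def __init__(self, min_interval=3.0, clock=time.time):
        self.min_interval = min_interval
        self.clock = clock  # injectable, to make the behaviour testable
        self.last_time = None

    def ready(self):
        """Return True (and remember the time) if the action may run now."""
        now = self.clock()
        too_soon = (self.last_time is not None
                    and now - self.last_time < self.min_interval)
        if too_soon:
            return False
        self.last_time = now
        return True
```

An event handler would then wrap its work in `if throttle.ready(): ...`, silently dropping events that arrive too close to the previous run.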
Here is the code:
#!/usr/bin/env python
# Watch for modified files in localdir (.) and react.
# ./Watch <cmd>
# i.e.: ./Watch flake8 .
from pylib.pyinotify import WatchManager, EventsCodes, ProcessEvent, Notifier
from subprocess import call
import sys
import time


class ProcessManager(ProcessEvent):

    LAST_TIME = None

    def __init__(self, cmds):
        # Initialise the parent class (ProcessEvent) before storing the command
        super(ProcessManager, self).__init__()
        self.cmds = cmds

    def is_too_soon(self):
        return self.LAST_TIME and time.time() - self.LAST_TIME < 3

    def process_IN_CLOSE_WRITE(self, event):
        # For some reason, this event is triggered twice
        if not self.is_too_soon():
            call(self.cmds)
            self.LAST_TIME = time.time()


def main():
    dir = '.'
    cmds = sys.argv[1:]
    wm = WatchManager()
    mask = EventsCodes.ALL_FLAGS['IN_CLOSE_WRITE']
    notifier = Notifier(wm, ProcessManager(cmds))
    wm.add_watch(dir, mask, rec=True)
    while True:
        try:
            notifier.process_events()
            if notifier.check_events():
                notifier.read_events()
        except KeyboardInterrupt:
            notifier.stop()
            break


if __name__ == '__main__':
    main()
This is the fourth post of a series describing simple scripts that I wrote to ease my life as a programmer.
In this post I'll describe 2 simple scripts to nicely indent HTML
and XML files. I use them primarily with acme, to pipe
selected text and get back nicely formatted output.
Code is available here: htmlind and xmlind. Both programs are written in Python and make use of specialized libraries freely available online. In particular, xmlind uses xml.dom.minidom, included in Python's standard library, and htmlind uses a modified version of BeautifulSoup.
The most interesting part of these scripts is the modification to BeautifulSoup, in order to support variable tabstop width in pretty printing. The patch is here: it basically allows a user to set the tabstop width through an environment variable ($tabstop), which defaults to “4”.
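The idea of the patch (read the width from the environment, fall back to 4) is easy to show with xml.dom.minidom, the library behind xmlind. This is a minimal sketch and not the BeautifulSoup patch itself; the function name `indent_xml` and its `environ` parameter are assumptions made for illustration.

```python
import os
import xml.dom.minidom


def indent_xml(text, environ=os.environ):
    """Pretty print an XML string, indenting by $tabstop spaces (default 4)."""
    tabstop = int(environ.get("tabstop", "4"))
    dom = xml.dom.minidom.parseString(text)
    # toprettyxml repeats the indent string once per nesting level
    return dom.toprettyxml(indent=" " * tabstop)
```

Passing a plain dictionary such as `{"tabstop": "2"}` as `environ` makes it possible to try different widths without touching the real environment.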
For example:
% echo '<a><b>text text</b><c>more text</c></a>' | htmlind
<a>
    <b>
        text text
    </b>
    <c>
        more text
    </c>
</a>
% tabstop=1 echo '<a><b>text text</b><c>more text</c></a>' | htmlind
<a>
 <b>
  text text
 </b>
 <c>
  more text
 </c>
</a>
This is the third post of a series describing simple scripts that I wrote to ease my life as a programmer.
In this post, I'll describe 3 scripts to “pretty print” some common
file types, to improve readability: csvfmt, xmlfmt and
jsonfmt.
csvfmt takes a CSV (“Comma Separated Values”) file from
stdin, parses it and pretty prints each record as a Python
dictionary.
#!/usr/bin/env python
import csv
import sys
import pprint

for row in csv.DictReader(sys.stdin):
    pprint.pprint(row)
Output looks like this:
% echo 'a,b,c
1,2,3
4,5,6
' | csvfmt
{'a': '1', 'b': '2', 'c': '3'}
{'a': '4', 'b': '5', 'c': '6'}
xmlfmt takes an XML file from either stdin or a file
(specified on the cmd line) and extracts all the text from it. This
script is meant to be used to read the text embedded in XML tags,
and it's analogous to htmlfmt. If you want to format an XML
file, maintaining the XML tags, use xmllint -format, or my
xmlind.
#!/usr/bin/env python
import xml.dom.minidom
from pylib.xmlutil import getText, getInput
dom = xml.dom.minidom.parse(getInput())
print(getText(dom))
For example:
% echo '<a>a text<b>b text</b>more a text</a>' | xmlfmt
a textb textmore a text
jsonfmt takes a JSON file from stdin and pretty prints it as
a Python object.
#!/usr/bin/env python
import json
import sys
import pprint
pprint.pprint(json.load(sys.stdin))
Try it out:
$> curl 'http://search.twitter.com/search.json?q=lorenzo' | jsonfmt
{u'completed_in': 0.035,
u'max_id': 267982040698351617L,
u'max_id_str': u'267982040698351617',
u'next_page': u'?page=2&max_id=267982040698351617&q=lorenzo',
u'page': 1,
u'query': u'lorenzo',
u'refresh_url': u'?since_id=267982040698351617&q=lorenzo',
u'results': [{u'created_at': u'Mon, 12 Nov 2012 13:27:52 +0000',
u'from_user': u'michael_174',
u'from_user_id': 234373960,
u'from_user_id_str': u'234373960',
u'from_user_name': u'Michael Adhiyatama',
u'geo': None,
u'id': 267982040698351617L,
u'id_str': u'267982040698351617',
u'iso_language_code': u'in',
etc. etc.
All three scripts are written in Python and available here.
Recently, I moved away from Wordpress. I did it primarily because Wordpress is so much more than just a blogging platform and what I needed was just a simple way of publishing posts with embedded code, links and images. Moreover, writing blogs using Wordpress's web editor is less than ideal…
The biggest problem to solve when moving away from Wordpress is how to not lose all your posts. Luckily, Wordpress allows you to export all your stuff in XML, but you also need a way to import them in whatever other blogging platform you are going to use.
After some research, I decided to choose a static site generator. Out of all the available alternatives, I picked Felix Felicis (aka “liquidluck”): it's written in Python, very simple to customize and extend, and with some pleasing themes. Other solutions, like jekyll, public-static, etc. are way too “powerful” (read “complicated”) for my taste.
Unfortunately, unlike other more popular alternatives, Felix Felicis does not come with an “importer” of Wordpress's XML file. So, I decided to fork one of the existing solutions and adapt it to my needs.
I also forked liquidluck's default theme and created my own.
If you want to do as I did, migrating away from Wordpress and using Felix Felicis as your static site generator, do the following:
- Export your posts from Wordpress in an XML file
- git clone my fork of wp2md and run it over the XML file
- Manually check that all your links and posts have been properly exported: mine needed almost zero editing!
This is the second post of a series describing simple scripts that I wrote to ease my life as a programmer.
They are available on github: fork & hack at will!
c+/c-
In this post I'll describe a very simple script, c+, and its counterpart
c-.
c+ prepends every line of stdin with #. c- strips # from the
beginning of each line of stdin. I use these scripts to comment/uncomment
lines in Python scripts when using acme.
Here is the code:
c+
#!/usr/bin/env rc
sed 's/^/#/'
c-
#!/usr/bin/env rc
sed 's/^#//'
This is the first post of a series describing simple scripts that I wrote to ease my life as a programmer.
They are implemented in various languages (python, bash, go) and meant
to be used on Linux. Some of them are “general purpose”, while others are
specifically designed to interface with other tools I use (for example,
acme.)
All of them tend to have the following properties:
- Input from stdin, output to stdout, errors to stderr
- Return zero on success, non-zero on failure
- Do one thing only
- Not very customizable
These properties allow the scripts to remain very simple, be composable and easy to remember.
They are available on github: fork & hack at will!
a+/a-
In this post I'll describe a very simple script, a+, and its counterpart
a-. They are the first I wrote when I started using acme.
a+ indents every line of stdin by 4 spaces. a- “de-indents” it by the
same amount. The amount of spaces (4) is fixed (to resist the temptation to
change it), and indentation is done with spaces and not tabs.
The code is trivial: it uses sed and rc, Plan 9's shell
ported to *nix (although, in this case, any shell would do.) Here it is:
a+
#!/usr/bin/env rc
sed 's/^/    /'
a-
#!/usr/bin/env rc
sed 's/^    //'
After having tinkered with Haskell for quite a bit, I decided that I needed some rest from theory and esoteric concepts, and a more pragmatic programming language to explore.
I've spent the last few days refreshing my memory of Go: I hadn't touched it for almost 2 years and I must say that I find it changed, for the better.
Here is a short tutorial on how to write a simple web application in Go, and
publish it on Google App Engine. The application is not a mere exercise,
but scratches an itch I recently had: it counts how many times each of its
handlers is hit. So, for example, visiting:
go-count-urls.appspot.com/hello returns how many times the /hello
handler has been visited. You can use it as a trivial real-time tracker.
For example, I used it to verify that an email I sent to someone was actually
opened (and presumably read). I just picked a random URL path (like
go-count-urls.appspot.com/random-string-here) and created an html
email with an empty img tag pointing to it: <img
src="http://go-count-urls.appspot.com/random-string-here" width=0 height=0 />.
Every time the email client opens the email, it requests that URL and the hit
is recorded. I admit that this use is pretty lame, and that there are other
services doing this, but I needed a real-world problem to work on!
So here we go!
Set up your development environment
First of all, download and install the App Engine Go software development kit. Then create the following directory structure:
go-count-urls/
    app.yaml
    app/
        counter.go
Show me the code!
The whole application is made of just one file, counter.go. Here it is, comments inline:
package counter

import (
    "appengine"
    "appengine/datastore"
    "fmt"
    "net/http"
    "time"
)

// Object to store in Google's Datastore. Keeps track of how many times a
// URL was hit and when.
type Counter struct {
    Path      string
    Count     int
    Timestamp time.Time
}

// Return a brand new Counter
func getEmptyCounter(path string) Counter {
    return Counter{Path: path, Count: 0, Timestamp: time.Now()}
}

// Increment the counter for a URL. If it's the first time this URL is
// visited, create a brand new Counter before incrementing it.
// On error, return an empty counter and an error.
func inc(c appengine.Context, key *datastore.Key, path string) (Counter, error) {
    var x Counter
    if err := datastore.Get(c, key, &x); err != nil && err != datastore.ErrNoSuchEntity {
        return getEmptyCounter(path), err
    }
    // Increment it, and update the last modified time
    x.Path = path
    x.Count++
    x.Timestamp = time.Now()
    // Save the counter
    if _, err := datastore.Put(c, key, &x); err != nil {
        return getEmptyCounter(path), err
    }
    return x, nil
}

// This is the only handler. It just picks the path, removes the leading
// slash and stores it in the Datastore. As a key in the Datastore, the URL
// itself is used.
func handle(w http.ResponseWriter, r *http.Request) {
    key := r.URL.Path[1:]
    if key == "" {
        // Return 404 on the root handler (we might want a splash page here...)
        http.NotFound(w, r)
        return
    } else if key == "favicon.ico" {
        // We are not interested in tracking favicon.ico
        w.WriteHeader(http.StatusNoContent)
        return
    }
    c := appengine.NewContext(r)
    // For how to use the Datastore see
    // https://developers.google.com/appengine/docs/go/datastore/overview
    count, err := inc(c, datastore.NewKey(c, key, "singleton", 0, nil), r.URL.Path)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    // Write something
    w.Header().Set("Content-Type", "text/plain; charset=utf-8")
    fmt.Fprintf(w, "Path=%s, Count=%d, When=%s", count.Path, count.Count, count.Timestamp)
}

// Initialize the application, binding URLs to handlers.
func init() {
    http.HandleFunc("/", handle)
}
Try it out!
Launch the application using the SDK; from the go-count-urls directory, type:
$> $GAE_PATH/dev_appserver.py .
Now visit localhost:8080/hello. Refresh. Refresh again. And again…
Publish
Publishing the application on Google infrastructure is a matter of seconds:
$> $GAE_PATH/appcfg.py update .
You can visit it at: go-count-urls.appspot.com/hello. The code is available here: github.com/lbolla/go-count-urls.
Asynchronous programming can be tricky for beginners, therefore I think it's useful to iron out some basic concepts to avoid common pitfalls. For an explanation of generic asynchronous programming, I recommend one of the many resources online. I will focus solely on asynchronous programming in Tornado.
From Tornado's homepage:
FriendFeed's web server is a relatively simple, non-blocking web server written in Python. The FriendFeed application is written using a web framework that looks a bit like web.py or Google's webapp, but with additional tools and optimizations to take advantage of the non-blocking web server and tools. Tornado is an open source version of this web server and some of the tools we use most often at FriendFeed. The framework is distinct from most mainstream web server frameworks (and certainly most Python frameworks) because it is non-blocking and reasonably fast. Because it is non-blocking and uses epoll or kqueue, it can handle thousands of simultaneous standing connections, which means the framework is ideal for real-time web services. We built the web server specifically to handle FriendFeed's real-time features: every active user of FriendFeed maintains an open connection to the FriendFeed servers. (For more information on scaling servers to support thousands of clients, see The C10K problem.)
The first step as a beginner is to figure out if you really need to go asynchronous. Asynchronous programming is more complicated than synchronous programming because, as someone put it, it does not fit the human brain nicely.
You should use asynchronous programming when your application needs to monitor some resources and react to changes in their state. For example, a web server sitting idle until a request arrives through a socket is an ideal candidate. So is an application that has to execute tasks periodically or delay their execution for some time. The alternative is to use multiple threads (or processes) to control multiple tasks, and this model quickly becomes complicated.
The second step is to figure out if you can go asynchronous. Unfortunately in Tornado, not all the tasks can be executed asynchronously.
Tornado is single threaded (in its common usage, although it supports multiple
threads in advanced configurations), therefore any “blocking” task will block
the whole server. This means that a blocking task will not allow the framework
to pick the next task waiting to be processed. The selection of tasks is done
by the IOLoop, which, as everything else, runs in the only available
thread.
For example, this is a wrong way of using IOLoop:
Note that blocking_call is called correctly, but, being
blocking (time.sleep blocks!), it will prevent the execution of the following
task (the second call to the same function). Only when the first call ends will
the second be called by IOLoop. Therefore, the output in console is
sequential (“sleeping”, “awake!”, “sleeping”, “awake!”).
Compare the same
“algorithm”, but using an “asynchronous version” of time.sleep, i.e.
add_timeout:
In this case, the first
task will be called, it will print “sleeping” and then it will ask IOLoop to
schedule the execution of the rest of the routine after 1 second. IOLoop,
having the control again, will fire the second call to the function, which will
print “sleeping” again and return control to IOLoop. After 1 second IOLoop
will carry on where it left off with the first function and “awake!” will be
printed. Finally, the second “awake!” will be printed, too. So, the sequence of
prints will be: “sleeping”, “sleeping”, “awake!”, “awake!”. The two function
calls have been executed concurrently (not in parallel, though!).
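The two orderings can be reproduced with the standard library alone. The sketch below deliberately swaps Tornado's IOLoop for asyncio's event loop (an assumption made so that it runs without Tornado installed): `time.sleep` plays the blocking call and `asyncio.sleep` plays the role of `add_timeout`. The helper `run_twice` is a hypothetical name.

```python
import asyncio
import time


async def blocking_call(log):
    log.append("sleeping")
    time.sleep(0.05)           # blocks the only thread: the loop is stuck
    log.append("awake!")


async def non_blocking_call(log):
    log.append("sleeping")
    await asyncio.sleep(0.05)  # hands control back to the event loop
    log.append("awake!")


def run_twice(coro_func):
    """Schedule the same function twice and return the order of the prints."""
    log = []

    async def main():
        await asyncio.gather(coro_func(log), coro_func(log))

    asyncio.run(main())
    return log
```

`run_twice(blocking_call)` yields the sequential order, while `run_twice(non_blocking_call)` yields “sleeping”, “sleeping”, “awake!”, “awake!”: concurrent, but still in a single thread.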
So, I hear you asking, “how do I create functions that can be executed
asynchronously”? In Tornado, every function that has a “callback” argument can
be used with gen.Task (inside a function decorated with gen.engine). Beware
though: being able to use Task does not make the execution asynchronous! There
is no magic going on: the function is simply scheduled for execution, executed,
and whatever is passed to callback will become the return value of Task. See below:
Most beginners expect to be able to just write: Task(my_func), and
automagically execute my_func asynchronously. This is not how Tornado works.
This is how Go works! And this is my last remark:
In a function that is going to be used “asynchronously”, only asynchronous libraries should be used.
By this, I mean that blocking calls like time.sleep or
urllib2.urlopen or db.query will need to be substituted by their equivalent
asynchronous version. For example, IOLoop.add_timeout instead of
time.sleep, AsyncHTTPClient.fetch instead of urllib2.urlopen etc. For DB
queries, the situation is more complicated and specific asynchronous drivers to
talk to the DB are needed. For example: Motor for MongoDB.
For a non-designer, Ext JS is kind-of a blessing. It is a self-contained fully-fledged Javascript framework, with loads of fancy re-usable browser-compatible professionally-looking widgets. It's only lacking in documentation: finding your way through the API documentation is daunting at best.
So, I bought this, and while working my way through it, I decided to share some experiments. You can find them here. These are the ones I prefer:
This is another of those posts to not forget. If printing a PDF file with
lp prints a blank page with error messages like:
ERROR: configurationerror OFFENDING COMMAND: setpagedevice STACK: --nostringval-- …
the problem is probably that your PDF has a certain page size (let's say letter) but your printer expects another (let's say A4).
Check your printer
settings and your PDF (with pdfinfo pdffile) to verify. If this is the case,
print with this command instead:
lp -o fit-to-page pdffile
Today I tried to benchmark 3 web servers that I've used recently: Tornado, Warp and Yesod.
In fact, these are only 2 web servers, because Yesod runs on top of Warp
and it's a fully fledged web framework, rather than a web server: but this was
also the intent of the benchmark, i.e. to measure how much slower all its goodies
made Yesod with respect to Warp.
Tornado and Warp are obviously very different web servers (async vs. threaded, interpreted vs. compiled, etc.) but, who cares?
The benchmark is very simple: a single handler returning “Hello World”, very original. Obviously, this is hardly a real world example, but it can give indications even if only with “orders of magnitude” of approximation.
Nonetheless, the results were very interesting. First of all, here is the code.
Tornado
Warp
Yesod
And the results, obtained using httperf:
$> httperf --hog --client=0/1 --server=localhost --port=8080 --uri=/ --rate=1000 --send-buffer=4096 --recv-buffer=16384 --num-conns=100 --num-calls=100 --burst-length=20
| Server | Throughput |
|---|---|
| Tornado | 518 req/s |
| Warp | 10079 req/s |
| Yesod | 929 req/s |
| Yesod w/o session management | 7924 req/s |
Wait! What?! Yesod is 10 times slower than Warp!?
I asked the Yesod developers for an explanation and they tracked down the issue: the work of these guys is an example worth studying of how to benchmark and debug code! Anyway, it looks like the issue is that serializing timestamps is incredibly inefficient: I hope a patch will be ready soon! In the meantime, I strongly suggest disabling session management in Yesod if you want high performance. (In the code shown, I've also disabled Hamlet, Yesod's templating system, but it turned out that it didn't make much difference: code using Hamlet is in gist.)
Overall, though, even on my crappy single-core old laptop, the result is amazing: Warp/Yesod is ~20 times faster than one of the fastest Python web servers.
If you are using Gerrit for code review and project management of git-based projects, you might find yourself manually adding the same bunch of reviewers to your patches every single time.
In the past, I alleviated the problem with a simple Javascript bookmarklet: add it to your browser and click it while watching the patch in Gerrit.
But there's a better method: do it
from the command line, when pushing your local commits to Gerrit. Just add these
lines to your .git/config:
[remote "review"]
    pushurl = ssh://user@gerrit:29418/project
    push = HEAD:refs/for/master
    receivepack = git receive-pack --reviewer reviewer1 --reviewer reviewer2
Now, when you want to push a review, just do: git push review and “reviewer1”
and “reviewer2” will be added to your patchset.
This is a vintage post to remind me how to install b43 drivers on Arch Linux for my “shiny” Belkin PCMCIA card (dated 2002…).
- First install the firmware extractor: $> pacman -S b43-fwcutter.
- Then install the firmware itself: $> yaourt -S b43-firmware.
Check dmesg. You should see something like:
Broadcom 43xx driver loaded [ Features: PMNLS ]
and lsmod | grep b43:
b43 330774 0
bcma 19281 1 b43
mac80211 341044 1 b43
cfg80211 147429 2 b43,mac80211
ssb 42167 2 b43,b44
pcmcia 31182 2 b43,ssb
mmc_core 72742 2 b43,ssb
Finally, try to connect:
$> pacman -S wifi-select
$> wifi-select
Generators (PEP 255 “Simple Generators”) and Coroutines (PEP 342 “Coroutines via Enhanced Generators”) are the cleanest way I've come across so far to implement the concept of a “pipeline” in Python.
First approximation
A pipeline is made of:
- a Producer, that generates data;
- many Stages, that receive data from the previous stage and send it to the next;
- a Consumer, that receives data from the last stage.
The Producer is a coroutine that only sends data, generated internally from some initial state. Stages are coroutines that both receive and send messages. The Consumer only receives data. Chaining is done in the function pipeline: each argument but the last is instantiated with an instance of the next stage. The full pipeline is started by issuing a next (or send(None)) to the Producer.
In the following example, a stream of integers is produced and pushed down the pipeline: each stage adds 1 and finally the result is printed in the consumer.
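The original listing is not reproduced here, so what follows is only a minimal sketch of the idea in Python 3, not necessarily the code the post referred to. The names producer, stage, consumer and the count parameter are assumptions chosen for illustration; the consumer also stores each value in a list so the result can be inspected.

```python
from functools import partial


def producer(next_stage, count=3):
    """Producer: push the integers 0..count-1 down the pipeline."""
    for n in range(count):
        next_stage.send(n)
    yield  # makes this function a generator, so next() can start it


def stage(next_stage):
    """Stage: receive a number, add 1, send it to the next stage."""
    while True:
        n = yield
        next_stage.send(n + 1)


def consumer(results):
    """Consumer: receive a number, print it and remember it."""
    while True:
        n = yield
        print(n)
        results.append(n)


def pipeline(*stages):
    """Chain the stages: each one but the last receives the next stage."""
    current = stages[-1]()          # the consumer takes no next stage
    next(current)                   # advance it to its first yield
    for make_stage in reversed(stages[1:-1]):
        current = make_stage(current)
        next(current)
    source = stages[0](current)
    next(source)                    # starting the producer runs everything


results = []
pipeline(producer, stage, stage, partial(consumer, results))
```

With two stages in the chain, each integer is incremented twice before reaching the consumer.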
Wrapping it up
A pattern emerges, so we'd better wrap it up in a class. Moreover, let's split the “architecture” of the pipeline from the behavior of each stage.
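One possible shape for such a class is sketched below; it is an assumption, not the original listing. The class Pipeline owns the wiring (priming and chaining coroutines), while the behavior of each stage is an ordinary function. For simplicity the producer here is any iterable rather than a coroutine.

```python
class Pipeline:
    """Wire plain functions into a chain of coroutines (illustrative)."""

    def __init__(self, source, transforms, sink):
        self.source = source            # iterable producing the items
        self.transforms = transforms    # list of functions: item -> item
        self.sink = sink                # function called with each result

    def _sink_stage(self):
        # Last coroutine in the chain: hands every item to the sink.
        while True:
            item = yield
            self.sink(item)

    def _transform_stage(self, func, next_stage):
        # Intermediate coroutine: applies func and forwards the result.
        while True:
            item = yield
            next_stage.send(func(item))

    def run(self):
        # Build the chain from the end, priming each coroutine.
        current = self._sink_stage()
        next(current)
        for func in reversed(self.transforms):
            current = self._transform_stage(func, current)
            next(current)
        # Feed the source into the first stage.
        for item in self.source:
            current.send(item)


collected = []
Pipeline(range(3), [lambda n: n + 1, lambda n: n * 2], collected.append).run()
```

The stage functions know nothing about yield or send, so they can be tested on their own.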
More useful example
As a more interesting application, here is how to use a pipeline to implement a simple crawler, to download links from news.ycombinator.com/ and find all the posts where the word “Python” is mentioned.
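The crawler listing is not reproduced here either. The sketch below only illustrates how such a pipeline could be laid out; the function names, the link-matching regular expression and the fetch_page parameter are all assumptions. Passing the download function in as a parameter makes it possible to try the pipeline on canned pages without touching the network.

```python
import re
from urllib.request import urlopen


def fetch(url):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def link_producer(next_stage, start_url, fetch_page):
    """Producer: extract absolute links from the start page."""
    html = fetch_page(start_url)
    for link in re.findall(r'href="(https?://[^"]+)"', html):
        next_stage.send(link)
    yield  # keeps this a generator, so next() starts it


def download_stage(next_stage, fetch_page):
    """Stage: receive a URL, send (url, page text) downstream."""
    while True:
        url = yield
        try:
            text = fetch_page(url)
        except OSError:   # urllib's network errors derive from OSError
            continue      # skip pages that cannot be downloaded
        next_stage.send((url, text))


def keyword_filter(next_stage, word):
    """Stage: forward only the URLs whose page mentions word."""
    while True:
        url, text = yield
        if word.lower() in text.lower():
            next_stage.send(url)


def collector(matches):
    """Consumer: store every URL it receives."""
    while True:
        matches.append((yield))


def crawl(start_url, word, fetch_page=fetch):
    matches = []
    sink = collector(matches)
    next(sink)
    keep = keyword_filter(sink, word)
    next(keep)
    download = download_stage(keep, fetch_page)
    next(download)
    source = link_producer(download, start_url, fetch_page)
    next(source)  # runs the whole crawl
    return matches


# Against the real site this would be:
# crawl("https://news.ycombinator.com/", "Python")
```

Every page is downloaded sequentially, so this is a demonstration of the pipeline structure rather than an efficient crawler.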
Cleaning things up
Things are still far from clean and bulletproof. One step in the right direction is to follow the suggestions found in David Beazley's presentation on coroutines.
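Two ideas from that presentation, as I understand them, are a decorator that advances a coroutine to its first yield automatically, and using close() to shut a pipeline down in an orderly way. The sketch below shows both; add_one, collect and the events list are illustrative names, not code from the post.

```python
import functools


def coroutine(func):
    """Decorator: create the generator and advance it to its first yield."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)
        return gen
    return start


@coroutine
def add_one(next_stage):
    try:
        while True:
            n = yield
            next_stage.send(n + 1)
    except GeneratorExit:
        # close() was called on this stage: propagate it downstream.
        next_stage.close()


@coroutine
def collect(results, events):
    try:
        while True:
            results.append((yield))
    except GeneratorExit:
        events.append("closed")


results, events = [], []
chain = add_one(add_one(collect(results, events)))
for n in range(3):
    chain.send(n)
chain.close()  # shuts down every stage, from first to last
```

No explicit next() calls are needed any more, and the last stage gets a chance to clean up when the pipeline is closed.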
The previous examples are by no means “production ready”, but maybe someone will find some good ideas in them to apply to real-world problems.
Profile
Summary
For more than 5 years I worked on the implementation of numerical algorithms for electromagnetic problems, with a strong focus on efficiency for computationally intensive simulations.
5+ years ago, I shifted my attention to the World Wide Web and I've been working on high performance applications since 2007. My main interests are on High Availability / High Throughput services and Artificial Intelligence.
I love programming and I am always studying new languages and techniques (Go, Haskell, and related libraries) to learn new approaches to problem solving.
Specialties: computer programming, problem solving, numerical modelling.
Experience
- May 2012 - Present: Senior Software Architect / RAID Research Services LLP. Design and implementation of high performance data analysis tools. Implementation of a very efficient in-memory database to query and analyse tabular data. Marshaling of incoming data from different sources (emails, xml/json feeds, online folders), processing of unstructured information using machine-learning techniques. Automatic deployment and scaling on AWS.
- Mar 2012 - Present: Director / Networkscale Ltd.
- Apr 2012 - Present: Senior Software Engineer / Artirix. Implementation of massively parallel web search crawlers using Python and Twisted.
- Mar 2012 - Present: Software developer / The Social Gaming Company. Worked as contractor to develop back-end web servers in Python, Flask and MongoDB.
- Mar 2011 - Present: Senior Python Developer / Zugo Services Ltd. Python web applications development, utilising Tornado, nginx, memcached, sphinx, MySQL, MongoDB, fabric and git. Focus on scalability using AWS solutions. Development of a massively distributed user tracking engine based on Disco and MongoDB. User behaviour modelling.
- Aug 2008 - Present: Software Developer / Geneity Limited. High performance e-gaming web application developer. Programming languages used: Python (2.5, 2.6), C, PL/SQL (Oracle 10g-11) for the back-end; HTML/Javascript for the front-end; with REST and SOAP web services.
- Jun 2005 - Present: Optical Designer / Pirelli Labs. R&D in Photonic Integrated Circuits mainly based on Silicon-on-Insulator technology (SOI-PICs). Responsible for the design of optical devices for metro and access networks, including optical filters, fiber-to-chip couplers, polarization splitters and rotators. Broad experience in numerical computing and mathematical modeling of physical problems. Patenting.
- Mar 2005 - Present: Engineer / Telesystem. Hardware and software consultant for digital television broadcasting. Java programming of DVB applications.
- Apr 2004 - Present: Software Engineer / Photon Design. Visiting PhD student and software programmer of numerical methods for electromagnetism.
- May 2001 - Present: Technical Support / Progetto Mantegna. Technical assistance in the virtual reconstruction of Mantegna's paintings in the
Education
- 2003 - 2005: University of Udine. Ph.D in Numerical Methods
- 1996 - 2001: Università degli Studi di Padova. Laurea in Engineering in Telecommunication, Electronics, Computer Science.
- 1991 - 1996: Liceo Scientifico Statale Fracastoro
- University of Padua: Telecommunication Engineering