Some Notes on Serializing Objects in Python

I was playing with .NET serialization at work the other day and got curious about how Python does it. Serialization is a little confusing in the .NET world, though it is not impossible to grasp. For one, the .NET base class library has more than one implementation of serialization, namely System.Xml.Serialization and System.Runtime.Serialization, which implement XML and binary serialization respectively. The two implementations also use different techniques: binary serialization relies heavily on class attributes, while XML serialization goes through a method call to XmlSerializer.Serialize.

The Python implementation of serialization is much simpler, more concise, and easier to understand. It is implemented as a Standard Library module called pickle. Serializing and deserializing objects is done with simple function calls, and there is no need to put attributes on classes. Let's see how it works.

First import the pickle module and then declare a class called Person as in the code below:

import pickle

class Person(object):
    def __init__(self, first_name=None, last_name=None, age=None):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age

Now create two instances of the Person class above and place them in a list.

p1 = Person('Jane', 'Doe', 26)
p2 = Person('John', 'Hancock', 33)
people = []
people.append(p1)
people.append(p2)

Next serialize the list to a file and then read it back into a new list. First serialize the list:

fname = 'peoplelist.dat'
f1 = open(fname, 'wb')
pickle.dump(people, f1)
f1.close()

Finally, read the contents of the serialized file back into a new list and print out the name and age of each person:

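# Open in binary mode to match the 'wb' mode used when writing.
f2 = open(fname, 'rb')
new_people = pickle.load(f2)
f2.close()
for person in new_people:
    # Parenthesized form of print works in both Python 2 and Python 3.
    print('%s %s is %d years old.' % (person.first_name, person.last_name, person.age))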

That's it... Serialization in Python is just too easy!
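
As a variation on the steps above, here is a minimal sketch of the same round trip done in memory with pickle.dumps and pickle.loads, which return and accept a byte string instead of working with a file. The Person class mirrors the one defined earlier; the other variable names are my own.

```python
import pickle


class Person(object):
    def __init__(self, first_name=None, last_name=None, age=None):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age


people = [Person('Jane', 'Doe', 26), Person('John', 'Hancock', 33)]

# dumps() returns the serialized bytes instead of writing them to a file.
data = pickle.dumps(people)

# loads() rebuilds new, independent objects from those bytes.
new_people = pickle.loads(data)

summary = ['%s %s is %d years old.' % (p.first_name, p.last_name, p.age)
           for p in new_people]
for line in summary:
    print(line)
```

Keep in mind that unpickling can run code chosen by whoever produced the data, so this is only appropriate for pickles you created or otherwise trust.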

Some thoughts on Python and Unladen Swallow

Between Twitter and the blogosphere, I have been hearing a lot about Unladen Swallow lately. For those who don't know, Unladen Swallow is an experimental branch of Python that aims at improving performance of the language. In their own words, Unladen Swallow is "An optimization branch of CPython, intended to be fully compatible and significantly faster."

I wanted to find out more, so I started reading their Project Plan page on Google Code. I think their goals are commendable, as you can see for yourself, and the last one below explains why I want to stress the word branch above:

  • Produce a version of Python at least 5x faster than CPython.
  • Python application performance should be stable.
  • Maintain source-level compatibility with CPython applications.
  • Maintain source-level compatibility with CPython extension modules.
  • We do not want to maintain a Python implementation forever; we view our work as a branch, not a fork.

This is all fine and dandy, and the list above has made the rounds on the blogs. But what does it all mean? What follows are my impressions of the most important points that the Unladen Swallow branch is addressing.

A New Virtual Machine
The goal is to eventually replace the Python 2.6.1 virtual machine with a just-in-time compiler built on LLVM. The rest of the Python runtime would be left untouched. The key benefit of this approach is that LLVM is a register-based machine, and those perform better than stack machines, which is what the current Python VM is.
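
To make the stack-machine point concrete, here is a small sketch that uses the standard library's dis module to list the bytecode CPython generates for a trivial function. The function add is a made-up example, and the exact opcode names differ between CPython versions, so treat the output as illustrative rather than fixed.

```python
import dis


def add(a, b):
    # Hypothetical example function; any small function would do.
    return a + b


# Collect the opcode names CPython compiled this function into.
opnames = [instr.opname for instr in dis.get_instructions(add)]

# On a stack machine the operands are pushed first (LOAD_*), then the
# operator pops them and pushes the result (BINARY_*), and finally the
# value on top of the stack is returned (RETURN_*).
for name in opnames:
    print(name)
```

None of these instructions names a destination register. Operands and results live on an implicit stack, which is the design a register-based machine would replace.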

The internals of the implementation will assume at the outset that the machine has multiple cores. For instance, very aggressive optimization of code is assigned to secondary cores while compilation occurs on other cores. The garbage collector for Unladen Swallow will also be implemented to utilize multiple cores.

The Global Interpreter Lock

While Python has had threading for a while, it is not a true multi-threading implementation. This is because of the existence of the GIL. Dave Beazley has written several times about the GIL and how it works, and you should read his "The Python GIL Visualized" article to find out more about why the GIL keeps Python from having a real multi-threaded runtime.
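
One rough way to see the effect for yourself is to run the same CPU-bound function sequentially and then in two threads, and compare the wall-clock times. This is only a sketch: the function count_down and the loop size are invented for illustration, timings vary by machine and Python version, and a build of Python without a GIL would behave differently.

```python
import threading
import time


def count_down(n):
    # Pure-Python busy loop: CPU-bound, so it needs the GIL for every
    # bytecode it executes.
    while n > 0:
        n -= 1


N = 2000000  # arbitrary workload size, chosen only for the demo

# Run the work twice, one call after the other.
start = time.time()
count_down(N)
count_down(N)
sequential = time.time() - start

# Run the same two calls in two threads at once.
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - start

print('sequential: %.3fs' % sequential)
print('threaded:   %.3fs' % threaded)
```

If the threads ran truly in parallel, the threaded time would be roughly half the sequential time on a multi-core machine. Under the GIL the two numbers tend to be close.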

I bring up the GIL here because the folks working on Unladen Swallow plan on removing the GIL from Python, although they are not very optimistic about it. And even if they are not able to remove the GIL completely, there may be other optimizations in the garbage collector's reference counting mechanism that yield some improvements in the threading area.

Anyway, these are the two major points I take away from the Unladen Swallow plan of record. These changes seem pretty big to me, and a major risk of doing this kind of work is that your changes are rejected by the community. However, the Unladen Swallow team is sponsored by Google, which also employs Guido, so I'm sure that those guys are talking amongst themselves.

Thanks for reading. Go read the project plan and let me know what you think.

Some thoughts on content distribution

The other day I was looking through the Zune Marketplace and I found a listing for This American Life, which I promptly subscribed to. This American Life is simply the best radio show ever made.

But that's beside the point. Several NPR programs are easy to find in both the Zune Marketplace and the iTunes store. The likes of Science Friday, Talk of the Nation, and All Things Considered are listed alongside several compilation (or aggregation) streams such as the Hourly News Summary.

In addition to their presence in the Zune and iTunes stores, NPR is also streaming live from their website. You can even set up the online player to manage a customized playlist for you. This is one of the most advanced approaches to content distribution that any major organization is taking today.

I think that NPR must have resisted this tooth and nail for a while, but the writing has been on the wall and they were able to read it before it was too late. You see, the way that NPR works (or worked?) is basically a franchise model, where the local stations put on fund drives every year to pay their programming subscription dues to headquarters in Washington, DC. By going online the way it has, NPR is basically disintermediating the local affiliate stations and, consequently, shutting down that revenue stream.

I have to commend the people in charge at NPR for promoting this change. For one, it takes balls to do it. There are several organizations that would never compromise a revenue stream in this way. Someone up high in the NPR echelons must have said "sorry stations, but this is the future, it's gonna happen, so we're gonna do it."

I think that NPR is at the forefront of a fundamental change in how media and content are distributed.

This is a perfect example of a business (a non-profit business, but still a business) that is facing a fundamental change in the way it operates and is fully engaged in embracing that change. I wish more businesses had the foresight and intestinal fortitude to do what NPR is doing. Yes, newspapers and record companies, I am looking at you.

All in all, I am really glad to be living in a time where I get to see all of these technologies play out in the marketplace.

Am I on the right track? Let me know what you think below.

AfterThought - Visual Studio 2008 Color Themes

I've seen some great color themes for Visual Studio out there, but none of them appealed to me all that much, so I designed my own. I call them AfterThought Dark and AfterThought Light. The goal was to achieve something that was easy on the eyes and provided a good degree of contrast and readability without being too saturated. Eventually, I will also have Emacs color themes made from these. Anyway, I have included a couple of screenshots below, and you can download the themes from GitHub.

Figure 1 - AfterThought Dark

Figure 2 - AfterThought Light

New Tumblelog

I started a new tumblelog at http://standardout.tumblr.com/ for short tidbits of information that I need to keep track of. My thinking, which is not yet concrete, is that I need a blog for longer pieces (this one) and another just for notes... Still thinking about the logistics and may shuffle the domains around a bit for optimum output.

A Great Use For Lambdas in C#

If you are writing a console application in C# that has to write a lot of output from different places in your code, then that is one situation that illustrates a great use for a lambda expression.

Consider the following code:

static Action<object> WL = obj => Console.WriteLine(obj);

What's happening in the above line of code is that you declared a System.Action<T> delegate that expects a T, or an object in our case, as a parameter and returns void. You then assigned a lambda expression to that delegate in the form obj => Console.WriteLine(obj). Now, every time you need to write out some text to the console, all you have to do is call WL("some text"), and that saves you from writing Console.WriteLine("some text") all the time.
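
For comparison, the same shortcut can be sketched in Python, the language used in the serialization notes above. The name wl is my own, mirroring WL in the C# snippet. In Python a plain alias such as wl = print would work just as well, and the lambda is kept only to parallel the C# version.

```python
# wl plays the role of the WL delegate: one argument in, nothing returned.
wl = lambda obj: print(obj)

wl("some text")   # same effect as print("some text")
wl(42)            # any object works, much like Action<object>
```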

If you have other examples of the uses for lambdas or find anything wrong with this post, I would love to hear it.

Internationalization and Startups

Patrick Collison (of Auctomatic fame) wrote up a post about what surprised him the most about doing his own startup. Here's a small pearl of wisdom from that post that resonated with me:

Internationalization is an underexploited axis—people try to expand on the x-y plane, and ignore this third dimension. We grew our iPhone app revenue by over 200% through internationalization. The biggest competitve advantage we ever had with Auctomatic was supporting the obscure international eBay sites that the big US players ignored. I’m generalizing from pretty limited experience, but if I were a floundering start-up trying to get to cashflow positive, internationalizing is probably the first trick I’d try.

This is such a no-brainer, yet you see very few startups courting international markets. Sure, there may be local regulations to abide by in certain markets, but most of the social stuff startups put out there today will not raise the eyebrows of most regulatory agencies in the western hemisphere. Case in point is Orkut... Have you heard of them? They are huge in Brazil.

The other issue that comes to mind regarding internationalization is payment processing. If you accept credit cards for your services, this is generally a non-issue, as currencies are converted automatically. You will probably have to deal more with the domestic credit card companies trying to minimize fraud than with any particular issues from international customers. Also, if you are selling a physical good, then shipping internationally is not really that different from shipping domestically.

Anyway, I thought it was a great insight that got me to do some thinking of my own...

Nickel's Worth

Whereas Europeans generally pronounce my name the right way ('Nick-louse Veert'), Americans invariably mangle it into 'Nickel's Worth.' This is to say that Europeans call me by name, but Americans call me by value.

Niklaus Wirth

Emacs: Cannot open termcap database file

I recently created a virtual machine running Ubuntu Server 9.04. I downloaded Emacs from CVS and proceeded to configure, build and install it from source in that little VM of mine. Then when I went to run Emacs, I got the error Cannot open termcap database file.

You may be interested to know that this is actually a common problem on an Ubuntu Server install without X. There is an easy fix: install the package libncurses5-dev, which provides the file termcap.h. Then configure and build Emacs again and everything should work normally, with it as the default editor on your server.