May 272011

I was debugging a problem in a codebase I’m none too familiar with this evening and decided it was high time I learned how to generate a callgraph in python. A very brief Google search landed me here, and a commenter had exactly what I needed:

+                import inspect
+                bb.plain(" << ".join([i[3] for i in inspect.stack()[1:-4]]))

Which generates a very simple call graph:

parse << parse_file << worker << run << _bootstrap << __init__ << start << __init__ << Pool << start << __init__ << updateCache << runAsyncCommand << runCommands << idle_commands << waitEvent << main

Awesome module, thanks for the blog, and thanks for the comment!

Aug 202010

Red Black trees are a critical data-structure in the Linux kernel. I’ve often wondered what made them unique to other trees, but ignored the impulse to dive into it much beyond reading the excellent Wikipedia article on red black trees.

An rbtree achieves O(log n) time complexity for search, insert, and remove. The key properties of an rbtree are as follows:

  1. A node is either red or black.
  2. The root is black. (This rule is used in some definitions and not others. Since the root can always be changed from red to black but not necessarily vice-versa this rule has little effect on analysis.)
  3. All leaves are black.
  4. Both children of every red node are black.
  5. Every simple path from a given node to any of its descendant leaves contains the same number of black nodes.

I finally manned-up and decided to write a sample red black tree in python. Despite having covered binary trees ad nauseam in college, I was surprised how challenging it was to write a completely functional red black tree. After a few nights of “free time” dedicated to the project, I finally have something to show for it.

$ ./ 
***** Populating Red Black Tree with 1000 Nodes *****
***** Test Insert Complexity O(log N) *****
***** Test In-Order Traversal *****
***** Test Search Complexity O(log N)*****
***** Test Remove Functionality *****

The source comes with a built-in self test that inserts the values 1-1000 in random order, locates them all, then removes them in random order. It verifies the 5 properties of the rbtree at each step, and prints the results.

While I am glad to have done it, I am truly embarrassed at how long it took me to complete. On the bright side, the principles I had to dust off to get this done are now painfully fresh in my head. If you’d like to see the source, it’s available here:

Lastly, there is a clever interactive demo here (requires java):
Red Black Tree Demonstration

Next up… python metaclasses, and why Guido is an evil bastard.

Aug 182010

I’ve been twice bitten by this subtlety of the python __cmp__() method, I thought I’d try to save you the same pain (and perhaps by writing it down I won’t repeat it… again).

When dealing with objects, it’s often useful to be able to compare these objects:

if mySwartz > yourSwartz:
    print "Suck it."

An obvious (and wrong) __cmp__() method would be:

def __cmp__(mine, yours):
    if mine.val < yours.val:
        return -1
    if mine.val > yours.val:
        return 1
    return 0

Looks good right? Well, what if some yahoo tries to compare two totally disimilar things, i.e:

if mySwartz > yourForce:
    print "Neener Neener."

Such a comparison clearly makes no sense, so let’s update __cmp__() to check for similar data types:

def __cmp__(mine, yours):
    if not isinstance(yours, Swartz):
        return -1
    if mine.val < yours.val:
        return -1
    if mine.val > yours.val:
        return 1
    return 0

Good right? WRONG. Consider the following ridiculous scenario:

mySwartz.val == yourSwartz.val

While I’m sure no such thing could happen, let’s imagine for the sake of argument that two objects might be compared on an internal value that just might not be unique across all instances of the object. What would happen if you had to retrieve your object from a container?

storage = []
# time passes
for sw in storage:
    if sw == mySwartz:
        print "I found my Swartz!"
        print "...wait... this is your Swartz... gah!"

Yeah yeah, you could rewrite this to yada yada, that’s not the point. The point is while the values of the objects are the same, they are not the same objects and python uses the __cmp__() method for both less than, greater than, equality, AND identify comparison if you’re used to thinking of your objects as pointers!

If you’re faced with a situation where you want to be able to easily sort objects, but still need to be able to identify them uniquely when they share an internal value, augment your __cmp__() routine to check for id as well:

def __cmp__(mine, yours):
    if not isinstance(yours, Swartz):
        return -1
    if mine.val < yours.val:
        return -1
    if mine.val > yours.val:
        return 1
    if id(mine) < id(yours):
        return -1
    if id(mine) > id(yours):
        return 1
    return 0

If you’re a python guru and think I’m missing something, please share, otherwise:

if mySwartz > yourSwartz:
    print "I see my Swartz is bigger than yours..."

Jun 172010

I use rdiff-backup to keep a few months worth of daily backups for my home systems (and those of my parents for that matter). The ability to recover any version of a file is great – although the process still requires a geek (me).

$ rdiff-backup --restore-as-of "7D" user@backupserver::/path/to/backup/file

Wouldn’t it be great if you could just mount the backup repository and browse by path or date and then just copy the desired version? Enter BackupFS, a fuse filesystem implemented with rdiff-backup.

I’ve only just started digging into this, and rdiff-backup’s python packages were not intended to be used as libraries (not with all the code buried in rdiff_backup.Main and all the global module variables floating around. Still, I was able to get a server test and a listing of the repositories root increments by using the python modules (and not just making multiple subprocess() calls).

I have a glorified version of the example hello world fuse filesystem able to mount and list a few meta-directories:

dvhart@vin:backupfs.git$ ./ mnt && (tree mnt; fusermount -u mnt)
Testing server started by:  ssh -C katara rdiff-backup --server
Server OK
|-- By Date
|   `-- increments.2010-01-07T22:11:42-08:00.dir
|-- By Path
`-- hello

3 directories, 1 file
Fatal Error: Lost connection to the remote system

I still have some basic research to do in order to understand how to operate within the rdiff-backup packages (API isn’t quite the right term ;-). After that, it’ll be on to a more formal design and then some nicer code.

Sep 102007

We’ve been struggling with some pretty nasty bugs relating to predictable runtime. The logs the testcases produced are enormous, and we were struggling to visualize what was actually going on. I spent some time whipping up a visualization tool using python, numpy, scipy, and pylab. I haven’t used scipy since my Signals and Systems course at the university, it felt great to dust it off and make use of it again. With only a meager understanding of signal theory and statistical analysis you can easily create scatter and line plots, histograms, and even fast fourier transforms. One of the most useful things about the pylab plotting interface is that it is interactive. You can zoom in and out and pan the original plot, increasing the amount of discernable information. Kudos to the python guys for creating such an excellent language for rapid tool development.

Update (May 27, 2011): Source available here: