Friday, June 19, 2009

Loving your self.

A while ago I was giving a presentation that included a quick introduction to Python. An audience member asked a question that caught me off guard: isn't explicitly prefixing “self.” to each member access annoying? Looking back, when I started programing Python, yes it was annoying. Nowadays, however, I don't mind it at all, and in fact prefer it compared to languages with implicit member accesses. At the time I couldn't rationalize why that was, however, which indicated that I needed to give the issue further consideration. In a language designed for rapid development, why should anyone put up with typing five extra characters over and over? This question lingered in my mind for a while, but things finally clicked in place when I ran into a bug while programming Actionscript.


The Bug

public function resize(widt:Number, height:Number):void
{
     this.width  = width;
     this.height = height;
}

In this case, I misspelled the name of a function parameter. The parameter should have masked the instance variable, but due to the typo in the declaration, it did not. This resulted in a silent bug, as references to the parameter instead resolved to the instance variable. In effect, the first assignment became a null operation. It's obvious that masking instance variables is a dangerous thing to do in language with implicit references to members.

A reasonable rule of thumb might be to never mask instance variables, but this results in some strange names for locals: _width, w, width2, etc. Creating strange names, particularly if they're similar to existing names, creates a typo hazard like in the previous example. Did you mean to refer to width or _width? Can you notice the difference at a glance?

The Introspection

In this case, it seems that explicit is better than implicit. Prefixing each member access with “self.” helps separate members from other types of references. Every root identifier in a Python function is either a local, or failing that, a global. There's minimal magic behind the scenes in how names are resolved, making the code easier to understand. Easier to understand means less bugs.

The problem with the C++/Java/Actionscript/etc. Way of doing things is that too many things are crammed into the same namespace. A given identifier could reference a local, global, instance variable, or method. Programmers have struggled with this problem for a long time, and have come up with solutions such as prefixing names with characters indicating where they come from (m_, g_, etc.) These types of solutions tend to make code real ugly, real fast. The solutions are addressing a real problem – but they create a new problem while they are at it.

Python simply moves members out of the locals/globals namespace. Bugs arising from masking globals with locals, or masking one member with another are bad enough, let alone the dimensions of pain opened up by merging the namespaces. It is ironic that a dynamically typed language, in some ways, is more explicit than a statically typed language.

No comments:

Post a Comment