Working on redundant load elimination for my dissertation, I wrote a loop:
for statement in statements: if someCond: ... lcl = value1 else: ... lcl = value2 process(lcl)
That is, that is the code I thought I wrote. As it turns out, I actually forgot the final assignment and in fact wrote the following:
for statement in statements: if someCond: ... lcl = value1 else: ... # lcl not assigned to process(lcl)
Now this is a fairly straight forward bug to find and fix, right? As it turns out, this simple typo created a fairly subtle and hard-to-trace bug.
In the best case,
someCond would evaluate to false on the first iteration of the loop, and
process(lcl) would throw an exception because
lcl was undefined. In practice, however,
someCond almost always evaluated to true. As such, no exception was noted when the code was run. Instead, when
someCond would evaluate to false,
lcl was pulled from the previous iteration of the loop. This resulted in a subtle corruption of the output. Worst of all, the nature of the corruption would depend on the order that the statements were processed. All in all, the result was a subtle, non-deterministic bug. The root of this bug was that
lcl was not supposed to be a loop-carried dependency, but a single typo silently turned it into one.
On one hand, if Python supported variable scoping, this bug would have been easier to identify. On the other hand, forcing explicit declaration of variables would degrade the “rapid development” nature of Python. In any case, scoped variables are only treating the symptoms of a bigger problem. The big picture problem is that loop-carried dependencies are really, really ugly. Figuring out what variables are loop-carried dependencies, and figuring out how they behave always requires significantly more thought than straight-line code. It is easy to forget this because it is a familiar ugliness.
There might be some virtue in trying to mainstream the stream programming model of “stateless” loops. Python has taken a step in the right direction with list/dictionary/set comprehensions, although they are somewhat limited. An interesting compromise might make loops naturally stateless, but allow loop-carried dependencies can be explicitly declared.
Without explicit language support, two workarounds suggest themselves. First, document all loop carried dependencies. Understanding them out requires more effort than straight-line code, which indicates they would benefit from comments, even if they seem simple at first glance. Second, any given function should only have a small number of local variables (7? 10?). This is a variation of the rule of thumb that functions should never have more than X lines, where X varies with the zealotry of the speaker and the language that they use. Lines of code is a language-dependent metric, whereas number of local variables seems to be a fairly general measure of “namespace clutter.” The more names in a local namespace, the greater chance that a programmer will forget something and make a mistake.
Defensive programming can help reduce the peril of loop-carried dependencies. After I found the bug, I refactored the code that generated
lcl into its own function.
def getValue(...): if someCond: ... lcl = value1 else: ... lcl = value2 return lcl for statement in statements: lcl = getValue(...) process(lcl)
Not only does this make each function smaller and easier to understand, but it limits the scope of
lcl. In light of this experience, I would propose that variable assignments inside loops should either be painfully clear, or refactored until they are.