October 26th, 2009

coyote Phil

A lucky bug, and evolution

Last Friday, I finally packaged up the quarterly release of JCVI's automatic prokaryote functional annotation pipeline, which looks at genes found in newly sequenced genomes and guesses what they do, and distributed it to the other three sequencing centers for the Human Microbiome Project. As always happens when I release a new version, I discovered a major bug several minutes afterwards, one that had been hiding in the code for years.

The program takes each new gene, runs BLAST against a database of known genes, and produces a list of identifiers of known genes that resemble the new gene. It then takes these identifiers and calls a program to look up all of the synonyms for them used in the different gene databases. This lookup step takes 90% of the program's runtime.

I found that the database lookup usually failed, because most identifiers didn't match the regular expression used in the lookup program to retrieve identifiers. Nobody had noticed this, because nobody had checked the database log files.
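To make the failure mode concrete, here is a minimal Python sketch of how a too-narrow regular expression can silently drop most identifiers. The pattern, the identifier formats, and the function names are all hypothetical; the post does not show the real pipeline's code or say what language it is written in. The sketch also collects the rejected identifiers, which is the kind of check that would have exposed the problem without reading the log files.

```python
import re

# Hypothetical pattern: accepts only identifiers shaped like "SP|<accession>".
ID_PATTERN = re.compile(r"^SP\|[A-Z0-9]+$")

def lookup_synonyms(identifier, database):
    """Return the synonyms for identifier, or None if the pattern rejects it."""
    if not ID_PATTERN.match(identifier):
        return None  # the lookup fails here without raising an error
    return database.get(identifier, [])

def lookup_all(identifiers, database):
    """Look up every identifier, keeping track of the ones that were rejected."""
    results = {}
    rejected = []
    for identifier in identifiers:
        synonyms = lookup_synonyms(identifier, database)
        if synonyms is None:
            rejected.append(identifier)
        else:
            results[identifier] = synonyms
    return results, rejected
```

Reporting the length of `rejected` after each run would make a high failure rate visible immediately.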

I fixed the program so that the database lookup would always work correctly, and re-ran the program. It produced exactly the same output as before, but took five times as long to run.

So instead of going dancing, as I'd planned, I spent Friday evening figuring out why this happened. It turned out that the identifiers that failed to match the regular expression were a subset of the identifiers for which the lookup didn't have to be done, because the previously cached results already gave the same answer. Once I realized this, I was able to speed the program up further, by excluding more such identifiers and by avoiding the overhead of about a million subroutine calls that would eventually fail when the regular expression failed to match.
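The idea behind that speed-up can be sketched in a few lines of Python. This is only an illustration under assumptions: the cache is modelled as a dictionary, and `resolve_synonyms` and `slow_lookup` are hypothetical names, not the pipeline's actual routines. The point is to decide up front which identifiers need the expensive lookup, rather than calling it and letting it fail.

```python
def resolve_synonyms(identifiers, cache, slow_lookup):
    """Resolve synonyms, calling slow_lookup only for identifiers not in cache.

    Returns the results and the number of expensive lookups actually made.
    """
    results = {}
    calls = 0
    for identifier in identifiers:
        if identifier in cache:
            # A cached answer exists, so the expensive call is skipped entirely.
            results[identifier] = cache[identifier]
            continue
        calls += 1
        synonyms = slow_lookup(identifier)
        cache[identifier] = synonyms  # remember the answer for later repeats
        results[identifier] = synonyms
    return results, calls
```

With a cache that already covers most identifiers, the number of expensive calls drops to the handful of identifiers that are genuinely new.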

A bug in a program is like a mutation. Bugs in a computer program are almost always bad. But this was a beneficial bug, which had no effect other than to make the program run much faster than it had been designed to. I was delighted to see this demonstration of the central non-intuitive idea of evolution: a random change can sometimes be beneficial.