Archives

This is the archive for April 2007

Wednesday, April 11, 2007

So I'm in the bowels of the institutional-level financial software application, my small part in the production of which currently pays my mortgage, and I find that it simply blows itself up when it gets in a bad mood. It doesn't get into a bad mood often - in fact I have to intentionally stress it out, tease it mercilessly and egg it on in order to put it in a foul enough mood to snap back at me. This consists mostly of repeatedly telling it that it's a good-for-nothing lazy bum who will never amount to anything. Or something like that.

Turns out that even with this relentless badgering, the chance of it getting into a sufficient funk is about one in a million, literally. Which means I can make it reliably happen at will in about 3 seconds. Literally.

I'm working in the bowels of this app (which is probably enough in itself to make anyone cranky). It's the engine of the whole thing, a framework that simply everything else depends on, so it's pretty important. This is a highly multi-threaded piece of work, which means, in normal-people talk, that lots and lots of parts are all whizzing around at the same time, rather than working in a nice polite single file the way computers are meant to operate.

The normal way to find a bug is to isolate the suspected parts until only one part is left and it still makes things blow up. Then that's the part to fix or replace. The problem is that with all this multi-threadedness, all those parts whizzing around each other, it is usually the places where the parts meet that cause the problems. One gear can spin by itself forever without grease, but mesh two of them together, and sparks fly. This makes the divide and conquer approach tricky at best.

So, anyways, now, I've got the engine lying around in pieces, its vital parts spread out on the garage floor among little globs of grease and puddles of oil and those damn extra screws that I can't remember where they came from. And I finally see the problem. It's like a little grain of sand that's been cutting grooves in the crankshaft.

You see, there's a thing we'll call "x" (names have been changed to protect the innocent, and your eyes), and it has to be smaller than "y" in order to fit into the cramped little piece of the Linux operating system that I'm about to stuff it into. So I write:

if (x > y) ...

That says that if x is bigger than y, do some stuff - "stuff" in this case being to break x into smaller pieces.

But these damn computers are far, far pickier than that. Really, it's like trying to guide a three-year-old in doing brain surgery via a game of Simon Says. See, x has to be smaller than y, so telling it to do something if x is bigger than y is not good enough. What if x is the same size as y? Hmm? Simon says BOOM.

The line should read:

if (x >= y) ...

Oy vey! One damn character, and it took a day and a half to find it. And I wrote that line myself not a week ago. The initial problem was that the thing would blow up if the value of x was just any old kind of wrong. All my "fix" did was to make it blow up only when the value of x was one very specific kind of wrong, which only made it harder to find. The worst bit is that this is part of a fairly standard way of normalizing data (which means basically making it the right size to stuff into where it needs to go), and I should have known in my sleep that that extra little equal sign needed to be there.
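For the programmers in the audience, here's a rough sketch of the two versions side by side. This is not the real code: the function names, the size_t types, and the way the pieces get counted are all made up for illustration. The only thing taken from the story is the rule that x has to be strictly smaller than y to fit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; only the comparison comes from the story. */

/* The line as first written: x == y slips through without being split. */
bool needs_split_buggy(size_t x, size_t y)
{
    return x > y;
}

/* The corrected line: x must be strictly smaller than y to fit,
 * so "equal" counts as too big. */
bool needs_split(size_t x, size_t y)
{
    return x >= y;
}

/* Assumed splitting scheme: cut x into pieces of at most y - 1 each,
 * so that every piece is strictly smaller than y.
 * Returns 0 when y < 2, because no non-empty piece could ever fit. */
size_t split_count(size_t x, size_t y)
{
    if (y < 2)
        return 0;
    if (!needs_split(x, y))
        return 1;                      /* already fits as one piece */

    size_t max_piece = y - 1;          /* largest size that still fits */
    return x / max_piece + (x % max_piece != 0);
}
```

With y set to 8, an x of 7 goes through as one piece, and an x of 8 gets cut in two. The buggy check would have waved that 8 straight through, which is the one-in-a-million case.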

Now I've got to put all the parts back together - which is why I'm writing about it instead of doing it... I swear, if there's any screws left over, I'm gonna start drinking again.