Ignoring Build Problems

I ran across this blog post, which describes a situation that is probably typical for many people who manage software projects.

Musings of a Software Development Manager » Blog Archive » CruiseControl Warnings
I get about 48 email messages from Cruisecontrol each day for one of our projects. This is not something I’m proud of since this situation has existed for at least 4 weeks now, we’ve had a broken build. The problem stems from some nasty functional tests that no one wants to investigate and we’ve sort of let our process slip.

There is a simple solution to this. Turn off the tests that are failing. People’s first reaction to this is “Oh no, we can’t turn off the tests! They indicate that something is wrong. Eventually we’ll have time to fix it.”

If you are actually going to fix it, go ahead. But if something has been broken for more than a week, chances are no one is going to fix it any time soon. You should turn it off so the build starts passing again.

Why is this better? If your team gets 10 emails each day saying that something is broken, they are going to ignore them. No one is really responsible for all of the problems, so no individual really works on fixing them. However, if the build is working correctly and someone checks in code that breaks a unit test and everyone gets an email, that person is probably going to try to fix it because the email shows that he is responsible for the problem.

Think of it another way. Let’s say I have 3 smoke alarms, 1 gas alarm, 1 carbon monoxide alarm, and a flooded basement alarm in my house, and they all sound pretty much the same. Now let’s say that the flooded basement alarm goes off and I decide that it isn’t important enough to fix the cause, so I just let the alarm keep sounding. How likely do you think I am to notice if another alarm goes off once I get used to ignoring the first one?

If I’m not going to fix the problem, the best thing I can do is disable the flooded basement alarm until I have a chance to fix it. After a week of ignoring the alarm with nothing bad happening, I’m not suddenly going to notice it and decide I should do something about it.

One of the first things I did when I started at my current job was go through and rename every test that failed our automatic build process as “pending”. By the time the tests would run without failures, I had disabled about 2/3 of them. Since they were failing, we ignored them anyway, so marking them as pending didn’t change anything. Before they were turned off, it would have been impossible to notice if one of the tests that had previously been working broke because of a change.
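Here is a rough sketch of what marking a failing test as pending might look like. I am using Python's unittest purely for illustration; the framework, the class, the test names, and the reason string are all assumptions and not what we actually ran.

```python
import io
import unittest

# Hypothetical reason string, so pending tests are easy to find again later.
PENDING = "pending: fails in the automated build, not yet investigated"


class ReportTests(unittest.TestCase):
    def test_totals_add_up(self):
        # A healthy test that keeps running in every build.
        self.assertEqual(sum([1, 2, 3]), 6)

    @unittest.skip(PENDING)
    def test_legacy_export(self):
        # Known failure: skipped so it cannot mask new breakage.
        self.fail("known failure, kept for later")


def run_suite():
    """Run ReportTests quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReportTests)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


if __name__ == "__main__":
    result = run_suite()
    for test, reason in result.skipped:
        print(f"{test.id()}: {reason}")
```

The skipped test still shows up in the result's list of skipped tests along with its reason, so the pending work stays visible without turning the build red.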

Over time we’ve turned most of the pending tests back on one at a time as we’ve had more time to fix the code or fix the test.

When your tests fail, it should be unusual. I set up our builds to break if any test fails. I’ve got a lava lamp above my cubicle and everyone in the company knows what it means. If something breaks, people start asking the developers about it until it gets fixed.
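The rule "break the build if any test fails" can be sketched as a build step that returns a non-zero exit code on any failure. This is only an illustration using Python's unittest; `build_exit_code` and the two check functions are hypothetical names, not part of our real build scripts.

```python
import io
import unittest


def check_ok():
    # Stand-in for a passing test.
    assert 1 + 1 == 2


def check_broken():
    # Stand-in for a failing test.
    assert 1 + 1 == 3


def build_exit_code(checks):
    """Return 0 only if every check passes; any failure or error gives 1."""
    suite = unittest.TestSuite(unittest.FunctionTestCase(c) for c in checks)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    result = runner.run(suite)
    return 0 if result.wasSuccessful() else 1


if __name__ == "__main__":
    # A build server would typically treat a non-zero exit code as a broken build.
    raise SystemExit(build_exit_code([check_ok]))
```

There is no threshold and no "allowed failures" list here: a single failing test is enough to break the build, which is the whole point.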

About Mark Shead

2 Replies to “Ignoring Build Problems”

Interestingly enough, I just turned off the build for this project. It was a 3 year effort that finally got killed by the business today. So you’re right, the failing tests didn’t matter, but they were the sign of a larger problem. Essentially, functional tests are a lot more difficult to maintain, and our system was very difficult to unit test because much of the code was buried in an enterprise rules engine.

So you have a lava lamp? I haven’t gone there yet because of fire code issues and HR. Not exactly a startup environment.

The lava lamps I got were an off brand (HotRock?) from Wal-Mart and they aren’t really that hot. You can hold your hand on them without problems, so I don’t think they are any more of a fire hazard than a regular desk lamp.

I have a green lamp and a red lamp. They both come on during a build. After the build, the red one stays on if it failed and the green one stays on if it passed.
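In code, the lamp rule comes down to a small mapping from build state to which lamps are lit. This sketch covers only that decision; the names `BuildState` and `lamp_states` are made up for illustration, and actually switching the power to the lamps is left out.

```python
from enum import Enum


class BuildState(Enum):
    BUILDING = "building"
    PASSED = "passed"
    FAILED = "failed"


def lamp_states(state):
    """Return (green_on, red_on) for the given build state."""
    if state is BuildState.BUILDING:
        # Both lamps are lit while a build is in progress.
        return (True, True)
    if state is BuildState.PASSED:
        return (True, False)
    # Anything else is treated as a failed build: red only.
    return (False, True)
```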
