Saturday, November 06, 2004

Processes

I work for a company that maintains large mainframes. We make lots of changes to them every week. Over the last year or so, the number of problems we've had has gone down substantially. Usually, those problems have come out of nowhere, and we couldn't have anticipated them. Sometimes we could have, had we known the right question to ask, but we didn't, because you don't ask the question until the problem has occurred. I suppose we could keep a list of every possible question, but the list would grow weekly, and pretty soon we'd be spending the first three days of every week doing nothing but running through the questions.

So now the people we work for have noticed this reduction, and they're asking: what'd we do to improve?

What they want to hear is that we decisively analyzed our processes, identified and isolated key elements that were prone to failure, and installed redundant and failover components so that a failure in a primary component would not compromise the integrity of the system. They want the equivalent of Evidence-Based Medicine.

The truthful answer is, sometimes you're lucky, and sometimes you're not. We try, and we're experienced, but lately, we've just been lucky.
