
Failing towards excellence

“Think big, act small, fail fast; learn rapidly” is the slogan for Lean Software Development. To make that happen, we need to create as many short feedback loops as possible. One way to do this is to implement a stop and fix process in your team. A properly designed stop and fix process will transform failures into continuous improvement. Discovering improvement opportunities shouldn’t be limited to retrospectives or to moments when managers are yelling about some crisis. Instead, let us build quality in: fix errors where and when they happen, prevent them from happening again, and keep them from propagating further down the line. It is also important that we lower the threshold for when to stop and fix. If we only stop the line during severe crises, we will not gain the benefits of this relentless quality and process improvement strategy.

And the alternative? Well, if we don’t fix problems right away, they will pile up, and sooner or later you will have a crisis on your hands. You will get an uneven workflow that leads to even more defects. People will be overloaded, and overloaded people do not make the best decisions. People who don’t make the best decisions create defects that… I have seen this first hand and it is not pretty.

What are the prerequisites for doing stop and fix in a successful way?

  • Work in small batches. Having a lot of work in process makes it hard to find anomalies. Defects hide in piles of work. In order to stop and fix, we need to be able to spot the failures. That’s one of the reasons why limiting work in process is so important and rewarding.
  • An established process for the day-to-day work is needed. It is hard to stop and fix in a controlled way if the normal situation is chaos or laissez-faire.
  • Define a process for when to stop the line. It should never be up for discussion whether we have met the criteria. Create a checklist: if all the boxes are ticked, you can pull the cord and stop the line.
  • A defined process for what to do when the line is stopped.
  • A culture where it is OK to make mistakes. We must make clear that failure is OK as long as we learn from our failures. Making the same mistake twice, though, is kind of unnecessary and stupid.
  • There will probably be people who ask you to skip this process just to get some short-term gain, just this time… People who are more focused on output than outcome don’t value uncompromising quality work. You must have enough discipline to resist that. It is better to deliver fewer things with good quality than many things with bad quality.
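The checklist idea from the list above can be made concrete so that stopping the line is never a debate. Below is a minimal sketch, assuming a team records its stop-the-line criteria as simple yes/no questions; the `Criterion` type, the `should_stop_the_line` and `unmet` functions, and the example questions are all hypothetical, not a prescribed set of criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One yes/no question on a (hypothetical) stop-the-line checklist."""
    question: str
    met: bool


def should_stop_the_line(checklist: list[Criterion]) -> bool:
    """Pull the cord only when every box is ticked; no discussion needed."""
    # An empty checklist means the criteria are not defined yet, so do not stop.
    return bool(checklist) and all(c.met for c in checklist)


def unmet(checklist: list[Criterion]) -> list[str]:
    """Return the questions that are still unticked, for transparency."""
    return [c.question for c in checklist if not c.met]


if __name__ == "__main__":
    # Hypothetical example criteria; each team defines its own.
    checklist = [
        Criterion("Is the defect visible to a customer or a downstream team?", True),
        Criterion("Can the defect be reproduced?", True),
        Criterion("Did the defect slip past our existing tests?", True),
    ]
    print(should_stop_the_line(checklist))  # True
```

The point of the design is that the decision is mechanical: once the team has agreed on the questions, anyone can evaluate them and pull the cord without asking permission.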

 

So what should we do when we have entered stop and fix mode?

First of all, not everything has to stop. Toyota (the creators of TPS, the Toyota Production System, aka Lean) does not stop the whole factory every time something goes wrong. Instead, involve the people needed to handle the situation. It is called stop and fix, not stop and repair or even stop and patch, and there are some mandatory steps to create the desired outcome.

  1. Fix the fault. We shall fix the fault and deliver the solution as fast as possible. If possible, also implement poka-yokes (mistake-proofing), such as unit tests or automated GUI tests, to prevent the fault from reappearing undetected.
  2. Fix the faulty process that made it possible to make this mistake. Those of us who think that Lean is a great way of running a business and doing product development believe that the right process will produce the right result. We also believe that we shouldn’t blame mistakes on individuals, since mistakes happen due to faulty processes that allow people to fail. That is why we need to find the root cause of the issue. What looks like a technical problem on the surface will, after some digging, be revealed as a system problem. My favorite way of doing root cause analysis is to conduct a 5-whys session. Eric Ries has written a good description of how to use this method (http://www.startuplessonslearned.com/2009/07/how-to-conduct-five-whys-root-cause.html). Just make sure that everybody involved is present during the root cause analysis. Otherwise it can easily become a blame fest (Blamestorming?). By implementing the process improvements, we get a little better step by step and come a bit closer to excellence.
  3. Share the failures and the learnings. Wouldn’t it be great if we worked in an environment where we talked openly about mistakes and what we learned from them? An open climate like this could save us all from repeating failures. Write a post on the blog, create a wall of fail, or talk about it over fika.
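Step 1 in the list above mentions unit tests as poka-yokes. The sketch below shows what that can look like in practice, assuming a hypothetical defect: a `last_n` helper, originally written as `items[-n:]`, returned the whole list when `n == 0` (because `-0 == 0` in Python). The function, the defect, and the test names are invented for illustration; what matters is that the fix ships together with a test that pins the exact failing case.

```python
import unittest


def last_n(items: list, n: int) -> list:
    """Return the last n items of a list (hypothetical example function).

    The original, faulty version was `return items[-n:]`, which returns
    the whole list when n == 0 because items[-0:] is the same as items[0:].
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # Fixed: compute the start index explicitly so that n == 0 yields [].
    start = max(len(items) - n, 0)
    return items[start:]


class LastNRegressionTest(unittest.TestCase):
    """Poka-yoke: these tests fail if the fault ever comes back."""

    def test_zero_returns_empty_list(self):
        # The exact case that triggered the stop and fix.
        self.assertEqual(last_n([1, 2, 3], 0), [])

    def test_normal_case_still_works(self):
        self.assertEqual(last_n([1, 2, 3], 2), [2, 3])

    def test_n_larger_than_list(self):
        self.assertEqual(last_n([1, 2], 5), [1, 2])
```

Saved as, for example, `test_last_n.py`, the tests run with `python -m unittest`. Keeping the test named after the failing case also makes it a small, searchable record of the learning, which fits step 3.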

And please remember…
