Monday, 28 April 2014

What Developers could learn from the Airlines

I have been getting really into the Air Crash Investigation programmes on TV. I like the technical detail and the suspense of an investigation, followed by a conclusion that often points to a chain of errors, many of which are not serious by themselves but which, added together, equal a crash and possibly many deaths.

Most of us wouldn't stop to question the importance of airline safety. Being several thousand metres in the air is not a natural position for humans, and anything that can make a 100-tonne aircraft lose control is probably going to end badly. How do airlines maintain this safety? With two things that I think could save Developers a lot of egg on their faces: 1) checklists and 2) continuous learning.

Checklists are the bane of most Developers' lives. However experienced we are, we are almost pathologically opposed to paperwork and process: "I don't need a checklist, I know how to code". This is because we misunderstand the point of process and are often never taught its place; if we are self-taught, we probably wouldn't even have considered it. Look at a pilot's job and you will see checklists used for all manner of things: pre-start, start, post-start, taxi, before takeoff, after takeoff, climb and so on. It is very often the case that a pilot is a pilot for their entire career. In other words, many of the people flying your planes are very experienced, possibly with more than 20,000 flying hours (that would be 833 continuous days!), but they still use checklists. A Captain doesn't argue with their boss that their experience negates the need for checklists. Checklists are different for each aircraft and they are updated as issues or risks appear, so it is in everyone's interest that they are followed to the letter.

Perhaps we just don't take our craft seriously enough. So a web site breaks, who cares? We can fix it quickly; the worst case is a little inconvenience and a minor loss of face. It is unlikely that anyone is going to die. The problem with that view, and it appears to be very common from the stories I read online, is that it pervades the whole discipline of development. If I don't care that much about a broken site, I won't care that much about security or usability or data protection, or about the fact that people might be relying on my service and it should work properly! A checklist is a very simple beast that mitigates the fact that people forget, have a bad day or get tired. It also mitigates the fact that information changes and best practices are updated. It seems rather foolish to wait until we personally make a mistake before we allow for it, rather than letting the industry experience a problem in one place and everyone else learn from it.
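
To make the idea concrete in our own medium, here is a minimal sketch in Python of a checklist that grows as lessons are learned, in the same way an aircraft checklist is amended after an incident. The Checklist class and the release items below are hypothetical examples of my own, not any real tool or standard. The point is only that the list, rather than someone's memory on a bad day, decides whether a release is ready.

```python
from dataclasses import dataclass, field


@dataclass
class Checklist:
    """A named checklist (hypothetical) whose items can grow as lessons are learned."""
    name: str
    items: list[str] = field(default_factory=list)

    def add_lesson(self, item: str) -> None:
        # Amend the checklist once, after a mistake (ours or anyone else's);
        # duplicates are ignored so the list stays clean.
        if item not in self.items:
            self.items.append(item)

    def outstanding(self, completed: set[str]) -> list[str]:
        # Items not yet ticked off, in checklist order.
        return [item for item in self.items if item not in completed]

    def ready(self, completed: set[str]) -> bool:
        # Ready only when nothing is outstanding, however experienced we feel.
        return not self.outstanding(completed)


# Hypothetical pre-release items, purely for illustration.
pre_release = Checklist("pre-release", [
    "Code reviewed by a second developer",
    "Passwords stored with a slow, salted hash",
    "Database backup taken and restore tested",
])

# A lesson learned elsewhere in the industry is added once and then applies to every release.
pre_release.add_lesson("Default admin credentials removed")

done = {
    "Code reviewed by a second developer",
    "Passwords stored with a slow, salted hash",
    "Database backup taken and restore tested",
}
print(pre_release.ready(done))        # False
print(pre_release.outstanding(done))  # ['Default admin credentials removed']
```

The design choice mirrors the aviation one: nobody gets to skip an item because they "know how to code", and a new risk becomes a new line on the list rather than a note in someone's head.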

This brings me to the second point. The airlines have a very well-established methodology for continuous learning. The air transport safety authorities around the world share an established (and thankfully unchallenged) principle that an issue should only ever have to happen once before changes are made: "What went wrong, why, and how can we avoid it in the future?" The industry can make quite large demands on aircraft manufacturers or operators to retro-fit a new device or replace a part because of something that went wrong and caused a crash, or even a near-miss. In software, I would suggest our collective ability to do the same is lamentably bad. How many sites have lost data because of a weak username/password combination? Hundreds? Thousands? How many sites have stored passwords with a very poor hash and effectively handed them to an attacker for free? Hundreds? Thousands? More? One of the problems with the world of software engineering is that there is no coherence: no mandatory training and no worldwide certification that says, "I can write software correctly". As far as I know, no law anywhere even requires a software process. How can you learn from someone else's mistakes when you have no process to modify accordingly?
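
To illustrate the password example, here is a minimal Python sketch, using only the standard library, that contrasts a fast unsalted hash with a salted, deliberately slow key-derivation function (PBKDF2). The function names and the iteration count are my own illustrative assumptions, not a recommendation for any particular system. bcrypt, scrypt and Argon2 are common alternatives to PBKDF2.

```python
import hashlib
import hmac
import os

# Illustrative work factor only (an assumption); tune it to current guidance and your hardware.
ITERATIONS = 600_000


def weak_hash(password: str) -> str:
    # The mistake: a fast, unsalted digest. The same password always gives the
    # same hash, and an attacker can test enormous numbers of guesses quickly.
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def hash_password(password: str) -> tuple[bytes, bytes]:
    # A random 16-byte salt per user plus a deliberately slow key-derivation function.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    # Re-derive with the stored salt, then compare in constant time.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt_a, stored_a = hash_password("letmein")
salt_b, stored_b = hash_password("letmein")
print(weak_hash("letmein") == weak_hash("letmein"))  # True: identical passwords are obvious
print(stored_a == stored_b)                          # False: different salts, different hashes
print(verify_password("letmein", salt_a, stored_a))  # True
print(verify_password("wrong", salt_a, stored_a))    # False
```

The unsalted hash gives the same output for everyone who chose the same password, so one cracked hash exposes them all. The salted version does not, and the iteration count makes every guess expensive. This is exactly the kind of lesson the industry has learned many times over without it reaching everyone.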

The answer is not straightforward. Developers are funny beasts. Some are extremely quiet, shy and conformist; others are wild, crazy hippies who can't do what they're told. How do you find agreement across an industry that takes ten years just to approve HTML5 (and even then doesn't really nail it down)? What would the ultimate intention even be? Software, more than any other commodity, crosses borders and is only bound by local laws, if there are any. If a supplier in a Western country with lots of regulation quotes you a high price, you can easily buy the same software from a country with less regulation and therefore lower overheads. Naturally, this would be like buying cheap knock-off aircraft parts from eBay. But again, the risks of poor software engineering are still not really appreciated, so a customer will see a company that looks good on the web and order something, even if it is poorly written and insecure.

Schemes like ISO27001 are designed to demonstrate a management process for information security. But like many ISO documents, they aim to be all things to all people and so become the lowest common denominator. In document terms, that means very woolly, abstract documents that can be ticked off with a process whether or not that process is actually much good at producing quality. The same was true of the older ISO9001, which I saw several companies achieve by box-ticking (and spending lots of money) without their culture being very quality-oriented. Sadly, these schemes set out to achieve something that is of limited use at best, and in a way that is all but out of reach for many smaller companies.

What we really need is something specifically geared towards software development: an ISO standard that is recognised worldwide but rooted in specifics. Perhaps there would be different documents for web applications and desktop applications, but the intention would be the same: "You will have a development process", "Your process is reviewed regularly and updated in line with mistakes, both internal and external", "You have a code review process that ensures, at a minimum, that the following issues are checked where relevant to your code". Since I believe in continuous iteration, we could start with a very minimal standard and extend it to include more factors as people use it and get used to it. To make it usable for smaller companies, you could self-certify compliance. This would demonstrate willingness and, backed by a suitable insurance policy, could help customers know that a supplier is worth using. If you are self-certified, perhaps a larger company buying your product could pay to have an auditor sign off your certification.

As with any idea like this, though, who starts it? Who would people listen to? Which big-name, famous, respected person could say to everyone, "Let's do this and make it work"? The answer is, I don't know. Sigh.
