lishcenthetelevisyen: Continuous Delivery

Showing posts with label Continuous Delivery. Show all posts

Thursday, 12 September 2013

The Real Cost of Change in Software Development

There are two widely opposed (and often misunderstood) positions on how expensive it can be to change or fix software once it has been designed, coded, tested and implemented. One holds that it is extremely expensive to leave changes until late, that the cost of change rises exponentially. The other position is that changes should be left as late as possible, because the cost of changing software is – or at least can be – essentially flat (that’s why we call it “soft”ware).

Which position is right? Why should we care? And what can we do about it?

Exponential Cost of Change

Back in the early 1980s, Barry Boehm published some statistics (Software Engineering Economics, 1981) which showed that the cost of making a software change or fix increases significantly over time – you can see the original curve that he published here.

Boehm looked at data collected from Waterfall-based projects at TRW and IBM in the 1970s, and found that the cost of making a change increases as you move from the stages of requirements analysis to architecture, design, coding, testing and deployment. A requirements mistake found and corrected while you are still defining the requirements costs almost nothing. But if you wait until after you've finished designing, coding and testing the system and delivering it to the customer, it can cost up to 100x as much.

A few caveats here. First, the cost curve is much higher in large projects (in smaller projects, the cost curve is more like 1:4 instead of 1:100). Those cases where the cost of change rises up to 100x are rare, what Boehm calls Architecture-Breakers, where the team gets a fundamental architectural assumption wrong (scaling, performance, reliability) and doesn't find out until after customers are already using the system and running into serious operational problems. And this analysis was all done on a small data sample from more than 30 years ago, when developing code was much more expensive and time-consuming and paperworky, and the tools sucked.

A few other studies have been done since then which mostly back up Boehm's findings – at least the basic idea that the longer it takes for you to find out that you made a mistake, the more expensive it is to correct it. These studies have been widely referenced in books like Steve McConnell’s Code Complete, and used to justify the importance of early reviews and testing:

Studies over the last 25 years have proven conclusively that it pays to do things right the first time. Unnecessary changes are expensive. Researchers at Hewlett-Packard, IBM, Hughes Aircraft, TRW, and other organizations have found that purging an error by the beginning of construction allows rework to be done 10 to 100 times less expensively than when it's done in the last part of the process, during system test or after release (Fagan 1976; Humphrey, Snyder, and Willis 1991; Leffingwell 1997; Willis et al. 1998; Grady 1999; Shull et al. 2002; Boehm and Turner 2004).

In general, the principle is to find an error as close as possible to the time at which it was introduced. The longer the defect stays in the software food chain, the more damage it causes further down the chain. Since requirements are done first, requirements defects have the potential to be in the system longer and to be more expensive. Defects inserted into the software upstream also tend to have broader effects than those inserted further downstream. That also makes early defects more expensive.

There’s some controversy over how accurate and complete this data is, how much we can rely on it, and how relevant it is today when we have much better development tools and many teams have moved from heavyweight sequential Waterfall development to lightweight iterative, incremental development approaches.

Flattening the Cost of Changing Code

The rules of the game should change with iterative and incremental development – because they have to.

Boehm realized back in the 1980s that we could catch more mistakes early (and therefore reduce the cost of development) if we think about risks upfront and design and build software in increments, using what he called the Spiral Model, rather than trying to define, design and build software in a Waterfall sequence.

The same ideas are behind more modern, lighter Agile development approaches. In Extreme Programming Explained (the 1st edition, but not the 2nd) Kent Beck states that minimizing the cost of change is one of the goals of Extreme Programming, and that a flattened change cost curve is “the technical premise of XP”:

Under certain circumstances, the exponential rise in the cost of changing software over time can be flattened. If we can flatten the curve, old assumptions about the best way to develop software no longer hold…

You would make big decisions as late in the process as possible, to defer the cost of making the decisions and to have the greatest possible chance that they would be right. You would only implement what you had to, in hopes that the needs you anticipate for tomorrow wouldn't come true. You would introduce elements to the design only as they simplified existing code or made writing the next bit of code simpler.

It’s important to understand that Beck doesn't say that with XP the change curve is flat. He says that these costs can be flattened if teams work towards this, leveraging key practices and principles in XP, such as:

Simple Design, doing the simplest thing that works, and deferring design decisions as late as possible (YAGNI), so that the design is easy to understand and easy to change

continuous, disciplined Refactoring to keep the code easy to understand and easy to change

Test-First Development – writing automated tests upfront to catch coding mistakes immediately, and to build up a testing safety net to catch mistakes in the future

developers collaborating closely and constantly with the Customer to confirm their understanding of what they need to build and working together in Pairs to design solutions and solve problems, and catch mistakes and misunderstandings early

relying on working software over documentation to minimize the amount of paperwork that needs to be done with each change (write code, not specs)
the team’s experience working incrementally and iteratively – the more that people work and think this way, the better they will get at it.

All of this makes sense and sounds right, although there are no studies that back up these assertions, which is why Beck dropped this change curve discussion from the second edition of his XP book.
But by then the idea that change could be flat with Agile development had already become accepted by many people.

The importance of Feedback

Scott Amber agrees that the cost curve can be flattened in Agile development, not because of Simple Design, but because of the feedback loops which are fundamental to iterative, incremental development. Agile methods optimize feedback within the team, developers working closely together with each other and with the Customer and relying on continuous face-to-face communications. And following technical practices like Test-First Development and Pair Programming and Continuous Integration makes these feedback loops even tighter.

But what really matters is getting feedback from the people using the system – it’s only then that you know if you got it right or what you missed. The longer that it takes to design and build something and get feedback from real users, the more time and work that is required to get working software into a real customer’s hands, the higher your cost of change really is.

Optimizing and streamlining this feedback loop is what is driving the Lean Startup approach to development: defining a Minimum Viable Product (something that just barely does the job), getting it out to customers as quickly as you can, and then responding to user feedback through Continuous Deployment and A/B testing techniques until you find out what customers really want.

Even flat change can still be expensive

Even if you do everything to optimize these feedback loops and minimize your overheads, this still doesn’t mean that change will come cheap. Being fast isn’t good enough if you make too many mistakes along the way.

The Post Agilist uses the example of painting a house: assume that it costs $1,000 each time you paint the house, whether you paint it blue or red or white. The cost of change is flat. But if you have to paint it blue first, then red, then white before everyone is happy, you’re wasting time and money.

“No matter how expensive or change the ‘cost of change’ curve may be, the fewer changes that are made, the cheaper and faster the result will be…Planning is not a four letter word.” [however, I would like to point out that “plan” is].

Spending too much time upfront in planning and design is waste. But not spending enough time upfront to find out what you should be building and how you should be building it before you build it, and not taking the care to build it carefully, is also a waste.

Change gets more expensive over time

You also have to accept that the incremental cost of change will go up over the life of a system, especially once a system is being used. This is not just a technical debt problem. The more people using the system, the more people who might be impacted by the change if you get it wrong, the more careful you have to be, which means you need to spend more time on planning and communicating changes, building and testing a roll-back capability, and roll changes out slowly using Canary Releases and Dark Launching – which add costs and delays to getting feedback.

There are also more operational dependencies that you have to understand and take care of, and more data that you have to change or fix up, making changes even more difficult and expensive. If you do things right, keep a good team together and manage technical debt responsibly, these costs should rise gently over the life of a system – and if you don’t, that exponential change curve will kick in.

What is the Real Cost of Change?

Is the real cost of change exponential, or is it flat? The truth is somewhere in between.

There’s no reason that the cost of making a change to software has to be as high as it was 30 years ago. We can definitely do better today, with better tools and better, cheaper ways of developing software. The keys to minimizing the costs of change seem to be:

Get your software into customer hands as quickly as you can. I am not convinced that any organization really needs to push out software changes 10-50-100x a day, but you don’t want to wait months or years for feedback either. Deliver less, but more often. And because you’re going to deliver more often, it makes sense to build a Continuous Delivery pipeline so that you can push changes out efficiently and with confidence. Use ideas from Lean Software Development and maybe Kanban to identify and eliminate waste and to minimize cycle time.

We know that even with lots of upfront planning and design thinking, we won’t get everything right upfront - this is the Waterfall fallacy. But it’s also important not to waste time and money iterating when you don’t need to. Spending enough time upfront in understanding requirements and in design to get it at least mostly right the first time can save a lot later on.

And whether you’re working incrementally and iteratively, or sequentially, it makes good sense to catch mistakes early when you can, whether you do this through Test First Development and pairing, or requirements workshops and code reviews, whatever works for you.

Thursday, 22 August 2013

Getting Application Security Vulnerabilities Fixed

It’s a lot harder to fix application security vulnerabilities than it should be.

In their May 2013 security report, WhiteHat Security published some discouraging findings about how many application security vulnerabilities found in testing get fixed, and how long it takes to fix them. They found that only 61% of serious security vulnerabilities get fixed, and that on average, it takes 193 days to get them fixed.

Why some vulnerabilities get fixed, or don’t get fixed

Convincing management and the customers paying for software development work – and the developers that need to do the work – that security vulnerabilities really need to be fixed is one part of the problem. Proving that this can be done in a cost effective and safe way is another.

For most organizations, compliance – not operational risk, not customer requirements or other commercial considerations, and not concern for quality – determines whether security vulnerabilities get fixed. WhiteHat customers reported that the #1 reason that a vulnerability gets fixed is because it is required by compliance. And the #1 reason that people don’t fix a vulnerability is because it isn't required by compliance.

One of the other factors that influences how many security vulnerabilities get fixed and how fast they can be fixed is where the system is in its life. It’s a lot easier and much less expensive to fix security vulnerabilities found early in a project, before you’ve written a lot of code that needs to be reviewed, fixed and tested again; when the situation and the system are both plastic enough that you can make course corrections without a lot of time or cost.

Obviously it’s a much different story for legacy systems on life support, where nobody really understands the code or is confident that they can change it safely, and nobody is sure how long the system is going to be around (although these systems almost always hang on longer than anyone expects).

Everything in between is where decisions are difficult to make: the system is already in use and has been for a while, and the team maintaining and supporting it has a full book of committed work to deal with. It can be hard to make fixing security vulnerabilities a priority when things seem to be running fine, and everyone is busy trying to keep it this way, unless maybe compliance is standing over you holding a big enough hammer that you have to do something to show that you are taking them seriously.

What’s it going to cost?

If you can make the case that there are serious security problems that that need to be taken care of, where do you start? A security review could uncover hundreds or even thousands of vulnerabilities – the first time that you do a security scan or pen test of a big system can be overwhelming. How much work is it going to take to “make the system secure”, what is it going to cost?

Denim Group has done some interesting research on understanding how much work is involved in fixing security vulnerabilities.

Like any other bugs in code, some vulnerabilities are easier to find and fix than others. A XSS vulnerability can take anywhere between 10 minutes (stored XSS) to about an hour and a half (stored and reflected) to fix – and most web apps have at least one, often hundreds of these problems. Fixing a SQL Injection problem also takes an hour and a half on average. A missing authorization check? Only 7 minutes. And like any other bugs in code, the coding work is only a small part of the time taken to get the fix done (on average, 30% of the total time). Testing takes around half of the time, and the rest is in getting things setup for making the change, getting the new code built and deployed, and overhead.

Unlike a functional bug, the customer won’t see any immediate advantage in fixing a vulnerability – the code works fine right now as far as they can see. So it’s important that you can fix vulnerabilities without spending too much time or money doing it, and that you can do it without breaking whatever is already working.

This is why Nick Galbreath stresses the value of a Continuous Deployment capability as a pre-requisite to a successful software security program, leveraging Continuous Integration and Continuous Delivery tools and practices so that when developers check in code changes the system is automatically built, tested and it can be automatically deployed if all of the tests pass. It’s not about pushing every code change out immediately – it’s about having a proven pipeline in place for rolling changes out to production quickly and with minimal risk, knowing that you can make fixes and get them out cheaply and with confidence, and that you are able to respond to an emergency if you have to. This will pay dividends outside of security work, reducing the cost and risk of making any software change.

Getting security bugs fixed

Denim Group explains that remediating software security vulnerabilities has to be managed like any other software development project, and they provide some guidelines on how to do it
in a waterfally kind of way, with upfront stakeholder engagement and planning: an approach that can work fine for many organizations, especially larger ones.

A more iterative, Agile approach could start with a short, time-boxed spike. Take a couple of smart developers and give them a couple of weeks to review the list of vulnerabilities (if possible with whoever found them), understand which ones are serious and filter out false positives, and pick some vulnerabilities to fix (a few each of different kinds). They should choose which vulnerabilities to work on by trading off what is easy to understand and fix, against the risk of not fixing them. Tools like the OWASP Top 10 or SANS/CWE Top 25 can help with understanding the issues and making these decisions.

SQL Injections would make a good first choice: a serious vulnerability that is easy to exploit and that can have serious consequences, but also easy for a developer to understand and fix.

Or a missed authorization check: another potentially serious bug that should be easy to understand, trivial to fix and test. A problem like a mistake in secure password storage
might be technically harder to solve, but still easy to isolate and test. Adding server-side validation (instead of validation only at the client) is another easy and good place to start.

It is important that the developers take the time to understand what they are doing and how to do it right: that they understand the vulnerability, why it is a problem, how to fix it correctly, and how to test it (test it to make sure that they actually closed the security hole, and regression test it to make sure that they didn't break anything else by accident). The important thing here isn't to make a few fixes – it’s to learn what’s involved in correctly solving these problems, and know that you can build and deploy the fixed code properly.

Then run another spike. Pick a few other bugs, maybe some that are harder to understand and fix, and some that are easy to fix but less serious (like missing error handling or information leaks), and run through the same steps again.

With a small investment of time like this, you should get an understanding of what work needs to be done, how to do it, what it’s going to cost, and you should also have the confidence that you can do it safely. You should have good enough information to estimate the amount of work that it will take to fix the remaining problems; and a good enough understanding of risk and cost trade-offs that need to be made, what problems need to be fixed – and can be fixed – sooner or later.

Now you can add the remaining bugs that you plan to fix to your backlog. You might decide to fix as many of them as you can all at once in a hardening sprint, or prioritize and fix them with the other work in your backlog.

You can’t fix, or effectively plan to fix, security vulnerabilities until you understand them. Once you understand the problems (what the bugs are, what needs to be fixed and why), and how much it is going to cost to fix them, and once you have the confidence that you can fix them properly, you can treat security vulnerabilities like other bugs – decide what needs to be fixed and when by trading off cost and risk with the other work that you have to get done. Remediation work becomes just another software development problem to be managed, something that developers and managers already know how to do.

Friday, 8 March 2013

Book Review: The Phoenix Project

Everyone who attended the “Rugged Devops” panel at RSA this year received a free copy of The Phoenix Project (by Gene Kim, Kevin Behr and George Spafford – the authors of Visible Ops), the fictional story of the education and transformation of an IT manager, an IT organization, and eventually of an entire company.

I'm not sure why they wrote the Phoenix Project as a novel. But they did. So I’ll review it first as a piece of story telling, and then look at the messaging inside.

The reason that I don’t like didactic fiction is that so much of it is so poorly written – generally forced and artificial. I was pleasantly surprised by the Phoenix Project. The first half of the novel tells a story about an IT manager forced into taking on responsibility for saving his company. Our well-meaning hero, an ex-marine sergeant (for some reason unclear to me, the hero, the CEO, and even the mysterious guru all have a military background) with an MBA but without any ambition except to quietly provide for his family. He has been successfully running his own little part of the IT group, so successfully that he is dramatically promoted to take over all of IT operations (and so successfully that his own group is never mentioned again in the story – it seems to run on auto-pilot without him). For a successful manager, our hero knows alarmingly little about how the rest of the IT organization works, or about the business, and so is unprepared for his new responsibility. He reluctantly accepts the big job, and then regrets it immediately as he realizes what a shit storm he has walked into.

It’s a compelling narrative that draws you in and is seems realistic even with the stock characters: the sociopathic SOB CEO, the unpopular everything-by-the-book CISO, the Software Development Manager who only cares about hitting deadlines (but not about delivering software that works), the Machiavellian Marketing executive, and the autistic genius that the entire IT organization of several hundred people all depend on to get all the important stuff done or fixed.

As a pure piece of story telling, things start to unravel in the second half with the emergence of the IT / Lean Manufacturing guru – when the story ends and the devops fable begins. From this point on, the plot depends on several miraculous conversions: everyone except the marketing exec sees the light and literally transform overnight. They even start to groom themselves nicely and dress better. Lifetime friendships are made over a few drinks, and everyone learns to trust and share: there’s a particularly cringe-inducing management meeting in which people bare their souls and weep. Conflicts and problems disappear magically usually because of the guru’s intervention – including an unnecessary scene where the guru shows up at a crucial audit meeting and helps convince the partner of the audit firm (an old buddy of his) that there aren't any real problems with the company’s messed up IT controls (“these aren't the droids you’re looking for”).

But the real heroes are the people running the manufacturing group: the one part of this spectacularly mismanaged company that somehow functions perfectly is a manufacturing plant where everyone can all go to learn about Kanban and Lean process management and so on. Without the help of the smug and smart-alecky guru - who apparently helped create this manufacturing success - and his tiresome Zen master teaching approach (and sometimes even with his help), our hero is unable to understand what’s wrong or what to do about it. He doesn't understand anything about demand management, how to schedule and prioritize project work, that firefighting is actual work, how to get out of firefighting mode, how to recognize and manage bottlenecks in workflow, or even how important it is to align IT with business priorities (where did he get that MBA any ways?). Luckily, the factory is right there for him learn from, if he only opens his eyes and mind.

What will you learn from The Phoenix Project?

The other problem with this story telling approach is that it takes so damn long to get to the point – it’s a 338 page book with about 50 pages of meat. Like Goldratt's The Goal (which is referenced a couple of times in this book), The Phoenix Project leads the reader to understanding through a kind of detective story. You have to be patient and watch as the hero stumbles and goes down blind alleys and ignores obvious clues and only with help eventually figures out the answers. Unfortunately, I'm not a patient reader.

This is a gentle introduction to Lean IT and devops. If you've read anything on Kanban and devops you won’t find anything surprising, although the authors do a good job of explaining how Lean manufacturing concepts can be applied to managing IT work. The ideas covered in the book are standard Lean and Theory of Constraints stuff, with a little of David Anderson’s Kanban and some devops – especially Continuous Deployment as originally described by John Allspaw and Continuous Delivery.

The guru’s lessons are mostly about visualizing and understanding and limiting demand – that you should stop taking on more work until you can get things actually working so that you’re not spending all of your time task-switching and firefighting; identifying workflow bottlenecks and protecting or optimizing around them; how reducing batch size in development will improve control and to get faster feedback; that in order to do this you have to work on simplifying and standardizing deployment; and how valuable it is to get developers and operations to work together.

My complaints aren't with the ideas – I buy into a lot of Devops and agree that Kanban and Lean have a lot to offer to IT ops and support teams (although I'm not sold on Kanban by itself for development, certainly not at scale). But I was disappointed with the unrealistic turnaround in the second half of the book. It’s all rainbows and unicorns at the end. IT, the business, development and security all start working together seamlessly. Management is completely aligned. Performance problems? No problem – just go the Cloud. And they even bring in the famous Chaos Monkey in the last couple of pages just because.

Spoiler Art: Everything goes so well in a few months that the company is back on track, plans to outsource IT and to break up the company are cancelled, the selfish head of marketing is canned, and our hero is promoted to CIO and put on the fast track to corporate second in command. Sorry: nothing this bad gets that good that easily. It is a fable after all, and too hard to swallow.

The Phoenix Project is a unique book – when was the last time that you read an actual novel about IT management?! It was worth reading, and if it introduces devops and Lean ideas to more people in IT, The Phoenix Project will have accomplished something useful. But it’s not a book that you can use to get things done. There are lessons and recipes and patterns but they take work to pull out of the story. There’s no index, no good way to go back to find things that you thought were useful or interesting. So I am looking forward to the Devops Cookbook: something practical that hopefully explains how these ideas can work in businesses that don’t look all like Facebook and Twitter.