Thursday, 19 September 2013

Code Reviews Change Over Time


We've been doing code reviews for about 4 years now.


Getting Started with Code Reviews



From the start, developers would help each other out, look at code when someone asked, or sometimes a lead or a senior developer would step in and review code if we were seeing problems in testing or if someone had just joined the team and we anticipated that they needed some extra oversight. We also hired an expert to come in and do a security code review.


After we launched the system, we decided to be proactive and started risk-based reviews: we asked anyone writing code in high-risk areas (like framework and security plumbing, APIs, core business logic, or areas where we had seen problems before) to get a code review.
We could have stopped there, and got a lot of value out of code reviews. But we decided to keep going, and made code reviews a standard practice.


This didn't happen overnight. It wasn't difficult to convince the team that code reviews could help – they’d already seen good results from the risk-based reviews that we had done. But it was difficult to change the way that people worked, and to make sure that they had enough time to do reviews properly: time to schedule and carry out the reviews, and time to understand and act on the feedback. And it also took a while to come up with an effective code review process.


At first we asked developers to pick a buddy and arrange a review. The results were mixed. Sometimes a developer would shop around for someone nice or someone really busy who they figured would let them off lightly; or two developers would trade-off (I’ll show you mine if you’ll show me yours) so they could get the reviews over with as quickly as possible. Because people didn't know how much time they needed to get reviews completed, reviews were often left until too late, and done after the code was already tested and released.



And because most people didn't have a lot of experience doing reviews, they weren't sure what they should look for, and how to give meaningful feedback. Developers were frustrated (he told me to do it this way, but on another change she told me that I should do it that way) and sometimes upset by negative criticism.


Eventually we decided that the leads would do most of the reviews. Although this added a lot more to the leads’ workload, and meant that they couldn't do as much coding themselves, it helped in a few ways. The lead developers usually had a much better understanding of the requirements and what the code was supposed to do than another developer who happened to be available, which mean that they had a better chance of finding real mistakes. And because the same person was doing most of their reviews, developers received consistent feedback.


How we do Code Reviews



How we do reviews has stayed pretty much the same over the years.



Non-trivial code changes are reviewed (either before or after the check-in) regardless of who wrote the code or what the code does.
We don’t hold formal review meetings with a projector or print outs, and we haven’t found it necessary to use a review tool like Code Collaborator or Crucible or the internal tools that Google or Facebook use to manage and track reviews, although looking back adopting a tool like this early might have helped the team start off better.


Sometimes reviews are done face-to-face, but most of the time they’re done offline: the reviewer and developer exchanging information and maybe patch files by email, because we've found this to be more convenient and easier for everyone to schedule (although people will meet to go through the review in detail if they decide it is necessary).


What has changed over time is what the reviewers look for, and what they find.


Reviews for Correctness



It seems like programmers spend more time arguing over what to look for in code reviews than they actually spend doing code reviews.



We started doing code reviews to improve code quality and to find problems that we weren't catching in testing. Not as a way of teaching inexperienced developers how to write better code, or a way to spread knowledge about how the code works across the team. These might be happy side effects, but reviews should be about making sure that the code works right.



Instead of using long check lists, reviewers started off by asking a small set of questions when they looked at code:



  1. Does the code do what it is supposed to, and does it look like the logic is correct?




    Are there any obvious coding mistakes or code that looks fishy? Is there anything obviously missing or incomplete? What has to be fixed to make sure that the code will work correctly?



    Reviewing for correctness is hard work – a reviewer has to spend some time understanding the requirements, and more time thinking about the code and trying to get into the head of the person who wrote it.



    But even without a lot of time or familiarity, a good reviewer can still see glaring logical mistakes and oversights, inconsistencies (you changed a, b, and d, but you didn't change c. Was this on purpose?), common coding mixups (looking out for < instead of <= or sometimes > in comparisons, off-by-one errors, using the wrong variables in calculations or comparisons – buyer instead of seller, inviter instead of invitee), debugging code left on by accident, redundant code and checks that don’t seem to be needed and other suspect code that doesn't look right or is confusing.



    Reviewers can also look out for basic security issues (access control rules/roles being applied properly, correct handling of sensitive data), backwards compatibility, and proper use of APIs and language features (especially for people new to the team).



    And while we don’t ask reviewers to spend a lot of time reviewing automated tests, they may also do spot checks to look for holes in the test suite – especially if they find an obvious coding mistake (which means that some tests are either missing or wrong).





  2. Does this code look the same and work the same as the rest of the code base?



    Does it follow the style and conventions that the team has agreed to (indentation, headers, capitalization and naming conventions, import sequence…)? Does it use standard templates and auto-formatting rules? We’re not talking about nitpicky perfection here, but we don’t want every piece of code to look different either.



    Architectural conventions. Does it use the framework and common libraries properly? Does it properly follow the big patterns (MVC etc) that we want people to understand and use?




    We wanted developers to work out issues around consistency and style early on while the team was coming together and everyone was still learning and exploring. This turned out to take longer and cause more trouble than we expected. Not only is it tedious to check for, but style can also be a near religious thing for some people. A developer won’t argue over whether something is a bug or not, at least not after they've understood what’s wrong and why. And most developers appreciate learning how to use the framework or language more effectively, as long as this is done in a polite and respectful way. But some people get seriously wound up over aesthetics and style, what kind of code looks cleaner or is more elegant.



Reviewers could offer other feedback of course, advice and suggestions on how to simplify the solution and improve the code, but we left it up to the developer when or if to act on this advice as long as the code is correct and follows conventions.


Reviewing for Understandability



Over time, what you look for and what you get out of code reviews should change. Because the code has changed. And the people doing the work – the developers and the reviewers – have changed too.



It got easier for reviewers to find bugs and bad code as developers learned more about how to use the code checkers in their IDEs and after we found some good static analysis tools to automatically check for common coding bugs and bad coding practices. This mean that reviewers could spend their time looking for more important design mistakes, or timing errors or sequence accounting errors or concurrency bugs like race conditions and potential deadlocks, and other problems that the tools can’t find.


As the team gained more experience supporting the system in production, troubleshooting and fixing problems and working with Ops, reviewers started to look more closely at defensive programming and run-time safety: error and exception handling (especially exception conditions that are difficult to test for), data validation and limits checking, timeouts and retries, properly enforcing API contracts, and logging and alerting. Making sure that the system “wouldn't boink”.


Because we worked out most of the consistency issues early, it stopped being necessary to review for consistency after a while. There was lots of code, so it was natural and easy for developers to build on what was already there, and to keep following the existing style and patterns.


But this also meant that there was a lot more code that had to be read and understood. So code reviewers became more concerned with anything that made the code harder to read and understand. Although fussing over details like method and variable names and whether comments are needed or make sense might not seem that important, all of this feeds back into correctness – reviewers can’t tell if the code is correct if they aren't sure that they really understand it.


Reviewers also started looking out more for unnecessary code duplication because clones create more opportunities for mistakes when you have to make changes, and code that that hasn't been updated to use new libraries or APIs, or code that isn't needed any more (especially a problem if you use feature switches and don’t get around to removing them),
or refactoring that didn't get finished or that got out of control.



Understanding, not Criticizing



Reviews are about understanding the code, and making sure that it works right, not criticizing it.


It’s important to stay out of arguments about “what I think good code is vs. what you think good code is”. A review should never run along the lines of “I know what good OO code is supposed to look like, and apparently you don’t, so let me show you.”


A review should only be about “I have to be able to understand this code well enough to make sure that it works, and in case I have to fix something in this code myself, even though I wouldn't have written it this way and wish that you hadn't (but I’ll keep this part to myself).”


When reviewers step across this line, when people’s pride and sense of professional self-worth are involved, things can get ugly fast.


Besides avoiding arguments, it’s a waste of time giving feedback and suggestions that nobody is going to act on because they’re too busy to get to it or they don’t agree or it isn't necessary. If you want to get good value out of reviews, keep them focused on what’s important.


The main questions that reviewers try to answer now when they look at code are:



  1. Does this code do what it is supposed to do? And will it keep working even if something goes wrong?



    Reviewers keep looking for the same kinds of obvious logic and functional problems as they did before. But they also ask: What happens if it runs into an error? Will Ops be able to tell what the problem was? What happens if some other part of the system that it relies on fails? Will it retry and recover, will it fail gracefully, or will it go boom? What’s the chance that this code could get garbage passed to it, and what would happen if it did? And can it pass garbage to some other part of the system?





  2. Can I be sure that this code does what it is supposed to do?
    Can I understand the code? Can I follow the changes that were made?



    Is the code maintainable?
    Could I fix it in production if something went wrong? Is there any code too that is cute or too clever for its own good? Are there any obvious, safe and simple things that can be done to make the code simpler and easier to understand and easier to change?



Some things get better with time, other things don’t



By looking at changes to code bases over time, Michael Feathers has found that most code is rarely or never changed once it is written; most changes are made to over and over to the same small percentage of the code base.


Because reviewers keep seeing the same code being changed, it changes how they work:



  1. Reviewers who have seen the code before don’t have to spend as much time figuring out what’s going on, they understand the context and have a better chance of understanding the changes and whether the changes are done correctly.



    Looking at the same code again, reviewers may see legacy problems that they didn't notice before. And they should be able to offer more valuable, less superficial feedback on how the code can be simplified and improved.



    There’s a trade-off being made here too of course. Reviewers can also get stale looking at the same code, and stop looking closely enough. And the reviewer’s biases and blind spots will get amplified over time.





  2. As reviewers work with the same developers over and over, they will get to know each other better. Reviewers will learn what to look out for, what the developer’s strengths and weaknesses are, the developer’s tendencies and biases and blind spots, the kinds of mistakes and oversights that they tend to make.



    As developers get to know the reviewers better, feedback will get shorter and simpler, and there will be fewer misunderstandings.



    And because developers will know more about what to expect from a review, they will start to check their own code more carefully, and try to find the problems before the reviewer does. This is what Jason Cohen at SmartBear calls “the Ego Effect”: developers write better code because they know that somebody is going to be looking at it closely, especially if it’s someone they respect. Some people feel that this is the most important reason to do code reviews – to make developers work more carefully, so they don’t look foolish in front of somebody else; and to think about writing code that other people will read.




Avoiding Diminishing Returns



Like any practice in software development, you will eventually see diminishing returns from code reviews especially if you keep doing the same things, looking for the same problems in the same way. You may even reach a point where code reviews aren't worth the cost or time.


This hasn't happened to us. We continue to get a lot of value out of reviews, maybe even more now.


Even as our tools have improved and as we've strengthened our testing capability we keep doing reviews
because bug checking tools, reviews and testing find different problems and different kinds of problems.

We've also found that reviews make testing more effective and efficient, because there’s less back-and-forth between developers and testers over mistakes that are caught and corrected earlier.


And now that reviews are an accepted part of how people work, we don’t have to spend time selling and re-selling the value of code reviews to the team or to the business.
As long as code reviews are being taken seriously, and as long as you care about writing good code, they’re worth doing.


Saturday, 14 September 2013

Apple Jack 1&2 now available on Desura

Hello again,

Just a little message to say that Apple Jack 1&2 is now available on Desura:

Apple Jack 1&2

Thursday, 12 September 2013

The Real Cost of Change in Software Development


There are two widely opposed (and often misunderstood) positions on how expensive it can be to change or fix software once it has been designed, coded, tested and implemented. One holds that it is extremely expensive to leave changes until late, that the cost of change rises exponentially. The other position is that changes should be left as late as possible, because the cost of changing software is – or at least can be – essentially flat (that’s why we call it “soft”ware).



Which position is right? Why should we care? And what can we do about it?



Exponential Cost of Change



Back in the early 1980s, Barry Boehm published some statistics (Software Engineering Economics, 1981) which showed that the cost of making a software change or fix increases significantly over time – you can see the original curve that he published here.




Boehm looked at data collected from Waterfall-based projects at TRW and IBM in the 1970s, and found that the cost of making a change increases as you move from the stages of requirements analysis to architecture, design, coding, testing and deployment. A requirements mistake found and corrected while you are still defining the requirements costs almost nothing. But if you wait until after you've finished designing, coding and testing the system and delivering it to the customer, it can cost up to 100x as much.




A few caveats here. First, the cost curve is much higher in large projects (in smaller projects, the cost curve is more like 1:4 instead of 1:100). Those cases where the cost of change rises up to 100x are rare, what Boehm calls Architecture-Breakers, where the team gets a fundamental architectural assumption wrong (scaling, performance, reliability) and doesn't find out until after customers are already using the system and running into serious operational problems. And this analysis was all done on a small data sample from more than 30 years ago, when developing code was much more expensive and time-consuming and paperworky, and the tools sucked.




A few other studies have been done since then which mostly back up Boehm's findings – at least the basic idea that the longer it takes for you to find out that you made a mistake, the more expensive it is to correct it. These studies have been widely referenced in books like Steve McConnell’s Code Complete, and used to justify the importance of early reviews and testing:


Studies over the last 25 years have proven conclusively that it pays to do things right the first time. Unnecessary changes are expensive. Researchers at Hewlett-Packard, IBM, Hughes Aircraft, TRW, and other organizations have found that purging an error by the beginning of construction allows rework to be done 10 to 100 times less expensively than when it's done in the last part of the process, during system test or after release (Fagan 1976; Humphrey, Snyder, and Willis 1991; Leffingwell 1997; Willis et al. 1998; Grady 1999; Shull et al. 2002; Boehm and Turner 2004).



In general, the principle is to find an error as close as possible to the time at which it was introduced. The longer the defect stays in the software food chain, the more damage it causes further down the chain. Since requirements are done first, requirements defects have the potential to be in the system longer and to be more expensive. Defects inserted into the software upstream also tend to have broader effects than those inserted further downstream. That also makes early defects more expensive.




There’s some controversy over how accurate and complete this data is, how much we can rely on it, and how relevant it is today when we have much better development tools and many teams have moved from heavyweight sequential Waterfall development to lightweight iterative, incremental development approaches.



Flattening the Cost of Changing Code



The rules of the game should change with iterative and incremental development – because they have to.




Boehm realized back in the 1980s that we could catch more mistakes early (and therefore reduce the cost of development) if we think about risks upfront and design and build software in increments, using what he called the Spiral Model, rather than trying to define, design and build software in a Waterfall sequence.




The same ideas are behind more modern, lighter Agile development approaches. In Extreme Programming Explained (the 1st edition, but not the 2nd) Kent Beck states that minimizing the cost of change is one of the goals of Extreme Programming, and that a flattened change cost curve is “the technical premise of XP”:


Under certain circumstances, the exponential rise in the cost of changing software over time can be flattened. If we can flatten the curve, old assumptions about the best way to develop software no longer hold…




You would make big decisions as late in the process as possible, to defer the cost of making the decisions and to have the greatest possible chance that they would be right. You would only implement what you had to, in hopes that the needs you anticipate for tomorrow wouldn't come true. You would introduce elements to the design only as they simplified existing code or made writing the next bit of code simpler.

It’s important to understand that Beck doesn't say that with XP the change curve is flat. He says that these costs can be flattened if teams work towards this, leveraging key practices and principles in XP, such as:


  • Simple Design, doing the simplest thing that works, and deferring design decisions as late as possible (YAGNI), so that the design is easy to understand and easy to change



  • continuous, disciplined Refactoring to keep the code easy to understand and easy to change



  • Test-First Development – writing automated tests upfront to catch coding mistakes immediately, and to build up a testing safety net to catch mistakes in the future





  • developers collaborating closely and constantly with the Customer to confirm their understanding of what they need to build and working together in Pairs to design solutions and solve problems, and catch mistakes and misunderstandings early



  • relying on working software over documentation to minimize the amount of paperwork that needs to be done with each change (write code, not specs)

  • the team’s experience working incrementally and iteratively – the more that people work and think this way, the better they will get at it.





All of this makes sense and sounds right, although there are no studies that back up these assertions, which is why Beck dropped this change curve discussion from the second edition of his XP book.
But by then the idea that change could be flat with Agile development had already become accepted by many people.



The importance of Feedback



Scott Amber agrees that the cost curve can be flattened in Agile development, not because of Simple Design, but because of the feedback loops which are fundamental to iterative, incremental development. Agile methods optimize feedback within the team, developers working closely together with each other and with the Customer and relying on continuous face-to-face communications. And following technical practices like Test-First Development and Pair Programming and Continuous Integration makes these feedback loops even tighter.




But what really matters is getting feedback from the people using the system – it’s only then that you know if you got it right or what you missed. The longer that it takes to design and build something and get feedback from real users, the more time and work that is required to get working software into a real customer’s hands, the higher your cost of change really is.




Optimizing and streamlining this feedback loop is what is driving the Lean Startup approach to development: defining a Minimum Viable Product (something that just barely does the job), getting it out to customers as quickly as you can, and then responding to user feedback through Continuous Deployment and A/B testing techniques until you find out what customers really want.



Even flat change can still be expensive



Even if you do everything to optimize these feedback loops and minimize your overheads, this still doesn’t mean that change will come cheap. Being fast isn’t good enough if you make too many mistakes along the way.




The Post Agilist uses the example of painting a house: assume that it costs $1,000 each time you paint the house, whether you paint it blue or red or white. The cost of change is flat. But if you have to paint it blue first, then red, then white before everyone is happy, you’re wasting time and money.


“No matter how expensive or change the ‘cost of change’ curve may be, the fewer changes that are made, the cheaper and faster the result will be…Planning is not a four letter word.” [however, I would like to point out that “plan” is].

Spending too much time upfront in planning and design is waste. But not spending enough time upfront to find out what you should be building and how you should be building it before you build it, and not taking the care to build it carefully, is also a waste.


Change gets more expensive over time



You also have to accept that the incremental cost of change will go up over the life of a system, especially once a system is being used. This is not just a technical debt problem. The more people using the system, the more people who might be impacted by the change if you get it wrong, the more careful you have to be, which means you need to spend more time on planning and communicating changes, building and testing a roll-back capability, and roll changes out slowly using Canary Releases and Dark Launching – which add costs and delays to getting feedback.


There are also more operational dependencies that you have to understand and take care of, and more data that you have to change or fix up, making changes even more difficult and expensive. If you do things right, keep a good team together and manage technical debt responsibly, these costs should rise gently over the life of a system – and if you don’t, that exponential change curve will kick in.



What is the Real Cost of Change?



Is the real cost of change exponential, or is it flat? The truth is somewhere in between.



There’s no reason that the cost of making a change to software has to be as high as it was 30 years ago. We can definitely do better today, with better tools and better, cheaper ways of developing software. The keys to minimizing the costs of change seem to be:




  1. Get your software into customer hands as quickly as you can. I am not convinced that any organization really needs to push out software changes 10-50-100x a day, but you don’t want to wait months or years for feedback either. Deliver less, but more often. And because you’re going to deliver more often, it makes sense to build a Continuous Delivery pipeline so that you can push changes out efficiently and with confidence. Use ideas from Lean Software Development and maybe Kanban to identify and eliminate waste and to minimize cycle time.




  2. We know that even with lots of upfront planning and design thinking, we won’t get everything right upfront - this is the Waterfall fallacy. But it’s also important not to waste time and money iterating when you don’t need to. Spending enough time upfront in understanding requirements and in design to get it at least mostly right the first time can save a lot later on.


  3. And whether you’re working incrementally and iteratively, or sequentially, it makes good sense to catch mistakes early when you can, whether you do this through Test First Development and pairing, or requirements workshops and code reviews, whatever works for you.



Wednesday, 11 September 2013

Apple Jack 1&2 in Indie Royale bundle.

Hello!

If you mosey on over to the latest Indie Royale bundle situated here:

http://www.indieroyale.com/

you will find none other than the PC versions of Apple Jack 1&2! See!

Needless to say, the more of these bundles that are sold, the more money I get from them, so if you haven't already got them on Xbox now will be a perfect time to pick them up! You even get four other games absolutely free. I actually don't know very much about them but I'll have a guess based on their names:

Thunder Wolves - a game where you play as a pack of wolves caught in a thunder storm.

Stellar Impact - a game set in 2008 where you play as Stellar McCartney whacking Heather Mills over the head with one of her false legs.

DIVO - a rhythm action game starring operatic pop vocal group Il Divo. 

100% Orange Juice - a game where you play as the quality control manager for Tropicana as he tests batches of orange juice to ensure that they comply fully with the trades description act.

And all of that for the price of a good-quality frozen pizza!


PS - Apple Jack 1&2 have also been accepted by Desura and should be available there very shortly. If I have promised you a code for the game they will be sent as soon as it is up.

Tuesday, 3 September 2013

This is how Facebook Develops and Deploys Software. Should you care?

A recently published academic paper by Prof. Dror Feitelson at Hebrew University, Eitan Frachtenberg a research scientist at Facebook, and Kent Beck (who is also doing something at Facebook), describes Facebook’s approach to developing and deploying their front-end software. While it would be more interesting to understand how back-end development is done (this is where the real heavy lifting is done scaling up to handle hundreds of millions of users), there are a few things in the paper that are worth knowing about.




Continuous Deployment at Facebook is not Continuous Deployment



Rather than planning work out in projects or breaking work into time boxed Sprints, Facebook developers do most of their work in independent, small changes which are released frequently. This makes sense in Facebook’s online business model, everyone constantly tuning the platform and trying out new options and applications in different user communities, seeing what sticks. It’s a credit to their architecture that so many small, independent changes can actually be done independently and cheaply.




Facebook says that they follow Continuous Deployment, but it’s not Continuous Deployment the way that IMVU made popular where every change is pushed out to customers immediately, or even how a company like Etsy does Continuous Deployment.



At Facebook, code can be released twice a day, but this is done mostly for bug fixes and internal code. New production code is released once per week: thousands of changes by hundreds of developers are packaged up by their small release team on Sundays, run through automated regression testing, and released on Tuesday if the developers who contributed the changes are present.
Release engineers assess the risk of changes based on the size of the change, the amount of discussion done in code reviews (which is recorded through an internal code review tool), and on each developer’s “push karma”: how many problems they have seen from code by this developer before.



A tool called “Gatekeeper” controls what features are available to what customers to support dark launching, and all code is released incrementally – to staging, then a subset of users, and so on. Changes can be rolled-back if necessary – individually, or, as a last resort, an entire code release. However, like a lot of Silicon Valley devops shops, they mostly follow the “Real Men only Roll Forward” motto.



Code Ownership



A key to the culture at Facebook is that developers are individually responsible for the code that they wrote, for testing it and supporting it in production. This is reflected in their code ownership model:


Developers must also support the operational use of their software — a combination that’s become known as “devops.” This further motivates writing good code and testing it thoroughly. Developers’ personal stake in keeping the system running smoothly complements the engineering procedures and lets the system maintain quality at scale. Methodologies and tools aren’t enough by themselves because they can always be misused. Thus, a culture of personal responsibility is critical.




Consequently, most source files are modified by only a few engineers. Although at least one other engineer reviews all changes before they’re committed, a third of the source files have only been edited by one engineer, and another quarter by two. Only 10 percent of the files are handled by more than seven engineers. On the other hand, the distribution of engineers per file has a heavy tail, with the most widely shared file handled by no fewer than 870 distinct engineers. These widely shared files are predominantly library files and also include major configuration and top-level PHP files.



Testing? We don’t need no stinking testing…



Facebook doesn't have an independent test team, because, they say, they don’t need one.



First, they depend a lot on code reviews to find bugs:


At Facebook, code review occupies a central position. Every line of code that’s written is reviewed by a different engineer than the original author. This serves multiple purposes: the original engineer is motivated to ensure that the code is of high quality, the reviewer comes with a fresh mind and might find defects or suggest alternatives, and, in general, knowledge about coding practices and the code itself spreads throughout the company.

Developers are also responsible for writing unit tests and their own regression tests – they have “tens of thousands of regression tests” (which doesn't sound like near enough for 10+ million lines of mostly PHP code compiled into C++, both languages which are easy to make coding mistakes in) and automated performance tests.



And developers also test the software by using the development version of Facebook for their personal Facebook use. According to the authors, “this is just one aspect of the departure from traditional software development”. But Facebook developers using their own software internally (and passing this off as “testing”) is no different than the early days at Microsoft where employees were supposed to “eat their own dog food”, a practice that did little if anything to improve the quality of Microsoft products.



Facebook also depends on customers to test the software for them. Software is released in steps for A/B testing and “live experimentation” on subsets of the user base, whether customers want to participate in this testing or not. Because their customer base is so large, they can get meaningful feedback from testing with even a small percentage of users, which at least minimizes the risk and inconvenience to customers.



Security???



While performance is an important consideration for developers at Facebook, there is no mention of security checks or testing anywhere in this description of how Facebook develops and deploys software. No static analysis, dynamic analysis/scanning, pen testing or explanation of how the security team and developers work together, not even for “privacy sensitive code” – although this code is “held to a higher standard” they don’t explain what this “higher standard” is. Presumably they rely on use of libraries and frameworks to handle at least some appsec problems, and possibly look for security bugs in their code reviews, but they don't say.



There isn’t much information available on Facebook’s appsec program anywhere. The security team at Facebook seems to spend a lot of time educating people on how to use Facebook safely and how to develop Facebook apps safely and running their bug bounty program which pays outsiders to find security bugs for them.



A search on security on Facebook mostly comes back with a long list of public security failures, privacy violations and application security vulnerabilities found over the years and continuing up to the present day. Maybe the lack of an effective appsec program is the reason for this.



This is the way Facebook is Developed. Should you care?



While it’s interesting to get a look inside a high-profile organization like Facebook and how they approach development at scale, it’s not clear why this paper was written. There is little about what Facebook is doing (on their front-end development at least) that is unique or innovative, except maybe the way they use BitTorrent to push code changes out to thousands of servers like Twitter does, something that I already heard about a few years ago at Velocity and that has been written about before.



I like the idea of developers being responsible for their work, all the way into production, which is a principle that we also follow. Code reviews are good. Dark launching features is a good practice and has been a common practice in systems for a long time (even before it was called "dark launching"). Not having testers or doing appsec is not good. Otherwise, I'm not sure what the rest of us can learn from or would want to use from this.







Thursday, 22 August 2013

Test Deafness

A few months ago I was talking to a musician friend of mine. He is also Brazilian, same age as me, and moved to the UK on the same year I did. As we were talking about music, I mentioned I like punk rock. He said he likes punk rock as well, but as a musician, he listens to a bit of everything. The conversation was going well until he asked me what my favourite bands were. "Legiao Urbana" is my favourite Brazilian band I said. "Seriously?" he said with a puzzled face. "They are rubbish, man."

As he was a friend (he became an enemy after that stupid comment), I thought to myself: How can I make him suffer for the rest of his life? Luck for him, I remembered that I was not a Brazilian savage anymore. I was British now and I had to act like one: "Oh, are they? Thanks for sharing that with me. Would you mind telling me what your favourite bands are? More tea?" He then named a few bands, including some Forró bands. Shock! Horror!! Blasphemy!!! I really wanted to kill him right there. Fuck the British citizenship. How could a guy, that also liked punk rock music, say that the band I liked was rubbish and then name some stupid Forro bands as his favourite bands?

After quite a long list of swear words pronounced in a very loud tone, I asked him to elaborate. All songs of Legiao Urbana, he said, are played with three or four chords maximum. Their lyrics are great, but they are very poor musicians. The Forro bands are totally the opposite. The lyrics suck but no one cares. Their are great musicians that focus on creating music for people to dance.

That conversation made me realise something really important. If you are a person like me, good music is related to good and strong lyrics. For a musician like my friend, good music is about the techniques used by other musicians when playing their instruments, regardless of the lyrics. For a person that likes to sing, she may appreciate opera, even if she doesn't have a clue about what the words mean.

But what does it have to do with tests?

You cannot expect to produce quality code just by listening to your test. If you don't know what good code looks like, you are pretty much test deaf. Musicians have a trained ear to listen to each instrument individually while the music is playing. They can also imagine, quite precisely, how different instruments could be combined to create new music.

Walking around asking other developers to listen to their tests, as if this advice alone would make them produce quality code immediately, doesn't work. It may make us look smart (or at least feel smart) but it does not really help the other developers receiving the advice. It's too vague.

If we want to produce quality code, we should study the concepts and techniques that lead to it: Domain Driven Design, Clean Code, SOLID principles, design patterns, coupling, cohesion, different programming languages and paradigms, architecture, just to name a few. Once we get a good understanding of all these things, we will have an implied knowledge about what constitutes good code. This implied knowledge is what may cure our test deafness, allowing us to listen to our tests in order to produce quality code. 

Getting Application Security Vulnerabilities Fixed


It’s a lot harder to fix application security vulnerabilities than it should be.



In their May 2013 security report, WhiteHat Security published some discouraging findings about how many application security vulnerabilities found in testing get fixed, and how long it takes to fix them. They found that only 61% of serious security vulnerabilities get fixed, and that on average, it takes 193 days to get them fixed.


Why some vulnerabilities get fixed, or don’t get fixed



Convincing management and the customers paying for software development work – and the developers that need to do the work – that security vulnerabilities really need to be fixed is one part of the problem. Proving that this can be done in a cost effective and safe way is another.



For most organizations, compliance – not operational risk, not customer requirements or other commercial considerations, and not concern for quality – determines whether security vulnerabilities get fixed. WhiteHat customers reported that the #1 reason that a vulnerability gets fixed is because it is required by compliance. And the #1 reason that people don’t fix a vulnerability is because it isn't required by compliance.



One of the other factors that influences how many security vulnerabilities get fixed and how fast they can be fixed is where the system is in its life. It’s a lot easier and much less expensive to fix security vulnerabilities found early in a project, before you’ve written a lot of code that needs to be reviewed, fixed and tested again; when the situation and the system are both plastic enough that you can make course corrections without a lot of time or cost.



Obviously it’s a much different story for legacy systems on life support, where nobody really understands the code or is confident that they can change it safely, and nobody is sure how long the system is going to be around (although these systems almost always hang on longer than anyone expects).



Everything in between is where decisions are difficult to make: the system is already in use and has been for a while, and the team maintaining and supporting it has a full book of committed work to deal with. It can be hard to make fixing security vulnerabilities a priority when things seem to be running fine, and everyone is busy trying to keep it this way, unless maybe compliance is standing over you holding a big enough hammer that you have to do something to show that you are taking them seriously.


What’s it going to cost?



If you can make the case that there are serious security problems that that need to be taken care of, where do you start? A security review could uncover hundreds or even thousands of vulnerabilities – the first time that you do a security scan or pen test of a big system can be overwhelming. How much work is it going to take to “make the system secure”, what is it going to cost?



Denim Group has done some interesting research on understanding how much work is involved in fixing security vulnerabilities.



Like any other bugs in code, some vulnerabilities are easier to find and fix than others. A XSS vulnerability can take anywhere between 10 minutes (stored XSS) to about an hour and a half (stored and reflected) to fix – and most web apps have at least one, often hundreds of these problems. Fixing a SQL Injection problem also takes an hour and a half on average. A missing authorization check? Only 7 minutes. And like any other bugs in code, the coding work is only a small part of the time taken to get the fix done (on average, 30% of the total time). Testing takes around half of the time, and the rest is in getting things setup for making the change, getting the new code built and deployed, and overhead.



Unlike a functional bug, the customer won’t see any immediate advantage in fixing a vulnerability – the code works fine right now as far as they can see. So it’s important that you can fix vulnerabilities without spending too much time or money doing it, and that you can do it without breaking whatever is already working.


This is why Nick Galbreath stresses the value of a Continuous Deployment capability as a pre-requisite to a successful software security program, leveraging Continuous Integration and Continuous Delivery tools and practices so that when developers check in code changes the system is automatically built, tested and it can be automatically deployed if all of the tests pass. It’s not about pushing every code change out immediately – it’s about having a proven pipeline in place for rolling changes out to production quickly and with minimal risk, knowing that you can make fixes and get them out cheaply and with confidence, and that you are able to respond to an emergency if you have to. This will pay dividends outside of security work, reducing the cost and risk of making any software change.


Getting security bugs fixed



Denim Group explains that remediating software security vulnerabilities has to be managed like any other software development project, and they provide some guidelines on how to do it
in a waterfally kind of way, with upfront stakeholder engagement and planning: an approach that can work fine for many organizations, especially larger ones.



A more iterative, Agile approach could start with a short, time-boxed spike. Take a couple of smart developers and give them a couple of weeks to review the list of vulnerabilities (if possible with whoever found them), understand which ones are serious and filter out false positives, and pick some vulnerabilities to fix (a few each of different kinds). They should choose which vulnerabilities to work on by trading off what is easy to understand and fix, against the risk of not fixing them. Tools like the OWASP Top 10 or SANS/CWE Top 25 can help with understanding the issues and making these decisions.



SQL Injections would make a good first choice: a serious vulnerability that is easy to exploit and that can have serious consequences, but also easy for a developer to understand and fix.

Or a missed authorization check: another potentially serious bug that should be easy to understand, trivial to fix and test. A problem like a mistake in secure password storage
might be technically harder to solve, but still easy to isolate and test. Adding server-side validation (instead of validation only at the client) is another easy and good place to start.



It is important that the developers take the time to understand what they are doing and how to do it right: that they understand the vulnerability, why it is a problem, how to fix it correctly, and how to test it (test it to make sure that they actually closed the security hole, and regression test it to make sure that they didn't break anything else by accident). The important thing here isn't to make a few fixes – it’s to learn what’s involved in correctly solving these problems, and know that you can build and deploy the fixed code properly.



Then run another spike. Pick a few other bugs, maybe some that are harder to understand and fix, and some that are easy to fix but less serious (like missing error handling or information leaks), and run through the same steps again.



With a small investment of time like this, you should get an understanding of what work needs to be done, how to do it, what it’s going to cost, and you should also have the confidence that you can do it safely. You should have good enough information to estimate the amount of work that it will take to fix the remaining problems; and a good enough understanding of risk and cost trade-offs that need to be made, what problems need to be fixed – and can be fixed – sooner or later.



Now you can add the remaining bugs that you plan to fix to your backlog. You might decide to fix as many of them as you can all at once in a hardening sprint, or prioritize and fix them with the other work in your backlog.



You can’t fix, or effectively plan to fix, security vulnerabilities until you understand them. Once you understand the problems (what the bugs are, what needs to be fixed and why), and how much it is going to cost to fix them, and once you have the confidence that you can fix them properly, you can treat security vulnerabilities like other bugs – decide what needs to be fixed and when by trading off cost and risk with the other work that you have to get done. Remediation work becomes just another software development problem to be managed, something that developers and managers already know how to do.