In my last post, I talked about Code Ownership models, and why you might want to choose one code ownership model (strong, weak/custodial or collective) over another. Most of the arguments over code ownership focus on managing people, team dynamics, and the effects on delivery. But what about the longer term effects on the shape, structure and quality of code – does the ownership model make a difference? What are the long-term effects of letting everyone working on the same code, or of having 1 or 2 people working on the same pieces of code for a long time?
Collective Code Ownership and Code Quality
Over time, changes tend to concentrate in certain areas of code: in core logic and in and behind interfaces (listen to Michael Feathers’ fascinating talk Discovering Startling Things from your Version Control System). This means that the longer a system has been running, the more chances there are for people to touch the same code.
Some interesting research work backs up what should be obvious: that the people who understand the code the best are the people who work on it the most, and the people who know the code the best make less mistakes when changing it.
In Don’t Touch my Code!, researchers at Microsoft (BTW, the lead author Christian Bird is not a relative of mine, at least not a relative who I know) found that as more people touch the same piece of code, it leads to more opportunities for misunderstandings and more mistakes. Not surprisingly, people who hadn't worked on a piece of code before made more mistakes, and as the number of developers working on the same module increased, so did the chance of introducing bugs.
Another study, Ownership and Experience in Fix-Inducing Code tries to answer which is more important in code quality: “too many cooks spoil the broth”, or “given enough eyeballs, all bugs are shallow”? Does more people working on the same code lead to more bugs, or does having more people working on the code mean that there are more chances to find bugs early? This research team found that a programmer’s specific experience with the code was the most important factor in determining code quality – code that is changed by the programmer who does most of the work on that code is of higher quality than code written by someone who doesn't normally work on the code, even if that someone is a senior developer who has worked on other parts of the code. And they found that the fewer the people working on a piece of code, the fewer the bugs that needed to be fixed.
And a study on contributions to Linux reinforces that as the number of developers working on the same piece of code increase, the chance of bugs and security problems increases significantly: code touched by more than 9 developers is 16x more likely to have security vulnerabilities, and more vulnerabilities are introduced by developers who are making changes across many different pieces of code.
Long-term Effects of Ownership Approach on Code Structure
I've worked at shops where the same programmers have owned the same code for 3 or 4 or 5 or even 10 years or sometimes even longer. Over that time, that programmer’s biases, strengths, weaknesses and idiosyncrasies are all amplified, wearing deep grooves in the code. This can be a good thing, and a bad thing.
The good thing is that with one person making most or all of the changes, internal consistency in any piece of code will be high – you can look at a piece of code written by that developer and once you understand their approach and way of thinking, the patterns and idioms that they prefer, everything should be familiar and easy to follow. Their style and approach might have changed over time as they learned and improved as a developer, but you can generally anticipate how the rest of the code will work, and you’ll recognize what they are good at and what their blind spots are, what kind of mistakes they are prone to: as I mentioned in the earlier post, this makes code easier to review and easier to test and so easier to find and fix bugs.
If a developer tends to write good, clean, tight code, and if they are diligent about refactoring and keeping the code clean and tight, then most of the code will be good, clean, tight and easy to follow. Of course it follows that if they tend to write sloppy, hard-to-understand, poorly structured code, then most of it will be sloppy, hard-to-understand and poorly-structured. Then again, even this can be a good thing – at least bad code is isolated, and you know what you have to rewrite, instead of someone spreading a little bid of badness everywhere.
When ownership changes – when the primary contributor leaves, and a new owner takes over, the structure and style of the code will change as well. Maybe not right away, because a new owner usually takes some time to get used to the code before they put their stamp on it, but at some point they’ll start adapting it – even unconsciously – to their own preferences and biases and ways of thinking, refactoring or rewriting it to suit them.
If a lot of developers have worked on the same piece of code, they will introduce different ideas, techniques and approaches over time as they each do their part, as they refactor and rewrite things according to their own ideas of what is easy to understand and what isn't, what’s right and wrong. They will each make different kinds of mistakes. Even with clear and consistent shared team conventions and standards, differences and inconsistencies can build up over time, as people leave and new people join the team, creating dissonance and making it harder to follow a thought through the code, harder to test and review, and harder to hold on to the design.
Ownership Models and Refactoring
But as Michael Feathers has found through mining version control history, there is also a positive Ownership Effect on code as more people work on the same code.
Over time, methods and classes tend to get bigger because it’s easier to add code to an existing method than to write a new method, and easier to add another method to an existing class than create a new class. By correlating the number of developers who have touched a piece of code with method size, Feathers research shows that as the number of developers working on a piece of code increases, the average method size tends to get smaller. In other words, having multiple people working on a code base encourages refactoring and simpler code, because people who aren't familiar with the code have to simplify it first in order to understand it.
Feathers has also found that code behind APIs tends to be especially messy – because some interfaces are too hard to change, programmers are forced to come up with their own workarounds behind the scenes. Martin Fowler explains how this problem is made worse by strong code ownership, which inhibits refactoring and makes the code more internally rigid:
In strong code ownership, there's my code and your code. I can't change your code. If I want to change the name of one of my methods, and it's called by your code, I've got to get you to change the call into me before I can change my name. Or I've got to go through the whole deprecation business. Essentially any of my interfaces that you use become published in that situation, because I can't touch your code for any reason at all.
There's an intermediate ground that I call weak code ownership. With weak code ownership, there's my code and your code, but it is accepted that I could go in and change your code. There's a sense that you're still responsible for the overall quality of your code. If I were just going to change a method name in my code, I'd just do it. But on the other hand, if I were going to move some responsibilities between classes, I should at least let you know what I'm going to do before I do it, because it's your code. That's different than the collective code ownership model.
Weak code ownership and refactoring are OK. Collective code ownership and refactoring are OK. But strong code ownership and refactoring are a right pain in the butt, because a lot of the refactorings you want to make you can't make. You can't make the refactorings, because you can't go into the calling code and make the necessary updates there. That's why strong code ownership doesn't go well with refactoring, but weak code ownership works fine with refactoring.
(Design Principles and Code Ownership)
Ownership, Technical Debt or Deepening Insight
An individual owner has a higher tolerance for complexity, because after all it’s their code and they know how it works and it’s not really that hard to understand (not for them at least) so they don’t need to constantly simplify it just to make a change or fix something. It's also easy for them to take short cuts, and even short cuts on short cuts. This can build up over time until you end up with a serious technical debt problem – one person is always working on that code, not because the problem is highly specialized, but because the code has reached a point where nobody else but Scotty can understand it and make it work.
There’s a flip side to spending more time on code too. The more time that you spend on the same problem, the deeper you can see into it. As you return to the same code again and again you can recognize patterns, and areas that you can improve, and compromises that you aren't willing to accept any more. As you learn more about the language and the frameworks, you can go back and put in simpler and safer ways of doing things. You can see what the design really should be, where the code needs to go, and take it there.
There's also opportunity cost of not sticking to certain areas. Focusing on a problem allows you to create better solutions. Specifically, it allows you to create a vision of what needs to be done, work towards that vision and constantly revise where necessary... If you're jumping from problem to problem, you're more likely to create an inferior solution. You'll solve problems, but you'll be creating higher maintenance costs for the project in the long term.
Jay Fields Taking a Second Look at Collective Code Ownership
So far I've found that the only way for a team to take on really big problems is by breaking the problems up and letting different people own different parts of the solution. This means taking on problems and costs in the short term and the long term, trading off quality and productivity against flexibility and consistency – not only flexibility and consistency in how the team works, but in the code itself.
What I've also learned is that whether you have a team of people who each own a piece of the system, or a more open custodian environment, or even if everyone is working everywhere all of the time, you can’t let people do this work completely on their own. It’s critical to have people working together, whether you are pairing in XP or doing regular egoless code reviews. To help people work on code that they’ve never seen before – or to help long-time owners recognize their blind spots. To mentor and to share new ideas and techniques. To keep people from falling into bad habits. To keep control over complexity. To reinforce consistency – across the code base or inside a piece of code.
No comments:
Post a Comment