In my years as a software engineer I was always drawn to the shiny new things. But time and time again I was confronted with code riddled with technical debt. If you work in tech you have probably heard of technical debt. For product managers it is the sword of Damocles, threatening delayed projects and rejected feature requests. For engineers it can be a tremendous source of frustration and a reason to quit jobs and move on. Why is technical debt so widespread and why is it so hard to beat?
I have finally come to terms with the fact that fighting technical debt is and will remain part of my career, from both a product management and an engineering perspective. This article aims to help product managers and technical leads understand and measure technical debt. I will go over the different types of technical debt and common metrics and approaches to measure it. I hope I can give you a fresh perspective on technical debt, one that considers both engineering and business.
Defining technical debt
Technical debt is a deficiency in quality which makes it harder to modify or extend the affected component. There are different types of technical debt. While it is often described in discrete categories, I like to think of it more like fractals. If you zoom in far enough you are dealing with local technical debt affecting single classes or methods. If you zoom out far enough you get a view of global or systemic technical debt. Here is a quick summary of those two extremes:
Local technical debt
This type of technical debt is isolated to a very small part of the system. This can range from code smells to the design or interface of single classes.
Local technical debt is generally less crippling to an organisation. This type of technical debt can be tackled on an individual level by shifting to a continuous improvement mindset and adopting industry best practices. If you’re looking for some quick input on how to tackle local technical debt, have a look at two principles: don’t live with broken windows, and the boy scout rule. There will be a follow-up article on how to tackle and prevent technical debt which will go into the details of these and other concepts.
Global technical debt
Technical debt on the global level is not isolated to a small part of your code base but instead affects a whole (sub)system. This kind of technical debt stems from building on ill-suited technologies, bad system design or architecture as an afterthought. It’s the accumulation of shortcuts taken all the way from understanding your use cases through planning the architecture to the implementation.
Measuring technical debt
Much of what you can find on measuring technical debt focusses on technical metrics. It is important to use such metrics in your analysis. However, they must not be your only step. Without tying them to the impact you will not be able to form a coherent problem statement and will inevitably fall into the trap of Goodhart’s law. The negative impact of technical debt is twofold. The first one is cost. How much money is your organisation losing due to technical debt? The second one is risk. What is the possible future damage the technical debt could cause?
Assessing the cost of technical debt
It’s often easy to say that technical debt is costing you money. But how do you assess that? First you have to gather data. Your debt-ridden components could have longer response times, higher error rates or downtime, all of which could lead to a loss of business value. Luckily, much research on how such factors affect business value has been done already. If you struggle to calculate the true cost because there are too many variables, a Monte Carlo simulation can help you find a realistic range for it.
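To make the Monte Carlo idea more tangible, here is a minimal sketch. Every input range in it (tickets per quarter, extra hours per ticket, hourly rate) is a made-up assumption for illustration, and the function names are hypothetical. Replace them with ranges you can defend from your own data.

```python
import random
import statistics

def simulate_quarterly_cost(runs=10_000, seed=42):
    """Return a list of simulated quarterly costs (in dollars) of one piece of tech debt."""
    rng = random.Random(seed)
    costs = []
    for _ in range(runs):
        # All ranges below are illustrative assumptions, not measured data.
        tickets = rng.randint(30, 50)             # affected tickets per quarter
        extra_hours = rng.triangular(10, 45, 27)  # extra hours per ticket (low, high, mode)
        hourly_rate = rng.uniform(80, 120)        # engineering cost per hour
        costs.append(tickets * extra_hours * hourly_rate)
    return costs

def cost_range(costs, lower_pct=5, upper_pct=95):
    """Return the lower percentile, the median and the upper percentile of the costs."""
    cuts = statistics.quantiles(costs, n=100)  # 99 cut points; cuts[i] is percentile i + 1
    return cuts[lower_pct - 1], statistics.median(costs), cuts[upper_pct - 1]

costs = simulate_quarterly_cost()
low, median, high = cost_range(costs)
print(f"90% of simulated quarters cost between ${low:,.0f} and ${high:,.0f} "
      f"(median ${median:,.0f})")
```

The point is not the exact numbers but the shape of the answer: instead of one falsely precise figure you get a range you can discuss with your stakeholders.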
Impact on developer productivity
In software companies which do not sell tangible goods the biggest cost factor is often payroll, especially that of engineers. But again, the slow-down caused by technical debt is not directly visible. It takes active effort to measure it. You probably use a system to manage your tasks like GitHub issues, Asana or Jira (sorry, pal). Start labelling your tasks whenever you have to pay interest on your technical debt in one way or another.
I am not just talking about directly touching it by editing its code. If you have to build your feature “around” the troublesome component or take extra measures of testing, simply add a label to your ticket in your project management tool. You can also add such labels to your bug reports. The level of granularity (your zoom level) matters here. If you have one big Rails monolith, adding a label to every single ticket will not give you any meaningful data. Instead you could identify a certain component within that app which causes difficulties. Similarly, in a distributed system it could be one service in a request flow which causes the pain.
Another neat approach to track technical debt is a tech debt board. I already wrote about this in one of my TILs. Every time you encounter technical debt add the component and a short issue description to the board.
Over time you will get a nice overview of which components cause the biggest pain.
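If your board lives in a spreadsheet or can be exported, a few lines of Python are enough to get that overview. This is only a sketch: the board entries and the `rank_components` helper are hypothetical.

```python
from collections import Counter

# Hypothetical board entries: (component, short issue description).
board = [
    ("gateway", "regression after adding a route"),
    ("billing", "no tests around invoice rounding"),
    ("gateway", "manual release steps"),
    ("gateway", "hard to test locally"),
    ("search", "outdated client library"),
]

def rank_components(entries):
    """Count board entries per component, most frequently mentioned first."""
    return Counter(component for component, _ in entries).most_common()

for component, mentions in rank_components(board):
    print(f"{component}: {mentions}")
```

A raw count like this only tells you where people feel the pain most often. It says nothing yet about how expensive that pain is.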
Compare and conquer
In a second step you can compare the cycle time (the time it takes from starting to releasing a given task) of labelled tickets with unlabelled tickets.
Consider this example: your organisation has a gateway service that handles all outside requests. But developers complain that extending the gateway is difficult. You often run into regression bugs, releases are difficult and so is manual testing. By labelling tickets which encountered any of these problems you now see: a ticket which touches the gateway takes on average 72 hours from starting the task to releasing it. Tickets which do not have to deal with the gateway take on average 45 hours. If your average engineer costs you $100/hour you might be losing $2.700 (27 extra hours) on every such task. Obviously, relying on this one metric alone can lead to many flawed conclusions. We will address this later.
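The comparison itself is simple once the labels exist. The sketch below assumes you can export tickets with a cycle time and a flag for the label. The ticket data and field names are hypothetical and chosen so that the averages match the gateway example.

```python
from statistics import mean

# Hypothetical export: cycle time in hours and whether the gateway label was set.
tickets = [
    {"id": "T-1", "cycle_hours": 80, "touches_gateway": True},
    {"id": "T-2", "cycle_hours": 64, "touches_gateway": True},
    {"id": "T-3", "cycle_hours": 72, "touches_gateway": True},
    {"id": "T-4", "cycle_hours": 40, "touches_gateway": False},
    {"id": "T-5", "cycle_hours": 50, "touches_gateway": False},
    {"id": "T-6", "cycle_hours": 45, "touches_gateway": False},
]

def compare_cycle_times(tickets, label="touches_gateway"):
    """Return mean cycle times of labelled and unlabelled tickets and their difference."""
    labelled = mean(t["cycle_hours"] for t in tickets if t[label])
    unlabelled = mean(t["cycle_hours"] for t in tickets if not t[label])
    extra_hours = labelled - unlabelled
    slowdown = extra_hours / unlabelled  # relative slow-down of labelled tickets
    return labelled, unlabelled, extra_hours, slowdown

labelled, unlabelled, extra_hours, slowdown = compare_cycle_times(tickets)
print(f"labelled: {labelled}h, unlabelled: {unlabelled}h, "
      f"extra: {extra_hours}h ({slowdown:.0%} slower)")
```

With real data you will want far more than three tickets per group, and it is worth looking at the spread as well as the mean before drawing conclusions.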
See the forest for the trees
If you want to explore more technical metrics to measure technical debt you can look at cyclomatic complexity, test coverage, coupling or documentation coverage. However, I think you should not get too hooked on those metrics unless you can make a strong case that ties them to developer productivity. Simply having a method with high cyclomatic complexity does not cost you anything. Only when this method has to be touched again will you pay the interest on the tech debt. So to assess the impact on developer productivity you will also have to pull information from your version control system on how often the method is touched. This talk goes into detail on how you can identify such hotspots of frequently touched technical debt.
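One simple way to approximate such hotspots is to multiply how often a file changes by a complexity score. The sketch below works on file level rather than method level and is my own simplification, not necessarily the approach from the talk. It expects the text output of `git log --name-only --pretty=format:`, and the sample log and the complexity scores are invented.

```python
from collections import Counter

def change_counts(git_log_output):
    """Count how often each file appears in `git log --name-only --pretty=format:` output."""
    return Counter(line.strip() for line in git_log_output.splitlines() if line.strip())

def hotspots(changes, complexity, top=3):
    """Rank files by (number of changes) * (complexity score), highest first."""
    scores = {path: changes.get(path, 0) * score for path, score in complexity.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]

# Invented sample: three commits, separated by blank lines.
sample_log = """
gateway/router.py
gateway/auth.py

gateway/router.py
billing/invoice.py

gateway/router.py
"""

# Invented complexity scores, e.g. taken from whatever static analysis tool you use.
complexity = {"gateway/router.py": 24, "gateway/auth.py": 8, "billing/invoice.py": 31}

for path, score in hotspots(change_counts(sample_log), complexity):
    print(f"{path}: {score}")
```

In this sample `billing/invoice.py` is the most complex file but rarely changes, so the frequently touched `gateway/router.py` ends up on top. That is exactly the distinction between debt that merely exists and debt you keep paying interest on.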
Visualise technical debt
Sometimes measuring along the way is not enough. If you want to actively hunt for the troublesome parts of your organisation’s code base, start visualising the code. Map the communication between components or draw their interaction sequentially on a timeline. Tools like Zipkin or New Relic’s service maps do big parts of this for you already. Besides such automated visualisations you can also get inspired by some domain-driven design techniques. Visualising the code and its dependencies often clearly highlights the pain and the impact it has on other parts of the organisation.
Global technical debt carries big risk since it is often entangled in an integral part of your business. So when we talk about measuring technical debt we also need to measure the risk.
Long tail risks of technical debt
What if the last developer familiar with the legacy system leaves the company? Does your technical debt open your software or hardware to security vulnerabilities which could cause damage to the business and customers? Could you run into irrecoverable data loss when something catches fire?
By framing technical debt under the perspective of risk (or future losses) you can create a solid business case for it. If you can’t, chances are that piece of technical debt is more about style and personal opinion anyway.
So start brainstorming your own maximum credible accidents. Then explore the cost of each such incident by measuring the loss of business or developer productivity. Security vulnerabilities could lead to legal liabilities and damages to the brand. Onboarding of new engineers might take two weeks longer due to the code complexity. This hinders organisational growth which is critical for startups and publicly traded companies alike.
Or maybe multiple engineers mentioned the bad code quality in their exit interviews. Your human resources department can probably supply you with a number for how expensive it is to replace a senior engineer. Add to this the productivity lost until you have found and trained a suitable replacement.
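To compare such accidents with each other, it can help to put a rough probability next to each cost estimate and rank them by expected loss. Note that weighting by probability is an addition of mine here, and all probabilities and dollar amounts below are invented placeholders.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    yearly_probability: float  # estimated chance that the incident happens within a year
    cost: float                # estimated damage in dollars if it happens

    @property
    def expected_yearly_loss(self):
        """Probability-weighted cost of the incident per year."""
        return self.yearly_probability * self.cost

# Invented placeholder estimates; replace them with your own.
risks = [
    Risk("last legacy expert leaves", 0.30, 120_000),
    Risk("security breach via outdated dependency", 0.05, 1_000_000),
    Risk("irrecoverable data loss", 0.01, 2_000_000),
]

ranked = sorted(risks, key=lambda risk: risk.expected_yearly_loss, reverse=True)
for risk in ranked:
    print(f"{risk.name}: ${risk.expected_yearly_loss:,.0f} expected loss per year")
```

Treat the result as a conversation starter. For rare but catastrophic events the expected value understates the danger, so keep the worst-case cost visible next to it.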
Finding the smoking gun
Be aware of your own bias when looking at the data. As a software engineer you might be inclined to build a case for rewriting this horrible piece of software from that incompetent ex-employee. As a product manager you might be looking for an argument which proves the engineers are just whining over nothing and you can keep running the feature factory.
None of this is a bulletproof way to assess your technical debt. It is about exploring the problem domain. It’s therefore important to use multiple metrics to get a clearer picture. Let’s go back to the gateway example from earlier: you have found that tickets touching the gateway take much longer to release. But both cyclomatic complexity and test coverage look good.
What if simply all tasks involving the gateway are more complex by nature and the delay is not in fact caused by the technical debt within the gateway? Maybe all it is lacking is decent documentation for new engineers to be able to extend it easily? If you don’t want to mislead yourself you will have to look at the problem from multiple angles.
Express your problem clearly
The problem statement could look like this: Due to the lack of documentation of the gateway service developers spend on average 60% more time on related tasks. This amounts to additional engineering costs of $81.000 to $135.000 every quarter ($100/h, 30-50 tickets per quarter). Extending and streamlining the documentation of the gateway service is an estimated effort of $8.000.
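The numbers in this statement follow directly from the gateway example (72 versus 45 hours per ticket). Here is the arithmetic as a small sketch. The helper names are hypothetical, and the break-even calculation optimistically assumes that better documentation removes the entire slow-down.

```python
import math

def quarterly_cost_range(extra_hours, hourly_rate, min_tickets, max_tickets):
    """Return the (low, high) additional engineering cost per quarter in dollars."""
    per_ticket = extra_hours * hourly_rate
    return per_ticket * min_tickets, per_ticket * max_tickets

def break_even_tickets(fix_cost, extra_hours, hourly_rate):
    """Number of affected tickets after which the fix has paid for itself."""
    return math.ceil(fix_cost / (extra_hours * hourly_rate))

extra_hours = 72 - 45  # hours lost per gateway ticket
low, high = quarterly_cost_range(extra_hours, hourly_rate=100, min_tickets=30, max_tickets=50)
print(low, high)                                  # 81000 135000
print(f"{extra_hours / 45:.0%} more time")        # 60% more time
print(break_even_tickets(8_000, extra_hours, 100))  # 3
```

Under these assumptions the documentation effort would pay for itself after about three gateway tickets.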
This is a business case that every manager can understand and support. It is likely that you will discover multiple problems worth tackling. Define a clear problem statement for each of them. How you can then prioritise these issues is the subject of a follow-up article.
From understanding to action
Technical debt is a hard problem to crack and all its nasty details can be overwhelming for developers as well as for product or engineering leaders. But there are ways to get on top of technical debt. In subsequent articles I will explore how to prioritise, tackle and prevent technical debt. But first you have to start with understanding your technical debt. I hope I have given you enough insight to get started.