Last week, the world received an unusual Valentine’s Day gift: Internet Explorer was permanently disabled on most Windows 10 devices. At one point, 95% of people relied on this browser. Its successor, Edge, commands less than 5% today. This comes almost exactly 15 years after the death of its former arch-rival, Netscape Navigator. Like IE, it once commanded over 90% of the market. Interestingly, both share the same cause of death – technical debt.
Blessed are the young, for they shall inherit the national debt.
– Herbert Hoover
Some history
In 1999, Jamie Zawinski announced the end of his time with Netscape, writing that “our flagship product was heading quickly toward irrelevance … I strongly believed that Netscape was no longer capable of shipping products … Netscape was shipping garbage and shipping it late.” He observed two of the most common issues: it took developers a long time to be able to start contributing, and the code was “too complicated and crufty and hard to modify”. Innovation was stifled. As their market share dropped to under 50%, Netscape attempted to rewrite the code and escape their debt. Their market share continued to decline. When the new version was finally released, it had “a number of incomplete features and lots and lots of bugs … a large pile of interesting code, but it didn’t much resemble something you could actually use”.
At its peak, Internet Explorer had over 1,000 people and $100M of budget. They propelled innovation, introducing concepts like the iframe, Dynamic HTML, AJAX (via the oddly named XMLHTTP library), font support, and resource preloading. By IE11, Microsoft was struggling to keep up with more agile browsers, such as Chrome. They were supporting 6 browser variants, legacy ActiveX support, and compatibility modes that went back to IE6. As Chris Jackson posted in 2019, “Internet Explorer was optimized for simplicity at the expense of technical debt.” It had become a “compatibility solution” rather than supporting new standards or being competitive. Similar to Netscape, by the time they decided to rewrite, their market share had dropped to 20%.
Tech debt. That’s why we lost the browser wars at Netscape. It needs to be treated as a business continuity issue. If you get to the point where you cannot respond to the market, you have a huge problem.
– Marty Cagan, Product Is Hard
They aren’t alone
I recall working with a company which had grown rapidly in its industry. As the first mover, they had taken the industry by storm. Technical debt in the company had made it difficult to add new features quickly, and inefficient deployment practices limited how often features and fixes were delivered. The CEO of the company heard about an upstart competitor and commented, “It’s just a few really senior software developers and one person from the industry. We have years of expertise, so they’ll never catch up.” That small company practiced Agile techniques, had no technical debt, and had daily updates to production.
Within one year, the CEO watched as his company lost its largest customers to the upstart. Within two years, the company had lost many of its largest customers and most of its market value.
Another path
Consider Freshbooks. The original team was inexperienced, creating more than 1M lines of spaghetti code (see a 2018 interview). The company had grown to more than 10M users and 300 employees, but the co-founder had a growing concern that the design problems in the code made it difficult to perform maintenance or to add new features. “To build one thing, you had to fix three things.” As co-founder Mike McDerment explained to Forbes in 2017, the teams had concluded that creating a new version of the product would need 2.5 years. He decided to try a different approach: creating a competing company. They re-envisioned the product and launched an MVP in just four months. After a year, the site was charging customers, and at 18 months, it was officially re-branded and launched as a new version of Freshbooks. This allowed them to try new ideas while still retaining a product that had a significant user base.
If you’re thinking two and a half years, it will take seven.
– Mike McDerment, Freshbooks (Forbes, 2017)
At the start of the project, Freshbooks had $20M in revenue. They raised $30M in capital in 2013, and spent $7M to launch the new project. By 2015, the revamped product was released. By 2017, the company had exceeded $50M in revenue, more than paying for the project. In 2021, it was announced that Freshbooks serves 30M people in 160 countries, has over 500 employees, and had achieved a valuation of over $1B.
In short, they spent 18 months and $7M to solve the technical and design debts, and more than doubled their annual revenue in the following two years. That pattern is not unusual, either. Companies investing in eliminating long-standing debt often find the increased agility leads to a substantial increase in revenues. McKinsey found that companies with the lowest technical debt have revenue growth that is 20% higher than those with high debt and 10% higher than industry averages. The bottom 20% are also 40% more likely to have incomplete or cancelled IT modernizations.
Understanding your debt
Technical debt creates a business continuity issue. Handled improperly or allowed to grow out of control, it can strangle business operations. Many companies only acknowledge their technical debt when it has made them uncompetitive and given their competitors an advantage. In the early days, technical debt is like taking out a loan for a new business. It accelerates growth and enables new opportunities. At the same time, it’s a loan that must be repaid or it forecloses the business. As it accrues, it becomes a weight that holds the business back. Or an anchor that sinks the ship.
Debt comes in a variety of forms, but we can broadly consider it in two parts. The first is visible debt. This includes bad coding practices, immature teams, inadequate testing, and lack of time for continuous improvement. It grows the fastest when companies prioritize new features over developer concerns and inexpensive resources over experienced teams. Visible debt is something that the development teams can readily identify and describe.
The second type is hidden debt. This is increased IT costs, lost opportunities, growing mean time to recovery (MTTR), and higher production failure rates. All of these items drive away customers, decrease revenue, and increase costs. Hidden debt typically represents 20-60% of every dollar spent on IT and development. Companies may hire more employees (or offshore) to try to offset these costs, but over time this has diminishing returns. In fact, productivity generally decreases and the failure rate increases as further technical debt accumulates.
Some of the costs related to hidden debt include person-hours spent in a month on identifying or correcting defects, time spent between “code complete” and the push to production, and the time a defect remains open. In extreme cases, companies can see 80% of their developer time spent in these activities! That is equivalent to a $150K developer only completing $30K of work! In that case, fixing the problem can replace the need to hire 4 additional full-time developers. That’s quite the cost savings.
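The arithmetic behind that example can be sketched in a few lines. This is only an illustration: the function names are made up for this post, and it assumes that salary is a fair stand-in for the value of a developer's output.

```python
def effective_output(salary: float, debt_fraction: float) -> float:
    """Value of the work delivered once debt-related time is removed.

    Assumption: output value scales linearly with the time left over.
    """
    return salary * (1 - debt_fraction)


def extra_hires_to_compensate(debt_fraction: float) -> float:
    """Additional developers needed to match one unencumbered developer."""
    if not 0 <= debt_fraction < 1:
        raise ValueError("debt_fraction must be in the range [0, 1)")
    return 1 / (1 - debt_fraction) - 1


salary = 150_000        # the $150K developer from the example
debt_fraction = 0.80    # 80% of time spent on defects and release friction

delivered = round(effective_output(salary, debt_fraction))
extra_hires = round(extra_hires_to_compensate(debt_fraction))

print(f"Work delivered: ${delivered:,}")         # Work delivered: $30,000
print(f"Extra hires to compensate: {extra_hires}")  # Extra hires to compensate: 4
```

Plugging in your own numbers for the share of time lost gives a rough first estimate of what the hidden debt is costing per developer.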
A final hidden cost to consider is the loss of key staff and talent. Developers are often motivated by personal growth and healthy challenges. In solving those problems, they gain a large volume of expertise and understanding. When companies are stuck dealing with technical debt, top talent will frequently leave. Beyond the reduced work capacity, exiting staff members take their knowledge of the system with them. On top of the rising cost of hiring a new developer, there’s the expense of training the replacement and bringing them up to speed. Meanwhile, the loss of knowledge often increases the time required to complete any work.
In short, the various debts accrue interest and the price can be quite high.
Paying down debts
The first and most important step is to minimize creating new debt. Technical debt has an exponential growth curve. As more debt accumulates, it builds on the earlier debt. This makes it difficult to unravel. In fact, time requirements typically increase by 1.6x - 2x for each sprint that passes. Consider limiting the number of bugs or open issues that can ship in a release. If the total number of open items exceeds that limit, new feature work should pause until the debt is paid down. Treat today as the baseline – zero – and monitor from there. To make faster progress, set aside a portion of each sprint to eliminate at least one long-standing issue.
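One way to picture the limit described above is as a simple release gate. The sketch below is hypothetical: the `DebtBudget` class and its fields are invented for illustration, and the open-item count would come from whatever issue tracker you already use.

```python
from dataclasses import dataclass


@dataclass
class DebtBudget:
    """Tracks open items relative to the day you started counting."""

    baseline_open_items: int   # open bugs/issues today; treated as "zero"
    allowed_growth: int = 0    # how far above the baseline a release may drift

    def net_debt(self, open_items: int) -> int:
        """Open items measured against the baseline (negative means paid down)."""
        return open_items - self.baseline_open_items

    def features_allowed(self, open_items: int) -> bool:
        """New feature work continues only while debt stays within the limit."""
        return self.net_debt(open_items) <= self.allowed_growth


budget = DebtBudget(baseline_open_items=120, allowed_growth=5)

for open_items in (118, 125, 131):
    status = "ship features" if budget.features_allowed(open_items) else "pause and pay down"
    print(open_items, budget.net_debt(open_items), status)
```

The exact threshold matters less than having one. The point is that the decision to pause feature work is made by a rule agreed on in advance, not renegotiated every sprint.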
Second, monitor the development process. More specifically, examine the time from when coding is complete to when a release occurs. Also, examine the number of production issues that arise and how long those take to remediate. This helps you to see the progress you’re making towards improving your ability to deliver. Code quality is NOT measured by the amount of code coverage. It’s measured by how many production issues you’re seeing!
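As a rough sketch of those measurements, the snippet below computes the average time from “code complete” to release and the average time to remediate a production issue. The records are made-up sample data; in practice they would come from your deployment and incident tooling.

```python
from datetime import datetime
from statistics import mean


def mean_hours(pairs):
    """Average elapsed hours across (start, end) datetime pairs."""
    return mean([(end - start).total_seconds() / 3600 for start, end in pairs])


# Hypothetical records: (code complete, released to production)
releases = [
    (datetime(2023, 2, 1, 9), datetime(2023, 2, 3, 9)),
    (datetime(2023, 2, 6, 9), datetime(2023, 2, 7, 9)),
]

# Hypothetical records: (issue detected in production, issue resolved)
incidents = [
    (datetime(2023, 2, 3, 12), datetime(2023, 2, 3, 16)),
    (datetime(2023, 2, 8, 8), datetime(2023, 2, 8, 10)),
]

release_lead_time = mean_hours(releases)   # hours from code complete to release
time_to_remediate = mean_hours(incidents)  # hours to fix a production issue
production_issues = len(incidents)

print(f"Code complete to release: {release_lead_time:.1f} h")
print(f"Production issues: {production_issues}")
print(f"Mean time to remediate: {time_to_remediate:.1f} h")
```

Tracked release over release, the trend in these three numbers says more about delivery health than any single snapshot.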
Finally, don’t try to jump into rewriting all of the code. While it’s true that there are times where a rewrite is needed, it’s an expensive undertaking that creates very high risks. In most cases, it’s more efficient to peel off pieces of the code for repair. Repairing incrementally is often less expensive. Most companies can’t take 18 months to start a parallel rebuild (and those that do often end up with two parallel sets of issues). They need to think incrementally. Dedicate a percentage of every release to eliminating the debts. The industry norm is typically 20% - 30% until the issue is under control.
Toppling monoliths
A traditional monolithic codebase can be overwhelming. You can often take one feature or component and refactor that into a standalone package. That package can then be improved with unit tests and made into a self-deployable unit. By reducing the tight coupling, you reduce the build and test times. You also make it easier for others to understand and work with the code. Making it deployable or shareable enables it to be used, while adding lightweight unit tests ensures that the functionality is correct and is not broken over time. Repeated over time, the system can eventually be broken down into more manageable pieces.
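To make that concrete, here is a minimal sketch of what one extracted piece might look like. The invoice calculation is a hypothetical example, not taken from any real codebase: the idea is that logic once tangled up with database calls and global state becomes a small pure function with lightweight tests beside it.

```python
from decimal import Decimal


def invoice_total(line_items, tax_rate):
    """Total for an invoice, extracted as a pure function.

    No database access and no globals: everything it needs is passed in,
    which is what makes it easy to test and to package on its own.
    line_items is a list of (quantity, unit_price) pairs.
    """
    subtotal = sum(
        (Decimal(str(qty)) * Decimal(str(price)) for qty, price in line_items),
        Decimal("0"),
    )
    total = subtotal * (Decimal("1") + Decimal(str(tax_rate)))
    return total.quantize(Decimal("0.01"))


# Lightweight tests that live next to the extracted code (pytest-style).
def test_total_includes_tax():
    assert invoice_total([(2, "10.00"), (1, "5.00")], "0.10") == Decimal("27.50")


def test_empty_invoice_is_zero():
    assert invoice_total([], "0.10") == Decimal("0.00")
```

Once a piece like this has its own tests, it can be built, versioned, and deployed separately, and the monolith simply calls into it.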
The initial goal is not to migrate the code to a new technology or approach. In fact, doing that can introduce additional risks that are not always a good time investment for a growing business. The goal is to make the code easier to grow and maintain while breaking up existing monolithic architectures and deep dependencies. While you may eventually choose to rebuild or replace parts of the code, that should be a later step. Done correctly, you’ll have a battery of tests to ensure further changes don’t break existing expectations.
In short, don’t aim for a “big bang” change. The goal is small, fast iterations. It’s always better to have completed features than dozens of incomplete ones, so start small and focused. It may take some time, but the process moves forward more rapidly than most teams realize. The small changes increase capacity, creating additional headroom over time.
Modeling the costs
In the next post, I’ll explore how to build financial models that consider these costs. Until then, happy DevOp’ing!