Ken Muse

The Most Dangerous Phrase in Software Development


When we’re developing software and products, we put in hours writing code and making sure it functions as expected. We often divide the testing of this code into two categories: things that need to be tested and things that should work. This seems like a reasonable approach to managing complexity with limited resources. We are optimistic that we know the behavior and the outcomes. In reality, however, “it should work” is one of the most dangerous and misleading statements in software development. It often leads to poorly tested, unreliable, and ultimately failure-prone software. Let me explain.

The pitfall of assumed success

The phrase “it should work” is dangerous because it fails to acknowledge the unpredictable nature of software. Even the most experienced developers cannot predict every possible interaction between components, users, and environments. Testing provides a safety net. It identifies flaws before they reach users and verifies that assumptions are accurate. Without testing, you’re relying on faith rather than fact.

When we say “it should work,” we are expressing confidence in our code’s ability to perform as intended. It’s important, however, to recognize that the nature of testing requires us to verify our assumptions. Failing to do so can result in a false sense of security and numerous issues. Consider setting a property on a class:

var config = new Configuration();
config.SslEnabled = true;

We assume that the underlying framework and compiler will handle the assignment correctly. This assumption, however, is tested as part of developing the framework and the compiler. The assignment is assured to work because it is validated as part of that development lifecycle. It will work. This is the confidence that comes from tested code.

Now consider running this code as part of an application that runs on Linux, Windows, and macOS. A common approach is to test the code on one platform (such as Linux), then assume “it should work” on the other platforms. Without testing, how would you know? You might be surprised to find that the code runs on Linux, but fails on Windows. This situation is not uncommon, especially with cross-platform libraries. This is a classic example of overconfidence in untested code. Overconfidence opens the door to bigger issues and unexpected bugs.
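To make this concrete, here is a minimal, hypothetical sketch (the file name and folder are assumptions for illustration). The same lines of code can give different answers depending on the operating system’s file system:

using System;
using System.IO;

class PlatformAssumption
{
    static void Main()
    {
        // The file on disk is named "settings.json" (all lowercase).
        // The default file systems on Windows and macOS are case-insensitive,
        // so this lookup succeeds there. On Linux, the mismatched casing
        // means the file is never found.
        var path = Path.Combine("config", "Settings.json");
        Console.WriteLine(File.Exists(path)
            ? "Configuration found"
            : "Configuration missing");
    }
}

Code like this sails through testing on one platform and quietly fails on another, which is exactly why “it should work” needs to be verified on every platform you ship to.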

There is another case where “it should work” is highly unreliable. When code is built separately for each environment, it falls into the same pit of overconfidence. For example, consider code that is built and tested in a “dev” environment, then promoted to a “QA” environment. During this promotion, the code is rebuilt and retested. If it succeeds, the code is then built for the “production” environment and deployed. This is a common approach in waterfall-based development practices. The problem is that each environment is different. In addition, rebuilding the code often produces binaries that are not identical to the previous build. Conditional compilation may change the code between builds, and third-party dependencies may have been updated in the meantime. You’re testing apples, but deploying oranges. In this case, “it should work” will certainly lead to a future failure.
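As a sketch of why rebuilt binaries differ, consider conditional compilation. The symbol name and URLs below are hypothetical, but the pattern is common: the production build contains a code path that the dev and QA builds never compiled, let alone tested.

public static class Endpoints
{
#if PRODUCTION
    // Compiled only when the PRODUCTION symbol is defined at build time.
    // The binaries tested in dev and QA never contained this line.
    public const string Api = "https://api.example.com";
#else
    public const string Api = "https://api.dev.example.com";
#endif
}

A safer pattern is to build a single artifact once, test that artifact, and promote the same artifact through each environment, supplying environment-specific settings as runtime configuration.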

The cost of “it should work”

There are numerous examples of high-profile software failures that can be attributed to the “it should work” mentality. For example, the Boeing 737 MAX was grounded worldwide following two tragic crashes that led to the loss of multiple lives. One of the contributing factors to the crashes was a flaw in the Maneuvering Characteristics Augmentation System (MCAS). The engineers assumed that a single sensor could activate the system safely. In addition, they assumed that this system would make the plane handle like its predecessor, so crews could handle hazards similarly without further training. Unfortunately, these assumptions were not validated, and the results were catastrophic. Since then, the plane has been updated to use multiple sensors, and pilots have been required to undergo additional training in flight simulators to understand the MCAS behaviors.

Another example is the initial launch of Healthcare.gov in 2013, the U.S. government’s health insurance exchange website. When the site went live, it was plagued with issues, including long load times, outages, and login errors. In their rush to meet the deadlines, the developers tested at a smaller scale and planned for up to 50,000 users. They assumed that “it should work” when deployed. The actual load was five times higher than expected, causing the site to fail in production. They assumed that any issues could be handled by technicians, but also that the shared login component “should work.” That component failed at much smaller loads, blocking the technicians from accessing the site. It took two more months for the site to become available (and capable of handling 35,000 concurrent users).1

Perhaps one of the most infamous examples of overconfidence is the case of the Therac-25 radiation therapy machine. It borrowed its design from the earlier Therac-20, but some of its hardware safety mechanisms were replaced with software controls. The machine reused some modules and code from the earlier machine. It was assumed that since it reused hardware and software from earlier versions, it should work. As a result, the hardware and software were never tested together. Investigators would later discover that the code had multiple race conditions, and the hardware safety features of the earlier machines had hidden these defects. In fact, pressing a set of keys in a specific sequence could lead to the machine delivering fatal dosages of radiation that were 100x higher than expected. With the hardware safety features removed, these defects resulted in the deaths of several patients.23

A better approach: “it must work”

Instead of “it should work,” developers need to adopt the mindset of “it must work.”

Before 1963, the US Navy lost 16 submarines, an average of one submarine every three years. After the loss of the USS Thresher, the Navy implemented the Submarine Safety (SUBSAFE) quality assurance program. Rather than assessing a probability of risk, decisions are made using only Objective Quality Evidence (OQE): “any statement of fact, either quantitative or qualitative, pertaining to the quality of a product or service, based on observations, measurements, or tests that can be verified.”4 Since implementing the program, only a single submarine (which was not SUBSAFE certified) has been lost.

Instead of “it should work,” developers need to adopt the mindset of “it must work.” They need to take proactive steps to ensure software quality through testing, validation, and verification. Instead of using probability to hide risk or assuming that something shouldn’t happen, developers should rely only on objective evidence that can prove their assumptions. If no work was done to test an assumption, there is no basis for trusting it. Testing should be automated and integrated into the development process to catch issues early and often. If bugs are discovered in the wild, additional tests should be created to reproduce and eliminate the issues.
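Returning to the configuration snippet from earlier, the smallest unit of that objective evidence is an automated test that runs in every build and on every platform the code ships to. Here is a minimal sketch using xUnit (the Configuration type is the assumed one from the earlier example):

using Xunit;

public class ConfigurationTests
{
    [Fact]
    public void SslEnabled_CanBeSetAndRead()
    {
        var config = new Configuration();
        config.SslEnabled = true;

        // Evidence, not assumption: the property behaves as expected
        // wherever this test actually executes.
        Assert.True(config.SslEnabled);
    }
}

On its own the test is trivial, but run in a CI matrix across Linux, Windows, and macOS it replaces “it should work” with proof that it does.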

In software development, “it should work” is not a guarantee. It’s a red flag that testing and validation are necessary. Relying on assumptions can lead to costly and dangerous failures. By adopting a mindset that embraces thorough testing and validation, developers can ensure that software works reliably, safely, and as intended. In the world of software, it’s not enough to hope our systems will work. We must be willing to prove that they do.