Introduction

Legacy code has a bad rap, certainly amongst developers. The coderati often speak of legacy code with hushed voices and downcast eyes. As a developer, you are afraid to change code you didn’t write, because of the potentially negative side effects of making a change. Fear of change lies at the root of this problem. At the same time, legacy code often underlies the company’s core business, so many customers could be affected and any mistake can have large financial repercussions. I think the fear persists largely because there is a lack of awareness of the real problem: a prevalence of coupling.

At the risk of overloading my metaphors, I’ll claim it’s hard even to agree on what legacy code really is. Probably the most apt visual metaphor for legacy code is spaghetti. It’s long. It intertwines. If you pull one strand of spaghetti, half of what’s on your plate moves. There can be many unexpected tastes, depending on what has been mixed in. At the same time, the spaghetti is still really tasty to the user, and they want it. They look forward to the taste of the pasta, parmesan cheese, and homemade tomato sauce, which is nothing like off-the-shelf Ragu.

Testy, Testy

The most sensible definition that I’ve heard comes from Michael Feathers’ book Working Effectively with Legacy Code. Feathers proposes that legacy code is any code that doesn’t have automated tests written against it. Unlike my musings above, it’s a clear, objective, and strict definition. It requires the discipline of writing unit and integration tests, not only as you go, but also for code whose requirements you don’t understand. Feathers’ book goes on to catalogue a number of refactoring techniques, created for the express purpose of getting code into a test harness. The book is an intellectual tour-de-force on a topic rarely covered in the programming literature.

Most importantly, though, writing tests takes up developer time without adding new features. Time always remains a finite resource. Based on the estimates of Oram and Wilson in Making Software, drawn from a survey of academic and practitioner research, TDD (which includes unit testing) increases overall project time by 40% on average. The actual figure varies based on a number of factors, including characteristics of the language used, tooling available, and developer skill level.

Let’s be honest. Writing good unit tests requires a good understanding of dependency injection. It takes some time to really understand dependency injection on a “gut feel” level.
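
To make the "gut feel" a little more concrete, here is a minimal sketch of constructor injection in C++. All of the names (Clock, SystemClock, FixedClock, Greeter) are hypothetical and chosen only for illustration. The point is that Greeter never creates its own clock, so a test can hand it a predictable one.

```cpp
#include <ctime>
#include <string>

// Hypothetical interface: the only thing Greeter needs from a clock.
class Clock {
public:
    virtual ~Clock() = default;
    virtual int hour() const = 0;  // 0..23
};

// Production implementation: reads the real system time.
class SystemClock : public Clock {
public:
    int hour() const override {
        std::time_t now = std::time(nullptr);
        return std::localtime(&now)->tm_hour;
    }
};

// Test double: always reports the hour it was constructed with.
class FixedClock : public Clock {
public:
    explicit FixedClock(int hour) : hour_(hour) {}
    int hour() const override { return hour_; }
private:
    int hour_;
};

// The dependency is injected through the constructor instead of being
// created inside the class, so a test can substitute FixedClock.
class Greeter {
public:
    explicit Greeter(const Clock& clock) : clock_(clock) {}
    std::string greet() const {
        return clock_.hour() < 12 ? "Good morning" : "Good afternoon";
    }
private:
    const Clock& clock_;
};
```

In production you would pass a SystemClock; in a unit test you pass FixedClock(9) or FixedClock(18) and check the greeting without waiting for the time of day to change.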

This time always needs to be factored in to the cost/benefit analysis. In a nutshell, the argument for unit tests against legacy code is a little twisted: The code is legacy. It takes a long time to make any changes. So let’s write unit tests, which takes even more time both on new features and bug fixes, so that someday when we get over the rainbow, it won’t take so much time anymore. Sounds perfect, as a pitch by an external consultant (like Feathers). <g>

You’d have to have a high tolerance for pain to buy an argument like that. I am a recovering serial marathon runner, so I buy it. Not everyone shares my high tolerance for pain, though. In a real-world, deadline-driven team environment, the whole team needs to buy into it. Even more importantly, you need the budget to accommodate a 40% increase in the cost of every change, even though it already takes weeks to make seemingly trivial UI-level functionality changes.

Is There a Middle Ground?

Looking at the problem pragmatically, I think I’ve finally found a sensible compromise: refactoring to interfaces. The biggest source of pain, with respect to legacy spaghetti code, is coupling. Coupling is what makes half of the spaghetti move when you put one strand on your fork and tug a little bit. Because making a small change can potentially affect the whole system, coupling significantly increases the costs associated with maintenance work. Of course, experience mitigates the time cost, but that often comes packaged (dare I say coupled?) with a higher salary.

There are a lot of hidden design assumptions, dependencies, and mis-scoped variables in legacy code; all of this is made completely explicit when you introduce an interface. In C++ code, if you introduce an interface, your code will not compile and link until you remove all of these problems. If the code accesses public variables in conceptually unrelated parts of the code, you have to break this link so that the code using the interface compiles. You also break circular header includes when introducing an interface.
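
As a rough sketch of what this looks like in C++, assume a hypothetical legacy struct, LegacyInvoiceDb, whose public members other code used to read directly. Every name here is invented for illustration. Once client code is written against the extracted interface, the compiler rejects any remaining attempt to reach into the internals.

```cpp
#include <string>
#include <vector>

// Hypothetical legacy class: callers used to read its public members directly.
struct LegacyInvoiceDb {
    std::vector<double> amounts;  // public data that other code reached into
    std::string connection;       // unrelated detail callers never needed
};

// The extracted interface names only what callers conceptually need.
class InvoiceSource {
public:
    virtual ~InvoiceSource() = default;
    virtual double total() const = 0;
};

// Adapter that puts the legacy class behind the interface.
class LegacyInvoiceSource : public InvoiceSource {
public:
    explicit LegacyInvoiceSource(const LegacyInvoiceDb& db) : db_(db) {}
    double total() const override {
        double sum = 0.0;
        for (double amount : db_.amounts) sum += amount;
        return sum;
    }
private:
    const LegacyInvoiceDb& db_;
};

// Client code now compiles only against InvoiceSource. Touching `amounts`
// or `connection` from here is a compile error, which is how the hidden
// dependency becomes visible.
double totalWithTax(const InvoiceSource& source, double taxRate) {
    return source.total() * (1.0 + taxRate);
}
```

The adapter is one possible way in; in a real codebase you might instead make the legacy class implement the interface directly.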

The very act of introducing a new interface is an aggressive act against your code. It’s a showdown with your technical debt. An interface is meant to be a pure concept, not an implementation. It is an “entree”, as opposed to “lasagna”. If something in your implementation doesn’t fit the concept, it means the implementation has unnecessary functionality, and that is what makes the interface hard to introduce. The part that doesn’t fit should be extracted into a separate class, so that you increase your cohesion. At the same time, the existence of the interface makes sure that the code continues to reflect your understanding of how the code should work.

If you have a hard time creating a test against your code, you are feeling the effects of high coupling. The same goes for introducing interfaces. In fact, introducing interfaces is often a prerequisite for adding tests. Adding an interface reduces the existing coupling in the code.

Inheritance Is a Poor Cousin of Interfaces

There was a time, early in the days of object-oriented development, when inheritance was considered a good idea. The main benefit was that it helped you write code in one place, and to do so relatively quickly. (Personally, I find it hard to believe anyone thought this was true in C++, given the initializer lists you have to rewrite.)

The main problem: inheritance is a compile-time dependency. One class needs to know the internals of another. This is a less obvious form of coupling. Do you want to be pulling on spaghetti?

There is a very real risk of negative side effects, where you make a change in a parent class, say Animal, and it affects some of its child classes unpredictably. In C++, inheritance also slows down compilation: if anything in the parent’s header file changes, all code that includes the header (either directly or indirectly) must be recompiled.
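
Here is a small, contrived sketch of that kind of side effect. Animal comes from the text above; CountingDog and the method names are hypothetical. The child class quietly depends on an internal detail of the parent, namely that eatAll() happens to be implemented by calling eat().

```cpp
#include <string>
#include <vector>

class Animal {
public:
    virtual ~Animal() = default;
    virtual void eat(const std::string& food) { lastMeal_ = food; }
    // Internal detail: eatAll() is implemented in terms of eat().
    // Nothing in the signature tells a subclass author this.
    virtual void eatAll(const std::vector<std::string>& foods) {
        for (const auto& food : foods) eat(food);
    }
    const std::string& lastMeal() const { return lastMeal_; }
private:
    std::string lastMeal_;
};

// Hypothetical child class that tries to count meals.
class CountingDog : public Animal {
public:
    void eat(const std::string& food) override {
        ++meals_;
        Animal::eat(food);
    }
    void eatAll(const std::vector<std::string>& foods) override {
        meals_ += static_cast<int>(foods.size());
        Animal::eatAll(foods);  // dispatches back to our eat(): double count
    }
    int meals() const { return meals_; }
private:
    int meals_ = 0;
};
```

Feeding a CountingDog two foods through eatAll() leaves meals() at four, not two. If someone later rewrites Animal::eatAll() so that it no longer calls eat(), the count silently changes again, without a single line of CountingDog being touched.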

Why Do Interfaces Work... Better?

The key reason why interfaces are so important, particularly in large enterprise software, is that they minimize the potential bad side effects. Each component has a clear scope, defined by interfaces. Ideally, these are actual language-level entities, such as the interface construct supported in C#. Without looking at what’s behind the interface, you know everything that external code needs to know about that particular component.

Ideally, the component has a clear sense of purpose, a responsibility in life. It’s much less likely that the component will cause you problems, either on its own, or in relation to other components. The internals of the component are bounded by the interface. It is easy to “black box” the outside world, once you know exactly what inputs and outputs are expected of the component. These relationships are enforced by the interface, which serves as a “contract”. It encapsulates the minimum you need to know, thus reducing complexity in data flow.

To return to the lasagna analogy, you can make your pasta using various proportions of flour, eggs, and water, as long as the raw pasta has the following characteristics, according to pasta.go.it:

it must have a uniformly smooth appearance and texture;
no spots or dark shades must be visible when light shines through it;
it must have a clear and unmistakable amber yellow colour;
it must be odourless;
it must taste slightly sweet;
when broken, it must make a dry sound and the fracture must appear smooth and glassy with no air bubbles.

You are welcome to make it however you want, as long as it fulfills those requirements. There is a range of possible options, in terms of how you do this; however, anything or anyone using the pasta will be indifferent to the exact way you created it.

The same happens from the outside looking inwards. As the outside world only cares about the interfaces a component provides, what the component does internally is irrelevant to the other components of the system. It reliably takes in one set of ingredients, and outputs either a more refined intermediate result, or the final product.

Using the lasagna analogy, if you have a few ingredients for the sauce, and you can reliably add chopped tomatoes, fried onions and mushrooms, and simmer it in a pan for a few hours, you will produce a reliably tasty sauce. From the perspective of the oven, it doesn’t matter whether you made your own sauce or used a store-bought version; the sauce only has to exhibit certain characteristics to be guaranteed to be usable. In this case, we can be sure that any implementation details of the sauce will not affect any of the other components or the process, because there is a clear, software-based, executable interface enforcing the relationship and the requirements.
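
Taking the analogy literally for a moment, a sketch of the sauce "contract" in C++ might look like the following. The class and function names are made up for the analogy and don’t come from any real system.

```cpp
#include <string>

// Hypothetical contract: what the rest of the kitchen may assume about a sauce.
class Sauce {
public:
    virtual ~Sauce() = default;
    virtual std::string describe() const = 0;
};

class HomemadeSauce : public Sauce {
public:
    std::string describe() const override { return "homemade tomato sauce"; }
};

class StoreBoughtSauce : public Sauce {
public:
    std::string describe() const override { return "store-bought sauce"; }
};

// The oven only sees the Sauce interface; it cannot tell, and does not
// care, how the sauce was made.
std::string bakeLasagna(const Sauce& sauce) {
    return "lasagna with " + sauce.describe();
}
```

Swapping HomemadeSauce for StoreBoughtSauce changes nothing in bakeLasagna(), which is exactly the indifference described above.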

How Does This Help You Resolve the “Legacy Problem”?

Legacy code has high coupling. Most of the time spent working with legacy code is a direct result of that coupling:

first identifying where to make a change,
then adding the actual code by accessing data from strange places,
followed by extensive additional testing to be sure that you haven’t introduced any unwanted side effects.

If the internal coupling of a system were lower, you wouldn’t waste this time. Everything would be local to a few classes. These classes lie behind interfaces on both sides, so there can’t be side effects forcing you to spend more time on testing, which is typically manual if you don’t have an automated testing suite. If you need new data from external sources, you will need to change an interface explicitly. You are forced to make such a decision consciously, which is better than not being sure of potential side effects you may have introduced. All in all, you waste much less time working with the system, even though you don’t necessarily have a suite of automated acceptance and unit tests.

So basically, even though writing an ever-expanding suite of tests may bring you multiple benefits, as Feathers argues eloquently, the biggest pragmatic gains from his advice come from the preparatory step: refactoring to interfaces. Once you have a number of interfaces at key points in your system, you gain a number of options, such as dropping whole subcomponents, changing functionality at run time based on configuration, or having a much easier time adding automated tests and breaking the system down further if you think you will gain more benefits from it.
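
As one illustration of the first two options, here is a minimal sketch of choosing an implementation at run time from a configuration value. Notifier, EmailNotifier, NullNotifier, makeNotifier and the configuration strings are all assumptions made up for this example.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical interface sitting at a key point in the system.
class Notifier {
public:
    virtual ~Notifier() = default;
    virtual std::string send(const std::string& message) const = 0;
};

class EmailNotifier : public Notifier {
public:
    std::string send(const std::string& message) const override {
        return "email: " + message;
    }
};

// "Dropping a whole subcomponent": an implementation that does nothing.
class NullNotifier : public Notifier {
public:
    std::string send(const std::string&) const override { return ""; }
};

// Picks the implementation from a configuration value at run time.
// The recognized values ("email", "none") are invented for this sketch.
std::unique_ptr<Notifier> makeNotifier(const std::string& configValue) {
    if (configValue == "email") return std::make_unique<EmailNotifier>();
    if (configValue == "none") return std::make_unique<NullNotifier>();
    throw std::invalid_argument("unknown notifier: " + configValue);
}
```

The rest of the system holds a Notifier and never learns which one it got, so switching behavior becomes a configuration change rather than a code change.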

The business benefits of automated tests on various parts of your code can vary drastically. For example, if you don’t have any automated tests, writing tests on accessors gives you very little benefit. In contrast, writing a happy case automated test through your most frequently used functionality will give you a “safety net”. You can run this test regularly to find out if any new change has caused a regression in the key parts of the code. Once you have the happy case tests on your main functionality, you can add some edge cases, as this should now be much easier to do: your system is not only decoupled, but already in a test harness, so it’s just a question of varying inputs and confirming the outputs are what you expect.
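
A sketch of what such a "safety net" might look like, using nothing more than assert. The checkout example and all of its names (TaxPolicy, FlatTax, checkoutTotal) are hypothetical stand-ins for your most frequently used functionality.

```cpp
#include <cassert>

// Hypothetical interface that decouples the calculation from tax rules.
class TaxPolicy {
public:
    virtual ~TaxPolicy() = default;
    virtual double rateFor(double subtotal) const = 0;
};

// Simple implementation used by the tests below.
class FlatTax : public TaxPolicy {
public:
    explicit FlatTax(double rate) : rate_(rate) {}
    double rateFor(double) const override { return rate_; }
private:
    double rate_;
};

// Stand-in for the main functionality under test.
double checkoutTotal(double subtotal, const TaxPolicy& tax) {
    return subtotal + subtotal * tax.rateFor(subtotal);
}

// Happy case: the path most users exercise. Run it after every change.
void testCheckoutHappyCase() {
    FlatTax tax(0.25);
    assert(checkoutTotal(100.0, tax) == 125.0);
}

// Edge case added later: same harness, only the input varies.
void testCheckoutZeroSubtotal() {
    FlatTax tax(0.25);
    assert(checkoutTotal(0.0, tax) == 0.0);
}
```

The values are chosen so that the floating-point comparisons are exact; with arbitrary amounts you would compare within a small tolerance instead.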

The important thing is this: if you use a lot of interfaces, you may not even need significant testing, as the code within each component is simple enough that it’s easy to maintain, understand, and modify. As a result, it’s inexpensive in terms of developer time, regardless of your codebase’s total size.

Going back to the above criteria, your code is no longer too big to fail. Each component is small and self-contained. Everyone will be clear on what it’s doing. If you group the interfaces and objects in folders or directories, it doesn’t feel like legacy code, even to someone unfamiliar with it. Most importantly, your legacy code can continue moo-ing like a cash cow.

[images: dalboz17, hatm]