Design and Implementation Mistakes - Part 2

Paulo Zemek

4.85/5 (4 votes)

Nov 18, 2014

CPOL

16 min read

8589

Design and implementation mistakes - Part 2

In my previous post (Design and Implementation Mistakes - Mostly .NET), I talked about design/implementation problems, presenting situations in which the method signature was exposing too much, the actual implementation was allowing the use of internal details and, of course, including the part where the users do that kind of stuff, even if by accident.

I ended that post saying that I had more ideas in my mind and now I am continuing that post.

String - The Super Type

See this interface:

public interface ISerializer
{
  void Serialize(string filename, object itemToSerialize);
  object Deserialize(string filename);
}

The idea is that you have a common interface for serialization/deserialization, so you can actually use a binary serialization, XML serialization, JSON serializer, etc. But when I see something like this, I must ask: Why is this interface using filenames? Why not ask for any Stream and allow this code to work with memory-streams, TCP/IP streams, etc?

To me, this kind of error happens simply because it is easier to use strings when we want to use a method to read or write directly to a file and we immediately think about files when we think about saving data. So, we give a name and everything works. But the string version of the method should only exist as a helper to avoid users from having to open/close a file. The real signature must be flexible and allow any stream to be used. So, the interface should be:

public interface ISerializer
{
  void Serialize(Stream stream, object itemToSerialize);
  object Deserialize(Stream stream);
}

In fact, the interface can have the overloads that receive the filename or those can be added by extension methods. Each one has advantages and disadvantages, but I will not cover these in this post.

The conclusion is: Don't use "strings" when you are thinking about files. Use Streams and allow your code to be much more versatile. In fact, don't use strings as the only way to represent anything that's not real text. Encoded strings may be necessary when writing things to a file but they aren't necessary in memory. Use appropriate types for what you want to do and, if needed, add the string overloads.

Thread-safe and read-only

As explained in the previous post, read-only doesn't mean immutable. A read-only interface means that a method receiving an object seen by the interface type is not expected to change it, but such object may change from "the outside" (that is, as soon as the method that received the read-only object finishes executing).

This means that any method receiving a read-only input should not store it in internal fields to use it later, as such object might change. This also means that the object is expected to be "immutable" during the duration of the method that received it. So, implementing a read-only interface by a multi-threaded mutable object is probably a terrible idea, as even if the object will not crash from mutation at random moments, the method receiving the "read only" object may crash by reading the same property twice and receiving different values.

So, instead of something like:

class MutableClass:
  IReadOnlyInterface
{
  // mutable and thread-safe properties that are exposed 
  // as get only in the interface here.
}

It is better to have something like:

class MutableClass
{
  // mutable properties here

  IReadOnlyInterface Lock();
  void Unlock();
}

With this approach, the mutable class is always thread-safe but it is not directly exposed by a read-only interface. When it is expected that it will be given to another method as a read-only object, it should be locked. After the read-only object is given to one (or more) method(s) that should see the mutable object as read-only, the Unlock() must be called.

Actually, this technique could be improved to return some disposable helper that contains the "read-only" view of the locked object. This could look like:

class MutableClass
{
  // mutable properties here

  MutableClassLock Lock();
}

And the MutableClassLock must implement both IDisposable and IReadOnlyInterface or it must be disposable only and have an IReadOnlyInterface property.

Note that by the fact that a new instance is allocated, this approach may become useless for small objects. In that situation, it is simply better to return an immutable copy of the object.

It is also a good practice to make the read-only object invalid when the Unlock() (or Dispose()) happens, or else it could be used as a way to access the mutable object without holding a lock.

From Immutable to Observable

Independently on my insistence that a read-only object is not immutable, that's a normal misunderstanding and I already saw many projects use immutable objects/interfaces everywhere without any description if they were limited to be read-only or really immutable (for example, type definitions composed of read-only properties without anything else).

That wasn't a problem as everybody was considering the objects as immutable and implementing them that way. The problem was the breaking change that was about to happen: Making some of the immutable types become mutable (and sometime even observable)!

Let's explain this situation a little further. Adding new methods to an existing class will not break already existing binary code, as it will not be referencing the new methods, and can only break the compilation of code that references it if there's some kind of ambiguity (like new method overloads that receive reference types when null was used as argument). This creates the idea that "adding is not a breaking change".

Well, adding mutator methods to types that were immutable is a big concept breaking change. Many places may be simply storing the objects counting on the fact that they aren't going to change but... surprise! When that old code reads some important information again, it is different (or simply isn't there anymore).

The biggest problem here is that any existing unit-tests will "pass", as the unit-tests will not be using the new mutator methods. Any code receiving the objects for the first time will continue to work. The errors will only be seen when those objects are mutated and then required again by some code that was already holding them (and in some cases they will not crash, they will simply give wrong results). That's one of the hardest kinds of bug to solve, as the places crashing or working incorrectly actually aren't the places mutating the objects. And the reactions will be like: "This was working before (maybe for years) and no-one changed it! How is it crashing now?"

Multiple Steps

Aside from simply deciding to make an immutable object mutable, the most common way to make this mistake is in "multiple steps".

First, the class is created as immutable. Then a new property is added, and that new property is the only one mutable. Maybe the object is immediately made observable (implement INotifyPropertyChanged), maybe that's added later as WPF (or any other framework) is not showing the property changes. Finally, in the future, someone (probably a different person than the one that created the class) sees the object and thinks "Hey, this object is observable... they forgot to put setters in these properties."

At this moment, everything is still working fine. The bad things will only happen when those new setters start to be used.

Interface Types - What's Their Purpose?

Interfaces can only declare method, property and event signatures without any implementation or fields (member variables, if you prefer).

Many people love it, explaining how they are useful for unit-testing, talking that they allow a more reusable design, Aspect Oriented Programming and lots of other things. Well, at this moment, I will focus on something much more simple, that's the base to make all of the other arguments true: Interfaces don't provide an implementation on purpose, allowing the implementation to be done later and also allowing different implementations to be made.

Allowing the implementation to be done later and allowing different implementations are really the key points to me and that's what I am going to discuss. Surely the rules presented in the previous post still apply, but now I want to talk about problems that are directly related to interface design and that will not happen when using classes.

Bound to an Implementation

See the following interfaces:

public interface ISession
{
  object Get(string sessionVariableName);
  void Set(string sessionVariableName, object value);
  // probably some more methods here.
}

public interface ISessionManager
{
  TimeSpan DefaultSessionTimeOut { get; set; }

  Session CreateNewSession();
  // probably some more methods here.
}

Don't focus on the fact that the ISession and ISessionManager are very small. I only wanted to give an idea of their purpose.

See the ISessionManager interface. The CreateNewSession method is not returning an ISession, it is returning an actual Session class. Such class, that I did not even present, is an ISession, but that doesn't matter as any implementation of the ISessionManager class can't create different kinds of session, they will only be able to create that Session class.

If I renamed the Session class to be InMemorySession the error will be more apparent. Why is an ISessionManager bound to the InMemorySession implementation?

And the answers can be:

A real typo. The person writing the interface was thinking in the session as an idea, but ended-up writing Session instead of ISession and, as there was a class with that name, everything "worked fine" (at least it was compiling);
Bad "interface extraction". Maybe the design started with classes, so there was the Session and SessionManager classes. Extracting an interface from all the signatures inside the Session class worked really fine, creating the ISession interface. But when extracting all the methods from the SessionManager to generate the ISessionManager, the reference to Session wasn't replaced by a reference to ISession;
Simple vice of using classes all the time, even when thinking about a new interface;
Well, any other reason that I am not describing. The human mind can be very creative.

So, how can we avoid these problems?

The best solution I know is to write interfaces first, implementations later. So, an ISessionManager referencing a Session class will not compile because that Session class doesn't exist, forcing the interface declaration to be corrected to return an ISession.

As this can be tricky when refactoring old code that only used classes, I suggest to create new empty library to put the interfaces into and then make the projects being refactored reference the new DLL with the interfaces.

If you don't want separate DLLs, I can try to put a simple rule like: Interfaces should only reference primitive types, strings (when the string represents a real string, not a file name), other interfaces and possibly simple types like enums, structs and small immutable classes. About the simple types, see the next topic.

Simple Types - Enforcing a Contract or Binding an Interface to an Implementation?

If we take the idea of "designing interfaces first, doing implementations later" to the extreme, the interfaces will live in separate DLLs that can only contain interface and delegate types. Nothing else, so they will not reference structs or classes.

The reasons for doing this include allowing everything to be tested, every implementation to be replaced and, of course, the idea that any implementation done at this moment, even if extremely simple, can be bugged and then we will tie interfaces to a simple but bugged implementation.

Well... if we are really extremist, any interface declaration may be "bugged" too. In fact, bad interface design is very common, so I don't see why we can't use simple types (usually immutable classes and structs) to enforce some rules the interfaces themselves can't enforce.

The Code Contracts try to overcome some limitations by allowing us to define some constraints even on interfaces. Yet, code contracts aren't a guarantee as they can be disabled (and most developers never saw them).

So, when we want a guarantee, not a hint, and when we want to avoid magic values, creating really small types can be considered as enforcing a contract instead of binding an interface to an implementation. Unfortunately, the most common guarantee, the non-null, can't be enforced by new types. I really hope a non-null or "required" modifier is created for input and output parameter signatures.

Note: I am not saying to create non-null reference types as this always brings to question "what to do if it is not initialized". I am talking about creating an input/output modifier that can be used even on interfaces, doing an implicit "throw new ArgumentNullException(...right name here...)" when input parameters are null, which could also be compile-time validated for constants and giving a similar guarantee on return types, so we can be sure that different implementations of an interface aren't going to return an invalid null. Never. This could even be JIT optimized, so the ifs for nulls could be in the call site, instead of inside the method, and could be completely avoided if there's a guarantee that the value is not null (like a previous if over the same local variable).

Real Guarantees and Simple Types

Imagine that I am creating a new UI framework and I decided to start with the interface design.

I keep the idea that all components are rectangular or "contained inside a rectangular area" (I don't want to discuss if rectangular controls shouldn't exist anymore, it is part of my sample) and one of the things users are expected to do is to ask a control for their rectangular bounds.

So, I can see a method like IRectangle GetBounds(); as part of the interface.

But I want the following guarantees of the IRectangle interface:

It must not change. Users can't change the bounds of the control by changing the resulting rectangle and the returned rectangle should not be modified if a new function is called over the control moving, resizing or even disposing it;
It must have the properties Left, Top, Width, Height, Right and Bottom (all of type int). How those properties are stored is not important, but the rule that Left + Width equals Right and Top + Height equals Bottom should be kept;
Width and Height can't be less than one.

How Can We Guarantee All Those Traits?

And the answer is that with interfaces, we can't. We can use work-arounds like receiving the bounds and validating the Width and Height everytime. I can even see some helper methods created in the implementation to validate the rectangle but the helper will not solve any mutation issues, except if it is making a copy. So, why not start with that "copy"?

If we create structs or classes, we can give much more guarantees. In fact, even a struct can't give all the guarantees when multi-threading is involved, as an "empty" rectangle may have its contents replaced by a valid rectangle while another thread reads it, reading a partially empty, partially filled instance. With a class, you either see the old reference or the new one, without "partial" situations.

And it is not hard to create such class. Here it is:

public sealed class Rectangle
{
  private Rectangle(int left, int top, int width, int height)
  {
    Left = left;
    Top = top;
    Width = width;
    Height = height;
  }

  public static Rectangle CreateUsingWidthAndHeight(int left, int top, int width, int height)
  {
    if (width < 1)
      throw new ArgumentOutOfRangeException("width");

    if (height < 1)
      throw new ArgumentOutOfRangeException("height");

    return new Rectangle(left, top, width, height);
  }

  public static Rectangle CreateUsingRightAndBottom(int left, int top, int right, int bottom)
  {
    return CreateUsingWidthAndHeight(left, top, right-left, bottom-top);
  }

  public int Left { get; private set; }
  public int Top { get; private set; }
  public int Width { get; private set; }
  public int Height { get; private set; }

  public int Right
  {
    get
    {
      return Left + Width;
    }
  }
  public int Bottom
  {
    get
    {
      return Top + Height;
    }
  }
}

To avoid confusion, I used the factory methods CreateUsingWidthAndHeight and CreateUsingRightAndBottom. Looking at the CreateUsingWidthAndHeight, you can see that Width and Height can't be less than 1. Looking at the Right and Bottom properties, you can see that they actually use Left + Width and Top + Height. The CreateUsingRightAndBottom actually uses the CreateUsingWidthAndHeight, so the Width and Height are validated.

All the properties are "get; private set;" and only the constructor is setting them, this means the Rectangle type is immutable.

As you can see, a real class is giving real guarantees. Of course, unsafe code can always kill those guarantees, but let's focus only on normal usage. It will work. The focus of this API is not to have different rectangle implementations, is to have a replaceable UI framework architecture. A Rectangle can exist as a class without problems and, if performance is a problem, even if it is not pre-storing Right and Bottom, it is probably faster than using the virtual dispatch of interfaces.

The null Value

Now, there's one other problem. What about the null value?

In .NET, we can't enforce non-null reference-types. Even if a language presents that feature, it is probably enforcing this by adding ifs in input/output as the .NET IL doesn't understand non-null references. This means that an interface created in a language that supports non-null reference types can still return null if it is implemented by another compiler targetting the .NET and this same null value can be seen when used by any language that doesn't enforce the non-null. So, there's no magic here, null is always a valid value.

Returning to the UI framework idea. Should an invisible control have bounds?

If we look at WPF, the invisible controls still have bounds but the collapsed controls don't have. I can consider the same idea in my framework and, as our Rectangle type doesn't support Width or Height of zero, this means that we must use something else. Putting an IsEmpty property on the Rectangle itself is only increasing its responsibilities. It is better to represent the lack of a rectangle existence by not having one at all instead of having one that says "I'm empty".

null already represents the lack of an object and, as I am used to it, I could accept that null is a valid result. We can say that null is a kind of magic value, but it is the most known magic value, and it is used to represent lack of an object, which is perfectly fine for an invisible collapsed control.

The Optional Solution

I actually can live with null meaning the lack of result. The problem is that a method named GetBounds returning a Rectangle class doesn't appear to be able to return null. One of the solutions is to change the method name to, for example TryGetBounds. That "Try" makes it clear that it may fail and, as there's no out parameter, it can be assumed that the null result means the failure/lack of value.

Some developers don't like that solution, preferring to have an Optional struct. An input or output of type optional makes it pretty clear that the value may not be provided. And in many cases, a null can be used to initialize that optional type. It isn't really enforcing anything, it is simply there to make things clear. If you don't use the optional type, a null can still be returned, but that can be considered an error.

Well, I will not say one is better than the other. As long as it is clear from the method signature that it may or may not return a value, it is OK for me. So, if the method doesn't return an optional type and the name doesn't say that either, the best is to consider that nulls are invalid and always validate input/output values that are considered required. The worst thing is to have a mixed solution where half the parameters and returns types can return null and the other half can't without any clear indication in which category they are.

The End... or Not

The null situation is only one of many of the non-enforced guarantees that we must deal when coding. Unfortunately, interfaces and method signatures don't give us lots of guarantees but we can still live with it by creating our own rules, allowing us to make some assumptions, thanks to the names or even the documentation of some methods.

I want to explore that further but, again, this post is getting long and I believe it will lose focus if I continue in that direction. So I will stop it right here and maybe someday I will post something about how to deal with those non-enforced guarantees. I am saying maybe because I can simply change focus and start posting things about completely unrelated topics, so I am not promising anything.

I hope this post was helpful to understand some problems and to give an idea on how to solve them, as well as to give an idea about when interfaces can or can't be bound to actual classes.

CodeProject