More Unit Testing with Less Code - Combinatorial Unit Testing

Evgeny N

May 30, 2016

CPOL

Combinatorial unit tests

Download source code - 8.3 KB

Introduction

I love unit tests. They are awesome, quick, reliable, isolated, easy to read and easy to write. And if your unit tests are not, you still have room to improve. But this article is not about unit testing and its goodness; it is about combinatorial unit testing and how to cover more with less effort.

Background

In this article, I assume you are familiar with lambdas and anonymous types in C#, and that you are comfortable with enumerables and enumerators. I also assume you have some experience in unit testing. I refer to NUnit as the testing framework to describe some technical aspects of unit testing; it was also used for some internal assertions, which are easily replaceable by almost any other assertion library such as Shouldly or FluentAssertions.

Combinatorial Tests are Important

So what are combinatorial tests? Combinatorial tests are tests that provide test cases for all possible combinations of the individual data items supplied for the parameters of a test. In other words, they verify the outcome regardless of the combination of the given data. These tests are frequently used to ensure that the provided arguments do not interact with each other and that behavior is consistent. They are quite useful, e.g., to verify that there is no special-case logic around strings (null/empty/white space/string with human-readable text/string with trailing spaces, etc.) or, for example, to verify a serialization roundtrip - when your Data Transfer Object should be perfectly serializable and deserializable, back and forth, especially when you use a custom serialization engine that requires helper attributes (e.g., protobuf-net with ProtoMember attributes), where it is so easy to miss something.

Combinatorial Tests are Pain

Now imagine the situation where you would like to test this kind of constructor:

public SomeConstructor(string stringArg, long longArg, double doubleArg)

For string argument, I would test at least: null, empty string, white space, non-white space;
For long argument, I would test at least: long.MinValue, long.MaxValue, -1L, 0L, 1L;
For double argument, I would test at least: double.MinValue, double.MaxValue, -1.0d, 0.0d, 1.0d;

This gives me (4 string values) x (5 long values) x (5 double values) = 100 combinations. That is not a heavy load if we run them in a for/foreach loop, or even in parallel. However, when they are supplied as, e.g., a TestCaseSource for NUnit, it will generate 100 test cases, and each of them adds significant management overhead:

NUnit will have to generate all these test cases, which will be wrapped in TestCaseData;
For every test, it will have to call SetUp and TearDown;
And every test will be executed sequentially;
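
As a rough illustration of the count above, the following sketch enumerates the same 4 x 5 x 5 grid with plain LINQ instead of a test framework. The class name CombinationCount and the concrete sample string "text" are assumptions made for this example; they are not part of the attached source.

```csharp
using System;
using System.Linq;

public static class CombinationCount
{
    // Sample values mirroring the lists above; the concrete strings are illustrative.
    public static readonly string[] Strings = { null, string.Empty, " ", "text" };
    public static readonly long[] Longs = { long.MinValue, long.MaxValue, -1L, 0L, 1L };
    public static readonly double[] Doubles = { double.MinValue, double.MaxValue, -1.0d, 0.0d, 1.0d };

    // Builds every (string, long, double) combination as a Cartesian product.
    public static Tuple<string, long, double>[] All()
    {
        return (from s in Strings
                from l in Longs
                from d in Doubles
                select Tuple.Create(s, l, d)).ToArray();
    }
}
```

CombinationCount.All() returns 100 tuples, one per combination. Adding a single extra value to any of the three arrays multiplies the total, which is why the number of generated test cases grows so quickly.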

The situation gets worse quickly as the number of potential values increases - in my practical experiments, 100K test cases made NUnit "prepare" for several minutes.

So I started looking for ways to have the same test cases described in a primitive, short way, and have these test cases created nearly instantaneously.

Theory

I identified the following goals:

Combinatorial unit tests should be test framework-agnostic. That means that I should not extend any specific framework functionality, for example, by implementing custom attributes/interfaces.
Combinatorial unit tests should be self-descriptive. That means that I should be able to read the combinations and the test itself naturally, so I can quickly understand what the test is doing and what kind of test cases are used as input data.
They should be low-ceremony. Minimize the number of hiccups to get the stuff running. The description part should not be longer than the test part.

Once I abstracted myself from the implementation and started looking at my code from the "client" perspective, I came up with a couple of syntactical constructions that could work. After thinking a little more, I settled on this one:

Combinations
    .Compose(x => new
    {
        Greeting = x.Only("Hello", "Howdy", "GDay"),
        Participant = x.Only("John", "James", "Bob")
    })
    .RunInParallel(test =>
    {
        Console.WriteLine("{0}, {1}", test.Greeting, test.Participant);
    });

This looked quite logical to me, there are two clearly separate parts:

The declaration part is exposed by the Compose method. This method expects a lambda that describes the type-safe test case with the values suggested for every parameter. Type safety is highly important during refactoring, as it helps to ensure type consistency between the declarative part and the executive part. So I read this as "Compose test case as a combination of Greeting parameter taking Only "Hello", "Howdy" and "GDay", and Participant parameter taking Only "John", "James" and "Bob"".

The test part is exposed by the RunInParallel method. This method expects a lambda that describes the test itself. The lambda receives a test argument that gives access to the data of a specific test case. In the given example, test.Greeting should be either "Hello", "Howdy" or "GDay" and test.Participant should be either "John", "James" or "Bob".

The declaration overhead is minimal, the only question is how to implement it.

Implementation

The Compose method provides an entity of a certain type which is used to describe the sequences. I call this entity a Combinator - an entity that has a list of declared sequences and methods to populate those sequences. The Combinator type is public so that it is accessible to the end user, but it is declared sealed as I do not expect any inheritance, and its constructor is internal because the client should not create instances of this type explicitly. The list of sequences is private, and each sequence is stored as a non-generic enumerable.

public sealed class Combinator
{
    private readonly List<IEnumerable> sequences = new List<IEnumerable>();

    internal Combinator()
    {
    }
    
    /* Other stuff */
}

According to the example above, I expect Combinator to contain the Only method, which accepts a list of values representing a sequence of a specific type. The return type of this method is used to define the property type in the anonymous class, so the method has to be generic. But what value should the method return? It is not really important, as that value will never be used. What is really important is to add the given list of items as a sequence to the private collection of sequences. I also decided to adjust the method signature to require at least one item, plus any number of extra items using params - this prevents invocations with no items (empty sequences).

public T Only<T>(T atLeastOne, params T[] orAnyNumberOfOther)
{
    sequences.Add(new[] { atLeastOne }.Concat(orAnyNumberOfOther).ToArray());
    //// Returning the stub.
    return default(T);
}

To summarize the above, the Combinator is created by the Compose method and passed to its lambda, where it is used to declare and store the sequences and to define the test case anonymous type with all properties having the correct type. The assumption made here and further on is that the order of the sequence declarations matches the order of the anonymous type properties.
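
The flow summarized above can be sketched as follows. This is a minimal sketch, not the code from the attached source: the ComposedCases<T> holder, its Template and Combinator properties, and the public SequenceCount property are hypothetical names introduced only for illustration, and the Combinator is reduced to a stand-in containing just the Only method shown earlier.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the article's Combinator, reduced to what this sketch needs.
public sealed class Combinator
{
    private readonly List<IEnumerable> sequences = new List<IEnumerable>();

    internal Combinator() { }

    // Assumption: exposed only so the sketch can show how many sequences were recorded.
    public int SequenceCount { get { return sequences.Count; } }

    public T Only<T>(T atLeastOne, params T[] orAnyNumberOfOther)
    {
        sequences.Add(new[] { atLeastOne }.Concat(orAnyNumberOfOther).ToArray());
        return default(T); // stub value; only its type matters
    }
}

// Hypothetical holder pairing the combinator with the inferred test case type T.
public sealed class ComposedCases<T>
{
    internal ComposedCases(Combinator combinator, T template)
    {
        Combinator = combinator;
        Template = template;
    }

    public Combinator Combinator { get; private set; }

    // Instance built from the stub values returned by the declaration lambda.
    public T Template { get; private set; }
}

public static class Combinations
{
    public static ComposedCases<T> Compose<T>(Func<Combinator, T> declaration)
    {
        var combinator = new Combinator();
        // Invoking the lambda once records the sequences as a side effect,
        // and the compiler infers T from the anonymous type the lambda returns.
        T template = declaration(combinator);
        return new ComposedCases<T>(combinator, template);
    }
}
```

Calling Combinations.Compose(x => new { Greeting = x.Only("Hello", "Howdy"), Count = x.Only(1, 2, 3) }) records two sequences in declaration order, while the returned template only carries the stub values (null and 0). The point of the design is that the caller never has to spell out T.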

At this point, we have the sequences as separate "flat" enumerables; however, to perform the combinatorial test, we have to generate another "flat" sequence containing all possible combinations. That is why Combinator exposes yet another internal method, called Yield, for that purpose:

internal IEnumerable<T> Yield<T>() 
{
/**/
}

The complete implementation of this method is available in the source code attached to the article and is too long to include here, but the key highlights are:

T is an anonymous type. It is exactly the same anonymous type that is produced by the lambda passed to the Compose method. Anonymous types are compiler-generated types, so the compiler generates a specific "unnamed" type at compile time, with a constructor accepting the values for all properties declared by the anonymous type, in the order of declaration. Keeping this in mind, it is quite easy to use Activator to create instances of the anonymous type.
This method returns IEnumerable<T>, so we can use the yield keyword to generate instances on demand. This reduces preparation overhead, especially in parallel run scenarios.
The original sequences are enumerated at least once and usually many times, which is why it is important to store them as arrays or collections (rather than as lazily evaluated enumerables).
The implementation relies heavily on the enumerators of the stored sequences, as they make it easy to reset a sequence or to access its currently iterated value.
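
To make these highlights more concrete, here is a simplified sketch of lazy combination generation. It is not the attached implementation: instead of enumerators and Activator, it uses an index vector that advances like an odometer and invokes the anonymous type's single public constructor through reflection. The names CartesianSketch and YieldLike are hypothetical, and the sequences are passed in as a parameter rather than read from the Combinator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CartesianSketch
{
    // Lazily yields one instance of T per combination of the given value arrays.
    // Assumes T has a single public constructor whose parameters line up, in order
    // and type, with the arrays (which holds for C# anonymous types), and that
    // no array is empty.
    public static IEnumerable<T> Yield<T>(IList<object[]> sequences)
    {
        var constructor = typeof(T).GetConstructors().Single();
        var indexes = new int[sequences.Count];

        while (true)
        {
            // Materialize the combination the index vector currently points at.
            var arguments = new object[sequences.Count];
            for (int i = 0; i < sequences.Count; i++)
            {
                arguments[i] = sequences[i][indexes[i]];
            }
            yield return (T)constructor.Invoke(arguments);

            // Advance like an odometer: bump the last index, carry to the left on overflow.
            int position = sequences.Count - 1;
            while (position >= 0 && ++indexes[position] == sequences[position].Length)
            {
                indexes[position] = 0;
                position--;
            }
            if (position < 0)
            {
                yield break; // every index wrapped around: all combinations were produced
            }
        }
    }

    // Helper that lets the compiler infer the anonymous type from a sample instance.
    public static IEnumerable<T> YieldLike<T>(T template, params object[][] sequences)
    {
        return Yield<T>(sequences);
    }
}
```

For example, CartesianSketch.YieldLike(new { Greeting = default(string), Number = default(int) }, new object[] { "Hello", "Howdy" }, new object[] { 1, 2, 3 }) produces six instances, with the last declared sequence varying fastest. Because the method is an iterator, no instance is created until the consumer asks for it.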

Once the Yield part is done and we have the final enumerable of combinations, life gets significantly easier, as we just need to iterate through the sequence and call the test method with each combination. This can be done either sequentially using a plain foreach loop, or in parallel using, e.g., Parallel.ForEach.
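
Both ways of running can be sketched in a few lines. The CaseRunner class and its method signatures are assumptions made for this example; in the article's syntax, the equivalent methods hang off the result of Compose rather than taking the cases as a parameter.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class CaseRunner
{
    // Runs the test body once per case, one after another, on the calling thread.
    public static void Run<T>(IEnumerable<T> cases, Action<T> test)
    {
        foreach (T testCase in cases)
        {
            test(testCase);
        }
    }

    // Runs the test body for the cases concurrently. Parallel.ForEach pulls items
    // from the lazy enumerable as workers become free, and surfaces exceptions
    // thrown by the test body wrapped in an AggregateException.
    public static void RunInParallel<T>(IEnumerable<T> cases, Action<T> test)
    {
        Parallel.ForEach(cases, test);
    }
}
```

With the parallel variant, the test body has to be thread-safe, and the order in which the combinations are executed is not guaranteed.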

Extensibility

The provided structure is very open to extension. As an example, let's consider strings, which are usually quite repetitive in combinatorial tests. When verifying a constructor argument or method parameter of type string, developers tend to use helper methods like string.IsNullOrEmpty(...) and string.IsNullOrWhiteSpace(...), which normally makes sense to verify with combinatorial tests as well. Here is an example implementation for reference:

public string NullEmptyAndWhiteSpace()
{
    sequences.Add(new object[] { default(string), string.Empty, " ", "\t" });
    //// Returning the stub.
    return default(string);
}

The sequence consists of null, an empty string, a single-space string and a string containing a tab. From my practical experience, the case with the tab is usually forgotten, but it still has to be considered. As with any other sequence declaration method, the return value is irrelevant, but its type is not, which is why the return type is string and default(string) is returned. As a further exercise, try adding a sequence for doubles, and don't forget to include extreme cases like double.NaN, double.PositiveInfinity and double.Epsilon.
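
One possible take on that exercise is sketched below. The method name DoubleEdgeCases and the public DoubleEdgeValues array are hypothetical and not part of the attached source; the Combinator here is a reduced stand-in that contains only what the sketch needs, and the choice of values is just one reasonable selection.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Reduced stand-in for the article's Combinator, showing only the new method.
public sealed class Combinator
{
    // Assumption: the values are kept in a public array so they are easy to inspect.
    public static readonly double[] DoubleEdgeValues =
    {
        double.MinValue, double.MaxValue, -1.0d, 0.0d, 1.0d,
        double.Epsilon, double.NaN,
        double.PositiveInfinity, double.NegativeInfinity
    };

    private readonly List<IEnumerable> sequences = new List<IEnumerable>();

    // Hypothetical helper following the same pattern as NullEmptyAndWhiteSpace.
    public double DoubleEdgeCases()
    {
        // Box the values so the sequence is stored as an object array,
        // in the same way as the string example.
        sequences.Add(DoubleEdgeValues.Cast<object>().ToArray());
        //// Returning the stub.
        return default(double);
    }
}
```

Keep in mind that double.NaN is not equal to itself under the == operator, so a test body that compares values with == needs to treat that case separately, for example with double.IsNaN.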

History

Version 1.0 - Initial publication
Version 1.1 - Added source code repo URL