Click here to Skip to main content
15,031,687 members
Articles / Programming Languages / C#
Article
Posted 30 Jan 2021

Tagged as

Stats

3.1K views
51 downloads
4 bookmarked

A Performant Items in List A that are not in List B

Rate me:
Please Sign up or sign in to vote.
3.88/5 (7 votes)
30 Jan 2021CPOL3 min read
Performant Items in List A that are not in List B
Comparing two lists with value selectors, implemented as an extension method, that has the similar performance as the Except Linq method.

Contents

Introduction

After reading some Stack Overflow posts on how to find items in List A that are not in List B, and the performance issues of some of the solutions (and encountering said performance issues in real life), I decided to look at how to create a performant "not in" method that meets the requirements that I have, namely that I'm not comparing simple value types, but rather a property (or properties) within a different classes.

Note that I've ignored the whole "override Equals and GetHashCode" concept because it doesn't add anything to the solution.

The Test Setup

My test setup creates 5000 integer entries, 0-4999, and another 5000 entries, 5000-9999, in a couple lists so I can test both direct value comparison and values contained as a property in a simple test class.

Given:

C#
class A
{
    public int Value { get; set; }
}

class B
{
    public int Value { get; set; }
}

and:

C#
static List<A> listOfA;
static List<B> listOfB;
static List<int> listA;
static List<int> listB;
static int samples = 5000;
static int repeat = 10;

The test cases are set up as:

C#
<a>public static void CreateTestData(int samples)
{
    listA = new List<int>();
    listB = new List<int>();
    listOfA = new List<A>();
    listOfB = new List<B>();

    samples.ForEach(n => listA.Add(n));
    samples.ForEach(n => listB.Add(n + samples));

    samples.ForEach(n => listOfA.Add(new A() { Value = n }));
    samples.ForEach(n => listOfB.Add(new B() { Value = n + samples }));
}

Running the Tests

Several different implementations are tested for their performance:

C#
static void Main(string[] args)
{
    CreateTestData(samples);
    Test("Except", ValueNotInUsingExcept);
    Test("Where", ValueNotInUsingWhere);
    Test("NotIn with Equals", ValueNotInUsingEquals);
    Test("NotIn with Comparitor", ValueNotInUsingComparitor);
    Test("NotIn with Selector and Comparitor", ValueNotInUsingSelectorAndComparitor);
    Test("HashedNotIn with Selector and Comparitor", 
          ValueNotInUsingHashedSelectorAndComparitor);
    Test("ExceptNotIn with Selector and Comparitor", ValueNotInUsingExceptSelector);

    Console.WriteLine("\r\nDone.\r\nPress ENTER to close window.");
    Console.ReadLine();
}

And the supporting code to run the tests, the idea being to run each test 10 times and average the time each test took to run:

C#
public static void Test(string msg, Action action)
{
    var avg = repeat.Average(() => Timing.Time(action));

    Console.WriteLine($"{msg}: {avg}ms");
}

and:

C#
public static decimal Average(this int n, Func<long> func)
{
    long count = 0;
    n.ForEach(() => count += func());
    decimal avg = count / (decimal)n;

    return avg;
}

and we use the built in Stopwatch class:

C#
static class Timing
{
    public static long Time(Action action)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        action();
        sw.Stop();

        return sw.ElapsedMilliseconds;
    }
}

The Except Method

Linq has this amazing Except extension method which is very very fast. So, given a simple list of values, we can get the items in list A that are not in list B:

C#
public static void ValueNotInUsingExcept()
{
    // force evaluation with ToList()
    var items = listA.Except(listB).ToList();
    Assertion.Assert(items.Count == samples, $"Expected {samples}.");
}

And it really is fast:

Except: 2.2ms

Using Where

One of the suggestions I encountered was to code the "not in" function using Linq's Where method:

C#
public static void ValueNotInUsingWhere()
{
    // force evaluation with ToList()
    var items = listA.Where(a => !listB.Any(b => a == b)).ToList();
    Assertion.Assert(items.Count == samples, $"Expected {samples}.");
}

We can immediately start to see a performance degradation: Where: 290.2ms

Creating an Extension Method

And the performance degrades further when we create an extension method, which forces us to use the Equals method. Given:

C#
public static void ValueNotInUsingEquals()
{
    // force evaluation with ToList()
    var items = listA.NotIn(listB).ToList();
    Assertion.Assert(items.Count == samples, $"Expected {samples}.");
}

implemented as:

C#
public static IEnumerable<T> NotIn<T>
    (this IEnumerable<T> listA, IEnumerable<T> listB) where T : struct
{
    var items = listA.Where(a => !listB.Any(b => a.Equals(b)));

    return items;
}

The performance gets worse:

NotIn with Equals: 426.5ms

I'm going to continue using an extension method approach however, as we will get to the ultimate point in a bit.

Providing a Comparitor

Interestingly, providing a comparitor improves performance. Given:

C#
public static void ValueNotInUsingComparitor()
{
    // force evaluation with ToList()
    var items = listA.NotIn(listB, (a, b) => a == b).ToList();
    Assertion.Assert(items.Count == samples, $"Expected {samples}.");
}

implemented as:

C#
public static IEnumerable<T> NotIn<T>
    (this IEnumerable<T> listA, IEnumerable<T> listB, Func<T, T, bool> comparitor)
{
    var items = listA.Where(a => !listB.Any(b => comparitor(a, b)));

    return items;
}

We see an slight improvement:

NotIn with Comparitor: 348.4ms

Adding Selectors

Now let's add selectors to extract the values we want to compare from a property of a class. Given:

C#
public static void ValueNotInUsingSelectorAndComparitor()
{
    // force evaluation with ToList()
    var items = listOfA.NotIn(listOfB, a => a.Value, b => b.Value, (a, b) => a == b).ToList();
    Assertion.Assert(items.Count == samples, $"Expected {samples}.");
}

implemented as:

C#
public static IEnumerable<A> NotIn<A, B, Q>(this IEnumerable<A> source, 
    IEnumerable<B> target, Func<A, Q> sourceSelector, Func<B, Q> targetSelector, 
    Func<Q, Q, bool> comparitor)
{
    var items = source.Where(s => !target.Any
                (t => comparitor(sourceSelector(s), targetSelector(t))));

    return items;
}

The performance degrades considerably:

NotIn with Selector and Comparitor: 711.1ms

Using a HashSet

If we use a HashSet for the list "B" target:

C#
public static void ValueNotInUsingHashedSelectorAndComparitor()
{
    // force evaluation with ToList()
    var items = listOfA.HashedNotIn(listOfB, a => a.Value, 
                                    b => b.Value, (a, b) => a == b).ToList();
    Assertion.Assert(items.Count == samples, $"Expected {samples}.");
}

implemented as:

C#
public static IEnumerable<A> HashedNotIn<A, B, Q>(this IEnumerable<A> source, 
       IEnumerable<B> target, Func<A, Q> sourceSelector, Func<B, Q> targetSelector, 
       Func<Q, Q, bool> comparitor)
{
    var hashedTargets = new HashSet<Q>(target.Select(t => targetSelector(t)));
    var items = source.Where(s => !hashedTargets.Any(t => comparitor(sourceSelector(s), t)));

    return items;
}

We actually see a slight performance improvement:

HashedNotIn with Selector and Comparitor: 524.2ms

This still falls very short of the performance of the Except method.

A Performant List A not in List B

So in my final implementation, I take the ability to provide selectors and use the Except method, returning the instances of A that are not in B for the given selectors. With:

C#
public static void ValueNotInUsingExceptSelector()
{
    // force evaluation with ToList()
    var items = listOfA.ExceptNotIn(listOfB, a => a.Value, b => b.Value).ToList();
    Assertion.Assert(items.Count == samples, $"Expected {samples}.");
}

implemented as:

C#
/// <summary>
/// Will only return a single entry of A when A1 and A2 select the same value.
/// </summary>
public static IEnumerable<A> ExceptNotIn<A, B, Q>(this IEnumerable<A> source, 
    IEnumerable<B> target, Func<A, Q> sourceSelector, Func<B, Q> targetSelector)
{
    Dictionary<Q, A> dict = new Dictionary<Q, A>();
            
    var evaldSource = source.Select(s =>
    {
        var val = sourceSelector(s);
        dict[val] = s;
        return val;
    }).ToList();

    var targets = target.Select(t => targetSelector(t));
    var qitems = evaldSource.Except(targets);
    var items = qitems.Select(q => dict[q]);

    return items;
}

Suddenly, we achieve the same performance as the Except method using value types:

ExceptNotIn with Selector and Comparitor: 1.4ms

Which is rather impressive considering the work being done to create the dictionary and invoke the selectors.

The magic here is that we:

  1. Create a dictionary of "selected source value" mapped to the source instance.
  2. Preselect all the target values.
  3. Use the Except method given the values.
  4. Return the source instances based on the returned values.

As the comment indicates, if you have two instances of the source which have the same "select value", you'll only get one instance of the source returned. I can live with that as I usually know that my source instance values are distinct.

Conclusion

What can I say? It certainly behooves one to investigate the performance of the Linq in the context of how one is using Linq!

History

  • 30th January, 2021: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Marc Clifton
Architect Interacx
United States United States
Blog: https://marcclifton.wordpress.com/
Home Page: http://www.marcclifton.com
Research: http://www.higherorderprogramming.com/
GitHub: https://github.com/cliftonm

All my life I have been passionate about architecture / software design, as this is the cornerstone to a maintainable and extensible application. As such, I have enjoyed exploring some crazy ideas and discovering that they are not so crazy after all. I also love writing about my ideas and seeing the community response. As a consultant, I've enjoyed working in a wide range of industries such as aerospace, boatyard management, remote sensing, emergency services / data management, and casino operations. I've done a variety of pro-bono work non-profit organizations related to nature conservancy, drug recovery and women's health.

Comments and Discussions

 
GeneralMy vote of 3 Pin
tbayart26-Aug-21 2:40
professionaltbayart26-Aug-21 2:40 
GeneralI have almost down-voted this article Pin
wmjordan1-Feb-21 14:42
professionalwmjordan1-Feb-21 14:42 
I had not down-voted yet.

I could not imagine Marc had written this type of article!

The following code in the article obviously has mistakenly used Any extension method on HashSet.
public static IEnumerable<A> HashedNotIn<A, B, Q>(this IEnumerable<A> source, IEnumerable<B> target, Func<A, Q> sourceSelector, Func<B, Q> targetSelector, Func<Q, Q, bool> comparitor)
{
    var hashedTargets = new HashSet<Q>(target.Select(t => targetSelector(t)));
    var items = source.Where(s => !hashedTargets.Any(t => comparitor(sourceSelector(s), t)));

    return items;
}

We shall use the constructor which takes IEqualityComparer<T> as comparer and use HashSet.Contains (O(1)) instead of Any (O(N)) to check existence of an item.

The legendary Sir. Marc should not make such kind of mistake.

Edit:
In the ExceptNotIn method, the call to ToList() is also redundant, since subsequent call to Except does not require evaldSource to be an IList. This will produce useless memory allocation and consume extra CPU time.
public static IEnumerable<A> ExceptNotIn<A, B, Q>(this IEnumerable<A> source, IEnumerable<B> target, Func<A, Q> sourceSelector, Func<B, Q> targetSelector) {
    Dictionary<Q, A> dict = new Dictionary<Q, A>();

    var evaldSource = source.Select(s =>
    {
        var val = sourceSelector(s);
        dict[val] = s;
        return val;
    }).ToList();

    var targets = target.Select(targetSelector);
    var qitems = evaldSource.Except(targets);
    var items = qitems.Select(q => dict[q]);

    return items;
}


modified 1-Feb-21 21:23pm.

GeneralThat's not fair Pin
Mr.PoorEnglish19-Feb-21 11:12
MemberMr.PoorEnglish19-Feb-21 11:12 
Suggestionmethod revamped Pin
Mr.PoorEnglish1-Feb-21 6:10
MemberMr.PoorEnglish1-Feb-21 6:10 
GeneralRe: method revamped Pin
Marc Clifton3-Apr-21 4:43
mvaMarc Clifton3-Apr-21 4:43 
Questionattached sources? Pin
Mr.PoorEnglish31-Jan-21 3:29
MemberMr.PoorEnglish31-Jan-21 3:29 
AnswerRe: attached sources? Pin
Marc Clifton31-Jan-21 8:32
mvaMarc Clifton31-Jan-21 8:32 
Question+5: Testing tools leads to better desicions Pin
LightTempler30-Jan-21 22:07
MemberLightTempler30-Jan-21 22:07 
Question+5: I don't like LINQ, but I like you article Pin
honey the codewitch30-Jan-21 18:38
mvahoney the codewitch30-Jan-21 18:38 
AnswerRe: +5: I don't like LINQ, but I like you article Pin
LightTempler30-Jan-21 22:03
MemberLightTempler30-Jan-21 22:03 
GeneralRe: +5: I don't like LINQ, but I like you article Pin
honey the codewitch31-Jan-21 1:01
mvahoney the codewitch31-Jan-21 1:01 
GeneralRe: +5: I don't like LINQ, but I like you article Pin
LightTempler1-Feb-21 14:03
MemberLightTempler1-Feb-21 14:03 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.