|
was probably having issues removing a row! HAAAAAA!
|
|
|
|
|
You probably all knew this, but I needed a definitive answer.
I'm using a ConcurrentQueue which is a FIFO collection.
I needed to know whether, when I check if the queue already contains an item, the check returns faster when the item is found at the front (next item out) than when it is found at the back (last-out position).
I assumed, but wasn't sure.
This code gives a good definitive answer.
ConcurrentQueue<int> mainList = new ConcurrentQueue<int>();
Console.WriteLine("Adding items...");
for (int x = 1; x < 10000000; x++) {
    mainList.Enqueue(x);
}
Console.WriteLine("DONE adding items...");
Stopwatch sw = new Stopwatch();
Console.WriteLine("Searching for FIRST item...");
sw.Start();
mainList.FirstOrDefault(item => item == 1);
sw.Stop();
Console.WriteLine("{0}ms", sw.ElapsedMilliseconds);
sw = new Stopwatch();
Console.WriteLine("Searching for LAST item...");
sw.Start();
mainList.FirstOrDefault(item => item == 9999999);
sw.Stop();
Console.WriteLine("{0}ms", sw.ElapsedMilliseconds);
Output
Adding items...
DONE adding items...
Searching for FIRST item...
0ms
Searching for LAST item...
99ms
The consistent result is around 100ms on my machine.
EDIT
This addition seems to bear expected results too:
for (int z = 9999999; z >= 1; z--) {
    if (z % 1000000 == 0) {
        sw = new Stopwatch();
        sw.Start();
        mainList.FirstOrDefault(item => item == z);
        sw.Stop();
        Console.WriteLine("{0} : {1}ms", z, sw.ElapsedMilliseconds);
    }
}
Output for this section is:
9000000 : 99ms
8000000 : 89ms
7000000 : 78ms
6000000 : 71ms
5000000 : 55ms
4000000 : 44ms
3000000 : 33ms
2000000 : 21ms
1000000 : 10ms
Interesting that on my machine it is about 10ms per 1,000,000 items.
EDIT 2
But, wait...there's more!
What about for objects?
I'm creating a queue of anonymous objects via dynamic -- basically because I'm lazy.
Then I test how long it takes to find the matching value. Answer: longer.
Right, we all assumed longer. The extra time could just be the runtime binding/unboxing needed to get at the value, or whatever.
This answers the question for me though, because I'm going to be putting a bunch of items in a ConcurrentQueue and I want to add more items but don't want to add an item if the collection already contains the item.
void Main()
{
    ConcurrentQueue<dynamic> mainList = new ConcurrentQueue<dynamic>();
    Console.WriteLine("Adding items...");
    for (int x = 1; x < 10000000; x++) {
        mainList.Enqueue(new { x = x, name = "x_" + x });
    }
    Console.WriteLine("DONE adding items...");
    Stopwatch sw = new Stopwatch();
    Console.WriteLine("Searching for FIRST item...");
    sw.Start();
    mainList.FirstOrDefault(item => item.x == 1);
    sw.Stop();
    Console.WriteLine("{0}ms", sw.ElapsedMilliseconds);
    sw = new Stopwatch();
    Console.WriteLine("Searching for LAST item...");
    sw.Start();
    mainList.FirstOrDefault(item => item.x == 9999999);
    sw.Stop();
    Console.WriteLine("{0}ms", sw.ElapsedMilliseconds);
    for (int z = 9999999; z >= 1; z--) {
        if (z % 1000000 == 0) {
            sw = new Stopwatch();
            sw.Start();
            mainList.FirstOrDefault(item => item.x == z);
            sw.Stop();
            Console.WriteLine("{0} : {1}ms", z, sw.ElapsedMilliseconds);
        }
    }
}
Output Looks Like
Adding items...
DONE adding items...
Searching for FIRST item...
4ms
Searching for LAST item...
753ms
9000000 : 643ms
8000000 : 575ms
7000000 : 496ms
6000000 : 430ms
5000000 : 362ms
4000000 : 285ms
3000000 : 217ms
2000000 : 141ms
1000000 : 72ms
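The slowdown here is plausibly the per-element dynamic binding on `item.x` (resolved at run time through the DLR) rather than unboxing as such. A rough comparison sketch, swapping the anonymous/dynamic objects for a small typed class (`Entry` is a made-up name; the predicate is then bound at compile time):

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;

class Entry
{
    public int X;
    public string Name;
}

class TypedVersusDynamic
{
    static void Main()
    {
        var typed = new ConcurrentQueue<Entry>();
        for (int x = 1; x < 1000000; x++)
        {
            typed.Enqueue(new Entry { X = x, Name = "x_" + x });
        }

        // Statically-typed predicate: no runtime binding per element.
        var sw = Stopwatch.StartNew();
        typed.FirstOrDefault(item => item.X == 999999);
        sw.Stop();
        Console.WriteLine("typed : {0}ms", sw.ElapsedMilliseconds);
    }
}
```

Timings will vary by machine, but the typed predicate should land much closer to the plain `int` results than the `dynamic` ones.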
modified 5-Feb-18 17:25pm.
|
|
|
|
|
raddevus wrote: I assumed, but wasn't sure.
This code gives a good definitive answer. Hence, the upvote.
Proof is always better than an assumption
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
|
|
|
|
|
Not particularly surprising, given the amount of work it's doing.
Reference Source: ConcurrentQueue.cs[^]
It looks like it's implemented as a singly linked list of small arrays (32 items each). Searching a linked list for the last item, or for an item that doesn't exist, is always going to take longer than finding an item close to the start.
It also seems that the order of searches has some effect. Try searching for 1st followed by 31st; then try 31st followed by 1st.
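That experiment can be sketched like this; hedged: the 32-item segment size is an internal implementation detail that may change between framework builds, so treat any pattern in the numbers accordingly:

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;

class SegmentProbe
{
    static void Main()
    {
        var queue = new ConcurrentQueue<int>();
        for (int i = 1; i <= 10000000; i++)
        {
            queue.Enqueue(i);
        }

        // Time the 1st then 31st item, then the same pair in reverse order.
        foreach (int target in new[] { 1, 31, 31, 1 })
        {
            var sw = Stopwatch.StartNew();
            queue.FirstOrDefault(item => item == target);
            sw.Stop();
            Console.WriteLine("{0} : {1} ticks", target, sw.ElapsedTicks);
        }
    }
}
```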
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
Very good confirming information on that. Thanks a lot.
Richard Deeming wrote: It also seems that the order of searches has some effect. Try searching for 1st followed by 31st; then try 31st followed by 1st
I actually noticed that too as I tested, but my original post became so long I decided not to mention more.
|
|
|
|
|
It seems to me a very bad practice to use a FIFO/LIFO-oriented collection for anything other than push/pop...
Skipper: We'll fix it.
Alex: Fix it? How you gonna fix this?
Skipper: Grit, spit and a whole lotta duct tape.
|
|
|
|
|
Kornfeld Eliyahu Peter wrote: seems to me a very bad practice to use a FIFO/LIFO oriented collection to anything else than push/pop
Yes, a good observation.
I had initially solved this with a ConcurrentDictionary.
Someone else suggested the Queue to me.
1. I have a memory collection which continues to get re-loaded from the database on an interval.
2. I only want to load new items into the collection
3. However, when I get an item, I just want to pop off the oldest item from the collection.
This lead me to two different ways to do this.
1. Queue - load up the queue on interval - ensure an item is never added more than once. -- later, simply pop off the oldest item (first in). The getter code is a bit simpler (obvious / self-documenting) with the Queue.
2. Dictionary - load up Dictionary on interval - TryAdd() handles this for me. As long as I load the dictionary from the database in reverse order (Stored Proc returns oldest first) all I have to do is call .First() on Dictionary when I want the oldest item. The Queue obviously gives you the First added item automatically.
I've now done detailed testing and using the ConcurrentDictionary is much less CPU intensive, but might be slightly more RAM hungry.
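A minimal sketch of the dictionary side of that pattern (the batch shape and names like `LoadBatch` are hypothetical; the real loader would pull rows from the stored procedure):

```csharp
using System;
using System.Collections.Concurrent;

class IntervalLoader
{
    // Keyed by a unique id; the value carries whatever the caller needs.
    static readonly ConcurrentDictionary<int, string> items =
        new ConcurrentDictionary<int, string>();

    // Called on an interval; TryAdd silently skips keys already present.
    static void LoadBatch((int Id, string Name)[] batch)
    {
        foreach (var row in batch)
        {
            items.TryAdd(row.Id, row.Name);
        }
    }

    static void Main()
    {
        LoadBatch(new[] { (1, "oldest"), (2, "newer") });
        LoadBatch(new[] { (1, "duplicate-ignored"), (3, "newest") });

        Console.WriteLine(items.Count);  // 3 entries
        Console.WriteLine(items[1]);     // still "oldest"
    }
}
```

TryAdd handles the "only load new items" requirement for free; the open question, as the rest of the thread discusses, is how to get the oldest item back out.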
|
|
|
|
|
raddevus wrote: all I have to do is call .First() on Dictionary when I want the oldest item.
That's an assumption that's going to bite you in the future. The dictionary classes don't guarantee that the enumerator will return items in any specific order. The order could change with a framework update; or it could change when your dictionary exceeds a certain size.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
Ah, interesting. Very good info. So back to the Queue, it is.
There are always trade-offs.
I was sorting the Dictionary first via
waitingFiles.OrderByDescending(of => of.Value.FileCreated).First();
But that is really expensive because, from what I can tell, LINQ has to buffer and sort the whole collection just to return the one that matches.
So, I was hoping (and it is really fast and nice) that since I load the data into the Dictionary in ascending order (oldest files first) that I could count on just calling First().
I mean it does work now. I was actually testing the order that the files are then processed.
However, I understand that it is not guaranteed.
Interesting and why I'm trying not to make assumptions about which is better ( Dictionary or Queue ) in this instance.
|
|
|
|
|
Question, do you need to check all or any of the elements in the list, or is it enough to check the last one?
If so, you would be better off with a linked list.
There are no concurrent linked lists though. Apparently it was around in the beta versions of dotNet 4.0 but it was left out from the release version as there were no advantages over using a normal linked list with locking.
But a simple lock would still be faster since you don't need to enumerate the collection.
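A minimal sketch of that idea, assuming it really is enough to compare against the newest (tail) element -- the class and member names here are made up for illustration:

```csharp
using System;
using System.Collections.Generic;

class LockedLinkedList
{
    private readonly LinkedList<int> list = new LinkedList<int>();
    private readonly object gate = new object();

    // Adds the item only if it differs from the current tail --
    // no enumeration of the whole collection required.
    public bool AddIfNotLast(int item)
    {
        lock (gate)
        {
            if (list.Last != null && list.Last.Value == item)
            {
                return false;
            }
            list.AddLast(item);
            return true;
        }
    }

    // Pops the oldest item, mirroring the queue's first-in behaviour.
    public bool TryTakeOldest(out int item)
    {
        lock (gate)
        {
            if (list.First == null)
            {
                item = default(int);
                return false;
            }
            item = list.First.Value;
            list.RemoveFirst();
            return true;
        }
    }
}
```

Note this only enforces "not the same as the newest item", which is a weaker guarantee than "not anywhere in the collection".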
|
|
|
|
|
Thanks for chiming in on the conversation. The challenge I have here is that I need to ensure the item is not currently anywhere in the collection -- an item may have been loaded at any location.
There are probably other ways around this but just attempting to do it the easiest, yet most correct way which uses the least resources.
The collection is used in two ways :
1. caller will request the oldest unique item
2. collection will periodically load newer items into itself
The interesting thing about this is that the Queue makes one of those requirements easy and the Dictionary makes the other requirement easy.
The first one is easy with the Queue since you just get the first in item.
The second one is difficult with a Queue (there is no key) so you have to determine that the object is not already contained in the Queue for every item, each time the items are retrieved from db.
The second one is easy with a Dictionary since you can do TryAdd() with your unique key and it fails fast if the item has already been added.
However, the first one is more difficult with a Dictionary because the items are not (necessarily) ordered in a significant way and we need to get the oldest item.
It's an interesting problem.
|
|
|
|
|
If you can afford the extra memory usage, the simplest option would probably be a custom class to store the data twice - once in a HashSet<T>, and once in a Queue<T>.
You'd obviously need to add your own locking code to ensure it was thread-safe.
Off the top of my head, something like this might work:
public sealed class UniqueQueue<T> : IEnumerable<T>, IDisposable
{
    private readonly Queue<T> _queue = new Queue<T>();
    private readonly HashSet<T> _set = new HashSet<T>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public int Count
    {
        get
        {
            bool lockTaken = false;
            try
            {
                lockTaken = _lock.TryEnterReadLock(Timeout.Infinite);
                if (!lockTaken) throw new InvalidOperationException();
                return _queue.Count;
            }
            finally
            {
                if (lockTaken)
                {
                    _lock.ExitReadLock();
                }
            }
        }
    }

    public void Dispose()
    {
        _lock.Dispose();
    }

    public T Dequeue()
    {
        bool lockTaken = false;
        try
        {
            lockTaken = _lock.TryEnterWriteLock(Timeout.Infinite);
            if (!lockTaken) throw new InvalidOperationException();
            T oldestItem = _queue.Dequeue();
            _set.Remove(oldestItem);
            return oldestItem;
        }
        finally
        {
            if (lockTaken)
            {
                _lock.ExitWriteLock();
            }
        }
    }

    public void AddRange(IEnumerable<T> itemsToAdd)
    {
        if (itemsToAdd == null) throw new ArgumentNullException(nameof(itemsToAdd));
        bool lockTaken = false;
        try
        {
            lockTaken = _lock.TryEnterWriteLock(Timeout.Infinite);
            if (!lockTaken) throw new InvalidOperationException();
            foreach (T item in itemsToAdd)
            {
                if (_set.Add(item))
                {
                    _queue.Enqueue(item);
                }
            }
        }
        finally
        {
            if (lockTaken)
            {
                _lock.ExitWriteLock();
            }
        }
    }

    public UniqueQueueEnumerator GetEnumerator()
    {
        T[] copyOfItems = Array.Empty<T>();
        bool lockTaken = false;
        try
        {
            lockTaken = _lock.TryEnterReadLock(Timeout.Infinite);
            if (!lockTaken) throw new InvalidOperationException();
            copyOfItems = new T[_queue.Count];
            _queue.CopyTo(copyOfItems, 0);
        }
        finally
        {
            if (lockTaken)
            {
                _lock.ExitReadLock();
            }
        }
        return new UniqueQueueEnumerator(copyOfItems);
    }

    IEnumerator<T> IEnumerable<T>.GetEnumerator() => GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public struct UniqueQueueEnumerator : IEnumerator<T>
    {
        private readonly T[] _items;
        private int _position;

        public UniqueQueueEnumerator(T[] items)
        {
            _items = items;
            _position = -1;
        }

        void IDisposable.Dispose()
        {
        }

        public void Reset()
        {
            if (_items == null) throw new InvalidOperationException();
            _position = -1;
        }

        public bool MoveNext()
        {
            if (_items == null) return false;
            return ++_position < _items.Length;
        }

        public T Current
        {
            get
            {
                if (_items == null) throw new InvalidOperationException();
                if (0 > _position || _position >= _items.Length) throw new InvalidOperationException();
                return _items[_position];
            }
        }

        object IEnumerator.Current => Current;
    }
}
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
This is a very good suggestion and I saw something very similar at StackOverflow.
But, as you mentioned I was attempting to keep the memory size low too and it is interesting that this does store the objects in two collections.
Also, I am actually using the ConcurrentQueue and ConcurrentDictionary which are thread-protected automatically so that would solve that part of the problem.
Thanks for the input and discussion.
|
|
|
|
|
raddevus wrote: I am actually using the ConcurrentQueue and ConcurrentDictionary which are thread-protected automatically
Ish.
Modifications to one of those collections will automatically be thread-safe. But if you're trying to read or update the two in a single atomic operation, you'll still need your own locking mechanism.
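For instance, if a ConcurrentQueue and a ConcurrentDictionary are kept in step, each call is individually thread-safe but the pair of updates is not atomic without a shared lock. A minimal sketch (class and member names are hypothetical):

```csharp
using System;
using System.Collections.Concurrent;

class QueuePlusIndex
{
    private readonly ConcurrentQueue<int> queue = new ConcurrentQueue<int>();
    private readonly ConcurrentDictionary<int, bool> seen =
        new ConcurrentDictionary<int, bool>();
    private readonly object gate = new object();

    // Without the lock, another thread could observe the key in 'seen'
    // before the item appears in 'queue', or vice versa.
    public bool TryAdd(int item)
    {
        lock (gate)
        {
            if (!seen.TryAdd(item, true)) return false;
            queue.Enqueue(item);
            return true;
        }
    }

    public bool TryTakeOldest(out int item)
    {
        lock (gate)
        {
            if (!queue.TryDequeue(out item)) return false;
            seen.TryRemove(item, out _);
            return true;
        }
    }
}
```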
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
I had a feeling it wouldn't be that simple.
Richard's suggestion would've been my second one, minus the code sample though. (Impressive, that!)
Second question though, you wrote in another post that you load the objects from a database.
Are the objects in the database not unique? How do you ensure the uniqueness if you have popped an object from the list?
Or is the uniqueness coming from the identity column in the database? If so, you could still use the linked list, using a paged and sorted query.
modified 9-Feb-18 13:14pm.
|
|
|
|
|
raddevus wrote: I needed to know that when I went to determine if the queue already had the item, if it would return faster when the item was found at the top (next item out) versus found at the bottom (last out position).
I assumed, but wasn't sure.
This code gives a good definitive answer
You forgot the important qualifier though -- this code gives a good definitive answer for build XXX of the library. In the next build, the implementer may find a bug with the way they're doing things and have to change it, altering the performance. If you're writing code that depends that much on the performance of some class, you need to write the class yourself.
|
|
|
|
|
|
Without showing code there's little interesting here (especially since "nothing" was fixed ) ...
There's 'yer chuckle though.
kmoorevs wrote: fixed That's like some sort of quantum foam, right?
if i'm not mistaken
|
|
|
|
|
for the pun.
"Go forth into the source" - Neal Morse
|
|
|
|
|
Is decimal portion not required?
TOMZ_KV
|
|
|
|
|
Tomz_KV wrote: Is decimal portion not required?
Yes, which was the problem... division takes the type of its operands, so dividing two integers discards the decimal portion. (I'm probably saying that wrong!)
The correction was to cast one of the operands of the division to a decimal.
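In C#, `/` takes its result type from the operand types, so dividing two integers truncates, and promoting either operand to decimal restores the fraction:

```csharp
using System;

class DivisionDemo
{
    static void Main()
    {
        int numerator = 7;
        int denominator = 2;

        // Integer division: the fractional part is discarded.
        Console.WriteLine(numerator / denominator);           // 3

        // Promote one operand to decimal and the other follows.
        Console.WriteLine((decimal)numerator / denominator);  // 3.5
    }
}
```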
I only posted here 'cause we no longer have a 'hall of shame'!
"Go forth into the source" - Neal Morse
|
|
|
|
|
List<object> ccList = new List<object>();
ccList.Add(cc);
foo.Items = ccList.ToArray();
- Create a list.
- Add the item to the list.
- Convert the list to an array.
Latest Article - Code Review - What You Can Learn From a Single Line of Code
Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
Artificial intelligence is the only remedy for natural stupidity. - CDP1802
|
|
|
|
|
Or:
var ccList = Enumerable.Range(0,1);
var objects = ccList.Select(x=> (object)cc);
foo.Items = objects.ToArray();
|
|
|
|
|
That code will work, but, like my signature says, that doesn't mean it is a good way to do it.
Just because the code works, it doesn't mean that it is good code.
|
|
|
|
|
BillW33 wrote: That code will work, but, like my signature says, that doesn't mean it is a good way to do it. And I suppose this is exactly why he is posting it in this forum.
M.D.V.
If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
|
|
|