Hi. Please bear with me as this is a serious question. I cannot figure out why the search for the ultimate "lock free" algorithm for queues, etc., is so popular. I just finished reading the article Yet another implementation of a lock-free circular array queue[^], followed up on some of its references, and Googled other articles going back almost 20 years. For reference, see An Optimistic Approach to Lock-Free FIFO Queues[^].

Here's my confusion: all of the articles that address the N-producers and M-consumers case end up creating some sort of loop, a while(true){} or do{}while (CAS...) block, to wrap a set of instructions. Both referenced articles demonstrate this. Effectively, they replace a blocking operation with a "spin loop" that continues until the operation succeeds without contention from other processes.
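For concreteness, here is roughly the shape those loops take (a sketch of my own, not the code from either article; the names, the modulo wrap, and the missing "queue full" check are all simplifications): snapshot the shared state, compute the new state, and CAS it in, retrying from a fresh snapshot on failure.

#include <atomic>
#include <cstddef>

constexpr std::size_t QUEUE_SIZE = 1024;          // assumed capacity
std::atomic<std::size_t> write_index{0};          // shared producer index

// Claim the next write slot in a circular array by CAS-ing the index forward.
// A real enqueue would also check against the read index to detect "full".
std::size_t claim_slot()
{
    std::size_t old = write_index.load(std::memory_order_relaxed);
    std::size_t next;
    do {
        next = (old + 1) % QUEUE_SIZE;             // compute new state from the snapshot
    } while (!write_index.compare_exchange_weak(   // publish only if nobody moved the index;
                  old, next,                       // on failure 'old' is refreshed and we retry
                  std::memory_order_acq_rel,
                  std::memory_order_relaxed));
    return old;                                    // the slot this thread now owns
}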

Back in the '60s, we used to call this a "spin lock": protecting a block of code using a "read / modify / write" hardware instruction to "test / set" a condition. Early IBM/360s had "TS" (Test and Set). PDP-10s had "AOSE" (Add One and Skip if Equal to zero), allowing similar logic.

Many of the algorithms I've seen then use complicated methods to detect an inconsistent queue state or the "ABA" problem: basically, discovering that a pointer has been reused and no longer points to what you think it does. Object serial numbers are usually introduced to alleviate this problem.
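The serial-number fix usually looks something like the sketch below: pair the pointer with a generation counter and swap both with one double-width CAS, so a recycled pointer no longer compares equal. This is purely illustrative (TaggedPtr, the freelist assumption, and the names are mine, not from the articles), and it only stays lock-free if the platform actually provides a double-width CAS.

#include <atomic>
#include <cstdint>

struct Node { Node* next; int value; };

// Pointer plus a generation counter; the counter is bumped on every
// successful swap, so a pointer that was popped, recycled, and pushed
// back (the ABA case) fails the comparison even though the address matches.
struct TaggedPtr {
    Node*    ptr;
    uint64_t tag;
};

// Head of a freelist. Assumes nodes are recycled through this list and never
// returned to the allocator, so reading old.ptr->next below is always safe.
std::atomic<TaggedPtr> free_list{ TaggedPtr{nullptr, 0} };

Node* pop_free_node()
{
    TaggedPtr old = free_list.load();
    while (old.ptr != nullptr) {
        TaggedPtr desired{ old.ptr->next, old.tag + 1 };
        // Fails (and reloads 'old') if either the pointer OR the tag changed.
        if (free_list.compare_exchange_weak(old, desired))
            return old.ptr;
    }
    return nullptr;
}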

Now my question. Why go through all these convoluted steps? Why isn't the simplest solution used instead? I mean, I get it, you want to avoid using system calls for locking (Mutex, Condition Variables, et al.) for the "low contention" cases, when collisions on the queue (or whatever) are rare. If folks are willing to use a "compute-bound spin loop" during contention cases, why not just
// assume "m_lock_taken" is initialized to 0 (false) and that
// CAS(addr, expected, desired) returns true when it performs the swap
...
while (!CAS(&m_lock_taken, 0, 1));   // spin until we swap 0 -> 1
// do enqueue or dequeue now that you own the "lock"
m_lock_taken = 0;                    // release the "lock"
...

Simple spin, no complex detection or unwinding. It has all the drawbacks of a spin loop: it chews up CPU time doing nothing until the lock is released, it probably causes memory contention on the m_lock_taken variable, and, worse, when N+M exceeds the number of processors / cores in the system, a spinning thread is using execution cycles that could be given to the thread that owns the lock and was pre-empted by an event or the scheduler. I simply do not understand the need for all the complexity.
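For reference, here is what that snippet looks like in portable C++11 atomics. This is just a minimal sketch of the same test-and-set idea; the class and member names are mine, and a production spin lock would also back off or yield inside the loop.

#include <atomic>

class SpinLock {
    std::atomic<bool> m_lock_taken{false};
public:
    void lock()
    {
        bool expected = false;
        // Keep retrying the 0 -> 1 swap until it succeeds; acquire on success
        // pairs with the release in unlock().
        while (!m_lock_taken.compare_exchange_weak(expected, true,
                                                   std::memory_order_acquire,
                                                   std::memory_order_relaxed))
            expected = false;   // the failed CAS wrote the current value back; reset it
    }
    void unlock()
    {
        m_lock_taken.store(false, std::memory_order_release);
    }
};

SpinLock queue_lock;

void enqueue_or_dequeue()
{
    queue_lock.lock();
    // ... do the enqueue or dequeue now that we own the "lock" ...
    queue_lock.unlock();
}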

And if you care about chewing up CPU time, then what you want is a lock that is taken quickly when there is no contention and a waitable object (Mutex, Condition Variable) for when the rare contention happens.
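Something along these lines, say (a rough sketch of the spin-then-wait idea; the class name, the spin count, and the condition-variable fallback are my own illustration, not anything from the articles):

#include <atomic>
#include <condition_variable>
#include <mutex>

class HybridLock {
    std::atomic<bool>       m_taken{false};
    std::mutex              m_mtx;
    std::condition_variable m_cv;
public:
    void lock()
    {
        // Fast path: a short, bounded spin handles the low-contention case
        // without any system call.
        for (int i = 0; i < 100; ++i) {
            bool expected = false;
            if (m_taken.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire))
                return;
        }
        // Slow path: block on a waitable object instead of burning CPU.
        std::unique_lock<std::mutex> lk(m_mtx);
        m_cv.wait(lk, [this] {
            bool expected = false;
            return m_taken.compare_exchange_strong(expected, true,
                                                   std::memory_order_acquire);
        });
    }
    void unlock()
    {
        {
            // Publishing the release under the mutex avoids a lost wakeup;
            // a real implementation would skip this when nobody is waiting.
            std::lock_guard<std::mutex> lk(m_mtx);
            m_taken.store(false, std::memory_order_release);
        }
        m_cv.notify_one();
    }
};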

It has always been my experience that when multiple threads need to access a structure / object that needs more than one data item to be in a consistent state before the object is "valid", that update sequence needs to be protected by an "interlock", blocking or looping.

All the algorithms I've seen so far end up implementing just this simple case: loop until the complex set of operations yields a consistent object state. Yet there are far simpler ways of doing that exact thing.

Discuss?
Comments
DaveAuld 8-Feb-11 18:11pm    
Well, maybe when you find the 'answer', then I am sure you would have a pretty good article on your hands covering the for and against of the different locking types / methods. :)
Chuck O'Toole 8-Feb-11 18:25pm    
Suggestions or pointers to good articles would be appreciated. I see you are into industrial automation, monitoring, and control. I spent just over 10 years in real-time kernels, embedded systems, and robotics. So I too have "been there, done that". And my question is really serious; I have many ingrained "truisms" based on years of experience. One of them is that folks often end up re-inventing a simple wheel into an overly complex one for no benefit. Now, as an academic exercise, I can see how students can benefit from the analysis; after all, it's how I learned things. Still, if I'm wrong (or just underinformed), I'd like to know what I'm missing.
Sergey Alexandrovich Kryukov 8-Feb-11 18:31pm    
I presently work in the same field (plus the closely related area of computer-aided experimental physics). I cannot stop wondering how illiterate most people in the industry are when it comes to programming. Threading and distributed systems are in my major area of interest.
Could you contact me directly via my Web site to establish some contact? You can find my contact information via my CodeProject profile.
--SA
DaveAuld 8-Feb-11 18:54pm    
The problems we experience in the control system we are currently implementing are mainly due to overly complex code being written just in case... Quite often a back-to-basics approach should be followed, but it appears the devs don't like this approach!
Sergey Alexandrovich Kryukov 8-Feb-11 19:55pm    
Yes, this is indeed a big problem... As I sometimes say about one such problem related to device design: OK, you want to do all that primitive polling, but do it at the expense of your CPU, not mine! :-)
Sometimes going to a lower level helps. We recently worked out how to talk to the TCP channel of DeltaTau (oh gosh!) directly and observed that, once we threw out all of the DeltaTau client-side software, so many problems disappeared!
--SA

1 solution

Lock-free algorithms solve problems associated with the very concept of locking, not with any particular implementation of it. Replacing an OS-supplied mutex with a spin lock does not make your algorithm lock-free: it's still locking, only with a lot more waste.

Consider your implementation of the spin lock. A thread has just swapped "1" into m_lock_taken and is about to do some useful work. But then the scheduler comes along and puts that thread to sleep! The remaining threads will be waiting for m_lock_taken to become "0", but that's not going to happen until the lock owner is allowed to run again. So the remaining threads waste a lot of CPU time while making zero progress.

In contrast to spin locks, spin loops in lock-free algorithms are designed in such a way that your thread spins only when a competing thread makes some progress. As a result, at least one thread from the competing bunch is guaranteed to be making progress. This is different from spinning when a competing thread does not make progress, which is what happens in the case of spin locks: the group of competing threads will be making no progress if the owner of the lock has been pre-empted.
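To illustrate the difference, here is a side-by-side sketch (my own illustrative code, not the queue from the article): in the locked version a failed loop iteration means the owner has not released the lock yet, while in the lock-free version a failed CAS means some other thread's CAS just succeeded, so the structure moved forward even though this thread has to retry.

#include <atomic>

struct Node { Node* next; int value; };

std::atomic<bool>  lock_taken{false};
std::atomic<Node*> head{nullptr};

// Spin lock: while we spin, nothing happens to the list at all.
// If the owner is pre-empted here, every waiter burns CPU and nobody advances.
void locked_push(Node* n)
{
    while (lock_taken.exchange(true, std::memory_order_acquire)) { /* spin */ }
    n->next = head.load(std::memory_order_relaxed);
    head.store(n, std::memory_order_relaxed);
    lock_taken.store(false, std::memory_order_release);
}

// Lock-free push: a failed CAS can only mean another thread's push succeeded.
// A thread pre-empted inside this loop holds nothing, so it cannot stall anyone.
void lock_free_push(Node* n)
{
    Node* old = head.load(std::memory_order_relaxed);
    do {
        n->next = old;
    } while (!head.compare_exchange_weak(old, n,
                                         std::memory_order_release,
                                         std::memory_order_relaxed));
}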
 