Hi Folks,
First of all, sorry: this is going to be a moderately long post, so please have the patience to read it through.

Here I am going to put down some concepts I have learned while skimming through articles on lock-free programming, and present my doubts about them.
Also, the discussion concerns *NIX multiprocessor platforms.

Let me start with the claim that "LOCKLESS = BLOCKLESS": it is said that the system of threads as a whole makes progress, because a CAS/DCAS can only fail if some other thread has made progress.
So we can say that, instead of blocking on a mutex, here we are spinning/waiting on a condition (the CAS while loop, for example).
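For reference, the "CAS while loop" mentioned above typically looks like this (a minimal sketch using the GCC `__atomic` builtins; the function name is mine, not from any particular library):

```c
#include <stdint.h>

/* Lock-free increment: retry the CAS until no other thread has
 * changed the value between our read and our write. */
static int64_t lockfree_increment(int64_t *addr)
{
    int64_t old = __atomic_load_n(addr, __ATOMIC_RELAXED);
    /* The CAS can only fail if another thread succeeded in updating
     * *addr in the meantime -- i.e. the system as a whole progressed. */
    while (!__atomic_compare_exchange_n(addr, &old, old + 1,
                                        /*weak=*/1,
                                        __ATOMIC_SEQ_CST,
                                        __ATOMIC_RELAXED))
        ; /* on failure, 'old' is reloaded with the current value; retry */
    return old + 1;
}
```

On failure, `__atomic_compare_exchange_n` updates `old` with the value actually found in memory, so the loop body needs no explicit reload.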

Quest1 > How would spinning on a while loop be more efficient than blocking on a mutex?
Quest2 > A good design employing mutexes also ensures that the system progresses, so isn't that BLOCKLESS too, by definition?

As an answer to Question 1, one could argue that blocking may involve a kernel wait, and the resulting context switches can be costly. Any further clarity on this would be appreciated.

Okay, assume that after getting answers to the first two questions I am convinced that lock-free is really fast and real-time when the atomic operations involved are small and quick.

Quest3 > So, isn't lock-free something like a spinlock? If yes, why can't we use a pthread spin lock?

Going ahead, in most of the literature available on the net, one sees an implementation of an atomic operation like this:

__asm__ __volatile__("lock\nxadd" X " %0,%1"
                     : "=r"(result), "=m"(*(T *)i_pAddress)
                     : "0"(i_addValue)
                     : "memory");  // What does this mean? Memory fencing?
Quest4 > Does "memory" in the above assembly mean memory fencing? If yes, doesn't that take around 100 cycles to execute?
Quest5 > Doesn't the lock instruction here assert that the operation is being done on a shared resource, so that other threads block here?
As far as I know, this last question does not apply to more or less recent Intel multiprocessor architectures, as the locking is done on cache lines.
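As an aside, the inline assembly quoted above is an atomic fetch-and-add; the same operation can be written portably with a GCC builtin, which on x86 compiles down to the same `lock xadd` (a sketch, not the original poster's code):

```c
#include <stdint.h>

/* Atomic fetch-and-add: returns the value *addr held before the
 * addition. On x86, GCC emits a single `lock xadd` for this. */
static int64_t atomic_fetch_add64(int64_t *addr, int64_t val)
{
    return __atomic_fetch_add(addr, val, __ATOMIC_SEQ_CST);
}
```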

Thanks in advance.


Solution

That's a lot of questions!

How would spinning on a while loop be more efficient than blocking on a mutex?

If the resource is largely uncontended, on average you don't have to spin for long. That can make it cheaper than using a mutex.

A good design employing mutexes also ensures that the system progresses, so isn't that BLOCKLESS too, by definition?

It probably is fairer to the waiting threads. When spinning while waiting for a resource, an "unlucky" thread might have to wait a long time.

So, isn't lock-free something like a spinlock? If yes, why can't we use a pthread spin lock?

If you have a good idea about how to make an algorithm lock-free, you might not have to spin at all.

Does ":memory" in the above assemble mean memory fencing ? If yes, doesn't that take around 100 cycles to implement ?

Yes, it is memory fencing on systems that need it. Synchronizing many CPU caches may take a long time (possibly more than 100 clocks). On the other hand, a spinlock or a mutex would also need memory fences to work correctly.

Doesn't the lock instruction here assert that the operation is being done on a shared resource, so that other threads block here?

It's a different kind of blocking, at the hardware level. If other threads need the data you have just updated, they would need to wait for it to become available in their CPU's cache.

Other tips

  1. Lock-free does use some spinlock-like constructs. But when we talk about a spin lock, we usually want to synchronize a large block of code, whereas lock-free techniques target a single assignment, usually one instruction. The first spins while something long is being done; the second spins only after something short has already been done by someone else. The first keeps spinning; the second "spin" is merely a retry.

  2. If I am not mistaken, that "memory" means: do not optimize memory variables into register variables. (Not quite right. Precisely, it tells the compiler that accesses to memory must stay ordered across the asm statement. But it has nothing to do with hardware memory fencing.)
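The distinction in tip 2 can be made explicit: a bare `"memory"` clobber is only a compiler barrier and emits no machine instruction, while a hardware fence requires an actual fence instruction (both lines below are standard GCC idioms; the macro names are mine):

```c
/* Compiler barrier only: forbids the compiler from caching memory
 * values in registers or reordering loads/stores across this point.
 * Emits NO machine instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hardware memory fence: additionally orders the CPU's own memory
 * operations (an mfence instruction on x86). */
#define hardware_fence() __atomic_thread_fence(__ATOMIC_SEQ_CST)
```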

  3. A locked instruction is not as fast as a non-locked one, but it is still faster than a context switch. Because it is slower, when writing a spinning construct we usually first do a non-locked test, and only if that passes do we attempt the locked one.
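Tip 3 describes the classic test-and-test-and-set pattern: spin on a plain (cheap) load until the lock looks free, and only then attempt the expensive locked instruction (an illustrative sketch):

```c
/* Test-and-test-and-set spinlock: spin on an ordinary read
 * (no bus/cache-line locking) until the lock looks free, then
 * try the expensive atomic exchange. */
static int ttas_lock = 0;

static void ttas_acquire(void)
{
    for (;;) {
        /* Cheap non-locked test: just a regular read. */
        while (__atomic_load_n(&ttas_lock, __ATOMIC_RELAXED))
            ;
        /* Expensive locked test: atomic exchange (lock xchg on x86). */
        if (__atomic_exchange_n(&ttas_lock, 1, __ATOMIC_ACQUIRE) == 0)
            return; /* we got the lock */
    }
}

static void ttas_release(void)
{
    __atomic_store_n(&ttas_lock, 0, __ATOMIC_RELEASE);
}
```

The inner read-only loop keeps the contended cache line in the shared state, so waiters do not ping-pong it between cores while the lock is held.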

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow