Monday, June 11, 2012

C++11 lockless queues

I have been messing around with the new c++11 threading in order to write a post on it... as per usual i ended up side tracked on something else.. while coding it i got thinking about the lockless vs lock implementations. Basically lockless implementations require you to design to very hard restrictions. They are "write mastering" and "obstruction free updating"

Write-Mastering
Write Mastering is where a data variable is updated by a single thread. Its often surprising how easy this is to do. Eg for a Queue the head is mastered from the producer side and the tail is mastered from the consumer side. (this is the example below)

Obstruction-freedom
When write mastering is not possible an obstruction-free check can be used This technique doesn't explicitly lock a data structure but instead uses a pair of "consistency markers" that are updated the sequance of read, update check 1, copy, copy update, (reread)check, write back copy(or rollback), update check 2.

To fully explain it: When an update is planed the first consistency marker is compared to the second. If they dont match enter a spin lock until they do. If/When they match the structure is free for an update. The first marker is scrambled and written to something new. The update proceeds on a copy and once its done the markers are rechecked to see if they are still scrambled as the thread choose. If both markers are cleared then nothing else changed them so then the update writes back the data copy and then updates the secondary marker to match the first. If the compare fails then the updated copy is tossed and the process starts again.

Lockless designs tend to be big cpu and memory wasters when they get into the lockless spins or are constantly clashing over updating shared structures. As a result the system running them should be balanced, tuned and as large as possible so that there is always data ready to be processed in each thread with as few conflicts as possible. You can get a taste of how cpu intensive they are by running the following example and watching your CPU hit the roof.

//complie with 
//g++ -std=c++11 lockless_queue.cpp -o lockless_queue.exe
#include <stdint.h>
#include <thread>
#include <iostream>

class LocklessQueueSys
{
  enum { size=1000};

public:
  LocklessQueueSys() :
    head(0),
    tail(0)
  {}
  
  void producer()
  {
    uint32_t count= 0;
    while(1)
      {
 while(((head+1)%size)== tail); //spin lock
 msg[head] = count++;
 head = (head + 1) % size;
      }
  }
  
  void consumer()
  {
    uint32_t expect=0;
    while(1)
      {
 while(head == tail); //spin lock
 if(expect != msg[tail])
    std::cout << "Error:" << expect  <<  "\n";
 if(expect%10000000 == 0)
    std::cout << "check:" << expect  <<  "\n";
 expect=msg[tail]+1;
 tail = (tail + 1) % size;
      }
  }
private:
  int32_t msg[size];
  int32_t head;
  int32_t  tail;
};

int main()
{
  //Use a member function in a thread
  LocklessQueueSys x;
  std::thread tpro(&LocklessQueueSys::producer, &x);
  std::thread tcon(&LocklessQueueSys::consumer, &x);
  
  tpro.join();
  tcon.join(); 
}