Monday, June 3, 2013

Memory barrier

When you see the words "memory barrier", what is your first reaction? A register holding a stale value? Instructions like lfence and sfence? Or something like the volatile keyword? When I first saw the term, what came to mind was the moss-covered stone barrier from Warcraft; and even now that I know exactly what a memory barrier is, my first reaction is still those green stones, and the urge to knock them down!
Back to the topic: first, what is a memory barrier? One description says a memory barrier deals with the situation where, "because of compiler optimization and the use of caches, a write to memory is not reflected immediately, so that after the write completes a read may still return the old value" (from "inventive product kernel"). (That description is not quite precise. A better definition: a memory barrier is a mechanism that prevents the compiler and the hardware from reordering memory accesses (reads and writes of variables) in a way that is inconsistent with the order written in the program. In other words, it is not the name of the erroneous phenomenon, but the remedy for it. Please correct me if I am wrong!)
A definition is just a definition: blunt as it is, some people grasp it immediately, while others remain confused. Don't worry. Below we divide memory barriers into categories and study them one by one; after reading this article, go back and re-read the definition, and you will understand it.
Memory barriers fall into three categories:
  1. Barriers needed because of compiler optimization
  2. Barriers needed because of caches
  3. Barriers needed because of out-of-order execution
1. Barriers needed because of compiler optimization:
As we all know, reading a value from a register is much faster than reading it from memory, so when the compiler optimizes aggressively it may keep frequently used variables in registers and read them from there instead of accessing memory. This raises a problem: what happens when another thread changes the value in memory? You might think the compiler could never be so stupid as to make such a mistake. Well, the compiler is not as clever as you think! Look at the following code (excerpted from "inventive product kernel"):
  int flag = 0;

  void wait() {
      while (flag == 0) {
          sleep(1000);
          ....
      }
  }

  void wakeup() {
      flag = 1;
  }
This code shows one thread looping, waiting for another thread to modify flag. When a compiler such as gcc sees that sleep() does not modify flag, it may, for efficiency, keep flag in a register, so the generated code looks like the following pseudo-assembly:
  void wait()
  {
      movl flag, %eax;
      while (%eax == 0)
          sleep(1000);
  }

Now when the wakeup function modifies flag, the wait function keeps blindly reading the register value, never noticing that flag has actually changed, and the thread loops forever. The compiler optimization has produced exactly the opposite of the intended effect!
But we cannot simply ask the compiler to give up this optimization, because in many cases the performance gain it brings is substantial. So what can we do? Is there a way to avoid this situation? Of course there is: we can use the volatile keyword.
  volatile int flag = 0;
In this way, we prevent the compiler from keeping flag in a register; every access to flag goes to memory.
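Besides volatile, GCC also offers an explicit compiler-barrier idiom (this sketch is my addition, not from the original text): an empty inline-asm statement with a "memory" clobber, which the Linux kernel wraps in its barrier() macro. It forces the compiler to discard register copies and re-read variables from memory after that point.

  #include <unistd.h>

  /* Sketch of a compiler barrier (GCC/Clang inline asm).
   * The empty asm with a "memory" clobber tells the compiler that
   * memory may have changed behind its back, so any register copies
   * of variables must be reloaded afterwards. It emits no machine
   * instructions at all. */
  #define barrier() __asm__ __volatile__("" ::: "memory")

  int flag = 0;

  void wait(void) {
      while (flag == 0) {
          sleep(1000);
          barrier();   /* force flag to be re-read from memory */
      }
  }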
What we described above is the "memory barrier needed because of compiler optimization". Does the concept make a little more sense now? Go back and re-read the definition, then continue.
2. Barriers needed because of caches:
Well, if registers can cause this kind of problem, what about caches? As we all know, the CPU pulls data from memory into a cache; subsequent reads hit the cache directly, and writes also go to the cache first.
So, is there any problem in the single-core case? Think about it: on a single core, what besides the CPU can modify memory? Right, external devices doing DMA! So, can a DMA write to memory cause a memory-barrier problem? On current architectures, the answer is no.
When an external device finishes a DMA operation, there is a mechanism that makes sure the CPU knows the corresponding cache lines have become invalid; and before the CPU starts a DMA operation, before it sends the start command to the device, the corresponding cache contents are written back to memory. On most RISC architectures this is done with special instructions. On x86 it is done with a technique called bus snooping: since the CPU and the peripherals must arbitrate for the bus to access memory, a dedicated hardware module records which memory regions are held in the cache; when an external device writes to memory, this hardware determines whether the written region is in the cache and takes the appropriate action.
So when do caches actually require a memory barrier? Multiple CPUs? Yes. In a multi-CPU system each CPU has its own cache. When the same memory region is present in both CPUs' caches and CPU1 changes the value in its own cache, CPU2 may still read the old value from its cache. That would be quite a disaster, wouldn't it? And since no memory access goes out on the bus, bus snooping cannot help here. What do we do then?
Indeed, what do we do? We need to make CPU2 invalidate its own cache before the read. On x86, a number of instructions can do this, for example instructions with the lock prefix, cpuid, iret, and so on. The kernel uses functions such as mb(), rmb() and wmb() to accomplish this; they are implemented with the instructions above, and interested readers can look them up in the kernel code.
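As a sketch (my addition, assuming the kernel-style wmb()/rmb() macros mentioned above are available in this context, and with use() as a placeholder), here is the classic producer/consumer pattern, where the barriers keep the data write visible before the flag write, and the flag read ordered before the data read:

  int data  = 0;
  int ready = 0;

  void producer(void) {
      data = 42;     /* 1. write the payload                          */
      wmb();         /* 2. data must reach memory before the flag     */
      ready = 1;     /* 3. publish                                    */
  }

  void consumer(void) {
      while (ready == 0)
          ;          /* spin until the flag is set                    */
      rmb();         /* don't let the data read move before the flag  */
      use(data);     /* placeholder: data is now guaranteed to be 42  */
  }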
3. Barriers needed because of out-of-order execution:
As we all know, superscalar processors are everywhere now; even the Godson (Loongson) is a four-issue design. A superscalar CPU has several independent pipelines and can issue multiple instructions per cycle, which allows many instructions to execute out of order. For the details of how this reordering works, see an architecture textbook; here we only discuss the memory-barrier aspect.
Out-of-order execution can cause problems. Suppose instruction 1 stores a value to a memory location and instruction 2 reads that location and computes with the value. If the two are swapped, instruction 2 starts computing with the old value in memory, which is clearly wrong, isn't it?
For this situation, x86 provides the lfence, sfence and mfence instructions to stall the pipeline:
lfence: stalls the relevant pipelines until all memory read operations before the lfence have completed
sfence: stalls the relevant pipelines until all memory write operations before the sfence have completed
mfence: stalls the relevant pipelines until all memory read and write operations before the mfence have completed
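As a sketch (my addition, using standard GCC inline-asm syntax rather than the kernel's own macros), these fences can be emitted directly from C; the "memory" clobber additionally stops the compiler itself from reordering memory accesses across them:

  /* Hedged sketch: emitting the x86 fence instructions from C.
   * The "memory" clobber also acts as a compiler barrier, so neither
   * the CPU nor the compiler moves memory accesses across the fence. */
  #define lfence() __asm__ __volatile__("lfence" ::: "memory")
  #define sfence() __asm__ __volatile__("sfence" ::: "memory")
  #define mfence() __asm__ __volatile__("mfence" ::: "memory")

  int data  = 0;
  int ready = 0;

  void store_side(void) {
      data  = 42;
      sfence();      /* the data store completes before...  */
      ready = 1;     /* ...the flag store is issued          */
  }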
