Understanding the Implementation and Effects of Memory Barriers from a Hardware Perspective

2022-12-26

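I previously wrote an introduction to memory barriers from the software side. "Memory Barriers: a Hardware View for Software Hackers" approaches the topic from the hardware side instead: it explains what hardware designers need and how read/write memory barriers work. I only read the first five chapters; what follows is a summary of the article as I understand it. The figures in this post are taken from that article.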

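Reordering memory accesses yields better performance. When the logical correctness of a program depends on memory accesses happening in program order, as with synchronization primitives, software engineers can use a memory barrier to stop the CPU from applying these optimizations to memory accesses.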

Memory barriers are used to provide control over the order of memory accesses. This is necessary sometimes because optimizations performed by the compiler and hardware can cause memory to be accessed in a different order than intended by the developer.

A memory barrier affects instructions that access memory in two ways:

It provides control over the order in which memory access instructions are performed.
It provides control over when memory access instructions complete.

Memory access instructions, such as loads and stores, typically take longer to execute than other instructions. Therefore, compilers use registers to hold frequently used values, and processors use high-speed caches to hold the most frequently used memory locations. Another common optimization is for compilers and processors to rearrange the order in which instructions are executed so that the processor does not have to wait for memory accesses to complete. This can result in memory being accessed in a different order than the source code specifies. While this typically does not cause a problem within a single thread of execution, it can cause a problem if the location can also be accessed from another processor or device.
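The reordering problem is easiest to see with a writer that stores some data and then sets a flag, and a reader that checks the flag before reading the data. The following is a minimal userspace sketch using C11 fences, not kernel code; the names payload, ready, publish, and try_consume are made up for illustration. It assumes publish runs on one thread and try_consume on another.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain shared data */
static atomic_int ready;   /* flag that announces the data; starts at 0 */

/* Writer side: store the data first, then set the flag. */
void publish(int value)
{
    payload = value;
    /* Release fence: the payload store may not be reordered after the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: returns true and fills *out once the data has been published. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    /* Acquire fence: the payload load may not be reordered before the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

Without the two fences, the compiler or the processor would be free to make the flag visible before the data, so a reader on another processor could see the flag set and still read a stale payload.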

Types of Memory Barriers

As mentioned above, both compilers and processors can optimize the execution of instructions in a way that necessitates the use of a memory barrier. A memory barrier that affects both the compiler and the processor is a hardware memory barrier, and a memory barrier that only affects the compiler is a software memory barrier.

In addition to hardware and software memory barriers, a memory barrier can be restricted to memory reads, memory writes, or both. A memory barrier that affects both reads and writes is a full memory barrier.

There is also a class of memory barrier that is specific to multi-processor environments. The names of these memory barriers are prefixed with "smp" (for example smp_mb(), smp_rmb(), and smp_wmb()). On a multi-processor system these barriers are hardware memory barriers, and on uni-processor systems they are software memory barriers.

In the Linux kernel, the barrier() macro is the only software memory barrier, and it is a full memory barrier. All other memory barriers in the Linux kernel are hardware barriers. A hardware memory barrier also implies a software barrier.
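To make the software/hardware distinction concrete, here is a small userspace sketch in C11 rather than kernel code. The helper names compiler_only_barrier, full_hardware_barrier, and store_then_load are hypothetical. As an assumption for illustration, atomic_signal_fence stands in for a compiler-only barrier in the spirit of barrier(), and atomic_thread_fence stands in for a full hardware barrier.

```c
#include <stdatomic.h>

/* Hypothetical helper: constrains compiler reordering only.
 * Compilers typically emit no fence instruction for it. */
static inline void compiler_only_barrier(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

/* Hypothetical helper: constrains both the compiler and the processor. */
static inline void full_hardware_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Store to one location, then load from another. Keeping a store ahead of a
 * later load to a different location is a case where a compiler-only barrier
 * is not enough on a multi-processor machine, so the full barrier is used. */
int store_then_load(atomic_int *mine, atomic_int *theirs)
{
    atomic_store_explicit(mine, 1, memory_order_relaxed);
    full_hardware_barrier();
    return atomic_load_explicit(theirs, memory_order_relaxed);
}
```

Swapping full_hardware_barrier for compiler_only_barrier in store_then_load would still keep the compiler from reordering the two accesses, but the processor could still let the load complete before the store becomes visible to other processors.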

Using Memory Barriers

The two most common needs for memory barriers are managing memory shared by more than one processor and accessing I/O control registers that are mapped to memory locations.

In the case of shared memory, when there is only one CPU, hardware memory barriers are not needed. Because of this, memory barriers that are only needed to control memory shared between processors can be reduced to cheaper software barriers on systems with only one processor. As mentioned above, the names of these memory barriers are prefixed with "smp".
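The idea behind the "smp" barriers can be sketched with conditional compilation. This is an illustration only and not the kernel's actual definitions: SKETCH_SMP is a hypothetical build flag standing in for the kernel's multi-processor configuration, and sketch_smp_mb, sketch_publish, and sketch_ready are made-up names.

```c
#include <stdatomic.h>

/* SKETCH_SMP is a hypothetical flag, not a real kernel configuration symbol. */
#ifdef SKETCH_SMP
/* Multi-processor build: use a real hardware fence. */
#define sketch_smp_mb() atomic_thread_fence(memory_order_seq_cst)
#else
/* Uni-processor build: only keep the compiler from reordering. */
#define sketch_smp_mb() atomic_signal_fence(memory_order_seq_cst)
#endif

static int shared_data;        /* data handed to another processor */
static atomic_int data_ready;  /* flag announcing the data; starts at 0 */

/* Store the data, then the flag, with the barrier in between. */
void sketch_publish(int value)
{
    shared_data = value;
    sketch_smp_mb();
    atomic_store_explicit(&data_ready, 1, memory_order_relaxed);
}

/* Report whether the flag has been set. */
int sketch_ready(void)
{
    return atomic_load_explicit(&data_ready, memory_order_relaxed);
}
```

The calling code is identical in both builds; only the cost of the barrier changes, which is the optimization the paragraph above describes.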

References

https://medium.com/fcamels-notes/%E5%BE%9E%E7%A1%AC%E9%AB%94%E8%A7%80%E9%BB%9E%E4%BA%86%E8%A7%A3-memry-barrier-%E7%9A%84%E5%AF%A6%E4%BD%9C%E5%92%8C%E6%95%88%E6%9E%9C-416ff0a64fc1
https://bruceblinn.com/linuxinfo/MemoryBarriers.html