Efficient persist barriers for multicores
Proceedings of the 48th International Symposium on Microarchitecture, 2015
Emerging non-volatile memory technologies enable fast, fine-grained persistence compared to slow block-based devices. In order to ensure consistency of persistent state, dirty cache lines need to be periodically flushed from caches and made persistent in an order specified by the persistency model. A persist barrier is one mechanism for enforcing this ordering.
In this paper, we first show that current persist barrier implementations, owing to certain ordering dependencies, add cache line flushes to the critical path. Our main contribution is an efficient persist barrier that reduces the number of cache line flushes on the critical path. We evaluate our proposed persist barrier by using it to enforce two persistency models: buffered epoch persistency with programmer-inserted barriers, and buffered strict persistency in bulk mode with hardware-inserted barriers. Experimental evaluations using micro-benchmarks (buffered epoch persistency) and multi-threaded workloads (buffered strict persistency) show that using our persist barrier improves performance by 22% and 20%, respectively, over the state of the art.
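To make the ordering guarantee of buffered epoch persistency concrete, the following is a minimal software sketch, not the hardware design proposed in the paper. It assumes a toy model in which stores are buffered per epoch, a persist barrier closes the current epoch without stalling, and epochs drain to persistent memory strictly oldest first. The class `EpochPersistBuffer` and its methods `store`, `persist_barrier`, `drain_one_epoch`, and `crash` are hypothetical names introduced only for this illustration.

```python
from collections import OrderedDict


class EpochPersistBuffer:
    """Toy model of buffered epoch persistency (illustrative assumption only)."""

    def __init__(self):
        self.nvm = {}                  # persistent state: address -> value
        self.persist_log = []          # (epoch id, address) in persist order
        self.epochs = [OrderedDict()]  # buffered dirty lines, oldest epoch first
        self.oldest_id = 0             # id of the oldest buffered epoch

    def store(self, addr, value):
        # Stores to the same address within one epoch coalesce in the buffer.
        self.epochs[-1][addr] = value

    def persist_barrier(self):
        # Close the current epoch; the caller does not wait for any flush.
        if self.epochs[-1]:
            self.epochs.append(OrderedDict())

    def drain_one_epoch(self):
        # Persist every line of the oldest epoch before any younger epoch.
        if not self.epochs[0]:
            return False               # nothing is buffered
        epoch = self.epochs.pop(0)
        for addr, value in epoch.items():
            self.nvm[addr] = value
            self.persist_log.append((self.oldest_id, addr))
        self.oldest_id += 1
        if not self.epochs:
            self.epochs.append(OrderedDict())
        return True

    def crash(self):
        # Buffered (not yet persistent) epochs are lost; return what survives.
        self.epochs = [OrderedDict()]
        return dict(self.nvm)


buf = EpochPersistBuffer()
buf.store("data", 42)
buf.persist_barrier()        # "data" must persist before "valid"
buf.store("valid", 1)
buf.drain_one_epoch()        # only the first epoch reaches persistent memory
state = buf.crash()
print(state)                 # {'data': 42}
```

In this model a crash can lose the `valid` flag but can never expose `valid` without `data`, which is the consistency property a persist barrier is meant to enforce. The sketch ignores inter-thread dependencies and cache coherence, which are where the critical-path flushes discussed above arise.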