TOSTING: Investigating Total Store Ordering on ARM
ARCS
Conference
Best Paper Award
Proceedings of the 36th GI/ITG International Conference on Architecture of Computing Systems (ARCS 23), Springer International Publishing, 2023. Best Paper Award.
Abstract
The Apple M1 ARM processors incorporate two memory consistency models: the conventional ARM weak memory ordering and the total store ordering (TSO) model of the x86 architecture, which Apple’s x86 emulator, Rosetta 2, relies on. Having both memory ordering models on the same hardware lets us benchmark them directly against each other, compare their performance characteristics, and identify worst-case workloads.
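The difference between the two memory models can be illustrated with the classic message-passing litmus test. The sketch below is a toy enumeration, not the paper's benchmark and not a faithful hardware model: the names `WRITER`, `READER`, `interleavings`, `run`, and `outcomes` are hypothetical, "weak" is approximated as "accesses to different addresses within a thread may be reordered freely", and "TSO-like" as "store-store and load-load order is preserved". Real TSO additionally lets a store be delayed past a later load through the store buffer, which this particular test does not exercise.

```python
from itertools import permutations

# Message-passing litmus test: thread 0 publishes data, then sets a flag.
WRITER = [("store", "data", 1), ("store", "flag", 1)]
# Thread 1 reads the flag first, then the data.
READER = [("load", "flag", "r1"), ("load", "data", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global schedule against a flat memory; return (r1, r2)."""
    memory = {"data": 0, "flag": 0}
    regs = {}
    for kind, addr, arg in schedule:
        if kind == "store":
            memory[addr] = arg
        else:
            regs[arg] = memory[addr]
    return regs["r1"], regs["r2"]


def outcomes(allow_reordering):
    """Collect all (r1, r2) results reachable under the chosen toy model."""
    if allow_reordering:
        # Toy "weak" model: each thread's two accesses may swap places.
        writer_orders = [list(p) for p in permutations(WRITER)]
        reader_orders = [list(p) for p in permutations(READER)]
    else:
        # Toy "TSO-like" model: program order is kept within each thread.
        writer_orders, reader_orders = [WRITER], [READER]
    results = set()
    for w in writer_orders:
        for r in reader_orders:
            for schedule in interleavings(w, r):
                results.add(run(schedule))
    return results


tso_like = outcomes(allow_reordering=False)
weak_like = outcomes(allow_reordering=True)
# Outcomes that only the weaker toy model permits.
only_weak = weak_like - tso_like
```

In this toy model, `only_weak` contains the result where the reader sees the flag set but still reads stale data. Code written for x86 may implicitly rely on that result being impossible, which is why an emulator such as Rosetta 2 benefits from a hardware TSO mode instead of inserting barriers around memory accesses.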
In this paper, we assess the performance implications of TSO on the Apple M1 processor architecture. Based on various workloads, our findings indicate that TSO is, on average, 8.94 percent slower than ARM’s weaker memory ordering. Through synthetic benchmarks, we further explore the workloads that experience the most significant performance degradation due to TSO.
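As a reading aid for the "percent slower" figure, here is a minimal sketch of how a relative slowdown can be computed from paired runtimes. The workload names and runtimes are placeholder values invented for illustration, not measurements from the paper, and the arithmetic mean is an assumption: the abstract does not say how the average was formed.

```python
from statistics import mean


def slowdown_percent(t_weak, t_tso):
    """Relative slowdown of TSO versus weak ordering, in percent."""
    return (t_tso - t_weak) / t_weak * 100.0


# Hypothetical (weak_runtime, tso_runtime) pairs in seconds; NOT paper data.
runtimes = {
    "workload_a": (10.0, 11.0),
    "workload_b": (20.0, 21.0),
}

per_workload = {
    name: slowdown_percent(t_weak, t_tso)
    for name, (t_weak, t_tso) in runtimes.items()
}
# Assumed aggregation: arithmetic mean over the per-workload slowdowns.
average = mean(list(per_workload.values()))
```

A geometric mean over runtime ratios is a common alternative for aggregating benchmark results and can give a different figure from the same data.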