DynaCo: Dynamic Coherence Management for Tiled Manycore Architectures

Srivatsa, Akshay; Mansour, Mostafa; Rheindt, Sven; Gabriel, Dirk; Wild, Thomas; Herkersdorf, Andreas

doi:10.1007/s10766-020-00688-6

International Journal of Parallel Programming, Volume 49, pages 570–599 (2021)

Published: 03 January 2021

Akshay Srivatsa (ORCID: orcid.org/0000-0002-1427-9511), Mostafa Mansour, Sven Rheindt, Dirk Gabriel, Thomas Wild & Andreas Herkersdorf

Abstract

Embedded system applications, with their inherently limited parallelism, rarely exploit all available processing resources in large DSM-based manycore architectures. From a cache coherence perspective, this provides an opportunity to move away from global coherence spanning across all tiles, which does not scale well. Therefore, we favor a region-based cache coherence (RBCC) approach that enables coherence among a selectable cluster of tiles in accordance with application requirements. We present the design and hardware implementation of a flexibly configurable coherency region manager (CRM) that enables RBCC. We introduce two novel features that enhance RBCC, namely, runtime coherency region re-configuration and RBCC-malloc(), that dynamically tailor coherence to actually shared application working sets. Further, we propose, implement and evaluate additional CRM functions such as a non-intrusive barrier synchronization mechanism and a false sharing resolution strategy for our DSM-based manycore architecture. We have synthesized the CRM on an FPGA prototype for a 64-core system and observe a 38% reduction in BRAM-utilization compared to a global coherence directory for regions with up to 32 cores. Experiments using a video streaming application reveal a speed-up of up to 42% compared to an alternative message passing based implementation. We also evaluate the benefits of runtime coherency region re-configuration using two scenarios and present a formal analysis on when a re-configuration is beneficial.
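The region idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the CRM hardware design: the class `CoherencyRegion`, its bitmask representation, and the method names are all hypothetical and chosen for illustration. It shows coherence actions being restricted to the tiles of a region, and the effect of changing the region at runtime.

```python
from dataclasses import dataclass


@dataclass
class CoherencyRegion:
    """Hypothetical software model of one coherency region (not the paper's CRM)."""

    tile_mask: int = 0  # bit i set => tile i belongs to the region

    def add_tile(self, tile: int) -> None:
        self.tile_mask |= 1 << tile

    def remove_tile(self, tile: int) -> None:
        self.tile_mask &= ~(1 << tile)

    def contains(self, tile: int) -> bool:
        return bool((self.tile_mask >> tile) & 1)

    def invalidation_targets(self, sharer_mask: int, writer: int) -> list[int]:
        """Tiles to invalidate when `writer` writes a shared line.

        Only sharers inside the region are considered; the writer is skipped.
        """
        targets = sharer_mask & self.tile_mask & ~(1 << writer)
        return [t for t in range(targets.bit_length()) if (targets >> t) & 1]


# Assumed example: a region made of tiles 0, 1 and 5.
region = CoherencyRegion()
for t in (0, 1, 5):
    region.add_tile(t)

# All three region tiles hold a copy of the line; tile 1 writes.
sharers = (1 << 0) | (1 << 1) | (1 << 5)
targets = region.invalidation_targets(sharers, writer=1)

# Runtime re-configuration (assumed scenario): tile 5 leaves the region.
region.remove_tile(5)
targets_after = region.invalidation_targets(sharers, writer=1)
```

In this model, shrinking the region directly reduces the set of tiles that receive coherence messages, which is the intuition behind tailoring coherence to the tiles that actually share data.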

Notes

In our system, coherence messages and their acknowledgements are not re-ordered.
Multiple coherence barriers can be supported by increasing the number of barrier and shadow registers per tile.
For some applications, this can additionally contain state transfers.

Acknowledgements

The authors would like to thank Sai Varun Brahmadevara, Li-Yu Peng and Miguel Montoya Rendon for their contributions as master and internship students at the Chair of Integrated Systems, TUM. We would also like to thank Sebastian Maier at the Computer Science 4 department, FAU, Erlangen-Nuremberg for his OS support.

This work was partly funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), Project Number 146371743-TRR 89: Invasive Computing.

Author information

Authors and Affiliations

Technical University of Munich, Munich, Germany
Akshay Srivatsa, Mostafa Mansour, Sven Rheindt, Dirk Gabriel, Thomas Wild & Andreas Herkersdorf

Corresponding author

Correspondence to Akshay Srivatsa.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Srivatsa, A., Mansour, M., Rheindt, S. et al. DynaCo: Dynamic Coherence Management for Tiled Manycore Architectures. Int J Parallel Prog 49, 570–599 (2021). https://doi.org/10.1007/s10766-020-00688-6

Received: 01 April 2020
Accepted: 05 November 2020
Published: 03 January 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s10766-020-00688-6
