WCET Preserving Hardware Prefetch for Many-Core Real-Time Systems

Authors:

Neil C. Audsley

RTNS '14: Proceedings of the 22nd International Conference on Real-Time Networks and Systems

Pages 193 - 202

https://doi.org/10.1145/2659787.2659824

Published: 08 October 2014

Abstract

A bus bottleneck arises when multiple CPUs within a many-core architecture share the same physical off-chip memory (e.g. DDR / DRAM). Worst-Case Execution Time (WCET) analysis of application tasks must therefore include the effects of sharing the memory bus amongst CPUs; likewise, average-case execution times include the effect of individual memory accesses being slowed by interference from the memory requests of other CPUs. One approach to mitigating this is to use a hardware prefetcher to move instructions and data from memory to the CPU cache before a cache miss instigates a memory request. However, in a real-time system there is a trade-off: issuing prefetch requests to off-chip memory reduces the bandwidth available for serving CPU cache misses, whereas successful prefetches avoid some CPU cache misses, so the memory system sees fewer memory requests.

In this paper we propose, analyse and show the implementation of a hardware prefetcher designed so that the WCET of application tasks is not affected by the run-time behaviour of the prefetcher, i.e. it uses spare time within the memory system to issue prefetch requests and forwards the prefetched data to the appropriate CPU. As well as leaving WCETs unaffected, the prefetcher enables a significant reduction in the average-case execution times of application tasks, showing the efficacy of the approach.
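The idea of issuing prefetches only in spare memory time can be illustrated with a small sketch. This is not the paper's design: it assumes a time-division-multiplexed (TDM) memory arbiter in which every request occupies exactly one fixed-length slot and slot `s` is owned by CPU `s % n_cpus`. The names `run_tdm` and `demand_slots` are hypothetical. Under these assumptions, a slot whose owner has no pending cache miss could not have been used by any other CPU, so handing it to the prefetcher leaves the slot in which each demand request is served unchanged.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

# (kind, cpu, address); kind is "demand" or "prefetch".
Request = Tuple[str, int, int]


def run_tdm(
    demand: List[Deque[int]],
    prefetch: Deque[Tuple[int, int]],
    n_slots: int,
) -> List[Optional[Request]]:
    """Simulate n_slots of an assumed TDM arbiter.

    demand[c] holds the pending miss addresses of CPU c.
    prefetch holds (target_cpu, address) pairs proposed by the prefetcher.
    A prefetch is issued only in a slot whose owner has nothing to send.
    """
    n_cpus = len(demand)
    trace: List[Optional[Request]] = []
    for slot in range(n_slots):
        owner = slot % n_cpus
        if demand[owner]:
            # The slot owner always has priority in its own slot.
            trace.append(("demand", owner, demand[owner].popleft()))
        elif prefetch:
            # Spare slot: use it for a prefetch on behalf of some CPU.
            cpu, addr = prefetch.popleft()
            trace.append(("prefetch", cpu, addr))
        else:
            trace.append(None)  # slot stays idle
    return trace


def demand_slots(trace: List[Optional[Request]]) -> List[Tuple[int, int, int]]:
    """Return (slot, cpu, address) for every demand request in the trace."""
    return [
        (slot, req[1], req[2])
        for slot, req in enumerate(trace)
        if req is not None and req[0] == "demand"
    ]
```

Running `run_tdm` twice on the same demand queues, once with an empty prefetch queue and once with pending prefetches, and comparing `demand_slots` of the two traces shows that demand requests are served in identical slots in this simplified model. A real memory system with variable access latencies would need a more careful argument.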

Published In

RTNS '14: Proceedings of the 22nd International Conference on Real-Time Networks and Systems

October 2014, 335 pages

ISBN: 9781450327275

DOI: 10.1145/2659787

General Chairs: Mathieu Jan (CEA LIST, Gif-sur-Yvette, France), Belgacem Ben Hedia (CEA LIST, Gif-sur-Yvette, France)

Program Chairs: Joël Goossens (Université Libre de Bruxelles, Belgium), Claire Maiza (Grenoble INP / Verimag, France)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

CEA: Commissariat à l'énergie atomique et aux énergies alternatives
GDR ASR: GDR Architecture, Systèmes et Réseaux

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

RTNS '14: 22nd International Conference on Real-Time Networks and Systems

October 8-10, 2014

Versailles, France

Acceptance Rates

Overall Acceptance Rate 119 of 255 submissions, 47%
