WCET Preserving Hardware Prefetch for Many-Core Real-Time Systems

Published: 08 October 2014. DOI: 10.1145/2659787.2659824

Abstract

There is an obvious bus bottleneck when multiple CPUs within a many-core architecture share the same physical off-chip memory (e.g. DDR/DRAM). Worst-Case Execution Time (WCET) analysis of application tasks must inevitably include the effects of sharing the memory bus amongst CPUs; likewise, average-case execution times include the effects of individual memory accesses being slowed by interference from other CPUs' memory requests. One approach to mitigating this is to use a hardware prefetcher that moves instructions and data from memory into the CPU cache before a cache miss triggers a memory request. However, in a real-time system there is a trade-off between the cost of issuing prefetch requests to off-chip memory, which reduces the bandwidth available for serving CPU cache misses, and the gain from the cache misses that prefetching avoids, which reduces the number of memory requests the memory system has to serve.
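The cost side of this trade-off can be made concrete with a simple bound. The sketch below assumes a hypothetical non-preemptive round-robin memory bus, a fixed access time `t_access`, and cores that block on each miss; none of these parameters come from the paper. Each demand miss then waits for at most one access from every other requestor. If a prefetcher is arbitrated as one more ordinary requestor, every miss in the WCET bound pays for one extra access, and because the analysis cannot assume a prefetch will hit, the miss count does not fall to compensate.

```python
def wcet_bound(compute_cycles: int, misses: int, n_cores: int,
               t_access: int, prefetch_requestor: bool = False) -> int:
    """Execution-time bound under a hypothetical round-robin, non-preemptive bus.

    Assumes each core blocks on a miss (one outstanding request per core) and
    every access occupies the bus for t_access cycles. A prefetcher that is
    arbitrated like an ordinary master is modelled as one extra requestor.
    """
    requestors = n_cores + (1 if prefetch_requestor else 0)
    # Worst case per miss: one access from every other requestor, then our own.
    per_miss = (requestors - 1) * t_access + t_access
    return compute_cycles + misses * per_miss


print(wcet_bound(10_000, 200, 4, 20))                           # 26000
print(wcet_bound(10_000, 200, 4, 20, prefetch_requestor=True))  # 30000
```

Under these assumptions, a naively arbitrated prefetcher raises the bound from 26000 to 30000 cycles, even if it lowers the average-case time.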
In this paper we propose, analyse and implement a hardware prefetcher designed so that the WCETs of application tasks are not affected by the run-time behaviour of the prefetcher: it uses only spare time within the memory system to issue prefetch requests, and forwards the prefetched data to the appropriate CPU. As well as leaving WCETs unaffected, the prefetcher significantly reduces the average-case execution times of application tasks, showing the efficacy of the approach.
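One way to see why prefetching only in spare memory time can leave WCETs untouched is a slot-level simulation. The sketch below swaps in a time-division-multiplexed (TDM) bus as an illustrative assumption; it is not necessarily the arbitration scheme used in the paper. Each slot belongs to one core, and if the owner has no pending demand miss when its slot starts, nobody else could have used that slot, so filling it with a prefetch delays no demand request. The function name `simulate_tdm` and its inputs are hypothetical.

```python
from collections import deque


def simulate_tdm(n_cores, demands, prefetches, n_slots):
    """Hypothetical TDM bus with slack-only prefetching (one access per slot).

    demands: list of (arrival_slot, core) demand misses.
    prefetches: prefetch candidates (opaque labels), issued in FIFO order.
    Returns (finish, issued): finish maps demand index -> slot end time;
    issued lists (slot, prefetch) pairs for slots that would have been idle.
    """
    pending = [deque() for _ in range(n_cores)]  # per-core FIFO of demand indices
    order = sorted(range(len(demands)), key=lambda i: demands[i][0])
    pf_queue = deque(prefetches)
    finish, issued = {}, []
    k = 0
    for slot in range(n_slots):
        # Enqueue every demand miss that has arrived by the start of this slot.
        while k < len(order) and demands[order[k]][0] <= slot:
            i = order[k]
            pending[demands[i][1]].append(i)
            k += 1
        owner = slot % n_cores
        if pending[owner]:
            # The slot owner's demand traffic always gets the slot.
            finish[pending[owner].popleft()] = slot + 1
        elif pf_queue:
            # The slot would otherwise be wasted: use it for a prefetch.
            issued.append((slot, pf_queue.popleft()))
    return finish, issued


demands = [(0, 0), (0, 1), (3, 0), (5, 2)]
base, _ = simulate_tdm(3, demands, [], 9)
with_pf, issued = simulate_tdm(3, demands, ["A", "B", "C"], 9)
print(base == with_pf)  # True
print(issued)           # [(2, 'A'), (4, 'B'), (6, 'C')]
```

Demand completion times are identical with and without prefetching, while three otherwise idle slots carry prefetches. With a work-conserving arbiter the same argument would not hold without some way to preempt or abort an in-flight prefetch.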


Cited By

  • (2021) Brief Industry Paper: AXI-Interconnect RT: Towards a Real-Time AXI-Interconnect for System-on-Chips. 2021 IEEE 27th Real-Time and Embedded Technology and Applications Symposium (RTAS), pp. 437-440. DOI: 10.1109/RTAS52030.2021.00046. Online publication date: May 2021.
  • (2020) Addressing Resource Contention and Timing Predictability for Multi-Core Architectures with Shared Memory Interconnects. 2020 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pp. 70-81. DOI: 10.1109/RTAS48715.2020.00-16. Online publication date: April 2020.


Published In

ACM Other conferences
RTNS '14: Proceedings of the 22nd International Conference on Real-Time Networks and Systems
October 2014
335 pages
ISBN: 9781450327275
DOI: 10.1145/2659787
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • CEA: Commissariat à l'énergie atomique et aux énergies alternatives
  • GDR ASR: GDR Architecture, Systèmes et Réseaux

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

RTNS '14

Acceptance Rates

Overall Acceptance Rate 119 of 255 submissions, 47%

Article Metrics

  • Downloads (last 12 months): 23
  • Downloads (last 6 weeks): 4
Reflects downloads up to 23 Dec 2024.
