Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

Data cache locking for tight timing calculations

Published: 12 December 2007 Publication History
  • Get Citation Alerts
  • Abstract

    Caches have become increasingly important with the widening gap between main memory and processor speeds. Small and fast cache memories are designed to bridge this discrepancy. However, they are only effective when programs exhibit sufficient data locality. In addition, caches are a source of unpredictability, resulting in programs sometimes behaving in a different way than expected. Detailed information about the number of cache misses and their causes allows us to predict cache behavior and to detect bottlenecks. Small modifications in the source code may change memory patterns, thereby altering the cache behavior. Code transformations, which take the cache behavior into account, might result in a high cache performance improvement. However, cache memory behavior is very hard to predict, thus making the task of optimizing and timing cache behavior very difficult. This article proposes and evaluates a new compiler framework that times cache behavior for multitasking systems. Our method explores the use of cache partitioning and dynamic cache locking to provide worst-case performance estimates in a safe and tight way for multitasking systems. We use cache partitioning, which divides the cache among tasks to eliminate intertask cache interferences. We combine static cache analysis and cache-locking mechanisms to ensure that all intratask conflicts, and consequently, memory access times, are exactly predictable. The results of our experiments demonstrate the capability of our framework to describe cache behavior at compile time. We compare our timing approach with a system equipped with a nonpartitioned, but statically, locked data cache. Our method outperforms static cache locking for all analyzed task sets under various cache architectures, demonstrating that our fully predictable scheme does not compromise the performance of the transformed programs.

    References

    [1]
    Alt, M., Ferdinand, C., Martin, F., and Wilhelm, R. 1996. Cache behaviour prediction by abstract interpretation. In Proceedings of Static Analysis Symposium (SAS'96). Lecture Notes in Computer Science (LNCS) 1145. Springer-Verlag, New York, 52--66.
    [2]
    Arnold, R., Müeller, F., Whalley, D., and Harmon, M. 1994. Bounding worst-case instruction cache performance. In Proceedings of 15th Real-Time Systems Symposium (RTSS'94). 172--181.
    [3]
    Basumallick, S. and Nielsen, K. 1994. Cache issues in real-time systems. In Proceedings ACM Workshop on Languages, Compilers and Tools for Real-Time Systems (LCTES'94).
    [4]
    Burns, A. and Wellings, A. 1993. The impact of an Ada run-time system's performance characteristics on scheduling models. In Proceedings of 12th Ada-Europe International Conference. 240--248.
    [5]
    Burns, A., Tindell, K., and Wellings, A. 1995. Effective analysis for engineering real-time fixed priority schedulers. IEEE Transactions on Software Engineering 21, 475--480.
    [6]
    Busquets-Mataix, J. V., Serrano, J. J., .Ors, R., Gil, P., and Wellings, A. 1996. Adding instruction cache effect to schedulability analysis of preemptive real-time systems. In Proceedings of 2nd Real-Time Technology and Applications Symposium (RTAS'96).
    [7]
    Busquets-Mataix, J. V., Serrano, J. J., and Wellings, A. 1997. Hybrid instruction cache partitioning for preemptive real-time systems. In Proceedings of 9th Euromicro Workshop on Real-Time Systems (EUROMICRO-RTS'97).
    [8]
    Campoy, M., Ivars, A. P., and Busquets-Mataix, J. V. 2001. Static use of locking caches in multitask preemptive real-time systems. In Proceedings of IEEE/IEE Real-Time Embedded Systems Workshop (Satellite of the IEEE Real-Time Systems Symposium).
    [9]
    Engblom, J. and Ermedahl, A. 1999. Pipeline timing analysis using a trace-driven simulator. In Proceedings of 6th International Conference on Real-Time Computing Systems and Applications (RTCSA'99).
    [10]
    Engblom, J. and Ermedahl, A. 2000. Modeling complex flows for worst-case execution time analysis. In Proceedings of the 21st IEEE Real-Time Systems Symposium (RTSS'00).
    [11]
    Ermedahl, A. and Gustafsson, J. 1997. Deriving annotations for tight calculation of execution time. In Proceedings of Euro-Par (EUROPAR'97). 1298--1307.
    [12]
    Feautrier, P. 1996. Automatic parallelization in the polytope model. In The Data Parallel Programming Model, G. R. Perrin and A. Darte, Eds. Lecture Notes in Computer Science 1132. Springer Verlag, New York, 79--103.
    [13]
    Ferdinand, C. and Wilhelm, R. 1999. Efficient and precise cache behavior prediction for real-time systems. Real-Time Systems 17, 131--181.
    [14]
    Gannon, D., Jalby, W., and Gallivan, K. 1988. Strategies for cache and local memory management by global program transformations. Journal of Parallel and Distributed Computing 5, 587--616.
    [15]
    Ghosh, S., Martonosi, M., and Malik, S. 1999. Cache miss equations: a compiler framework for analyzing and tuning memory behavior. ACM Transactions on Programming Languages and Systems (TOPLAS) 21, 4, 703--746.
    [16]
    Gustafsson, J. 2000. Analyzing execution time of object-oriented programs using abstract interpretation. Ph.D. thesis, Uppsala University.
    [17]
    Healey, C. A., Whalley, D., and Harmon, M. 1995. Integrating the timing analysis of pipelining and instruction caching. In Proceedings of 16th Real-Time Systems Symposium (RTSS'95). 288--297.
    [18]
    Hennessy, J. L. and Patterson, D. A. 1996. Computer architecture: a quantitative approach. Morgan Kaufman Pub., San Francisco, CA.
    [19]
    IBM Microelectronics Division. 1999. The PowerPC 440 core.
    [20]
    Integrated Device Technologies. 2001. 79RC64574/RC64575 Data Sheet.
    [21]
    Jeffay, K. and Stone, D. L. 1993. Accounting for interrupt handling costs in dynamic priority task systems. In Proceedings of 14th Real-Time Systems Symposium (RTSS'93). 212--221.
    [22]
    Joseph, M. and Pandya, P. 1986. Finding response times in a real-time system. The Computer Journal 29, 5, 390--395.
    [23]
    Katcher, A. I., Arakawa, H., and Strosnider, J. K. 1993. Engineering and analysis of fixed priority schedulers. IEEE Transactions on Software Engineering 19, 920--934.
    [24]
    Kim, S. K., Min, S. L., and Ha, R. 1996. Efficient worst case timing analysis of data caching. In Proceedings of IEEE Real-Time Technology and Applications Symposium (RTAS'96).
    [25]
    Kirk, D. B. 1989. SMART (strategic memory allocation for real-time) cache design. In Proceedings of 10th Real-Time Systems Symposium (RTSS'89).
    [26]
    Lam, M., Rothberg, E. E., and Wolf, M. E. 1991. The cache performance of blocked algorithms. In Proceedings of IV International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'91).
    [27]
    Lee, C. G., Hahn, J., Seo, Y. M., Min, S. L., Ha, R., Hong, S., Park, C. Y., Lee, M., and Kim, C. S. 1998. Analysis of cache-related preemption delay in fixed-priority preemptive scheduling. IEEE Transaction on Computers 47.
    [28]
    Li, Y. T. S., Malik, S., and Wolfe, A. 1995. Efficient microarchitecture modeling and path analysis for real-time software. In Proceedings of 16th Real-Time Systems Symposium (RTSS'95). 298--307.
    [29]
    Li, Y. T. S., Malik, S., and Wolfe, A. 1996. Cache modeling and path analysis for real-time software. In Proceedings of 17th Real-Time Systems Symposium (RTSS'96).
    [30]
    Liedtke, J., Härtig, H., and Hohmuth, M. 1997. OS-controlled cache predictability for real-time systems. In Proceedings of 3rd IEEE Real-Time Technology and Applications Symposium (RTAS'97).
    [31]
    Lim, S. S., Bae, Y. H., Jang, G. T., Rhee, B. D., Min, S. L., Park, C. Y., Shin, H., Park, K., and Kim, C. S. 1994. An accurate worst case timing analysis technique for RISC processors. In Proceedings of 15th Real-Time Systems Symposium (RTSS'94). 97--108.
    [32]
    Lundqvist, T. and Stenström, P. 1998. Integrating path and timing analysis using instruction-level simulation techniqes. In Proceedings of ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES'98). 1--15.
    [33]
    Lundqvist, T. and Stenström, P. 1999a. A method to improve the estimated worst-case performance of data caching. In Proceedings of the 6th International Conference on Real-Time Computing Systems and Applications (RTCSA'99). 255--262.
    [34]
    Lundqvist, T. and Stenström, P. 1999b. Timing anomalies in dynamically scheduled microprocessors. In Proceedings of 20th Real-Time Systems Symposium (RTSS'99).
    [35]
    MIPS Technologies. 2001. MIPS32 4Kp- Embedded, MIPS Processor Core.
    [36]
    Motorola Inc. 1996. PowerPC 604e RISC Microprocessor Technical Summary.
    [37]
    Müeller, F. 1995. Compiler support for software-based cache partitioning. In Proceedings ACM Workshop on Languages, Compilers and Tools for Real-Time Systems (LCTES'95).
    [38]
    Puaut, I. and Decotigny, D. 2002. Low-complexity algorithms for static cache locking in multitasking hard real-time systems. In Proceedings of 23th Real-Time Systems Symposium (RTSS'02).
    [39]
    Rivera, G. and Tseng, C.-W. 1998. Data transformations for eliminating conflict misses. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'98). 38--49.
    [40]
    Sánchez, F., González, A., and Valero, M. 1997. Static locality analysis for cache management. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques (PACT'97).
    [41]
    Sun Microelectronics. 1997. microSPARC-IIep User's Manual.
    [42]
    Tindell, K., Burns, A., and Wellings, A. 1994. An extendible approach for analysing fixed priority hard real-time tasks. Real-Time Systems 6, 1, 133--151.
    [43]
    Vera, X. and Xue, J. 2002. Let's study whole program cache behaviour analytically. In Proceedings of International Symposium on High-Performance Computer Architecture (HPCA 8). Cambridge.
    [44]
    Vera, X., Abella, J., González, A., and Llosa, J. 2003a. Optimizing program locality through CMEs and GAs. In Proceedings of 12th International Conference on Parallel Architectures and Compilation Techniques (PACT'03). New Orleans.
    [45]
    Vera, X., Lisper, B., and Xue, J. 2003b. Data cache locking for higher program predictability. In Proceedings of International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'03). 272--282.
    [46]
    Vera, X., Lisper, B., and Xue, J. 2003c. Data caches in multitasking hard real-time systems. In Proceedings of International Real-Time Systems Symposium (RTSS03).
    [47]
    Vera, X., Bermudo, N., Llosa, J., and González, A. 2004. A fast and accurate framework to analyze and optimize cache memory behavior. ACM Transactions on Programming Languages and Systems (TOPLAS) 26, 2.
    [48]
    White, R. T., Müeller, F., Healy, C., Whalley, D., and Harmon, M. 1997. Timing analysis for data caches and set-associative caches. In Proceedings of Third IEEE Real-Time Technology and Applications Symposium (RTAS'97). 192--202.
    [49]
    Wilson, R. P. 1997. Efficient context-sensitive pointer analysis for C programs. Ph.D. thesis, Stanford University.
    [50]
    Wolf, M. and Lam, M. 1991. A data locality optimizing algorithm. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'91). 30--44.
    [51]
    Wolfe, A. 1993. Software-based cache partitioning for real-time applications. In Proceedings of the 3rd International Workshop on Responsive Computer Systems.
    [52]
    Xue, J. 2000. Loop Tiling for Parallelism. Kluwer Academic Publ., Boston, MA.
    [53]
    Xue, J. and Huang, C.-H. 1998. Reuse-driven tiling for data locality. International Journal of Parallel Programming 26, 6, 671--696.

    Cited By

    View all
    • (2020)Execution Model to Reduce the Interference of Shared Memory in ARINC 653 Compliant Multicore RTOSApplied Sciences10.3390/app1007246410:7(2464)Online publication date: 3-Apr-2020
    • (2020)SPECTRUMACM Transactions on Embedded Computing Systems10.1145/340003219:5(1-28)Online publication date: 26-Sep-2020
    • (2019)SPECTRUM: a software defined predictable many-core architecture for LTE baseband processingProceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3316482.3326352(82-96)Online publication date: 23-Jun-2019
    • Show More Cited By

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Embedded Computing Systems
    ACM Transactions on Embedded Computing Systems  Volume 7, Issue 1
    December 2007
    234 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/1324969
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Journal Family

    Publication History

    Published: 12 December 2007
    Accepted: 01 April 2006
    Revised: 01 December 2004
    Received: 01 April 2003
    Published in TECS Volume 7, Issue 1

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Worst-case execution time
    2. data cache analysis
    3. embedded systems
    4. safety critical systems

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)6
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 12 Aug 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2020)Execution Model to Reduce the Interference of Shared Memory in ARINC 653 Compliant Multicore RTOSApplied Sciences10.3390/app1007246410:7(2464)Online publication date: 3-Apr-2020
    • (2020)SPECTRUMACM Transactions on Embedded Computing Systems10.1145/340003219:5(1-28)Online publication date: 26-Sep-2020
    • (2019)SPECTRUM: a software defined predictable many-core architecture for LTE baseband processingProceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3316482.3326352(82-96)Online publication date: 23-Jun-2019
    • (2018)PhLock: A Cache Energy Saving Technique Using Phase-Based Cache LockingIEEE Transactions on Very Large Scale Integration (VLSI) Systems10.1109/TVLSI.2017.275747726:1(110-121)Online publication date: Jan-2018
    • (2018)Time-Critical Systems Design: A SurveyIEEE Design & Test10.1109/MDAT.2018.279420435:2(8-26)Online publication date: Apr-2018
    • (2017)Dynamic Data-Cache Locking for Minimizing the WCET of a Single TaskACM Transactions on Embedded Computing Systems10.1145/299460216:2(1-29)Online publication date: 2-Jan-2017
    • (2017)Memory-Aware Optimization of Embedded Software for Multiple ObjectivesHandbook of Hardware/Software Codesign10.1007/978-94-017-7358-4_27-2(1-37)Online publication date: 17-Apr-2017
    • (2017)Memory-Aware Optimization of Embedded Software for Multiple ObjectivesHandbook of Hardware/Software Codesign10.1007/978-94-017-7267-9_27(829-865)Online publication date: 27-Sep-2017
    • (2016)A Survey of Techniques for Cache LockingACM Transactions on Design Automation of Electronic Systems10.1145/285879221:3(1-24)Online publication date: 16-May-2016
    • (2016)Predicting Cross-Core Performance Interference on Multicore Processors with Regression AnalysisIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2015.244298327:5(1443-1456)Online publication date: 1-May-2016
    • Show More Cited By

    View Options

    Get Access

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media