Static Task Partitioning for Locked Caches in Multicore Real-Time Systems

Published: 21 January 2015

    Abstract

    Growing processing demand on multitasking real-time systems can be met by employing scalable multicore architectures. For such environments, locking cache lines for hard real-time systems ensures timing predictability of data references and may lower worst-case execution time. This work studies the benefits of cache locking on massive multicore architectures with private caches in the context of hard real-time systems. In shared cache architectures, the cache is a single resource shared among all of the tasks. However, in scalable cache architectures with private caches, conflicts exist only among the tasks scheduled on one core. This calls for a cache-aware allocation of tasks onto cores.
    The objective of this work is to increase the predictability of memory accesses resolved by caches while reducing the number of cores needed for a given task set. This allows designers to reduce the footprint of their subsystem of real-time tasks, and thereby its cost, either by choosing a target product with fewer cores or by co-locating more subsystems on a fixed number of cores.
    Our work proposes a novel variant of the cache-unaware First Fit Decreasing (FFD) algorithm, the Naive locked First Fit Decreasing (NFFD) policy. We also propose two cache-aware static scheduling schemes, (a) Greedy First Fit Decreasing (GFFD) and (b) Colored First Fit Decreasing (CoFFD), for task sets in which tasks do not have intratask conflicts among locked regions (Scenario A). NFFD is capable of scheduling high-utilization task sets that FFD cannot schedule. Experiments also show that CoFFD consistently outperforms GFFD, resulting in a lower number of cores and lower system utilization. CoFFD reduces the number of cores required by 30% to 60% compared to NFFD.
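As a point of reference for the cache-unaware baseline named above, the following is a minimal sketch of plain First Fit Decreasing applied to task utilizations. It is not the paper's NFFD, GFFD, or CoFFD: the function name `first_fit_decreasing`, the task names, and the per-core capacity of 1.0 (the uniprocessor EDF utilization bound for implicit-deadline periodic tasks) are assumptions made for illustration, and the paper's actual schedulability test may differ.

```python
from typing import Dict, List


def first_fit_decreasing(utilizations: Dict[str, float],
                         capacity: float = 1.0) -> List[List[str]]:
    """Partition tasks onto cores with cache-unaware First Fit Decreasing.

    `capacity` is an assumed per-core utilization bound (hypothetical default).
    Returns one list of task names per core, in the order cores were opened.
    """
    cores: List[List[str]] = []   # task names assigned to each core
    loads: List[float] = []       # total utilization of each core
    eps = 1e-9                    # tolerance for floating-point sums

    # Visit tasks in order of decreasing utilization.
    ordered = sorted(utilizations.items(), key=lambda kv: kv[1], reverse=True)
    for name, u in ordered:
        if u > capacity + eps:
            raise ValueError(f"task {name} does not fit on any core")
        # Place the task on the first core that still has room.
        for i, load in enumerate(loads):
            if load + u <= capacity + eps:
                cores[i].append(name)
                loads[i] += u
                break
        else:
            # No existing core fits, so open a new one.
            cores.append([name])
            loads.append(u)
    return cores


tasks = {"t1": 0.6, "t2": 0.5, "t3": 0.4, "t4": 0.3, "t5": 0.2}
partition = first_fit_decreasing(tasks)
```

With these hypothetical utilizations, the five tasks are packed onto two cores. A cache-aware variant would add a second admission condition on conflicts among locked cache regions before placing a task on a core.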
    For the more general case where tasks have intratask conflicts, we split task partitioning into two phases: task selection and task allocation (Scenario B). Instead of resolving conflicts at a global level, these algorithms resolve conflicts among regions while allocating a task onto a core, and they unlock at the region level instead of the task level. We show that combining dynamic ordering (task selection) with Chaitin’s Coloring (task allocation) reduces the number of cores required by up to 22% over a basic scheme (a combination of monotone ordering and regional FFD). Regional unlocking allows this scheme to outperform CoFFD for medium-utilization task sets from Scenario A. However, CoFFD performs better than any other scheme for high-utilization task sets from Scenario A. Overall, this work is unique in considering the challenges of future multicore architectures for real-time systems and provides key insights into task partitioning and cache-locking mechanisms for architectures with private caches.
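The abstract does not say how Chaitin's coloring is mapped onto tasks, regions, or cores, so the following is only a generic sketch of Chaitin-style simplify/select graph coloring. The conflict graph, the node names, the function name `chaitin_color`, and the reading of a "spilled" node as a candidate for unlocking are all assumptions made for illustration, not the paper's allocation scheme.

```python
from typing import Dict, List, Set, Tuple


def chaitin_color(graph: Dict[str, Set[str]],
                  k: int) -> Tuple[Dict[str, int], List[str]]:
    """Colour an undirected conflict graph with at most k colours.

    `graph` maps each node to its set of neighbours and must be symmetric.
    Returns (colours, spilled): a colour index per coloured node, plus the
    nodes that were removed because no node of degree < k remained.
    """
    remaining = {n: set(adj) for n, adj in graph.items()}
    stack: List[str] = []
    spilled: List[str] = []

    while remaining:
        # Simplify: a node with fewer than k neighbours can always be coloured.
        node = next((n for n in sorted(remaining)
                     if len(remaining[n]) < k), None)
        if node is None:
            # Spill: no such node exists, so drop a highest-degree node.
            node = max(sorted(remaining), key=lambda n: len(remaining[n]))
            spilled.append(node)
        else:
            stack.append(node)
        # Remove the node and its edges from the working graph.
        for nb in remaining.pop(node):
            remaining[nb].discard(node)

    # Select: pop nodes in reverse order and give each the lowest free colour.
    colours: Dict[str, int] = {}
    while stack:
        node = stack.pop()
        used = {colours[nb] for nb in graph[node] if nb in colours}
        colours[node] = next(c for c in range(k) if c not in used)
    return colours, spilled


# Hypothetical conflict graph: a, b, c conflict pairwise; d conflicts with a.
conflicts = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
two_colours, two_spilled = chaitin_color(conflicts, 2)
three_colours, three_spilled = chaitin_color(conflicts, 3)
```

With two colors the triangle cannot be colored, so one node is spilled; with three colors every node receives a color and no two conflicting nodes share one.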


    Information

    Published In

    ACM Transactions on Embedded Computing Systems, Volume 14, Issue 1
    January 2015
    443 pages
    ISSN: 1539-9087
    EISSN: 1558-3465
    DOI: 10.1145/2724585
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 January 2015
    Accepted: 01 June 2014
    Revised: 01 June 2014
    Received: 01 September 2012
    Published in TECS Volume 14, Issue 1

    Author Tags

    1. Real-time systems
    2. multicore architectures
    3. timing analysis

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • NSF

    Cited By

    • (2024) LAG-based schedulability analysis for preemptive global EDF scheduling with dynamic cache allocation. Journal of Systems Architecture 147 (103045). DOI: 10.1016/j.sysarc.2023.103045. Online publication date: Feb-2024
    • (2023) Co-Optimizing Cache Partitioning and Multi-Core Task Scheduling: Exploit Cache Sensitivity or Not? 2023 IEEE Real-Time Systems Symposium (RTSS) (224-236). DOI: 10.1109/RTSS59052.2023.00028. Online publication date: 5-Dec-2023
    • (2023) LAG-Based Analysis for Preemptive Global Scheduling with Dynamic Cache Allocation. 2023 IEEE 29th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA) (107-116). DOI: 10.1109/RTCSA58653.2023.00022. Online publication date: 30-Aug-2023
    • (2022) Holistic Resource Allocation Under Federated Scheduling for Parallel Real-time Tasks. ACM Transactions on Embedded Computing Systems 21:1 (1-29). DOI: 10.1145/3489467. Online publication date: 14-Jan-2022
    • (2022) A Survey of Techniques for Reducing Interference in Real-Time Applications on Multicore Platforms. IEEE Access 10 (21853-21882). DOI: 10.1109/ACCESS.2022.3151891. Online publication date: 2022
    • (2018) PhLock: A Cache Energy Saving Technique Using Phase-Based Cache Locking. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26:1 (110-121). DOI: 10.1109/TVLSI.2017.2757477. Online publication date: Jan-2018
    • (2016) A Survey of Techniques for Cache Locking. ACM Transactions on Design Automation of Electronic Systems 21:3 (1-24). DOI: 10.1145/2858792. Online publication date: 16-May-2016
