Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
article

An optimal memory allocation scheme for scratch-pad-based embedded systems

Published: 01 November 2002 Publication History
  • Get Citation Alerts
  • Abstract

    This article presents a technique for the efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in microcontrollers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, internal DRAM, external DRAM, and ROM are visible directly to the software, without automatic management by a hardware caching mechanism. Instead, the memory units are mapped to different portions of the address space. Caches are avoided due to their cost and power consumption, and because they make it difficult to guarantee real-time performance. For this important class of embedded chips, the allocation of data to different memory units to maximize performance is the responsibility of the software.Current practice typically leaves it to the programmer to partition the data among different memory units. We present a compiler strategy that automatically partitions the data among the memory units. We show that this strategy is optimal, relative to the profile run, among all static partitions for global and stack data. For the first time, our allocation scheme for stacks distributes the stack among multiple memory units. For global and stack data, the scheme is provably equal to or better than any other compiler scheme or set of programmer annotations. Results from our benchmarks show a 44.2% reduction in runtime from using our distributed stack strategy vs. using a unified stack, and a further 11.8% reduction in runtime from using a linear optimization strategy for allocation vs. a simpler greedy strategy; both in the case of the SRAM size being 20% of the total data size. For some programs, less than 5% of data in SRAM achieves a similar speedup.

    References

    [1]
    Appel, A. and George, L. 2001. Optimal spilling for CISC machines with few registers. In Proceedings of the SIGPLAN '01 Conference on Program Language Design and Implementation (Snowbird, UT, June).]]
    [2]
    Banakar, R., Steinke, S., Lee, B.-S., Balakrishnan, M., and Marwedel, P. 2002. Scratchpad memory: A design alternative for cache on-chip memory in embedded systems. In 10th International Symposium on Hardware/Software Codesign (CODES) (Estes Park, CO, May 6--8). ACM, New York.]]
    [3]
    Barua, R., Lee, W., Amarasinghe, S., and Agarwal, A. 2001. Compiler support for scalable and efficient memory systems. IEEE Trans. Comput., Special Issue on Advances in High Performance Memory Systems (Nov.).]]
    [4]
    Bhattacharyya, S. S., Leupers, R., and Marwedel, P. 2000. Software synthesis and code generation for signal processing systems. IEEE Trans. Circuits Syst. 47, 9 (Sept.).]]
    [5]
    Consortium, T. T. 1999. The Trimaran benchmark suite. Available at http://www.trimaran.org/.]]
    [6]
    Guthaus, M. R., Ringenberg, J. S., Ernst, D., Austin, T. M., Mudge, T., and Brown, R. B. 2001. MiBench: A free, commercially representative embedded benchmark suite. Available at http://www.eecs.umich.edu/jringenb/mibench/.]]
    [7]
    Hallnor, G., and Reinhardt, S. K. 2000. A fully associative software-managed cache design. In Proceedings of the 27th International Symposium on Computer Architecture (ISCA) (Vancouver, B.C., Canada, June).]]
    [8]
    Hennessy, J. and Patterson, D. 1996. Computer Architecture A Quantitative Approach. 2nd ed. Morgan-Kaufmann, Palo Alto, Calif.]]
    [9]
    Lee, T.-C., Tiwari, V., Malik, S., and Fujita, M. 1997. Power analysis and minimization techniques for embedded DSP software. IEEE Trans. VLSI Syst. (Mar.).]]
    [10]
    Matlab 6.1. The Math Works, Inc., 2001. http://www.mathworks.com/products/matlab/.]]
    [11]
    Moritz, C. A., Frank, M., and Amarasinghe, S. 2001. FlexCache: A framework for flexible compiler generated data caching. In The 2d Workshop on Intelligent Memory Systems (Boston, MA, Nov. 12).]]
    [12]
    CPU12 Reference Manual. Motorola Corporation, 2000. http://e-www.motorola.com/brdata/PDFDB/MICROCONTROLLERS/16_BIT/68HC12_FAMILY/REF_MAT/CPU12RM.pdf.]]
    [13]
    M-CORE---MMC2001 Reference Manual. Motorola Corporation, 1998. http://www.motorola. com/SPS/MCORE/info_documentation.htm.]]
    [14]
    New York City, Office of Budget and Management. 1999. Website on frequently asked questions on linear programming. http://www.eden.rutgers.edu/∼pil/FAQ.html. New York, NY.]]
    [15]
    University of Toronto Digital Signal Processing (UTDSP). 1992. University of Toronto Digital Signal Processing (UTDSP) Benchmark Suite. Available at http://www.eecg.toronto.edu/.]]
    [16]
    Panda, P. R., Dutt, N. D., and Nicolau, A. 2000. On-chip vs. off-chip memory: The data partitioning problem in embedded processor-based systems. ACM Trans. Des. Autom. Electron. Syst. 5, 3 (July).]]
    [17]
    Paulin, P., Liem, C., Cornero, M., Nacabal, F., and Goossens, G. 1997. Embedded software in real-time signal processing systems: Application and architecture trends. Invited paper, Proc. IEEE 85, 3 (Mar.).]]
    [18]
    Rutter, P., Orost, J., and Gloistein, D. BTOA: Binary to printable ASCII converter source code. Available at http://www.bookcase.com/library/software/msdos.devel.lang.c.html.]]
    [19]
    Sjodin, J., Froderberg, B., and Lindgren, T. 1998. Allocation of global data objects in on-chip RAM. Compiler and Architecture Support for Embedded Computing Systems. Dec.]]
    [20]
    Sjodin, J. and von Platen, C. 2001. Storage allocation for embedded processors. 2001 Compiler and Architecture Support for Embedded Computing Systems. Nov.]]
    [21]
    TMS370Cx7x 8-bit microcontroller. Texas Instruments, Revised Feb. 1997. http://www-s.ti.com/sc/psheets/spns034c/spns034c.pdf.]]

    Cited By

    View all
    • (2023)MinUn: Accurate ML Inference on MicrocontrollersProceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3589610.3596278(26-39)Online publication date: 13-Jun-2023
    • (2022)Optimal Shelf Arrangement to Minimize Robot Retrieval Time2022 IEEE 18th International Conference on Automation Science and Engineering (CASE)10.1109/CASE49997.2022.9926722(993-1000)Online publication date: 20-Aug-2022
    • (2022)Echtzeitfähige Ethernet-Kommunikation in automobilen Multicore-Systemen mit hierarchischem SpeicherlayoutEchtzeit 202110.1007/978-3-658-37751-9_10(83-92)Online publication date: 25-May-2022
    • Show More Cited By

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Embedded Computing Systems
    ACM Transactions on Embedded Computing Systems  Volume 1, Issue 1
    November 2002
    140 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/581888
    Issue’s Table of Contents

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Journal Family

    Publication History

    Published: 01 November 2002
    Published in TECS Volume 1, Issue 1

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Memory
    2. allocation
    3. embedded
    4. heterogeneous
    5. storage

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)48
    • Downloads (Last 6 weeks)5
    Reflects downloads up to 29 Jul 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)MinUn: Accurate ML Inference on MicrocontrollersProceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3589610.3596278(26-39)Online publication date: 13-Jun-2023
    • (2022)Optimal Shelf Arrangement to Minimize Robot Retrieval Time2022 IEEE 18th International Conference on Automation Science and Engineering (CASE)10.1109/CASE49997.2022.9926722(993-1000)Online publication date: 20-Aug-2022
    • (2022)Echtzeitfähige Ethernet-Kommunikation in automobilen Multicore-Systemen mit hierarchischem SpeicherlayoutEchtzeit 202110.1007/978-3-658-37751-9_10(83-92)Online publication date: 25-May-2022
    • (2021)Accelerating Data-Flow Analysis with Full-Partitioning2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00184(1345-1352)Online publication date: Sep-2021
    • (2020)Local Memory Mapping of Multicore Processors on an Automatic Parallelizing CompilerIEICE Transactions on Electronics10.1587/transele.2019LHP0010E103.C:3(98-109)Online publication date: 1-Mar-2020
    • (2020)SPECTRUMACM Transactions on Embedded Computing Systems10.1145/340003219:5(1-28)Online publication date: 26-Sep-2020
    • (2020)SoMMA: A software-managed memory architecture for multi-issue processorsMicroprocessors and Microsystems10.1016/j.micpro.2020.10313977(103139)Online publication date: Sep-2020
    • (2020)Identification of Safety and Security Vulnerabilities in Cyber Physical SystemsData Science and Analytics10.1007/978-981-15-5830-6_1(3-13)Online publication date: 28-May-2020
    • (2020)Data-Driven Resource AllocationEncyclopedia of Wireless Networks10.1007/978-3-319-78262-1_94(300-302)Online publication date: 30-Aug-2020
    • (2019)Developments in memory management in OpenMPInternational Journal of High Performance Computing and Networking10.5555/3302714.330271913:1(70-85)Online publication date: 1-Jan-2019
    • Show More Cited By

    View Options

    Get Access

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media