article

An optimal memory allocation scheme for scratch-pad-based embedded systems

Authors:

Dave StewartAuthors Info & Claims

ACM Transactions on Embedded Computing Systems (TECS), Volume 1, Issue 1

Pages 6 - 26

https://doi.org/10.1145/581888.581891

Published: 01 November 2002 Publication History

Abstract

This article presents a technique for the efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in microcontrollers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, internal DRAM, external DRAM, and ROM are visible directly to the software, without automatic management by a hardware caching mechanism. Instead, the memory units are mapped to different portions of the address space. Caches are avoided due to their cost and power consumption, and because they make it difficult to guarantee real-time performance. For this important class of embedded chips, the allocation of data to different memory units to maximize performance is the responsibility of the software.Current practice typically leaves it to the programmer to partition the data among different memory units. We present a compiler strategy that automatically partitions the data among the memory units. We show that this strategy is optimal, relative to the profile run, among all static partitions for global and stack data. For the first time, our allocation scheme for stacks distributes the stack among multiple memory units. For global and stack data, the scheme is provably equal to or better than any other compiler scheme or set of programmer annotations. Results from our benchmarks show a 44.2% reduction in runtime from using our distributed stack strategy vs. using a unified stack, and a further 11.8% reduction in runtime from using a linear optimization strategy for allocation vs. a simpler greedy strategy; both in the case of the SRAM size being 20% of the total data size. For some programs, less than 5% of data in SRAM achieves a similar speedup.

References

[1]

Appel, A. and George, L. 2001. Optimal spilling for CISC machines with few registers. In Proceedings of the SIGPLAN '01 Conference on Program Language Design and Implementation (Snowbird, UT, June).]]

[2]

Banakar, R., Steinke, S., Lee, B.-S., Balakrishnan, M., and Marwedel, P. 2002. Scratchpad memory: A design alternative for cache on-chip memory in embedded systems. In 10th International Symposium on Hardware/Software Codesign (CODES) (Estes Park, CO, May 6--8). ACM, New York.]]

[3]

Barua, R., Lee, W., Amarasinghe, S., and Agarwal, A. 2001. Compiler support for scalable and efficient memory systems. IEEE Trans. Comput., Special Issue on Advances in High Performance Memory Systems (Nov.).]]

[4]

Bhattacharyya, S. S., Leupers, R., and Marwedel, P. 2000. Software synthesis and code generation for signal processing systems. IEEE Trans. Circuits Syst. 47, 9 (Sept.).]]

[5]

Consortium, T. T. 1999. The Trimaran benchmark suite. Available at http://www.trimaran.org/.]]

[6]

Guthaus, M. R., Ringenberg, J. S., Ernst, D., Austin, T. M., Mudge, T., and Brown, R. B. 2001. MiBench: A free, commercially representative embedded benchmark suite. Available at http://www.eecs.umich.edu/jringenb/mibench/.]]

[7]

Hallnor, G., and Reinhardt, S. K. 2000. A fully associative software-managed cache design. In Proceedings of the 27th International Symposium on Computer Architecture (ISCA) (Vancouver, B.C., Canada, June).]]

[8]

Hennessy, J. and Patterson, D. 1996. Computer Architecture A Quantitative Approach. 2nd ed. Morgan-Kaufmann, Palo Alto, Calif.]]

[9]

Lee, T.-C., Tiwari, V., Malik, S., and Fujita, M. 1997. Power analysis and minimization techniques for embedded DSP software. IEEE Trans. VLSI Syst. (Mar.).]]

[10]

Matlab 6.1. The Math Works, Inc., 2001. http://www.mathworks.com/products/matlab/.]]

[11]

Moritz, C. A., Frank, M., and Amarasinghe, S. 2001. FlexCache: A framework for flexible compiler generated data caching. In The 2d Workshop on Intelligent Memory Systems (Boston, MA, Nov. 12).]]

[12]

CPU12 Reference Manual. Motorola Corporation, 2000. http://e-www.motorola.com/brdata/PDFDB/MICROCONTROLLERS/16_BIT/68HC12_FAMILY/REF_MAT/CPU12RM.pdf.]]

[13]

M-CORE---MMC2001 Reference Manual. Motorola Corporation, 1998. http://www.motorola. com/SPS/MCORE/info_documentation.htm.]]

[14]

New York City, Office of Budget and Management. 1999. Website on frequently asked questions on linear programming. http://www.eden.rutgers.edu/&sim;pil/FAQ.html. New York, NY.]]

[15]

University of Toronto Digital Signal Processing (UTDSP). 1992. University of Toronto Digital Signal Processing (UTDSP) Benchmark Suite. Available at http://www.eecg.toronto.edu/.]]

[16]

Panda, P. R., Dutt, N. D., and Nicolau, A. 2000. On-chip vs. off-chip memory: The data partitioning problem in embedded processor-based systems. ACM Trans. Des. Autom. Electron. Syst. 5, 3 (July).]]

[17]

Paulin, P., Liem, C., Cornero, M., Nacabal, F., and Goossens, G. 1997. Embedded software in real-time signal processing systems: Application and architecture trends. Invited paper, Proc. IEEE 85, 3 (Mar.).]]

[18]

Rutter, P., Orost, J., and Gloistein, D. BTOA: Binary to printable ASCII converter source code. Available at http://www.bookcase.com/library/software/msdos.devel.lang.c.html.]]

[19]

Sjodin, J., Froderberg, B., and Lindgren, T. 1998. Allocation of global data objects in on-chip RAM. Compiler and Architecture Support for Embedded Computing Systems. Dec.]]

[20]

Sjodin, J. and von Platen, C. 2001. Storage allocation for embedded processors. 2001 Compiler and Architecture Support for Embedded Computing Systems. Nov.]]

[21]

TMS370Cx7x 8-bit microcontroller. Texas Instruments, Revised Feb. 1997. http://www-s.ti.com/sc/psheets/spns034c/spns034c.pdf.]]

Cited By

Jaiswal SGoli RKumar ASeshadri VSharma REgger BLee D(2023)MinUn: Accurate ML Inference on MicrocontrollersProceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3589610.3596278(26-39)Online publication date: 13-Jun-2023
https://dl.acm.org/doi/10.1145/3589610.3596278
Chen LHuang HDanielczuk MIchnowski JGoldberg K(2022)Optimal Shelf Arrangement to Minimize Robot Retrieval Time2022 IEEE 18th International Conference on Automation Science and Engineering (CASE)10.1109/CASE49997.2022.9926722(993-1000)Online publication date: 20-Aug-2022
https://dl.acm.org/doi/10.1109/CASE49997.2022.9926722
Körür SJungklass PBerekovic M(2022)Echtzeitfähige Ethernet-Kommunikation in automobilen Multicore-Systemen mit hierarchischem SpeicherlayoutEchtzeit 202110.1007/978-3-658-37751-9_10(83-92)Online publication date: 25-May-2022
https://doi.org/10.1007/978-3-658-37751-9_10
Show More Cited By

Index Terms

An optimal memory allocation scheme for scratch-pad-based embedded systems

Recommendations

Memory allocation for embedded systems with a compile-time-unknown scratch-pad size

This article presents the first memory allocation scheme for embedded systems having a scratch-pad memory whose size is unknown at compile time. A scratch-pad memory (SPM) is a fast compiler-managed SRAM that replaces the hardware-managed cache. All ...
Heterogeneous memory management for embedded systems
CASES '01: Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems

This paper presents a technique for the efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in micro-controllers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, ...
Compiler-assisted dynamic scratch-pad memory management with space overlapping for embedded systems

Scratch-pad memory (SPM), a small, fast, software-managed on-chip SRAM (Static Random Access Memory) is widely used in embedded systems. With the ever-widening performance gap between processors and main memory, it is very important to reduce the ...

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Embedded Computing Systems

ACM Transactions on Embedded Computing Systems Volume 1, Issue 1

November 2002

140 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/581888

Issue’s Table of Contents

Copyright © 2002 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 01 November 2002

Published in TECS Volume 1, Issue 1

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

293
Total Citations
View Citations
2,716
Total Downloads

Downloads (Last 12 months)48
Downloads (Last 6 weeks)5

Reflects downloads up to 29 Jul 2024

Other Metrics

View Author Metrics

Citations

Cited By

Jaiswal SGoli RKumar ASeshadri VSharma REgger BLee D(2023)MinUn: Accurate ML Inference on MicrocontrollersProceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3589610.3596278(26-39)Online publication date: 13-Jun-2023
https://dl.acm.org/doi/10.1145/3589610.3596278
Chen LHuang HDanielczuk MIchnowski JGoldberg K(2022)Optimal Shelf Arrangement to Minimize Robot Retrieval Time2022 IEEE 18th International Conference on Automation Science and Engineering (CASE)10.1109/CASE49997.2022.9926722(993-1000)Online publication date: 20-Aug-2022
https://dl.acm.org/doi/10.1109/CASE49997.2022.9926722
Körür SJungklass PBerekovic M(2022)Echtzeitfähige Ethernet-Kommunikation in automobilen Multicore-Systemen mit hierarchischem SpeicherlayoutEchtzeit 202110.1007/978-3-658-37751-9_10(83-92)Online publication date: 25-May-2022
https://doi.org/10.1007/978-3-658-37751-9_10
Zhang YChen LNie XZhang ZWei HShi G(2021)Accelerating Data-Flow Analysis with Full-Partitioning2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00184(1345-1352)Online publication date: Sep-2021
https://doi.org/10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00184
OKI YABE YYAMAMOTO KYAMAMOTO KSHIRAKAWA TYOSHIDA AKIMURA KKASAHARA H(2020)Local Memory Mapping of Multicore Processors on an Automatic Parallelizing CompilerIEICE Transactions on Electronics10.1587/transele.2019LHP0010E103.C:3(98-109)Online publication date: 1-Mar-2020
https://doi.org/10.1587/transele.2019LHP0010
Venkataramani VKulkarni AMitra TPeh L(2020)SPECTRUMACM Transactions on Embedded Computing Systems10.1145/340003219:5(1-28)Online publication date: 26-Sep-2020
https://dl.acm.org/doi/10.1145/3400032
Jost TNazar GCarro L(2020)SoMMA: A software-managed memory architecture for multi-issue processorsMicroprocessors and Microsystems10.1016/j.micpro.2020.10313977(103139)Online publication date: Sep-2020
https://doi.org/10.1016/j.micpro.2020.103139
Vyas ABatra U(2020)Identification of Safety and Security Vulnerabilities in Cyber Physical SystemsData Science and Analytics10.1007/978-981-15-5830-6_1(3-13)Online publication date: 28-May-2020
https://doi.org/10.1007/978-981-15-5830-6_1
Li H(2020)Data-Driven Resource AllocationEncyclopedia of Wireless Networks10.1007/978-3-319-78262-1_94(300-302)Online publication date: 30-Aug-2020
https://doi.org/10.1007/978-3-319-78262-1_94
(2019)Developments in memory management in OpenMPInternational Journal of High Performance Computing and Networking10.5555/3302714.330271913:1(70-85)Online publication date: 1-Jan-2019
https://dl.acm.org/doi/10.5555/3302714.3302719
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents