Throughput-memory footprint trade-off in synthesis of streaming software on embedded multiprocessors

Published: 24 December 2013

Abstract

We study the trade-off between throughput and memory footprint of embedded software synthesized from acyclic static dataflow (task graph) specifications targeting distributed-memory multiprocessors. We identify iteration overlapping as a knob in the synthesis process by which one can trade application throughput for memory requirement. Given an initial processor assignment and a non-overlapped task schedule, we formally present the underlying properties of the problem, such as constraints on a valid iteration overlapping, maximum possible throughput, and minimum memory footprint. Moreover, we develop an effective algorithm for generating a rich set of design points that provide a range of trade-off options. Experimental results on a number of applications and architectures validate the effectiveness of our approach.
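
The trade-off described above can be illustrated with a toy model. The sketch below is not the paper's algorithm; it rests on simplifying assumptions that the abstract does not state: task times are integers, a new iteration starts every `period` time units, the period can never be shorter than the busiest processor's workload, and buffer memory grows linearly with the number of iterations in flight. The function name and all parameters are hypothetical.

```python
from math import ceil


def overlap_design_points(proc_busy, makespan, buffer_bytes):
    """Enumerate (period, iterations_in_flight, memory) points for a toy model.

    proc_busy    -- busy time of each processor within one iteration (assumed input)
    makespan     -- length of the non-overlapped, single-iteration schedule
    buffer_bytes -- buffer memory needed by one iteration in flight (assumed linear model)
    """
    # Assumption: a processor cannot accept a new iteration more often than
    # once per its own per-iteration workload, so this bounds the period.
    min_period = max(proc_busy)

    points = []
    seen_depths = set()
    # Periods are scanned in increasing order, so the first period that yields
    # a given overlap depth is the shortest one (highest throughput) for it.
    for period in range(min_period, makespan + 1):
        in_flight = ceil(makespan / period)  # iterations overlapping in steady state
        if in_flight in seen_depths:
            continue
        seen_depths.add(in_flight)
        points.append((period, in_flight, in_flight * buffer_bytes))
    return points


if __name__ == "__main__":
    # Three processors busy for 4, 3 and 5 time units; the non-overlapped
    # schedule takes 12 time units; each in-flight iteration needs 100 bytes.
    for period, depth, memory in overlap_design_points([4, 3, 5], 12, 100):
        print(f"period={period} throughput=1/{period} in_flight={depth} memory={memory}")
```

In this model, throughput is `1 / period`, so the first point returned has the highest throughput and the largest memory footprint, and the last point (period equal to the makespan) corresponds to no overlapping at all.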

    Published In

    ACM Transactions on Embedded Computing Systems, Volume 13, Issue 3
    December 2013
    385 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/2539036

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 December 2013
    Accepted: 01 May 2012
    Revised: 01 December 2011
    Received: 01 July 2011
    Published in TECS Volume 13, Issue 3

    Author Tags

    1. Iteration overlapping
    2. distributed-memory message-passing multiprocessor SoC
    3. modulo scheduling
    4. software pipelining
    5. stream applications

    Qualifiers

    • Research-article
    • Research
    • Refereed