Multithreaded Simulation for Synchronous Dataflow Graphs

Published: 01 June 2011

Abstract

    For system simulation, Synchronous DataFlow (SDF) has been widely used as a core model of computation in design tools for digital communication and signal processing systems. The traditional approach for simulating SDF graphs is to compute and execute static schedules in single-processor desktop environments. Nowadays, however, multicore processors are increasingly popular desktop platforms for their potential performance improvements through thread-level parallelism. Without novel scheduling and simulation techniques that explicitly explore thread-level parallelism for executing SDF graphs, current design tools gain only minimal performance improvements on multicore platforms. In this article, we present a new multithreaded simulation scheduler, called MSS, to provide simulation runtime speedup for executing SDF graphs on multicore processors. MSS strategically integrates graph clustering, intracluster scheduling, actor vectorization, and intercluster buffering techniques to construct InterThread Communication (ITC) graphs at compile-time. MSS then applies efficient synchronization and dynamic scheduling techniques at runtime for executing ITC graphs in multithreaded environments. We have implemented MSS in the Advanced Design System (ADS) from Agilent Technologies. On an Intel dual-core, hyper-threading (4 processing units) processor, our results from this implementation demonstrate up to 3.5 times speedup in simulating modern wireless communication systems (e.g., WCDMA3G, CDMA 2000, WiMax, EDGE, and Digital TV).
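The static schedules mentioned in the abstract rest on the SDF balance equations: for every edge, the tokens produced per schedule period must equal the tokens consumed. The following is a minimal sketch of that standard computation only, not of the MSS scheduler. The function name `repetitions_vector`, the edge tuple layout, and the three-actor example graph are assumptions made for illustration.

```python
from collections import defaultdict, deque
from fractions import Fraction
from functools import reduce
from math import gcd


def repetitions_vector(actors, edges):
    """Return the smallest positive firing counts satisfying the balance equations.

    edges: iterable of (src, dst, produced, consumed), i.e. tokens produced by
    src and consumed by dst per firing (hypothetical layout for this sketch).
    """
    adj = defaultdict(list)
    for src, dst, produced, consumed in edges:
        # Balance equation: q[src] * produced == q[dst] * consumed
        adj[src].append((dst, Fraction(produced, consumed)))
        adj[dst].append((src, Fraction(consumed, produced)))

    # Propagate relative firing rates from the first actor across the graph.
    rate = {actors[0]: Fraction(1)}
    pending = deque([actors[0]])
    while pending:
        a = pending.popleft()
        for b, ratio in adj[a]:
            expected = rate[a] * ratio
            if b not in rate:
                rate[b] = expected
                pending.append(b)
            elif rate[b] != expected:
                raise ValueError("inconsistent sample rates: no periodic schedule")

    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the rational rates to integers using the LCM of the denominators.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (r.denominator for r in rate.values()), 1)
    return {a: int(r * scale) for a, r in rate.items()}


# Assumed example: A produces 2 tokens, B consumes 3; B produces 1, C consumes 2.
example = repetitions_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)])
print(example)  # {'A': 3, 'B': 2, 'C': 1}
```

In this example one schedule period fires A three times, B twice, and C once, which returns every buffer to its initial token count. A graph whose rates cannot be balanced raises `ValueError` instead.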

    Cited By

    • (2018) Memory-Constrained Vectorization and Scheduling of Dataflow Graphs for Hybrid CPU-GPU Platforms. ACM Transactions on Embedded Computing Systems 17:2 (1-25). DOI: 10.1145/3157669. Online publication date: 30-Jan-2018
    • (2015) Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling. Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems (68-75). DOI: 10.1145/2764967.2764972. Online publication date: 1-Jun-2015

    Published In

    ACM Transactions on Design Automation of Electronic Systems, Volume 16, Issue 3
    June 2011
    330 pages
    ISSN:1084-4309
    EISSN:1557-7309
    DOI:10.1145/1970353
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 June 2011
    Accepted: 01 March 2011
    Revised: 01 March 2011
    Received: 01 March 2010
    Published in TODAES Volume 16, Issue 3

    Author Tags

    1. Synchronous dataflow
    2. multithreaded simulation
    3. scheduling

    Qualifiers

    • Research-article
    • Research
    • Refereed
