research-article

Scratchpad-Memory Management for Multi-Threaded Applications on Many-Core Architectures

Authors:

Vanchinathan Venkataramani,

Mun Choon Chan,

Tulika Mitra

ACM Transactions on Embedded Computing Systems (TECS), Volume 18, Issue 1

Article No.: 10, Pages 1 - 28

https://doi.org/10.1145/3301308

Published: 05 February 2019


Abstract

Contemporary many-core architectures, such as Adapteva Epiphany and Sunway TaihuLight, employ per-core software-controlled Scratchpad Memory (SPM) rather than caches for better performance-per-watt and predictability. In these architectures, a core is allowed to access its own SPM as well as remote SPMs through the Network-On-Chip (NoC). However, the compiler/programmer is required to explicitly manage the movement of data between SPMs and off-chip memory. Utilizing SPMs for multi-threaded applications is even more challenging, as the shared variables across the threads need to be placed appropriately. Accessing variables from remote SPMs with higher access latency further complicates this problem as certain links in the NoC may be heavily contended by multiple threads. Therefore, certain variables may need to be replicated in multiple SPMs to reduce the contention delay and/or the overall access time. We present Coordinated Data Management (CDM), a compile-time framework that automatically identifies shared/private variables and places them with replication (if necessary) to suitable on-chip or off-chip memory, taking NoC contention into consideration. We develop both an exact Integer Linear Programming (ILP) formulation as well as an iterative, scalable algorithm for placing the data variables in multi-threaded applications on many-core SPMs. Experimental evaluation on the Parallella hardware platform confirms that our allocation strategy reduces the overall execution time and energy consumption by 1.84× and 1.83×, respectively, when compared to the existing approaches.
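The abstract's trade-off between keeping one copy of a shared variable and replicating it in several SPMs can be illustrated with a toy cost model. The sketch below is not the CDM ILP or its iterative algorithm. It is a minimal brute-force search built on our own assumptions: a read-only shared variable (so replicas never need to be kept consistent), a 2D mesh in which the distance between cores is the Manhattan hop count (as under XY routing), hypothetical latency parameters `local_lat`, `hop_lat` and `contention_lat`, and a crude contention term that charges each access for every other thread reading the same replica. The hypothetical `max_copies` cap stands in for the SPM capacity available for replicas. Exhaustive enumeration is only practical for tiny meshes; the abstract's ILP and iterative algorithm address the general problem.

```python
from itertools import combinations


def hops(a, b):
    """Hop count between two (row, col) cores under XY (dimension-ordered) routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def placement_cost(replicas, accesses, local_lat=1, hop_lat=2, contention_lat=1):
    """Total latency (hypothetical units) of serving every thread from its nearest replica.

    `accesses` maps the core a thread runs on to its access count for the variable.
    Each access pays local_lat, plus hop_lat per mesh hop, plus contention_lat for
    every *other* thread served by the same replica (a crude contention stand-in).
    """
    # min() keeps the first replica on distance ties, so assignment is deterministic.
    assigned = {core: min(replicas, key=lambda r: hops(core, r)) for core in accesses}
    sharers = {r: sum(1 for c in assigned if assigned[c] == r) for r in replicas}
    total = 0
    for core, count in accesses.items():
        r = assigned[core]
        per_access = local_lat + hop_lat * hops(core, r) + contention_lat * (sharers[r] - 1)
        total += count * per_access
    return total


def best_placement(cores, accesses, max_copies, **lat):
    """Exhaustively pick the cheapest set of 1..max_copies replica locations."""
    best = None
    for k in range(1, max_copies + 1):
        for replicas in combinations(cores, k):
            cost = placement_cost(replicas, accesses, **lat)
            if best is None or cost < best[0]:
                best = (cost, replicas)
    return best


if __name__ == "__main__":
    mesh = [(0, 0), (0, 1), (1, 0), (1, 1)]   # a hypothetical 2x2 mesh
    reads = {core: 100 for core in mesh}       # every thread reads the variable 100 times
    print(best_placement(mesh, reads, max_copies=1))  # (2400, ((0, 0),))
    print(best_placement(mesh, reads, max_copies=4))  # (400, ((0, 0), (0, 1), (1, 0), (1, 1)))
```

In this toy instance a single copy costs 2400 units because all four threads contend for the same SPM and three of them also pay mesh hops, while four replicas bring every access down to the local latency. The model ignores any cost of writing to replicated data and treats SPM space only through the `max_copies` cap.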



Index Terms

Scratchpad-Memory Management for Multi-Threaded Applications on Many-Core Architectures
1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
      1. Embedded software

Recommendations

Vectorizing Unstructured Mesh Computations for Many-core Architectures
PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores

Achieving optimal performance on the latest multi-core and many-core architectures depends more and more on making efficient use of the hardware's vector processing capabilities. While auto-vectorizing compilers do not require the use of vector ...
Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil

Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, ...

Reviews

Reviewer: Joseph M. Arul

This paper focuses on improving many-core architectures via software-programmable scratchpad memory (SPM). An SPM contains an array of static random-access memory (SRAM) cells. A portion of the memory address space is dedicated to the SPM, and any address that falls within this dedicated range directly indexes into the SPM to access the corresponding data. Coherency among multiple SPMs is therefore handled at the software level, "thereby eliminat[ing] the hardware area/power required for cache coherence," as well as for cache accesses. In a many-core environment, data accesses can drastically reduce performance because of coherency issues and the long delays of fetching data from other cores. In a many-core, multi-threaded architecture that spans on-chip and off-chip memory, data accesses are nonuniform, long-latency, and irregular. To overcome these difficulties, the paper proposes "a compile-time, coordinated data management framework called CDM, for many-core SPMs." The evaluation platform is described as follows: "the 16-core Epiphany SoC consists of an array of simple RISC processors (eCores) programmable in C connected together in a 2D-mesh NOC and supporting a single shared address space." Because a Xilinx Zynq system on chip (SoC) supports these eCores on the same development board, the platform is more energy efficient than one built around traditional cache memory. Each eCore can access not only its local memory but also remote memory. Several kernels from embedded, multi-threaded benchmarks are used in the evaluation, including two benchmarks for data decryption and encryption (AESD and AESE) and three long-term evolution (LTE) benchmarks (PHY_ACI, PHY_DEMAP, and PHY_MICF).
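The review's description of a dedicated, directly indexed SPM address range can be made concrete for Epiphany. The sketch below assumes the global address layout as we understand it from the Epiphany Architecture Reference Manual: a 32-bit address whose upper 12 bits hold a 6-bit row ID and a 6-bit column ID, whose lower 20 bits are an offset into that core's 1 MB segment, and in which a zero core ID aliases the issuing core's own local memory. The constant `SPM_BYTES` (32 KB of SRAM per eCore on the 16-core part) and the helper names `global_address` and `decode` are our assumptions; check every constant against the manual before relying on it.

```python
ROW_SHIFT = 26                        # bits [31:26]: 6-bit mesh row ID (assumed layout)
COL_SHIFT = 20                        # bits [25:20]: 6-bit mesh column ID
ID_MASK = 0x3F                        # 6 bits per coordinate
OFFSET_MASK = (1 << COL_SHIFT) - 1    # bits [19:0]: offset inside a 1 MB core segment
SPM_BYTES = 32 * 1024                 # assumed on-chip SRAM per eCore


def global_address(row: int, col: int, offset: int) -> int:
    """Build the global address of byte `offset` in the SPM of core (row, col)."""
    if not (0 <= row <= ID_MASK and 0 <= col <= ID_MASK):
        raise ValueError("row/col must fit in 6 bits")
    if row == 0 and col == 0:
        # Core ID 0 is treated here as the local-memory alias, not a real core.
        raise ValueError("core ID 0 collides with the local alias")
    if not 0 <= offset < SPM_BYTES:
        raise ValueError("offset outside the assumed SPM size")
    return (row << ROW_SHIFT) | (col << COL_SHIFT) | offset


def decode(addr: int):
    """Split an address into (row, col, offset); row/col are None for local aliases."""
    offset = addr & OFFSET_MASK
    if addr >> COL_SHIFT == 0:
        return (None, None, offset)   # zero core ID: the issuing core's own SPM
    row = (addr >> ROW_SHIFT) & ID_MASK
    col = (addr >> COL_SHIFT) & ID_MASK
    return (row, col, offset)


if __name__ == "__main__":
    print(hex(global_address(32, 8, 0x100)))  # 0x80800100
    print(decode(0x80800100))                 # (32, 8, 256)
    print(decode(0x00000100))                 # (None, None, 256)
```

`decode` does not check whether a (row, col) pair is a physical eCore; some segments of the global address space may map to off-chip memory instead, so a compiler pass would also need the board's memory map.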
The authors use a GREEDY strategy as their baseline and evaluate two variants of their approach: SNAP-S, which allows only one copy of each data item, and SNAP-M, which allows replication. According to the paper, "the SNAP-M approach provides an average speed-up of 1.84x and an energy reduction of 1.83x when compared to the GREEDY strategy," while the SNAP-S approach "provides an average speed-up and energy reduction of 1.09x." Both approaches therefore improve speed and reduce energy usage because they avoid cache-like memory, which consumes more power whenever data is accessed. The authors bring off-chip data into on-chip memory instead of relying on cache-like memory, and the use of an SoC reduces energy consumption. However, a new type of memory that can drastically reduce power consumption and is faster than DRAM and cache is currently on the rise; when such memory comes into use, this paper will be obsolete. The overhead of bringing off-chip data into on-chip memory must also be considered. Moreover, the SNAP-S speed-up over the GREEDY strategy is not significant; a significant improvement is observed only when data is replicated. One would expect a significant improvement from SNAP-S as well, because it turns remote memory accesses into local ones; however, the experimental results do not show this.
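Because the reported speed-up and energy figures are ratios, it can help to translate them into relative savings. The arithmetic below treats each reported average as if it applied to a single run, which is only an approximation: an average of per-benchmark ratios does not map exactly onto an average saving. Here $X$ stands for either execution time or energy, and $\Delta$ is the fraction saved relative to GREEDY.

```latex
\begin{align*}
S &= \frac{X_{\mathrm{GREEDY}}}{X_{\mathrm{SNAP}}}, \qquad
  \Delta = 1 - \frac{1}{S} \\
\text{SNAP-M, time:}   \quad \Delta &= 1 - 1/1.84 \approx 0.457 \\
\text{SNAP-M, energy:} \quad \Delta &= 1 - 1/1.83 \approx 0.454 \\
\text{SNAP-S, both:}   \quad \Delta &= 1 - 1/1.09 \approx 0.083
\end{align*}
```

Under this approximation SNAP-M cuts execution time and energy by roughly 45%, while SNAP-S saves roughly 8%, which matches the review's observation that most of the benefit comes from replication.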


Information & Contributors

Information

Published In


ACM Transactions on Embedded Computing Systems Volume 18, Issue 1

Special Issue on MEMOCODE 2017 and Regular Papers

January 2019

259 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/3305158

Editor:
Sandeep K. Shukla
Indian Institute of Technology, India


Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 05 February 2019

Accepted: 01 December 2018

Revised: 01 July 2018

Received: 01 December 2017

Published in TECS Volume 18, Issue 1


Qualifiers

Research-article
Research
Refereed

Funding Sources

National Research Foundation, Prime Minister's Office, Singapore under its Industry-IHL Partnership Grant and Huawei International Pte. Ltd.


Bibliometrics & Citations

Article Metrics

Total Citations: 18
Total Downloads: 430
Downloads (last 12 months): 57
Downloads (last 6 weeks): 2

Reflects downloads up to 26 Jul 2024.

Citations

Cited By


Gonzalez-Martinez G, Sandoval-Arechiga R, Solis-Sanchez L, Garcia-Luciano L, Ibarra-Delgado S, Solis-Escobedo J, Gomez-Rodriguez J, Rodriguez-Abdala V. (2024). A Survey of MPSoC Management toward Self-Awareness. Micromachines 15, 5 (577). DOI: 10.3390/mi15050577. Online publication date: 26-Apr-2024.
https://doi.org/10.3390/mi15050577

Sun Z, Zhou Z, Fu F. (2024). Optimizing code allocation for hybrid on-chip memory in IoT systems. Integration 97 (102195). DOI: 10.1016/j.vlsi.2024.102195. Online publication date: Jul-2024.
https://doi.org/10.1016/j.vlsi.2024.102195

Sundari K, Narmadha R, Ramani S. (2022). A Classy Memory Management System (CyM2S) using an Isolated Dynamic Two-Level Memory Allocation (ID2LMA) Algorithm for the Real Time Embedded Systems. International Journal of Electrical and Electronics Research 10, 2 (387-393). DOI: 10.37391/ijeer.100254. Online publication date: 30-Jun-2022.
https://doi.org/10.37391/ijeer.100254

Shekarisaz M, Hoseinghorban A, Bazzaz M, Salehi M, Ejlali A. (2022). MASTER: Reclamation of Hybrid Scratchpad Memory to Maximize Energy Saving in Multi-Core Edge Systems. IEEE Transactions on Sustainable Computing 7, 4 (749-760). DOI: 10.1109/TSUSC.2021.3049447. Online publication date: 1-Oct-2022.
https://doi.org/10.1109/TSUSC.2021.3049447

Venkataramani V, Bodin B, Kulkarni Mohite A, Mitra T, Peh L. (2022). ASCENT: Communication Scheduling for SDF on Bufferless Software-Defined NoC. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41, 10 (3266-3275). DOI: 10.1109/TCAD.2021.3128445. Online publication date: 1-Oct-2022.
https://dl.acm.org/doi/10.1109/TCAD.2021.3128445

Long L, Du J, Deng X, Liu R, Jiang Y, Wang Y. (2022). Optimizing data placement and size configuration for morphable NVM based SPM in embedded multicore systems. Future Generation Computer Systems 135 (270-282). DOI: 10.1016/j.future.2022.05.005. Online publication date: Oct-2022.
https://doi.org/10.1016/j.future.2022.05.005

Fördős V, Barbosa Rodrigues A. (2022). Lesser Evil: Embracing Failure to Protect Overall System Availability. Distributed Applications and Interoperable Systems (57-73). DOI: 10.1007/978-3-031-16092-9_5. Online publication date: 6-Sep-2022.
https://doi.org/10.1007/978-3-031-16092-9_5

Nasif A, Othman Z, Sani N. (2021). The Deep Learning Solutions on Lossless Compression Methods for Alleviating Data Load on IoT Nodes in Smart Cities. Sensors 21, 12 (4223). DOI: 10.3390/s21124223. Online publication date: 20-Jun-2021.
https://doi.org/10.3390/s21124223

Sinha M, Harsha G, Bhattacharyya P, Deb S. (2021). Design Space Optimization of Shared Memory Architecture in Accelerator-rich Systems. ACM Transactions on Design Automation of Electronic Systems 26, 4 (1-31). DOI: 10.1145/3446001. Online publication date: 13-Mar-2021.
https://dl.acm.org/doi/10.1145/3446001

Petrongonas E, Leon V, Lentaris G, Soudris D. (2021). ParalOS: A Scheduling & Memory Management Framework for Heterogeneous VPUs. 2021 24th Euromicro Conference on Digital System Design (DSD) (221-228). DOI: 10.1109/DSD53832.2021.00043. Online publication date: Sep-2021.
https://doi.org/10.1109/DSD53832.2021.00043