research-article

Public Access

Minnow: Lightweight Offload Engines for Worklist Management and Worklist-Directed Prefetching

Authors: Dan Zhang, Xiaoyu Ma, Michael Thomson, Derek Chiou

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 593 - 607

https://doi.org/10.1145/3173162.3173197

Published: 19 March 2018

Abstract

The importance of irregular applications such as graph analytics is rapidly growing with the rise of Big Data. However, parallel graph workloads tend to perform poorly on general-purpose chip multiprocessors (CMPs) due to poor cache locality, low compute intensity, frequent synchronization, uneven task sizes, and dynamic task generation. At high thread counts, execution time is dominated by worklist synchronization overhead and cache misses. Researchers have proposed hardware worklist accelerators to address scheduling costs, but these proposals often harden a specific scheduling policy and do not address high cache miss rates. We address both problems with Minnow, a technique that augments each core in a CMP with a lightweight Minnow accelerator. Minnow engines offload worklist scheduling from worker threads to improve scalability. The engines also perform worklist-directed prefetching, a technique that exploits knowledge of upcoming tasks to issue nearly perfectly accurate and timely prefetch operations. On a simulated 64-core CMP running a parallel graph benchmark suite, Minnow improves scalability and reduces L2 cache misses from 29 to 1.2 MPKI on average, resulting in 6.01x average speedup over an optimized software baseline for only 1% area overhead.
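The abstract describes worklist-directed prefetching only at a high level. As a rough software illustration, not the Minnow hardware design, the following Python sketch models a single worker running breadth-first search from a FIFO worklist. Before each pop, it peeks `lookahead` entries ahead and prefetches that task's edge list and neighbor data into a toy LRU cache. Everything here is an assumption for illustration. `LRUCache`, `prefetch_task`, `bfs`, `make_graph`, and `lookahead` are hypothetical names. Each edge list and each vertex's property is treated as one cache line, and prefetches are assumed to arrive in time. The worklist-scheduling offload that Minnow also performs is not modeled.

```python
# Hypothetical software model of worklist-directed prefetching; not the Minnow hardware.
from collections import OrderedDict, deque


class LRUCache:
    """Toy LRU cache model; each key stands for one cache line (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def _fill(self, key):
        # Insert (or refresh) a line; evict the least recently used one if over capacity.
        self.lines[key] = True
        self.lines.move_to_end(key)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)

    def access(self, key):
        """Demand access by the worker: counts a miss if the line is absent."""
        if key not in self.lines:
            self.misses += 1
        self._fill(key)

    def prefetch(self, key):
        """Prefetch issued ahead of time: fills the line without counting a miss."""
        self._fill(key)


def prefetch_task(adj, v, cache):
    # Hypothetical prefetch routine: the task's edge list plus its neighbors' data.
    cache.prefetch(("edges", v))
    for u in adj[v]:
        cache.prefetch(("dist", u))


def bfs(adj, source, cache, lookahead=0):
    """BFS driven by a FIFO worklist; lookahead > 0 enables worklist-directed prefetching."""
    dist = {source: 0}
    worklist = deque([source])
    while worklist:
        # The entry at index `lookahead` is the task dequeued `lookahead` pops from now.
        if lookahead and len(worklist) > lookahead:
            prefetch_task(adj, worklist[lookahead], cache)
        v = worklist.popleft()
        cache.access(("edges", v))
        for u in adj[v]:
            cache.access(("dist", u))
            if u not in dist:
                dist[u] = dist[v] + 1
                worklist.append(u)
    return dist


def make_graph(n):
    """Deterministic test graph with three out-edges per vertex (hypothetical input)."""
    return {v: [(v * 7 + 1) % n, (v * 13 + 5) % n, (v * 31 + 11) % n] for v in range(n)}


if __name__ == "__main__":
    adj = make_graph(200)
    for lookahead in (0, 4):
        cache = LRUCache(capacity=1024)
        bfs(adj, 0, cache, lookahead)
        print(f"lookahead={lookahead}: {cache.misses} misses")
```

In this model, the prefetcher reads addresses from tasks that are already queued rather than guessing from past access patterns. As a result, every line it brings in is one the worker will actually touch, which mirrors the intuition behind the near-perfect accuracy claimed above. The example's cache is large enough that no line is ever evicted, so the comparison isolates cold misses only. A smaller `capacity` or a larger `lookahead` would let prefetched lines be evicted before use.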

Recommendations

Overcoming Limitations Of Prefetching In Multiprocessors By Compiler-Initiated Coherence Action
PACT '97: Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques

In this paper we first identify limitations of compiler-controlled prefetching in a CC-NUMA multiprocessor with a write-invalidate cache coherence protocol. Compiler-controlled prefetch techniques for CC-NUMAs often are focused only, on stride-accesses, ...

Correlation Prefetching with a User-Level Memory Thread

This paper proposes using a User-Level Memory Thread (ULMT) for correlation prefetching. In this approach, a user thread runs on a general-purpose processor in main memory, either in the memory controller chip or in a DRAM chip. The thread performs ...

Information

Published In

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems

March 2018

827 pages

ISBN:9781450349116

DOI:10.1145/3173162

General Chairs: Xipeng Shen (North Carolina State University, USA), James Tuck (North Carolina State University, USA)

Program Chairs: Ricardo Bianchini (Microsoft Research, USA), Vivek Sarkar (Georgia Institute of Technology, USA)

ACM SIGPLAN Notices Volume 53, Issue 2
ASPLOS '18
February 2018
809 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3296957
Editor: Matthew Fluet, Rochester Institute of Technology

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

ASPLOS '18

Sponsor:

ASPLOS '18: Architectural Support for Programming Languages and Operating Systems

March 24 - 28, 2018

Williamsburg, VA, USA

Acceptance Rates

ASPLOS '18 Paper Acceptance Rate 56 of 319 submissions, 18%;

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 42
Total Downloads: 1,540
Downloads (last 12 months): 268
Downloads (last 6 weeks): 42

Reflects downloads up to 27 Jan 2025

Cited By

Rajasukumar A, Zhang T, Xu R, Chien A (2024). UpDown: A Novel Architecture for Unlimited Memory Parallelism. Proceedings of the International Symposium on Memory Systems, 61-77. DOI: 10.1145/3695794.3695801. Online publication date: 30-Sep-2024.
https://dl.acm.org/doi/10.1145/3695794.3695801

Mao F, Liu X, Zhang Y, Liu H, Liao X, Jin H, Zhang W, Zhou J, Wu Y, Nie L, Guo Y, Jiang Z, Liu J (2024). PMGraph: Accelerating Concurrent Graph Queries over Streaming Graphs. ACM Transactions on Architecture and Code Optimization 21:4, 1-25. DOI: 10.1145/3689337. Online publication date: 20-Nov-2024.
https://dl.acm.org/doi/10.1145/3689337

Xue F, Han C, Li X, Wu J, Zhang T, Liu T, Hao Y, Du Z, Guo Q, Zhang F (2024). Tyche: An Efficient and General Prefetcher for Indirect Memory Accesses. ACM Transactions on Architecture and Code Optimization. DOI: 10.1145/3641853. Online publication date: 22-Jan-2024.
https://dl.acm.org/doi/10.1145/3641853

Zhang X, Liu C, Ni J, Cheng Y, Zhang L, Li H, Li X (2024). PDG: A Prefetcher for Dynamic Graph Updating. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:4, 1246-1259. DOI: 10.1109/TCAD.2023.3335880. Online publication date: Apr-2024.
https://doi.org/10.1109/TCAD.2023.3335880

Nye J, Ali U, Khan O (2024). SSE: Security Service Engines to Scale Enclave Parallelism for System Interactive Applications. 2024 International Symposium on Secure and Private Execution Environment Design (SEED), 84-95. DOI: 10.1109/SEED61283.2024.00019. Online publication date: 16-May-2024.
https://doi.org/10.1109/SEED61283.2024.00019

Schwedock B, Beckmann N (2024). Leviathan: A Unified System for General-Purpose Near-Data Computing. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 1278-1294. DOI: 10.1109/MICRO61859.2024.00095. Online publication date: 2-Nov-2024.
https://doi.org/10.1109/MICRO61859.2024.00095

Fox D, Monsalve Diaz J, Li X (2023). A gem5 Implementation of the Sequential Codelet Model: Reducing Overhead and Expanding the Software Memory Interface. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 839-846. DOI: 10.1145/3624062.3624152. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624152

Naithani A, Roelandts J, Ainsworth S, Jones T, Eeckhout L (2023). Decoupled Vector Runahead. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 17-31. DOI: 10.1145/3613424.3614255. Online publication date: 28-Oct-2023.
https://dl.acm.org/doi/10.1145/3613424.3614255

Vijaykumar N, Olgun A, Kanellopoulos K, Bostanci F, Hassan H, Lotfi M, Gibbons P, Mutlu O (2022). MetaSys: A Practical Open-source Metadata Management System to Implement and Evaluate Cross-layer Optimizations. ACM Transactions on Architecture and Code Optimization 19:2, 1-29. DOI: 10.1145/3505250. Online publication date: 24-Mar-2022.
https://dl.acm.org/doi/10.1145/3505250

Dadu V, Nowatzki T, Falsafi B, Ferdman M, Lu S, Wenisch T (2022). TaskStream: accelerating task-parallel workloads by recovering program structure. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 1-13. DOI: 10.1145/3503222.3507706. Online publication date: 28-Feb-2022.
https://dl.acm.org/doi/10.1145/3503222.3507706
