The Vector-Thread Architecture

Authors:

Ronny Krashinsky,

Christopher Batten,

Krste Asanović

ACM SIGARCH Computer Architecture News, Volume 32, Issue 2

Page 52

https://doi.org/10.1145/1028176.1006736

Published: 02 March 2004

Abstract

The vector-thread (VT) architectural paradigm unifies the vector and multithreaded compute models. The VT abstraction provides the programmer with a control processor and a vector of virtual processors (VPs). The control processor can use vector-fetch commands to broadcast instructions to all the VPs, or each VP can use thread-fetches to direct its own control flow. A seamless intermixing of the vector and threaded control mechanisms allows a VT architecture to flexibly and compactly encode application parallelism and locality, and a VT machine exploits these to improve performance and efficiency. We present SCALE, an instantiation of the VT architecture designed for low-power and high-performance embedded systems. We evaluate the SCALE prototype design using detailed simulation of a broad range of embedded applications and show that its performance is competitive with larger and more complex processors.
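
The two control mechanisms named in the abstract can be illustrated with a toy software model. The sketch below is only an illustration under stated assumptions: it is not the SCALE instruction set or its microarchitecture, it simulates the VPs one after another rather than in parallel, and every name in it (`VirtualProcessor`, `ControlProcessor`, `vector_fetch`, `collatz_steps`, the block-return convention) is hypothetical. A vector-fetch broadcasts one block to all VPs; each block may return the name of a further block, which stands in for a thread-fetch that lets a VP follow its own data-dependent control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class VirtualProcessor:
    """One VP: an index plus a private register file (hypothetical model)."""
    index: int
    regs: Dict[str, int] = field(default_factory=dict)


# Assumed convention: a block runs on one VP and returns the name of the next
# block that VP thread-fetches, or None when the VP's thread stops.
Block = Callable[[VirtualProcessor], Optional[str]]


class ControlProcessor:
    def __init__(self, num_vps: int, blocks: Dict[str, Block]) -> None:
        self.vps: List[VirtualProcessor] = [VirtualProcessor(i) for i in range(num_vps)]
        self.blocks = blocks

    def vector_fetch(self, name: str) -> None:
        """Broadcast block `name` to every VP; each VP then follows its own
        thread-fetches until it stops. VPs are simulated sequentially here."""
        for vp in self.vps:
            next_name: Optional[str] = name
            while next_name is not None:
                next_name = self.blocks[next_name](vp)


def collatz_steps(inputs: List[int]) -> List[int]:
    """Count Collatz steps per element (positive integers only), one VP each."""

    def init(vp: VirtualProcessor) -> Optional[str]:
        # Vector-fetched block: identical setup work on every VP.
        vp.regs["n"] = inputs[vp.index]
        vp.regs["steps"] = 0
        return "step"  # thread-fetch into the data-dependent loop

    def step(vp: VirtualProcessor) -> Optional[str]:
        n = vp.regs["n"]
        if n == 1:
            return None  # this VP's thread stops; others may keep going
        vp.regs["n"] = n // 2 if n % 2 == 0 else 3 * n + 1
        vp.regs["steps"] += 1
        return "step"  # the VP redirects its own control flow

    cp = ControlProcessor(len(inputs), {"init": init, "step": step})
    cp.vector_fetch("init")
    return [vp.regs["steps"] for vp in cp.vps]


if __name__ == "__main__":
    print(collatz_steps([1, 2, 3, 6]))  # prints [0, 1, 7, 8]
```

In this model the setup block is issued once for all VPs, while the loop trip count differs per VP, which is the kind of mixed vector and threaded control the abstract describes.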


Published In

ACM SIGARCH Computer Architecture News Volume 32, Issue 2

ISCA 2004

March 2004

373 pages

ISSN:0163-5964

DOI:10.1145/1028176

ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture
June 2004
373 pages
ISBN:0769521436

Copyright © 2004 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States
