Data Flow Transformation for Energy-Efficient Implementation of Givens Rotation–Based QRD

Published: 13 January 2016

Abstract

QR decomposition (QRD), a matrix decomposition algorithm widely used in the embedded application domain, can be realized in a large number of valid processing sequences that differ significantly in the number of memory accesses and computations, and hence in the overall implementation energy. With modern low-power embedded processors evolving toward register files with wide memory interfaces and vector functional units (FUs), the data flow in these algorithms needs to be carefully devised to make efficient use of the costly wide memory accesses and the vector FUs. In this article, we present an energy-efficient data flow transformation strategy for the Givens rotation–based QRD.
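
The observation that Givens rotation-based QRD admits many valid processing sequences can be illustrated with a minimal sketch. This is not the data flow transformation proposed in the article; it only shows that two different rotation orderings triangularize the same matrix. The function names (`givens`, `rotate_rows`, `qrd_givens`, `column_order`, `adjacent_order`), the two orderings, and the sample matrix are all assumptions made for illustration.

```python
import math

def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0  # nothing to rotate
    return a / r, b / r

def rotate_rows(M, p, q, c, s):
    """Apply the rotation to rows p and q of M in place."""
    for k in range(len(M[0])):
        x, y = M[p][k], M[q][k]
        M[p][k] = c * x + s * y
        M[q][k] = -s * x + c * y

def qrd_givens(A, order):
    """Triangularize a copy of A by zeroing entries in the given order.

    `order` is a list of (pivot_row, target_row, col) triples; each rotation
    combines the two rows so that R[target_row][col] becomes zero.
    """
    R = [row[:] for row in A]
    for p, q, j in order:
        c, s = givens(R[p][j], R[q][j])
        rotate_rows(R, p, q, c, s)
        R[q][j] = 0.0  # clear rounding residue in the eliminated entry
    return R

def column_order(m, n):
    """Sequence 1: in each column, rotate every lower row against the diagonal row."""
    return [(j, i, j) for j in range(n) for i in range(j + 1, m)]

def adjacent_order(m, n):
    """Sequence 2: in each column, rotate adjacent row pairs from the bottom up."""
    return [(i - 1, i, j) for j in range(n) for i in range(m - 1, j, -1)]

# Hypothetical 4x3 full-rank input, chosen only for illustration.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0],
     [2.0, 1.0, 1.0]]

R1 = qrd_givens(A, column_order(4, 3))
R2 = qrd_givens(A, adjacent_order(4, 3))
```

Both orderings apply the same number of rotations and yield the same upper-triangular factor up to rounding, but they touch rows in a different order. That difference in access pattern is the kind of freedom the abstract describes as affecting memory accesses and energy.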

Cited By

  • (2021) SecRec: A Privacy-Preserving Method for the Context-Aware Recommendation System. IEEE Transactions on Dependable and Secure Computing. DOI: 10.1109/TDSC.2021.3085562 (1-1). Online publication date: 2021.

Published In

ACM Transactions on Embedded Computing Systems, Volume 15, Issue 1
February 2016
530 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/2872313
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 January 2016
Accepted: 01 October 2015
Revised: 01 July 2015
Received: 01 January 2015
Published in TECS Volume 15, Issue 1

Author Tags

  1. Data flow transformation
  2. SIMD architectures
  3. energy optimization
  4. matrix decomposition

Qualifiers

  • Research-article
  • Research
  • Refereed
