research-article

A High-throughput Parallel Viterbi Algorithm via Bitslicing

Authors: Saleh Khalaj Monfared, Omid Hajihassani, Vahid Mohsseni, Saeid Gorgin

ACM Transactions on Parallel Computing (TOPC), Volume 8, Issue 4

Article No.: 19, Pages 1 - 25

https://doi.org/10.1145/3470642

Published: 15 October 2021

Abstract

In this work, we present a novel bitsliced, high-performance Viterbi algorithm suitable for high-throughput, data-intensive communication. Our proposed Viterbi decoder couples a new column-major data representation scheme with the bitsliced architecture, which enables maximum utilization of the parallel processing units in modern parallel accelerators. With this change to the data scheme, each processing unit processes 32-bit chunks of data instead of performing the conventional bit-by-bit operations. A single bitsliced parallel Viterbi decoder can therefore decode 32 different chunks of data simultaneously. The Viterbi Add-Compare-Select procedure is implemented with our proposed bitslicing technique, and we show that the bitsliced operations for the Viterbi internal functionalities are efficient in terms of performance and complexity. We achieve this level of parallelism while keeping an acceptable bit error rate performance for the proposed methodology. Our hard- and soft-decision Viterbi decoder implementations on GPU platforms outperform the fastest previously proposed works, achieving 21.41 and 8.24 Gbps on Tesla V100, respectively.
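The column-major idea and the bitsliced Add-Compare-Select step described in the abstract can be illustrated with a minimal sketch. This is not the authors' GPU implementation: it uses Python integers as stand-ins for 32-bit machine words, assumes 8-bit unsigned path metrics that never overflow, and every function name (`to_bitsliced`, `bs_add`, `bs_less_than`, `bs_select`, `bs_acs`) is hypothetical. Each word holds one bit position of 32 independent decoders, so every bitwise operation advances all 32 lanes at once.

```python
import random

LANES = 32                 # one lane per independent data stream (assumption)
MASK = (1 << LANES) - 1    # keeps results inside a 32-bit word

def to_bitsliced(values, width):
    """Transpose LANES integers into `width` bit-planes (column-major).
    planes[b] holds bit b of every value; lane i sits at bit position i."""
    planes = [0] * width
    for lane, v in enumerate(values):
        for b in range(width):
            planes[b] |= ((v >> b) & 1) << lane
    return planes

def from_bitsliced(planes):
    """Inverse of to_bitsliced: recover one integer per lane."""
    return [sum(((p >> lane) & 1) << b for b, p in enumerate(planes))
            for lane in range(LANES)]

def bs_add(a, b):
    """Ripple-carry addition (modulo 2**width) on all lanes at once."""
    out, carry = [], 0
    for x, y in zip(a, b):                 # least significant plane first
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return out

def bs_less_than(a, b):
    """Word whose lane bit is 1 where a < b (unsigned comparison)."""
    lt = 0
    for x, y in zip(a, b):                 # higher planes override lower ones
        lt = (~x & y) | (~(x ^ y) & lt)
    return lt & MASK

def bs_select(mask, a, b):
    """Per lane: take a where the mask bit is 1, otherwise b."""
    return [(x & mask) | (y & ~mask & MASK) for x, y in zip(a, b)]

def bs_acs(pm0, bm0, pm1, bm1):
    """Add-Compare-Select: add branch metrics to path metrics, compare the
    two candidates, and keep the smaller one in every lane."""
    c0, c1 = bs_add(pm0, bm0), bs_add(pm1, bm1)
    take0 = bs_less_than(c0, c1)           # decision bits, one per lane
    return bs_select(take0, c0, c1), take0

# Illustrative data only: random path metrics and small branch metrics.
WIDTH = 8
rng = random.Random(0)
pm0 = [rng.randrange(100) for _ in range(LANES)]
pm1 = [rng.randrange(100) for _ in range(LANES)]
bm0 = [rng.randrange(3) for _ in range(LANES)]
bm1 = [rng.randrange(3) for _ in range(LANES)]

planes, decision = bs_acs(*(to_bitsliced(v, WIDTH) for v in (pm0, bm0, pm1, bm1)))
survivors = from_bitsliced(planes)
expected = [min(a + b, c + d) for a, b, c, d in zip(pm0, bm0, pm1, bm1)]
```

In this sketch `survivors` matches the lane-by-lane scalar result in `expected`, and `decision` carries the 32 survivor-selection bits that a traceback stage would consume. The cost of one ACS grows with the metric width rather than with the number of lanes, which is the property bitslicing relies on.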

Cited By

Zhang S, Yang L, Zhang Y, Lu Z, Cui Z. (2024). Tensor-Based Viterbi Algorithms for Collaborative Cloud-Edge Cyber-Physical-Social Activity Prediction. ACM Transactions on Sensor Networks. Online publication date: 17-Jan-2024. https://dl.acm.org/doi/10.1145/3639467
Özbaltan M, Kurucan M. (2024). Obtaining the Most Likely Path in Stochastic Hidden Input Automata by Using Limited Optimal Discrete Control. IEEE Access 12 (14776-14786). Online publication date: 2024. https://doi.org/10.1109/ACCESS.2024.3357608

Published In

ACM Transactions on Parallel Computing Volume 8, Issue 4

December 2021

118 pages

ISSN: 2329-4949

EISSN: 2329-4957

DOI: 10.1145/3481693

Editor:
David A. Bader
New Jersey Institute of Technology, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2021

Accepted: 01 April 2021

Revised: 01 March 2021

Received: 01 May 2020

Published in TOPC Volume 8, Issue 4

Qualifiers

Research-article
Refereed

Bibliometrics

Total Citations: 2

Total Downloads: 515

Downloads (Last 12 months): 194

Downloads (Last 6 weeks): 16

Reflects downloads up to 27 Jul 2024
