A High-throughput Parallel Viterbi Algorithm via Bitslicing

Published: 15 October 2021
  • Abstract

    In this work, we present a novel bitsliced, high-performance Viterbi algorithm suitable for high-throughput, data-intensive communication. The proposed Viterbi decoder couples a new column-major data representation with a bitsliced architecture to make maximal use of the parallel processing units in modern parallel accelerators. With this data layout, each processing unit operates on 32-bit chunks of data instead of performing conventional bit-by-bit operations, so a single bitsliced parallel Viterbi decoder can decode 32 different chunks of data simultaneously. The Viterbi Add-Compare-Select procedure is implemented with the proposed bitslicing technique, and the bitsliced operations for the Viterbi internal functions are shown to be efficient in terms of performance and complexity. This level of parallelism is achieved while maintaining an acceptable bit error rate. The proposed hard-decision and soft-decision Viterbi decoder implementations on GPU platforms outperform the fastest previously proposed works, achieving 21.41 and 8.24 Gbps on a Tesla V100, respectively.
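
The idea of a bitsliced Add-Compare-Select step over a column-major layout can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes 32 independent streams (lanes), unsigned metrics stored as a list of bit planes with the least significant plane first, and a 2-bit received symbol for the hard-decision branch metric. All function names here are hypothetical and chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit word; bit j of every word belongs to lane j
LANES = 32

def to_bitsliced(values, width):
    """Transpose per-lane integers into `width` bit planes (column-major).
    Plane i collects bit i of every value; bit j of a plane is lane j."""
    planes = [0] * width
    for lane, v in enumerate(values):
        for i in range(width):
            planes[i] |= ((v >> i) & 1) << lane
    return planes

def from_bitsliced(planes, lanes=LANES):
    """Inverse transpose: rebuild one integer per lane from the bit planes."""
    return [sum(((p >> lane) & 1) << i for i, p in enumerate(planes))
            for lane in range(lanes)]

def bs_hamming2(r0, r1, e0, e1, width):
    """Hamming distance between 2-bit received and expected symbols.
    The result is 0, 1 or 2 per lane, returned as `width` bit planes."""
    d0, d1 = r0 ^ e0, r1 ^ e1
    return [d0 ^ d1, d0 & d1] + [0] * (width - 2)

def bs_add(a, b):
    """Ripple-carry addition on bit planes (modulo 2**len(a)) in every lane."""
    out, carry = [], 0
    for x, y in zip(a, b):
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return out

def bs_less_than(a, b):
    """Word whose bit j is 1 where a < b (unsigned) in lane j.
    Uses the borrow chain of a - b, propagated from LSB to MSB."""
    borrow = 0
    for x, y in zip(a, b):
        borrow = ((~x & MASK32) & y) | ((~(x ^ y) & MASK32) & borrow)
    return borrow

def bs_select(mask, a, b):
    """Per lane: take a where the mask bit is 1, otherwise take b."""
    return [(x & mask) | (y & ~mask & MASK32) for x, y in zip(a, b)]

def bs_acs(pm_a, bm_a, pm_b, bm_b):
    """Add-Compare-Select for one trellis state, for all lanes at once.
    Returns the surviving path metric and the per-lane decision word."""
    cand_a = bs_add(pm_a, bm_a)
    cand_b = bs_add(pm_b, bm_b)
    take_a = bs_less_than(cand_a, cand_b)  # ties fall through to candidate b
    return bs_select(take_a, cand_a, cand_b), take_a
```

Every operation above is a plain AND, OR, XOR or NOT on whole words, so one call to `bs_acs` updates the same trellis state for all 32 lanes. The metric width must be large enough that the additions do not overflow, which this sketch leaves to the caller.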

    Cited By

    • (2024) Tensor-Based Viterbi Algorithms for Collaborative Cloud-Edge Cyber-Physical-Social Activity Prediction. ACM Transactions on Sensor Networks. DOI: 10.1145/3639467. Online publication date: 17-Jan-2024
    • (2024) Obtaining the Most Likely Path in Stochastic Hidden Input Automata by Using Limited Optimal Discrete Control. IEEE Access 12, 14776-14786. DOI: 10.1109/ACCESS.2024.3357608. Online publication date: 2024

    Published In

    ACM Transactions on Parallel Computing  Volume 8, Issue 4
    December 2021
    118 pages
    ISSN:2329-4949
    EISSN:2329-4957
    DOI:10.1145/3481693

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 October 2021
    Accepted: 01 April 2021
    Revised: 01 March 2021
    Received: 01 May 2020
    Published in TOPC Volume 8, Issue 4

    Author Tags

    1. CUDA
    2. Viterbi algorithm
    3. convolutional codes
    4. bitslicing
    5. HPC

    Qualifiers

    • Research-article
    • Refereed

    Article Metrics

    • Downloads (last 12 months): 194
    • Downloads (last 6 weeks): 16
    Reflects downloads up to 27 Jul 2024
