research-article

FPGA Implementations of Kernel Normalised Least Mean Squares Processors

Authors:

Nicholas J. Fraser,

Duncan J. M. Moss,

Julian Faraone,

Stephen Tridgell,

Philip H. W. Leong

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 10, Issue 4

Article No.: 26, Pages 1 - 20

https://doi.org/10.1145/3106744

Published: 15 December 2017

Abstract

Kernel adaptive filters (KAFs) are online machine learning algorithms that are amenable to highly efficient streaming implementations. They require only a single pass through the data and can act as universal approximators, i.e., they can approximate any continuous function with arbitrary accuracy. KAFs belong to the family of kernel methods, which apply an implicit non-linear mapping of the input data to a high-dimensional feature space, permitting learning algorithms to be expressed entirely in terms of inner products. This avoids explicit projection into the feature space and thereby enables computational efficiency. In this paper, we propose the first fully pipelined implementation of the kernel normalised least mean squares (KNLMS) algorithm for regression. Independent training tasks, which are required for hyperparameter optimisation, fill the pipeline stages, so no stall cycles are needed to resolve data dependencies. Combined with other optimisations that reduce resource utilisation and latency, our core achieves 161 GFLOPS on a Virtex 7 XC7VX485T FPGA for a floating-point implementation and 211 GOPS for a fixed-point implementation. Our PCI Express-based floating-point system implementation achieves 80% of the core's throughput, a speedup of 10× over an optimised implementation on a desktop processor and 2.66× over a GPU.
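As background for the algorithm named above, the following is a minimal floating-point sketch of KNLMS in Python, following the formulation of Richard, Bermudez and Honeine (2009): a Gaussian kernel, a coherence criterion that admits a new input into the dictionary only if its kernel value with every stored entry is at most `mu0`, and a normalised LMS update of the weight vector. The names `knlms` and `gaussian_kernel`, the kernel parametrisation `exp(-gamma * ||x - y||^2)`, the dictionary cap `max_dict` and all default values are illustrative assumptions; the paper's hardware may differ in dictionary handling, kernel choice and number format.

```python
import numpy as np


def gaussian_kernel(x, y, gamma):
    # Assumed parametrisation: exp(-gamma * ||x - y||^2).
    diff = x - y
    return np.exp(-gamma * np.dot(diff, diff))


def knlms(X, d, eta=0.1, eps=1e-4, mu0=0.9, gamma=1.0, max_dict=16):
    """Online KNLMS regression (coherence-based dictionary, NLMS weight update).

    X has shape (n, p), d has shape (n,). Returns (dictionary, alpha, errors),
    where errors holds the a-priori error for every sample.
    max_dict is a hypothetical cap, e.g. for a fixed-size hardware dictionary.
    """
    dictionary = []              # stored inputs x_{omega_j}
    alpha = np.zeros(0)          # one weight per dictionary entry
    errors = np.zeros(len(X))
    for n, (x, target) in enumerate(zip(X, d)):
        # Kernel vector between the new input and every dictionary entry.
        k = np.array([gaussian_kernel(x, c, gamma) for c in dictionary])
        # Coherence criterion: admit x only if no stored entry is too similar.
        if len(dictionary) < max_dict and (k.size == 0 or np.max(np.abs(k)) <= mu0):
            dictionary.append(x)
            alpha = np.append(alpha, 0.0)
            k = np.append(k, gaussian_kernel(x, x, gamma))
        # A-priori error, using the weights produced by the previous iteration.
        e = target - k @ alpha
        errors[n] = e
        # Normalised LMS step on the dictionary weights.
        alpha = alpha + (eta / (eps + k @ k)) * e * k
    return dictionary, alpha, errors


# Usage on a synthetic 2-D regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
d = np.sin(3.0 * X[:, 0]) * X[:, 1]
dictionary, alpha, errors = knlms(X, d, eta=0.5)
print(len(dictionary), float(np.mean(errors[-100:] ** 2)))
```

Each iteration reads the `alpha` written by the previous one. In a deep arithmetic pipeline, this loop-carried dependency would force a single training task to wait for its own update before processing the next sample. The hyperparameters `eta`, `eps`, `mu0` and `gamma` are the kind of settings a hyperparameter search would sweep.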

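The scheduling idea described in the abstract, in which independent training tasks fill the pipeline stages, can be illustrated with a toy cycle-count model. The sketch assumes a single update pipeline of `depth` stages that accepts one sample per cycle. It also assumes that a task may issue its next sample only once its previous model update has left the pipeline. It issues `num_tasks` independent filters (for example, one per hyperparameter setting) in round-robin order and counts the idle issue slots. The function `count_stalls` and this timing model are hypothetical simplifications, not the paper's architecture.

```python
def count_stalls(depth, num_tasks, samples_per_task):
    """Toy model: count idle issue slots when independent tasks share one pipeline.

    Hypothetical timing assumptions: one sample may issue per cycle, and a task's
    next sample may issue only `depth` cycles after its previous one, i.e. once
    that sample's model update has left the pipeline.
    """
    ready_at = [0] * num_tasks                  # earliest issue cycle for each task
    remaining = [samples_per_task] * num_tasks  # samples still to process per task
    cycle = stalls = task = 0
    while any(remaining):
        # Skip tasks that have already consumed all of their samples.
        while remaining[task] == 0:
            task = (task + 1) % num_tasks
        if cycle >= ready_at[task]:
            # Issue one sample; its updated model is available `depth` cycles later.
            ready_at[task] = cycle + depth
            remaining[task] -= 1
            task = (task + 1) % num_tasks       # round-robin to the next task
        else:
            stalls += 1                         # dependency unresolved: insert a bubble
        cycle += 1
    return stalls


for tasks in (1, 4, 8):
    print(tasks, count_stalls(depth=8, num_tasks=tasks, samples_per_task=10))
# 1 63
# 4 36
# 8 0
```

In this model, a single task incurs `depth - 1` bubbles between consecutive samples, while `depth` or more interleaved tasks hide the dependency completely. In general, the stall count is `(samples_per_task - 1) * max(0, depth - num_tasks)`.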
Index Terms

Computer systems organization → Architectures → Other architectures → Reconfigurable computing

Information

Published In

ACM Transactions on Reconfigurable Technology and Systems Volume 10, Issue 4

December 2017

119 pages

ISSN:1936-7406

EISSN:1936-7414

DOI:10.1145/3166118

Editor:
Steve Wilton
Department of Electrical and Computer Engineering / University of British Columbia / Kaiser 4112, 5500-2332 Main Mall / Vancouver, BC V6T 1Z4 Canada

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 December 2017

Accepted: 01 June 2017

Revised: 01 January 2017

Received: 01 April 2016

Published in TRETS Volume 10, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Australian Research Council Linkage Projects
Zomojo Pty Ltd

Bibliometrics & Citations

Total citations: 5
Total downloads: 203
Downloads (last 12 months): 12
Downloads (last 6 weeks): 0

Reflects downloads up to 01 Sep 2024

Cited By

Ganewattha C, Khan Z, Lehtomäki J, Latva-Aho M (2023). Hardware-accelerated Real-time Drift-awareness for Robust Deep Learning on Wireless RF Data. ACM Transactions on Reconfigurable Technology and Systems 16:2 (1-29). Online publication date: 11-Mar-2023.
https://dl.acm.org/doi/10.1145/3563394

Gogineni V, Sambangi R, Alex D, Mula S, Werner S (2023). Algorithm and Architecture Design of Random Fourier Features-Based Kernel Adaptive Filters. IEEE Transactions on Circuits and Systems I: Regular Papers 70:2 (833-845). Online publication date: Feb-2023.
https://doi.org/10.1109/TCSI.2022.3227727

de Dinechin F, Kumm M (2023). Floating-Point Exponential. In Application-Specific Arithmetic (641-666). Online publication date: 23-Aug-2023.
https://doi.org/10.1007/978-3-031-42808-1_22

Lee J, Nikolopoulos D, Vandierendonck H (2022). Mixed-Precision Kernel Recursive Least Squares. IEEE Transactions on Neural Networks and Learning Systems 33:3 (1284-1298). Online publication date: Mar-2022.
https://doi.org/10.1109/TNNLS.2020.3041677

Fraser N, Leong P (2020). Kernel Normalised Least Mean Squares with Delayed Model Adaptation. ACM Transactions on Reconfigurable Technology and Systems 13:2 (1-30). Online publication date: 13-Feb-2020.
https://dl.acm.org/doi/10.1145/3376924
