
FPGA Implementations of Kernel Normalised Least Mean Squares Processors

Published: 15 December 2017

Abstract

Kernel adaptive filters (KAFs) are online machine learning algorithms that are amenable to highly efficient streaming implementations. They require only a single pass through the data and can act as universal approximators, i.e., they can approximate any continuous function with arbitrary accuracy. KAFs are members of a family of kernel methods which apply an implicit non-linear mapping of input data to a high-dimensional feature space, permitting learning algorithms to be expressed entirely as inner products. Such an approach avoids explicit projection into the feature space, enabling computational efficiency. In this paper, we propose the first fully pipelined implementation of the kernel normalised least mean squares algorithm for regression. Independent training tasks necessary for hyperparameter optimisation fill pipeline stages, so no stall cycles to resolve dependencies are required. Together with other optimisations to reduce resource utilisation and latency, our core achieves 161 GFLOPS on a Virtex 7 XC7VX485T FPGA for a floating-point implementation and 211 GOPS for a fixed-point implementation. Our PCI Express-based floating-point system implementation achieves 80% of the core’s speed, a speedup of 10× over an optimised implementation on a desktop processor and 2.66× over a GPU.
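
As a rough illustration of the algorithm family the abstract describes, the following is a minimal software sketch of a KNLMS-style regressor. It assumes a Gaussian kernel and a coherence-based rule for growing the dictionary, which is a common formulation in the KNLMS literature but is not confirmed by the abstract as the exact variant the authors implement. The class name `KNLMS`, the parameter names (`gamma`, `eta`, `eps`, `mu0`) and the toy sine-wave stream are hypothetical choices made for this example, and nothing here reflects the FPGA architecture itself.

```python
import math
import random


def gaussian_kernel(x, d, gamma):
    """Gaussian kernel exp(-gamma * ||x - d||^2) between two vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, d)))


class KNLMS:
    """Illustrative KNLMS regressor with a coherence-based dictionary."""

    def __init__(self, gamma=1.0, eta=0.5, eps=1e-4, mu0=0.9):
        self.gamma = gamma    # kernel width parameter (assumed name)
        self.eta = eta        # step size
        self.eps = eps        # regulariser in the normalisation term
        self.mu0 = mu0        # coherence threshold for admitting entries
        self.dictionary = []  # stored input vectors
        self.alpha = []       # one weight per dictionary entry

    def _kernel_vector(self, x):
        # Only kernel evaluations are needed; the feature space stays implicit.
        return [gaussian_kernel(x, d, self.gamma) for d in self.dictionary]

    def predict(self, x):
        k = self._kernel_vector(x)
        return sum(a * ki for a, ki in zip(self.alpha, k))

    def update(self, x, y):
        """Process one (x, y) sample and return the a priori error."""
        k = self._kernel_vector(x)
        # Admit x only if it is not too similar to any stored entry.
        if not k or max(abs(ki) for ki in k) <= self.mu0:
            self.dictionary.append(list(x))
            self.alpha.append(0.0)
            k.append(gaussian_kernel(x, x, self.gamma))  # equals 1.0
        err = y - sum(a * ki for a, ki in zip(self.alpha, k))
        # Normalised LMS step: scale the correction by the kernel vector energy.
        step = self.eta * err / (self.eps + sum(ki * ki for ki in k))
        self.alpha = [a + step * ki for a, ki in zip(self.alpha, k)]
        return err


def run_demo(n_samples=2000, seed=0):
    """Single pass over a toy stream y = sin(x); returns error statistics."""
    rng = random.Random(seed)
    model = KNLMS()
    sq_errors = []
    for _ in range(n_samples):
        x = [rng.uniform(-3.0, 3.0)]
        err = model.update(x, math.sin(x[0]))
        sq_errors.append(err * err)
    early = sum(sq_errors[:100]) / 100
    late = sum(sq_errors[-100:]) / 100
    return early, late, len(model.dictionary)


if __name__ == "__main__":
    early_mse, late_mse, size = run_demo()
    print("MSE over first 100 samples:", early_mse)
    print("MSE over last 100 samples:", late_mse)
    print("dictionary size:", size)
```

Each `KNLMS` instance with its own (`gamma`, `eta`, `mu0`) setting is an independent training task. In this sketch, the data dependency inside `update` (the weights must be updated before the next sample is processed) is the kind of dependency that, according to the abstract, the hardware avoids stalling on by interleaving such independent tasks across pipeline stages.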


    Published In

    ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 4
    December 2017
    119 pages
    ISSN: 1936-7406
    EISSN: 1936-7414
    DOI: 10.1145/3166118
    Editor: Steve Wilton
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 December 2017
    Accepted: 01 June 2017
    Revised: 01 January 2017
    Received: 01 April 2016
    Published in TRETS Volume 10, Issue 4

    Author Tags

    1. FPGAs
    2. hyperparameter search
    3. machine learning
    4. pipeline

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Australian Research Council's Linkage Projects
    • Zomojo Pty Ltd

    Cited By

    • (2023) Hardware-accelerated Real-time Drift-awareness for Robust Deep Learning on Wireless RF Data. ACM Transactions on Reconfigurable Technology and Systems 16:2 (1-29). DOI: 10.1145/3563394. Online publication date: 11-Mar-2023
    • (2023) Algorithm and Architecture Design of Random Fourier Features-Based Kernel Adaptive Filters. IEEE Transactions on Circuits and Systems I: Regular Papers 70:2 (833-845). DOI: 10.1109/TCSI.2022.3227727. Online publication date: Feb-2023
    • (2023) Floating-Point Exponential. Application-Specific Arithmetic (641-666). DOI: 10.1007/978-3-031-42808-1_22. Online publication date: 23-Aug-2023
    • (2022) Mixed-Precision Kernel Recursive Least Squares. IEEE Transactions on Neural Networks and Learning Systems 33:3 (1284-1298). DOI: 10.1109/TNNLS.2020.3041677. Online publication date: Mar-2022
    • (2020) Kernel Normalised Least Mean Squares with Delayed Model Adaptation. ACM Transactions on Reconfigurable Technology and Systems 13:2 (1-30). DOI: 10.1145/3376924. Online publication date: 13-Feb-2020
