A Task-Parallel and Reconfigurable FPGA-Based Hardware Implementation of Extreme Learning Machine

Published: 18 April 2022 Publication History
  Abstract

    Extreme learning machine (ELM) is an emerging machine learning algorithm that is widely used in real-world applications owing to its extremely fast training speed, good generalization, and universal approximation capability. To further enable the use of ELM in practical embedded systems, this paper presents a task-parallel and reconfigurable FPGA-based hardware architecture for the ELM algorithm. The proposed architecture performs on-chip learning for both the training and prediction phases, which are implemented parameterizably through reconfigurable parameters. Task-parallel effort is focused on the training phase, where serial computations are decomposed into subtasks that execute in parallel to improve computational efficiency. In addition, an on-chip block-RAM reuse scheme reduces on-chip resource consumption. Experimental results show that the proposed architecture achieves accuracy comparable to a floating-point MATLAB implementation and outperforms recently published ELM implementations in terms of hardware performance, power consumption, and resource utilization.
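For context on why ELM training is so fast: the hidden-layer weights are drawn at random and never updated, so training reduces to a single least-squares solve for the output weights. A minimal floating-point sketch of this idea is below (illustrative only; the paper's contribution is a fixed-point, task-parallel FPGA architecture, and the function names here are our own, not the authors'):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, n_hidden=32):
    """Single-hidden-layer ELM: X is (n_samples, n_features),
    T is (n_samples, n_outputs). Returns the fixed random layer
    (W, b) and the trained output weights beta."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.standard_normal(n_hidden)                # random biases, never trained
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # sigmoid hidden-layer activations
    beta = np.linalg.pinv(H) @ T                     # Moore-Penrose least-squares solve
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

The single `pinv` solve replaces the iterative gradient descent of conventional neural-network training, which is the property the paper's task-parallel training pipeline accelerates in hardware.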


    Cited By

    • (2022) An approximate randomization-based neural network with dedicated digital architecture for energy-constrained devices, Neural Computing and Applications, vol. 35, no. 9, pp. 6753-6766, 29 Nov 2022. DOI: 10.1007/s00521-022-08034-2

    Published In

    ASSE' 22: 2022 3rd Asia Service Sciences and Software Engineering Conference
    February 2022
    202 pages
    ISBN:9781450387453
    DOI:10.1145/3523181

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • the Science and Technology Development Fund, Macau SAR
    • the Macao Young Scholar Program
    • the Natural Science Basic Research Plan in Shaanxi Province of China
    • the National Natural Science Foundation of China
    • the Zhuhai Science and Technology Innovation Bureau Zhuhai-Hong Kong-Macau Special Cooperation Project

