A Task-Parallel and Reconfigurable FPGA-Based Hardware Implementation of Extreme Learning Machine

Published: 18 April 2022 Publication History
  Abstract

    Extreme learning machine (ELM) is an emerging machine learning algorithm that is widely used in real-world applications owing to its extremely fast training speed, good generalization, and universal approximation capability. To further enable the use of ELM in practical embedded systems, this paper presents a task-parallel and reconfigurable FPGA-based hardware architecture for the ELM algorithm. The proposed architecture performs on-chip learning for both the training and prediction phases, which are implemented parameterizably through reconfigurable parameters. Task-parallel effort is focused on the training phase, where serial computations are decomposed into subtasks that execute in parallel to improve computational efficiency. In addition, an on-chip block-RAM reuse scheme reduces on-chip resource consumption. Experimental results show that the proposed architecture achieves accuracy comparable to a floating-point MATLAB implementation and outperforms recently published ELM implementations in terms of hardware performance, power consumption, and resource utilization.
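For context on why ELM training is so fast: the hidden-layer weights are drawn at random and never updated, so training reduces to a single least-squares solve for the output weights. A minimal floating-point sketch of this idea is below (illustrative only; the paper's contribution is a fixed-point, task-parallel FPGA architecture, and the function names here are our own, not the authors'):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, n_hidden=32):
    """Single-hidden-layer ELM: X is (n_samples, n_features),
    T is (n_samples, n_outputs). Returns the fixed random layer
    (W, b) and the trained output weights beta."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.standard_normal(n_hidden)                # random biases, never trained
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # sigmoid hidden-layer activations
    beta = np.linalg.pinv(H) @ T                     # Moore-Penrose least-squares solve
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

The single `pinv` solve replaces the iterative gradient descent of conventional neural-network training, which is the property the paper's task-parallel training pipeline accelerates in hardware.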


    Cited By

    • (2022) An approximate randomization-based neural network with dedicated digital architecture for energy-constrained devices, Neural Computing and Applications, vol. 35, no. 9, pp. 6753-6766, 29 Nov 2022. DOI: 10.1007/s00521-022-08034-2

    Published In

    ASSE' 22: 2022 3rd Asia Service Sciences and Software Engineering Conference
    February 2022
    202 pages
    ISBN:9781450387453
    DOI:10.1145/3523181

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • the Science and Technology Development Fund, Macau SAR
    • the Macao Young Scholar Program
    • the Natural Science Basic Research Plan in Shaanxi Province of China
    • the National Natural Science Foundation of China
    • the Zhuhai Science and Technology Innovation Bureau Zhuhai-Hong Kong-Macau Special Cooperation Project

