Abstract
The parallel computation capabilities of modern graphics processing units (GPUs) have attracted increasing attention from researchers and engineers whose studies demand high computational throughput. However, current single-GPU engineering solutions often struggle to meet real-time requirements. The multi-GPU approach has therefore become a popular and cost-effective way to meet these demands. In such systems, balancing the computational load across multiple GPU “nodes” is often the key bottleneck affecting the quality and performance of the real-time system. Existing load balancing approaches mainly assume that all GPU nodes in the same computer framework have equal computational performance, which is often not the case owing to cluster design and other legacy issues. This paper presents a novel dynamic load balancing (DLB) model for rapid data division and allocation on heterogeneous GPU nodes based on an innovative fuzzy neural network (FNN). A 5-state parameter feedback mechanism that defines the overall cluster and node performance is proposed. The corresponding FNN-based DLB model is capable of monitoring and predicting individual node performance under different workload scenarios. A real-time adaptive scheduler has been devised to reorganize the data inputs to each node when necessary to maintain runtime computational performance. The devised model has been implemented on two-dimensional (2D) discrete wavelet transform (DWT) applications for evaluation. Experimental results show that this DLB model enables high computational throughput while meeting the real-time and precision requirements of complex computational tasks.
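To make the idea of data division across unequal GPU nodes concrete, the following is a minimal sketch, not the FNN-based model proposed in the paper. It assumes a plain throughput-proportional heuristic: the rows of a 2D input are split in proportion to each node's estimated throughput, and the estimates are refreshed from measured run times with exponential smoothing. The function names `split_rows` and `update_throughput` and the smoothing factor `alpha` are hypothetical and chosen only for illustration.

```python
def split_rows(total_rows, throughputs):
    """Divide total_rows among nodes in proportion to throughput estimates.

    Illustrative heuristic only; the paper predicts node performance
    with a fuzzy neural network instead of using raw throughput.
    """
    if total_rows < 0 or not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("need non-negative rows and positive throughputs")
    total = sum(throughputs)
    # Ideal fractional share of rows for each node.
    ideal = [total_rows * t / total for t in throughputs]
    # Start from the floor of each share.
    shares = [int(x) for x in ideal]
    # Hand out the leftover rows to the largest fractional remainders.
    leftover = total_rows - sum(shares)
    order = sorted(range(len(ideal)),
                   key=lambda i: ideal[i] - shares[i],
                   reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def update_throughput(previous, rows_done, elapsed_s, alpha=0.5):
    """Blend the newly measured rows-per-second into the old estimate."""
    measured = rows_done / elapsed_s
    return alpha * measured + (1 - alpha) * previous
```

For example, with three nodes whose estimated throughputs are in the ratio 1:1:2, `split_rows(1000, [1.0, 1.0, 2.0])` assigns half of the rows to the fastest node. A scheduler in this spirit would call `update_throughput` after each batch and re-split only when the estimates drift.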
Additional information
This work was supported by the National Natural Science Foundation of China (No. 61203172), the SSTP of Sichuan (Nos. 2018YYJC0994 and 2017JY0011) and the Shenzhen STPP (No. GJHZ20160301164521358).
Recommended by Associate Editor Hong-Ji Yang
Chao-Long Zhang received the B.Eng. and M.Sc. degrees in software engineering from Chengdu University of Information Technology, China in 2014 and 2017, respectively. He is currently a Ph.D. degree candidate with the School of Computing and Engineering, University of Huddersfield, UK.
His research interests include high-performance computing (HPC), computer vision, and deep learning network applications.
Yuan-Ping Xu received the B.Eng. degree in computer science and technology from Southwest Jiaotong University, China in 2003, and the M.Sc. and Ph.D. degrees in software engineering from University of Huddersfield, UK in 2004 and 2009, respectively. From February 2009 to November 2010, he worked as a research fellow in the Centre of Precision Technologies, University of Huddersfield, UK. He is currently a professor with the School of Software Engineering, Chengdu University of Information Technology, China.
His research interests include knowledge-based systems, expert systems, big data analysis and deep learning network applications.
Zhi-Jie Xu received the B.Eng. degree in communication engineering from the Xi’an University of Science and Technology, China in 1991. After graduation, he worked as an electronics engineer before moving to the United Kingdom, where he worked as a research scientist in the Robotics Lab at the University of Derby. He received the Ph.D. degree from the University of Derby in 2000 for his research on virtual reality-based manufacturing simulation and robotics systems. He has been a full-time academic member of staff at the University of Huddersfield, UK since April 1999, serving successively as lecturer, senior lecturer, reader and professor. He has published over one hundred peer-reviewed journal and conference papers and edited 5 books in the relevant fields. He has supervised 8 Ph.D. students to completion while securing substantial research and industrial grants. He is a member of the IEEE, IET and BCS, a fellow of the HEA, and an editor for multiple prestigious academic journals and conferences. He is the current President of the Chinese Automation and Computing Society in the United Kingdom.
His research interests include visualization, HCI, vision systems, and machine learning.
Jia He received the B.Eng. and M.Sc. degrees in computer science and technology from Southwest Normal University, China in 1989 and 1996, respectively, and the Ph.D. degree in computer science from University of Electronic Science and Technology of China, China in 2012. She is currently a professor and the Dean of the School of Computer Science, Chengdu University of Information Technology, China.
Her research interests include computer vision, artificial intelligence, and pattern recognition.
Jing Wang received the Ph.D. degree from University of Huddersfield, UK in 2012. Before 2017, he worked as a research fellow at University of Huddersfield, UK, carrying out independent research on image processing, analysis and understanding. He is now a lecturer in software engineering and computer science at Sheffield Hallam University.
His research interest is real-world applications of computer vision systems.
Jian-Hua Adu received the B.Sc. degree in applied physics from Minzu University of China, China in 1999, the M.Sc. degree in computer science from Shandong University, China in 2006, and the Ph.D. degree in computer science from Sichuan University, China in 2012. He is currently an associate professor with the School of Software Engineering, Chengdu University of Information Technology, China.
His research interests include image fusion and segmentation, medical image processing and analysis, and pattern recognition.
Cite this article
Zhang, CL., Xu, YP., Xu, ZJ. et al. A Fuzzy Neural Network Based Dynamic Data Allocation Model on Heterogeneous Multi-GPUs for Large-scale Computations. Int. J. Autom. Comput. 15, 181–193 (2018). https://doi.org/10.1007/s11633-018-1120-4
DOI: https://doi.org/10.1007/s11633-018-1120-4