Shan Sung Liew was born in Kuching, Sarawak, Malaysia, in 1989. He received his B.Eng. (Hons.) degree in computer engineering from Universiti Teknologi Malaysia in 2012, where he is currently pursuing a Ph.D. degree in electrical engineering. His Ph.D. research focuses on artificial neural networks, deep learning, distributed machine learning, and computer vision.
Supervisors: Prof. Dr. Mohamed Khalil bin Mohd. Hani
This paper focuses on enhancing the generalization ability and training stability of deep neural networks (DNNs). New activation functions that we call bounded rectified linear unit (ReLU), bounded leaky ReLU, and bounded bi-firing are proposed. These activation functions are defined based on the desired properties of the universal approximation theorem (UAT). Additional work providing a new set of coefficient values for the scaled hyperbolic tangent function is also presented. Together, these contributions improve classification performance and training stability in DNNs. Experiments using multilayer perceptron (MLP) and convolutional neural network (CNN) models show that the proposed activation functions outperform their respective original forms with regard to classification accuracy and numerical stability. Tests on the MNIST, mnist-rot-bg-img handwritten digit, and AR Purdue face databases show that significant improvements of 17.31%, 9.19%, and 74.99% can be achieved in terms of the testing misclassification error rates (MCRs), applying both mean squared error (MSE) and cross-entropy (CE) loss functions. This is done without sacrificing computational efficiency. With the MNIST dataset, bounding the output of an activation function results in a 78.58% reduction in numerical instability, and with the mnist-rot-bg-img and AR Purdue databases the problem is completely eliminated. Thus, this work demonstrates the significance of bounding an activation function in helping to alleviate the training instability problem when training a DNN model (particularly a CNN).
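To make the idea of bounding an activation function concrete, here is a minimal sketch in plain Python. It assumes the simplest possible form, clipping the output at an upper bound; the bound value `upper=1.0`, the leaky slope `slope=0.01`, and the function names are illustrative assumptions and are not taken from the paper, whose exact definitions may differ.

```python
def relu(x):
    """Standard ReLU: unbounded above."""
    return max(0.0, x)


def bounded_relu(x, upper=1.0):
    """ReLU clipped at an assumed upper bound, so the output stays in [0, upper]."""
    return min(max(0.0, x), upper)


def bounded_leaky_relu(x, upper=1.0, slope=0.01):
    """Leaky ReLU clipped at an assumed upper bound.

    Negative inputs are scaled by `slope` (a hypothetical default);
    positive inputs pass through until they reach `upper`.
    """
    leaky = x if x > 0 else slope * x
    return min(leaky, upper)


if __name__ == "__main__":
    for value in (-2.0, 0.5, 5.0):
        print(value, relu(value), bounded_relu(value), bounded_leaky_relu(value))
```

With an unbounded ReLU a large pre-activation such as 5.0 passes through unchanged, whereas the clipped variants cap it at the bound. This is the kind of output limiting the abstract credits with reducing numerical instability.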
This paper proposes an efficient asynchronous stochastic second-order learning algorithm for distributed learning of neural networks (NNs). The proposed algorithm, named distributed bounded stochastic diagonal Levenberg-Marquardt (distributed B-SDLM), is based on the B-SDLM algorithm, which converges fast and requires only minimal computational overhead compared with the stochastic gradient descent (SGD) method. The proposed algorithm is implemented based on the parameter server thread model in the MPICH implementation. Experiments on the MNIST dataset show that training using the distributed B-SDLM on a 16-core CPU cluster allows the convolutional neural network (CNN) model to reach the convergence state very fast, with speedups of 6.03× and 12.28× to reach 0.01 training and 0.08 testing loss values, respectively. This also results in significantly less time taken to reach a certain classification accuracy (5.67× and 8.72× faster to reach 99% training and 98% testing accuracies on the MNIST dataset, respectively).
An approach using a convolutional neural network (CNN) is proposed for real-time gender classification based on facial images. The proposed CNN architecture exhibits a much reduced design complexity when compared with other CNN solutions applied in pattern recognition. The number of processing layers in the CNN is reduced to only four by fusing the convolutional and subsampling layers. Unlike in conventional CNNs, we replace the convolution operation with cross-correlation, hence reducing the computational load. The network is trained using a second-order backpropagation learning algorithm with annealed global learning rates. Performance evaluation of the proposed CNN solution is conducted on two publicly available face databases, SUMS and AT&T. We achieve classification accuracies of 98.75% and 99.38% on the SUMS and AT&T databases, respectively. The neural network is able to process and classify a 32 × 32 pixel face image in less than 0.27 ms, which corresponds to a very high throughput of over 3700 images per second. Training converges within fewer than 20 epochs. These results correspond to a superior classification performance, verifying that the proposed CNN is an effective real-time solution for gender recognition.
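The difference between convolution and cross-correlation mentioned above can be sketched in one dimension with plain Python. True convolution flips the kernel before sliding it over the input, while cross-correlation slides it as-is, so the flip can be skipped when the kernel weights are learned anyway. The function names and the 1-D "valid" setting are illustrative assumptions, not the paper's implementation.

```python
def cross_correlate_1d(signal, kernel):
    """Slide the kernel over the signal without flipping it ("valid" mode)."""
    steps = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(steps)
    ]


def convolve_1d(signal, kernel):
    """True convolution: the same sliding sum, but with the kernel reversed."""
    return cross_correlate_1d(signal, kernel[::-1])


if __name__ == "__main__":
    signal = [1, 2, 3, 4]
    kernel = [1, 0, -1]
    print(cross_correlate_1d(signal, kernel))  # kernel used as-is
    print(convolve_1d(signal, kernel))         # kernel flipped first
```

For a symmetric kernel the two operations give identical results; for a learned kernel the network can simply learn the flipped weights, which is why dropping the flip does not limit what the layer can represent.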
This paper proposes an improved stochastic second-order learning algorithm for supervised neural network training. The proposed algorithm, named bounded stochastic diagonal Levenberg–Marquardt (B-SDLM), utilizes both gradient and curvature information to achieve fast convergence while requiring only minimal computational overhead compared with the stochastic gradient descent (SGD) method. B-SDLM has only a single hyperparameter, as opposed to most other learning algorithms, which suffer from the hyperparameter overfitting problem because they have more hyperparameters to tune. Experiments using multilayer perceptron (MLP) and convolutional neural network (CNN) models show that B-SDLM outperforms other learning algorithms with regard to classification accuracy and computational efficiency (about 5.3% faster than SGD on the mnist-rot-bg-img database). It classifies all testing samples correctly in the face recognition case study based on the AR Purdue database. In addition, experiments on handwritten digit classification case studies show that significant improvements of 19.6% on the MNIST database and 17.5% on the mnist-rot-bg-img database can be achieved in terms of the testing misclassification error rates (MCRs). The computationally expensive Hessian calculations are kept to a minimum by using just 0.05% of the training samples in the estimation, or by updating the learning rates once every two training epochs, while maintaining or even achieving lower testing MCRs. It is also shown that B-SDLM works well in the mini-batch learning mode, and we are able to achieve a 3.32× performance speedup when deploying the proposed algorithm in a distributed learning environment with a quad-core processor.
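As a rough illustration of how a diagonal Levenberg-Marquardt method combines gradient and curvature information, the sketch below follows the classic stochastic diagonal Levenberg-Marquardt rule, in which each parameter gets its own learning rate equal to a global rate divided by a damped estimate of the diagonal Hessian. It is not the B-SDLM algorithm of the paper: the `damping` term, the optional `max_rate` clamp, and all function names are assumptions made purely for illustration.

```python
def sdlm_learning_rates(diag_hessian, global_rate=0.01, damping=0.1, max_rate=None):
    """Per-parameter rates: global_rate / (|h_kk| + damping).

    `damping` keeps the rate finite where curvature is near zero;
    `max_rate` is a hypothetical clamp standing in for the idea of bounding.
    """
    rates = []
    for h in diag_hessian:
        rate = global_rate / (abs(h) + damping)
        if max_rate is not None:
            rate = min(rate, max_rate)
        rates.append(rate)
    return rates


def sdlm_step(weights, grads, diag_hessian, **rate_options):
    """One update: each weight moves against its gradient at its own rate."""
    rates = sdlm_learning_rates(diag_hessian, **rate_options)
    return [w - r * g for w, r, g in zip(weights, rates, grads)]


if __name__ == "__main__":
    curvature = [0.9, 0.0]  # high curvature vs. a flat direction
    print(sdlm_learning_rates(curvature, global_rate=0.1, damping=0.1))
    print(sdlm_learning_rates(curvature, global_rate=0.1, damping=0.1, max_rate=0.5))
```

Parameters in flat directions (small curvature) receive larger steps and parameters in steep directions receive smaller ones, which is the general mechanism behind the fast convergence described in the abstract.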
The performance of a neural network depends critically on its model structure and the corresponding learning algorithm. This paper proposes bounded stochastic diagonal Levenberg-Marquardt (B-SDLM), an improved second-order stochastic learning algorithm for supervised neural network training. The algorithm has only a single hyperparameter and requires negligible additional computation compared with the conventional stochastic gradient descent (SGD) method, while ensuring better learning stability. The experiments show that the proposed algorithm achieves very fast convergence and better generalization ability, outperforming several other learning algorithms.
In this paper, we propose an effective convolutional neural network (CNN) model for the problem of face recognition. The proposed CNN architecture applies fused convolution/subsampling layers that result in a simpler model with fewer network parameters; that is, a smaller number of neurons, trainable parameters, and connections. In addition, it does not require any of the complex or costly image preprocessing steps that are typical in existing face recognizer systems. In this work, we enhance the stochastic diagonal Levenberg–Marquardt algorithm, a second-order back-propagation algorithm, to obtain faster network convergence and better generalization ability. Experimental work completed on the ORL database shows that a recognition accuracy of 100% is achieved, with the network converging within 15 epochs. The average processing time of the proposed CNN face recognition solution, executed on a 2.5 GHz Intel i5 quad-core processor, is 3 s per epoch, with a recognition speed of less than 0.003 s. These results show that the proposed CNN model is a computationally efficient architecture that exhibits faster processing and learning times, and also produces higher recognition accuracy, outperforming other existing work on face recognizers based on neural networks.
We propose an asynchronous version of a stochastic second-order optimization algorithm for parallel distributed learning. Our proposed algorithm, namely Asynchronous Stochastic Diagonal Levenberg-Marquardt (A-SDLM), contains only a single hyper-parameter (i.e., the learning rate) while still retaining its second-order properties. We also present a machine learning framework for neural network learning to show the effectiveness of the proposed algorithm. The framework includes additional learning procedures that can also contribute to better learning performance. Our framework is derived from the peer worker thread model and is designed based on a data parallelism approach. The framework has been implemented using multi-threaded programming. Our experiments show the potential of applying a second-order learning algorithm to distributed learning to achieve better training speedup and higher accuracy compared with traditional SGD.
In this paper, we present a convolutional neural network (CNN) approach for the face verification task. We propose a "Siamese" architecture of two CNNs, with each CNN reduced to only four layers by fusing convolutional and subsampling layers. Network training is performed using the stochastic gradient descent algorithm with an annealed global learning rate. The generalization ability of the network is investigated via unique pairing of face images, and testing is done on the AT&T face database. Experimental work shows that the proposed CNN system can classify a pair of 46×46 pixel face images in 0.6 milliseconds, which is significantly faster than an equivalent network architecture with a cascade of convolutional and subsampling layers. The verification accuracy achieved is 3.33% EER (equal error rate). Learning converges within 20 epochs, and the proposed technique can verify a test subject unseen in training. This work shows the viability of the "Siamese" CNN for face verification applications, and further improvements to the architecture are in progress to enhance its performance.
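Since the verification result above is reported as an equal error rate (EER), a short sketch of how an EER can be estimated may help. It assumes the verifier outputs a distance per image pair and accepts the pair when the distance is at or below a threshold; the EER is then read off at the threshold where the false accept rate and false reject rate are closest. The distance values, function names, and the simple threshold sweep are illustrative assumptions, not the paper's evaluation code.

```python
def far_frr(genuine, impostor, threshold):
    """False accept and false reject rates when pairs with distance <= threshold are accepted."""
    false_rejects = sum(1 for d in genuine if d > threshold)
    false_accepts = sum(1 for d in impostor if d <= threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


def equal_error_rate(genuine, impostor):
    """Sweep every observed distance as a threshold; return (eer, threshold).

    The EER is approximated as the mean of FAR and FRR at the threshold
    where the two rates are closest.
    """
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


if __name__ == "__main__":
    # Made-up distances: same-person pairs should score lower than different-person pairs.
    genuine_distances = [0.1, 0.2, 0.3, 0.6]
    impostor_distances = [0.4, 0.7, 0.8, 0.9]
    print(equal_error_rate(genuine_distances, impostor_distances))
```

A lower EER means the accept/reject threshold can be set so that both kinds of error are small at once.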
Face recognition remains a challenging problem to this day. The main challenge is how to improve recognition performance in the presence of non-linear variations such as illumination changes, poses, facial expressions, and occlusions. In this paper, a robust 4-layer Convolutional Neural Network (CNN) architecture is proposed for the face recognition problem, with a solution that is capable of handling facial images that contain occlusions, poses, facial expressions, and varying illumination. Experimental results show that the proposed CNN solution outperforms existing works, achieving 99.5% recognition accuracy on the AR database. The test on 35 subjects of the FERET database achieves an accuracy of 85.13%, which is in a similar range of performance to the best result of previous works. More significantly, our proposed system completes the facial recognition process in less than 0.01 seconds.