Keywords: Source separation, speech processing, speech-music segregation, spectral subtraction, music filtering. Abstract: Background interference creates a voice intelligibility issue for the listener. This research work considers background music as interference for communication through a smartphone in areas with loud background music. This paper proposes a novel framework for segregating background music from human speech using music fingerprinting and acoustic echo cancellation. Initially, the background music is searched for in a database by music fingerprinting. The identified background music is then registered and segregated using acoustic echo cancellation. The proposed approach produces better-quality music-speech segregation than existing algorithms. The research work is novel and segregates background music completely, whereas existing approaches segregate single instruments successfully.
Falling is a severe hazard among older adults. Fall treatment is considered to be one of the most costly treatments and usually extends over a long period. One bad fall can cause severe injuries that may lead to permanent disability or even death. Therefore, an efficient and cost-effective fall monitoring system is indispensable. With the advancement in technology, wearable sensors and systems provide an attractive way to continuously monitor elderly people and detect any fall incident that may occur. Most of these wearable fall monitoring systems focus only on detecting a fall incident. However, to avoid the risk of a future fall, it is also essential to be aware of the cause of a fall incident. To address this challenge, a wearable sensor-based continuous fall monitoring system is proposed in this paper, which is capable of detecting a fall and identifying the falling pattern and the activity associated with the fall incident. The performance of the proposed scheme is investigated with a series of experiments using three machine learning algorithms, namely, $k$-nearest neighbors (KNN), support vector machine, and random forest (RF). The proposed methodology achieved its highest accuracy for fall detection, 99.80%, using the KNN classifier, whereas the highest accuracy achieved in recognizing different falling activities is 96.82%, using the RF classifier.
HEVC (High Efficiency Video Coding), the state-of-the-art video coding standard, has a 3D extension known as 3D-HEVC, which was established by JCT-3V. The current design of 3D-HEVC integrates various tools to exploit the redundancies of the 3D video signal. In 3D-HEVC, the neighboring block disparity vector (NBDV) mode is used to replace the original predicted depth map (PDM) for inter-view motion prediction. A newly estimated disparity vector, the depth oriented neighboring block disparity vector (DoNBDV), is used to enhance the accuracy of the NBDV by utilizing the coded depth map. In this paper, the NBDV and DoNBDV architectures are analyzed in terms of performance, complexity, and other design considerations. It is concluded that NBDV and DoNBDV for 3D-HEVC video signals provide attractive coding gains with complexity comparable to that of traditional motion/disparity compensation.
Network-on-chip (NoC) architectures have become a popular communication platform for heterogeneous computing systems owing to their scalability and high performance. Aggressive technology scaling makes these architectures prone to both permanent and transient faults. This study focuses on the tolerance of a NoC router to permanent faults. A permanent fault in a NoC router severely impacts the performance of the entire network. Thus, it is necessary to incorporate component-level protection techniques in a router. In the proposed scheme, the input port utilizes a bypass path, virtual channel (VC) queuing, and VC closing strategies. Moreover, the routing computation stage utilizes spatial redundancy and double routing strategies, and the VC allocation stage utilizes spatial redundancy. The switch allocation stage utilizes run-time arbiter selection. The crossbar stage utilizes a triple bypass bus. The proposed router is highly fault-tolerant compared with the existing state-of-the-art...
Mapping application task graphs on intellectual property (IP) cores into a network-on-chip (NoC) is a non-deterministic polynomial-time hard (NP-hard) problem. The evolution of network performance mainly depends on an effective and efficient mapping technique and on the optimization of performance and cost metrics. These metrics mainly include power, reliability, area, thermal distribution, and delay. A state-of-the-art mapping technique for NoC, named the sailfish optimization algorithm (SFOA), is introduced. The proposed algorithm minimizes the power dissipation of the NoC via an empirical base applying a shared k-nearest neighbor clustering approach, and it gives quicker mapping over the six standard benchmarks considered. The experimental results indicate that the proposed technique outperforms other existing nature-inspired metaheuristic approaches, especially on large application task graphs.
The human body consists of a complex 3D structure. Conversion of 3D structures into 2D leads to a loss of information and may result in incorrect disease diagnosis. This issue has attracted the attention of researchers involved in 3D modeling. MRI scans consist of a large number of 2D slices, which makes 3D reconstruction a complex and time-consuming task. We propose an efficient algorithm that uses a limited number of MRI slices to reconstruct a 3D image on the basis of matching criteria that aid in selecting the most appropriate slices, which significantly reduces computational complexity and increases accuracy. The methodology involves the acquisition of a brain MRI, pre-processing, Otsu's segmentation for the identification of suspicious areas, and rule-based classification to extract a tumor area. For appropriate slice selection, Rapid Mode image matching is utilized; 3D modeling is performed using a cubic reconstruction scheme; and finally the tumor volume is calculated. Perfo...
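As an illustration of the Otsu segmentation step named in the abstract above, the following is a minimal sketch and not the authors' implementation. It assumes 8-bit grayscale intensities supplied as a flat list; the function name `otsu_threshold` and the toy data are hypothetical. The threshold is chosen to maximize the between-class variance of the two resulting pixel groups.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with intensity <= t form one class, pixels > t the other.
    `values` is a flat list of integer intensities in [0, levels).
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_b = 0        # number of pixels in the lower class so far
    sum_b = 0      # intensity sum of the lower class so far
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_b += hist[t]
        sum_b += t * hist[t]
        if w_b == 0:
            continue          # lower class still empty
        w_f = total - w_b
        if w_f == 0:
            break             # upper class empty, nothing left to split
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        # Between-class variance, up to a constant factor 1 / total**2
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy example (made-up data): a dark region and a bright region
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
mask = [v > t for v in pixels]   # True marks the bright (candidate) region
```

In a pipeline like the one described, such a mask would only flag candidate regions; the rule-based classification mentioned in the abstract is a separate step that is not sketched here.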
Finding an accurate and computationally efficient vehicle detection and classification algorithm for urban environments is challenging due to large video datasets and the complexity of the task. Many algorithms have been proposed, but none is efficient, owing to various real-time issues. This paper proposes an algorithm that addresses shadow detection (shadows cause vehicle misdetection and misclassification) and incorporates solutions to other challenges such as camera vibration, blurred images, and illumination and weather changes. For accurate vehicle detection and classification, a combination of a self-adaptive GMM and a multi-dimensional Gaussian density transform is used to model the distribution of color image data. RGB and HSV color space based shadow detection is proposed. A measurement-based feature and an intensity-based pyramid histogram of orientation gradient are used for classification into four main vehicle categories. The proposed method achieved 96.39% ...
Speech emotion recognition (SER) plays a significant role in human–machine interaction. Emotion recognition from speech and its precise classification is a challenging task because a machine is unable to understand its context. For an accurate emotion classification, emotionally relevant features must be extracted from the speech data. Traditionally, handcrafted features were used for emotional classification from speech signals; however, they are not efficient enough to accurately depict the emotional states of the speaker. In this study, the benefits of a deep convolutional neural network (DCNN) for SER are explored. For this purpose, a pretrained network is used to extract features from state-of-the-art speech emotional datasets. Subsequently, a correlation-based feature selection technique is applied to the extracted features to select the most appropriate and discriminative features for SER. For the classification of emotions, we utilize support vector machines, random forests,...
Computer-Aided Language Learning (CALL) is growing nowadays because learning new languages is essential for communication with people of different linguistic backgrounds. Mispronunciation detection is an integral part of CALL and is used to automatically point out errors to the non-native speaker. In this paper, we investigate mispronunciation detection for Arabic words using a deep convolutional neural network (CNN). For automated pronunciation error detection, we propose a CNN features-based model and extract features from different layers of AlexNet (layers 6, 7, and 8) to train three machine learning classifiers: K-nearest neighbor (KNN), Support Vector Machine (SVM), and Random Forest (RF). We also use a transfer learning-based model in which feature extraction and classification are performed automatically. To evaluate the performance of the proposed method, a comprehensive evaluation is provided on these methods with a traditional machine learning-based method using Mel ...
The advent of new devices, technology, and machine learning techniques, together with the availability of free large speech corpora, has resulted in rapid and accurate speech recognition. In the last two decades, researchers and different organizations have initiated extensive research to experiment with new techniques and their applications in speech processing systems. There are several speech command based applications in the areas of robotics, IoT, ubiquitous computing, and different human-computer interfaces. Various researchers have worked on enhancing the efficiency of speech command based systems and have used the speech command dataset. However, none of them accounted for noise in that dataset. Noise is one of the major challenges in any speech recognition system, as real-time noise is a highly variable and unavoidable factor that affects the performance of speech recognition systems, particularly those that have not learned the noise efficiently. We thoroughly analyse the latest trends in speech rec...
Vehicle make and model recognition (VMMR) is a key task for automated vehicular surveillance (AVS) and various intelligent transport system (ITS) applications. In this paper, we propose and study the suitability of the bag of expressions (BoE) approach for VMMR-based applications. The method includes neighborhood information in addition to visual words. BoE improves on the existing capabilities of the bag of words (BoW) approach, including occlusion handling, scale invariance, and view independence. The proposed approach extracts features using a combination of different keypoint detectors and a Histogram of Oriented Gradients (HOG) descriptor. An optimized dictionary of expressions is formed using visual words acquired through k-means clustering. The histogram of expressions is created by counting the occurrences of each expression in the image. For classification, multiclass linear support vector machines (SVMs) are trained on the BoE-based feature representation. The approach has been eval...
Mehran University Research Journal of Engineering and Technology, 2019
Exponential growth in multimedia applications has led to the fast adoption of digital watermarking to protect the copyright information and authentication of digital content. A novel, robust, spatial-domain symmetric color image watermarking scheme based on chaos is presented in this research. The watermark is generated using a chaotic logistic map and optimized to improve its inherent properties and to achieve robustness. The embedding is performed in the 3 LSBs (Least Significant Bits) of all three color components of the host image. The sensitivity of the chaotic watermark, along with the redundant embedding approach, makes the entire watermarking scheme highly robust, secure, and imperceptible. In this paper, various image quality analysis metrics such as homogeneity, contrast, entropy, PSNR (Peak Signal to Noise Ratio), UIQI (Universal Image Quality Index), and SSIM (Structural Similarity Index Measure) are used to analyze the proposed scheme. The proposed technique shows superior r...
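The two ingredients named in the abstract above, a logistic-map watermark and embedding in the 3 LSBs of each color component, can be illustrated with a minimal sketch. This is not the authors' scheme: the optimization and redundancy steps are omitted, and the seed `x0`, the parameter `r`, the 3-bit quantization, and all function names are assumptions made for illustration only.

```python
def logistic_values(x0, r, n):
    """Generate n 3-bit values (0..7) from the logistic map x -> r*x*(1-x).

    x0 and r act as a shared secret key in this sketch (hypothetical choice).
    """
    x = x0
    out = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(int(x * 8) & 7)   # quantize the state in (0, 1) to 3 bits
    return out


def embed(pixels, x0=0.3141, r=3.99):
    """Overwrite the 3 LSBs of every R, G, B value with watermark bits."""
    flat = [c for px in pixels for c in px]
    wm = logistic_values(x0, r, len(flat))
    marked = [(c & ~7) | w for c, w in zip(flat, wm)]
    return [tuple(marked[i:i + 3]) for i in range(0, len(marked), 3)]


def extract(pixels):
    """Read back the 3 LSBs of every color component."""
    return [c & 7 for px in pixels for c in px]


# Toy host image: two RGB pixels (made-up values)
host = [(255, 128, 0), (10, 20, 30)]
marked = embed(host)
recovered = extract(marked)
```

Because only the low three bits change, each component differs from the host by at most 7 intensity levels, which is the basis of the imperceptibility claim; a receiver holding the same `x0` and `r` can regenerate the sequence and compare it with `recovered`.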
Identifying and restoring distresses in asphalt pavement is of key significance for the durability and long life of roads and highways. A vast number of accidents occur on roads and highways due to pavement distresses. This paper aims to detect and localize one of the critical roadway distresses, the pothole, using computer vision. We processed images of asphalt pavement containing pothole and non-pothole regions for experimentation. We propose a top-down scheme for the detection and localization of potholes in pavement images. First, we classify pothole/non-pothole images using a bag of words (BoW) approach. We compute scale-invariant feature transform (SIFT) features to establish the visual vocabulary of words that represents the pavement surface. A support vector machine (SVM) is trained and tested on the histograms of words of the pavement images. Second, we propose a graph cut segmentation scheme to localize the potholes in the labelled pothole images. This paper presents both subjective and objective evaluations of the pothole localization results against the ground truth. We evaluated the proposed scheme on a pavement surface dataset containing wide-ranging pavement images in different scenarios. Experimental results show that we achieved an accuracy of 95.7% for the identification of pothole images, with significant precision and recall. Subjective evaluation of pothole localization results in high recall with relatively good accuracy, whereas the objective assessment shows 91.4% accuracy for the localization of potholes.
Audio segmentation is a basis for multimedia content analysis, which is the most important and widely used application nowadays. An optimized audio classification and segmentation algorithm is presented in this paper that segments a superimposed audio stream on the basis of its content into four main audio types: pure speech, music, environment sound, and silence. The proposed algorithm preserves important audio content and reduces the misclassification rate without using a large amount of training data; it handles noise and is suitable for real-time applications. Noise in an audio stream is segmented out as environment sound. A hybrid classification approach is used: bagged support vector machines (SVMs) with artificial neural networks (ANNs). The audio stream is first classified into speech and nonspeech segments using bagged SVMs; nonspeech segments are further classified into music and environment sound using ANNs and la...
Papers by Fawad Hussain