Sensory information processing is an important feature of robotic agents that must interact with humans or the environment. For example, numerous attempts have been made to develop robots capable of interactive communication. In most cases, each sensory stream is processed individually and an output action is based on the result. In many robotic applications, visual and audio sensors are used to emulate human-like communication. The Superior Colliculus, located in the midbrain region of the nervous system, carries out a similar integration of audio and visual stimuli in both humans and animals. In recent years numerous researchers have attempted to integrate sensory information using biological inspiration. A common focus lies in generating a single output state (i.e. a multimodal output) that can localize the source of the audio and visual stimuli. This research addresses the problem and attempts to find an effective solution by investi...
We report the effectiveness of LU factorization (also called LU decomposition) as a feature extraction technique, combined with a naïve Bayes classifier, for recognizing handwritten Odia numerals (Odia is a regional language of eastern India, written in a script that, like Devanagari, descends from the Brahmi script). Experimental results show that LU factorization could be an alternative choice for feature extraction in pattern classification problems.
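The LU-based feature extraction can be sketched as follows. This is a minimal reading of the abstract, assuming the absolute diagonal of U from a pivoted factorization of the numeral image serves as the feature vector; the authors' exact recipe is not specified.

```python
import numpy as np

def lu_features(img):
    """Feature vector from a square numeral image via in-place LU
    factorization with partial pivoting (Doolittle scheme).
    Using |diag(U)| as the feature is an illustrative assumption,
    not the paper's documented recipe."""
    A = img.astype(float).copy()
    n = A.shape[0]
    for k in range(n - 1):
        # partial pivoting: swap in the largest remaining pivot
        p = k + int(np.argmax(np.abs(A[k:, k])))
        A[[k, p]] = A[[p, k]]
        if abs(A[k, k]) < 1e-12:
            continue  # column already zero below the pivot; skip elimination
        A[k + 1:, k] /= A[k, k]
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])
    return np.abs(np.diag(A))  # |diag(U)| as a fixed-length feature vector
```

The resulting fixed-length vectors could then be fed to a Gaussian naïve Bayes classifier (e.g. scikit-learn's `GaussianNB`), matching the classifier named in the abstract.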
This paper presents a study on the performance of transform-domain features in Devnagari digit recognition. Recognition performance is measured for features obtained from direct pixel values, the Fourier Transform, the Discrete Cosine Transform, the Gaussian Pyramid, the Laplacian Pyramid, the Wavelet Transform and the Curvelet Transform, using the following classification schemes: Feed-Forward, Function-Fitting, Pattern-Recognition and Cascade neural networks, and K-Nearest Neighbor (KNN). The Gaussian Pyramid feature with a KNN classifier yielded the best accuracy of 96.93% on the test set. The recognition accuracy was increased to 98.02% by using a majority voting classification scheme, at the expense of a 0.26% rejection rate. The majority voting ensemble combines KNN classifiers built on four features: Gaussian pyramid, Laplacian pyramid, wavelet pyramid and direct pixel values.
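The majority-voting-with-rejection scheme described above can be sketched like this; the 3-of-4 agreement threshold is an assumed value, chosen only to illustrate how rejection trades coverage for accuracy:

```python
from collections import Counter

def majority_vote(predictions, min_agreement=3):
    """Combine the class labels predicted by several base classifiers
    (here: KNNs on Gaussian pyramid, Laplacian pyramid, wavelet and
    direct-pixel features). If the winning label has fewer than
    `min_agreement` votes the sample is rejected (None), trading a
    small rejection rate for higher accuracy on accepted samples.
    The threshold is an assumption, not the paper's stated rule."""
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes >= min_agreement else None
```

For example, a sample predicted as `[3, 3, 3, 7]` by the four base KNNs is accepted as class 3, while `[1, 2, 3, 4]` is rejected.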
The paper presents handwritten Devnagari digit recognition results for benchmark studies. To obtain these results, we conducted several experiments on the CPAR-2012 dataset. In these experiments, we used features ranging from the simplest (direct pixel values), through slightly more computationally expensive profile-based features, to more complex gradient features extracted using Kirsch masks and wavelet transforms. Using these features we measured the recognition accuracies of several classification schemes. Among them, the combination of gradient and direct pixel features with a KNN classifier yielded the highest recognition accuracy of 95.2%. The result was improved to 97.87% by using a multi-stage classifier ensemble scheme. The paper also reports on the development of the CPAR-2012 dataset, which is being developed for Devnagari optical document recognition research. Presently, it contains 35,000 (15,000 constrained, 5,000 semi-constrained and 15,000 unconstrained) handwritten numerals, 8...
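The Kirsch gradient feature mentioned above can be sketched as follows. This is a generic Kirsch compass-mask implementation (eight rotations of the base kernel, maximum response per pixel); the paper's exact post-processing of the responses is not specified.

```python
import numpy as np

def kirsch_masks():
    """The eight 3x3 Kirsch compass masks, generated by rotating the
    outer ring of the base (north-facing) kernel."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = [5, 5, 5, -3, -3, -3, -3, -3]  # north mask, clockwise from top-left
    masks = []
    for s in range(8):
        k = np.zeros((3, 3), int)
        for i, (r, c) in enumerate(ring):
            k[r, c] = vals[(i - s) % 8]
        masks.append(k)
    return masks

def kirsch_feature(img):
    """Per-pixel maximum response over the eight directional masks,
    flattened into a feature vector (valid interior region only)."""
    img = img.astype(float)
    H, W = img.shape
    best = np.full((H - 2, W - 2), -np.inf)
    for mask in kirsch_masks():
        resp = np.zeros((H - 2, W - 2))
        for dr in range(3):
            for dc in range(3):
                # shifted-slice correlation: mask entry times image window
                resp += mask[dr, dc] * img[dr:dr + H - 2, dc:dc + W - 2]
        best = np.maximum(best, resp)
    return best.ravel()
```

On a uniform image every mask response is zero (each mask's weights sum to zero), so the feature reacts only to edges, which is what makes it a gradient feature.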
This paper presents offline handwritten character recognition for Devnagari, a major script of India. The main objective of this work is to develop a handwritten dataset (CPAR-2012) for Devnagari characters and to develop a character recognition scheme for benchmark study. The present dataset is a new development in Devnagari optical document recognition. It includes 78,400 samples collected from a heterogeneous stratum of 2,000 Hindi-speaking persons, divided into a training set of 49,000 samples and a test set of 29,400 samples. The evaluated feature extraction methods include direct pixel values, image zoning, wavelet transformation and Gaussian image transformation. These features were classified using KNN and neural network classifiers. The experiments show that Gaussian image transformation (level 1) with a KNN classifier achieved the highest recognition accuracy, 72.18%, among the feature extraction methods. Further classification result obtained from KNN classifier ...
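"Gaussian image transformation (level 1)" plausibly refers to the first level of a Gaussian pyramid: blur, then subsample by two. A minimal sketch under that assumption, using a separable 5-tap binomial kernel:

```python
import numpy as np

GAUSS_1D = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # 5-tap binomial kernel

def gaussian_level1(img):
    """Level-1 Gaussian pyramid feature: separable binomial blur
    followed by subsampling by 2. An assumed reading of the abstract's
    'Gaussian image transformation (level 1)', not a documented recipe."""
    img = img.astype(float)
    H, W = img.shape
    pad = np.pad(img, 2, mode="reflect")  # define the blur at the borders
    # separable convolution: blur along columns, then along rows
    rows = sum(GAUSS_1D[k] * pad[:, k:k + W] for k in range(5))
    blur = sum(GAUSS_1D[k] * rows[k:k + H, :] for k in range(5))
    return blur[::2, ::2].ravel()  # keep every second pixel, flatten
```

The downsampled vectors are a quarter of the original size, which also speeds up the KNN distance computations used for classification.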
Sign language recognition is an important topic of current research. The systems developed worldwide so far in this area are still under review for standardization, and automatic gesture recognition for Indian Sign Language is at an early stage. The focus of this research is therefore to develop an automatic gesture recognition system for Indian Sign Language. In this paper, the focus is on dataset development, the application of various feature extraction methods in combination with skin colour detection, and the analysis of the results of k-Nearest Neighbor and Neural Network classifiers on specific words belonging to the computer terminology of Indian Sign Language. The skin colour detection method extracts the hand and face portions of the images from input video frames. This discards other components of the image frames, such as other body parts and the image background, which are not desired features for the classification. The best results obtained from k-Neare...
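The abstract does not specify which skin-colour rule is used; a common stand-in is the Peer et al. RGB threshold rule, sketched here as an illustrative substitute:

```python
import numpy as np

def skin_mask(rgb):
    """Boolean per-pixel skin mask over an (H, W, 3) uint8 RGB frame,
    using the classic Peer et al. RGB thresholds -- an illustrative
    substitute for the paper's unspecified skin-colour detector."""
    # cast to int so the differences below cannot wrap around in uint8
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    return ((r > 95) & (g > 40) & (b > 20)
            & (np.maximum(np.maximum(r, g), b)
               - np.minimum(np.minimum(r, g), b) > 15)
            & (np.abs(r - g) > 15) & (r > g) & (r > b))
```

Multiplying a frame by this mask keeps the hand and face regions and zeroes out the background, which is exactly the pre-filtering role the abstract describes before feature extraction.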
Information processing and responding to sensory input with appropriate actions are among the most important capabilities of the brain, and the brain has specific areas that deal with auditory or visual processing. Auditory information is sent first to the cochlea, then to the inferior colliculus, and later to the auditory cortex, where it is processed further so that the eyes, the head, or both can be turned towards an object or location in response. Visual information is processed in the retina, various subsequent nuclei and then the visual cortex before actions are performed. However, how is this information integrated, and what is the effect of auditory and visual stimuli arriving at the same time or at different times? Which information is processed when, and what are the responses to multimodal stimuli? Multimodal integration is first performed in the Superior Colliculus, located in a subcortical part of the midbrain. In this chapter we focus on this first level of multimodal integration, outline various approaches to modelling the Superior Colliculus, and suggest a model of multimodal integration of visual and auditory information.
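One standard quantitative account of such audio-visual integration is maximum-likelihood (inverse-variance-weighted) cue combination. The toy sketch below illustrates the kind of model the chapter discusses; it is not the chapter's specific Superior Colliculus model.

```python
def fuse_estimates(mu_a, var_a, mu_v, var_v):
    """Fuse an auditory and a visual estimate of source azimuth by
    inverse-variance weighting: the maximum-likelihood combination of
    two independent Gaussian cues. Illustrative textbook model only."""
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_v)  # reliability weight
    mu = w_a * mu_a + (1.0 - w_a) * mu_v               # fused location estimate
    var = 1.0 / (1.0 / var_a + 1.0 / var_v)            # fused uncertainty
    return mu, var
```

Note that the fused variance is always smaller than either cue's variance alone, mirroring the multisensory enhancement observed in the Superior Colliculus when auditory and visual stimuli coincide.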
Papers by Kiran Ravulakollu