Sunil Kopparapu

    • Sunil Kumar Kopparapu (Senior Member, IEEE) obtained his doctoral degree in Electrical Engineering from the Indian In...
    Multiresolution analysis is used extensively in the signal processing literature. In this paper, we show the behaviour of a general degradation model (Y = B ⊗ X + W) over different resolutions and derive an expression for the degradation model at all coarse resolutions given the degradation model at the finest resolution. Knowledge of the behaviour of the degradation model over resolutions is useful in many computer vision applications, and to this effect we sketch an algorithm for signal restoration to demonstrate the usefulness of the derived result. We also experimentally validate the derived degradation model at different resolutions.
    Segmentation is an important topic in computer vision and image processing. In this paper we develop a multiresolution scheme for segmentation. The multiresolution segmentation algorithm first segments the image at a coarse resolution using a known segmentation algorithm and uses this information to segment images at finer resolutions. We sketch the scheme, demonstrate its validity on some real images, and compare its performance with the segmentation obtained by working at a single resolution.
    We present a visual aid for the hearing impaired to enable access to internet videos. The visual tool takes the form of time-synchronized lip movement corresponding to the speech in the video, embedded in the original internet video. Conventionally, the hearing impaired access the audio or speech in a video by means of either text subtitles or sign language gestures by an interpreter. The proposed tool would be beneficial especially in situations where such aids are not readily available or are difficult to generate. We have conducted a number of experiments to determine the feasibility and usefulness of the proposed visual aid.
    Mashup services are common in the realm of web development. A mashup, generally, is a web page that draws on two or more data sources to create a new and improved service. While mashups are common in web development, there is no such mashup for services based on a voice user interface in Interactive Voice Response (IVR) systems. IVR based telephony services have been popular because they are an easy and convenient means to inquire, seek information or book services. At the same time, a frequently used service, such as a taxi service, usually has multiple service operators. Though the Same Service is provided by Multiple Operators (SSMO) to the end user, their access points are different and require the user to choose a particular service operator to call. Unlike a web portal or a web mashup, there is no way for the user to compare the offerings from SSMO simultaneously and choose the best among them. This paper describes a novel approach, based on a recent patent, to create a mashup service by integrating two or more IVR services.
    In this paper, we formulate and develop an approach which integrates the different modules (feature extractor, matching and interpolation) involved in stereo. We study the integration process at the finest resolution when (i) a precomputed edge map is the only line field driving the model, (ii) the line fields are computed interactively by the feature extracting module of the model and (iii) when both the interactive ...
    The use of mixed language in day-to-day speech is becoming common and is being accepted as syntactically correct. However, recognition of mixed language speech is a challenge for a speech recognition engine. Though sparse, there have been studies on how to enable recognition of mixed language speech. One extreme is to use acoustic models of the complete phone set of the mixed language; the other extreme is to use a language identification module followed by a language dependent speech recognition engine. Each of these has its own implications. In this paper, we approach the problem of mixed language recognition by constraining ourselves to readily available resources and show that by (a) suitably modifying the language model to use mixed language and (b) constructing a pronunciation dictionary, one can achieve good recognition of mixed language speech.
    Traffic police generally identify a vehicle through its license plate. Automatic vehicle license plate recognition has several applications in intelligent traffic management systems. The security situation across the globe, and particularly in India, creates a need to equip the traffic police with a system that enables them to get instant details of a vehicle. The system should be easy to use, should be mobile, and should work 24 × 7. In this paper, we describe a mobile phone based, client-server license plate recognition system. While we use state of the art image processing and pattern recognition algorithms tuned for Indian conditions to automatically recognize non-uniform license plates, the main contribution is in creating an end to end usable solution. The client application runs on a mobile device, and a server application, with access to a vehicle information database, is hosted centrally. The solution enables capture of license plate image capt...
    This paper describes a new feature set, based on Fuzzy Directional Features, for use in the recognition of on-line handwritten Devanagari script. Experiments are conducted on the automatic recognition of isolated handwritten character primitives (sub-character units). We first describe the proposed feature set, called the Fuzzy Directional Features (FDF), and then show how these features can be effectively utilized for writer independent character recognition. Experimental results show that the FDF set performs well on a writer independent data set at stroke level recognition. The main contribution of this paper is the introduction of a novel feature set and the experimental demonstration of its ability to recognize handwritten Devanagari script.
    The main challenges in on-line handwritten character recognition for Indian languages are the large size of the character set, the strong similarity between different characters in the script, and the huge variation in writing style. In this paper we propose a framework for on-line handwritten script recognition, taking cues from the speech signal processing literature. The framework is based on identifying strokes, which in turn leads to recognition of handwritten on-line characters, rather than on conventional character identification. Though the framework is described for Devanagari script, it is general and can be applied to any language. The proposed platform consists of pre-processing, feature extraction, recognition and post-processing, like conventional character recognition, but applied to strokes. On-line Devanagari character recognition reduces to recognizing one of 69 primitives, and a character is recognized by recognizing a sequence of such primitives. We further show the impact of noise removal on raw on-line data, which is usually noisy. The use of Fuzzy Directional Features to enhance the accuracy of stroke recognition is also described. The recognition results are compared with directional features commonly used in the literature, using several classifiers.
    The municipal corporation (MC) of a city is a local governing body which takes care of the functioning of the city. Among many other things, one of the chief responsibilities of an MC is addressing the complaints that the residents of the city might have. Maintenance of a large city requires that the MC be aware of any shortcomings, either through surveillance (sensors/cameras) or by allowing the citizens to report them. The second option is usually preferred because it gives citizens a sense of belonging. Both the citizens and the MC would expect a mechanism that accepts complaints from citizens 24 × 7. The Mumbai MC allows its citizens to place their complaints through several channels. The chief modes of complaint registration are (a) a visit to the ward office, where a person in charge listens to the complaint, asks for some personal details and enters it into an electronic form for other departments within the MC to handle, (b) a contact center over a telephone, where the complaint is registered by a call center agent who types it into the system, and more recently (c) a web portal. In this paper, we propose a natural English enabled mobile interface which can be used to lodge complaints. The essential idea is to make use of the existing web portal infrastructure [6] and provide an easy, cheap and quick (complain as you see) mode of complaint registration around the clock. The proposed system enables and assists citizens to lodge complaints and seek redressal through their mobile phone in natural language.
    The home page of a company is an effective means of showcasing its products and technology. Companies invest major effort, time and money in designing their web pages so that users can access the information they are looking for as quickly and as easily as possible. In spite of all these efforts, it is not uncommon for a user to spend a sizable amount of time trying to retrieve a particular piece of information. Today, the user has to go through several hyperlink clicks or manually search the pages displayed by the site search engine to get to that information. Much time is wasted if the required information does not exist on the website. With websites being increasingly used as sources of information about companies and their products, there is a need for a more convenient interface. In this paper we discuss a system based on a set of Natural Language Processing (NLP) techniques which addresses this problem. The system enables a user to ask for information from a particular website in free style natural English. The NLP based system responds to the query by 'understanding' its intent and then using this understanding to retrieve relevant information from its unstructured info-base or structured database and present it to the user. The interface is called UniqliQ as it spares the user from clicking through several hyperlinked pages. The core of UniqliQ is its ability to understand the question without formally parsing it. The system is based on identifying key-concepts and keywords and then using them to retrieve information. This approach enables the UniqliQ framework to be used for different input languages with minimal architectural changes. Further, the key-concept and keyword approach gives the system an inherent ability to provide approximate answers when exact answers are not present in the information database.
    Yellow Pages are directories that provide information about various commercial organizations, such as their addresses, phone contacts and other details. They are very useful and are used by individuals and business houses. Until recently, the only way to access yellow pages directory information was to physically look into a huge hardcopy directory, which was laborious and time consuming and required the user to be familiar with the organization of the directory. More recently, IVR based contact centers have been set up which users can query for information. While this is easier than browsing through the physical directory, it still has several pitfalls. The time spent trying to get the information is quite large, and at the end of the enquiry one is not sure of getting the information one is looking for. In this paper, we propose a novel interface which enables access to yellow pages directory information on a mobile phone by sending a Short Message Service (SMS) message. The central idea of the proposed method is to avoid any constraint on the way the user can query the yellow pages directory except that it be in natural English. The system, which uses natural language processing (NLP) techniques, understands the intent of the query and intelligently searches the yellow pages directory to retrieve information. The retrieved information is then sent back to the user in the form of an SMS.
    Farmers in most rural areas in India need not only expert and timely suggestions to obtain a rich harvest from their crops but also information about subsidies and government schemes to make cultivation pay rich dividends. Expert guidance comes in the form of a human expert visiting the village and the farmers getting their turn to seek answers to their queries. In this paper, we propose a Question Answering (QA) system which would act as an expert and answer the farmers' queries. We call this QA system KisanMitra, friend of the farmer. The idea in building this system is to give access to information 24×7, to keep the information that reaches the farmer updated, and to enable the farmer to query in his own language without being strict about the grammar or construction of the query. The system is intelligent in the sense that it understands the intent of the query and provides responses. When an exact answer is not available, KisanMitra provides answers which are close in some sense.
    The ability to classify spoken speech based on the style of speaking is an important problem. With the advent of BPOs in recent times, specifically those that cater to a population other than the local population, it has become necessary for BPOs to identify people with a certain style of speaking (American, British, etc.). Today BPOs employ accent analysts to identify people having the required style of speaking. This process involves human bias and is becoming increasingly infeasible because of the high attrition rate in the BPO industry. In this paper, we propose a new metric which robustly and accurately helps classify spoken speech based on the style of speaking. The role of the proposed metric is substantiated by using it to classify real speech data collected from over seventy different people working in a BPO. We compare the performance of the metric against human experts who independently carried out the classification process. Experimental results show that the system using the novel metric performs better than two different human experts.
    Most stereo algorithms assume images to be epipolar aligned. There are two ways of achieving this: (i) physically aligning the cameras or (ii) rectifying the stereo images after capturing them. For real-time or fixed stereo head applications, aligning the cameras is preferable because rectifying the stereo images would require precious computational resources and, once set, the cameras remain in alignment. For applications involving a mobile stereo head, as in robots, it is preferable to rectify images in software because the alignment of the cameras could change with time. In this paper we use an affine-like transform to rectify stereo images, demonstrate its usefulness in producing better disparity estimates, and show that it can be used to capture stereo images from a single camera.
    The ability to assess spoken speech based on the style of speaking is an important problem. In this paper, we propose a method which robustly and accurately assesses spoken speech based on the style of the speaker and classifies it into one of several predefined styles. The system uses a carefully selected list of 20 words and 10 sentences to assess the speaking style of a person. The predefined style set is automatically constructed from a speech corpus consisting of different speaking styles. We use a novel clustering technique to identify clusters of speech having similar styles. The clustering technique uses acoustic and stylistic parameters extracted from the speech signal and uses the relative distances between speech files to form clusters. The centroid of each cluster is identified as the speech file that is most equidistant from all the speech files in that cluster. Each cluster is assigned a label (good, average, bad) by an accent expert. The speaking style of a person is assessed by first comparing all the speech samples (of all the words and sentences) spoken by the person with the cluster centroids of the corresponding words or sentences. The distance of the speech file from the cluster centroids is calculated for all the spoken words and sentences using a variant of the dynamic time warping algorithm. The final assessment of the speaking style of the speaker is based on a weighted sum of distances for each spoken word and sentence. Experimental results on 30 speakers with varying speaking styles show that the system matches the judgment of the human accent expert in more than 90% of the cases. In the full paper, we will detail the system architecture and dwell on the selection of the predefined set of words and sentences, and on the MFCC and LPC based acoustic and stylistic parameters extracted from the speech samples to compare them. We also present a metric to combine the distances of each spoken word (sentence) from the cluster centroids to produce an overall assessment of the speaker. We will also describe the clustering method used to find the centroids of the style clusters and present the experimental results.
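The distance computation described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the textbook dynamic time warping recurrence rather than the unspecified variant, interprets "most equidistant" as the medoid (smallest total distance to the other files), and treats each utterance as a list of generic feature vectors. The names `dtw_distance`, `medoid` and `style_score` are hypothetical.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being aligned
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def medoid(samples):
    """Index of the sample with the smallest total DTW distance to all others.

    Assumption: this stands in for the 'most equidistant' centroid file.
    """
    totals = [sum(dtw_distance(s, t) for t in samples) for s in samples]
    return totals.index(min(totals))


def style_score(utterances, centroids, weights):
    """Weighted sum of DTW distances, one term per spoken word or sentence."""
    return sum(w * dtw_distance(u, c) for u, c, w in zip(utterances, centroids, weights))
```

In practice the frames would be MFCC or LPC based parameter vectors, as the abstract mentions, and the weights would come from the combination metric described in the full paper; both are left as inputs here.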
    Yellow Pages are directories that provide information about various commercial organizations, such as their addresses, phone contacts and other details. They are very useful and are used by individuals and business houses. Until recently, the only way to access yellow pages directory information was to physically look into a huge hardcopy directory, which was laborious and time consuming and required the user to be familiar with the organization of the directory. More recently, IVR based contact centers have been set up which users can query for information. While this is easier than browsing through the physical directory, it still has several pitfalls. The time spent trying to get the information is quite large, and at the end of the enquiry one is not sure of getting the information one is looking for. In this paper, we propose a novel method (which has been implemented for a major telecom operator) of accessing yellow pages directory information on a mobile phone by sending a Short Message Service (SMS) message. The central idea of the proposed method is to avoid any constraint on the way the user can query the yellow pages directory except that it be in natural English. The system, which uses natural language processing (NLP) techniques, understands the intent of the query and intelligently searches the yellow pages directory to retrieve information. The retrieved information is then sent back to the user in the form of an SMS.
    Formant tracking in speech signals is an essential component of many speech recognition systems and has been investigated by many researchers over the years. Several methods have been proposed in the literature to track formants in a speech signal. Almost all of them use either the time-energy or the frequency-energy information of the speech signal. Using only this information does not, in our opinion, amount to using all the information that is available in a speech signal. For this reason, errors creep into the estimation of formants; for example, it is common for formant tracking algorithms to miss formant peaks in the Fourier spectrum in the absence of a suitable look-ahead in time. In this paper, we present a novel method to track formants using the visual cues present in a speech spectrogram. We compare our results with the commonly used formant tracking utility of the ESPS package and show, through our experiments, that it is advantageous to track formants using visual cues.
    This paper describes a working system for automated resume information extraction to support rapid resume search and management. The system extracts several important informative fields from a free format resume using a set of natural language processing (NLP) techniques. It is capable of extracting six major fields of information as defined by HR-XML [8]. Experiments carried out on a large number of resumes show that the proposed system can handle a large variety of resumes in different document formats with a precision of 91% and a recall of 88%.
    ... A. Identification of curvature points: The curvature points (also called critical points) are extracted from the smoothed ... in Digital Signal Processing literature. Note that [12] talks of a fuzzy feature set for Devanagari script, albeit for offline handwritten character recognition. ...
    Noise in on-line handwritten characters, due to natural shaking of the hand and to the process of digitization, is inherent and can degrade the performance of a character recognition system. In this paper, we propose a noise removal ...
    It is well known that the majority of rural India earns its livelihood from agriculture and farming. Although India is a net exporter of various agricultural products, the farmer, who is the primary producer, has remained information poor, which puts him at a disadvantage. With little or no knowledge of prices at the markets, farmers have no leverage to negotiate better prices for their produce. A speech based solution can address this issue of market price information availability for farmers. Speech based solutions are increasingly being used for transactions, but they are both (a) restricted to menu based interactions, where a series of interactions is required for the transaction to take place, and (b) primarily built for the English literate population, which is synonymous with the urban population. Paradoxically, the benefit of a speech based solution is best reaped by rural people speaking their native language (very often not English), because the other modes of transaction are either not readily available to them or, if available, difficult to use. In this paper, we develop a natural language Hindi speech interface to enable the Hindi speaking population to access market prices of commodities.
