Sunil Kopparapu
TCS Innovation labs, Speech and Natural Language, Department Member
Sunil Kumar Kopparapu (Senior Member, IEEE) obtained his doctoral degree in Electrical Engineering from the Indian Institute of Technology, Bombay, India in 1997. His thesis "Modular integration for low-level and high-level vision problems in a multi-resolution framework" provided a broad framework to enable reliable and fast vision processing.
From 1997 to 2000 he was with the Automation Group, Commonwealth Scientific and Industrial Research Organization (CSIRO), Brisbane, Australia, working on practical image processing and 3D vision problems, mainly for the benefit of the Australian mining industry.
Prior to joining the Cognitive Systems Research Laboratory (CSRL), Tata Infotech Limited, as a Senior Research Member, in 2001, he was associated with the R&D Group at Aquila Technologies Private Limited, India, as an expert for developing virtual self line of e-commerce products.
In his current role as a Principal Scientist with the TCS Innovation Labs - Mumbai, he is actively working in the areas of speech, script, image and natural language processing with a focus on building usable systems for mass use in Indian conditions.
He has coauthored a book titled Bayesian Approach to Image Interpretation and, more recently, a Springer Brief on Non-linguistic Analysis of Call Center Conversation, apart from several patents, journal and conference publications.
ABSTRACT Multiresolution analysis is used extensively in the signal processing literature. In this paper, we show the behaviour of a general degradation model (Y=B ⊗ X+W) over different resolutions and derive an expression for the degradation model at all coarse resolutions given the degradation model at the finest resolution. Knowledge of the behaviour of the degradation model over resolutions is useful in many computer vision applications, and to demonstrate the usefulness of the derived result we sketch an algorithm for signal restoration. We also experimentally validate the derived degradation model at different resolutions.
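The abstract does not reproduce the derived coarse-resolution expression, so the following is only a minimal one-dimensional sketch of the setting, written under assumptions: the degradation Y = B ⊗ X + W is simulated with a zero-padded convolution and Gaussian noise, and the coarser resolution is obtained by averaging adjacent sample pairs. The function names (`convolve_same`, `degrade`, `coarsen`) and the pair-averaging operator are illustrative choices, not the paper's.

```python
import random


def convolve_same(x, b):
    """Convolve signal x with an odd-length kernel b (zero padding, same length as x)."""
    half = len(b) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            i = n - (k - half)  # index of the input sample weighted by b[k]
            if 0 <= i < len(x):
                acc += bk * x[i]
        out.append(acc)
    return out


def degrade(x, b, noise_sigma=0.0, seed=0):
    """Simulate Y = B (*) X + W, with W drawn i.i.d. from N(0, noise_sigma^2)."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_sigma) for v in convolve_same(x, b)]


def coarsen(signal):
    """Halve the resolution by averaging adjacent pairs (assumed pyramid operator)."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]
```

Applying `coarsen` to both the clean signal and the output of `degrade` gives a pair of coarse signals on which a coarse-resolution degradation model could be checked empirically.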
ABSTRACT Segmentation is an important topic in computer vision and image processing. In this paper we develop a multiresolution scheme for segmentation. The multiresolution based segmentation algorithm first segments the image at a coarse resolution using a known segmentation algorithm and then uses this information to segment images at finer resolutions. We sketch a scheme for a multiresolution segmentation algorithm, demonstrate its validity on some real images, and compare its performance with the segmented image obtained by working at a single resolution.
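The coarse-to-fine idea can be illustrated with a small one-dimensional sketch. Everything here is an assumption made for illustration: a plain threshold stands in for the "known segmentation algorithm", pair averaging stands in for the multiresolution operator, and the names `coarsen`, `threshold_segment` and `coarse_to_fine_segment` are hypothetical. Labels found at the coarse level are copied to the fine level, and only samples next to a label change are re-examined at full resolution.

```python
def coarsen(signal):
    """Halve the resolution by averaging adjacent pairs."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]


def threshold_segment(signal, threshold):
    """Stand-in 'known' segmenter: label 1 above the threshold, else 0."""
    return [1 if v > threshold else 0 for v in signal]


def coarse_to_fine_segment(signal, threshold):
    """Segment at coarse resolution, then refine only near label boundaries."""
    coarse_labels = threshold_segment(coarsen(signal), threshold)
    if not coarse_labels:
        # Signal too short to coarsen; fall back to single-resolution labelling.
        return threshold_segment(signal, threshold)

    # Each coarse label covers two fine samples.
    fine = []
    for label in coarse_labels:
        fine.extend([label, label])
    while len(fine) < len(signal):  # odd-length signal: repeat the last label
        fine.append(fine[-1])

    refined = list(fine)
    last = len(signal) - 1
    for i in range(len(signal)):
        left = fine[i - 1] if i > 0 else fine[i]
        right = fine[i + 1] if i < last else fine[i]
        if left != fine[i] or right != fine[i]:
            # Boundary sample: decide again using the fine-resolution value.
            refined[i] = 1 if signal[i] > threshold else 0
    return refined
```

The saving comes from running the fine-resolution decision only at boundary samples; away from boundaries the coarse label is trusted as-is.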
ABSTRACT We present a visual aid for the hearing impaired to enable access to internet videos. The visual tool takes the form of time synchronized lip movement, corresponding to the speech in the video, which is embedded in the original internet video. Conventionally, the hearing impaired are given access to the audio or speech in a video by means of either text subtitles or sign language gestures by an interpreter. The proposed tool would be beneficial especially in situations where such aids are not readily available or are difficult to generate. We have conducted a number of experiments to determine the feasibility and usefulness of the proposed visual aid.
ABSTRACT Mashup services are common in web development. A mashup is, generally, a web page that draws on two or more data sources to create a new and improved service. While mashups are common on the web, no such mashup exists for services based on a voice user interface in Interactive Voice Response (IVR) systems. IVR based telephony services have been popular because they are an easy and convenient means to inquire, seek information or book services. At the same time, a frequently used service, such as a taxi service, usually has multiple service operators. Though the Same Service is provided by Multiple Operators (SSMO) to the end user, their access points are different and require the user to choose a particular service operator to call. Unlike with a web portal or a web mashup, there is no way for the user to simultaneously compare the offerings from SSMO and choose the best among them. This paper describes a novel approach, based on a recent patent, to create a mashup service by integrating two or more IVR services.
In this paper, we formulate and develop an approach which integrates different modules (feature extractor, matching and interpolation) involved in stereo. We study the integration process at the finest resolution when (i) precomputed edge map is the only line field driving the model, (ii) the line fields are computed interactively by the feature extracting module of the model and (iii) when both the interactive
Use of mixed language in day to day spoken speech is becoming common and is being accepted as syntactically correct. However, recognition of mixed language spoken speech is a challenge for a speech recognition engine. Though sparse, there have been studies on how to enable recognition of mixed language spoken speech. At one extreme is the use of acoustic models of the complete phone set of the mixed language to enable recognition, while at the other extreme is the use of a language identification module followed by a language dependent speech recognition engine. Each of these has its own implications. In this paper, we approach the problem of mixed language recognition by constraining ourselves to readily available resources and show that by (a) suitably modifying the language model to use mixed language and (b) constructing a pronunciation dictionary, one can achieve good recognition of mixed language spoken speech.
Traffic police generally identify a vehicle through its license plate. Automatic vehicle license plate recognition has several applications in intelligent traffic management systems. The security situation across the globe, and particularly in India, creates a need to equip the traffic police with a system that enables them to get instant details of a vehicle. The system should be easy to use, should be mobile, and should work 24 x 7. In this paper, we describe a mobile phone based, client-server architected, license plate recognition system. While we use state of the art image processing and pattern recognition algorithms tuned for Indian conditions to automatically recognize non-uniform license plates, the main contribution is in creating an end to end usable solution. The client application runs on a mobile device and a server application, with access to a vehicle information database, is hosted centrally. The solution enables capture of license plate image capt...
This paper describes a new feature set, based on Fuzzy Directional Features, for use in the recognition of on-line handwritten Devanagari script. Experiments are conducted for the automatic recognition of isolated handwritten character primitives (sub-character units). We first describe the proposed feature set, called the Fuzzy Directional Features (FDF), and then show how these features can be effectively utilized for writer independent character recognition. Experimental results show that the FDF set performs well on a writer independent data set at stroke level recognition. The main contribution of this paper is the introduction of a novel feature set and the experimental demonstration of its ability to recognize handwritten Devanagari script.
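The abstract does not define FDF in detail, so the sketch below shows only one plausible reading of a fuzzy directional feature, under stated assumptions: the direction of each stroke segment is shared between its two nearest quantised directions using a triangular membership, and the memberships are accumulated into a normalised histogram. The function name, the choice of eight directions and the membership shape are all assumptions, not the paper's definition.

```python
import math


def fuzzy_directional_features(points, n_directions=8):
    """Normalised histogram of fuzzy direction memberships along a stroke.

    points is a list of (x, y) pen positions in writing order.
    """
    step = 2 * math.pi / n_directions
    hist = [0.0] * n_directions
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # repeated point carries no direction
        angle = math.atan2(dy, dx) % (2 * math.pi)
        pos = angle / step
        base = int(pos)
        lower = base % n_directions
        upper = (lower + 1) % n_directions
        frac = pos - base
        # Triangular membership: split the vote between the two neighbours.
        hist[lower] += 1.0 - frac
        hist[upper] += frac
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

A crisp directional feature would assign each segment wholly to its nearest direction; the fuzzy version changes smoothly as the pen direction drifts between two bins, which is the property the comparison with conventional directional features would hinge on.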
ABSTRACT The main challenges in on-line handwritten character recognition for Indian languages are the large size of the character set, the high similarity between different characters in the script and the huge variation in writing style. In this paper we propose a framework for on-line handwritten script recognition taking cues from the speech signal processing literature. The framework is based on identifying strokes, which in turn leads to recognition of handwritten on-line characters, rather than on conventional character identification. Though the framework is described for Devanagari script, it is general and can be applied to any language. The proposed platform consists of pre-processing, feature extraction, recognition and post processing, as in conventional character recognition, but applied to strokes. On-line Devanagari character recognition reduces to recognizing one of 69 primitives, and recognition of a character is performed by recognizing a sequence of such primitives. We further show the impact of noise removal on on-line raw data, which is usually noisy. The use of Fuzzy Directional Features to enhance the accuracy of stroke recognition is also described. The recognition results are compared with directional features commonly used in the literature, using several classifiers.
The municipal corporation (MC) of a city is a local governing body which takes care of the functioning of the city. Among many other things, one of the chief responsibilities of an MC is addressing the complaints that the residents of the city might have. Maintenance of a large city requires that the MC be aware of any shortcomings, either through surveillance (sensors/cameras) or by allowing the citizens to report them. The second option is usually preferred because it gives citizens a sense of belonging. Both the citizens and the MC would expect a mechanism that accepts complaints from citizens 24 × 7. The Mumbai MC allows its citizens to place their complaints through several channels. The chief modes of complaint registration are (a) a visit to the ward office, where a person in charge listens to the complaint, asks for some personal details and enters it into an electronic form for other departments within the MC to handle, (b) a contact center over the telephone, where the complaint is registered by a call center agent who types it into the system, and more recently (c) a web portal. In this paper, we propose a natural English enabled mobile interface which can be used to lodge complaints. The essential idea is to make use of the existing web portal infrastructure [6] and provide an easy, cheap and quick (complain as you see) mode of complaint registration around the clock. The proposed system enables and assists citizens to lodge a complaint and seek redressal through their mobile phone in natural language.
Yellow Pages are directories that hold information about commercial organizations, such as their addresses, phone contacts and other details. They are very useful and are used by individuals and businesses alike. Until recently, the only way to access yellow pages directory information was to physically look into a huge hardcopy directory, which was laborious and time consuming and required the user to be familiar with the organization of the directory. More recently, IVR based contact centers have been set up which users can call to query information. While this is easier than browsing through the physical directory, it still has several pitfalls. The time spent trying to get the information is quite large, and at the end of the enquiry one is not sure of getting the information one is looking for. In this paper, we propose a novel interface which enables access to the yellow pages directory information from a mobile phone by sending a short message service (SMS) message. The central idea of the proposed method is to avoid any constraint on the way the user can query the yellow pages directory except that the query be in natural English. The system, which uses natural language processing (NLP) techniques, understands the intent of the query and intelligently searches the yellow pages directory to retrieve information. The retrieved information is then sent back to the user in the form of an SMS.
Farmers in most rural areas in India need not only expert and timely suggestions to obtain a rich harvest from their crops but also information regarding subsidies and government schemes to make cultivation pay rich dividends. Expert guidance comes in the form of a human expert visiting the village, with the farmers waiting for their turn to seek answers to their queries. In this paper, we propose a Question Answering (QA) system which would act as an expert and answer the queries of the farmers. We call this QA system KisanMitra, friend of the farmer. The idea in building this system is to give access to information 24×7, to keep the information that reaches the farmer updated, and to enable the farmer to query in his own language without being strict about the grammar or construction of the query. The system is intelligent in the sense that it understands the intent of the query and provides responses. When an exact answer is not present in KisanMitra, it provides answers which are close in some sense.
The ability to classify spoken speech based on the style of speaking is an important problem. With the advent of BPOs in recent times, specifically those that cater to a population other than the local population, it has become necessary for BPOs to identify people with a certain style of speaking (American, British etc). Today BPOs employ accent analysts to identify people having the required style of speaking. This process involves human bias and is also becoming increasingly infeasible because of the high attrition rate in the BPO industry. In this paper, we propose a new metric which robustly and accurately helps classify spoken speech based on the style of speaking. The role of the proposed metric is substantiated by using it to classify real speech data collected from over seventy different people working in a BPO. We compare the performance of the metric against human experts who independently carried out the classification process. Experimental results show that the system using the novel metric performs better than two different human experts.
Most stereo algorithms assume images to be epipolar aligned. There are two ways of achieving this: (i) physically aligning the cameras or (ii) rectifying the stereo images after capturing them. For real-time or fixed stereo head applications, aligning the cameras is preferable because rectifying the stereo images would require precious computational resources and, once set, the cameras remain in alignment. For applications involving mobility of the stereo head, as in robots, it is preferable to rectify images in software because the alignment of the cameras could change with time. In this paper we use an affine-like transform to rectify stereo images, demonstrate its usefulness in producing better disparity estimates, and show that it can be used to capture stereo images from a single camera.
This paper describes a working system for automated resume information extraction to support rapid resume search and management. Using a set of natural language processing (NLP) techniques, the system extracts several important informative fields from a free format resume, covering six major fields of information as defined by HR-XML [8]. Experiments carried out on a large number of resumes show that the proposed system can handle a large variety of resumes in different document formats with a precision of 91% and a recall of 88%.
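The reported precision and recall can be combined into a single F1 score (their harmonic mean). The abstract does not report F1 itself, so the short sketch below is only the standard arithmetic applied to the two published figures; the function name `f1_score` is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With the figures above, `f1_score(0.91, 0.88)` comes to roughly 0.895.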
... A. Identification of curvature points. The curvature points (also called critical points) are extracted from the smoothed ... in Digital Signal Processing literature ... Note that [12] talks of fuzzy feature set for Devanagari script albeit for offline handwritten character recognition ...
Abstract Noise in on-line handwritten characters, due to natural shaking of the hand and to the process of digitization, is inherent and can degrade the performance of a character recognition system. In this paper, we propose a noise removal ...
ABSTRACT It is well known that the majority of rural India earns its livelihood from agriculture and farming. Although India is a net exporter of various agricultural products, the farmer, who is the primary producer, has remained information poor, which puts him at a disadvantage. With little or no knowledge of prices at the markets, farmers have no leverage to negotiate better prices for their produce. A speech based solution can address this issue of availability of market price information to farmers. Speech based solutions are increasingly being used for transactions, but they are (a) restricted to menu based interactions, where a series of interactions is required for the transaction to take place, and (b) primarily built for the English literate (effectively, the urban) population. Paradoxically, the benefit of a speech based solution is best reaped by rural people speaking their native language (very often not English), because other modes of transaction are either not readily available to them or, if available, difficult to use. In this paper, we develop a natural language Hindi speech interface to enable the Hindi speaking population to access market prices of commodities.