Abstract It is a common experience while Web searching that one gets to see pages that are not of interest. Partly these are due to a word or words in the search query having different contexts, the user obviously expecting to find pages related to the context of interest. This paper proposes a method for disambiguating contexts in Web search results.
Abstract Many organizations provide dialog-based support through contact centers to sell their products, handle customer issues, and address product- and service-related issues. This is usually provided through voice calls, though web-chat-based support has lately been gaining prominence. In this paper, we consider any conversational text derived from web-chat systems, voice recognition systems, etc., and propose a method to identify procedures that are embedded in the text.
Abstract It is a common experience while web searching that one gets to see pages that are not of interest. Partly these are due to a word or words in the search query having different contexts, the user obviously expecting to find pages related to the context of interest. This paper proposes a method for disambiguating contexts in web search results.
Abstract Unsupervised feature selection techniques for text data have gained increasing attention over the last few years. Text data is different from structured data, both in origin and content, and it has some special properties that differentiate it from other types of data. In this work we analyze some such properties and exploit them to propose a new unsupervised feature selection technique called Scaled Entropy.
Developing countries like India are observing an increasing trend in the penetration of mobile phones towards the base of the pyramid (lower strata of the society). This segment comprises users who are novice and semi-literate and are interested in the basic usage of the mobile phone. This paper explores one of the basic features, the address book, for its usability and presents an enhanced symbol-based design to cater for the semi-literate user.
There is an increasing trend in the penetration of mobile phones towards the lower strata (lower-income group) of society. Cost is perceived as the governing factor which determines the adoption of mobile phones in this group. This paper explores the effect of cost on the usage of mobile phones and proposes an enhanced design with features that optimize usage cost for the lower-income group.
Abstract Web pages are identified by their URLs. For authoritative web pages, pages that are focused on a specific topic, webmasters tend to use URLs which summarize the page. URL information is good for clustering because URLs are small and ubiquitous, making techniques based on just URL information orders of magnitude faster than those which make use of the text content as well.
Abstract. Case-based reasoning (CBR) has been shown to be of considerable utility in a spam-filtering task. In the course of this study, we propose that the non-random skewed distribution of the cases in a case base is crucial, especially in the context of a classification task like spam filtering. In this paper, we propose approaches to improve the performance of a CBR spam filter by making use of the non-random nature of the case base.
Abstract This paper illustrates the utility of URL information in unsupervised learning. We outline the motivation behind the usage of URL information upfront, and present two techniques for unsupervised learning from URL corpora. First, we devise a similarity measure for URL pairs, setting out the intuitions behind it, and verify its quality by using it for clustering. Further, we outline a method for keyword identification using the similarity measure.
Abstract The top-k retrieval problem requires finding k objects most similar to a given query object. Similarities between objects are most often computed as aggregated similarities of their attribute values. We consider the case where the similarities between attribute values are arbitrary (non-metric), due to which standard space partitioning indexes cannot be used. Among the most popular techniques that can handle arbitrary similarity measures is the family of threshold algorithms.
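As a rough illustration of the threshold-algorithm family mentioned in this abstract, the following is a minimal sketch of the classic threshold pattern, not the method of the paper itself. It assumes that the aggregate similarity is a plain sum, that every object appears in every per-attribute list, and that per-attribute similarities to the query are precomputed; the names `threshold_topk`, `sorted_lists`, and `sims` are hypothetical.

```python
def threshold_topk(sorted_lists, sims, k):
    """Return ids of the k objects with the highest summed similarity.

    sorted_lists: one list per attribute of (object_id, similarity)
                  pairs, sorted by similarity in descending order.
    sims:         dict object_id -> list of per-attribute similarities,
                  used for random access once an object has been seen.
    Assumes sum aggregation and equal-length lists (illustrative only).
    """
    scores = {}  # aggregate score of every object seen so far
    for depth in range(len(sorted_lists[0])):
        threshold = 0.0
        for lst in sorted_lists:
            obj, s = lst[depth]
            threshold += s  # best score any unseen object could still reach
            if obj not in scores:
                scores[obj] = sum(sims[obj])
        ranked = sorted(scores, key=scores.get, reverse=True)[:k]
        # Stop once the k-th best seen score is at least the threshold.
        if len(ranked) == k and scores[ranked[-1]] >= threshold:
            return ranked
    return sorted(scores, key=scores.get, reverse=True)[:k]


sims = {"a": [0.9, 0.8], "b": [0.7, 0.9], "c": [0.1, 0.2]}
sorted_lists = [
    sorted(((o, v[i]) for o, v in sims.items()), key=lambda p: -p[1])
    for i in range(2)
]
print(threshold_topk(sorted_lists, sims, 2))
```

The early stop relies only on the aggregation being monotone, which is why this pattern does not need the per-attribute similarities to be metric.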
Abstract Mobile-enabled social network applications are becoming increasingly popular. Most of the current social network applications have been designed for high-end mobile devices, and they rely upon features such as GPS, capabilities of the World Wide Web, and rich media support. However, a significant fraction of the mobile user base, especially in the developing world, owns low-end devices that are only capable of voice and short text messages (SMS).
Abstract Contact centers provide dialog-based support to organizations to address various customer-related issues. We have observed that calls at contact centers mostly follow well-defined patterns. Such call flows could specify how an agent should proceed in a call, handle objections, persuade customers, follow compliance issues, etc., and could also help to structure the operational process of call handling.
Abstract Master data management (MDM) provides a means to link data from various structured data sources and to generate a consolidated master record for entities such as customers or products. However, a large amount of valuable information about entities exists as unstructured content in documents. In this paper, we show how MDM can be made aware of information from unstructured content by automatically extracting valuable information from documents.
Abstract Text categorization involves mapping of documents to a fixed set of labels. A similar but equally important problem is that of assigning labels to large corpora. With a deluge of documents from sources like the World Wide Web, manual labeling by domain experts is prohibitively expensive. The problem of reducing effort in labeling of documents has warranted a lot of investigation in the past. Most of this work involved some kind of supervised or semi-supervised learning.
Abstract Clustering, particularly text clustering, in data mining has been attracting a lot of attention of late. Conventional techniques like K-means involve parameters that cannot be easily estimated. With the emergence of density-based clustering algorithms, which have significant advantages, a lot of attention has been devoted to them. OPTICS [1] is the latest and most sophisticated technique in this direction, and has been shown to be considerably tolerant to value changes in parameters.
Abstract A Reverse Skyline query returns all objects whose skyline contains the query object. In this paper, we consider Reverse Skyline query processing where the distances between attribute values are not necessarily metric. We outline real-world cases that motivate Reverse Skyline processing in such scenarios. We consider various optimizations to develop efficient algorithms for Reverse Skyline processing. Firstly, we consider block-based processing of objects to optimize on IO costs.
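To make the definition in this abstract concrete, here is a naive quadratic sketch of a Reverse Skyline query. It is not the optimized algorithm of the paper. It assumes the usual dominance notion (an object dominates the query with respect to a reference object if it is at least as close on every attribute and strictly closer on one) and takes an arbitrary per-attribute distance function, which need not be metric; the names `dominates`, `reverse_skyline`, and `dist` are hypothetical.

```python
def dominates(a, b, ref, dist):
    """True if, seen from ref, a is at least as close as b on every
    attribute and strictly closer on at least one."""
    da = [dist(x, r) for x, r in zip(a, ref)]
    db = [dist(x, r) for x, r in zip(b, ref)]
    return all(x <= y for x, y in zip(da, db)) and any(
        x < y for x, y in zip(da, db)
    )


def reverse_skyline(objects, query, dist):
    """Return every object p whose skyline (relative to p) contains
    the query, i.e. no other object dominates the query as seen from p."""
    result = []
    for i, p in enumerate(objects):
        others = (o for j, o in enumerate(objects) if j != i)
        if not any(dominates(o, query, p, dist) for o in others):
            result.append(p)
    return result


# Absolute difference is used here only to keep the example simple;
# any per-attribute distance function can be passed instead.
objects = [(1, 1), (5, 5), (2, 2)]
print(reverse_skyline(objects, (3, 3), lambda x, y: abs(x - y)))
```

In this example `(1, 1)` is excluded because `(2, 2)` is closer to it than the query on both attributes.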
Abstract Implementing a CRM Analytics solution for a business involves many steps including data extraction, populating the extracted data into a warehouse, and running an appropriate mining algorithm. We propose a CRM Analytics Framework that provides an end-to-end framework for developing and deploying pre-packaged predictive modeling business solutions, intended to help in reducing the time and effort required for building the application.
Abstract We might have heard quite a few people say, on seeing some new mails in their inboxes, "Oh! That spam again." People who observe the kind of spam messages that they receive would perhaps be able to classify similar spam mails into communities. Such properties of spam messages can be used to filter spam. This paper describes an approach towards spam filtering that seeks to exploit the nature of spam messages that allows them to be classified into different communities.
Abstract Clusters of text documents output by clustering algorithms are often hard to interpret. We describe motivating real-world scenarios that necessitate reconfigurability and high interpretability of clusters and outline the problem of generating clusterings with interpretable and reconfigurable cluster models.
Papers by Jyothi John