This volume contains the papers presented at CoDS 2016: Third IKDD Conference on Data Sciences held on March 13-16, 2016 in Pune.
SocialStories: Segmenting Stories within Trending Twitter Topics
This study presents SocialStories, a system based on incremental clustering for streaming tweets that identifies fine-grained stories within a broader trending topic on Twitter. The contributions include a novel tf-metric, called the inverse cluster ...
Learning to Collectively Link Entities
Recently Kulkarni et al. [20] proposed an approach for collective disambiguation of entity mentions occurring in natural language text. Their model achieves disambiguation by efficiently computing exact MAP inference in a binary labeled Markov Random ...
Learning DTW-Shapelets for Time-Series Classification
Shapelets are discriminative patterns in time series that best predict the target variable when their distances to the respective time series are used as features for a classifier. Since the shapelet is simply any time series of some length less than ...
Modeling Spatio-temporal Change Pattern using Mathematical Morphology
Detection and assessment of spatio-temporal change patterns is a challenging task, and may provide insights into various spatio-temporal changes, like urban sprawl monitoring, surveillance of epidemics due to infectious diseases, etc. The existing spatio-...
Detecting Community Structures in Social Networks by Graph Sparsification
Community structures are inherent in social networks and finding them is an interesting and well-studied problem. Finding community structures in social networks is similar to locating densely connected clusters of nodes in a graph. One of the popular ...
On the Dynamics of Username Changing Behavior on Twitter
People extensively use usernames to look up users, their profiles, and tweets that mention them via the Twitter search engine. Often, the searched username is outdated due to a recent username change and no longer refers to the user of interest. Search by the ...
Audience Prism: Segmentation and Early Classification of Visitors Based on Reading Interests
The largest Media and Entertainment (M&E) web portals today cater to more than 100 million unique visitors every month. In Customer Relationship Management, customer segmentation plays an important role, with the goal of targeting different products for ...
Investigating the Potential of Aggregated Tweets as Surrogate Data for Forecasting Civil Protests
Online micro-blogging social media websites like Twitter are being used as a real-time platform for information sharing and communication during the planning and mobilization of civil unrest events. We conduct a study of more than 1.5 million English Tweets ...
Learning transition models of biological regulatory and signaling networks from noisy data
In this paper, we present an extended 2-step probabilistic LGTS (PLGTS) transition system which aims to identify the network structure and stochastic nature of biological processes using time series data. This work is a step towards system ...
Some algorithms for correlated bandits with non-stationary rewards: Regret bounds and applications
We first propose an online learning model wherein rewards for different actions/arms used by the user can be correlated and the reward stream can be non-stationary. Thus, this extends the standard multi-armed bandit learning model. We propose two ...
Events Describe Places: Tagging Places with Event Based Social Network Data
Location-based services and geospatial web applications have become popular in recent years due to the wide adoption of mobile devices. Search and recommendation of places or Points of Interest (PoIs) are prominent services available on them. The ...
Smart filters for social retrieval
Social media platforms are increasingly becoming a rich source of information for capturing the views and opinions of online customers. Major brands listen to the social streams to understand the general pulse of their online community. The foremost task ...
Learning from Gurus: Analysis and Modeling of Reopened Questions on Stack Overflow
Community-driven Question Answering (Q&A) platforms are gaining popularity nowadays, and the number of posts on such platforms is increasing tremendously. Thus, the challenge to keep these platforms noise-free is attracting the interest of research ...
Exploiting Local and Global Context In PPI networks For Efficient Protein Function Prediction
Protein-protein interaction (PPI) networks are a valuable biological data source that contains rich information useful for protein function prediction. The PPI network data obtained from high-throughput experiments is known to be noisy and incomplete. In ...
Feature Creation based Slicing for Privacy Preserving Data Mining
In the digital era, vast amounts of data are collected and shared for the purpose of research and analysis. These data contain sensitive information about people and organizations that needs to be protected during the process of data mining. This work ...
CitizenPulse: A Text Analytics framework for Proactive e-Governance - A Case Study of Mygov.in
Indian citizens are beginning to express themselves via social media on a regular basis on various issues. The Government of India has started an initiative called Mygov.in as a collaborative portal where citizens can voice their opinions via free form ...
Trustworthiness of t-Distributed Stochastic Neighbour Embedding
A well-known technique for embedding high dimensional objects in two or three dimensional space is the t-distributed stochastic neighbour embedding (t-SNE). The t-SNE minimizes the Kullback-Leibler (KL) divergence between two probability distributions, ...
Weighted Linear Loss Twin Support Vector Clustering
Traditional point-based clustering methods such as k-means [1], k-median [2], etc. work by partitioning the data into clusters based on the cluster prototype points. These methods perform poorly when the data is not distributed around several ...
Using Sort-Union to Enhance Economically-Efficient Sentiment Stream Analysis
Sentiment drifts, caused by people changing their opinions instantly on microblogs such as Twitter, are a major challenge in sentiment analysis. In this paper, we have developed a method that selects the most frequent messages from a relevant message set ...
Mining Multi-source Data to Study Workplace Activity Patterns
- Sachin Patel,
- Ravi Mahamuni,
- Meghendra Singh,
- David Clarance,
- Mayuri Duggirala,
- Shivani Sharma,
- Vinay Katiyar,
- Gauri Deshpande,
- Amruta Deshmukh,
- Vaibhav,
- Vivek Balaraman
Examining work activity patterns is a problem of enduring research in organizations. The fortuitous availability of a whole new set of data collection mechanisms, such as mobiles, activity loggers, and GPS-based location detectors, provides us with new ways of ...
Consensus Clustering Approach for Discovering Overlapping Nodes in Social Networks
Community discovery is an important problem that has been addressed in social networks through multiple perspectives. Most of these algorithms discover disjoint communities and yield widely varying results with regard to number of communities as well as ...
An Approach to Allocate Advertisement Slots for Banner Advertising
In the banner advertising scenario, an advertiser aims to reach the maximum number of potential visitors and a publisher tries to meet the requests of an increased number of advertisers to maximize the revenue. In the literature, a model was introduced to ...
Competing Algorithm Detection from Research Papers
We propose an unsupervised approach to extract all competing algorithms present in a given scholarly article. The algorithm names are treated as named entities and natural language processing techniques are used to extract them. All extracted entity ...
Query Classification using LDA Topic Model and Sparse Representation Based Classifier
Users often seek information by submitting a query consisting of keywords that may belong to multiple topics, representing overlapping concepts. The objective of the work is to classify the query into a topic class label by considering the query keywords ...
Scalable Quick Reduct Algorithm: Iterative MapReduce Approach
Feature selection by reduct computation is the key technique for knowledge acquisition using rough set theory. Existing MapReduce-based reduct algorithms use the Hadoop MapReduce framework, which is not suitable for iterative algorithms. The paper aims to ...
Improving Urban Transportation through Social Media Analytics
- Manjira Sinha,
- Preethy Varma,
- Gayatri Sivakumar,
- Mridula Singh,
- Tridib Mukherjee,
- Deepthi Chander,
- Koustuv Dasgupta
Citizens tend to discuss issues in public forums, social media, and web blogs. Given that issues related to public transportation are most actively reported across web-based sources, we present a holistic framework for collection, categorization, ...
AMEO 2015: A dataset comprising AMCAT test scores, biodata details and employment outcomes of job seekers
More than a million engineers enter the global workforce every year. A relevant question is what determines the jobs and salaries these engineers are offered right after graduation. Previous studies have shown the influence of various factors such as ...