We propose “Deep Autoencoders for Feature Learning in Recommender Systems,” a novel discriminative model that incorporates features from autoencoders, in combination with embeddings, into a deep neural network to predict ratings in recommender systems. The work has two major motivations. The first is to engineer features for recommender systems in a domain-agnostic way using autoencoders. The second is to develop a method that sets a benchmark for predictive accuracy. In our proposed solution, we build a user autoencoder and an item autoencoder that extract latent features for the users and items, respectively. These latent features, taken from the bottleneck activations of the autoencoders, are the additional engineered features. Our method of feature engineering is domain agnostic, as the innermost activations differ across domains without any additional effort required on the part of the modeler. We then use the activations of the innermost layers of the autoencoders, along with user and item embeddings, as features in a subsequent deep neural network that predicts the ratings. Our method incorporates additional linear and nonlinear latent features from the autoencoders to improve predictive accuracy. This differs from existing approaches that use autoencoders as full-fledged recommender systems, or that use autoencoders to generate features for a subsequent supervised learning algorithm without using embeddings. We demonstrate that our solution outperforms competing methods on four datasets of varying size and sparsity, namely MovieLens 100K, MovieLens 1M, FilmTrust and BookCrossing. We have compared our DAFERec method against mDA-CF, TrustSVD, SVD variants, BiasedMF, ItemKNN and I-AutoRec. The results demonstrate that our proposed solution beats the benchmarks and is a highly flexible model that works on different datasets solving different business problems such as book recommendations, movie recommendations and trust.
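The feature assembly described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the toy rating matrix, and the helper names (`dense`, `init_layer`, `predictor_input`, `predict`) are all assumptions made for illustration, the weights are random rather than trained, and only the encoder half of each autoencoder is shown.

```python
import math
import random

random.seed(0)

def dense(weights, bias, x, activation=math.tanh):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def init_layer(n_out, n_in):
    """Small random weights; a trained model would learn these instead."""
    weights = [[random.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

# Hypothetical toy sizes, chosen only for illustration.
N_USERS, N_ITEMS = 4, 5
BOTTLENECK, EMBED = 2, 3

# Toy rating matrix (rows = users, columns = items, 0 = unrated).
ratings = [
    [5, 0, 3, 0, 1],
    [4, 0, 0, 2, 0],
    [0, 1, 0, 5, 4],
    [0, 0, 2, 4, 0],
]

# Encoder halves of the user and item autoencoders (decoders omitted).
user_encoder = init_layer(BOTTLENECK, N_ITEMS)   # input: a user's row
item_encoder = init_layer(BOTTLENECK, N_USERS)   # input: an item's column

# Embedding tables: one vector per user and per item.
user_embedding = [[random.uniform(-0.1, 0.1) for _ in range(EMBED)]
                  for _ in range(N_USERS)]
item_embedding = [[random.uniform(-0.1, 0.1) for _ in range(EMBED)]
                  for _ in range(N_ITEMS)]

# Rating predictor that consumes bottleneck features plus embeddings.
hidden = init_layer(4, 2 * BOTTLENECK + 2 * EMBED)
output = init_layer(1, 4)

def predictor_input(u, i):
    """Concatenate bottleneck activations with the user and item embeddings."""
    user_row = [float(r) for r in ratings[u]]
    item_col = [float(row[i]) for row in ratings]
    user_latent = dense(*user_encoder, user_row)
    item_latent = dense(*item_encoder, item_col)
    return user_latent + item_latent + user_embedding[u] + item_embedding[i]

def predict(u, i):
    """Forward pass of the (untrained) rating predictor for user u, item i."""
    h = dense(*hidden, predictor_input(u, i))
    return dense(*output, h, activation=lambda z: z)[0]
```

Calling `predict(0, 2)` returns an arbitrary score here because nothing has been trained. The point of the sketch is the shape of the predictor's input: two bottleneck vectors and two embedding vectors joined into one feature vector.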
2016 International Conference on Data Science and Engineering (ICDSE), 2016
Microblogging social media platforms like Twitter, Tumblr and Plurk have radically changed our lives. The presence of millions of people has made these platforms a preferred channel for business organizations to target users for product promotions. Running product promotions on these platforms requires clustering users by interest. In this work, we propose a methodology for clustering users on microblogging social media platforms on the basis of their primitive interest by clustering micro-messages. We use a rough-set-based concept called Similarity Upper Approximation to cluster micro-messages and the corresponding users. We demonstrate the viability of the proposed approach by collecting and clustering tweets and the corresponding users from Twitter. Experimental results show that the proposed methodology is viable and effective for clustering micro-messages and the corresponding users on the basis of their primitive interest.
2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2016
Privacy-preserving graph publishing is gaining importance, mainly because of the privacy issues inherent in publishing graph and social network data. The graph structure therefore needs to be anonymized before publishing. This study proposes anonymizing a graph using fuzzy sets to preserve the graph's privacy while maintaining the utility that can be derived from the graph. We conducted experiments on four different datasets, and the results suggest that the proposed approach helps not only to protect the privacy of the data but also to maintain its quality for analysis. To check the robustness of the proposed approach, we validated its effectiveness across five key community detection algorithms and three performance measures.
Information security breaches today affect a large number of organizations globally, including universities. They pose an immense threat to the C-I-A (confidentiality, integrity and availability) of information. Hence, it is important to have a proper Information Security Management System (ISMS) designed in accordance with industry-adopted standards for risk management. The current case explores the IT infrastructure at a premier Indian business school where internet support is required round the clock. The entire ISMS framework of the organization, including security policy, security budget and network components, is described. Though the security infrastructure appeared adequate, a spate of hacking attacks targeting the SMTP server generated spam in an attempt to cripple the crucial email services for the duration of the attack. The primary security challenges facing the organization including nature and appropriateness of ISMS, adequacy of the security pol...
Social networking sites have become an important market space on the Internet. Social apps thrive in the environment that social networking sites provide. Companies are focusing more on social networking sites to promote their products and identify new sources of revenue. One such nascent source is the sale of virtual/digital goods through social apps on social networking sites. To the best of our knowledge, research in this area is growing, but few studies explore the business potential of purchases of virtual/digital goods. The aim of this paper is to identify the determinants of the purchase decision for virtual/digital goods in social apps. We use TAM, social app usage, attributes of virtual/digital goods and other factors to explain the purchase intention for virtual/digital goods in social apps. A survey-based methodology has been adopted to show the viability of our research.
International Journal of Web Based Communities, 2015
Social networking sites are used for both personal and professional interactions. Social apps thrive in the environment provided by social networking sites. Companies are focussing on this market space to promote their products and to identify new sources of revenue generation. The objective of our paper is to identify the factors behind the purchase decision and word of mouth for virtual and digital goods in social app-based communities. In this paper, we incorporate the technology acceptance model, social influence, social app usage and attributes of virtual and digital goods, among other constructs. Findings from our research indicate that social self-image expression is an important determinant of both purchase intention and word of mouth for virtual and digital goods. Social app usage affects purchase intention for virtual and digital goods. Features of social apps and social networking sites that enable and enhance these factors can contribute towards business value from social app-based communities.
Among the machine learning applications to business, recommender systems rank near the top in success and adoption. They help users search faster while helping businesses maximize sales. Following their phenomenal success in computer vision and speech recognition, deep learning methods are beginning to be applied to recommender systems. Current survey papers on deep learning in recommender systems provide a historical overview and a taxonomy of recommender systems based on type. Our paper addresses this gap by providing a taxonomy of deep learning approaches to recommender systems problems in the areas of cold start and candidate generation. We group the challenges in recommender systems into those related to the recommendations themselves (including relevance, speed, accuracy and scalability), those related to the nature of the data (cold start problem, imbalance and sparsity) and candidate generation....
Frequent sequence mining is a fundamental and essential operation in the process of discovering sequential rules. Most sequence mining algorithms use the Apriori methodology or build larger sequences from smaller patterns, a bottom-up approach. In this paper, we present an algorithm that uses a top-down approach for mining long sequences. Our algorithm defines dominancy of the sequences and uses it to minimize scanning of the data set.
The goal of this work is to present an advanced query processing algorithm formulated and developed in support of heterogeneous distributed database management systems. Heterogeneous distributed database management systems view the integrated data through a uniform global schema. The query processing algorithm described here produces an inexpensive strategy for a query expressed over the global schema. The research addresses the following aspects of query processing: (1) Formulation of a low-level query language to express the fundamental heterogeneous database operations; (2) Translation of the query expressed over the global schema to an equivalent query expressed over a conceptual schema; (3) An estimation methodology to derive the intermediate result sizes of the database operations; (4) A query decomposition algorithm to generate an efficient sequence of the basic database operations to answer the query. This research addressed the first issue by developing an algebraic query l...
Trust and privacy features of websites have become an important concern for any business or interaction, particularly in online networks. The study investigates the relationship between trust, privacy concerns and the behavioural intention of users on a social network. The behavioural intention of users on the online social network (OSN) is captured by the intention to disclose information and the intention to interact with others in the OSN. The study was conducted on a sample of 457 active users of a major social networking website, Facebook. Partial least squares based structural equation modelling was used to analyse the results. The findings reveal that the intention to disclose information mediates the relationship between trust in the website and the intention to interact with others. Another important finding is that prior positive experience with the website significantly impacts trust in the website, and that trust in the website also plays a crucial role in determining information privacy concerns in the OSN.
Review sentiment influences purchase decisions and indicates user satisfaction. Inferring the sentiment from reviews is an essential task in Natural Language Processing and has managerial implications for improving customer satisfaction and item quality. Traditional approaches to polarity classification use bag-of-words techniques and lexicons combined with machine learning. These approaches suffer from an inability to capture semantics and context. We propose a Deep Learning solution called OSLCFit (Organic Simultaneous LSTM and CNN Fit). In our architecture, we include all the components of a CNN up to but not including the final fully connected layer, and do the same for a bi-directional LSTM. The final fully connected layer in our architecture receives fixed-length features from the CNN, and features for both variable length and temporal dependencies from the bi-directional LSTM. The solution fine-tunes Language Model embeddings for the specific task of polarity classification using transfer learning, enabling the capture of semantics and context. The key contribution of this paper is the combination of features from both a CNN and a bi-directional LSTM into a single architecture with a single optimizer. The result is an organic combination that uses embeddings fine-tuned to the reviews for the specific purpose of sentiment polarity classification. The solution is benchmarked on six datasets (SMS Spam, YouTube Spam, Large Movie Review Corpus, Stanford Sentiment Treebank, Amazon Cellphone & Accessories and Yelp), where it beats existing benchmarks and scales to large datasets. The source code is available on GitHub for the purposes of reproducible research.
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Microblogging platforms like Twitter, Tumblr and Plurk have radically changed our lives. The presence of millions of people has made these platforms a preferred channel for communication. The large amount of user-generated content on these platforms has attracted researchers and practitioners to mine it for nuggets of information. For information extraction, clustering is an important and widely used mining operation. This paper addresses the clustering of micro-messages and the corresponding users based on the text content of micro-messages that reflect their primitive interest. We modify the Similarity Upper Approximation based clustering algorithm for clustering micro-messages. We compared the performance of the modified algorithm with state-of-the-art clustering algorithms such as Partitioning Around Medoids, Hierarchical Agglomerative Clustering, Affinity Propagation Clustering and DBSCAN. ...
The emergence of multifarious complex networks has attracted researchers and practitioners from various disciplines. Discovering cohesive subgroups or communities in complex networks is essential to understanding the dynamics of real-world systems. Researchers have made persistent efforts to investigate and infer community patterns in complex networks. However, communities in real-world networks are not only disjoint but also overlapping and nested. The existing literature on community detection offers few methods to discover co-occurring disjoint, overlapping and nested communities. In this work, we propose a novel rough set based algorithm capable of uncovering true community structure in networks, be it disjoint, overlapping or nested. Initial sets of granules are constructed using neighborhood connectivity around the nodes and represented as rough sets. Subsequently, we iteratively obtain the constrained connectedness upper approximation of these sets. To constrain the sets and merge them during each iteration, we use the concept of relative connectedness among the nodes. We illustrate the proposed algorithm on a toy network and evaluate it on fourteen real-world benchmark networks. Experimental results show that the proposed algorithm reveals more accurate communities and significantly outperforms state-of-the-art techniques. A rough set based community detection algorithm for complex networks has been proposed. Experiments have been performed on fourteen benchmark networks from diverse domains. Comparative analysis of the proposed algorithm has been performed with the relevant state-of-the-art methods. The performance of the proposed algorithm is superior to state-of-the-art methods.
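The granule-and-merge idea in this abstract can be sketched roughly as follows. This is a loose illustration, not the published algorithm: the abstract does not define "relative connectedness" or the constrained upper approximation, so the `overlap` score and the `threshold` value below are stand-in assumptions, and all function names are hypothetical.

```python
def neighborhood_granules(adjacency):
    """Initial granules: each node together with its direct neighbours."""
    return [frozenset({node} | set(neigh)) for node, neigh in adjacency.items()]

def overlap(a, b):
    """Share of the smaller set that lies in the other set.

    Assumption: a simple stand-in for the paper's relative connectedness.
    """
    return len(a & b) / min(len(a), len(b))

def merge_granules(granules, threshold=0.6):
    """Merge pairs whose overlap reaches the threshold until no pair qualifies."""
    sets = list(set(granules))          # drop duplicate granules
    merged = True
    while merged:
        merged = False
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if overlap(sets[i], sets[j]) >= threshold:
                    union = sets[i] | sets[j]
                    sets = [s for k, s in enumerate(sets) if k not in (i, j)]
                    sets.append(union)
                    merged = True
                    break
            if merged:
                break                   # restart the scan after every merge
    return set(sets)

# Toy network: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4.
adjacency = {
    1: [2, 3], 2: [1, 3], 3: [1, 2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}

communities = merge_granules(neighborhood_granules(adjacency))
```

On this toy network the sketch ends with the two sets {1, 2, 3, 4} and {3, 4, 5, 6}, which share the bridge nodes 3 and 4. That illustrates how a set-based representation can yield overlapping rather than strictly disjoint communities.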
Papers by Bharat Bhasker