-
Higher-Order Relations Skew Link Prediction in Graphs
Authors:
Govind Sharma,
Aditya Challa,
Paarth Gupta,
M. Narasimha Murty
Abstract:
The problem of link prediction is of active interest. The main approach to solving the link prediction problem is based on heuristics such as Common Neighbors (CN) -- a larger number of common neighbors for a pair of nodes implies a higher chance of them getting linked. In this article, we investigate this problem in the presence of higher-order relations. Surprisingly, it is found that CN works very well, and even better in the presence of higher-order relations. However, as we prove in the current work, this is due to the CN-heuristic overestimating its prediction abilities in the presence of higher-order relations. This statement is proved by considering a theoretical model for higher-order relations and by showing that AUC scores of CN are higher than can be achieved from the model. Theoretical justification in simple cases is also provided. Further, we extend our observations to other similar link prediction algorithms such as Adamic Adar. Finally, these insights are used to propose an adjustment factor by taking into account that a random graph would only have a best AUC score of 0.5. This adjustment factor allows for a better estimation of generalization scores.
Submitted 30 October, 2021;
originally announced November 2021.
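As an illustration of the heuristics named in the abstract above, the following is a minimal sketch of Common Neighbors and Adamic-Adar scoring, together with a rank-based AUC. The toy graph, the held-out pairs, and all function names are assumptions made for this example; the paper's theoretical model and its adjustment factor are not reproduced here.

```python
from math import log


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """CN score: number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])


def adamic_adar(adj, u, v):
    """Adamic-Adar score: shared neighbours weighted by 1 / log(degree).

    A shared neighbour of two distinct nodes has degree >= 2,
    so log(degree) is strictly positive.
    """
    return sum(1.0 / log(len(adj[w])) for w in adj[u] & adj[v])


def auc(pos_scores, neg_scores):
    """Probability that a positive pair outscores a negative pair.

    Ties count as half, so a scorer that cannot separate the two
    groups gets 0.5.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical toy data: observed edges plus held-out pairs to score.
adj = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])
positives = [("a", "d"), ("b", "d")]   # assumed true future links
negatives = [("a", "e"), ("b", "e")]   # assumed non-links

cn_auc = auc(
    [common_neighbors(adj, u, v) for u, v in positives],
    [common_neighbors(adj, u, v) for u, v in negatives],
)
print("CN AUC on the toy split:", cn_auc)
```

Counting ties as half a win is one common convention; it makes an uninformative scorer land at the 0.5 baseline that the abstract refers to.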
-
Love tHy Neighbour: Remeasuring Local Structural Node Similarity in Hypergraph-Derived Networks
Authors:
Govind Sharma,
Paarth Gupta,
M. Narasimha Murty
Abstract:
The problem of node-similarity in networks has motivated a plethora of such measures between node-pairs, which make use of the underlying graph structure. However, higher-order relations cannot be losslessly captured by mere graphs and hence, extensions thereof viz. hypergraphs are used instead. Measuring proximity between node pairs in such a setting calls for a revision in the topological measures of similarity, lest the hypergraph structure remains under-exploited. We, in this work, propose a multitude of hypergraph-oriented similarity scores between node-pairs, thereby providing novel solutions to the link prediction problem. As a part of our proposition, we provide theoretical formulations to extend graph-topology based scores to hypergraphs. We compare our scores with graph-based scores (over clique-expansions of hypergraphs into graphs) from the state-of-the-art. Using a combination of the existing graph-based and the proposed hypergraph-based similarity scores as features for a classifier predicts links much better than using the former solely. Experiments on several real-world datasets and both quantitative as well as qualitative analyses on the same exhibit the superiority of the proposed similarity scores over the existing ones.
Submitted 30 October, 2021;
originally announced November 2021.
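The clique expansion mentioned in the abstract above can be sketched as follows. This is an illustrative example only: the function name and the toy hypergraphs are assumptions, and the paper's own hypergraph-oriented similarity scores are not reproduced.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Replace every hyperedge by a clique over its nodes.

    Returns a set of undirected edges, each stored as a sorted pair.
    """
    edges = set()
    for hyperedge in hyperedges:
        for u, v in combinations(sorted(set(hyperedge)), 2):
            edges.add((u, v))
    return edges


# One 3-node hyperedge versus three pairwise hyperedges (hypothetical data).
triple = [{"a", "b", "c"}]
pairs = [{"a", "b"}, {"b", "c"}, {"a", "c"}]

# Both hypergraphs expand to the same triangle, so the expansion is lossy.
print(clique_expansion(triple) == clique_expansion(pairs))
```

The two distinct hypergraphs collapse to the same graph, which is a small instance of the abstract's point that higher-order relations cannot be losslessly captured by graphs.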
-
The CAT SET on the MAT: Cross Attention for Set Matching in Bipartite Hypergraphs
Authors:
Govind Sharma,
Swyam Prakash Singh,
V. Susheela Devi,
M. Narasimha Murty
Abstract:
Usual relations between entities could be captured using graphs; but those of a higher order -- more so between two different types of entities (which we term "left" and "right") -- call for a "bipartite hypergraph". For example, given a left set of symptoms and a right set of diseases, the relation between a subset of symptoms (that a patient experiences at a given point of time) and a subset of diseases (that he/she might be diagnosed with) could be well-represented using a bipartite hyperedge. The state-of-the-art in embedding nodes of a hypergraph is based on learning the self-attention structure between node-pairs from a hyperedge. In the present work, given a bipartite hypergraph, we aim at capturing relations between node pairs from the cross-product between the left and right hyperedges, and term it a "cross-attention" (CAT) based model. More precisely, we pose "bipartite hyperedge link prediction" as a set-matching (SETMAT) problem and propose a novel neural network architecture called CATSETMAT for the same. We perform extensive experiments on multiple bipartite hypergraph datasets to show the superior performance of CATSETMAT, which we compare with multiple techniques from the state-of-the-art. Our results also elucidate information flow in self- and cross-attention scenarios.
Submitted 30 October, 2021;
originally announced November 2021.
-
Decentralised Approach for Multi Agent Path Finding
Authors:
Shyni Thomas,
M. Narasimha Murty
Abstract:
Multi Agent Path Finding (MAPF) requires identification of conflict-free paths for agents, which could be point-sized or have dimensions. In this paper, we propose an approach to MAPF for spatially-extended agents. These find application in real-world problems such as the Convoy Movement Problem, Train Scheduling, etc. Our proposed approach, Decentralised Multi Agent Path Finding (DeMAPF), handles MAPF as a sequence of path-planning and allocation problems, which are solved by two sets of agents, Travellers and Routers respectively, over multiple iterations. The approach, being decentralised, allows an agent to solve the problem pertinent to itself without being aware of other agents in the same set. This allows the agents to be executed on independent machines, thereby providing the scalability to handle large problems. We prove, by comparison with other distributed approaches, that the approach leads to faster convergence to a conflict-free solution, which may be suboptimal, with a lower memory requirement.
Submitted 3 June, 2021;
originally announced June 2021.
-
Multi Agent Path Finding with Awareness for Spatially Extended Agents
Authors:
Shyni Thomas,
Dipti Deodhare,
M. N. Murty
Abstract:
Path finding problems involve identification of a plan for conflict-free movement of agents over a common road network. Most approaches to this problem handle the agents as point objects, wherein the size of the agent is significantly smaller than the road on which it travels. In this paper, we consider spatially extended agents which have a size comparable to the length of the road on which they travel. An optimal multi agent path finding approach for spatially-extended agents was proposed in the eXtended Conflict Based Search (XCBS) algorithm. As XCBS resolves only a pair of conflicts at a time, it results in deeper search trees in case of cascading or multiple (more than two agent) conflicts at a given location. This issue is addressed in eXtended Conflict Based Search with Awareness (XCBS-A), in which an agent uses awareness of other agents' plans to make its own plan. In this paper, we explore XCBS-A in greater detail: we theoretically prove its completeness and empirically compare its performance with that of other algorithms in terms of variances in road characteristics, agent characteristics and plan characteristics. We demonstrate the distributive nature of the algorithm by evaluating its performance when distributed over multiple machines. XCBS-A generates a huge search space, impacting its efficiency in terms of memory; to address this, we propose an approach for memory-efficiency and empirically demonstrate the performance of the algorithm. The nature of XCBS-A is such that it may lead to suboptimal solutions; hence the final contribution of this paper is an enhanced approach, XCBS-Local Awareness (XCBS-LA), which we prove will be optimal and complete.
Submitted 20 September, 2020;
originally announced September 2020.
-
Robust Hierarchical Graph Classification with Subgraph Attention
Authors:
Sambaran Bandyopadhyay,
Manasvi Aggarwal,
M. Narasimha Murty
Abstract:
Graph neural networks have received significant attention for graph representation and classification in the machine learning community. An attention mechanism applied on the neighborhood of a node improves the performance of graph neural networks. Typically, it helps to identify a neighbor node which plays a more important role in determining the label of the node under consideration. But in real-world scenarios, a particular subset of nodes together, but not the individual pairs in the subset, may be important to determine the label of the graph. To address this problem, we introduce the concept of subgraph attention for graphs. On the other hand, hierarchical graph pooling has been shown to be promising in recent literature. But due to the noisy hierarchical structure of real-world graphs, not all the hierarchies of a graph play an equal role in graph classification. Towards this end, we propose a graph classification algorithm called SubGattPool which jointly learns the subgraph attention and employs two different types of hierarchical attention mechanisms to find the important nodes in a hierarchy and the importance of individual hierarchies in a graph. Experimental evaluation with different types of graph classification algorithms shows that SubGattPool is able to improve the state-of-the-art or remains competitive on multiple publicly available graph classification datasets. We conduct further experiments on both synthetic and real-world graph datasets to justify the usefulness of different components of SubGattPool and to show its consistent performance on other downstream tasks.
Submitted 19 July, 2020;
originally announced July 2020.
-
Integrating Network Embedding and Community Outlier Detection via Multiclass Graph Description
Authors:
Sambaran Bandyopadhyay,
Saley Vishal Vivek,
M. N. Murty
Abstract:
Network (or graph) embedding is the task of mapping the nodes of a graph to a lower dimensional vector space, such that it preserves the graph properties and facilitates the downstream network mining tasks. Real-world networks often come with (community) outlier nodes, which behave differently from the regular nodes of the community. These outlier nodes can affect the embedding of the regular nodes, if not handled carefully. In this paper, we propose a novel unsupervised graph embedding approach (called DMGD) which integrates outlier and community detection with node embedding. We extend the idea of deep support vector data description to the framework of graph embedding when there are multiple communities present in the given network, and an outlier is characterized relative to its community. We also show the theoretical bounds on the number of outliers detected by DMGD. Our formulation boils down to an interesting minimax game between the outliers, community assignments and the node embedding function. We also propose an efficient algorithm to solve this optimization framework. Experimental results on both synthetic and real-world networks show the merit of our approach compared to state-of-the-art methods.
Submitted 20 July, 2020;
originally announced July 2020.
-
Unsupervised Graph Representation by Periphery and Hierarchical Information Maximization
Authors:
Sambaran Bandyopadhyay,
Manasvi Aggarwal,
M. Narasimha Murty
Abstract:
Deep representation learning on non-Euclidean data types, such as graphs, has gained significant attention in recent years. The invention of graph neural networks has improved the state-of-the-art for both node and entire graph representation in a vector space. However, for the entire graph representation, most of the existing graph neural networks are trained on a graph classification loss in a supervised way. But obtaining labels for a large number of graphs is expensive for real-world applications. Thus, in this paper, we aim to propose an unsupervised graph neural network to generate a vector representation of an entire graph. For this purpose, we combine the idea of hierarchical graph neural networks and mutual information maximization into a single framework. We also propose and use the concept of periphery representation of a graph and show its usefulness in the proposed algorithm, which is referred to as GraPHmax. We conduct thorough experiments on several real-world graph datasets and compare the performance of GraPHmax with a diverse set of both supervised and unsupervised baseline algorithms. Experimental results show that we are able to improve the state-of-the-art for multiple graph level tasks on several real-world datasets, while remaining competitive on the others.
Submitted 8 June, 2020;
originally announced June 2020.
-
Line Hypergraph Convolution Network: Applying Graph Convolution for Hypergraphs
Authors:
Sambaran Bandyopadhyay,
Kishalay Das,
M. Narasimha Murty
Abstract:
Network representation learning and node classification in graphs have received significant attention due to the invention of different types of graph neural networks. Graph convolution network (GCN) is a popular semi-supervised technique which aggregates attributes within the neighborhood of each node. Conventional GCNs can be applied to simple graphs where each edge connects only two nodes. But many modern-day applications need to model higher-order relationships in a graph. Hypergraphs are effective data types to handle such complex relationships. In this paper, we propose a novel technique to apply graph convolution on hypergraphs with variable hyperedge sizes. We use the classical concept of line graph of a hypergraph for the first time in the hypergraph learning literature. Then we propose to use graph convolution on the line graph of a hypergraph. Experimental analysis on multiple real-world network datasets shows the merit of our approach compared to state-of-the-art methods.
Submitted 9 February, 2020;
originally announced February 2020.
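The classical line graph of a hypergraph, referred to in the abstract above, can be sketched as follows: each hyperedge becomes a node, and two such nodes are joined when their hyperedges share at least one vertex. The function name and toy data are assumptions for illustration; any edge weighting or convolution step used in the paper is not shown.

```python
from itertools import combinations


def line_graph(hyperedges):
    """Return the unweighted line graph of a hypergraph.

    Nodes are hyperedge indices; an edge (i, j) with i < j is present
    when hyperedges i and j have a non-empty intersection.
    """
    sets = [set(h) for h in hyperedges]
    return {
        (i, j)
        for i, j in combinations(range(len(sets)), 2)
        if sets[i] & sets[j]
    }


# Hypothetical hypergraph with hyperedges of different sizes.
hyperedges = [{"a", "b", "c"}, {"c", "d"}, {"d", "e", "f", "g"}, {"x", "y"}]
print(sorted(line_graph(hyperedges)))
```

Because the construction only looks at intersections, hyperedges of any size are handled uniformly, which is why it suits hypergraphs with variable hyperedge sizes.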
-
Beyond Node Embedding: A Direct Unsupervised Edge Representation Framework for Homogeneous Networks
Authors:
Sambaran Bandyopadhyay,
Anirban Biswas,
M. N. Murty,
Ramasuri Narayanam
Abstract:
Network representation learning has traditionally been used to find lower dimensional vector representations of the nodes in a network. However, there are very important edge driven mining tasks of interest to the classical network analysis community, which have mostly been unexplored in the network embedding space. For applications such as link prediction in homogeneous networks, vector representation (i.e., embedding) of an edge is derived heuristically just by using simple aggregations of the embeddings of the end vertices of the edge. Clearly, this method of deriving edge embedding is suboptimal and there is a need for a dedicated unsupervised approach for embedding edges by leveraging edge properties of the network.
Towards this end, we propose a novel concept of converting a network to its weighted line graph which is ideally suited to find the embedding of edges of the original network. We further derive a novel algorithm to embed the line graph, by introducing the concept of collective homophily. To the best of our knowledge, this is the first direct unsupervised approach for edge embedding in homogeneous information networks, without relying on the node embeddings. We validate the edge embeddings on three downstream edge mining tasks. Our proposed optimization framework for edge embedding also generates a set of node embeddings, which are not just the aggregation of edges. Further experimental analysis shows the connection of our framework to the concept of node centrality.
Submitted 11 December, 2019;
originally announced December 2019.
-
Neural Cross-Domain Collaborative Filtering with Shared Entities
Authors:
Vijaikumar M,
Shirish Shevade,
M N Murty
Abstract:
Cross-Domain Collaborative Filtering (CDCF) provides a way to alleviate data sparsity and cold-start problems present in recommendation systems by exploiting the knowledge from related domains. Existing CDCF models are either based on matrix factorization or deep neural networks. Either of the techniques in isolation may result in suboptimal performance for the prediction task. Also, most of the existing models face challenges particularly in handling diversity between domains and learning complex non-linear relationships that exist amongst entities (users/items) within and across domains. In this work, we propose an end-to-end neural network model -- NeuCDCF -- to address these challenges in a cross-domain setting. More importantly, NeuCDCF follows a wide and deep framework and learns the representations jointly from both matrix factorization and deep neural networks. We perform experiments on four real-world datasets and demonstrate that our model performs better than state-of-the-art CDCF models.
Submitted 19 July, 2019;
originally announced July 2019.
-
Outlier Aware Network Embedding for Attributed Networks
Authors:
Sambaran Bandyopadhyay,
Lokesh N,
M. N. Murty
Abstract:
Attributed network embedding has received much interest from the research community as most of the networks come with some content in each node, which is also known as node attributes. Existing attributed network approaches work well when the network is consistent in structure and attributes, and nodes behave as expected. But real world networks often have anomalous nodes. Typically these outliers, being relatively unexplainable, affect the embeddings of other nodes in the network. Thus all the downstream network mining tasks fail miserably in the presence of such outliers. Hence an integrated approach to detect anomalies and reduce their overall effect on the network embedding is required.
Towards this end, we propose an unsupervised outlier aware network embedding algorithm (ONE) for attributed networks, which minimizes the effect of the outlier nodes, and hence generates robust network embeddings. We align and jointly optimize the loss functions coming from structure and attributes of the network. To the best of our knowledge, this is the first generic network embedding approach which incorporates the effect of outliers for an attributed network without any supervision. We experimented on publicly available real networks and manually planted different types of outliers to check the performance of the proposed algorithm. Results demonstrate the superiority of our approach to detect the network outliers compared to the state-of-the-art approaches. We also consider different downstream machine learning applications on networks to show the efficiency of ONE as a generic network embedding technique. The source code is made available at https://github.com/sambaranban/ONE.
Submitted 19 November, 2018;
originally announced November 2018.
-
SaC2Vec: Information Network Representation with Structure and Content
Authors:
Sambaran Bandyopadhyay,
Harsh Kara,
Anirban Biswas,
M N Murty
Abstract:
Network representation learning (also known as information network embedding) has been the central piece of research in social and information network analysis for the last couple of years. An information network can be viewed as a linked structure of a set of entities. A set of linked web pages and documents and a set of users in a social network are common examples of information networks. Network embedding learns low dimensional representations of the nodes, which can further be used for downstream network mining applications such as community detection or node clustering. Information network representation techniques traditionally use only the link structure of the network. But in real-world networks, nodes come with additional content such as textual descriptions or associated images. This content is semantically correlated with the network structure, and hence using the content along with the topological structure of the network can facilitate the overall network representation. In this paper, we propose SaC2Vec, a network representation technique that exploits both the structure and content. We convert the network into a multi-layered graph and use random walks and a language modeling technique to generate the embedding of the nodes. Our approach is simple and computationally fast, yet able to use the content as a complement to structure and vice-versa. We also generalize the approach for networks having multiple types of content in each node. Experimental evaluations on four real-world publicly available datasets show the merit of our approach compared to state-of-the-art algorithms in the domain.
Submitted 4 July, 2018; v1 submitted 27 April, 2018;
originally announced April 2018.
-
FSCNMF: Fusing Structure and Content via Non-negative Matrix Factorization for Embedding Information Networks
Authors:
Sambaran Bandyopadhyay,
Harsh Kara,
Aswin Kannan,
M N Murty
Abstract:
Analysis and visualization of an information network can be facilitated better using an appropriate embedding of the network. Network embedding learns a compact low-dimensional vector representation for each node of the network, and uses this lower dimensional representation for different network analysis tasks. Only the structure of the network is considered by a majority of the current embedding algorithms. However, some content is associated with each node, in most of the practical applications, which can help to understand the underlying semantics of the network. It is not straightforward to integrate the content of each node in the current state-of-the-art network embedding methods.
In this paper, we propose a nonnegative matrix factorization based optimization framework, namely FSCNMF which considers both the network structure and the content of the nodes while learning a lower dimensional representation of each node in the network. Our approach systematically regularizes structure based on content and vice versa to exploit the consistency between the structure and content to the best possible extent. We further extend the basic FSCNMF to an advanced method, namely FSCNMF++ to capture the higher order proximities in the network. We conduct experiments on real world information networks for different types of machine learning applications such as node clustering, visualization, and multi-class classification. The results show that our method can represent the network significantly better than the state-of-the-art algorithms and improve the performance across all the applications that we consider.
Submitted 4 July, 2018; v1 submitted 15 April, 2018;
originally announced April 2018.
-
A Generic Axiomatic Characterization of Centrality Measures in Social Network
Authors:
Sambaran Bandyopadhyay,
M. Narasimha Murty,
Ramasuri Narayanam
Abstract:
Centrality is an important notion in complex networks; it could be used to characterize how influential a node or an edge is in the network. It plays an important role in several other network analysis tools including community detection. Even though there are a small number of axiomatic frameworks associated with this notion, the existing formalizations are not generic in nature. In this paper we propose a generic axiomatic framework to capture all the intrinsic properties of a centrality measure (a.k.a. centrality index). We analyze popular centrality measures along with other novel measures of centrality using this framework. We observed that none of the centrality measures considered satisfies all the axioms.
Submitted 22 March, 2017;
originally announced March 2017.
-
Finding Influential Institutions in Bibliographic Information Networks
Authors:
Anubhav Gupta,
M. Narasimha Murty
Abstract:
Ranking in bibliographic information networks is a widely studied problem due to its many applications, such as in the advertising industry, funding, search engines, etc. Most of the existing works on ranking in bibliographic information networks are based on ranking of research papers and their authors. But the bibliographic information network can be used for solving other important problems as well. The KDD Cup 2016 competition considers one such problem, which is to measure the impact of research institutions, i.e. to perform ranking of research institutions. The competition took place in three phases. In this paper, we discuss our solutions for ranking institutions in each phase. We participated under team name "anu@TASL" and our solutions achieved the average NDCG@20 score of 0.7483, ranking in eleventh place in the contest.
Submitted 27 December, 2016;
originally announced December 2016.
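For readers unfamiliar with the NDCG@k metric quoted in the abstract above, here is a minimal sketch of one common formulation (linear gain, log2 rank discount). Whether the competition used exactly this variant is an assumption, and the function names and sample relevance values are hypothetical. The ideal ordering is computed over the same list of items that is being scored.

```python
from math import log2


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items.

    `relevances` holds the true relevance of each item, listed in the
    order the system ranked them (best-ranked first).
    """
    return sum(rel / log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance scores for institutions in predicted rank order.
predicted_order = [1.0, 3.0, 0.0, 2.0]
print(ndcg_at_k(predicted_order, k=3))
```

A perfectly ordered list scores 1.0, and any misordering within the top k lowers the score, with mistakes near the top penalised most.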
-
"Performance Evaluation of Wi-Fi comparison with WiMAX Networks"
Authors:
M. Sreerama Murty,
A. Veeraiah,
Srinivas Rao
Abstract:
Wireless networking has become an important area of research in academia and industry. The main objectives of this paper are to gain in-depth knowledge of Wi-Fi and WiMAX technology and how it works, and to understand the problems of maintaining and deploying it. The challenges in wireless networks include issues like security, seamless handover, location and emergency services, cooperation, and QoS. The performance of WiMAX is better than that of Wi-Fi, and it also provides a good response in access. The paper evaluates the Quality of Service (QoS) of Wi-Fi compared with WiMAX and describes various kinds of security mechanisms: authentication, to verify the identity of the authorized communicating client stations, and confidentiality (privacy), to ensure that the wirelessly conveyed information remains private and protected. It also covers the actions and configurations needed to deploy Wi-Fi and WiMAX with increased levels of security and privacy.
△ Less
Submitted 13 February, 2012;
originally announced February 2012.
-
Combining Heterogeneous Classifiers for Relational Databases
Authors:
Geetha Manjunatha,
M Narasimha Murty,
Dinkar Sitaram
Abstract:
Most enterprise data is distributed across multiple relational databases with expert-designed schemas. Using traditional single-table machine learning techniques over such data not only incurs a computational penalty for converting to a 'flat' form (mega-join), but also loses the human-specified semantic information present in the relations. In this paper, we present a practical, two-phase hierarchical meta-classification algorithm for relational databases with a semantic divide-and-conquer approach. We propose a recursive prediction aggregation technique over heterogeneous classifiers applied to individual database tables. The proposed algorithm was evaluated on three diverse datasets, namely the TPCH, PKDD and UCI benchmarks, and showed a considerable reduction in classification time without any loss of prediction accuracy.
Submitted 12 March, 2012; v1 submitted 13 January, 2012;
originally announced January 2012.
-
On Measure Theoretic definitions of Generalized Information Measures and Maximum Entropy Prescriptions
Authors:
Ambedkar Dukkipati,
M Narasimha Murty,
Shalabh Bhatnagar
Abstract:
Though Shannon entropy of a probability measure $P$, defined as $- \int_{X} \frac{\mathrm{d}P}{\mathrm{d}\mu} \ln \frac{\mathrm{d}P}{\mathrm{d}\mu} \, \mathrm{d}\mu$ on a measure space $(X, \mathfrak{M}, \mu)$, does not qualify itself as an information measure (it is not a natural extension of the discrete case), maximum entropy (ME) prescriptions in the measure-theoretic case are consistent with those of the discrete case. In this paper, we study the measure-theoretic definitions of generalized information measures and discuss the ME prescriptions. We present two results in this regard: (i) we prove that, as in the case of classical relative-entropy, the measure-theoretic definitions of generalized relative-entropies, Rényi and Tsallis, are natural extensions of their respective discrete cases; (ii) we show that ME prescriptions of measure-theoretic Tsallis entropy are consistent with the discrete case.
Submitted 18 January, 2006;
originally announced January 2006.
-
Uniqueness of Nonextensive entropy under Renyi's Recipe
Authors:
Ambedkar Dukkipati,
M. Narasimha Murty,
Shalabh Bhatnagar
Abstract:
By replacing linear averaging in Shannon entropy with the Kolmogorov-Nagumo average (KN-average), or quasilinear mean, and further imposing the additivity constraint, Rényi proposed the first formal generalization of Shannon entropy. Using this recipe of Rényi, one can prepare only two information measures: Shannon and Rényi entropy. Indeed, using this formalism Rényi characterized these additive entropies in terms of axioms of the quasilinear mean. Just as additivity is a characteristic property of Shannon entropy, pseudo-additivity of the form $x \oplus_{q} y = x + y + (1-q)x y$ is a characteristic property of nonextensive (or Tsallis) entropy. One can apply Rényi's recipe in the nonextensive case by replacing the linear averaging in Tsallis entropy with KN-averages and then imposing the constraint of pseudo-additivity. In this paper we show that nonextensive entropy is unique under Rényi's recipe, and thereby give a characterization.
Submitted 21 November, 2005;
originally announced November 2005.
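The pseudo-additivity property quoted in the abstract above can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes the common form of Tsallis entropy $S_q(p) = (1 - \sum_i p_i^q)/(q-1)$ with the constant set to 1, and the function names (`tsallis_entropy`, `q_sum`, `product_distribution`) are illustrative choices.

```python
import math


def tsallis_entropy(p, q):
    """Tsallis entropy (1 - sum_i p_i^q) / (q - 1); Shannon entropy at q = 1.

    Assumed form with the constant k set to 1.
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in p if x > 0)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)


def q_sum(x, y, q):
    """Pseudo-additive composition: x + y + (1 - q) * x * y."""
    return x + y + (1.0 - q) * x * y


def product_distribution(p, r):
    """Joint distribution of two independent systems."""
    return [a * b for a in p for b in r]


# Two arbitrary independent distributions and an arbitrary index q.
p = [0.2, 0.3, 0.5]
r = [0.6, 0.4]
q = 1.7

joint = tsallis_entropy(product_distribution(p, r), q)
composed = q_sum(tsallis_entropy(p, q), tsallis_entropy(r, q), q)
print(abs(joint - composed) < 1e-12)
```

At $q = 1$ the cross term vanishes and the composition reduces to the ordinary additivity of Shannon entropy.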
-
Cauchy Annealing Schedule: An Annealing Schedule for Boltzmann Selection Scheme in Evolutionary Algorithms
Authors:
Ambedkar Dukkipati,
M. Narasimha Murty,
Shalabh Bhatnagar
Abstract:
Boltzmann selection is an important selection mechanism in evolutionary algorithms as it has theoretical properties which help in theoretical analysis. However, Boltzmann selection is not used in practice because a good annealing schedule for the `inverse temperature' parameter is lacking. In this paper we propose a Cauchy annealing schedule for the Boltzmann selection scheme, based on the hypothesis that selection-strength should increase as the evolutionary process goes on and the distance between two successive selection strengths should decrease for the process to converge. To formalize these aspects, we develop a formalism for selection mechanisms using fitness distributions and give an appropriate measure of selection-strength. We prove an important result, from which we derive an annealing schedule called the Cauchy annealing schedule. We demonstrate the novelty of the proposed annealing schedule using simulations in the framework of genetic algorithms.
Submitted 24 August, 2004;
originally announced August 2004.
-
Generalized Evolutionary Algorithm based on Tsallis Statistics
Authors:
Ambedkar Dukkipati,
M. Narasimha Murty,
Shalabh Bhatnagar
Abstract:
A generalized evolutionary algorithm based on the Tsallis canonical distribution is proposed. The algorithm uses the Tsallis generalized canonical distribution, instead of the Gibbs-Boltzmann distribution, to weigh the configurations for `selection'. Our simulation results show that, for an appropriate choice of the non-extensive index offered by Tsallis statistics, evolutionary algorithms based on this generalization outperform algorithms based on the Gibbs-Boltzmann distribution.
Submitted 16 July, 2004;
originally announced July 2004.
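To illustrate the difference between the two weighting schemes named in the abstract above, here is a rough sketch of selection probabilities under a q-exponential weight. It is not the authors' algorithm: it assumes the textbook Tsallis form $w_i \propto [1 - (1-q)\beta E_i]^{1/(1-q)}$ applied to non-negative costs $E_i$ (lower is better), with the usual cutoff to zero when the bracket is non-positive for $q < 1$. The names `q_exp` and `selection_probabilities` are hypothetical.

```python
import math


def q_exp(x, q):
    """q-exponential [1 + (1 - q) x]^(1 / (1 - q)); equals exp(x) at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        if q > 1.0:
            raise ValueError("q-exponential diverges for this argument")
        return 0.0  # standard Tsallis cutoff for q < 1
    return base ** (1.0 / (1.0 - q))


def selection_probabilities(costs, beta, q):
    """Normalized weights q_exp(-beta * cost); Gibbs-Boltzmann when q = 1.

    Assumes non-negative costs and beta >= 0 (illustrative convention).
    """
    weights = [q_exp(-beta * c, q) for c in costs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights were cut off; lower beta or raise q")
    return [w / total for w in weights]


costs = [0.0, 1.0]
boltzmann = selection_probabilities(costs, beta=1.0, q=1.0)
tsallis = selection_probabilities(costs, beta=1.0, q=2.0)
print(boltzmann, tsallis)
```

Under these assumptions, $q > 1$ gives a heavier-tailed weighting than the Gibbs-Boltzmann case, so worse configurations keep a larger share of the selection probability at the same $\beta$.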