Collective Graph Identification

Published: 29 January 2016


Data describing networks, such as communication, transaction, disease transmission, and collaboration networks, are becoming increasingly available. While observational data can be useful, it often only hints at the actual underlying process that governs interactions and attributes. For example, an email communication network provides insight into its users and their relationships, but it is not the same as the “real” underlying social network. In this article, we introduce the problem of graph identification, i.e., discovering the latent graph structure underlying an observed network. We cast the problem as a probabilistic inference task, in which we must infer the nodes, edges, and node labels of a hidden graph based on evidence. This entails solving three canonical problems in network analysis: entity resolution (determining when two observations correspond to the same entity), link prediction (inferring the existence of links), and node labeling (inferring hidden attributes). While each of these subproblems has been well studied in isolation, here we consider them as a single, collective task. We present a simple yet novel approach that addresses all three subproblems simultaneously. Our approach, which we refer to as C3, consists of a collection of Coupled Collective Classifiers that are applied iteratively to propagate inferred information among the subproblems. We consider variants of C3 using different learning and inference techniques and empirically demonstrate that C3 is superior, in terms of both predictive accuracy and running time, to state-of-the-art probabilistic approaches on four real-world problems.
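To make the idea of coupled, iterative inference more concrete, here is a minimal Python sketch. It is not the authors' C3: the three "classifiers" are hand-written rules standing in for learned models (token-level Jaccard name similarity for entity resolution, an observation-count or shared-label rule for link prediction, and neighbor majority vote for node labeling), and the function `coupled_inference` and its parameters `sim`, `min_count`, and `max_iter` are hypothetical. What it illustrates is the coupling: every pass re-runs all three subproblems, each reading the others' latest predictions, until the inferred labels stop changing.

```python
from collections import Counter, defaultdict
from itertools import combinations


def jaccard(a, b):
    """Token-level Jaccard similarity between two observed name strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def majority(values):
    """Most common non-None value (ties broken alphabetically), or None."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    best = max(counts.values())
    return min(v for v, c in counts.items() if c == best)


def coupled_inference(names, observed_edges, seed_labels,
                      sim=0.5, min_count=2, max_iter=10):
    """Hypothetical coupled ER/LP/NL loop built from hand-written rules.

    names: ref id -> observed name; observed_edges: (ref, ref) pairs, duplicates
    allowed; seed_labels: ref id -> known label, held fixed throughout.
    """
    ref_label = dict(seed_labels)  # current label guess for each reference
    for _ in range(max_iter):
        # Entity resolution: merge similar names unless current labels disagree.
        parent = {r: r for r in names}
        for u, v in combinations(sorted(names), 2):
            lu, lv = ref_label.get(u), ref_label.get(v)
            conflict = lu is not None and lv is not None and lu != lv
            if not conflict and jaccard(names[u], names[v]) >= sim:
                parent[find(parent, u)] = find(parent, v)
        members = defaultdict(list)
        for r in sorted(names):
            members[find(parent, r)].append(r)
        seeded = {e: majority(seed_labels.get(r) for r in rs)
                  for e, rs in members.items()}
        ent_label = {e: seeded[e] or majority(ref_label.get(r) for r in rs)
                     for e, rs in members.items()}

        # Link prediction: keep an entity-level edge if it was observed at least
        # min_count times or if its endpoints currently share a label.
        counts = Counter()
        for u, v in observed_edges:
            a, b = find(parent, u), find(parent, v)
            if a != b:
                counts[frozenset((a, b))] += 1
        edges = set()
        for pair, c in counts.items():
            a, b = tuple(pair)
            shared = ent_label[a] is not None and ent_label[a] == ent_label[b]
            if c >= min_count or shared:
                edges.add(pair)

        # Node labeling: unseeded entities take the majority label of their
        # neighbors in the predicted graph, else keep their current guess.
        neighbors = defaultdict(list)
        for pair in edges:
            a, b = tuple(pair)
            neighbors[a].append(b)
            neighbors[b].append(a)
        new_label = {e: seeded[e]
                     or majority(ent_label[n] for n in neighbors[e])
                     or ent_label[e]
                     for e in members}

        # Push entity labels back to references; stop at a fixed point.
        new_ref_label = {r: new_label[e] for e, rs in members.items()
                         for r in rs if new_label[e] is not None}
        new_ref_label.update(seed_labels)
        converged = new_ref_label == ref_label
        ref_label = new_ref_label
        if converged:
            break

    entities = {tuple(rs): new_label[e] for e, rs in members.items()}
    links = {frozenset(tuple(members[x]) for x in pair) for pair in edges}
    return entities, links


if __name__ == "__main__":
    names = {"r1": "Alice Smith", "r2": "Alice J Smith", "r3": "Bob Jones",
             "r4": "Carol White", "r5": "Carol M White", "r6": "Dan Brown"}
    observed = [("r1", "r3"), ("r2", "r3"), ("r4", "r3"), ("r5", "r6")]
    seeds = {"r1": "db", "r4": "ml", "r6": "ml"}
    entities, links = coupled_inference(names, observed, seeds)
    print(entities, links, sep="\n")
```

On the toy input, the two "Alice" references and the two "Carol" references are merged. The merged Alice entity's two observed edges to Bob then pass the count threshold, and Bob inherits the "db" label through that link. The single Carol-Bob edge is dropped because the labels on its endpoints disagree, while the single Carol-Dan edge survives because both ends are labeled "ml". Each subproblem's output therefore changes what the others infer, which is the kind of interaction the coupled approach is designed to exploit.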


    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 10, Issue 3
    February 2016
    358 pages
    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 January 2016
    Accepted: 01 August 2015
    Revised: 01 August 2015
    Received: 01 May 2014
    Published in TKDD Volume 10, Issue 3


    Author Tags

    1. entity resolution
    2. collective classification
    3. link prediction
    4. semi-supervised learning


    Funding Sources

    • National Science Foundation


