Interactive Discovery of Coordinated Relationship Chains with Maximum Entropy Models

Authors:

Naren Ramakrishnan

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 12, Issue 1

Article No.: 7, Pages 1 - 34

https://doi.org/10.1145/3047017

Published: 31 January 2018

Abstract

Modern visual analytic tools promote human-in-the-loop analysis but are limited in their ability to direct the user toward interesting and promising directions of study. This problem is especially acute when the analysis task is exploratory in nature, e.g., the discovery of potentially coordinated relationships in massive text datasets. Such tasks are very common in domains like intelligence analysis and security forensics, where the goal is to uncover surprising coalitions bridging multiple types of relations. We introduce new maximum entropy models to discover surprising chains of relationships, leveraging count data about entity occurrences in documents. These models are embedded in a visual analytic system called MERCER (Maximum Entropy Relational Chain ExploRer) that treats relationship bundles as first-class objects and directs the user toward promising lines of inquiry. We demonstrate how user input can judiciously direct analysis toward valid conclusions, whereas a purely algorithmic approach could be led astray. Experimental results on both synthetic and real datasets from the intelligence community are presented.

Published In

ACM Transactions on Knowledge Discovery from Data Volume 12, Issue 1

Special Issue (IDEA) and Regular Papers

February 2018

363 pages

ISSN:1556-4681

EISSN:1556-472X

DOI:10.1145/3178542

Editors:
Charu Aggarwal, IBM T. J. Watson Research, USA
Xindong Wu, University of Louisiana at Lafayette, USA

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 January 2018

Accepted: 01 January 2017

Revised: 01 January 2017

Received: 01 December 2015

Published in TKDD Volume 12, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Science Foundation

Cited By

Kale B, Sun M, Papka M (2023) The State of the Art in Visualizing Dynamic Multivariate Networks. Computer Graphics Forum 42:3 (471-490). Online publication date: 27-Jun-2023.
https://doi.org/10.1111/cgf.14856
Kale B, Clyde A, Sun M, Ramanathan A, Stevens R, Papka M (2023) ChemoGraph: Interactive Visual Exploration of the Chemical Space. Computer Graphics Forum 42:3 (13-24). Online publication date: 27-Jun-2023.
https://doi.org/10.1111/cgf.14807
Sun M, Namburi A, Koop D, Zhao J, Li T, Chung H (2022) Towards Systematic Design Considerations for Visualizing Cross-View Data Relationships. IEEE Transactions on Visualization and Computer Graphics 28:12 (4741-4756). Online publication date: 1-Dec-2022.
https://doi.org/10.1109/TVCG.2021.3102966
Zhao J, Sun M, Chen F, Chiu P (2022) Understanding Missing Links in Bipartite Networks With MissBiN. IEEE Transactions on Visualization and Computer Graphics 28:6 (2457-2469). Online publication date: 1-Jun-2022.
https://doi.org/10.1109/TVCG.2020.3032984
Yu Y, Wang W, Shao M, Wu N, Sun Y, Sun Y, Tian Q (2022) Multi-users interaction anomalous subgraph detection for event mining. Neurocomputing 509 (34-45). Online publication date: Oct-2022.
https://doi.org/10.1016/j.neucom.2022.08.072
Yu Y, Wang W, Wu N, Liu H, Shao M (2022) IISD: Integrated interaction subgraph detection for event mining. Knowledge-Based Systems (108080). Online publication date: Jan-2022.
https://doi.org/10.1016/j.knosys.2021.108080
Sun M, Shaikh A, Alhoori H, Zhao J (2021) SightBi: Exploring Cross-View Data Relationships with Biclusters. IEEE Transactions on Visualization and Computer Graphics 28:1 (54-64). Online publication date: 24-Dec-2021.
https://dl.acm.org/doi/10.1109/TVCG.2021.3114801
Sun M, Koop D, Zhao J, North C, Ramakrishnan N (2019) Interactive Bicluster Aggregation in Bipartite Graphs. 2019 IEEE Visualization Conference (VIS) (246-250). Online publication date: Oct-2019.
https://doi.org/10.1109/VISUAL.2019.8933546
Adriaens F, Lijffijt J, Bie T (2019) Subjectively interesting connecting trees and forests. Data Mining and Knowledge Discovery 33:4 (1088-1124). Online publication date: 1-Jul-2019.
https://dl.acm.org/doi/10.1007/s10618-019-00627-1
