DOI: 10.1145/1150402.1150475
Article

Algorithms for storytelling

Published: 20 August 2006

Abstract

We formulate a new data mining problem called storytelling as a generalization of redescription mining. In traditional redescription mining, we are given a set of objects and a collection of subsets defined over these objects. The goal is to view the set system as a vocabulary and identify two expressions in this vocabulary that induce the same set of objects. Storytelling, on the other hand, aims to explicitly relate object sets that are disjoint (and hence maximally dissimilar) by finding a chain of (approximate) redescriptions between the sets. This problem finds applications in bioinformatics, for instance, where a biologist is trying to relate a set of genes expressed in one experiment to another set implicated in a different pathway. We outline an efficient storytelling implementation that embeds the CARTwheels redescription mining algorithm in an A* search procedure, using the former to supply next-move operators on search branches to the latter. This approach is practical and effective for mining large datasets and, at the same time, exploits the structure of partitions imposed by the given vocabulary. Three application case studies are presented: a study of word overlaps in large English dictionaries, an exploration of connections between gene sets in a bioinformatics dataset, and a study relating publications in the PubMed index of abstracts.
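
To make the idea of a "chain of approximate redescriptions" concrete, here is a minimal Python sketch, not the authors' implementation. It assumes that two sets count as approximate redescriptions when their Jaccard coefficient is at least a threshold `theta`, and it replaces CARTwheels with a brute-force scan over the vocabulary. The names `jaccard`, `find_story`, `vocabulary`, and `theta` are all hypothetical. The default heuristic is zero, so the search behaves as uniform-cost search; supplying an admissible heuristic would turn it into A* proper.

```python
import heapq
from itertools import count


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_story(vocabulary, start, goal, theta=0.5, heuristic=lambda name: 0):
    """Return a shortest chain of descriptor names from start to goal, or None.

    Consecutive descriptors in the chain must overlap with Jaccard >= theta.
    ASSUMPTION: the Jaccard threshold and the brute-force neighbor scan are
    illustrative stand-ins, not the CARTwheels move generator from the paper.
    """
    tie = count()  # tie-breaker so the heap never has to compare paths
    frontier = [(heuristic(start), next(tie), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, _, cost, name, path = heapq.heappop(frontier)
        if name == goal:
            return path
        if cost > best_cost.get(name, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for other, members in vocabulary.items():
            if other == name or jaccard(vocabulary[name], members) < theta:
                continue
            new_cost = cost + 1  # every redescription step costs 1
            if new_cost < best_cost.get(other, float("inf")):
                best_cost[other] = new_cost
                priority = new_cost + heuristic(other)
                heapq.heappush(
                    frontier,
                    (priority, next(tie), new_cost, other, path + [other]),
                )
    return None


# Toy vocabulary: A and D are disjoint, but neighbors overlap by half.
vocabulary = {
    "A": {1, 2, 3},
    "B": {2, 3, 4},
    "C": {3, 4, 5},
    "D": {4, 5, 6},
    "E": {7, 8},
}
print(find_story(vocabulary, "A", "D", theta=0.5))  # ['A', 'B', 'C', 'D']
print(find_story(vocabulary, "A", "E", theta=0.5))  # None
```

In the toy vocabulary, `A` and `D` share no elements, yet each adjacent pair in the returned chain has a Jaccard coefficient of exactly 0.5, which is the sense in which a story relates disjoint sets. Raising `theta` above 0.5 leaves no admissible moves out of `A`, so no story is found.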

Published In

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2006
986 pages
ISBN:1595933395
DOI:10.1145/1150402

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data mining
  2. redescription
  3. storytelling

Qualifiers

  • Article

Conference

KDD06

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Cited By

  • (2024) A Survey of data-centric technologies supporting decision-making before deploying military assets. Defence Technology. DOI: 10.1016/j.dt.2024.07.012. Online publication date: Jul-2024
  • (2018) Interactive Discovery of Coordinated Relationship Chains with Maximum Entropy Models. ACM Transactions on Knowledge Discovery from Data 12:1 (1-34). DOI: 10.1145/3047017. Online publication date: 31-Jan-2018
  • (2018) F2ConText: how to extract holistic contexts of persons of interest for enhancing exploratory analysis. Knowledge and Information Systems. DOI: 10.1007/s10115-018-1304-9. Online publication date: 17-Dec-2018
  • (2015) A metric for Literature-Based Discovery methodology evaluation. 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA) (1-5). DOI: 10.1109/AICCSA.2015.7507092. Online publication date: Nov-2015
  • (2014) The human is the loop. Journal of Intelligent Information Systems 43:3 (411-435). DOI: 10.1007/s10844-014-0304-9. Online publication date: 1-Dec-2014
  • (2014) Uncovering the plot. Data Mining and Knowledge Discovery 28:5-6 (1398-1428). DOI: 10.1007/s10618-014-0370-1. Online publication date: 1-Sep-2014
  • (2013) "Metro maps of information" by Dafna Shahaf, Carlos Guestrin and Eric Horvitz, with Ching-man Au Yeung as coordinator. ACM SIGWEB Newsletter 2013:Spring (1-9). DOI: 10.1145/2451836.2451840. Online publication date: 1-Apr-2013
  • (2012) Connecting Two (or Less) Dots. ACM Transactions on Knowledge Discovery from Data 5:4 (1-31). DOI: 10.1145/2086737.2086744. Online publication date: 1-Feb-2012
  • (2011) Ranking inter-relationships between clusters. International Journal of Systems Science 42:12 (2071-2083). DOI: 10.1080/00207721003710649. Online publication date: 1-Dec-2011
  • (2009) Our Content. Proceedings of the 2009 Fourth International Conference on Internet and Web Applications and Services (655-660). DOI: 10.1109/ICIW.2009.105. Online publication date: 24-May-2009
