DOI: 10.1145/2723372.2735355
research-article

Even Metadata is Getting Big: Annotation Summarization using InsightNotes

Published: 27 May 2015

Abstract

In this paper, we demonstrate the InsightNotes system, a summary-based annotation management engine over relational databases. InsightNotes addresses the unique challenges that arise in modern applications (especially scientific applications) that rely on rich, large-scale repositories of curation and annotation information. In these applications, the number and size of the raw annotations may grow beyond what end-users and scientists can comprehend and analyze. InsightNotes overcomes these limitations by integrating mining and summarization techniques with the annotation management engine in novel ways. The objective is to create concise and meaningful representations of the raw annotations, called "annotation summaries", to serve as the basic unit of processing. The core functionalities of InsightNotes include: (1) Extensibility, where domain experts can define the summary types suitable for their application, (2) Incremental Maintenance, where the system efficiently maintains the annotation summaries under the continuous addition of new annotations, (3) Summary-Aware Query Processing and Propagation, where the execution engine and query operators are extended to manipulate and propagate the annotation summaries within the query pipeline under complex transformations, and (4) Zoom-in Query Processing, where end-users can interactively expand specific annotation summaries of interest and retrieve their detailed (raw) annotations. We will demonstrate the features of InsightNotes using a real-world annotated database from the ornithological domain (ornithology is the scientific study of birds). We will design an interactive demonstration that engages the audience in annotating the data, visualizing how annotations are summarized and propagated, and zooming in when desired to retrieve more details.
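
The four functionalities listed in the abstract can be illustrated with a small toy model. The sketch below is not the InsightNotes implementation or its API: the class `KeywordSummary`, its methods, the keyword-matching classifier, and the sample bird annotations are all hypothetical, chosen only to show how a summary could be defined by a domain expert, updated one annotation at a time, merged when tuples are combined by a query, and expanded back into raw annotations.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class KeywordSummary:
    """Hypothetical summary type: groups annotations under expert-defined labels."""
    labels: dict  # extensibility: label -> set of trigger keywords (assumed format)
    counts: Counter = field(default_factory=Counter)
    members: dict = field(default_factory=lambda: defaultdict(list))

    def classify(self, text):
        # Naive keyword match; a real system would use a mining technique.
        words = set(text.lower().split())
        for label, keywords in self.labels.items():
            if words & keywords:
                return label
        return "other"

    def add(self, annotation_id, text):
        # Incremental maintenance: fold in one new annotation without a rebuild.
        label = self.classify(text)
        self.counts[label] += 1
        self.members[label].append(annotation_id)

    def merge(self, other):
        # Propagation: combine the summaries of two tuples (for example in a
        # join), counting an annotation attached to both tuples only once.
        merged = KeywordSummary(self.labels)
        for source in (self, other):
            for label, ids in source.members.items():
                for annotation_id in ids:
                    if annotation_id not in merged.members[label]:
                        merged.members[label].append(annotation_id)
        merged.counts = Counter(
            {label: len(ids) for label, ids in merged.members.items()}
        )
        return merged

    def zoom_in(self, label, store):
        # Zoom-in: expand one summary entry into its raw annotations.
        return [store[i] for i in self.members.get(label, [])]


# Made-up raw annotations on a bird-observation tuple.
store = {
    1: "Observed unusual behavior during migration",
    2: "Possible disease symptoms on the wing",
    3: "Behavior matches earlier report",
}
labels = {
    "behavior": {"behavior", "migration"},
    "disease": {"disease", "symptoms"},
}

s = KeywordSummary(labels)
for annotation_id, text in store.items():
    s.add(annotation_id, text)

print(dict(s.counts))               # {'behavior': 2, 'disease': 1}
print(s.zoom_in("disease", store))  # ['Possible disease symptoms on the wing']
```

In this toy model the summary keeps the identifiers of the annotations it covers, which is what makes both deduplicated merging and zoom-in possible; whether InsightNotes stores summaries this way is not stated in the abstract.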

Cited By

  • (2019) Dataset search: a survey. The VLDB Journal, 29(1):251-272. DOI: 10.1007/s00778-019-00564-x. Online publication date: 24-Aug-2019.

Published In

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
May 2015
2110 pages
ISBN:9781450327589
DOI:10.1145/2723372

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. query processing
  2. scientific annotations
  3. summary-based annotation management

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation (NSF)

Conference

SIGMOD/PODS'15: International Conference on Management of Data
May 31 - June 4, 2015
Melbourne, Victoria, Australia

Acceptance Rates

SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 6
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 23 Dec 2024
