DOI: 10.1145/1772690.1772846
poster

Entity relation discovery from web tables and links

Published: 26 April 2010

Abstract

The World-Wide Web consists not only of a huge number of unstructured texts but also of a vast amount of valuable structured data. Web tables [2] are a typical kind of structured information that is pervasive on the web, and web-scale methods that automatically extract web tables have been studied extensively [1]. Many powerful systems (e.g., OCTOPUS [4], Mesa [3]) use extracted web tables as a fundamental component.
In the database vernacular, a table is defined as a set of tuples which have the same attributes. Similarly, a web table is defined as a set of rows (corresponding to database tuples) which have the same column headers (corresponding to database attributes). Therefore, to extract a web table is to extract a relation on the web. In databases, tables often contain foreign keys which refer to other tables. It follows that hyperlinks inside a web table sometimes function as foreign keys to other relations whose tuples are contained in the hyperlinks' target pages. In this paper, we explore this idea by asking: can we discover new attributes for web tables by exploring hyperlinks inside web tables?
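The foreign-key analogy can be illustrated with a minimal sketch. It is not taken from the poster: the `WebTable` class, the `join_on_link` helper, the example URLs, and all data values are hypothetical, and target pages are assumed to be already parsed into attribute/value dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WebTable:
    """A web table viewed as a relation (hypothetical model)."""
    headers: List[str]                 # column headers = attributes
    rows: List[Dict[str, str]]         # rows = tuples
    # links[i] maps a column name to the URL hyperlinked from that cell in row i
    links: List[Dict[str, str]] = field(default_factory=list)


def join_on_link(table: WebTable, column: str,
                 pages: Dict[str, Dict[str, str]]) -> List[Dict[str, str]]:
    """Treat the hyperlink in `column` as a foreign key into `pages`
    (URL -> attributes found on the target page) and join the two."""
    joined = []
    for row, row_links in zip(table.rows, table.links):
        url = row_links.get(column)
        extra = pages.get(url, {}) if url else {}
        joined.append({**row, **extra})
    return joined


# Made-up example data: two rows whose "Title" cells are hyperlinked.
films = WebTable(
    headers=["Title", "Year"],
    rows=[{"Title": "Film A", "Year": "2001"},
          {"Title": "Film B", "Year": "2005"}],
    links=[{"Title": "http://example.org/film-a"},
           {"Title": "http://example.org/film-b"}],
)
target_pages = {
    "http://example.org/film-a": {"Director": "Person X"},
    "http://example.org/film-b": {"Director": "Person Y"},
}

extended = join_on_link(films, "Title", target_pages)
```

In this sketch each row of `extended` carries a `Director` value that was not in the original table, which is the sense in which a hyperlink acts like a foreign key.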
This poster proposes a solution that takes a web table as input. Frequent patterns are generated as candidate new relations by following the hyperlinks in the web table. The confidence of each candidate is evaluated, and trustworthy candidates are selected to become new attributes of the table. Finally, we show the usefulness of our method through experiments on a variety of web domains.
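The abstract does not spell out how patterns are mined or how confidence is computed, so the following is only a rough sketch of the general shape of such a pipeline under stated assumptions. It replaces frequent-pattern mining with simple per-label counting, and it uses a made-up confidence measure (the fraction of supporting rows in which a label yields exactly one value). The function name `discover_attributes`, both thresholds, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def discover_attributes(
    linked_pages: List[List[Tuple[str, str]]],
    min_support: float = 0.6,
    min_confidence: float = 0.8,
) -> Dict[str, List[Optional[str]]]:
    """linked_pages[i] holds the (label, value) pairs assumed to have been
    extracted from the hyperlink target page of table row i."""
    n = len(linked_pages)
    if n == 0:
        return {}

    support = Counter()        # rows whose target page mentions the label
    unambiguous = Counter()    # rows where the label has exactly one value
    values = defaultdict(dict)

    for i, pairs in enumerate(linked_pages):
        per_label = defaultdict(set)
        for label, value in pairs:
            per_label[label].add(value)
        for label, vals in per_label.items():
            support[label] += 1
            if len(vals) == 1:
                unambiguous[label] += 1
                values[label][i] = next(iter(vals))

    new_columns = {}
    for label, count in support.items():
        sup = count / n                    # how frequent the candidate is
        conf = unambiguous[label] / count  # assumed confidence measure
        if sup >= min_support and conf >= min_confidence:
            # One cell per row; None where no single value was found.
            new_columns[label] = [values[label].get(i) for i in range(n)]
    return new_columns


# Made-up input: three rows, each with pairs from its linked page.
pages = [
    [("Director", "X"), ("Genre", "Drama")],
    [("Director", "Y"), ("Genre", "Drama"), ("Genre", "Comedy")],
    [("Director", "Z")],
]
result = discover_attributes(pages)
```

With these inputs, `Director` appears once on every target page and is kept, while `Genre` is frequent enough but ambiguous on one of its two pages, so it falls below the assumed confidence threshold and is dropped.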

References

[1] G. Miao, J. Tatemura, W.-P. Hsiung, A. Sawires and L. E. Moser, Extracting data records from the web using tag path clustering, In WWW, p.981--990, 2009.
[2] M. J. Cafarella, A. Y. Halevy, D. Z. Wang, E. Wu and Y. Zhang, WebTables: exploring the power of tables on the web, In VLDB, p.538--549, 2008.
[3] S. Mergen, J. Freire and C. Heuser, Mesa: A Search Engine for Querying Web Tables, In SBBD, demo, 2008.
[4] M. J. Cafarella, A. Y. Halevy and N. Khoussainova, Data Integration for the Relational Web, In VLDB, p.1090--1101, 2009.
[5] J. Han and J. Pei, Mining Frequent Patterns by Pattern-Growth: Methodology and Implications, In SIGKDD Explorations, p.13--20, 2000.
[6] A. Yates, M. Banko, M. Broadhead, M. J. Cafarella, O. Etzioni and S. Soderland, TextRunner: Open Information Extraction on the Web, In HLT-NAACL, p.25--26, 2007.
[7] A. Culotta, A. McCallum and J. Betz, Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text, In HLT-NAACL, 2006.

    Published In

    WWW '10: Proceedings of the 19th international conference on World wide web
    April 2010
    1407 pages
    ISBN:9781605587998
    DOI:10.1145/1772690

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. entity relation discovery
    2. link
    3. web table

    Qualifiers

    • Poster

    Conference

    WWW '10: The 19th International World Wide Web Conference
    April 26 - 30, 2010
    Raleigh, North Carolina, USA

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Cited By

    • (2019) Combining URL and HTML Features for Entity Discovery in the Web. ACM Transactions on the Web 13:4 (1-27). DOI: 10.1145/3365574. Online publication date: 4-Dec-2019
    • (2018) On the Generative Discovery of Structured Medical Knowledge. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2720-2728). DOI: 10.1145/3219819.3220010. Online publication date: 19-Jul-2018
    • (2016) Streamlining Management of Multiple Cloud Services. 2016 IEEE 9th International Conference on Cloud Computing (CLOUD) (481-488). DOI: 10.1109/CLOUD.2016.0070. Online publication date: Jun-2016
    • (2014) Automatic Extraction of Logical Web Lists. Foundations of Intelligent Systems (365-374). DOI: 10.1007/978-3-319-08326-1_37. Online publication date: 2014
    • (2013) The parallel path framework for entity discovery on the web. ACM Transactions on the Web 7:3 (1-29). DOI: 10.1145/2516633.2516638. Online publication date: 30-Sep-2013
    • (2012) Building enriched web page representations using link paths. Proceedings of the 23rd ACM conference on Hypertext and social media (53-62). DOI: 10.1145/2309996.2310006. Online publication date: 25-Jun-2012
