poster

Searching patterns for relation extraction over the web: rediscovering the pattern-relation duality

Published: 09 February 2011

Abstract

While tuple extraction for a given relation has been an active research area, its dual problem of pattern search (finding and ranking patterns in a principled way) has not been studied explicitly. In this paper, we propose and address the problem of pattern search, in addition to tuple extraction. As our objectives, we stress reusability for pattern search and scalability for tuple extraction, so that our approach can be applied to very large corpora such as the Web. As the key foundation, we propose a conceptual model, PRDualRank, that captures the notion of precision and recall for both tuples and patterns in a principled way, leading to the "rediscovery" of the Pattern-Relation Duality: the formal quantification of the reinforcement between patterns and tuples with the metrics of precision and recall. We also develop a concrete framework for PRDualRank, guided by the principles of a perfect sampling process over a complete corpus. Finally, we evaluate our framework over the real Web. Experiments show that on all three target relations our principled approach greatly outperforms the previous state-of-the-art system in both effectiveness and efficiency. In particular, we improve the optimal F-score by up to 64%.
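The reinforcement between patterns and tuples can be pictured with a small sketch. The abstract does not give PRDualRank's formulas, so the code below is only a generic illustration and not the authors' method: it assumes a hypothetical bipartite co-occurrence graph between patterns and tuples, a hypothetical function `propagate_precision` that averages precision scores back and forth across that graph, and the standard balanced F-score. The example patterns, tuples, and counts are invented for illustration.

```python
from collections import defaultdict


def propagate_precision(edges, seed_precision, rounds=10):
    """Illustrative mutual reinforcement (an assumption, not PRDualRank itself).

    edges: dict mapping (pattern, tuple) -> positive co-occurrence count.
    seed_precision: dict mapping labeled tuples -> known precision in [0, 1].
    """
    by_pattern = defaultdict(dict)
    by_tuple = defaultdict(dict)
    for (pattern, tup), count in edges.items():
        by_pattern[pattern][tup] = count
        by_tuple[tup][pattern] = count

    tuple_prec = {t: seed_precision.get(t, 0.0) for t in by_tuple}
    pattern_prec = {p: 0.0 for p in by_pattern}

    for _ in range(rounds):
        # A pattern's score is the count-weighted mean score of its tuples.
        for p, tups in by_pattern.items():
            total = sum(tups.values())
            pattern_prec[p] = sum(tuple_prec[t] * n for t, n in tups.items()) / total
        # An unlabeled tuple's score is the count-weighted mean of its patterns.
        for t, pats in by_tuple.items():
            if t in seed_precision:
                continue  # labeled seeds keep their given score
            total = sum(pats.values())
            tuple_prec[t] = sum(pattern_prec[p] * n for p, n in pats.items()) / total
    return pattern_prec, tuple_prec


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data for a "birthplace" relation.
edges = {
    ("X was born in Y", ("Mozart", "Salzburg")): 2,
    ("X was born in Y", ("Bach", "Eisenach")): 1,
    ("X visited Y", ("Bach", "Eisenach")): 1,
    ("X visited Y", ("Tourist", "Paris")): 3,
}
seeds = {("Mozart", "Salzburg"): 1.0, ("Tourist", "Paris"): 0.0}

pattern_prec, tuple_prec = propagate_precision(edges, seeds)
for pattern, score in sorted(pattern_prec.items(), key=lambda kv: -kv[1]):
    print(f"{pattern}: {score:.3f}")
```

In this toy setting the pattern that co-occurs mostly with the known-correct seed ranks above the one that co-occurs mostly with the known-incorrect seed, which is the kind of pattern ranking the abstract describes as pattern search.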

    Published In

    WSDM '11: Proceedings of the fourth ACM international conference on Web search and data mining
    February 2011
    870 pages
    ISBN:9781450304931
    DOI:10.1145/1935826

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. information extraction
    2. pattern
    3. ranking
    4. relation
    5. web mining

    Qualifiers

    • Poster

    Acceptance Rates

    WSDM '11 Paper Acceptance Rate 83 of 372 submissions, 22%;
    Overall Acceptance Rate 498 of 2,863 submissions, 17%
