Poster. DOI: 10.1145/1871437.1871698

Extracting structured information from Wikipedia articles to populate infoboxes

Published: 26 October 2010

Abstract

Roughly every third Wikipedia article contains an infobox - a table that displays important facts about the subject in attribute-value form. The schema of an infobox, i.e., the attributes that can be expressed for a concept, is defined by an infobox template. Often, authors do not specify all template attributes, resulting in incomplete infoboxes.
With iPopulator, we introduce a system that automatically populates infoboxes of Wikipedia articles by extracting attribute values from the article's text. In contrast to prior work, iPopulator detects and exploits the structure of attribute values to independently extract value parts. We have tested iPopulator on the entire set of infobox templates and provide a detailed analysis of its effectiveness. For instance, we achieve an average extraction precision of 91% for 1,727 distinct infobox template attributes.
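
To illustrate what "exploiting the structure of attribute values" could mean in practice, the following is a minimal sketch and not iPopulator's actual implementation. It assumes a hypothetical coarse tokenizer with two token classes (`NUM` and `TEXT`) and treats punctuation as structural markers; the function names `tokenize`, `value_pattern`, `dominant_pattern`, and `value_parts` are illustrative only. The idea is that many values of one attribute (for example, a number of employees followed by a year in parentheses) share a pattern, and once that pattern is known, each part can be handled on its own.

```python
import re
from collections import Counter

# Hypothetical token classes: numbers, words, and single punctuation marks.
TOKEN_RE = re.compile(r"\d[\d,.]*|[^\W\d_]+|\S")


def tokenize(value: str) -> list[tuple[str, str]]:
    """Split a value into (kind, text) pairs; adjacent words merge into one TEXT part."""
    parts: list[tuple[str, str]] = []
    for tok in TOKEN_RE.findall(value):
        if tok[0].isdigit():
            kind = "NUM"
        elif tok[0].isalpha():
            kind = "TEXT"
        else:
            kind = tok  # punctuation is kept verbatim as a structural marker
        if kind == "TEXT" and parts and parts[-1][0] == "TEXT":
            parts[-1] = ("TEXT", parts[-1][1] + " " + tok)
        else:
            parts.append((kind, tok))
    return parts


def value_pattern(value: str) -> tuple[str, ...]:
    """Coarse structure of a value, e.g. ('NUM', '(', 'NUM', ')')."""
    return tuple(kind for kind, _ in tokenize(value))


def dominant_pattern(values: list[str]) -> tuple[str, ...]:
    """Most frequent structure among the known values of one attribute."""
    return Counter(value_pattern(v) for v in values).most_common(1)[0][0]


def value_parts(value: str) -> list[str]:
    """The NUM/TEXT parts of a value, i.e. the pieces to extract independently."""
    return [text for kind, text in tokenize(value) if kind in ("NUM", "TEXT")]


# Illustrative values for a made-up "number of employees" attribute.
examples = ["5,538 (2008)", "12,000 (2009)", "about 300"]
print(dominant_pattern(examples))   # ('NUM', '(', 'NUM', ')')
print(value_parts("5,538 (2008)"))  # ['5,538', '2008']
```

In this sketch the dominant pattern tells an extractor that the attribute usually consists of two numeric parts, so each part could be located in the article text separately rather than matching the whole value string at once.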

    Published In

    CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
    October 2010
    2036 pages
    ISBN:9781450300995
    DOI:10.1145/1871437

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. information extraction
    2. linked data
    3. wikipedia

    Qualifiers

    • Poster

    Conference

    CIKM '10

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2023) Vocational Domain Identification with Machine Learning and Natural Language Processing on Wikipedia Text: Error Analysis and Class Balancing. Computers 12:6 (111). DOI: 10.3390/computers12060111. Online publication date: 24-May-2023
    • (2023) Descartes: Generating Short Descriptions of Wikipedia Articles. Proceedings of the ACM Web Conference 2023 (1446-1456). DOI: 10.1145/3543507.3583220. Online publication date: 30-Apr-2023
    • (2022) Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks. ACM Transactions on Asian and Low-Resource Language Information Processing 22:2 (1-27). DOI: 10.1145/3543826. Online publication date: 8-Jul-2022
    • (2022) Machine Learning on Wikipedia Text for the Automatic Identification of Vocational Domains of Significance for Displaced Communities. 2022 17th International Workshop on Semantic and Social Media Adaptation & Personalization (SMAP) (1-6). DOI: 10.1109/SMAP56125.2022.9941803. Online publication date: 3-Nov-2022
    • (2021) A Century of French Railways: The Value of Remote Sensing and VGI in the Fusion of Historical Data. ISPRS International Journal of Geo-Information 10:3 (154). DOI: 10.3390/ijgi10030154. Online publication date: 10-Mar-2021
    • (2021) WDProp: Web Application to Analyse Multilingual Aspects of Wikidata Properties. Proceedings of the 17th International Symposium on Open Collaboration (1-12). DOI: 10.1145/3479986.3479996. Online publication date: 15-Sep-2021
    • (2021) ShExStatements: Simplifying Shape Expressions for Wikidata. Companion Proceedings of the Web Conference 2021 (610-615). DOI: 10.1145/3442442.3452349. Online publication date: 19-Apr-2021
    • (2021) DeepEx: A Robust Weak Supervision System for Knowledge Base Augmentation. Journal on Data Semantics. DOI: 10.1007/s13740-021-00134-x. Online publication date: 6-Jul-2021
    • (2021) When External Knowledge Does Not Aggregate in Named Entity Recognition. Intelligent Systems (616-627). DOI: 10.1007/978-3-030-91699-2_42. Online publication date: 28-Nov-2021
    • (2020) Competitor Mining from Web Encyclopedia: A Graph Embedding Approach. Web Information Systems Engineering – WISE 2020 (56-68). DOI: 10.1007/978-3-030-62005-9_5. Online publication date: 18-Oct-2020
