Article (DOI: 10.1109/ICDE.2006.83)

Integrating Unstructured Data into Relational Databases

Published: 03 April 2006
  Abstract

    In this paper we present a system for automatically integrating unstructured text into a multi-relational database using state-of-the-art statistical models for structure extraction and matching. We show how to extend current high-performing models, Conditional Random Fields and their semi-Markov counterparts, to effectively exploit a variety of recognition clues available in a database of entities, thereby significantly reducing the dependence on manually labeled training data. Our system is designed to load unstructured records into columns spread across multiple tables in the database while resolving the relationship of the extracted text with existing column values and preserving the cardinality and link constraints of the database. We show how to combine the inference algorithms of statistical models with the database-imposed constraints for optimal data integration.
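
    The abstract combines two ideas: a semi-Markov model that labels whole segments of text (using existing database values as recognition clues) and decoding that respects database constraints. The Python sketch below illustrates both under loudly stated assumptions; it is not the authors' system. The segment weights are hand-set rather than learned by CRF training, and the toy database, the column names `author`, `venue` and `year`, and the functions `segment_score` and `decode` are all hypothetical. The only constraint modeled is a simplified cardinality rule (each single-valued column receives at most one segment); linking extracted values to existing rows across multiple tables is not shown.

```python
from typing import Dict, List, Optional, Set, Tuple

NEG = float("-inf")
OTHER = "other"
MAX_SEG = 3  # longest segment considered, in tokens (assumed limit)

# Hypothetical toy database: existing values of three single-valued columns.
DB: Dict[str, Set[str]] = {
    "author": {"jane doe", "john smith"},
    "venue": {"icde", "vldb"},
    "year": {"2006"},
}
LABELS = list(DB)


def segment_score(tokens: List[str], label: str) -> float:
    """Score one candidate segment; weights are hand-set for illustration, not learned."""
    if label == OTHER:
        return -0.5 if len(tokens) == 1 else NEG  # 'other' covers single tokens only
    text = " ".join(tokens).lower()
    score = -1.0  # per-segment bias
    if text in DB[label]:
        score += 3.0  # whole segment equals an existing column value
    elif any(t.lower() in v.split() for v in DB[label] for t in tokens):
        score += 1.0  # some token appears inside an existing column value
    if label == "year" and text.isdigit() and len(text) == 4:
        score += 1.0  # pattern clue: four-digit number
    return score


def decode(tokens: List[str]) -> List[Tuple[str, str]]:
    """Semi-Markov Viterbi; a bitmask records which columns are already filled."""
    n, full = len(tokens), 1 << len(LABELS)
    best = [[NEG] * full for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int, str]]]] = [[None] * full for _ in range(n + 1)]
    best[0][0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_SEG), end):
            seg = tokens[start:end]
            for mask in range(full):
                if best[start][mask] == NEG:
                    continue
                # 'other' keeps the mask; a column label may be used at most once.
                options = [(OTHER, mask)] + [
                    (lab, mask | (1 << k))
                    for k, lab in enumerate(LABELS)
                    if not mask & (1 << k)
                ]
                for lab, new_mask in options:
                    s = best[start][mask] + segment_score(seg, lab)
                    if s > best[end][new_mask]:
                        best[end][new_mask] = s
                        back[end][new_mask] = (start, mask, lab)
    # Pick the best final state, then follow back-pointers to recover segments.
    mask = max(range(full), key=lambda m: best[n][m])
    segments, end = [], n
    while end > 0:
        start, mask, lab = back[end][mask]
        segments.append((lab, " ".join(tokens[start:end])))
        end = start
    return segments[::-1]


print(decode("Jane Doe . ICDE 2006".split()))
# [('author', 'Jane Doe'), ('other', '.'), ('venue', 'ICDE'), ('year', '2006')]
```

    Tracking filled columns in a bitmask multiplies the decoder's state space by 2^k for k constrained columns, which is cheap for a handful of columns; on the input "Jane Doe John Smith" it forces one of the two names into `other` tokens even though both match the `author` column. How the paper itself folds constraints into inference may differ from this simplification.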


    Information

    Published In

    ICDE '06: Proceedings of the 22nd International Conference on Data Engineering
    April 2006
    ISBN:0769525709

    Publisher

    IEEE Computer Society

    United States

    Cited By

    • (2022) LILLIE. Information Systems 105:C. DOI: 10.1016/j.is.2021.101938. Online publication date: 1-Mar-2022.
    • (2017) CNN-IETS. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1159-1168. DOI: 10.1145/3132847.3132962. Online publication date: 6-Nov-2017.
    • (2017) Towards heterogeneous keyword search. Proceedings of the ACM Turing 50th Celebration Conference - China, pp. 1-6. DOI: 10.1145/3063955.3064802. Online publication date: 12-May-2017.
    • (2013) Exploiting a proximity-based positional model to improve the quality of information extraction by text segmentation. Proceedings of the Twenty-Fourth Australasian Database Conference - Volume 137, pp. 23-31. DOI: 10.5555/2525416.2525419. Online publication date: 29-Jan-2013.
    • (2013) The parallel path framework for entity discovery on the web. ACM Transactions on the Web 7:3, pp. 1-29. DOI: 10.1145/2516633.2516638. Online publication date: 30-Sep-2013.
    • (2013) Exploring structure and content on the web. Proceedings of the sixth ACM international conference on Web search and data mining, pp. 779-780. DOI: 10.1145/2433396.2433499. Online publication date: 4-Feb-2013.
    • (2012) Conceptual views for entity-centric search. Computer Science - Research and Development 27:1, pp. 65-79. DOI: 10.1007/s00450-011-0179-8. Online publication date: 1-Feb-2012.
    • (2012) Self-supervised learning approach for extracting citation information on the web. Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications, pp. 719-726. DOI: 10.1007/978-3-642-29253-8_69. Online publication date: 11-Apr-2012.
    • (2011) Enabling information extraction by inference of regular expressions from sample entities. Proceedings of the 20th ACM international conference on Information and knowledge management, pp. 1285-1294. DOI: 10.1145/2063576.2063763. Online publication date: 24-Oct-2011.
    • (2011) Semi-supervised multi-task learning of structured prediction models for web information extraction. Proceedings of the 20th ACM international conference on Information and knowledge management, pp. 957-966. DOI: 10.1145/2063576.2063713. Online publication date: 24-Oct-2011.