DOI: 10.1145/3309129.3309139
Research article

Biotable: A Tool to Extract Semantic Structure of Table in Biology Literature

Published: 27 December 2018

Abstract

The volume of published biological literature grows year by year, and important information in biomedical articles may appear only in tables. However, research on extracting information from tables is rare. At present, table mining is done in one of two ways. The first converts documents to HTML, but the quality of this conversion is poor. The second uses documents in XML format directly, but the number of available XML documents is limited. To address this problem, we propose Biotable, a tool for mining biological tables in PDF documents. After converting each page of the PDF into an image, we use the concept of Connected Value to locate the table boundary and each cell. To analyze table headers, we convert all heterogeneous table headers into a single row, which makes the semantics of each column easier to interpret. Using Biotable together with the pipeline proposed in QTLMiner, we performed a table mining experiment on the QTLMiner dataset. Table detection achieved a precision of 98.12% and a recall of 93.14%, and the recall for QTL statements was 86.53%.
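The abstract does not define Connected Value, so the sketch below does not reproduce it. It illustrates a related, standard idea under stated assumptions: on a binarized page image (assumed 1 = dark pixel, 0 = background), the ruling lines of a fully bordered table form one large connected component whose bounding box approximates the table boundary, and the background regions enclosed by those lines approximate the cells. The helper names `components`, `bbox`, and `table_and_cells` are hypothetical; borderless tables would need a different approach.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[int]]           # binarized page image: 1 = dark (ink) pixel, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive pixel coordinates


def components(grid: Grid, value: int) -> List[List[Tuple[int, int]]]:
    """Return the 4-connected regions of pixels whose value equals `value`."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if grid[r][c] != value or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited seed pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and grid[ny][nx] == value:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def bbox(region: List[Tuple[int, int]]) -> Box:
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    return min(ys), min(xs), max(ys), max(xs)


def table_and_cells(grid: Grid, min_area: int = 1) -> Tuple[Box, List[Box]]:
    """Hypothetical helper: table box from the largest ink component, cells from enclosed background."""
    # Assumes a fully ruled table whose lines form the largest ink component on the page.
    top, left, bottom, right = bbox(max(components(grid, 1), key=len))
    inner = [row[left:right + 1] for row in grid[top:bottom + 1]]
    cells = []
    for region in components(inner, 0):
        if len(region) < min_area:  # e.g. skip the holes inside letters such as "o"
            continue
        t, l, b, r = bbox(region)
        # Background touching the edge of the table box is not enclosed by ruling lines.
        if t > 0 and l > 0 and b < bottom - top and r < right - left:
            cells.append((t + top, l + left, b + top, r + left))
    return (top, left, bottom, right), sorted(cells)


if __name__ == "__main__":
    rows = [
        "0000000000",
        "0111111110",
        "0100010010",
        "0100010010",
        "0111111110",
        "0100010010",
        "0111111110",
        "0000000000",
    ]
    page = [[int(ch) for ch in row] for row in rows]
    table, cells = table_and_cells(page)
    print(table, cells)
```

For the 8×10 demo array, which draws a ruled 2×2 grid, the sketch returns the table box `(1, 1, 6, 8)` and four cell boxes. On real scans, `min_area` would need tuning so that enclosed gaps inside characters are not mistaken for cells.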
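The abstract states that Biotable rewrites heterogeneous table headers as a single row but does not say how. One common way to do this, shown here as a sketch and not necessarily Biotable's method, is to walk down each column and join the labels of every header row that covers it, so that a spanning parent label such as "Position" is prefixed to each of its child columns. The input layout (None for a continued horizontal span, "" for an empty cell), the `" | "` separator, and the function name `flatten_header` are assumptions for illustration.

```python
from typing import List, Optional


def flatten_header(rows: List[List[Optional[str]]], sep: str = " | ") -> List[str]:
    """Collapse a multi-row table header into one label per column.

    Hypothetical input layout: a cell spanning several columns is stored once, in the
    leftmost column it covers, with None in the other columns it spans; "" marks a
    genuinely empty cell (for example, below a label that spans rows vertically).
    Missing trailing cells in a short row are treated like None.
    """
    n_cols = max(len(row) for row in rows)
    parts: List[List[str]] = [[] for _ in range(n_cols)]
    for row in rows:
        current: Optional[str] = None
        for col in range(n_cols):
            cell = row[col] if col < len(row) else None
            if cell is not None:
                current = cell  # a new header cell starts in this column
            # A None cell keeps the previous label (span continues); "" adds nothing.
            if current:
                parts[col].append(current)
    return [sep.join(p) for p in parts]


if __name__ == "__main__":
    header = [
        ["QTL", "Position", None, "LOD"],
        ["", "Chr", "cM", ""],
    ]
    print(flatten_header(header))
```

For the hypothetical two-row header in the demo, the result is `['QTL', 'Position | Chr', 'Position | cM', 'LOD']`: each column now carries one self-describing label that downstream extraction can match against.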
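The abstract reports precision and recall for table detection without giving the underlying counts. As a reminder of what the two figures measure, the sketch below computes them from true-positive (`tp`), false-positive (`fp`), and false-negative (`fn`) counts; the counts used in the note afterwards are hypothetical, not Biotable's results.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of detected tables that are real tables."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of real tables that were detected."""
    return tp / (tp + fn) if tp + fn else 0.0
```

With hypothetical counts `tp = 90`, `fp = 10`, `fn = 30`, precision is 0.90 and recall is 0.75. Because both measures share the same `tp`, a detector whose precision exceeds its recall, as in the reported 98.12% versus 93.14%, misses more real tables than it reports spurious ones.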

References

[1] Li, F., Zhang, M., Fu, G., et al. A neural joint model for entity and relation extraction from biomedical text. BMC Bioinformatics, 2017, 18(1):198.
[2] Bastian, M. R., Purwarianti, A. Information extraction in statistics indicator tables using rule generalizations and ontology. International Conference on Information Technology Systems and Innovation. IEEE, 2017:1–6.
[3] Chavan, M. M., Shirgave, S. K. A Methodology for Extracting Head Contents from Meaningful Tables in Web Pages. International Conference on Communication Systems and Network Technologies. IEEE, 2011:272–277.
[4] Limaye, G., Sarawagi, S., Chakrabarti, S. Annotating and Searching Web Tables Using Entities, Types and Relationships. Proceedings of the VLDB Endowment, 2010, 3(3):1338–1347.
[5] Quercini, G., Reynaud, C. Entity discovery and annotation in tables. International Conference on Extending Database Technology, 2013:693–704.
[6] Peng, J., Shi, X., Sun, Y., et al. QTLMiner: QTL database curation by mining tables in literature. Bioinformatics, 2015, 31(10):1689.
[7] Fang, J., Mitra, P., Tang, Z., et al. Table header detection and classification. AAAI Conference on Artificial Intelligence, 2012.
[8] Liu, Y. TableSeer: Automatic Table Extraction, Search, and Understanding. ProQuest LLC, 2009:172.
[9] Corrêa, A. S., Zander, P. O. Unleashing Tabular Content to Open Data: A Survey on PDF Table Extraction Methods and Tools. International Conference on Digital Government Research. ACM, 2017:54–63.
[10] Singh, G., Kuzniar, A., van Mulligen, E., et al. QTLTableMiner++: semantic mining of QTL tables in scientific articles. BMC Bioinformatics, 2018, 19:183.
[11] Milosevic, N., Gregson, C., Hernandez, R., et al. Disentangling the Structure of Tables in Scientific Literature. International Conference on Applications of Natural Language to Information Systems. Springer, Cham, 2016:162–174.
[12] Xu, R., Wang, Q. Combining automatic table classification and relationship extraction in extracting anticancer drug-side effect pairs from full-text articles. Journal of Biomedical Informatics, 2015, 53:128–135.
[13] Schwartz, A. S., Hearst, M. A. A simple algorithm for identifying abbreviation definitions in biomedical text. Pacific Symposium on Biocomputing, 2003:451.
[14] Crestan, E., Pantel, P. Web-scale table census and classification. ACM International Conference on Web Search and Data Mining. ACM, 2011:545–554.
[15] Lehmberg, O., Ritze, D., Meusel, R., et al. A Large Public Corpus of Web Tables containing Time and Context Metadata. International Conference Companion on World Wide Web, 2016:75–76.
[16] Nishida, K., Sadamitsu, K., Higashinaka, R., et al. Understanding the Semantic Structures of Tables with a Hybrid Deep Neural Network Architecture. AAAI Conference on Artificial Intelligence, 2017.

Cited By

  • (2022) TableGraph: An Image Segmentation–Based Table Knowledge Interpretation Model for Civil and Construction Inspection Documentation. Journal of Construction Engineering and Management, 148(10). DOI: 10.1061/(ASCE)CO.1943-7862.0002346. Online publication date: October 2022.

    Information

    Published In

    ACM Other conferences
    ICBRA '18: Proceedings of the 5th International Conference on Bioinformatics Research and Applications
    December 2018
    111 pages
    ISBN:9781450366113
    DOI:10.1145/3309129
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    In-Cooperation

    • City University of Hong Kong

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. QTL
    2. Soybean
    3. Table extraction

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICBRA '18
