DOI: 10.1145/2467696.2467753
poster

Evaluation of header metadata extraction approaches and tools for scientific PDF documents

Published: 22 July 2013

Abstract

This paper evaluates the performance of tools for the extraction of metadata from scientific articles. Accurate metadata extraction is an important task for automating the management of digital libraries. This comparative study is a guide for developers looking to integrate the most suitable and effective metadata extraction tool into their software. We shed light on the strengths and weaknesses of seven tools in common use. In our evaluation using papers from the arXiv collection, GROBID delivered the best results, followed by Mendeley Desktop. SciPlore Xtract, PDFMeat, and SVMHeaderParse also delivered good results depending on the metadata type to be extracted.

    Published In

    JCDL '13: Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
    July 2013
    480 pages
    ISBN: 9781450320771
    DOI: 10.1145/2467696
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. PDF
    2. evaluation
    3. information retrieval
    4. metadata extraction

    Qualifiers

    • Poster

    Conference

    JCDL '13: 13th ACM/IEEE-CS Joint Conference on Digital Libraries
    July 22-26, 2013
    Indianapolis, Indiana, USA

    Acceptance Rates

    JCDL '13 paper acceptance rate: 28 of 95 submissions (29%)
    Overall acceptance rate: 415 of 1,482 submissions (28%)

    Cited By

    • (2024) Text Processing and Analysis Pipeline for Scientific Literature. 2024 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), 1-5. DOI: 10.1109/ACCAI61061.2024.10602138. Online publication date: 9-May-2024
    • (2024) Comparing free reference extraction pipelines. International Journal on Digital Libraries 25:4, 841-853. DOI: 10.1007/s00799-024-00404-6. Online publication date: 1-Dec-2024
    • (2023) Ekstrakcija metapodatkov s pomočjo strojnega učenja. Moderna arhivistika 2023 (6):2, 255-269. DOI: 10.54356/MA/2023/VRNY7665. Online publication date: 17-Dec-2023
    • (2023) Fusion of blockchain and IoT in scientific publishing. Future Generation Computer Systems 142:C, 248-275. DOI: 10.1016/j.future.2022.12.036. Online publication date: 1-May-2023
    • (2023) A Benchmark of PDF Information Extraction Tools Using a Multi-task and Multi-domain Evaluation Framework for Academic Documents. Information for a Better World: Normality, Virtuality, Physicality, Inclusivity, 383-405. DOI: 10.1007/978-3-031-28032-0_31. Online publication date: 13-Mar-2023
    • (2022) Vision and natural language for metadata extraction from scientific PDF documents. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, 1-5. DOI: 10.1145/3529372.3533295. Online publication date: 20-Jun-2022
    • (2022) Nirjas: An open source framework for extracting metadata from the source code. 2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 47-52. DOI: 10.1109/Confluence52989.2022.9734222. Online publication date: 27-Jan-2022
    • (2021) Environmental Hydraulics in the New Millennium: Historical Evolution and Recent Research Trends. Water 13:8, 1021. DOI: 10.3390/w13081021. Online publication date: 8-Apr-2021
    • (2021) Dijital Kütüphanelerde Dokümanlardan Bilgi Geri Kazanımı için Kullanılan Güncel Teknolojiler: Derleme Çalışması (Current Technologies for Information Retrieval of Documents in Digital Libraries: A Survey). Düzce Üniversitesi Bilim ve Teknoloji Dergisi 9:1, 79-91. DOI: 10.29130/dubited.796964. Online publication date: 31-Jan-2021
    • (2021) MexPub: Deep Transfer Learning for Metadata Extraction from German Publications. 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 250-253. DOI: 10.1109/JCDL52503.2021.00076. Online publication date: Sep-2021
