
Creating a Scholarly Knowledge Graph from Survey Article Tables

  • Conference paper
Digital Libraries at Times of Massive Societal Transition (ICADL 2020)

Abstract

Because it lacks structure, scholarly knowledge remains largely inaccessible to machines. Scholarly knowledge graphs have been proposed as a solution. Creating such a knowledge graph requires manual effort from domain experts and is therefore time-consuming and cumbersome. In this work, we present a human-in-the-loop methodology for building a scholarly knowledge graph from literature survey articles. Survey articles often contain manually curated, high-quality tabular information that summarizes findings published in the scientific literature, which makes them an excellent resource for generating a scholarly knowledge graph. The presented methodology consists of five steps, in which tables and references are extracted from PDF articles, tables are formatted, and the result is finally ingested into the knowledge graph. To evaluate the methodology, 92 survey articles containing 160 survey tables were imported into the graph. In total, 2,626 papers were added to the knowledge graph using the presented methodology. The results demonstrate the feasibility of our approach, but also indicate that manual effort is required and thus underscore the important role of human experts.
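As a rough illustration of the final ingestion idea only, the following minimal Python sketch turns one already extracted and formatted survey table into (paper, property, value) statements. It is not the authors' implementation: the function name `table_to_triples`, the citation-key convention, and all sample values are hypothetical, and the assumption that each table row describes exactly one cited paper is made here purely for the example.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def table_to_triples(
    header: List[str],
    rows: List[List[str]],
    references: Dict[str, str],
    key_column: int = 0,
) -> List[Triple]:
    """Turn one survey table into (paper, property, value) triples.

    Assumption for this sketch: each row describes one paper, and the cell
    in `key_column` holds a citation key (e.g. "[1]") that is resolved to a
    paper label through the `references` mapping.
    """
    triples: List[Triple] = []
    for row in rows:
        key = row[key_column].strip()
        paper = references.get(key)
        if paper is None:
            # Unresolved citations are skipped so that a human curator
            # can review them instead of the code guessing a match.
            continue
        for column, cell in enumerate(row):
            value = cell.strip()
            if column == key_column or not value:
                continue  # skip the key cell itself and empty cells
            triples.append((paper, header[column], value))
    return triples


# Hypothetical sample data, not taken from the paper.
header = ["Reference", "Approach", "Dataset"]
rows = [
    ["[1]", "rule-based", "dataset-x"],
    ["[2]", "neural", ""],
    ["[9]", "hybrid", "dataset-y"],  # no matching reference entry
]
references = {"[1]": "Paper A", "[2]": "Paper B"}

triples = table_to_triples(header, rows, references)
for triple in triples:
    print(triple)
```

Skipping rows with unresolved citations, rather than guessing, mirrors the human-in-the-loop emphasis of the abstract: anything the code cannot map with certainty is left for an expert to check.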

Notes

  1. https://www.semanticscholar.org.

  2. https://tabula.technology.

  3. https://doi.org/10.5281/zenodo.3739427.

  4. Digital Object Identifier.

  5. File 4_reference_extraction.py from https://doi.org/10.5281/zenodo.3739427.

  6. File 5_build_graph.py from https://doi.org/10.5281/zenodo.3739427.

  7. https://gitlab.com/TIBHannover/orkg/orkg-papers/-/blob/master/question-answering-import.py.

Acknowledgements

This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and the TIB Leibniz Information Centre for Science and Technology. We want to thank our colleagues Mohamad Yaser Jaradeh and Kheir Eddine Farfar for their contributions to this work.

Author information

Corresponding author

Correspondence to Allard Oelen.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Oelen, A., Stocker, M., Auer, S. (2020). Creating a Scholarly Knowledge Graph from Survey Article Tables. In: Ishita, E., Pang, N.L.S., Zhou, L. (eds) Digital Libraries at Times of Massive Societal Transition. ICADL 2020. Lecture Notes in Computer Science, vol 12504. Springer, Cham. https://doi.org/10.1007/978-3-030-64452-9_35

  • DOI: https://doi.org/10.1007/978-3-030-64452-9_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64451-2

  • Online ISBN: 978-3-030-64452-9

  • eBook Packages: Computer Science, Computer Science (R0)
