Creating a Scholarly Knowledge Graph from Survey Article Tables

Oelen, Allard; Stocker, Markus; Auer, Sören

doi:10.1007/978-3-030-64452-9_35

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12504)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

Due to its lack of structure, scholarly knowledge remains hardly accessible to machines. Scholarly knowledge graphs have been proposed as a solution. Creating such a knowledge graph requires manual effort and domain experts, and is therefore time-consuming and cumbersome. In this work, we present a human-in-the-loop methodology for building a scholarly knowledge graph from literature survey articles. Survey articles often contain manually curated, high-quality tabular information that summarizes findings published in the scientific literature. Consequently, survey articles are an excellent resource for generating a scholarly knowledge graph. The presented methodology consists of five steps, in which tables and references are extracted from PDF articles, and tables are formatted and finally ingested into the knowledge graph. To evaluate the methodology, 92 survey articles, containing 160 survey tables, have been imported into the graph. In total, 2,626 papers have been added to the knowledge graph using the presented methodology. The results demonstrate the feasibility of our approach, but also indicate that manual effort is required and thus underscore the important role of human experts.
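
As a rough illustration of the formatting and ingestion steps mentioned in the abstract, the sketch below turns an already extracted and cleaned survey table into simple (paper, property, value) statements. It is not the authors' implementation, which is published in the Zenodo repository listed in the Notes. The function name `table_to_statements`, the `paper` column name, and the sample rows are all hypothetical and chosen only for illustration.

```python
import csv
import io


def table_to_statements(csv_text, paper_column="paper"):
    """Turn each row of a survey table into (paper, property, value) statements.

    Assumption: the table has already been extracted from the PDF and cleaned,
    so that one column identifies the reviewed paper and every other column
    is a property being compared across papers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    statements = []
    for row in reader:
        paper = row[paper_column].strip()
        for column, value in row.items():
            # Skip the identifying column and any overflow cells without a header.
            if column is None or column == paper_column:
                continue
            value = (value or "").strip()
            if value:  # empty cells produce no statement
                statements.append((paper, column.strip(), value))
    return statements


# Hypothetical sample table; the papers and values are made up.
example_table = """paper,approach,dataset
Smith 2015,rule-based,DBpedia
Lee 2017,neural,
"""

statements = table_to_statements(example_table)
for statement in statements:
    print(statement)
```

Keeping the output as plain tuples leaves the choice of graph backend open. In a human-in-the-loop setting, a curator could review these statements before anything is written to the knowledge graph.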

Notes

1. https://www.semanticscholar.org
2. https://tabula.technology
3. https://doi.org/10.5281/zenodo.3739427
4. Digital Object Identifier.
5. File 4_reference_extraction.py from https://doi.org/10.5281/zenodo.3739427
6. File 5_build_graph.py from https://doi.org/10.5281/zenodo.3739427
7. https://gitlab.com/TIBHannover/orkg/orkg-papers/-/blob/master/question-answering-import.py

Acknowledgements

This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and the TIB Leibniz Information Centre for Science and Technology. We want to thank our colleagues Mohamad Yaser Jaradeh and Kheir Eddine Farfar for their contributions to this work.

Author information

Authors and Affiliations

L3S Research Center, Leibniz University of Hannover, Hannover, Germany
Allard Oelen & Sören Auer
TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
Allard Oelen, Markus Stocker & Sören Auer

Authors

Allard Oelen
Markus Stocker
Sören Auer

Corresponding author

Correspondence to Allard Oelen.

Editor information

Editors and Affiliations

Kyushu University, Fukuoka, Japan
Emi Ishita
National University of Singapore, Singapore, Singapore
Natalie Lee San Pang
Wuhan University, Wuhan, China
Lihong Zhou

Cite this paper

Oelen, A., Stocker, M., Auer, S. (2020). Creating a Scholarly Knowledge Graph from Survey Article Tables. In: Ishita, E., Pang, N.L.S., Zhou, L. (eds) Digital Libraries at Times of Massive Societal Transition. ICADL 2020. Lecture Notes in Computer Science, vol 12504. Springer, Cham. https://doi.org/10.1007/978-3-030-64452-9_35

DOI: https://doi.org/10.1007/978-3-030-64452-9_35
Published: 26 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64451-2
Online ISBN: 978-3-030-64452-9
eBook Packages: Computer Science, Computer Science (R0)