Abstract
Because it lacks structure, scholarly knowledge remains largely inaccessible to machines. Scholarly knowledge graphs have been proposed as a solution. Creating such a knowledge graph requires manual effort and domain experts, and is therefore time-consuming and cumbersome. In this work, we present a human-in-the-loop methodology for building a scholarly knowledge graph from literature survey articles. Survey articles often contain manually curated, high-quality tabular information that summarizes findings published in the scientific literature. Consequently, survey articles are an excellent resource for generating a scholarly knowledge graph. The presented methodology consists of five steps, in which tables and references are extracted from PDF articles, and the tables are formatted and finally ingested into the knowledge graph. To evaluate the methodology, 92 survey articles, containing 160 survey tables, have been imported into the graph. In total, 2,626 papers have been added to the knowledge graph using the presented methodology. The results demonstrate the feasibility of our approach, but also indicate that manual effort is required, which underscores the important role of human experts.
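The idea of ingesting a survey table into a graph can be illustrated with a minimal sketch. It assumes a table has already been extracted and cleaned into a header plus rows, with one row per surveyed paper; the function name `table_to_triples`, the `Reference` column name, and the sample rows are hypothetical placeholders and are not taken from the authors' scripts or data.

```python
from typing import List, Tuple

# (paper, property, value); all three are plain strings in this sketch.
Triple = Tuple[str, str, str]


def table_to_triples(
    header: List[str],
    rows: List[List[str]],
    paper_column: str = "Reference",  # assumed name of the column that identifies the paper
) -> List[Triple]:
    """Turn every non-empty cell of a survey table into a (paper, property, value) triple."""
    paper_idx = header.index(paper_column)
    triples: List[Triple] = []
    for row in rows:
        paper = row[paper_idx].strip()
        if not paper:
            # No paper identifier: left for a human curator to resolve.
            continue
        for idx, cell in enumerate(row):
            value = cell.strip()
            if idx == paper_idx or not value:
                continue  # skip the identifier column and empty cells
            triples.append((paper, header[idx].strip(), value))
    return triples


# Hypothetical example table, for illustration only.
header = ["Reference", "Approach", "Input format"]
rows = [
    ["Paper A", "rule-based", "PDF"],
    ["Paper B", "machine learning", ""],
]
triples = table_to_triples(header, rows)
for triple in triples:
    print(triple)
```

Each column header becomes a property and each cell a value attached to the paper in that row. Rows without a paper identifier are skipped rather than guessed, which is the kind of case where the manual effort mentioned above would come in.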
Notes
- 1.
- 2.
- 3.
- 4. Digital Object Identifier.
- 5. File 4_reference_extraction.py from https://doi.org/10.5281/zenodo.3739427.
- 6. File 5_build_graph.py from https://doi.org/10.5281/zenodo.3739427.
- 7.
Acknowledgements
This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and the TIB Leibniz Information Centre for Science and Technology. We want to thank our colleagues Mohamad Yaser Jaradeh and Kheir Eddine Farfar for their contributions to this work.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Oelen, A., Stocker, M., Auer, S. (2020). Creating a Scholarly Knowledge Graph from Survey Article Tables. In: Ishita, E., Pang, N.L.S., Zhou, L. (eds) Digital Libraries at Times of Massive Societal Transition. ICADL 2020. Lecture Notes in Computer Science, vol 12504. Springer, Cham. https://doi.org/10.1007/978-3-030-64452-9_35
DOI: https://doi.org/10.1007/978-3-030-64452-9_35
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64451-2
Online ISBN: 978-3-030-64452-9
eBook Packages: Computer Science (R0)