Abstract
With the rapid growth of research publications, empowering scientists to keep an oversight over scientific progress is of paramount importance. In this regard, the leaderboards facet of information organization provides an overview on the state-of-the-art by aggregating empirical results from various studies addressing the same research challenge. Crowdsourcing efforts like PapersWithCode among others are devoted to the construction of leaderboards predominantly for various subdomains in Artificial Intelligence. Leaderboards provide machine-readable scholarly knowledge that has proven to be directly useful for scientists to keep track of research progress – their construction could be greatly expedited with automated text mining.
This study presents a comprehensive approach for generating leaderboards for knowledge-graph-based scholarly information organization. Specifically, we investigate the problem of automated leaderboard construction using state-of-the-art transformer models, viz. Bert, SciBert, and XLNet. Our analysis reveals an optimal approach that significantly outperforms existing baselines for the task with evaluation scores above 90% in F1. This, in turn, offers new state-of-the-art results for leaderboard extraction. As a result, a vast share of empirical AI research can be organized in the next-generation digital libraries as knowledge graphs.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
- 2.
- 3.
- 4.
They also evaluated the extracting the best score as an automated task which proved very challenging owing to inconsistency with which the best scores are reported and thereby the inability of pdf-to-text extractors to mine the data effectively.
- 5.
- 6.
Our corpus was downloaded from the PwC Github repository https://github.com/paperswithcode/paperswithcode-data and was constructed by combining the information in the files All papers with abstracts and Evaluation tables which included article urls and TDM crowdsourced annotation metadata.
- 7.
References
AI metrics. https://www.eff.org/ai/metrics. Accessed 26 Apr 2021
Natural Language Inference. https://paperswithcode.com/task/natural-language-inference. Accessed 22 Apr 2021
Nlp-progress. http://nlpprogress.com/. Accessed 26 Apr 2021
paperswithcode.com. https://paperswithcode.com/. Accessed 26 Apr 2021
Reddit sota. https://github.com/RedditSota/state-of-the-art-result-for-machine-learning-problems. Accessed 26 Apr 2021
Squad explorer. https://rajpurkar.github.io/SQuAD-explorer/. Accessed 26 Apr 2021
Anteghini, M., D’Souza, J., Dos Santos, V.A.M., Auer, S.: SciBERT-based semantification of bioassays in the open research knowledge graph. In: EKAW-PD 2020, pp. 22–30 (2020)
Anteghini, M., D’Souza, J., Martins dos Santos, V.A.P., Auer, S.: Representing semantified biological assays in the open research knowledge graph. In: Ishita, E., Pang, N.L.S., Zhou, L. (eds.) ICADL 2020. LNCS, vol. 12504, pp. 89–98. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-64452-9_8
Auer, S.: Towards an open research knowledge graph, January 2018. https://doi.org/10.5281/zenodo.1157185
Augenstein, I., Das, M., Riedel, S., Vikraman, L., McCallum, A.: SemEval 2017 task 10: ScienceIE - extracting keyphrases and relations from scientific publications. In: SemEval@ACL (2017)
Beltagy, I., Lo, K., Cohan, A.: SciBERT: a pretrained language model for scientific text. arXiv preprint arXiv:1903.10676 (2019)
Brack, A., D’Souza, J., Hoppe, A., Auer, S., Ewerth, R., et al.: Domain-independent extraction of scientific concepts from research articles. In: Jose, J.M. (ed.) ECIR 2020. LNCS, vol. 12035, pp. 251–266. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45439-5_17
Chiarelli, A., Johnson, R., Richens, E., Pinfield, S.: Accelerating scholarly communication: the transformative role of preprints (2019)
Dai, Z., Yang, Z., Yang, Y., Carbonell, J.G., Le, Q., Salakhutdinov, R.: Transformer-XL: attentive language models beyond a fixed-length context. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2978–2988 (2019)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
D’Souza, J., Auer, S., Pedersen, T.: SemEval-2021 task 11: NLPcontributiongraph - structuring scholarly NLP contributions for a research knowledge graph. In: Proceedings of the Fifteenth Workshop on Semantic Evaluation. Association for Computational Linguistics, Bangkok, August 2021
D’Souza, J., Auer, S., Pederson, T.: SemEval-2021 task 11: NLPContributionGraph - structuring scholarly NLP contributions for a research knowledge graph, May 2021. https://zenodo.org/record/4737071
D’Souza, J., Hoppe, A., Brack, A., Jaradeh, M.Y., Auer, S., Ewerth, R.: The STEM-ECR dataset: grounding scientific entity references in stem scholarly content to authoritative encyclopedic and lexicographic sources. In: LREC, Marseille, France, pp. 2192–2203, May 2020
D’Souza, J., Auer, S.: Sentence, phrase, and triple annotations to build a knowledge graph of natural language processing contributions–a trial dataset. J. Data Inf. Sci. 20210429 (2021)
Gábor, K., Buscaldi, D., Schumann, A.K., QasemiZadeh, B., Zargayouna, H., Charnois, T.: SemEval-2018 task 7: semantic relation extraction and classification in scientific papers. In: Proceedings of The 12th International Workshop on Semantic Evaluation, pp. 679–688 (2018)
Ghasemi-Gol, M., Szekely, P.: TabVec: table vectors for classification of web tables. arXiv preprint arXiv:1802.06290 (2018)
Handschuh, S., QasemiZadeh, B.: The ACL RD-TEC: a dataset for benchmarking terminology extraction and classification in computational linguistics. In: COLING 2014: 4th International Workshop on Computational Terminology (2014)
Herzig, J., Nowak, P.K., Mueller, T., Piccinno, F., Eisenschlos, J.: TaPas: weakly supervised table parsing via pre-training. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 4320–4333 (2020)
Hou, Y., Jochim, C., Gleize, M., Bonin, F., Ganguly, D.: Identification of tasks, datasets, evaluation metrics, and numeric scores for scientific leaderboards construction. arXiv preprint arXiv:1906.09317 (2019)
Hou, Y., Jochim, C., Gleize, M., Bonin, F., Ganguly, D.: TDMSci: a specialized corpus for scientific literature entity tagging of tasks datasets and metrics. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 707–714 (2021)
Jain, S., van Zuylen, M., Hajishirzi, H., Beltagy, I.: SciREX: a challenge dataset for document-level information extraction. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7506–7516 (2020)
Jaradeh, M.Y., et al.: Open research knowledge graph: next generation infrastructure for semantic scholarly knowledge. In: Proceedings of the 10th International Conference on Knowledge Capture, pp. 243–246 (2019)
Jiang, M., D’Souza, J., Auer, S., Downie, J.S.: Improving scholarly knowledge representation: evaluating BERT-based models for scientific relation classification. In: Ishita, E., Pang, N.L.S., Zhou, L. (eds.) ICADL 2020. LNCS, vol. 12504, pp. 3–19. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-64452-9_1
Jinha, A.E.: Article 50 million: an estimate of the number of scholarly articles in existence. Learn. Publ. 23(3), 258–263 (2010)
Kardas, M., et al.: AxCell: automatic extraction of results from machine learning papers. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 8580–8594 (2020)
Kononova, O., et al.: Text-mined dataset of inorganic materials synthesis recipes. Sci. Data 6(1), 1–11 (2019)
Kulkarni, C., Xu, W., Ritter, A., Machiraju, R.: An annotated corpus for machine reading of instructions in wet lab protocols. In: NAACL: HLT, Volume 2 (Short Papers), New Orleans, Louisiana, pp. 97–106, June 2018. https://doi.org/10.18653/v1/N18-2016
Kuniyoshi, F., Makino, K., Ozawa, J., Miwa, M.: Annotating and extracting synthesis process of all-solid-state batteries from scientific literature. In: LREC, pp. 1941–1950 (2020)
Liu, Y., Bai, K., Mitra, P., Giles, C.L.: TableSeer: automatic table metadata extraction and searching in digital libraries. In: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 91–100 (2007)
Lopez, P.: GROBID: combining automatic bibliographic data recognition and term extraction for scholarship publications. In: Agosti, M., Borbinha, J., Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds.) ECDL 2009. LNCS, vol. 5714, pp. 473–474. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04346-8_62
Luan, Y., He, L., Ostendorf, M., Hajishirzi, H.: Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction. arXiv preprint arXiv:1808.09602 (2018)
Luan, Y., He, L., Ostendorf, M., Hajishirzi, H.: Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction. In: EMNLP (2018)
Manning, C.D.: Computational linguistics and deep learning. Comput. Linguist. 41(4), 701–707 (2015)
Milosevic, N., Gregson, C., Hernandez, R., Nenadic, G.: A framework for information extraction from tables in biomedical literature. Int. J. Doc. Anal. Recognit. 22(1), 55–78 (2019). https://doi.org/10.1007/s10032-019-00317-0
Mondal, I., Hou, Y., Jochim, C.: End-to-end NLP knowledge graph construction. arXiv preprint arXiv:2106.01167 (2021)
Mysore, S., et al.: The materials science procedural text corpus: annotating materials synthesis procedures with shallow semantic structures. In: Proceedings of the 13th Linguistic Annotation Workshop, pp. 56–64 (2019)
Oelen, A., Stocker, M., Auer, S.: Crowdsourcing scholarly discourse annotations. In: 26th International Conference on Intelligent User Interfaces, pp. 464–474 (2021)
Renear, A.H., Palmer, C.L.: Strategic reading, ontologies, and the future of scientific publishing. Science 325(5942), 828–832 (2009)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Ware, M., Mabe, M.: The STM report: an overview of scientific and scholarly journal publishing, March 2015
Wei, X., Croft, B., Mccallum, A.: Table extraction for answer retrieval. Inf. Retr. 9(5), 589–611 (2006). https://doi.org/10.1007/s10791-006-9005-5
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., Le, Q.V.: XLNet: generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237 (2019)
Acknowledgements
This work was co-funded by the Federal Ministry of Education and Research (BMBF) of Germany for the project LeibnizKILabor (grant no. 01DD20003) and by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536).
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kabongo, S., D’Souza, J., Auer, S. (2021). Automated Mining of Leaderboards for Empirical AI Research. In: Ke, HR., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science(), vol 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_35
Download citation
DOI: https://doi.org/10.1007/978-3-030-91669-5_35
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91668-8
Online ISBN: 978-3-030-91669-5
eBook Packages: Computer ScienceComputer Science (R0)