Enriched BERT Embeddings for Scholarly Publication Classification

Wolff, Benjamin; Seidlmayer, Eva; Förstner, Konrad U.

doi:10.1007/978-3-031-65794-8_16

Computer Science > Artificial Intelligence

arXiv:2405.04136 (cs)

[Submitted on 7 May 2024]

Abstract: With the rapid expansion of academic literature and the proliferation of preprints, researchers face growing challenges in manually organizing and labeling large volumes of articles. The NSLP 2024 FoRC Shared Task I addresses this challenge in the form of a competition. The goal is to develop a classifier capable of predicting one of 123 predefined classes from the Open Research Knowledge Graph (ORKG) taxonomy of research fields for a given paper. This paper presents our results. We first enrich the dataset (English scholarly articles sourced from ORKG and arXiv) and then leverage different pre-trained language models (PLMs), specifically BERT-based models, exploring their efficacy in transfer learning for this downstream task. Our experiments encompass feature-based and fine-tuned transfer learning approaches using diverse PLMs optimized for scientific tasks, including SciBERT, SciNCL, and SPECTER2. We conduct hyperparameter tuning and investigate the impact of data augmentation from bibliographic databases such as OpenAlex, Semantic Scholar, and Crossref. Our results demonstrate that fine-tuning pre-trained models substantially enhances classification performance, with SPECTER2 emerging as the most accurate model. Moreover, enriching the dataset with additional metadata improves classification outcomes significantly, especially when integrating information from S2AG (the Semantic Scholar Academic Graph), OpenAlex, and Crossref. Our best-performing approach achieves a weighted F1-score of 0.7415. Overall, our study contributes to the advancement of reliable automated systems for scholarly publication categorization, offering a potential solution to the laborious manual curation process and thereby helping researchers locate relevant resources efficiently.
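The headline result is reported as a weighted F1-score. As a minimal sketch (not the authors' evaluation code, which the abstract does not describe), the metric computes F1 separately for each research-field class and averages those scores weighted by each class's support, i.e. its number of gold-standard papers. Classes that appear only in the predictions have zero support and therefore contribute nothing. The function name `weighted_f1` and the example field labels below are hypothetical; the result should agree with scikit-learn's `f1_score(..., average="weighted")`.

```python
from collections import Counter


def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Support-weighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    support = Counter(y_true)  # number of gold instances per class
    pairs = list(zip(y_true, y_pred))
    tp = Counter(t for t, p in pairs if t == p)  # correctly predicted class
    fp = Counter(p for t, p in pairs if t != p)  # predicted class was wrong
    fn = Counter(t for t, p in pairs if t != p)  # gold class was missed

    score = 0.0
    for label, count in support.items():
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 since count >= 1
        f1 = 2 * tp[label] / (2 * tp[label] + fp[label] + fn[label])
        score += (count / n) * f1  # weight each class by its share of gold labels
    return score


# Toy example with made-up labels (not data from the shared task)
gold = ["Machine Learning", "Machine Learning",
        "Computational Linguistics", "Computational Linguistics",
        "Bioinformatics"]
pred = ["Machine Learning", "Computational Linguistics",
        "Computational Linguistics", "Computational Linguistics",
        "Machine Learning"]
print(round(weighted_f1(gold, pred), 4))  # → 0.52
```

Here the per-class F1 scores are 0.5, 0.8 and 0.0 with supports 2, 2 and 1, so the weighted mean is (2·0.5 + 2·0.8 + 1·0.0) / 5 = 0.52. Weighting by support means that, with 123 unevenly populated classes, frequent research fields dominate the score more than they would under a macro average.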

Comments:	8 pages, 2 figures, NSLP2024 conference
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2405.04136 [cs.AI]
	(or arXiv:2405.04136v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2405.04136
Journal reference:	Natural Scientific Language Processing and Research Knowledge Graphs (2024), LNAI 14770, 234-243
Related DOI:	https://doi.org/10.1007/978-3-031-65794-8_16

Submission history

From: Benjamin Wolff
[v1] Tue, 7 May 2024 09:05:20 UTC (126 KB)
