Using Supervised Learning to Classify Metadata of Research Data by Discipline of Research

Weber, Tobias; Kranzlmüller, Dieter; Fromm, Michael; de Sousa, Nelson Tavares

arXiv:1910.09313 (cs)

[Submitted on 16 Oct 2019]

Abstract: Automated classification of metadata of research data by their discipline(s) of research can be used in scientometric research, by repository service providers, and in the context of research data aggregation services. Openly available metadata from the DataCite index for research data were used to compile a large training and evaluation set comprising 609,524 records, which is published alongside this paper. These data make it possible to reproducibly assess classification approaches such as tree-based models and neural networks. In our experiments with 20 base classes (multi-label classification), multi-layer perceptron models perform best, with an F1-macro score of 0.760, closely followed by Long Short-Term Memory models (F1-macro score of 0.755). Possible applications of the trained classification models include the quantitative analysis of trends towards interdisciplinarity of digital scholarly output and the characterization of growth patterns of research data, stratified by discipline of research. Both applications perform at scale with the proposed models, which are available for re-use.
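
The macro-averaged F1 score quoted in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `f1_macro`, the 0/1 indicator-row encoding, the three-class toy data, and the convention of scoring an empty class as 0 are all assumptions made for illustration. In a multi-label setting each record may carry several discipline labels, so F1 is computed per class and then averaged without weighting by class size.

```python
from typing import Sequence


def f1_macro(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-class F1 over multi-label 0/1 indicator rows."""
    n_classes = len(y_true[0])
    per_class = []
    for c in range(n_classes):
        pairs = [(t[c], p[c]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no true and no predicted labels scores 0.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / n_classes


# Hypothetical toy data: 4 records, 3 classes (the paper uses 20 base classes).
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
score = f1_macro(y_true, y_pred)  # per-class F1: 1, 2/3, 2/3
```

Because every class contributes equally to the average, a model that does poorly on a small discipline is penalized as much as one that does poorly on a large discipline.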

Subjects: Information Retrieval (cs.IR); Digital Libraries (cs.DL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:1910.09313 [cs.IR] (or arXiv:1910.09313v1 [cs.IR] for this version)
DOI: https://doi.org/10.48550/arXiv.1910.09313

Submission history

From: Tobias Weber
[v1] Wed, 16 Oct 2019 07:51:37 UTC (230 KB)
