Utilizing Out-Domain Datasets to Enhance Multi-Task Citation Analysis

Mercier, Dominique; Rizvi, Syed Tahseen Raza; Rajashekar, Vikas; Ahmed, Sheraz; Dengel, Andreas

Computer Science > Information Retrieval

arXiv:2202.10884 (cs)

[Submitted on 22 Feb 2022]

Abstract: Citations are generally analyzed using only quantitative measures while excluding qualitative aspects such as sentiment and intent. However, qualitative aspects provide deeper insights into the impact of a scientific research artifact and make it possible to focus on relevant literature free from bias associated with quantitative aspects. Therefore, it is possible to rank and categorize papers based on their sentiment and intent. For this purpose, larger citation sentiment datasets are required. However, from a time and cost perspective, curating a large citation sentiment dataset is a challenging task. In particular, citation sentiment analysis suffers from both data scarcity and tremendous costs for dataset annotation. To overcome the bottleneck of data scarcity in the citation analysis domain, we explore the impact of out-domain data during training to enhance the model performance. Our results emphasize the use of different scheduling methods based on the use case. We empirically found that a model trained using sequential data scheduling is more suitable for domain-specific use cases. Conversely, shuffled data feeding achieves better performance on a cross-domain task. Based on our findings, we propose an end-to-end trainable multi-task model that covers both sentiment and intent analysis and utilizes out-domain datasets to overcome data scarcity.
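
The two data-feeding strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the (text, label) record layout, the toy examples, and in particular the choice to place out-domain data before in-domain data in the sequential case are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (text, sentiment label).
Example = Tuple[str, str]


def sequential_schedule(out_domain: Sequence[Example],
                        in_domain: Sequence[Example]) -> List[Example]:
    """Feed one dataset completely, then the other.

    Assumption: out-domain examples come first and in-domain examples
    last; the abstract does not state the order.
    """
    return list(out_domain) + list(in_domain)


def shuffled_schedule(out_domain: Sequence[Example],
                      in_domain: Sequence[Example],
                      seed: int = 0) -> List[Example]:
    """Pool both datasets and feed them in one random order."""
    mixed = list(out_domain) + list(in_domain)
    # A private Random instance keeps the order reproducible per seed.
    random.Random(seed).shuffle(mixed)
    return mixed


# Toy placeholder data, invented for this sketch only.
out_domain_data = [("great movie", "positive"), ("dull plot", "negative")]
in_domain_data = [("we build on the strong results of [1]", "positive")]

sequential_order = sequential_schedule(out_domain_data, in_domain_data)
shuffled_order = shuffled_schedule(out_domain_data, in_domain_data, seed=42)
```

Both functions return the same set of training examples; only the order in which a training loop would see them differs, which is the variable the abstract reports as mattering for domain-specific versus cross-domain performance.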

Comments:	23 pages, 2 figures, 10 tables
Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Digital Libraries (cs.DL)
Cite as:	arXiv:2202.10884 [cs.IR]
	(or arXiv:2202.10884v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2202.10884

Submission history

From: Dominique Mercier
[v1] Tue, 22 Feb 2022 13:33:48 UTC (1,143 KB)
