Sequential span classification with neural semi-Markov CRFs for biomedical abstracts
Findings of the Association for Computational Linguistics: EMNLP 2020, 2020
Abstract
Dividing biomedical abstracts into several segments with rhetorical roles is essential for supporting researchers’ information access in the biomedical domain. Conventional methods have regarded the task as a sequence labeling task based on sequential sentence classification, i.e., they assign a rhetorical label to each sentence by considering the context in the abstract. However, these methods have a critical problem: they are prone to mislabeling longer runs of consecutive sentences that share the same rhetorical label. To tackle the problem, we propose sequential span classification, which assigns a rhetorical label not to a single sentence but to a span that consists of consecutive sentences. Accordingly, we introduce Neural Semi-Markov Conditional Random Fields to assign the labels to such spans by considering all possible spans of various lengths. Experimental results on the PubMed 20k RCT and NICTA-PIBOSO datasets demonstrate that our proposed method achieved the best micro sentence-F1 score as well as the best micro span-F1 score.
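
The idea of labeling spans by considering all possible spans of various lengths can be illustrated with a generic semi-Markov Viterbi decoder. The following is a minimal sketch and not the authors' implementation: the function name `semi_markov_viterbi`, the `max_len` cap, the hand-written toy scores, and the choice to score a span as the sum of its per-sentence scores are all assumptions made for illustration. In the paper the span scores come from a neural model, whose details are not given in the abstract.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def semi_markov_viterbi(
    n: int,
    labels: List[str],
    span_score: Callable[[int, int, int], float],
    trans_score: Callable[[int, int], float],
    max_len: int,
) -> List[Span]:
    """Return the highest-scoring segmentation of n sentences into labeled spans.

    span_score(i, j, y): score of labeling sentences i..j-1 as one span with label y.
    trans_score(yp, y): score of a span labeled y directly following one labeled yp.
    """
    neg_inf = float("-inf")
    num_labels = len(labels)
    # best[j][y]: best score for sentences 0..j-1 when the last span has label y.
    best = [[neg_inf] * num_labels for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, Optional[int]]]]] = [
        [None] * num_labels for _ in range(n + 1)
    ]
    for j in range(1, n + 1):
        # Try every span (i, j) that ends at j and has length <= max_len.
        for i in range(max(0, j - max_len), j):
            for y in range(num_labels):
                s = span_score(i, j, y)
                if i == 0:
                    # First span of the abstract: no previous label.
                    if s > best[j][y]:
                        best[j][y], back[j][y] = s, (0, None)
                    continue
                for yp in range(num_labels):
                    if best[i][yp] == neg_inf:
                        continue
                    cand = best[i][yp] + trans_score(yp, y) + s
                    if cand > best[j][y]:
                        best[j][y], back[j][y] = cand, (i, yp)
    # Backtrack from the best final label.
    y = max(range(num_labels), key=lambda k: best[n][k])
    j = n
    spans: List[Span] = []
    while j > 0:
        i, yp = back[j][y]
        spans.append((i, j, labels[y]))
        j, y = i, yp
    spans.reverse()
    return spans


# Toy example (all numbers are made up): sentence 1 locally looks like RESULTS.
labels = ["BACKGROUND", "RESULTS"]
sentence_scores = [[2.0, 0.0], [0.4, 0.6], [2.0, 0.0], [0.0, 2.0]]
# Transition scores: same label -> -1 (discourage needless splits),
# BACKGROUND -> RESULTS: 0, RESULTS -> BACKGROUND: -2.
transitions = [[-1.0, 0.0], [-2.0, -1.0]]

spans = semi_markov_viterbi(
    n=len(sentence_scores),
    labels=labels,
    span_score=lambda i, j, y: sum(row[y] for row in sentence_scores[i:j]),
    trans_score=lambda yp, y: transitions[yp][y],
    max_len=4,
)
print(spans)  # [(0, 3, 'BACKGROUND'), (3, 4, 'RESULTS')]
```

In this toy setup, a per-sentence argmax would label the four sentences BACKGROUND, RESULTS, BACKGROUND, RESULTS, whereas the span-level decoder keeps the first three sentences together as one BACKGROUND span. The decoder enumerates every span up to `max_len` sentences ending at each position, so its cost grows with `n * max_len * len(labels)**2`.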