short-paper

Segmenting Scientific Abstracts into Discourse Categories: A Deep Learning-Based Approach for Sparse Labeled Data

Authors:

Soumya Banerjee,

Debarshi Kumar Sanyal,

Samiran Chattopadhyay,

Plaban Kumar Bhowmick,

Partha Pratim Das

JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020

Pages 429 - 432

https://doi.org/10.1145/3383583.3398598

Published: 01 August 2020

Abstract

The abstract of a scientific paper distills the contents of the paper into a short paragraph. In the biomedical literature, it is customary to structure an abstract into discourse categories like BACKGROUND, OBJECTIVE, METHOD, RESULT, and CONCLUSION, but this segmentation is uncommon in other fields like computer science. Explicit categories could be helpful for more granular, that is, discourse-level, search and recommendation. The sparsity of labeled data makes it challenging to construct supervised machine learning solutions for automatic discourse-level segmentation of abstracts in non-biomedical domains. In this paper, we address this problem using transfer learning. We define three discourse categories for an abstract -- BACKGROUND, TECHNIQUE, and OBSERVATION -- because these three are the most common. We train a deep neural network on structured abstracts from PubMed, then fine-tune it on a small hand-labeled corpus of computer science papers. We observe an accuracy of 75% on the test corpus of computer science papers. We also perform an ablation study to highlight the roles of the different parts of the model. Our method appears to be a promising solution to the automatic segmentation of abstracts when labeled data is sparse.
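As a rough illustration of the setup described in the abstract, the sketch below collapses the five biomedical headings into the three categories and computes sentence-level accuracy. The mapping `PUBMED_TO_COARSE` and the helper names `coarsen` and `accuracy` are assumptions made for illustration only; the abstract does not state how the PubMed headings are mapped or how accuracy is aggregated.

```python
from typing import Dict, List

# ASSUMPTION: illustrative mapping only. The abstract names the five PubMed
# headings and the three target categories but does not give the mapping.
PUBMED_TO_COARSE: Dict[str, str] = {
    "BACKGROUND": "BACKGROUND",
    "OBJECTIVE": "BACKGROUND",
    "METHOD": "TECHNIQUE",
    "RESULT": "OBSERVATION",
    "CONCLUSION": "OBSERVATION",
}


def coarsen(labels: List[str]) -> List[str]:
    """Map per-sentence PubMed headings to the three coarse categories."""
    return [PUBMED_TO_COARSE[label] for label in labels]


def accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of sentences whose predicted category matches the gold one."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)
```

Under this assumed mapping, the relabeled PubMed sentences could serve as pre-training data, and `accuracy` could then be applied to predictions on the hand-labeled computer science test sentences.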

Index Terms

Segmenting Scientific Abstracts into Discourse Categories: A Deep Learning-Based Approach for Sparse Labeled Data
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Document structure

Published In

JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020

August 2020

611 pages

ISBN:9781450375856

DOI:10.1145/3383583

General Chairs:
Ruhua Huang (Wuhan University, China)
Dan Wu (Wuhan University, China)
Gary Marchionini (University of North Carolina at Chapel Hill, USA)

Program Chairs:
Daqing He (University of Pittsburgh, USA)
Sally Jo Cunningham (University of Waikato, New Zealand)
Preben Hansen (Stockholm University, Sweden)

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

National Digital Library of India Project sponsored by the Ministry of Human Resource Development, Government of India at IIT Kharagpur

Conference

JCDL '20

JCDL '20: The ACM/IEEE Joint Conference on Digital Libraries in 2020

August 1 - 5, 2020

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%

Article Metrics

Total Citations: 6
Total Downloads: 125
Downloads (last 12 months): 16
Downloads (last 6 weeks): 0

Reflects downloads up to 03 Mar 2025

Cited By

Brack A, Entrup E, Stamatakis M, Buschermöhle P, Hoppe A, Ewerth R. (2024). Sequential sentence classification in research papers using cross-domain multi-task learning. International Journal on Digital Libraries 25:2 (377-400). DOI: 10.1007/s00799-023-00392-z. Online publication date: 1-Jun-2024.
https://dl.acm.org/doi/10.1007/s00799-023-00392-z

Karabulut M, Vijay-Shanker K, Wang M, Yoon B. (2023). Sectioning biomedical abstracts using pointer networks. Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (1-9). DOI: 10.1145/3584371.3612959. Online publication date: 3-Sep-2023.
https://dl.acm.org/doi/10.1145/3584371.3612959

Tokala Y, Aluru S, Vallabhajosyula A, Sanyal D, Das P. (2023). Label informed hierarchical transformers for sequential sentence classification in scientific abstracts. Expert Systems 40:6. DOI: 10.1111/exsy.13238. Online publication date: 25-Jan-2023.
https://doi.org/10.1111/exsy.13238

Ramesh Kashyap A, Yang Y, Kan M. (2023). Scientific document processing: challenges for modern learning methods. International Journal on Digital Libraries 24:4 (283-309). DOI: 10.1007/s00799-023-00352-7. Online publication date: 24-Mar-2023.
https://doi.org/10.1007/s00799-023-00352-7

Bhowmick P, Das P, Chakrabarti P, Sanyal D. (2022). National digital library of India. Communications of the ACM 65:11 (58-61). DOI: 10.1145/3550480. Online publication date: 20-Oct-2022.
https://dl.acm.org/doi/10.1145/3550480

Brack A, Hoppe A, Buschermöhle P, Ewerth R, Aizawa A, Mandl T, Carevic Z, Hinze A, Mayr P, Schaer P. (2022). Cross-domain multi-task learning for sequential sentence classification in research papers. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries (1-13). DOI: 10.1145/3529372.3530922. Online publication date: 20-Jun-2022.
https://dl.acm.org/doi/10.1145/3529372.3530922
