DOI: 10.1145/3397271.3401105
research-article

Leveraging Social Media for Medical Text Simplification

Published: 25 July 2020
  Abstract

    Patients increasingly use the web to understand medical information, make health decisions, and validate physicians' advice. However, most of this content is written for an expert audience, so people with inadequate health literacy often find it difficult to access, comprehend, and act upon it. Medical text simplification aims to alleviate this problem by simplifying medical text automatically. Most text simplification methods use neural sequence-to-sequence (seq-to-seq) models, but training such models requires a corpus of aligned complex and simple sentences. Creating such a corpus manually is labor-intensive, while creating it automatically is prone to alignment errors. To overcome these challenges, we propose a neural model based on a denoising autoencoder that leverages the simple writing style of medical social media text. Experiments on four datasets show that our method significantly outperforms the best-known medical text simplification models across multiple automated and human evaluation metrics. Our model improves on the best-performing existing model by up to 16.52% on SARI, the primary metric for evaluating text simplification models.
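    The abstract does not spell out how the denoising autoencoder is trained, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the corruption is word dropout plus local shuffling, a common choice in unsupervised seq-to-seq work and not necessarily the noise model used in this paper. The names `add_noise` and `make_training_pairs`, their parameters, and the example sentences are all hypothetical. Each simple, social-media-style sentence is corrupted to form the model input, and the clean sentence is kept as the target.

```python
import random
from typing import List, Tuple


def add_noise(tokens: List[str], rng: random.Random,
              drop_prob: float = 0.1, shuffle_window: int = 3) -> List[str]:
    """Corrupt a tokenized sentence for denoising-autoencoder training.

    Assumption: word dropout plus local shuffling is used here purely for
    illustration; it is not taken from the paper.
    """
    if not tokens:
        return []
    # Word dropout: drop each token independently with probability drop_prob,
    # keeping one random token if everything would otherwise be dropped.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept:
        kept = [rng.choice(tokens)]
    # Local shuffle: sort by position plus uniform noise in [0, shuffle_window),
    # so tokens are only reordered relative to nearby neighbors.
    keys = [i + rng.random() * shuffle_window for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


def make_training_pairs(sentences: List[str], seed: int = 0) -> List[Tuple[str, str]]:
    """Pair each corrupted sentence (model input) with its clean original (target)."""
    rng = random.Random(seed)
    pairs = []
    for sentence in sentences:
        noisy = add_noise(sentence.split(), rng)
        pairs.append((" ".join(noisy), sentence))
    return pairs


# Hypothetical examples of simple, social-media-style health text.
simple_posts = [
    "my doctor said these pills might make me sleepy",
    "i got a bad headache after the flu shot",
]

for source, target in make_training_pairs(simple_posts):
    print(f"{source!r} -> {target!r}")
```

    The intuition is that a seq-to-seq model trained to reconstruct each clean target from its noisy source learns to produce fluent text in the style of its training corpus. When that corpus is simple social-media text, this style is what the model emits, and no aligned complex/simple pairs are required.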

    Supplementary Material

    MP4 File (3397271.3401105.mp4)

    Cited By

    • (2023) A dataset for plain language adaptation of biomedical abstracts. Scientific Data 10:1. DOI: 10.1038/s41597-022-01920-3. Online publication date: 4-Jan-2023.
    • (2023) Knowledge Augmentation for Early Depression Detection. Artificial Intelligence for Personalized Medicine, 175-191. DOI: 10.1007/978-3-031-36938-4_14. Online publication date: 2-Sep-2023.
    • (2022) Summarization, simplification, and generation. Expert Systems with Applications: An International Journal 205:C. DOI: 10.1016/j.eswa.2022.117627. Online publication date: 1-Nov-2022.

    Information

    Published In

    SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2020
    2548 pages
    ISBN: 9781450380164
    DOI: 10.1145/3397271
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 July 2020

    Author Tags

    1. denoising autoencoders
    2. seq-to-seq models
    3. text simplification

    Qualifiers

    • Research-article

    Conference

    SIGIR '20

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Bibliometrics

    • Downloads (last 12 months): 53
    • Downloads (last 6 weeks): 4
