Sentence Segmentation and Disfluency Detection in Narrative Transcripts from Neuropsychological Tests

Treviso, Marcos Vinícius; Aluísio, Sandra Maria

doi:10.1007/978-3-319-99722-3_41

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11122)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

Natural Language Processing (NLP) tools aimed at diagnosing language-impairing dementias generally extract several textual metrics from narrative transcripts. However, the absence of sentence boundaries in transcripts prevents the direct application of NLP methods that rely on these marks to work properly, such as taggers and parsers. We present one method to segment transcripts into sentences and another to detect the disfluencies present in them, to serve as a preprocessing step for subsequent NLP tools. Our methods use recurrent convolutional neural networks with prosodic and morphosyntactic features and word embeddings. We evaluated both tasks intrinsically, analyzing the most important features, comparing the proposed methods to simpler ones, and identifying the main hits and misses. In addition, we created a final method that combines all tasks and evaluated it extrinsically using 9 syntactic metrics of Coh-Metrix-Dementia. In the intrinsic evaluations, our method achieved (i) state-of-the-art results for sentence segmentation on impaired speech, and (ii) results for disfluency detection that are similar to those of related work on English. In the extrinsic evaluation, only 3 metrics showed a statistically significant difference between manual MCI transcripts and those generated by our method, suggesting that our method is capable of preprocessing transcriptions for further analysis by NLP tools.
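As a rough illustration of the preprocessing step described in the abstract, the sketch below shows how per-token predictions could be turned into clean sentences for downstream taggers and parsers. It is not the authors' implementation: it assumes the two models have already produced one boundary flag and one disfluency flag per token, and the function name `apply_predictions`, the boolean label scheme, and the English example are all hypothetical.

```python
from typing import List, Sequence


def apply_predictions(
    tokens: Sequence[str],
    is_boundary: Sequence[bool],
    is_disfluent: Sequence[bool],
) -> List[List[str]]:
    """Drop tokens flagged as disfluent and split after tokens flagged as boundaries.

    Hypothetical label scheme: is_boundary[i] is True when a sentence ends
    right after tokens[i]; is_disfluent[i] is True when tokens[i] should be
    removed (filler, repetition, or revised material).
    """
    if not (len(tokens) == len(is_boundary) == len(is_disfluent)):
        raise ValueError("tokens and both label sequences must have the same length")

    sentences: List[List[str]] = []
    current: List[str] = []
    for token, boundary, disfluent in zip(tokens, is_boundary, is_disfluent):
        if not disfluent:
            current.append(token)
        # A boundary closes the sentence even when its own token was dropped.
        if boundary and current:
            sentences.append(current)
            current = []
    if current:
        # Keep trailing words that were never closed by a boundary.
        sentences.append(current)
    return sentences


# Made-up transcript: "the girl uh" is a repetition plus a filler.
tokens = ["the", "girl", "uh", "the", "girl", "went", "home", "she", "was", "tired"]
is_boundary = [False, False, False, False, False, False, True, False, False, True]
is_disfluent = [True, True, True, False, False, False, False, False, False, False]

segmented = apply_predictions(tokens, is_boundary, is_disfluent)
print(segmented)
# [['the', 'girl', 'went', 'home'], ['she', 'was', 'tired']]
```

Keeping the two label sequences separate mirrors the abstract's description of two tasks that are later combined; a real pipeline would obtain these flags from the trained models rather than from hand-written lists.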

Notes

1. http://nilc.icmc.usp.br/coh-metrix-dementia/

Acknowledgments

We thank CNPq for a scholarship granted to the first author.

Author information

Authors and Affiliations

Interinstitutional Center for Computational Linguistics (NILC), Institute of Mathematical and Computer Sciences, University of São Paulo, São Paulo, Brazil
Marcos Vinícius Treviso & Sandra Maria Aluísio

Corresponding author

Correspondence to Marcos Vinícius Treviso.

Editor information

Editors and Affiliations

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Aline Villavicencio
Instituto de Informática - UFRGS, Porto Alegre, Brazil
Viviane Moreira
INESC-ID, Lisbon, Portugal
Alberto Abad
UFSCAR, Sao Carlos, Brazil
Helena Caseli
Centro Singular de Investigación en Tecnoloxías, Universidade de Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Pablo Gamallo
Université de Toulon, Parc Scientifique Technologique Luminy, Marseille, France
Carlos Ramisch
Centro de Informática e Sistemas, Universidade de Coimbra, Coimbra, Portugal
Hugo Gonçalo Oliveira
Federal University of Technology, Dois Vizinhos, Paraná, Brazil
Gustavo Henrique Paetzold

About this paper

Cite this paper

Treviso, M.V., Aluísio, S.M. (2018). Sentence Segmentation and Disfluency Detection in Narrative Transcripts from Neuropsychological Tests. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_41

DOI: https://doi.org/10.1007/978-3-319-99722-3_41
Published: 26 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99721-6
Online ISBN: 978-3-319-99722-3
eBook Packages: Computer Science, Computer Science (R0)
