Abstract
Natural Language Processing (NLP) tools aimed at diagnosing language-impairing dementias generally extract several textual metrics from narrative transcripts. However, the absence of sentence boundaries in these transcripts prevents the direct application of NLP methods that rely on such boundaries to work properly, such as taggers and parsers. We present one method to segment the transcripts into sentences and another to detect the disfluencies present in them, both serving as a preprocessing step for subsequent NLP tools. Our methods use recurrent convolutional neural networks with prosodic and morphosyntactic features and word embeddings. We evaluated both tasks intrinsically by analyzing the most important features, comparing the proposed methods to simpler ones, and identifying the main hits and misses. In addition, we built a final method that combines both tasks and evaluated it extrinsically using 9 syntactic metrics of Coh-Metrix-Dementia. In the intrinsic evaluations, our method achieved (i) state-of-the-art results for sentence segmentation on impaired speech, and (ii) disfluency detection results comparable to those reported in related work for English. In the extrinsic evaluation, only 3 of the 9 metrics showed a statistically significant difference between manual MCI (mild cognitive impairment) transcripts and those generated by our method, suggesting that our method is capable of preprocessing transcripts for further analysis by NLP tools.
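To make the task framing concrete, the sketch below treats sentence segmentation as labeling each token of an unsegmented transcript as boundary (1) or non-boundary (0), and scores predictions with boundary-level precision, recall, and F1. This is an illustration only: the abstract does not state which intrinsic metric was used, the tokens are made up, and the function names (`to_boundary_labels`, `segment`, `boundary_prf`) are hypothetical. The neural model itself is not reproduced here; the predicted labels are written by hand.

```python
from typing import List, Tuple


def to_boundary_labels(sentences: List[List[str]]) -> Tuple[List[str], List[int]]:
    """Flatten sentences into one token stream with 0/1 labels.

    Label 1 marks a token that ends a sentence (assumed labeling scheme).
    """
    tokens: List[str] = []
    labels: List[int] = []
    for sent in sentences:
        for i, tok in enumerate(sent):
            tokens.append(tok)
            labels.append(1 if i == len(sent) - 1 else 0)
    return tokens, labels


def segment(tokens: List[str], labels: List[int]) -> List[List[str]]:
    """Rebuild sentences from a token stream and boundary labels."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for tok, lab in zip(tokens, labels):
        current.append(tok)
        if lab == 1:
            sentences.append(current)
            current = []
    if current:  # trailing tokens with no closing boundary
        sentences.append(current)
    return sentences


def boundary_prf(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 computed over the boundary class only."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up reference segmentation (two sentences).
reference = [["the", "girl", "went", "out"], ["she", "saw", "the", "prince"]]
tokens, gold = to_boundary_labels(reference)

# Hand-written "system" output with one spurious boundary after "girl".
pred = [0, 1, 0, 1, 0, 0, 0, 1]

precision, recall, f1 = boundary_prf(gold, pred)
predicted_sentences = segment(tokens, pred)
```

Scoring only the boundary class keeps the measure informative, since non-boundary tokens far outnumber boundaries in any transcript and would dominate a plain accuracy figure.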
Acknowledgments
We thank CNPq for a scholarship granted to the first author.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Treviso, M.V., Aluísio, S.M. (2018). Sentence Segmentation and Disfluency Detection in Narrative Transcripts from Neuropsychological Tests. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_41
DOI: https://doi.org/10.1007/978-3-319-99722-3_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99721-6
Online ISBN: 978-3-319-99722-3
eBook Packages: Computer Science, Computer Science (R0)