research-article

Analyzing the Effects of Transcription Errors on Summary Generation of Bengali Spoken Documents

Published: 16 August 2024

Abstract

Automatic speech recognition (ASR) has become an indispensable part of the AI domain, with various speech technologies reliant on it. The quality of speech recognition depends, among other factors, on the amount of annotated data used to train an ASR system. For a low-resource language, this is a severe constraint, and ASR quality is therefore often poor. Humans can read through text containing ASR errors, provided the context of the sentence is preserved. However, in transcripts generated by ASR systems for low-resource languages, multiple important words are misrecognized and the context is mostly lost, so comprehending such text becomes nearly impossible. This article analyzes the types of transcription errors that occur when generating ASR transcripts of spoken documents in Bengali, an under-resourced language predominantly spoken in India and Bangladesh. The transcripts of the Bengali spoken documents are generated using the Google Cloud Speech ASR. The article also explores whether such transcription errors affect the generation of speech summaries of these spoken documents. Summarization is carried out extractively: sentences are selected from the ASR-generated text of the spoken document. Speech summaries are then created by aggregating, from the original speech, the speech segments corresponding to the selected sentences. Subjective evaluation shows that the “readability” of the spoken summaries is not degraded by ASR errors, but their quality is affected by the reliance on an intermediate text summary containing transcription errors.
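The two-stage pipeline described in the abstract (select sentences from the ASR text, then collect the matching segments of the original audio) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Sentence` record, the per-sentence timestamps, and in particular the average-word-frequency scorer, which is a simple stand-in and not the selection method used in the article.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str     # ASR transcript of one sentence (may contain errors)
    start: float  # assumed start time in the original audio, in seconds
    end: float    # assumed end time in the original audio, in seconds


def select_sentences(sentences, k):
    """Pick k sentences by average word frequency (a stand-in scorer)."""
    freq = Counter(w for s in sentences for w in s.text.split())

    def score(s):
        words = s.text.split()
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:k]
    # Restore chronological order so the spoken summary plays naturally.
    return sorted(top, key=lambda s: s.start)


def speech_summary_spans(sentences, k):
    """Return (start, end) spans to cut from the original recording."""
    return [(s.start, s.end) for s in select_sentences(sentences, k)]
```

Because the spans are cut from the original recording rather than synthesized from the transcript, a misrecognized word never reaches the listener directly. It can still change which sentences are selected, which is consistent with the abstract's finding that summary quality, rather than "readability", is what suffers.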

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 23, Issue 9
    September 2024
    186 pages
    EISSN:2375-4702
    DOI:10.1145/3613646

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 16 August 2024
    Online AM: 17 July 2024
    Accepted: 07 July 2024
    Revised: 03 May 2024
    Received: 28 September 2022
    Published in TALLIP Volume 23, Issue 9

    Author Tags

    1. Low-resource language
    2. extractive speech summarization
    3. automatic speech recognition
    4. transcription error
    5. speech summary

    Qualifiers

    • Research-article
