Abstract
Oral history archives are important for preserving and understanding past experiences, but they are difficult to access and search, mainly because of their sheer size and the diverse demographics of the speakers. This paper presents an approach that combines automatic speech recognition (ASR) with Transformer-based neural networks in the Asking Questions Framework. Its primary function is to generate questions, each accompanied by a concise answer, about the topics discussed in each interview segment. We also introduce a semantic continuity model that filters the generated questions so that only the most relevant ones are retained. This enables real-time semantic search through thousands of hours of recordings, with the crucial benefit that the speakers’ original words remain unaltered while still semantically matching the query. Although the method is demonstrated on a specific publicly available archive, it is applicable to other datasets of a similar nature.
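The abstract does not specify the sentence encoder, the internals of the semantic continuity model, or the search index, so the following is only a minimal Python sketch of the retrieval idea under stated assumptions. A toy bag-of-words vector stands in for a Transformer sentence encoder, plain cosine similarity between a generated question and its segment stands in for the semantic continuity filter, and every name (`Segment`, `filter_questions`, `build_index`, `search`) is hypothetical. What it illustrates is the property the abstract emphasises: the query is matched against the generated questions, but the result returned is the segment's original, unaltered transcript.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field

# Toy stop-word list so the bag-of-words vectors are not dominated by function words.
STOPWORDS = {"a", "an", "the", "and", "as", "in", "of", "to", "is",
             "was", "did", "what", "where", "which", "who", "i", "we"}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a sentence encoder)."""
    return Counter(t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Segment:
    segment_id: str
    transcript: str                                   # original ASR output, never rewritten
    questions: list[str] = field(default_factory=list)  # generated questions that survived filtering


def filter_questions(transcript: str, candidates: list[str], threshold: float = 0.2) -> list[str]:
    """Hypothetical stand-in for the semantic continuity filter:
    keep a generated question only if it is similar enough to its own segment."""
    seg_vec = embed(transcript)
    return [q for q in candidates if cosine(embed(q), seg_vec) >= threshold]


def build_index(segments: list[Segment]) -> list[tuple[Counter, Segment]]:
    """Embed every kept question once, offline, so a query needs only similarity lookups."""
    return [(embed(q), seg) for seg in segments for q in seg.questions]


def search(query: str, index: list[tuple[Counter, Segment]], top_k: int = 3) -> list[tuple[float, str, str]]:
    """Score each segment by its best-matching question and return the untouched transcript."""
    q_vec = embed(query)
    best: dict[str, tuple[float, str]] = {}
    for vec, seg in index:
        score = cosine(q_vec, vec)
        if seg.segment_id not in best or score > best[seg.segment_id][0]:
            best[seg.segment_id] = (score, seg.transcript)
    ranked = sorted(((s, sid, t) for sid, (s, t) in best.items()), reverse=True)
    return ranked[:top_k]


# Usage with two invented segments and hand-written "generated" questions.
seg1_text = "After the war I emigrated to Canada and worked as a tailor in Toronto."
seg1 = Segment("seg-1", seg1_text, filter_questions(seg1_text, [
    "Where did the speaker go after the war?",
    "Where did the speaker work as a tailor after emigrating to Canada?",
    "What is the capital of France?",  # off-topic candidate, dropped by the filter
]))
seg2_text = "My father owned a small bakery near the market square."
seg2 = Segment("seg-2", seg2_text, filter_questions(seg2_text, [
    "What did the speaker's father own near the market square?",
    "How many siblings did the speaker have?",  # unsupported by the segment, dropped
]))
index = build_index([seg1, seg2])
for score, sid, transcript in search("What did his father own?", index):
    print(f"{score:.3f}  {sid}  {transcript}")  # top hit: seg-2
```

Precomputing the question vectors is what keeps query time low in this sketch; a real system would replace the toy vectors with dense embeddings and an approximate nearest-neighbour index, but the separation between "what is matched" (questions) and "what is shown" (original transcript) stays the same.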
This research was supported by the Czech Science Foundation (GA CR), project No. GA22-27800S, and by the grant of the University of West Bohemia, project No. SGS-2022-017.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Švec, J., Bulín, M., Frémund, A., Polák, F. (2024). Asking Questions Framework for Oral History Archives. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14610. Springer, Cham. https://doi.org/10.1007/978-3-031-56063-7_11
DOI: https://doi.org/10.1007/978-3-031-56063-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56062-0
Online ISBN: 978-3-031-56063-7
eBook Packages: Computer Science, Computer Science (R0)