Abstract
Some applications of speech recognition, such as automatic directory information services, require very large vocabularies. In this paper, we focus on the task of recognizing surnames in an Interactive telephone-based Directory Assistance Service (IDAS) system, which exceeds other large-vocabulary applications in both complexity and vocabulary size. We present a method that uses Directed Acyclic Word Graphs (DAWGs) to build compact recognition networks and thereby reduce the search space for very large vocabularies. Furthermore, trees, graphs and full forms (whole words with no merging of nodes) are compared directly under identical conditions, using the same decoder and the same vocabularies. Experimental results showed that, moving from full-form lexicons to trees and then to graphs, both the size of the recognition network and the recognition time decrease. Recognition accuracy, however, is unchanged, since all three representations encode the same phoneme combinations. We then refine the N-best hypothesis list produced by the speech recognizer by applying context-dependent phonological rules. As a result, even a small N yields enough candidate solutions to retain high accuracy while achieving real-time response. Recognition tests with a vocabulary of 88,000 surnames, corresponding to 123,313 distinct pronunciations, demonstrated the efficiency of the approach. For N = 3 (a value chosen to ensure fast performance), recognition accuracy was 70.27% before the application of phonological rules and rose to 86.75% after applying them.
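To make the full-form / tree / graph comparison concrete, the following is a minimal sketch, not the authors' implementation. It takes a small hypothetical lexicon of phoneme strings (SAMPA-like symbols invented for illustration, not taken from the paper), builds a tree (trie) in which pronunciations share prefixes, and then minimizes it into a DAWG by merging nodes whose outgoing sub-graphs are identical, so that shared suffixes are stored once. The paper's own DAWG construction algorithm may differ; this sketch uses a simple non-incremental bottom-up minimization. Measuring network size as the number of phoneme arcs is likewise an assumption made here for illustration. The helper names (`build_trie`, `minimize`, `count_arcs`, `paths`) are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Pron = Tuple[str, ...]  # one pronunciation = a sequence of phoneme symbols


class Node:
    """A network node; outgoing arcs are labelled with phonemes."""

    __slots__ = ("children", "final")

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.final = False  # True if a complete pronunciation ends here


def build_trie(prons: List[Pron]) -> Node:
    """Tree lexicon: pronunciations that share a prefix share its arcs."""
    root = Node()
    for pron in prons:
        node = root
        for ph in pron:
            if ph not in node.children:
                node.children[ph] = Node()
            node = node.children[ph]
        node.final = True
    return root


def minimize(root: Node) -> Node:
    """Turn the tree into a minimal DAWG by merging nodes whose outgoing
    sub-graphs are identical (shared suffixes). Rewires nodes in place."""
    registry: Dict[tuple, Node] = {}

    def canonical(node: Node) -> Node:
        # Canonicalize the children first (bottom-up), then look this node up
        # by its finality and its (phoneme, canonical child) arcs.
        for ph, child in list(node.children.items()):
            node.children[ph] = canonical(child)
        signature = (node.final,
                     tuple(sorted((ph, id(c)) for ph, c in node.children.items())))
        return registry.setdefault(signature, node)

    return canonical(root)


def count_arcs(root: Node) -> int:
    """Number of distinct phoneme arcs (each one a phoneme model to score)."""
    seen, stack, arcs = set(), [root], 0
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        arcs += len(node.children)
        stack.extend(node.children.values())
    return arcs


def paths(node: Node, prefix: Pron = ()) -> Iterator[Pron]:
    """Enumerate every pronunciation the network accepts."""
    if node.final:
        yield prefix
    for ph, child in node.children.items():
        yield from paths(child, prefix + (ph,))


# Hypothetical toy lexicon (invented phoneme strings, not from the paper).
LEXICON = [tuple(s.split()) for s in (
    "p a p a s",
    "p a p a D o p u l o s",
    "k o s t o p u l o s",
    "D i m o p u l o s",
    "p a p a k o s t a s",
)]

if __name__ == "__main__":
    full_form = sum(len(p) for p in LEXICON)  # full forms: no sharing at all
    trie = build_trie(LEXICON)
    tree = count_arcs(trie)
    dawg = minimize(trie)  # note: rewires the trie in place
    graph = count_arcs(dawg)
    print(full_form, tree, graph)  # 45 37 24
    assert set(paths(dawg)) == set(LEXICON)  # same phoneme strings accepted
```

On this toy lexicon the arc count drops from 45 (full forms) to 37 (tree) to 24 (graph), while the DAWG still accepts exactly the same five pronunciations. This mirrors the abstract's point that compaction shrinks the network without changing which phoneme combinations can be recognized. How much a real lexicon shrinks depends on how many prefixes and suffixes (such as common surname endings) its pronunciations share.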
Cite this article
Georgila, K., Fakotakis, N. & Kokkinakis, G. Large Vocabulary Search Space Reduction Employing Directed Acyclic Word Graphs and Phonological Rules. International Journal of Speech Technology 5, 355–370 (2002). https://doi.org/10.1023/A:1020965126094