
Large Vocabulary Search Space Reduction Employing Directed Acyclic Word Graphs and Phonological Rules

International Journal of Speech Technology


Abstract

Some speech recognition applications, such as automatic directory information services, require very large vocabularies. In this paper, we focus on recognizing surnames in an Interactive telephone-based Directory Assistance Services (IDAS) system, a task that exceeds other large-vocabulary applications in both complexity and vocabulary size. We present a method for building compact networks that reduce the search space for very large vocabularies using Directed Acyclic Word Graphs (DAWGs). Furthermore, trees, graphs and full-forms (whole words with no merging of nodes) are compared directly under the same conditions, using the same decoder and the same vocabularies. Experimental results show that, as we move from full-form lexicons to trees and then to graphs, both the size of the recognition network and the recognition time decrease, while recognition accuracy is retained because the same phoneme combinations are involved. Subsequently, we refine the N-best hypothesis list provided by the speech recognizer by applying context-dependent phonological rules. Thus, even a small N yields multiple solutions, enough to retain high accuracy while achieving real-time response. Recognition tests with a vocabulary of 88,000 surnames corresponding to 123,313 distinct pronunciations demonstrated the efficiency of the approach. For N = 3 (a value that ensures fast performance), recognition accuracy was 70.27% before the rules were applied and rose to 86.75% afterwards.
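The size comparison between full-form lexicons, trees and graphs can be illustrated with a small sketch. It is not the authors' construction algorithm: it builds a prefix tree and then merges equivalent subtrees bottom-up, which is one simple way to obtain a minimal acyclic graph. The four pronunciations, their phoneme symbols, and all function names are hypothetical, and "size" here counts states excluding the root, which is only an approximation of a recognition network's node count.

```python
from typing import Dict, Iterable, Tuple

Pron = Tuple[str, ...]  # a pronunciation as a sequence of phoneme symbols

# Hypothetical toy lexicon (invented pronunciations, not from the paper).
PRONS = [
    ("p", "a", "p", "a", "s"),
    ("p", "a", "p", "a", "d", "a", "s"),
    ("k", "a", "p", "a", "s"),
    ("k", "a", "p", "a", "d", "a", "s"),
]


def full_form_size(prons: Iterable[Pron]) -> int:
    """Full-form lexicon: one node per phoneme, nothing is shared."""
    return sum(len(p) for p in prons)


def build_trie(prons: Iterable[Pron]) -> dict:
    """Prefix tree: pronunciations share their common prefixes."""
    root = {"final": False, "next": {}}
    for p in prons:
        node = root
        for ph in p:
            node = node["next"].setdefault(ph, {"final": False, "next": {}})
        node["final"] = True
    return root


def trie_size(node: dict) -> int:
    """Number of tree nodes below `node` (the root itself is not counted)."""
    return sum(1 + trie_size(child) for child in node["next"].values())


def minimize(node: dict, registry: Dict[tuple, int]) -> int:
    """Assign one id per distinct subtree, so common suffixes are merged too."""
    signature = (
        node["final"],
        tuple(sorted((ph, minimize(child, registry))
                     for ph, child in node["next"].items())),
    )
    return registry.setdefault(signature, len(registry))


def dawg_size(root: dict) -> int:
    """Number of distinct states after merging, excluding the root state."""
    registry: Dict[tuple, int] = {}
    minimize(root, registry)
    return len(registry) - 1


trie = build_trie(PRONS)
print(full_form_size(PRONS), trie_size(trie), dawg_size(trie))  # prints: 24 16 7
```

On this toy lexicon the node count drops from 24 (full-form) to 16 (tree) to 7 (graph), while all three structures accept exactly the same four phoneme sequences, which mirrors the trend reported above at a much smaller scale.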




Cite this article

Georgila, K., Fakotakis, N. & Kokkinakis, G. Large Vocabulary Search Space Reduction Employing Directed Acyclic Word Graphs and Phonological Rules. International Journal of Speech Technology 5, 355–370 (2002). https://doi.org/10.1023/A:1020965126094


  • DOI: https://doi.org/10.1023/A:1020965126094