Abstract
Wordgraphs are structures that may be output by speech recognisers. We discuss various methods for turning wordgraphs into smaller structures. One of these methods is novel: it relies on a new kind of determinization of acyclic weighted finite automata that preserves the language but does not fully preserve the weights, and it yields smaller automata than traditional determinization of weighted finite automata. We present empirical data comparing the respective methods.
The methods are relevant to systems in which wordgraphs serve as input to very time-consuming forms of syntactic analysis, such as unification parsing.
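The contrast between weight-preserving and merely language-preserving determinization can be made concrete with a minimal sketch. It rests on several assumptions that do not come from the paper. A wordgraph is modelled as an acyclic automaton with arcs `(source, word, weight, target)` and a `finals` map from state to final weight. Weights are in the tropical semiring, so a path weight is a sum and the best weight is a minimum. `determinize_weighted` is the standard weight-preserving construction over (state, residual weight) pairs. `determinize_language_only` is plain subset construction that gives each new arc the minimum of the weights it merges. This is one hypothetical way to trade weight accuracy for size and is not necessarily the construction proposed in the paper. All function names and the toy wordgraph are illustrative.

```python
from collections import defaultdict

# Hypothetical encoding (an assumption): arcs are (source, word, weight, target),
# finals maps state -> final weight, weights are tropical (sum along a path, min across paths).

def _outgoing(arcs):
    out = defaultdict(list)
    for s, a, w, t in arcs:
        out[s].append((a, w, t))
    return out

def determinize_weighted(arcs, start, finals):
    """Weight-preserving determinization over (state, residual) pairs.
    Terminates on acyclic input such as a wordgraph; the result starts at state 0."""
    out = _outgoing(arcs)
    first = frozenset({(start, 0)})
    ids, todo, det_arcs, det_finals = {first: 0}, [first], [], {}
    while todo:
        subset = todo.pop()
        ends = [r + finals[q] for q, r in subset if q in finals]
        if ends:
            det_finals[ids[subset]] = min(ends)
        by_word = defaultdict(dict)  # word -> {target: best residual + arc weight}
        for q, r in subset:
            for a, w, t in out[q]:
                by_word[a][t] = min(by_word[a].get(t, r + w), r + w)
        for a in sorted(by_word):
            w_min = min(by_word[a].values())  # weight emitted on the new arc
            nxt = frozenset((t, v - w_min) for t, v in by_word[a].items())
            if nxt not in ids:
                ids[nxt] = len(ids)
                todo.append(nxt)
            det_arcs.append((ids[subset], a, w_min, ids[nxt]))
    return det_arcs, det_finals, len(ids)

def determinize_language_only(arcs, start, finals):
    """Plain subset construction: the language is kept exactly, but each new arc
    only gets the minimum of the weights it merges, so a string's weight can
    drop below its best weight in the input."""
    out = _outgoing(arcs)
    first = frozenset({start})
    ids, todo, det_arcs, det_finals = {first: 0}, [first], [], {}
    while todo:
        subset = todo.pop()
        ends = [finals[q] for q in subset if q in finals]
        if ends:
            det_finals[ids[subset]] = min(ends)
        by_word = defaultdict(lambda: [float("inf"), set()])  # word -> [min weight, targets]
        for q in subset:
            for a, w, t in out[q]:
                by_word[a][0] = min(by_word[a][0], w)
                by_word[a][1].add(t)
        for a in sorted(by_word):
            w_min, targets = by_word[a]
            nxt = frozenset(targets)
            if nxt not in ids:
                ids[nxt] = len(ids)
                todo.append(nxt)
            det_arcs.append((ids[subset], a, w_min, ids[nxt]))
    return det_arcs, det_finals, len(ids)

def best_weight(arcs, start, finals, word):
    """Minimum weight of `word` in any (possibly nondeterministic) automaton; None if rejected."""
    cur = {start: 0}
    for sym in word:
        nxt = {}
        for s, a, w, t in arcs:
            if a == sym and s in cur:
                nxt[t] = min(nxt.get(t, cur[s] + w), cur[s] + w)
        cur = nxt
    ends = [v + finals[q] for q, v in cur.items() if q in finals]
    return min(ends) if ends else None

# Toy wordgraph: "x a b" and "y a c" cost 0; "x a c" and "y a b" cost 1.
arcs = [(0, "x", 0, 4), (0, "y", 0, 5),
        (4, "a", 0, 1), (4, "a", 1, 2), (5, "a", 1, 1), (5, "a", 0, 2),
        (1, "b", 0, 3), (2, "c", 0, 3)]
finals = {3: 0}

w_arcs, w_finals, n_weighted = determinize_weighted(arcs, 0, finals)
l_arcs, l_finals, n_lang = determinize_language_only(arcs, 0, finals)
print(n_weighted, n_lang)  # 6 5
print(best_weight(arcs, 0, finals, "xac"),
      best_weight(w_arcs, 0, w_finals, "xac"),
      best_weight(l_arcs, 0, l_finals, "xac"))  # 1 1 0
```

On this toy graph the language-only construction needs 5 states against 6. After "x a" and "y a", the same two states are reachable, but with different residual weights. The weight-preserving construction must keep those two situations apart, whereas plain subset construction merges them. The price is that "x a c" drops from weight 1 to 0. Every string's weight can only decrease, never increase, and exactly the same strings are accepted.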
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nederhof, MJ. (2000). Preprocessing for Unification Parsing of Spoken Language. In: Christodoulakis, D.N. (eds) Natural Language Processing — NLP 2000. NLP 2000. Lecture Notes in Computer Science(), vol 1835. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45154-4_11
DOI: https://doi.org/10.1007/3-540-45154-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67605-8
Online ISBN: 978-3-540-45154-9
eBook Packages: Springer Book Archive