Abstract
We describe an implementation of a simple probabilistic link grammar. This probabilistic language model extends trigrams by allowing a word to be predicted not only from the two immediately preceding words, but potentially from any preceding pair of adjacent words that lie within the same sentence. In this way, the trigram model can skip over less informative words to make its predictions. The underlying “grammar” is nothing more than a list of pairs of words that can be linked together with one or more intervening words; this word-pair grammar is automatically inferred from a corpus of training text. We present a novel technique for indexing the model parameters that allows us to avoid all sorting in the M-step of the training algorithm. This results in significant savings in computation time, and is applicable to the training of a general probabilistic link grammar. Results of preliminary experiments carried out for this class of models are presented.
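To make the idea of long-range prediction concrete, the following is a minimal sketch of how the candidate predicting contexts for a word might be enumerated. It is an illustration only, not the authors' implementation. The function name `candidate_contexts`, the padding symbol `BOS`, and the set `linkable` are hypothetical, and the sketch assumes that an earlier adjacent pair may predict a word when the second word of that pair forms a linkable word pair with the predicted word. The abstract does not spell out this detail.

```python
from typing import List, Set, Tuple

BOS = "<s>"  # hypothetical sentence-start padding symbol


def candidate_contexts(
    sentence: List[str],
    i: int,
    linkable: Set[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return the adjacent word pairs that may predict sentence[i].

    The first entry is always the ordinary trigram context (the two
    immediately preceding words, padded at the sentence start).
    Further entries are earlier adjacent pairs (w[j-1], w[j]) for which
    (w[j], sentence[i]) is in `linkable`, i.e. the two words may be
    linked across one or more intervening words (an assumption of this
    sketch).
    """
    padded = [BOS, BOS] + sentence
    k = i + 2  # position of the predicted word in the padded list
    contexts = [(padded[k - 2], padded[k - 1])]  # standard trigram context
    # j runs over earlier real words with at least one word between j and k.
    for j in range(k - 2, 1, -1):
        if (padded[j], padded[k]) in linkable:
            contexts.append((padded[j - 1], padded[j]))
    return contexts


sentence = "the dog that I saw barked".split()
linkable = {("dog", "barked")}  # toy word-pair grammar
contexts = candidate_contexts(sentence, 5, linkable)
```

Here `contexts` holds the usual trigram context `("I", "saw")` followed by the long-range context `("the", "dog")`, which skips over the intervening relative clause. A full model would additionally assign probabilities to the choice among these contexts; that part is not sketched here.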
Research supported in part by NSF and ARPA under grants IRI-9314969 and N00014-92-C-0189.
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Della Pietra, S., Della Pietra, V., Gillett, J., Lafferty, J., Printz, H., Ureš, L. (1994). Inference and estimation of a long-range trigram model. In: Carrasco, R.C., Oncina, J. (eds) Grammatical Inference and Applications. ICGI 1994. Lecture Notes in Computer Science, vol 862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58473-0_139
DOI: https://doi.org/10.1007/3-540-58473-0_139
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58473-5
Online ISBN: 978-3-540-48985-6