Probabilistic Inference for Phrase-based Machine Translation: A Sampling Approach
Date
30/06/2011
Author
Arun, Abhishek
Abstract
Recent advances in statistical machine translation (SMT) have used dynamic programming
(DP) based beam search methods for approximate inference within probabilistic
translation models. Despite their success, these methods compromise the probabilistic
interpretation of the underlying model, thus limiting the application of probabilistically
defined decision rules during training and decoding.
As an alternative, in this thesis, we propose a novel Monte Carlo sampling approach
for theoretically sound approximate probabilistic inference within these models. The
distribution we are interested in is the conditional distribution of a log-linear translation
model; however, often, there is no tractable way of computing the normalisation term
of the model. Instead, a Gibbs sampling approach for phrase-based machine translation
models is developed which obviates the need to compute this term yet produces
samples from the required distribution.
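
As a rough illustration of why the normalisation term is not needed, the sketch below runs a Gibbs sampler on a toy log-linear model over short binary sequences. It is not the phrase-based sampler developed in the thesis: the model, the two features inside `score`, and all function names are assumptions made for this example. The point it shows is that each conditional resampling step uses only a ratio of unnormalised scores, so the global normaliser cancels. The brute-force `exact_marginal` is included only to check the sampler on a model small enough to enumerate.

```python
import itertools
import math
import random


def score(y, weights):
    """Unnormalised log-linear score exp(w . f(y)) for a toy feature set."""
    unary, pair = weights
    s = sum(unary * yi for yi in y)
    s += sum(pair * (y[i] == y[i + 1]) for i in range(len(y) - 1))
    return math.exp(s)


def gibbs_step(y, weights, rng):
    """Resample every position in turn from its conditional distribution."""
    y = list(y)
    for i in range(len(y)):
        # Score each candidate value with all other positions held fixed.
        candidates = []
        for v in (0, 1):
            y[i] = v
            candidates.append(score(y, weights))
        # The global normaliser cancels in this ratio, so it is never computed.
        p_one = candidates[1] / (candidates[0] + candidates[1])
        y[i] = 1 if rng.random() < p_one else 0
    return tuple(y)


def sample(n_samples, length, weights, seed=0, burn_in=100):
    """Run the chain from an all-zero state and keep samples after burn-in."""
    rng = random.Random(seed)
    y = tuple(0 for _ in range(length))
    out = []
    for t in range(burn_in + n_samples):
        y = gibbs_step(y, weights, rng)
        if t >= burn_in:
            out.append(y)
    return out


def exact_marginal(length, weights, position):
    """Brute-force P(y[position] = 1); feasible only for tiny models."""
    states = list(itertools.product((0, 1), repeat=length))
    z = sum(score(y, weights) for y in states)
    return sum(score(y, weights) for y in states if y[position] == 1) / z
```

On this toy model the fraction of samples with `y[0] == 1` can be compared directly against `exact_marginal(length, weights, 0)`; in a real translation model no such enumeration is available, which is the motivation for sampling.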
We establish that the sampler effectively explores the distribution defined by a
phrase-based model by showing that it converges in a reasonable amount of time to
the desired distribution, irrespective of initialisation. Empirical evidence is provided to
confirm that the sampler can provide accurate estimates of expectations of functions of
interest. The mix of high probability and low probability derivations obtained through
sampling is shown to provide a more accurate estimate of expectations than merely
using the n most probable derivations.
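
The contrast between sampled and n-best estimates can be sketched on a toy distribution. Everything here is an assumption chosen for illustration: the 50 "derivations", their probabilities (proportional to 1/(k+1)), the feature `feature(k) = k`, and the use of independent draws from `random.choices` as a stand-in for the output of a converged sampler. The toy is deliberately built so that the low-probability tail carries much of the expectation, which is the situation where truncating to the top n and renormalising is most biased.

```python
import random


def make_distribution(n_derivations=50):
    """Toy distribution with a heavy tail: p(k) proportional to 1 / (k + 1)."""
    weights = [1.0 / (k + 1) for k in range(n_derivations)]
    z = sum(weights)
    return [w / z for w in weights]


def feature(k):
    """Hypothetical feature of derivation k whose expectation we want."""
    return float(k)


def exact_expectation(probs):
    """Exact expectation, available here only because the toy is tiny."""
    return sum(p * feature(k) for k, p in enumerate(probs))


def nbest_expectation(probs, n):
    """Expectation over the n most probable items, renormalised to sum to 1."""
    top = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)[:n]
    mass = sum(probs[k] for k in top)
    return sum(probs[k] / mass * feature(k) for k in top)


def sampled_expectation(probs, n_samples, seed=0):
    """Monte Carlo estimate: the plain average of the feature over draws."""
    rng = random.Random(seed)
    draws = rng.choices(range(len(probs)), weights=probs, k=n_samples)
    return sum(feature(k) for k in draws) / n_samples
```

Comparing `sampled_expectation(probs, 5000)` and `nbest_expectation(probs, 5)` against `exact_expectation(probs)` shows the sampled estimate landing much closer on this toy; how large the gap is in a real translation model is an empirical question that the thesis addresses.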
Subsequently, we show that the sampler provides a tractable solution for finding the
maximum probability translation in the model. We also present a unified approach to
approximating two additional intractable problems: minimum risk training and minimum
Bayes risk decoding. Key to our approach is the use of the sampler, which
allows us to explore the entire probability distribution and maintain a strict probabilistic
formulation throughout the translation pipeline. For these tasks, sampling combines
the simplicity of n-best list approaches with the extended view of the distribution that
lattice-based approaches benefit from, while avoiding the biases associated with beam
search. Our approach is theoretically well-motivated and can give better and more
stable results than current state-of-the-art methods.
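
To make the difference between maximum probability decoding and minimum Bayes risk decoding concrete, the sketch below applies both decision rules to a small bag of sampled translations. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names are hypothetical, the sample counts serve as the empirical distribution, and unigram F1 in `overlap_gain` is a simple stand-in for a translation metric such as BLEU.

```python
from collections import Counter


def overlap_gain(hyp, ref):
    """Unigram F1 between two token sequences (a simple stand-in for BLEU)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    common = sum((h & r).values())
    if common == 0:
        return 0.0
    precision = common / sum(h.values())
    recall = common / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def map_decode(samples):
    """Maximum probability rule: return the most frequently sampled string."""
    return Counter(samples).most_common(1)[0][0]


def mbr_decode(samples):
    """Minimum Bayes risk rule: maximise expected gain under the samples."""
    counts = Counter(samples)
    total = sum(counts.values())

    def expected_gain(hyp):
        # Each distinct sample acts as a pseudo-reference, weighted by
        # its relative frequency in the sample set.
        return sum(c / total * overlap_gain(hyp, other)
                   for other, c in counts.items())

    return max(counts, key=expected_gain)
```

With four samples of "the house is small", three of "this house is tiny" and three of "this home is tiny", `map_decode` returns "the house is small" while `mbr_decode` returns "this house is tiny", because the latter overlaps more with the rest of the probability mass.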