This document discusses various techniques for optimizing search space in phrase-based machine translation models, including:
1) Using graph structures and semirings like the tropical semiring to represent translation hypotheses as paths through a weighted graph and find optimal paths.
2) Applying constraints like distortion limits and beam search to prune unpromising partial translations.
3) Using heuristic functions to guide the search and pre-ordering methods like rules and learned models to reorder languages with different word orders.
[Book Reading] 機械翻訳 - Section 5 No.2
1. Graph Structure
・The phrase-based model uses a search graph
・Weighted directed acyclic graph G = <Φ, V, E, s, g, A>
Φ: set of phrase pairs
each phrase pair in Φ is scored by feature vector h(・)・weight ω
V: vertices ≡ partial hypotheses
E: edges ≡ weighted steps of a route
E ⊆ V×V×Φ×A
A: set of weights
2. Graph Structure
• out(v) = {e ∈ E | tail(e) = v}: the set of edges that go out from vertex v
• in(v) = {e ∈ E | head(e) = v}: the set of edges that head into vertex v
->Phrase pairs are linked by <out(v), in(v)>
In Figure 5.8, the phrase pair <へ行った, I went to> is linked by
out(v) = <-----, 0, <s>> and in(v) = <--・・・, 9, went to>
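The out(v) and in(v) sets can be sketched as two indexes built from an edge list. This is only an illustration: the vertex names (`s`, `v1`, `v2`, `g`), the `Edge` type, and the weights are hypothetical and are not taken from Figure 5.8.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    tail: str       # vertex the edge leaves
    head: str       # vertex the edge enters
    phrase: tuple   # (source phrase, target phrase)
    weight: float   # illustrative weight, not from the book


def index_edges(edges):
    """Build out(v) and in(v) as dictionaries from vertex to edge list."""
    out_edges, in_edges = defaultdict(list), defaultdict(list)
    for e in edges:
        out_edges[e.tail].append(e)   # e belongs to out(tail(e))
        in_edges[e.head].append(e)    # e belongs to in(head(e))
    return out_edges, in_edges


# Hypothetical three-edge route from a start vertex to a goal vertex.
edges = [
    Edge("s", "v1", ("行った", "He went"), -1.0),
    Edge("v1", "v2", ("へ", "to"), -0.5),
    Edge("v2", "g", ("領事館", "the consulate"), -0.7),
]
out_edges, in_edges = index_edges(edges)
```

Keeping both indexes makes it cheap to walk the graph forward (through `out_edges`) or backward (through `in_edges`).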
3. Graph Structure
• If Ψ = (e_1, e_2, …, e_l) is a route from the start to any vertex, with
head(e_k) = tail(e_{k+1}), then
Source language phrase set: $\bigcup_{k=1}^{l} f(\phi(e_k)) \equiv f(\Psi)$
Target language phrases: $e(\phi(e_1)), \dots, e(\phi(e_l)) \equiv e(\Psi)$
Route weight: $\omega(\Psi) = \bigotimes_{k=1}^{l} \omega(e_k)$
4. Graph Structure
• In Fig. 5.8, for the route
Start → <行った, He went> → <へ, to> → <領事館, the consulate>
-> the translation of the source language words
「行った」「へ」「領事館」 is
“He went to the consulate”
5. Semiring
• A set R equipped with two binary operations:
addition “+” and multiplication “×”
• Associative:
a+(b+c)=(a+b)+c, a×(b×c)=(a×b)×c
• Commutative (addition): a+b=b+a
• Distributive: a×(b+c)=(a×b)+(a×c)
• Identity elements 0 and 1:
0+a=a+0=a; 1×a=a×1=a; 0×a=a×0=0
• Additive inverse and multiplicative inverse
need not be defined
6. Semiring
• The tropical semiring in Table 5.1 is used to solve the
maximization problem for route weights in the
decoder
Semiring   A              ⊕     ⊗    0     1
Tropical   ℝ ∪ {−∞, ∞}    max   +    −∞    0
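As a small sketch, the tropical semiring row of the table can be written down and checked against the semiring laws. The `Semiring` class and the sample values are illustrative assumptions, not part of the book.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]    # additive operation ⊕
    times: Callable[[float, float], float]   # multiplicative operation ⊗
    zero: float                              # identity element of ⊕
    one: float                               # identity element of ⊗


# Tropical semiring as in the table: ⊕ = max, ⊗ = +, 0 = -inf, 1 = 0.
TROPICAL = Semiring(
    plus=max,
    times=lambda a, b: a + b,
    zero=float("-inf"),
    one=0.0,
)

# Sample log-score-like values (hypothetical).
a, b, c = -1.5, -0.5, -2.0

# Distributive law: a ⊗ (b ⊕ c) equals (a ⊗ b) ⊕ (a ⊗ c).
lhs = TROPICAL.times(a, TROPICAL.plus(b, c))
rhs = TROPICAL.plus(TROPICAL.times(a, b), TROPICAL.times(a, c))
```

With ⊕ = max there is no additive inverse: nothing can be "added" to a value to get back to −∞, which is why the structure is a semiring and not a ring.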
7. Semiring
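• In the weighted directed graph G, let Ψ = (e_1, e_2, …, e_l)
be a route from the starting point to the ending point
for the source language input f
• Score of Ψ = product (⊗) of the weights of its partial routes
$\omega(\Psi) = \bigotimes_{k=1}^{l} \omega(e_k)$
-> The problem of maximizing this score is
$\max_{\Psi} \bigotimes_{e \in \Psi} \omega(e) = \bigoplus_{\Psi} \bigotimes_{e \in \Psi} \omega(e)$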
8. Semiring
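• In Fig. 5.7, line 11
$Q(v', j''+1, e'_s e''_s) \leftarrow \max\{\, Q(v', j''+1, e'_s e''_s),\; Q(v, j, e' e'') + s_d + s_\phi + s_{lm} \,\}$
the additive operation ⊕ (here max) is carried out at
each vertex of G, starting from tail(e) = s
• Because a semiring satisfies the distributive law
-> the weight ω(v) of any vertex v ∈ V is
$\omega(v) = \bigoplus_{\Psi} \bigotimes_{e \in \Psi} \omega(e) = \bigoplus_{e \in \mathrm{in}(v)} \omega(e) \otimes \omega(\mathrm{tail}(e))$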
9. Semiring
• Forward-backward algorithm for finding the
maximum route weight in the graph structure
• topological order(G): list of the vertices of graph G
arranged in topological order
• α, β: external variables
12. Semiring
For the problem of choosing the optimum
translation from the search space expressed by the
weighted directed graph G:
Tropical semiring + Forward algorithm
->Viterbi algorithm
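A minimal sketch of this combination, assuming edges are given as hypothetical `(tail, head, weight)` triples: the forward pass visits vertices in topological order, uses + as ⊗ and max as ⊕, and keeps back-pointers so the best route can be read off. The graph and weights are made up for illustration.

```python
from graphlib import TopologicalSorter

NEG_INF = float("-inf")


def forward_max(edges, start):
    """Forward pass under the tropical semiring (max, +).

    edges: list of (tail, head, weight) triples of a DAG.
    Returns (alpha, back): best weight from start to each vertex and
    the predecessor on that best route.
    """
    preds, vertices = {}, set()
    for tail, head, w in edges:
        preds.setdefault(head, []).append((tail, w))
        vertices.update((tail, head))
    # TopologicalSorter expects {vertex: set of predecessors}.
    deps = {v: {t for t, _ in preds.get(v, [])} for v in vertices}
    order = list(TopologicalSorter(deps).static_order())

    alpha = {v: NEG_INF for v in vertices}   # semiring 0
    alpha[start] = 0.0                       # semiring 1
    back = {}
    for v in order:
        for tail, w in preds.get(v, []):
            cand = alpha[tail] + w           # ⊗ is +
            if cand > alpha[v]:              # ⊕ is max
                alpha[v] = cand
                back[v] = tail
    return alpha, back


def best_route(back, start, goal):
    """Follow back-pointers from goal to start."""
    route = [goal]
    while route[-1] != start:
        route.append(back[route[-1]])
    return route[::-1]


edges = [("s", "a", -1.0), ("s", "b", -0.5), ("a", "b", -0.25),
         ("a", "g", -0.5), ("b", "g", -2.0)]
alpha, back = forward_max(edges, "s")
route = best_route(back, "s", "g")   # ['s', 'a', 'g'] with weight -1.5
```

Swapping max for another ⊕ (for example log-sum-exp) would turn the same loop into a different forward computation, which is the point of phrasing the decoder in semiring terms.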
13. k-best
• Besides the forward-backward algorithm, a k-best
algorithm is used to find the k routes with the best weights
• Dijkstra’s algorithm: for the single-source shortest
path problem
• Eppstein’s algorithm: for enumerating multiple paths
efficiently with heaps
14. k-best
• Assume the problem uses the tropical semiring
and the backward algorithm
• Calculate the weights β(v) and choose the maximum
• Fig. 5.10 algorithm
・cand: priority queue
・<v, s>: partial route
・<v′, s′>: partial route whose vertex is v′ = v
and whose edge is s′ = e ∈ out(v), i.e. tail(e) = v
・D: set of <v′, s′>
15. k-best
• k=1: initialize cand
• Optimize the weights of the partial routes and the whole
route
• Repeat k times:
- take the optimal <v, s> out of cand and register it in D
- choose v′ = v and e′ = e ∈ out(v), and insert them into cand
- cand is a heap on β(・), so the optimal entry comes out first
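The following is a sketch in the spirit of the steps above, not a reproduction of Fig. 5.10. It assumes the graph is a DAG given as a hypothetical `out_edges` dictionary plus a topological order supplied by the caller. Backward weights β are computed first. A priority queue then expands partial routes ordered by prefix weight plus β, so complete routes come out best first.

```python
import heapq
from itertools import count

NEG_INF = float("-inf")


def backward_max(out_edges, order, goal):
    """beta[v]: best weight of any route from v to goal (max, +)."""
    beta = {v: NEG_INF for v in order}
    beta[goal] = 0.0
    for v in reversed(order):
        for head, w in out_edges.get(v, []):
            beta[v] = max(beta[v], w + beta[head])
    return beta


def k_best(out_edges, order, start, goal, k):
    """Return up to k (weight, route) pairs, best first."""
    beta = backward_max(out_edges, order, goal)
    tie = count()  # tie-breaker so the heap never compares routes
    # heapq is a min-heap, so priorities are stored negated.
    cand = [(-beta[start], next(tie), 0.0, (start,))]
    results = []
    while cand and len(results) < k:
        _, _, prefix, route = heapq.heappop(cand)
        v = route[-1]
        if v == goal:
            results.append((prefix, list(route)))
            continue
        for head, w in out_edges.get(v, []):
            if beta[head] == NEG_INF:
                continue  # the goal cannot be reached through head
            new_prefix = prefix + w
            priority = -(new_prefix + beta[head])
            heapq.heappush(cand, (priority, next(tie), new_prefix, route + (head,)))
    return results


# Hypothetical graph; order must be a topological order of its vertices.
order = ["s", "a", "b", "g"]
out_edges = {
    "s": [("a", -1.0), ("b", -0.5)],
    "a": [("b", -0.25), ("g", -0.5)],
    "b": [("g", -2.0)],
}
top2 = k_best(out_edges, order, "s", "g", k=2)
```

Because β is the exact best completion weight, the first complete route popped is the 1-best, the second is the 2-best, and so on.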
16. Limitation of Search Space
• If the search space is large
->any reordering is allowed
->the amount of computation in the decoding algorithm
becomes massive
->limits are necessary:
・Distortion limit (constraint)
・Reordering limit (constraint)
17. Distortion Constraint
• Set an upper limit d on the distance between
phrase pairs φ_k and φ_{k−1}: |start_k − end_{k−1} − 1| ≤ d
The purpose: a heavily distorted hypothesis incurs a
large penalty, which makes its model score small
For language pairs that do not need much reordering,
the distortion constraint is efficient
If d=0: no skips, translation proceeds from left to right
smoothly
->monotone translation
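A minimal sketch of the check, assuming the distance is measured as |start_k − end_{k−1} − 1|, the common convention under which d = 0 yields monotone translation. The function name and the sample positions are hypothetical.

```python
def within_distortion_limit(prev_end, start, d):
    """True if a phrase starting at `start` may follow one ending at `prev_end`.

    Assumes distance = |start - prev_end - 1|, so the position right after
    the previous phrase has distance 0.
    """
    return abs(start - prev_end - 1) <= d


prev_end = 3                 # previous phrase ended at source position 3
candidates = [4, 6, 1]       # possible start positions of the next phrase

allowed_d2 = [s for s in candidates if within_distortion_limit(prev_end, s, d=2)]
allowed_d0 = [s for s in candidates if within_distortion_limit(prev_end, s, d=0)]
```

With d=2 the starts 4 and 6 pass and the jump back to 1 is rejected; with d=0 only position 4 remains, which is the monotone case.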
18. Distortion Constraint
• Constraint for the case where partial hypotheses
cannot reach the ending point
j: first untranslated position of the source language
start_k: first position of the phrase being translated
If (j < start_k), add
end_k − j ≤ d
・IBM constraint
[Figure: source positions j, start_k and end_k of phrase φ_k; the region marked “no need to examine” is skipped]
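The extra condition can be sketched as a check on a coverage vector. The reading of j as the leftmost source position not yet translated, the 0-based positions, and the function names are assumptions made for this example.

```python
def first_uncovered(coverage):
    """Index of the leftmost source position not yet translated."""
    for pos, done in enumerate(coverage):
        if not done:
            return pos
    return len(coverage)


def gap_stays_reachable(coverage, start, end, d):
    """Check end - j <= d when the new phrase skips over position j."""
    j = first_uncovered(coverage)
    if j < start:
        return end - j <= d
    return True  # nothing is skipped, so the extra condition does not apply


# Hypothetical state: only source position 0 has been translated.
coverage = [True, False, False, False, False, False]

ok_near = gap_stays_reachable(coverage, start=3, end=4, d=3)
ok_far = gap_stays_reachable(coverage, start=4, end=5, d=3)
ok_adjacent = gap_stays_reachable(coverage, start=1, end=2, d=3)
```

Rejecting the far jump early avoids building hypotheses that could never return to the skipped words without violating the distortion limit.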
19. Beam Search
・Prune unpromising partial hypotheses and keep
only the partial hypotheses with high scores,
to reduce computation
・Group the vertices of the search graph and prune the
partial hypotheses with low scores
20. Beam Search
・Group the vertices of the search graph and prune the
partial hypotheses with low scores
[Figure: pruned partial hypotheses vs. chosen partial hypotheses]
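One way to sketch the grouping and pruning, assuming hypotheses are grouped by the number of covered source words and pruned with both a score threshold and a maximum group size. The grouping criterion, the `beam_size` and `threshold` parameters, and the sample scores are all assumptions of this example.

```python
import heapq
from collections import defaultdict


def prune(hypotheses, beam_size, threshold):
    """Group hypotheses and keep only the promising ones in each group.

    hypotheses: list of (score, n_covered, label) tuples.
    A hypothesis survives if its score is within `threshold` of the best
    score in its group and it is among the `beam_size` best of the group.
    """
    groups = defaultdict(list)
    for hyp in hypotheses:
        groups[hyp[1]].append(hyp)

    kept = {}
    for n_covered, group in groups.items():
        best = max(h[0] for h in group)
        survivors = [h for h in group if h[0] >= best - threshold]
        kept[n_covered] = heapq.nlargest(beam_size, survivors, key=lambda h: h[0])
    return kept


hypotheses = [
    (-1.0, 1, "a"), (-1.2, 1, "b"), (-5.0, 1, "c"),
    (-2.0, 2, "d"), (-2.1, 2, "e"), (-2.2, 2, "f"),
]
kept = prune(hypotheses, beam_size=2, threshold=1.0)
```

Here "c" is dropped by the threshold and "f" by the size limit, so each group keeps two hypotheses.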
22. Heuristic Function
• Prevent partial hypotheses from being pruned without
accounting for the part that has not been translated yet
• Give a predicted score for the rest of the route and search
as in A* search so that the route score reaches the maximum
• ->can reduce search errors
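One common way to obtain such a predicted score is a table of optimistic estimates per source span, filled by dynamic programming. The sketch below assumes that scheme. The `span_score` input (best phrase score per half-open span) and its values are hypothetical.

```python
NEG_INF = float("-inf")


def future_cost_table(n, span_score):
    """cost[(i, j)]: best estimated score for translating source span [i, j).

    span_score maps (i, j) to the best score of a single phrase pair
    covering that span; spans without a phrase pair are simply missing.
    """
    cost = {}
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            best = span_score.get((i, j), NEG_INF)
            for m in range(i + 1, j):            # try every split point
                best = max(best, cost[(i, m)] + cost[(m, j)])
            cost[(i, j)] = best
    return cost


def estimate(coverage, cost):
    """Sum the table entries over each maximal untranslated run."""
    total, i, n = 0.0, 0, len(coverage)
    while i < n:
        if coverage[i]:
            i += 1
            continue
        j = i
        while j < n and not coverage[j]:
            j += 1
        total += cost[(i, j)]
        i = j
    return total


span_score = {(0, 1): -1.0, (1, 2): -2.0, (2, 3): -1.0,
              (0, 2): -2.5, (1, 3): -1.5}
cost = future_cost_table(3, span_score)
rest_after_first = estimate([True, False, False], cost)
rest_around_middle = estimate([False, True, False], cost)
```

Adding this estimate to the score of a partial hypothesis before pruning makes hypotheses that translated the easy words first comparable with those that translated the hard words first.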
23. Pre-reordering Method
For translation between languages that have
significantly different grammatical structures
• Pre-reordering rule
• Pre-reordering model
• Pre-reordering learning
24. Pre-reordering Rule
• Based on the tree from syntactic analysis, reorder into
the target language word order
• Rule based on head-driven phrase structure grammar (HPSG):
- Syntactic analysis
- Move the heads to the back
25. Pre-reordering Model
• The source language must have a syntactic analysis
tool and a morphological analysis tool
• Bilingual data are necessary
• The probabilities of the pre-reordering patterns
obtained are estimated by maximum-likelihood
estimation (MLE)
• Choose suitable pre-reordering patterns
based on parts of speech from
morphological analysis, or on word classes
from clustering
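For the maximum-likelihood step, a relative-frequency estimate can be sketched as follows. The observation format (a source-side pattern paired with the reordering seen in the bilingual data) and the sample counts are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_pattern_probs(observations):
    """Relative-frequency (MLE) estimate of P(reordering | pattern).

    observations: iterable of (pattern, reordering) pairs, e.g. a tuple of
    part-of-speech tags and the permutation observed for it.
    """
    counts = defaultdict(Counter)
    for pattern, reordering in observations:
        counts[pattern][reordering] += 1
    probs = {}
    for pattern, counter in counts.items():
        total = sum(counter.values())
        probs[pattern] = {r: c / total for r, c in counter.items()}
    return probs


# Hypothetical data: the pattern (NP, VP) was swapped 3 times out of 4.
observations = [(("NP", "VP"), (1, 0))] * 3 + [(("NP", "VP"), (0, 1))]
probs = estimate_pattern_probs(observations)
```

At reordering time, the pattern with the highest estimated probability for the observed tags or word classes would be the natural one to apply.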
26. Pre-reordering Learning
• For language pairs without any syntactic
analysis tools or morphological analysis tools
• A provisional tree structure is generated
automatically as the syntactic analysis result
• Divide the tree nodes into 2 labels: reordering label
[X] and no-reordering label <X>
• Formulate the reordering model as a linear ordering
problem (LOP), find an approximate solution,
and build the parse tree
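To make the linear ordering problem concrete, here is a tiny sketch that solves it exactly by brute force. This is not the approximate method the slide refers to and is only feasible for a handful of items. The preference matrix `pref`, where `pref[a][b]` is the score for placing item a before item b, is a hypothetical input.

```python
from itertools import permutations


def lop_score(order, pref):
    """Sum pref[a][b] over every pair where a is placed before b."""
    return sum(pref[a][b]
               for i, a in enumerate(order)
               for b in order[i + 1:])


def solve_lop_exact(pref):
    """Try every permutation; usable only for very small inputs."""
    n = len(pref)
    return max(permutations(range(n)), key=lambda o: lop_score(o, pref))


# Hypothetical pairwise preferences for three words.
pref = [
    [0, 1, 5],
    [4, 0, 2],
    [0, 3, 0],
]
best_order = solve_lop_exact(pref)
best_score = lop_score(best_order, pref)
```

The number of permutations grows factorially with the number of items, which is why approximate solutions are used for real sentences.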