[Zhang+ ACL2014] Kneser-Ney Smoothing on Expected Count
[Pickhardt+ ACL2014] A Generalized Language Model as the
Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
2014/7/12 ACL Reading @ PFI
Nakatani Shuyo, Cybozu Labs Inc.
Kneser-Ney Smoothing
[Kneser+ 1995]
• Discounting & Interpolation
$$P(w_i \mid w_{i-n+1}^{i-1}) = \frac{\max\left(c(w_{i-n+1}^{i}) - D,\ 0\right)}{c(w_{i-n+1}^{i-1})} + \frac{D}{c(w_{i-n+1}^{i-1})}\, N_{1+}(w_{i-n+1}^{i-1}\,\cdot) \cdot P(w_i \mid w_{i-n+2}^{i-1})$$
• where $w_m^n = w_m \cdots w_n$ and $N_{1+}(w_m^n\,\cdot) = \left|\{ w_i \mid c(w_m^n w_i) > 0 \}\right|$ is the number of distinct words that follow $w_m^n$
• the second term hands the total discounted mass $D \cdot N_{1+}(w_{i-n+1}^{i-1}\,\cdot)$ over to the lower-order model
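To make the interpolation concrete, here is a minimal Python sketch of interpolated Kneser-Ney for bigrams with a single absolute discount D. The input format (a dict of bigram counts) and all function names are assumptions for illustration, not code from either paper.

```python
from collections import defaultdict

def kneser_ney_bigram(bigram_counts, D=0.75):
    """Interpolated Kneser-Ney for bigrams with one absolute discount D.

    bigram_counts: dict mapping (u, w) -> integer count c(u w).
    Returns a function p(w, u) estimating P(w | u).
    """
    context_total = defaultdict(int)   # c(u .) = sum_w c(u w)
    continuations = defaultdict(set)   # {w : c(u w) > 0}, so len(...) = N1+(u .)
    left_contexts = defaultdict(set)   # {u : c(u w) > 0}, so len(...) = N1+(. w)
    for (u, w), c in bigram_counts.items():
        if c > 0:
            context_total[u] += c
            continuations[u].add(w)
            left_contexts[w].add(u)
    total_bigram_types = sum(len(s) for s in left_contexts.values())  # N1+(. .)

    def p_continuation(w):
        # Lower-order Kneser-Ney distribution: fraction of distinct contexts
        # that w completes, not its raw unigram frequency.
        return len(left_contexts[w]) / total_bigram_types

    def p(w, u):
        c_uw = bigram_counts.get((u, w), 0)
        c_u = context_total[u]
        if c_u == 0:
            return p_continuation(w)                      # unseen context: full backoff
        discounted = max(c_uw - D, 0.0) / c_u
        backoff_weight = D * len(continuations[u]) / c_u  # D * N1+(u .) / c(u .)
        return discounted + backoff_weight * p_continuation(w)

    return p
```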
Modified KN-Smoothing
[Chen+ 1999]
$$P(w_i \mid w_{i-n+1}^{i-1}) = \frac{c(w_{i-n+1}^{i}) - D\left(c(w_{i-n+1}^{i})\right)}{c(w_{i-n+1}^{i-1})} + \gamma(w_{i-n+1}^{i-1})\, P(w_i \mid w_{i-n+2}^{i-1})$$
• where the discount now depends on the count (weighted discounting):
$$D(c) = \begin{cases} 0 & \text{if } c = 0 \\ D_1 & \text{if } c = 1 \\ D_2 & \text{if } c = 2 \\ D_{3+} & \text{if } c \ge 3 \end{cases} \qquad \gamma(w_{i-n+1}^{i-1}) = \frac{\text{[amount of discounting]}}{c(w_{i-n+1}^{i-1})}$$
• the discounts $D_1, D_2, D_{3+}$ are estimated by leave-one-out cross-validation
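As a reference point, the leave-one-out estimation has the well-known closed-form approximation from [Chen+ 1999] in terms of counts-of-counts. The sketch below is a minimal Python version, assuming the counts-of-counts n_1 … n_4 are all nonzero.

```python
def modified_kn_discounts(counts):
    """Closed-form estimates of D1, D2, D3+ from counts-of-counts
    (Chen & Goodman 1999), derived from leave-one-out likelihood.

    counts: iterable of n-gram counts c > 0 observed in the training data.
    """
    n = [0, 0, 0, 0, 0]            # n[r] = number of n-gram types seen exactly r times
    for c in counts:
        if 1 <= c <= 4:
            n[c] += 1
    Y = n[1] / (n[1] + 2 * n[2])
    D1 = 1 - 2 * Y * n[2] / n[1]
    D2 = 2 - 3 * Y * n[3] / n[2]
    D3_plus = 3 - 4 * Y * n[4] / n[3]
    return D1, D2, D3_plus

def D(c, D1, D2, D3_plus):
    """The count-dependent discount D(c) used in the modified KN formula."""
    if c == 0:
        return 0.0
    if c == 1:
        return D1
    if c == 2:
        return D2
    return D3_plus
```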
[Zhang+ ACL2014] Kneser-Ney
Smoothing on Expected Count
• When each sentence has a fractional weight
  – Domain adaptation
  – EM algorithm on word alignment
• Proposes KN smoothing using expected fractional counts
I’m interested in it!
Model
• $\boldsymbol{u}$ denotes the context $w_{i-n+1}^{i-1}$, and $\boldsymbol{u}'$ the shortened context $w_{i-n+2}^{i-1}$
• Suppose a sequence $\boldsymbol{u}w$ occurs $k$ times and the $i$-th occurrence carries probability $p_i$ ($i = 1, \cdots, k$) as its weight;
• then the count $c(\boldsymbol{u}w)$ is distributed according to a Poisson binomial distribution:
$$p\left(c(\boldsymbol{u}w) = r\right) = s(k, r), \quad \text{where} \quad s(k, r) = \begin{cases} s(k-1, r)\,(1 - p_k) + s(k-1, r-1)\, p_k & \text{if } 0 \le r \le k \\ 1 & \text{if } k = r = 0 \\ 0 & \text{otherwise} \end{cases}$$
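The recurrence translates directly into a small dynamic program. Below is a minimal Python sketch; the function name and list representation are my own, for illustration.

```python
def poisson_binomial_pmf(p):
    """P(count = r) for r = 0..k, via the recurrence
    s(k, r) = s(k-1, r)(1 - p_k) + s(k-1, r-1) p_k.

    p: list of per-occurrence weights p_1..p_k, each in [0, 1].
    Returns a list s with s[r] = p(c = r).
    """
    s = [1.0]                                 # k = 0: the count is 0 with probability 1
    for p_k in p:
        nxt = [0.0] * (len(s) + 1)
        for r, prob in enumerate(s):
            nxt[r] += prob * (1.0 - p_k)      # occurrence k is not counted
            nxt[r + 1] += prob * p_k          # occurrence k is counted
        s = nxt
    return s

# Example: three occurrences with weights 0.9, 0.5, 0.2
pmf = poisson_binomial_pmf([0.9, 0.5, 0.2])
expected_count = sum(r * pr for r, pr in enumerate(pmf))   # equals 0.9 + 0.5 + 0.2
```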
MLE on this model
• Expectations
– $\mathbb{E}[c(\boldsymbol{u}w)] = \sum_r r \cdot p\left(c(\boldsymbol{u}w) = r\right)$
– $\mathbb{E}[N_r(\boldsymbol{u}\,\cdot)] = \sum_w p\left(c(\boldsymbol{u}w) = r\right)$
– $\mathbb{E}[N_{r+}(\boldsymbol{u}\,\cdot)] = \sum_w p\left(c(\boldsymbol{u}w) \ge r\right)$
• Maximize the (expected) likelihood
– $\mathbb{E}[L] = \mathbb{E}\left[\sum_{\boldsymbol{u}w} c(\boldsymbol{u}w) \log p(w \mid \boldsymbol{u})\right] = \sum_{\boldsymbol{u}w} \mathbb{E}[c(\boldsymbol{u}w)] \log p(w \mid \boldsymbol{u})$
– which yields $p_{\mathrm{MLE}}(w \mid \boldsymbol{u}) = \dfrac{\mathbb{E}[c(\boldsymbol{u}w)]}{\mathbb{E}[c(\boldsymbol{u}\,\cdot)]}$
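Given the per-occurrence weights for each n-gram, these expectations follow directly from the Poisson binomial pmf. A minimal sketch, building on the `poisson_binomial_pmf` function above; the dict-of-weight-lists input format is an assumption for illustration.

```python
def expected_statistics(weights_by_ngram):
    """Expected counts and expected type counts under the Poisson binomial model.

    weights_by_ngram: dict mapping (u, w) -> list of per-occurrence weights p_1..p_k.
    Returns (E_c, E_ctx, E_N1plus) where
      E_c[(u, w)]  ~ E[c(u w)]       (equals the sum of the weights)
      E_ctx[u]     ~ E[c(u .)]
      E_N1plus[u]  ~ E[N_{1+}(u .)]  = sum_w p(c(u w) >= 1)
    """
    E_c, E_ctx, E_N1plus = {}, {}, {}
    for (u, w), weights in weights_by_ngram.items():
        pmf = poisson_binomial_pmf(weights)
        E_c[(u, w)] = sum(r * pr for r, pr in enumerate(pmf))
        E_ctx[u] = E_ctx.get(u, 0.0) + E_c[(u, w)]
        E_N1plus[u] = E_N1plus.get(u, 0.0) + (1.0 - pmf[0])   # p(c(u w) >= 1)
    return E_c, E_ctx, E_N1plus

def p_mle(w, u, E_c, E_ctx):
    """Expected-count MLE: p_MLE(w | u) = E[c(u w)] / E[c(u .)]."""
    return E_c.get((u, w), 0.0) / E_ctx[u]
```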
Expected Kneser-Ney
• The smoothed count is $\bar{c}(\boldsymbol{u}w) = \max\left(0,\ c(\boldsymbol{u}w) - D\right) + N_{1+}(\boldsymbol{u}\,\cdot)\, D\, p'(w \mid \boldsymbol{u}')$
• So, $\mathbb{E}[\bar{c}(\boldsymbol{u}w)] = \mathbb{E}[c(\boldsymbol{u}w)] - p\left(c(\boldsymbol{u}w) > 0\right) D + \mathbb{E}[N_{1+}(\boldsymbol{u}\,\cdot)]\, D\, p'(w \mid \boldsymbol{u}')$
  – where $p'(w \mid \boldsymbol{u}') = \dfrac{\mathbb{E}[N_{1+}(\cdot\,\boldsymbol{u}'w)]}{\mathbb{E}[N_{1+}(\cdot\,\boldsymbol{u}'\,\cdot)]}$
• then $p(w \mid \boldsymbol{u}) = \dfrac{\mathbb{E}[\bar{c}(\boldsymbol{u}w)]}{\mathbb{E}[\bar{c}(\boldsymbol{u}\,\cdot)]}$
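Putting the pieces together for the bigram case: a minimal sketch of expected KN, assuming a single discount D ≤ 1 (so $\max(0, c - D)$ never clips for $c \ge 1$) and a unigram continuation model as the lower order (i.e. $\boldsymbol{u}'$ is empty). The data format and names are illustrative, not the authors' implementation.

```python
def expected_kneser_ney(weights_by_ngram, D=0.75):
    """Expected Kneser-Ney for bigrams (u, w) from per-occurrence weights.
    Builds on poisson_binomial_pmf from the earlier sketch."""
    E_c, p_pos = {}, {}
    E_N1plus, ctx_total = {}, {}          # E[N1+(u .)] and E[c(u .)]
    cont_num, cont_den = {}, 0.0          # E[N1+(. w)] and E[N1+(. .)]
    for (u, w), weights in weights_by_ngram.items():
        pmf = poisson_binomial_pmf(weights)
        E_c[(u, w)] = sum(weights)                       # E[c(u w)]
        p_pos[(u, w)] = 1.0 - pmf[0]                     # p(c(u w) > 0)
        E_N1plus[u] = E_N1plus.get(u, 0.0) + p_pos[(u, w)]
        ctx_total[u] = ctx_total.get(u, 0.0) + E_c[(u, w)]
        cont_num[w] = cont_num.get(w, 0.0) + p_pos[(u, w)]
        cont_den += p_pos[(u, w)]

    def p_prime(w):                                      # p'(w | u') with empty u'
        return cont_num.get(w, 0.0) / cont_den

    def p(w, u):
        # E[c_bar(u w)] = E[c(u w)] - p(c(u w) > 0) D + E[N1+(u .)] D p'(w | u');
        # summed over the whole vocabulary this equals E[c(u .)], the normalizer.
        e_cbar = (E_c.get((u, w), 0.0) - p_pos.get((u, w), 0.0) * D
                  + E_N1plus[u] * D * p_prime(w))
        return e_cbar / ctx_total[u]

    return p
```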
Language model adaptation
• Our corpus consists of
  – large general-domain data and
  – small specific-domain data
• Sentence $\boldsymbol{w}$'s weight:
  – $p(\boldsymbol{w} \text{ is in-domain}) = \dfrac{1}{1 + \exp\left(-H(\boldsymbol{w})\right)}$
  – where $H(\boldsymbol{w}) = \dfrac{\log p_{\mathrm{in}}(\boldsymbol{w}) - \log p_{\mathrm{out}}(\boldsymbol{w})}{|\boldsymbol{w}|}$
  – $p_{\mathrm{in}}$: language model of the in-domain data, $p_{\mathrm{out}}$: that of the out-of-domain (general-domain) data
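The weight is just a sigmoid of the length-normalized log-probability difference between the two language models. A minimal sketch, assuming `logprob_in` and `logprob_out` are callables returning the total log-probability of a sentence under the in-domain and general-domain models respectively.

```python
import math

def in_domain_weight(sentence, logprob_in, logprob_out):
    """Fractional weight p(sentence is in-domain) = sigmoid(H(sentence)),
    where H is the per-word log-probability difference."""
    num_words = len(sentence.split())
    H = (logprob_in(sentence) - logprob_out(sentence)) / num_words
    return 1.0 / (1.0 + math.exp(-H))
```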
• Figure 1: On the language model adaptation task, expected KN outperforms all
other methods across all sizes of selected subsets. Integral KN is applied to
unweighted instances, while fractional WB, fractional KN and expected KN are
applied to weighted instances. (via [Zhang+ ACL2014])
[Figure 1 plot: results across sizes of subsets selected from the general-domain data; in-domain data: 54k training sentences, 3k test sentences]
Why isn't there Modified KN as a baseline?
[Pickhardt+ ACL2014] A Generalized Language Model
as the Combination of Skipped n-grams
and Modified Kneser-Ney Smoothing
• Higher-order n-grams are very sparse
  – Especially remarkable on small data (e.g. domain-specific data!)
• Improves performance on small data by combining skipped n-grams with Modified KN smoothing
  – Perplexity is reduced by 25.7% for very small training data of only 736KB of text
“Generalized Language Models”
• $\partial_3\, w_1 w_2 w_3 w_4 = w_1 w_2\, \_\, w_4$
  – "_" is a word placeholder, i.e. $\partial_j$ skips the $j$-th word
$$P_{\mathrm{GLM}}(w_i \mid w_{i-n+1}^{i-1}) = \frac{c(w_{i-n+1}^{i}) - D\left(c(w_{i-n+1}^{i})\right)}{c(w_{i-n+1}^{i-1})} + \gamma_{\mathrm{high}}(w_{i-n+1}^{i-1})\, \frac{1}{n-1} \sum_{j=1}^{n-1} P_{\mathrm{GLM}}(w_i \mid \partial_j w_{i-n+1}^{i-1})$$
$$P_{\mathrm{GLM}}(w_i \mid \partial_j w_{i-n+1}^{i-1}) = \frac{N_{1+}(\partial_j w_{i-n}^{i}) - D\left(c(\partial_j w_{i-n+1}^{i})\right)}{N_{1+}(\partial_j w_{i-n+1}^{i-1}\,\ast)} + \gamma_{\mathrm{mid}}(\partial_j w_{i-n+1}^{i-1})\, \frac{1}{n-2} \sum_{k=1,\, k \ne j}^{n-1} P_{\mathrm{GLM}}(w_i \mid \partial_j \partial_k w_{i-n+1}^{i-1})$$
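The skip operator itself is simple to state in code. A short sketch of $\partial_j$ and of the set of skipped contexts the highest-order GLM interpolates over (first level of the recursion only); names and the tuple representation are illustrative.

```python
def skip(ngram, j):
    """The skip operator: replace the j-th word (1-indexed) with the placeholder "_".
    E.g. skip(("w1", "w2", "w3", "w4"), 3) == ("w1", "w2", "_", "w4")."""
    return tuple("_" if i == j - 1 else w for i, w in enumerate(ngram))

def skipped_contexts(context):
    """One skipped copy of the (n-1)-word context per position j = 1..n-1,
    i.e. the contexts averaged over in the highest-order GLM interpolation."""
    return [skip(context, j) for j in range(1, len(context) + 1)]

# Example
print(skip(("w1", "w2", "w3", "w4"), 3))      # ('w1', 'w2', '_', 'w4')
print(skipped_contexts(("w1", "w2", "w3")))   # three skipped variants of the context
```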
• The bold arrows correspond to interpolation of models in traditional
modified Kneser-Ney smoothing. The lighter arrows illustrate the
additional interpolations introduced by our generalized language
models. (via [Pickhardt+ ACL2014])
• [Figure: results on shrunk training data sets for the English Wikipedia, used as small domain-specific data]
Space Complexity
• model size = 9.5GB, # of entries = 427M
• model size = 15GB, # of entries = 742M
References
• [Zhang+ ACL2014] Kneser-Ney Smoothing on Expected Count
• [Pickhardt+ ACL2014] A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
• [Kneser+ 1995] Improved backing-off for m-gram language modeling
• [Chen+ 1999] An Empirical Study of Smoothing Techniques for Language Modeling