Phrase-Level Class based Language Model for Mandarin Smart Speaker Query Recognition

Huang, Yiheng; He, Liqiang; Han, Lei; Wang, Guangsen; Su, Dan

arXiv:1909.00556 (cs)

[Submitted on 2 Sep 2019]

Abstract: The success of speech assistants requires precise recognition of a number of entities in particular contexts. A common solution is to train a class-based n-gram language model and then expand the classes into specific words or phrases. However, when a class has a huge list, e.g., more than 20 million songs, a full expansion will cause memory explosion. Worse still, the list items in the class need to be updated frequently, which requires a dynamic model updating technique. In this work, we propose to train pruned language models for the word classes to replace the slots in the root n-gram. We further propose a novel technique, named Difference Language Model (DLM), to correct the bias from the pruned language models. Once the decoding graph is built, we only need to recalculate the DLM when the entities in word classes are updated. Results show that the proposed method consistently and significantly outperforms the conventional approaches on all datasets, especially for large lists, which the conventional approaches cannot handle.
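
As a rough illustration of the class-based n-gram idea the abstract calls the common solution, the sketch below scores a query by multiplying the root-model probability of a class slot by an in-class probability of the entity, so the entity list never has to be expanded into the root model. This is not the paper's DLM or its decoding-graph construction. The names `ROOT_BIGRAM`, `CLASS_MODEL`, `score`, and `update_class`, the `$SONG` label, and all probabilities are hypothetical values chosen only for the example.

```python
import math

# Hypothetical root bigram over words and class slots: P(token | previous token).
ROOT_BIGRAM = {
    ("<s>", "play"): 0.5,
    ("play", "$SONG"): 0.8,
    ("$SONG", "</s>"): 0.9,
}

# Hypothetical in-class distribution P(entity | class).
# A real system would use a sub-language model here, not a flat table.
CLASS_MODEL = {
    "$SONG": {"yesterday": 0.6, "hey jude": 0.4},
}


def score(tokens, slots):
    """Return the log-probability of `tokens`.

    `slots` maps a token position to the class label that fills it.
    A slot contributes P(class | prev) * P(entity | class).
    """
    logp = 0.0
    prev = "<s>"
    for i, tok in enumerate(tokens + ["</s>"]):
        if i in slots:
            cls = slots[i]
            logp += math.log(ROOT_BIGRAM[(prev, cls)])
            logp += math.log(CLASS_MODEL[cls][tok])
            prev = cls  # the root model only ever sees the class label
        else:
            logp += math.log(ROOT_BIGRAM[(prev, tok)])
            prev = tok
    return logp


def update_class(cls, entries):
    """Replace the entity list of one class; the root bigram is left untouched."""
    CLASS_MODEL[cls] = dict(entries)
```

In this toy setup, `score(["play", "yesterday"], {1: "$SONG"})` combines four factors (0.5, 0.8, 0.6, 0.9), and calling `update_class` changes only the in-class table, which mirrors the motivation for keeping frequently updated entity lists separate from the root model.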

Comments:	5 pages, 3 figures and 3 tables
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1909.00556 [cs.CL]
	(or arXiv:1909.00556v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1909.00556

Submission history

From: Yiheng Huang
[v1] Mon, 2 Sep 2019 05:55:36 UTC (337 KB)
