Can DNNs Learn to Lipread Full Sentences?

Sterpu, George; Saam, Christian; Harte, Naomi

arXiv:1805.11685 (eess)

[Submitted on 29 May 2018]

Abstract: Finding visual features and suitable models for lipreading tasks that are more complex than a well-constrained vocabulary has proven challenging. This paper explores state-of-the-art Deep Neural Network architectures for lipreading based on a Sequence to Sequence Recurrent Neural Network. We report results for both hand-crafted and 2D/3D Convolutional Neural Network visual front-ends, online monotonic attention, and a joint Connectionist Temporal Classification-Sequence-to-Sequence loss. The system is evaluated on the publicly available TCD-TIMIT dataset, with 59 speakers and a vocabulary of over 6000 words. Results show a major improvement on a Hidden Markov Model framework. A fuller analysis of performance across visemes demonstrates that the network is not only learning the language model, but actually learning to lipread.

Comments:	Accepted at the 2018 IEEE International Conference on Image Processing (ICIP 2018)
Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1805.11685 [eess.IV]
	(or arXiv:1805.11685v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.1805.11685

Submission history

From: George Sterpu
[v1] Tue, 29 May 2018 19:54:19 UTC (52 KB)
