Deep Structured Output Learning for Unconstrained Text Recognition

Jaderberg, Max; Simonyan, Karen; Vedaldi, Andrea; Zisserman, Andrew

Computer Science > Computer Vision and Pattern Recognition

arXiv:1412.5903 (cs)

[Submitted on 18 Dec 2014 (v1), last revised 10 Apr 2015 (this version, v5)]

Abstract: We develop a representation suitable for the unconstrained recognition of words in natural images: the general case of no fixed lexicon and unknown length.
To this end we propose a convolutional neural network (CNN) based architecture which incorporates a Conditional Random Field (CRF) graphical model, taking the whole word image as a single input. The unaries of the CRF are provided by a CNN that predicts characters at each position of the output, while higher order terms are provided by another CNN that detects the presence of N-grams. We show that this entire model (CRF, character predictor, N-gram predictor) can be jointly optimised by back-propagating the structured output loss, essentially requiring the system to perform multi-task learning, and training uses purely synthetically generated data. The resulting model is a more accurate system on standard real-world text recognition benchmarks than character prediction alone, setting a benchmark for systems that have not been trained on a particular lexicon. In addition, our model achieves state-of-the-art accuracy in lexicon-constrained scenarios, without being specifically modelled for constrained recognition. To test the generalisation of our model, we also perform experiments with random alpha-numeric strings to evaluate the method when no visual language model is applicable.
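
The abstract does not spell out how the unary and N-gram terms combine. The sketch below is a minimal illustration that assumes the common additive CRF form. A candidate word's score is the sum of per-position character scores plus the scores of every N-gram occurring in the word. Plain score tables stand in for the two CNNs. Everything else is a hypothetical choice made for illustration rather than taken from the paper: the alphabet, the `_` null symbol used to pad words up to a fixed maximum length (one way to handle unknown length), the N-gram order, the hinge margin, and all function names. Lexicon-constrained recognition then reduces to taking the highest-scoring lexicon entry.

```python
import string

# Hypothetical symbol set: index 0 is a "null" padding symbol marking positions past the word's end.
NULL = "_"
ALPHABET = NULL + string.ascii_lowercase + string.digits


def unary_score(word, char_scores):
    """Sum per-position character scores (stand-in for the character-predictor CNN).

    char_scores[i][j] is the assumed score of symbol ALPHABET[j] at output position i.
    Positions beyond len(word) are scored as the NULL symbol.
    """
    max_len = len(char_scores)
    if len(word) > max_len:
        raise ValueError("word longer than the number of output positions")
    padded = word + NULL * (max_len - len(word))
    return sum(char_scores[i][ALPHABET.index(c)] for i, c in enumerate(padded))


def ngram_score(word, ngram_scores, max_n=4):
    """Sum scores of every N-gram occurrence in the word (stand-in for the N-gram detector CNN).

    N-grams absent from the (hypothetical) ngram_scores table contribute zero.
    """
    total = 0.0
    for n in range(1, max_n + 1):
        for i in range(len(word) - n + 1):
            total += ngram_scores.get(word[i:i + n], 0.0)
    return total


def crf_score(word, char_scores, ngram_scores, max_n=4):
    """Assumed additive CRF score: unary terms plus higher-order N-gram terms."""
    return unary_score(word, char_scores) + ngram_score(word, ngram_scores, max_n)


def structured_hinge_loss(true_word, candidates, char_scores, ngram_scores, margin=1.0):
    """Hinge-style structured loss against the best-scoring wrong candidate (an assumed loss form)."""
    s_true = crf_score(true_word, char_scores, ngram_scores)
    worst = max(
        (crf_score(w, char_scores, ngram_scores) for w in candidates if w != true_word),
        default=float("-inf"),
    )
    return max(0.0, margin + worst - s_true)


def decode_with_lexicon(lexicon, char_scores, ngram_scores):
    """Lexicon-constrained recognition: return the lexicon word with the highest score."""
    return max(lexicon, key=lambda w: crf_score(w, char_scores, ngram_scores))
```

In the paper the score tables would be CNN outputs, and back-propagating a structured loss through both tables is what makes the joint optimisation a form of multi-task learning. Unconstrained recognition needs a search over all strings, for example a beam search, rather than the exhaustive lexicon scan shown here.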

Comments:	arXiv admin note: text overlap with arXiv:1406.2227
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1412.5903 [cs.CV]
	(or arXiv:1412.5903v5 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1412.5903

Submission history

From: Max Jaderberg
[v1] Thu, 18 Dec 2014 15:49:46 UTC (1,126 KB)
[v2] Fri, 19 Dec 2014 17:37:37 UTC (1,756 KB)
[v3] Mon, 22 Dec 2014 19:56:48 UTC (2,071 KB)
[v4] Tue, 23 Dec 2014 13:17:59 UTC (2,089 KB)
[v5] Fri, 10 Apr 2015 15:36:01 UTC (2,089 KB)