NAT: Noise-Aware Training for Robust Neural Sequence Labeling

Namysl, Marcin; Behnke, Sven; Köhler, Joachim

arXiv:2005.07162 (cs)

[Submitted on 14 May 2020]

Abstract: Sequence labeling systems should perform reliably not only under ideal conditions but also with corrupted inputs, as these systems often process user-generated text or follow an error-prone upstream component. To this end, we formulate the noisy sequence labeling problem, where the input may undergo an unknown noising process, and propose two Noise-Aware Training (NAT) objectives that improve the robustness of sequence labeling performed on perturbed input: our data augmentation method trains a neural model using a mixture of clean and noisy samples, whereas our stability training algorithm encourages the model to create a noise-invariant latent representation. We employ a vanilla noise model at training time. For evaluation, we use both the original data and its variants perturbed with real OCR errors and misspellings. Extensive experiments on English and German named entity recognition benchmarks confirmed that NAT consistently improved the robustness of popular sequence labeling models while preserving accuracy on the original input. We make our code and data publicly available for the research community.
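
The two objectives named in the abstract can be illustrated with a minimal sketch. It is not the authors' released code. Everything below is an assumption made for illustration: the character-level noise function, the weight `alpha`, the use of cross-entropy as the base loss, and the choice of KL divergence between output distributions as the stability term (the abstract only states the goal of a noise-invariant latent representation). Model predictions are represented as plain lists of per-token label probabilities, assumed strictly positive as softmax outputs would be.

```python
import math
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # hypothetical noise alphabet


def perturb_token(token, rate, rng):
    """Randomly insert, delete, or substitute characters (assumed noise model)."""
    out = []
    for ch in token:
        if rng.random() < rate:
            op = rng.choice(("insert", "delete", "substitute"))
            if op == "insert":
                out.append(ch)
                out.append(rng.choice(ALPHABET))
            elif op == "substitute":
                out.append(rng.choice(ALPHABET))
            # "delete": append nothing
        else:
            out.append(ch)
    # Never return an empty token, so tokens stay aligned with their labels.
    return "".join(out) or token


def perturb_sentence(tokens, rate, rng):
    """Perturb each token independently; the sentence length is unchanged."""
    return [perturb_token(t, rate, rng) for t in tokens]


def cross_entropy(probs, gold):
    """Mean negative log-likelihood of the gold label indices."""
    return -sum(math.log(p[g]) for p, g in zip(probs, gold)) / len(gold)


def kl_divergence(p_clean, p_noisy):
    """Mean per-token KL(p_clean || p_noisy); assumes p_noisy has no zeros."""
    total = 0.0
    for p, q in zip(p_clean, p_noisy):
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(p_clean)


def augmentation_loss(p_clean, p_noisy, gold, alpha=1.0):
    """Data augmentation: supervise both the clean and the noisy prediction."""
    return cross_entropy(p_clean, gold) + alpha * cross_entropy(p_noisy, gold)


def stability_loss(p_clean, p_noisy, gold, alpha=1.0):
    """Stability training: supervise the clean prediction and penalize
    divergence between the clean and noisy predictions."""
    return cross_entropy(p_clean, gold) + alpha * kl_divergence(p_clean, p_noisy)


# Usage: perturb a sentence and compare the two losses on toy predictions.
rng = random.Random(0)
tokens = ["Angela", "Merkel", "visited", "Paris"]
noisy_tokens = perturb_sentence(tokens, rate=0.1, rng=rng)

gold = [0, 0, 1, 0]  # toy label indices
p_clean = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
p_noisy = [[0.7, 0.3], [0.8, 0.2], [0.4, 0.6], [0.5, 0.5]]
loss_augm = augmentation_loss(p_clean, p_noisy, gold)
loss_stab = stability_loss(p_clean, p_noisy, gold)
```

In this sketch the augmentation loss needs gold labels for the noisy copy, while the stability term only compares two predictions, so it could in principle be computed without labels for the perturbed input.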

Comments:	Accepted to appear at ACL 2020
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2005.07162 [cs.CL]
	(or arXiv:2005.07162v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2005.07162

Submission history

From: Marcin Namysl
[v1] Thu, 14 May 2020 17:30:06 UTC (510 KB)
