Aligning Subtitles in Sign Language Videos

Bull, Hannah; Afouras, Triantafyllos; Varol, Gül; Albanie, Samuel; Momeni, Liliane; Zisserman, Andrew

arXiv:2105.02877 (cs)

[Submitted on 6 May 2021]

Abstract: The goal of this work is to temporally align asynchronous subtitles in sign language videos. In particular, we focus on sign-language interpreted TV broadcast data comprising (i) a video of continuous signing, and (ii) subtitles corresponding to the audio content. Previous work exploiting such weakly-aligned data only considered finding keyword-sign correspondences, whereas we aim to localise a complete subtitle text in continuous signing. We propose a Transformer architecture tailored for this task, which we train on manually annotated alignments covering over 15K subtitles that span 17.7 hours of video. We use BERT subtitle embeddings and CNN video representations learned for sign recognition to encode the two signals, which interact through a series of attention layers. Our model outputs frame-level predictions, i.e., for each video frame, whether it belongs to the queried subtitle or not. Through extensive evaluations, we show substantial improvements over existing alignment baselines that do not make use of subtitle text embeddings for learning. Our automatic alignment model opens up possibilities for advancing machine translation of sign languages by providing continuously synchronized video-text data.
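The frame-level output described in the abstract can be illustrated with a small sketch of how per-frame scores might be turned into a subtitle time span. This is not the authors' post-processing: the function name `frames_to_span`, the 0.5 threshold, the 25 fps frame rate, and the choice of keeping the longest run of positive frames are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def frames_to_span(
    probs: List[float],
    fps: float = 25.0,       # assumed frame rate, not taken from the paper
    threshold: float = 0.5,  # assumed decision threshold
) -> Optional[Tuple[float, float]]:
    """Return (start_sec, end_sec) of the longest run of frames whose
    score exceeds `threshold`, or None if no frame does."""
    best_start, best_len = -1, 0
    run_start: Optional[int] = None
    # A -inf sentinel guarantees that a run reaching the last frame is closed.
    for i, p in enumerate(probs + [float("-inf")]):
        if p > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    if best_len == 0:
        return None
    # The end time is exclusive: it marks the first frame after the run.
    return best_start / fps, (best_start + best_len) / fps


if __name__ == "__main__":
    # Hypothetical per-frame scores for one queried subtitle.
    scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.9, 0.95, 0.1]
    print(frames_to_span(scores))
```

Keeping only the longest run is one simple way to force a single contiguous segment per subtitle; other choices, such as merging nearby runs, would be equally plausible under these assumptions.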

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2105.02877 [cs.CV]
	(or arXiv:2105.02877v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2105.02877

Submission history

From: Triantafyllos Afouras
[v1] Thu, 6 May 2021 17:59:36 UTC (4,111 KB)
