Deep neural network based speech separation optimizing an objective estimator of intelligibility for low latency applications

Naithani, Gaurav; Nikunen, Joonas; Bramsløw, Lars; Virtanen, Tuomas

arXiv:1807.06899 (cs)

[Submitted on 18 Jul 2018]

Abstract: Mean square error (MSE) has been the preferred loss function in current deep neural network (DNN) based speech separation techniques. In this paper, we propose a new cost function that aims to optimize the extended short-time objective intelligibility (ESTOI) measure. We focus on applications where low algorithmic latency ($\leq 10$ ms) is important. We use long short-term memory (LSTM) networks and evaluate the proposed approach on four sets of two-speaker mixtures from the extended Danish hearing in noise (HINT) dataset. We show that the proposed loss function can offer improved or on-par objective intelligibility (in terms of ESTOI) compared to an MSE-optimized baseline, while resulting in lower objective separation performance (in terms of source-to-distortion ratio (SDR)). We then propose an approach in which the network is first initialized with weights optimized for the MSE criterion and then trained with the proposed ESTOI loss criterion. This approach mitigates some of the losses in objective separation performance while preserving the gains in objective intelligibility.
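
To make the contrast between the two criteria concrete, the following is a minimal sketch of an MSE loss next to an ESTOI-style loss. It is not the authors' implementation. It assumes the inputs are already band envelopes (a list of J band rows, each with T frame values), so the short-time Fourier transform and one-third octave filterbank of the full measure are omitted. The function names (`mse_loss`, `estoi_like_loss`) and the default `seg_len=30` are illustrative assumptions. The sketch follows the general shape of the measure: normalize each segment's rows, then its columns, to zero mean and unit norm, and average the column-wise inner products.

```python
import math


def _normalize(vec):
    """Shift vec to zero mean and scale it to unit Euclidean norm."""
    mean = sum(vec) / len(vec)
    centered = [v - mean for v in vec]
    norm = math.sqrt(sum(c * c for c in centered))
    # A constant vector carries no shape information; map it to zeros.
    if norm < 1e-12:
        return [0.0] * len(vec)
    return [c / norm for c in centered]


def _rows_then_columns(segment):
    """Normalize the J rows of a segment, then its N columns.

    Returns the N normalized columns, each of length J.
    """
    rows = [_normalize(row) for row in segment]
    return [_normalize(list(col)) for col in zip(*rows)]


def _segment_score(clean_seg, est_seg):
    """Average inner product between matching normalized columns."""
    clean_cols = _rows_then_columns(clean_seg)
    est_cols = _rows_then_columns(est_seg)
    dots = [sum(a * b for a, b in zip(c, e))
            for c, e in zip(clean_cols, est_cols)]
    return sum(dots) / len(dots)


def estoi_like_loss(clean_env, est_env, seg_len=30):
    """Negative mean segment score (assumed form), so lower is better.

    clean_env, est_env: J x T envelopes as lists of lists, T >= seg_len.
    """
    n_frames = len(clean_env[0])
    scores = []
    for start in range(n_frames - seg_len + 1):
        clean_seg = [row[start:start + seg_len] for row in clean_env]
        est_seg = [row[start:start + seg_len] for row in est_env]
        scores.append(_segment_score(clean_seg, est_seg))
    return -sum(scores) / len(scores)


def mse_loss(clean_env, est_env):
    """Plain mean square error over all band/frame entries."""
    total, count = 0.0, 0
    for clean_row, est_row in zip(clean_env, est_env):
        for c, e in zip(clean_row, est_row):
            total += (c - e) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    # Synthetic envelopes: 4 bands, 40 frames (illustrative data only).
    clean = [[1.0 + math.sin(0.3 * t * (j + 1) + j) for t in range(40)]
             for j in range(4)]
    louder = [[2.0 * v for v in row] for row in clean]
    print(estoi_like_loss(clean, louder), mse_loss(clean, louder))
```

In this sketch an estimate that equals the clean envelopes up to a per-band gain scores approximately -1 under `estoi_like_loss`, the best possible value, while `mse_loss` remains positive. A two-stage schedule like the one described in the abstract would first train with an MSE-style criterion and then continue training from those weights with the intelligibility-oriented one; the training loop itself is outside the scope of this sketch.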

Comments:	To appear at International Workshop on Acoustic Signal Enhancement (IWAENC) 2018
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1807.06899 [cs.SD]
	(or arXiv:1807.06899v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.1807.06899

Submission history

From: Gaurav Naithani
[v1] Wed, 18 Jul 2018 12:55:59 UTC (423 KB)
