DOI: 10.1145/3386415.3386968

Monaural Speech Separation of Specific Speaker Based on Deep Learning

Published: 30 May 2020

Abstract

Traditional speaker separation systems are usually trained to separate a fixed pair of speakers, which makes them difficult to apply to other mixtures. This paper proposes a monaural (single-channel) speech separation method for a specific speaker, i.e., the target speaker to be separated from the single-channel mixture. The training data is formed by mixing the specific speaker's speech with the speech of multiple non-specific speakers, which reduces the model's dependence on any particular interfering speaker. As a result, when the test data mixes the target speaker with previously unseen interfering speakers, the target speaker's speech can still be separated well, giving the model a degree of robustness. The paper compares the separation performance of three network models based on the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (Bi-LSTM). Experimental results under the Blind Source Separation Evaluation (BSS Eval) metrics show that the Bi-LSTM network performs best. In addition, the paper examines how the number of consecutive time-frequency amplitude frames fed into the network affects the separation results, and finds that performance is best with 10 input frames.
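The abstract gives only the network types and the 10-frame input context, so the following is a minimal sketch rather than the authors' implementation: a mask-estimating Bi-LSTM in PyTorch that maps 10 consecutive STFT magnitude frames of the mixture to a soft time-frequency mask for the target speaker. The STFT size, layer widths, and the masking-based training objective are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N_FFT = 512                  # assumed STFT size, giving 257 frequency bins
N_BINS = N_FFT // 2 + 1
CONTEXT = 10                 # consecutive magnitude frames per input (the paper's best value)

class BiLSTMSeparator(nn.Module):
    """Estimates a soft time-frequency mask for the target speaker."""
    def __init__(self, hidden=256, layers=2):
        super().__init__()
        self.blstm = nn.LSTM(input_size=N_BINS, hidden_size=hidden,
                             num_layers=layers, batch_first=True,
                             bidirectional=True)
        self.mask = nn.Sequential(nn.Linear(2 * hidden, N_BINS), nn.Sigmoid())

    def forward(self, mix_mag):          # mix_mag: (batch, CONTEXT, N_BINS)
        h, _ = self.blstm(mix_mag)       # h: (batch, CONTEXT, 2 * hidden)
        return self.mask(h)              # mask in [0, 1], same shape as mix_mag

model = BiLSTMSeparator()
mix_mag = torch.rand(8, CONTEXT, N_BINS)   # stand-in mixture magnitudes
tgt_mag = torch.rand(8, CONTEXT, N_BINS)   # stand-in target-speaker magnitudes

# Train so that mask * mixture approximates the target speaker's magnitudes;
# at test time the masked spectrogram would be inverted with the mixture phase.
loss = F.mse_loss(model(mix_mag) * mix_mag, tgt_mag)
loss.backward()
```

Separation quality would then be scored with the BSS Eval metrics the abstract mentions (SDR, SIR, SAR), for instance with the bss_eval_sources function from the mir_eval package.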


Cited By

  • (2022) A novel end-to-end deep separation network based on attention mechanism for single channel blind separation in wireless communication. IET Signal Processing 17:2. DOI: 10.1049/sil2.12173. Online publication date: 7-Nov-2022.

Information

Published In

ICITEE '19: Proceedings of the 2nd International Conference on Information Technologies and Electrical Engineering
December 2019
870 pages
ISBN:9781450372930
DOI:10.1145/3386415
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Bi-LSTM
  2. Frames
  3. Monaural speech separation
  4. Specific speaker

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICITEE-2019
