Research Article (Public Access) · DOI: 10.1145/3412382.3458265

Sound-Adapter: Multi-Source Domain Adaptation for Acoustic Classification Through Domain Discovery

Published: 18 May 2021

Abstract

The accuracy of an audio classifier drops when it is trained and tested under different conditions, known as domains, e.g., different devices, different environments, or combinations of the two. Previous works have proposed audio domain adaptation techniques for a special case where the training data are recorded with a single source microphone and the model is tested on data recorded with a single, different target microphone (i.e., single-source to single-target domain adaptation). In this paper, we solve a more generic and practical problem: adapting models that are trained on data from more than one acoustic domain (i.e., multi-source domain adaptation). Unlike previous works, the proposed method does not assume the availability of recording metadata (i.e., domain labels) in the training data, which makes the adaptation problem harder. To solve this, we propose the first multi-task deep neural network architecture that clusters audio samples according to their domain in an unsupervised way. Using the inferred domain information, we perform domain adaptation to remove biases due to domain heterogeneity from the machine learning model. We conduct extensive experiments on an empirical dataset that we collect from five domains, as well as on a public dataset. Our results show that the proposed technique achieves a mean accuracy of 87% for domain discovery in a five-domain scenario, and its model adaptation step improves acoustic event classification accuracy by up to 21% compared to state-of-the-art algorithms on datasets containing samples from multiple source domains.
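
The abstract does not specify the network itself; purely as a hypothetical sketch of the two-step idea it describes (unsupervised domain discovery followed by de-biasing), the PyTorch fragment below pairs a shared encoder with an event-classification head and a domain head behind a gradient-reversal layer (DANN-style adversarial adaptation), and stands in for domain discovery with k-means over encoder embeddings. Every name and hyperparameter here (MultiTaskAudioNet, GradReverse, the cluster count, the layer sizes) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients on the way back."""

    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None  # no gradient for lamb


class MultiTaskAudioNet(nn.Module):
    """Shared encoder with an event-classification head and an adversarial domain head."""

    def __init__(self, n_classes=10, n_domains=5, emb_dim=128):
        super().__init__()
        # Encoder over 1-channel log-mel spectrograms shaped (B, 1, n_mels, frames).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, emb_dim), nn.ReLU(),
        )
        self.event_head = nn.Linear(emb_dim, n_classes)   # acoustic event labels
        self.domain_head = nn.Linear(emb_dim, n_domains)  # inferred pseudo-domains

    def forward(self, x, lamb=1.0):
        z = self.encoder(x)
        # Gradient reversal pushes the encoder toward domain-invariant features
        # while the domain head tries to tell the pseudo-domains apart.
        return self.event_head(z), self.domain_head(GradReverse.apply(z, lamb)), z


# Usage sketch: derive pseudo-domain labels by clustering encoder embeddings
# (a stand-in for the unsupervised domain-discovery step), then train both
# heads jointly with cross-entropy on event labels and pseudo-domain labels.
model = MultiTaskAudioNet()
batch = torch.randn(32, 1, 64, 128)  # 32 fake log-mel spectrogram patches
with torch.no_grad():
    _, _, z = model(batch)
pseudo_domains = KMeans(n_clusters=5, n_init=10).fit_predict(z.numpy())
```

In a real pipeline the number of domains would not be known a priori; an internal cluster-quality index such as the silhouette coefficient is one common way to choose it.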


Cited By

  • SIDA. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(3):1--24. Online publication date: 27-Sep-2023. DOI: 10.1145/3610919
  • Acconotate: Exploiting Acoustic Changes for Automatic Annotation of Inertial Data at the Source. 2023 19th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), pages 25--33. Online publication date: Jun-2023. DOI: 10.1109/DCOSS-IoT58021.2023.00013
  • M3Sense. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(2):1--32. Online publication date: 7-Jul-2022. DOI: 10.1145/3534600


      Published In

      IPSN '21: Proceedings of the 20th International Conference on Information Processing in Sensor Networks (co-located with CPS-IoT Week 2021)
      May 2021, 423 pages
      ISBN: 9781450380980
      DOI: 10.1145/3412382

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Audio Classification
      2. Deep Learning

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • NSF
      • NIH

      Conference

      IPSN '21

      Acceptance Rates

      Overall Acceptance Rate 143 of 593 submissions, 24%

      Article Metrics

      • Downloads (last 12 months): 231
      • Downloads (last 6 weeks): 41
      Reflects downloads up to 25 Dec 2024

