Research Article (Public Access) · DOI: 10.1145/3412382.3458265

Sound-Adapter: Multi-Source Domain Adaptation for Acoustic Classification Through Domain Discovery

Published: 18 May 2021

Abstract

The accuracy of an audio classifier drops when it is trained and tested under different conditions, known as domains, e.g., different devices, different environments, or combinations of the two. Previous works have proposed audio domain adaptation techniques for a special case where the training data are recorded with a single source microphone and the model is tested on data recorded with a single, different target microphone (i.e., single-source to single-target domain adaptation). In this paper, we solve a more generic and practical problem: adapting models that are trained on data from more than one acoustic domain (i.e., multi-source domain adaptation). Unlike previous works, the proposed method does not assume the availability of recording metadata (i.e., domain labels) in the training data, which makes the adaptation problem harder. To solve this, we propose the first multi-task deep neural network architecture that clusters audio samples according to their domain in an unsupervised way. Using the inferred domain information, we perform domain adaptation to remove biases due to domain heterogeneity from the machine learning model. We conduct extensive experiments on an empirical dataset that we collect from five domains, as well as on a public dataset. Our results show that the proposed technique achieves a mean accuracy of 87% for domain discovery in a five-domain scenario, and its model adaptation step improves acoustic event classification accuracy by up to 21% compared to state-of-the-art algorithms on datasets containing samples from multiple source domains.
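
The abstract does not specify the network itself; purely as a hypothetical sketch of the two-step idea it describes (unsupervised domain discovery followed by de-biasing), the PyTorch fragment below pairs a shared encoder with an event-classification head and a domain head behind a gradient-reversal layer (DANN-style adversarial adaptation), and stands in for domain discovery with k-means over encoder embeddings. Every name and hyperparameter here (MultiTaskAudioNet, GradReverse, the cluster count, the layer sizes) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients on the way back."""

    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None  # no gradient for lamb


class MultiTaskAudioNet(nn.Module):
    """Shared encoder with an event-classification head and an adversarial domain head."""

    def __init__(self, n_classes=10, n_domains=5, emb_dim=128):
        super().__init__()
        # Encoder over 1-channel log-mel spectrograms shaped (B, 1, n_mels, frames).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, emb_dim), nn.ReLU(),
        )
        self.event_head = nn.Linear(emb_dim, n_classes)   # acoustic event labels
        self.domain_head = nn.Linear(emb_dim, n_domains)  # inferred pseudo-domains

    def forward(self, x, lamb=1.0):
        z = self.encoder(x)
        # Gradient reversal pushes the encoder toward domain-invariant features
        # while the domain head tries to tell the pseudo-domains apart.
        return self.event_head(z), self.domain_head(GradReverse.apply(z, lamb)), z


# Usage sketch: derive pseudo-domain labels by clustering encoder embeddings
# (a stand-in for the unsupervised domain-discovery step), then train both
# heads jointly with cross-entropy on event labels and pseudo-domain labels.
model = MultiTaskAudioNet()
batch = torch.randn(32, 1, 64, 128)  # 32 fake log-mel spectrogram patches
with torch.no_grad():
    _, _, z = model(batch)
pseudo_domains = KMeans(n_clusters=5, n_init=10).fit_predict(z.numpy())
```

In a real pipeline the number of domains would not be known a priori; an internal cluster-quality index such as the silhouette coefficient is one common way to choose it.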


Cited By

  • SIDA. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(3):1--24. Online publication date: 27-Sep-2023. DOI: 10.1145/3610919
  • Acconotate: Exploiting Acoustic Changes for Automatic Annotation of Inertial Data at the Source. 2023 19th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), pages 25--33. Online publication date: Jun-2023. DOI: 10.1109/DCOSS-IoT58021.2023.00013
  • M3Sense. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(2):1--32. Online publication date: 7-Jul-2022. DOI: 10.1145/3534600


      Published In

      IPSN '21: Proceedings of the 20th International Conference on Information Processing in Sensor Networks (co-located with CPS-IoT Week 2021)
      May 2021, 423 pages
      ISBN: 9781450380980
      DOI: 10.1145/3412382

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Audio Classification
      2. Deep Learning

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • NSF
      • NIH

      Conference

      IPSN '21

      Acceptance Rates

      Overall Acceptance Rate 143 of 593 submissions, 24%

      Article Metrics

      • Downloads (last 12 months): 231
      • Downloads (last 6 weeks): 41
      Reflects downloads up to 25 Dec 2024

