DOI: 10.1145/3025171.3025231
Research Article · Public Access

I-SED: An Interactive Sound Event Detector

Published: 07 March 2017

Abstract

Tagging of sound events is essential in many research areas. However, finding and labeling sound events within a long audio file is tedious and time-consuming. Building an automatic recognition system with machine learning techniques is often not feasible, because it requires a large number of human-labeled training examples and fine-tuning of the model for a specific application; fully automated labeling is also not reliable enough for all uses. We present I-SED, an interactive sound event detection interface that uses a human-in-the-loop approach to reduce the time needed to label audio that is too long (e.g., 20 hours) to label manually and has too few prior labeled examples (e.g., one) to train a state-of-the-art machine audio labeling system. We conducted a human-subject study to validate the tool's effectiveness; the results showed that it helped participants label all target sound events within a recording twice as fast as labeling them manually.
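To make the human-in-the-loop idea concrete, below is a minimal sketch of an interactive labeling loop of this general kind: windows of a long recording are summarized with MFCC features, candidates similar to a single seed example are surfaced for the user to accept or reject, and the similarity metric is re-weighted from that feedback. Everything here (the windowing scheme, the feature summary, the hypothetical `ask_user` callback, and the re-weighting rule) is an illustrative assumption, not the published I-SED algorithm.

```python
# A minimal sketch of a human-in-the-loop sound-event labeling loop in the
# spirit of I-SED. Illustration only: windowing, MFCC summaries, and the
# feature re-weighting rule are assumptions, not the authors' method.
import numpy as np
import librosa

def window_features(path, win_s=1.0, hop_s=0.5, n_mfcc=20):
    """Slice a long recording into windows; summarize each with mean MFCCs."""
    y, sr = librosa.load(path, sr=None)
    win, hop = int(win_s * sr), int(hop_s * sr)
    feats, starts = [], []
    for s in range(0, max(1, len(y) - win), hop):
        mfcc = librosa.feature.mfcc(y=y[s:s + win], sr=sr, n_mfcc=n_mfcc)
        feats.append(mfcc.mean(axis=1))      # one feature vector per window
        starts.append(s / sr)                # window start time in seconds
    return np.array(feats), np.array(starts)

def label_interactively(feats, starts, seed_idx, ask_user, rounds=5, k=10):
    """Nearest-neighbor search seeded from one labeled window, refined by
    the user's accept/reject feedback via per-dimension feature weights."""
    pos, neg = {seed_idx}, set()
    w = np.ones(feats.shape[1])              # similarity weights, updated each round
    for _ in range(rounds):
        centroid = feats[list(pos)].mean(axis=0)
        d = np.linalg.norm((feats - centroid) * w, axis=1)
        d[list(pos | neg)] = np.inf          # never re-ask about judged windows
        for idx in np.argsort(d)[:k]:        # present the k nearest candidates
            (pos if ask_user(starts[idx]) else neg).add(idx)
        # down-weight feature dimensions that vary widely among accepted examples
        if len(pos) > 1:
            w = 1.0 / (feats[list(pos)].std(axis=0) + 1e-8)
    return sorted(starts[i] for i in pos)    # start times of labeled events
```

In a real system the accept/reject judgments would come from an audio-playback interface rather than a callback, and the interface design, not this loop, is the paper's contribution; the sketch only shows why a single seed example plus iterative feedback can replace a large labeled training set.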



Information & Contributors


Published In

IUI '17: Proceedings of the 22nd International Conference on Intelligent User Interfaces
March 2017
654 pages
ISBN:9781450343480
DOI:10.1145/3025171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 March 2017


Author Tags

  1. human-in-the-loop system
  2. interactive machine learning
  3. sound event detection

Qualifiers

  • Research-article


Conference

IUI'17

Acceptance Rates

IUI '17 Paper Acceptance Rate: 63 of 272 submissions, 23%
Overall Acceptance Rate: 746 of 2,811 submissions, 27%



Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 89
  • Downloads (last 6 weeks): 17

Reflects downloads up to 14 Jan 2025.

Citations

Cited By

  • (2024) Deep Active Audio Feature Learning in Resource-Constrained Environments. IEEE/ACM Transactions on Audio, Speech, and Language Processing 32, 3224–3237. DOI: 10.1109/TASLP.2024.3416697
  • (2022) Unraveling ML Models of Emotion With NOVA: Multi-Level Explainable AI for Non-Experts. IEEE Transactions on Affective Computing 13, 3, 1155–1167. DOI: 10.1109/TAFFC.2020.3043603
  • (2021) Toward User-Driven Sound Recognizer Personalization with People Who Are d/Deaf or Hard of Hearing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, 2, 1–23. DOI: 10.1145/3463501
  • (2020) Investigating audio data visualization for interactive sound recognition. Proceedings of the 25th International Conference on Intelligent User Interfaces, 67–77. DOI: 10.1145/3377325.3377483
  • (2020) Human-in-the-Loop Machine Learning to Increase Video Accessibility for Visually Impaired and Blind Users. Proceedings of the 2020 ACM Designing Interactive Systems Conference, 47–60. DOI: 10.1145/3357236.3395433
  • (2020) eXplainable Cooperative Machine Learning with NOVA. KI - Künstliche Intelligenz 34, 2, 143–164. DOI: 10.1007/s13218-020-00632-3
  • (2019) Autocomplete vocal-f0 annotation of songs using musical repetitions. Companion Proceedings of the 24th International Conference on Intelligent User Interfaces, 71–72. DOI: 10.1145/3308557.3308700
  • (2019) Improving Content-based Audio Retrieval by Vocal Imitation Feedback. ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing, 4100–4104. DOI: 10.1109/ICASSP.2019.8683461
  • (2019) Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings. ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing, 3852–3856. DOI: 10.1109/ICASSP.2019.8682475
  • (2019) NOVA - A tool for eXplainable Cooperative Machine Learning. 8th International Conference on Affective Computing and Intelligent Interaction (ACII), 109–115. DOI: 10.1109/ACII.2019.8925519
