DOI: 10.1145/3025171.3025231
Research Article · Public Access

I-SED: An Interactive Sound Event Detector

Published: 07 March 2017

Abstract

Tagging of sound events is essential in many research areas. However, finding and labeling sound events within a long audio file is tedious and time-consuming. Building an automatic recognition system with machine learning techniques is often not feasible, because it requires a large number of human-labeled training examples and fine-tuning of the model for a specific application; fully automated labeling is also not reliable enough for all uses. We present I-SED, an interactive sound event detection interface that uses a human-in-the-loop approach to reduce the time needed to label audio that is too long (e.g., 20 hours) to label manually and has too few prior labeled examples (e.g., one) to train a state-of-the-art machine audio labeling system. We conducted a human-subject study to validate the tool's effectiveness; the results showed that it helped participants label all target sound events within a recording twice as fast as labeling them manually.
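To make the human-in-the-loop idea concrete, below is a minimal sketch of an interactive labeling loop of this general kind: windows of a long recording are summarized with MFCC features, candidates similar to a single seed example are surfaced for the user to accept or reject, and the similarity metric is re-weighted from that feedback. Everything here (the windowing scheme, the feature summary, the hypothetical `ask_user` callback, and the re-weighting rule) is an illustrative assumption, not the published I-SED algorithm.

```python
# A minimal sketch of a human-in-the-loop sound-event labeling loop in the
# spirit of I-SED. Illustration only: windowing, MFCC summaries, and the
# feature re-weighting rule are assumptions, not the authors' method.
import numpy as np
import librosa

def window_features(path, win_s=1.0, hop_s=0.5, n_mfcc=20):
    """Slice a long recording into windows; summarize each with mean MFCCs."""
    y, sr = librosa.load(path, sr=None)
    win, hop = int(win_s * sr), int(hop_s * sr)
    feats, starts = [], []
    for s in range(0, max(1, len(y) - win), hop):
        mfcc = librosa.feature.mfcc(y=y[s:s + win], sr=sr, n_mfcc=n_mfcc)
        feats.append(mfcc.mean(axis=1))      # one feature vector per window
        starts.append(s / sr)                # window start time in seconds
    return np.array(feats), np.array(starts)

def label_interactively(feats, starts, seed_idx, ask_user, rounds=5, k=10):
    """Nearest-neighbor search seeded from one labeled window, refined by
    the user's accept/reject feedback via per-dimension feature weights."""
    pos, neg = {seed_idx}, set()
    w = np.ones(feats.shape[1])              # similarity weights, updated each round
    for _ in range(rounds):
        centroid = feats[list(pos)].mean(axis=0)
        d = np.linalg.norm((feats - centroid) * w, axis=1)
        d[list(pos | neg)] = np.inf          # never re-ask about judged windows
        for idx in np.argsort(d)[:k]:        # present the k nearest candidates
            (pos if ask_user(starts[idx]) else neg).add(idx)
        # down-weight feature dimensions that vary widely among accepted examples
        if len(pos) > 1:
            w = 1.0 / (feats[list(pos)].std(axis=0) + 1e-8)
    return sorted(starts[i] for i in pos)    # start times of labeled events
```

In a real system the accept/reject judgments would come from an audio-playback interface rather than a callback, and the interface design, not this loop, is the paper's contribution; the sketch only shows why a single seed example plus iterative feedback can replace a large labeled training set.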



Information & Contributors


Published In

IUI '17: Proceedings of the 22nd International Conference on Intelligent User Interfaces
March 2017
654 pages
ISBN:9781450343480
DOI:10.1145/3025171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 March 2017


Author Tags

  1. human-in-the-loop system
  2. interactive machine learning
  3. sound event detection

Qualifiers

  • Research-article


Conference

IUI'17

Acceptance Rates

IUI '17 Paper Acceptance Rate: 63 of 272 submissions, 23%
Overall Acceptance Rate: 746 of 2,811 submissions, 27%



Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 89
  • Downloads (last 6 weeks): 17

Reflects downloads up to 14 Jan 2025.

Citations

Cited By

  • (2024) Deep Active Audio Feature Learning in Resource-Constrained Environments. IEEE/ACM Transactions on Audio, Speech, and Language Processing 32, 3224–3237. DOI: 10.1109/TASLP.2024.3416697
  • (2022) Unraveling ML Models of Emotion With NOVA: Multi-Level Explainable AI for Non-Experts. IEEE Transactions on Affective Computing 13, 3, 1155–1167. DOI: 10.1109/TAFFC.2020.3043603
  • (2021) Toward User-Driven Sound Recognizer Personalization with People Who Are d/Deaf or Hard of Hearing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, 2, 1–23. DOI: 10.1145/3463501
  • (2020) Investigating audio data visualization for interactive sound recognition. Proceedings of the 25th International Conference on Intelligent User Interfaces, 67–77. DOI: 10.1145/3377325.3377483
  • (2020) Human-in-the-Loop Machine Learning to Increase Video Accessibility for Visually Impaired and Blind Users. Proceedings of the 2020 ACM Designing Interactive Systems Conference, 47–60. DOI: 10.1145/3357236.3395433
  • (2020) eXplainable Cooperative Machine Learning with NOVA. KI - Künstliche Intelligenz 34, 2, 143–164. DOI: 10.1007/s13218-020-00632-3
  • (2019) Autocomplete vocal-f0 annotation of songs using musical repetitions. Companion Proceedings of the 24th International Conference on Intelligent User Interfaces, 71–72. DOI: 10.1145/3308557.3308700
  • (2019) Improving Content-based Audio Retrieval by Vocal Imitation Feedback. ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing, 4100–4104. DOI: 10.1109/ICASSP.2019.8683461
  • (2019) Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings. ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing, 3852–3856. DOI: 10.1109/ICASSP.2019.8682475
  • (2019) NOVA - A tool for eXplainable Cooperative Machine Learning. 8th International Conference on Affective Computing and Intelligent Interaction (ACII), 109–115. DOI: 10.1109/ACII.2019.8925519
