
A survey on preprocessing and classification techniques for acoustic scene

Published: 01 November 2023

Abstract

Research on acoustic scene classification (ASC) has grown rapidly in recent years. The DCASE challenge series runs competitions for various ASC tasks, giving researchers the opportunity either to participate or to propose improved models. This paper surveys recent approaches to the pre-processing required before model development for ASC, accompanied by block diagrams, and describes recent techniques for classifying sounds in ASC tasks. A comparative analysis of recent pre-processing and classification techniques is presented and summarized. The paper also positions its contribution against existing surveys on ASC by comparing them on several parameters, such as whether functionality is described separately, results are reported with quantifiable values, datasets are analysed quantitatively, and the discussed models are illustrated pictorially. Finally, for the benefit of researchers, the paper outlines future directions for both pre-processing and classification in ASC.
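The pre-processing the survey covers centres on time-frequency representations such as log-mel spectrograms. As an illustrative sketch only (not code from the paper; all function names and parameter values below are assumptions chosen for the example), a minimal NumPy implementation of log-mel feature extraction might look like:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centres spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, window it, and take the power spectrum per frame.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # Project onto the mel filterbank and compress with a log.
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)

# One second of a 440 Hz tone as a stand-in for a recorded acoustic scene.
sr = 16000
t = np.arange(sr) / sr
feats = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(feats.shape)  # (frames, mel bands) -> (61, 40)
```

The resulting (frames × mel bands) matrix is the kind of 2-D feature map that the CNN-based classifiers discussed in the survey typically consume.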


Highlights

A systematic review on pre-processing and classification techniques for acoustic scenes.
Comparative analysis of pre-processing and classification techniques.
Discussion on different processing techniques for classification.
Identification of future directions for further research in the area.

References

[1]
Abeßer J., A review of deep learning based methods for acoustic scene classification, Applied Sciences 10 (6) (2020),.
[2]
Abeßer, J., Mimilakis, S. I., Grafe, R., & Lukashevich, H. (2017). Acoustic Scene Classification By Combining Autoencoder-Based Dimensionality Reduction and Convolutional Neural Networks. In Detection and classification of acoustic scenes and events workshop (DCASE), Munich, Germany.
[3]
Akiyama O., Sato J., DCASE 2019 task 2: Multitask learning, semi-supervised learning and model ensemble with noisy data for audio tagging, in: Proceedings of the detection and classification of acoustic scenes and events 2019 workshop, New York University, 2019,.
[4]
Arniriparian S., Freitag M., Cummins N., Gerczuk M., Pugachevskiy S., Schuller B., A fusion of deep convolutional generative adversarial networks and sequence to sequence autoencoders for acoustic scene classification, in: 2018 26th European signal processing conference, IEEE, 2018,.
[5]
Aytar Y., Vondrick C., Torralba A., SoundNet: Learning sound representations from unlabeled video, Adv. Neural Inf. Process. Syst.29: Annu. Conf. Neural Inf. Process. Syst. (2016) 892–900. arXiv:1610.09001.
[6]
Bahdanau D., Cho K., Bengio Y., Neural machine translation by jointly learning to align and translate, 2014, arXiv:1409.0473.
[7]
Banerjee S., Chattopadhyay T., Pal A., Garain U., Automation of feature engineering for IoT analytics, ACM SIGBED Rev. 15 (2) (2018) 24–30,.
[8]
Barchiesi D., Giannoulis D., Stowell D., Plumbley M.D., Acoustic scene classification: Classifying environments from the sounds they produce, IEEE Signal Process. Mag. 32 (3) (2015) 16–34,.
[9]
Basbug A.M., Sert M., Acoustic scene classification using spatial pyramid pooling with convolutional neural networks, in: 2019 IEEE 13th international conference on semantic computing, IEEE, 2019,.
[10]
Bear H.L., Morf V., Benetos E., An evaluation of data augmentation methods for sound scene geotagging, 2022,.
[11]
Bear H.L., Nolasco I., Benetos E., Towards joint sound scene and polyphonic sound event recognition, 2019, arXiv:1904.10408.
[12]
Bisot V., Serizel R., Essid S., Richard G., Supervised non negative matrix factorization for acoustic scene classification, in: Detection and classification of acoustic scenes and events 2016, 2016.
[13]
Bisot, V., Serizel, R., Essid, S., & Richard, G. (2017). Non negative Feature Learning Methods for Acoustic SceneClassification. In Detection and classification of acoustic scenes and events workshop(DCASE), Munich, Germany.
[14]
Bittner, R. M., McFee, B., Salamon, J., Li, P., & Bello, J. P. (2017). Deep Salience Representations for F0 Estimation in Polyphonic Music. In 19th International society for music informationretrieval conference (ISMIR), Suzhou, China, 63–70.
[15]
Boss J.D., Shah C.T., Elner V.M., Hassan A.S., Assessment of office-based practice patterns on protective eyewear counseling for patients with monocular vision, Ophthalmic Plastic &Amp Reconstructive Surgery 31 (5) (2015) 361–363,.
[16]
Chan W., Jaitly N., Le Q., Vinyals O., Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, in: 2016 IEEE international conference on acoustics, speech and signal processing, IEEE, 2016,.
[17]
Chen, H., Liu, Z., Liu, Z., Zhang, P., & Yan, Y. (2019). Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling. In Detection and classification of acoustic scenes and events workshop (DCASE), New York, NY, USA.
[18]
Chen H., Zhang P., Bai H., Yuan Q., Bao X., Yan Y., Deep convolutional neural network with scalogram for audio scene modeling, in: Interspeech 2018, ISCA, 2018,.
[19]
Chen H., Zhang P., Yan Y., An audio scene classification framework with embedded filters and a DCT-based temporal module, in: ICASSP 2019 - 2019 IEEE international conference on acoustics, speech and signal processing, IEEE, 2019,.
[20]
Cheng S.-S., Wang H.-M., Fu H.-C., BIC-based audio segmentation by divide-and-conquer, in: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2008,.
[21]
Cho J., Yun S., Park H., Eum J., Hwang K., Acoustic scene classification based on a large-margin factorized CNN, in: Proceedings of the detection and classification of acoustic scenes and events 2019 workshop, New York University, 2019,.
[22]
Chu X., Morcos J., Ilyas I.F., Ouzzani M., Papotti P., Tang N., Ye Y., KATARA: A data cleaning system powered by knowledge bases and crowdsourcing, in: Proceedings of the 2015 ACM SIGMOD international conference on management of data, ACM, 2015,.
[23]
Chu X., Morcos J., Ilyas I.F., Ouzzani M., Papotti P., Tang N., Ye Y., KATARA: Reliable data cleaning with knowledge bases and crowdsourcing, Proceedings of the VLDB Endowment 8 (12) (2015) 1952–1955,.
[24]
Cicco V.D., Firmani D., Koudas N., Merialdo P., Srivastava D., Interpreting deep learning models for entity resolution, in: Proceedings of the second international workshop on exploiting artificial intelligence techniques for data management, ACM Press, 2019,.
[25]
Coates A., Ng A.Y., The importance of encoding versus training with sparse coding and vector quantization, in: 28th International conference on machine learning, ACM, 2011, pp. 921–928.
[26]
Coates A., Ng A.Y., Learning feature representations with K-means, in: Lecture notes in computer science, Springer Berlin Heidelberg, 2012, pp. 561–580,.
[27]
Cohen B., Vawdrey D.K., Liu J., Caplan D., Furuya E.Y., Mis F.W., Larson E., Challenges associated with using large data sets for quality assessment and research in clinical settings, Policy, Politics, &Amp Nursing Practice 16 (3–4) (2015) 117–124,.
[28]
Crocco M., Cristani M., Trucco A., Murino V., Audio surveillance: A systematic review, ACM Computing Surveys 48 (4) (2014) 52:1–52:46, arXiv:1409.7787.
[29]
Dang A., Vu T.H., Wang J.-C., A survey of deep learning for polyphonic sound event detection, in: 2017 International conference on orange technologies, IEEE, 2017,.
[30]
Davis S., Mermelstein P., Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing 28 (4) (1980) 357–366,.
[31]
Deng J., Dong W., Socher R., Li L.-J., Li K., Fei-Fei L., ImageNet: A large-scale hierarchical image database, in: 2009 IEEE conference on computer vision and pattern recognition, IEEE, 2009,.
[32]
Doersch C., Gupta A., Efros A.A., Unsupervised visual representation learning by context prediction, 2015, arXiv:1505.05192.
[33]
Ebaid A., Thirumuruganathan S., Aref W.G., Elmagarmid A., Ouzzani M., EXPLAINER: Entity resolution explanations, in: 2019 IEEE 35th international conference on data engineering, IEEE, 2019,.
[34]
Ebraheem M., Thirumuruganathan S., Joty S., Ouzzani M., Tang N., Distributed representations of tuples for entity resolution, Proceedings of the VLDB Endowment 11 (11) (2018) 1454–1467,.
[35]
Edelman A., Arias T.A., Smith S.T., The geometry of algorithms with orthogonality constraints, SIAM Journal on Matrix Analysis and Applications 20 (2) (1998) 303–353,.
[36]
Eghbal-Zadeh H., Dorfer M., Widmer G., Deep within-class covariance analysis for robust audio representation learning, 2017, arXiv:1711.04022.
[37]
Fernandez R.C., Deng D., Mansour E., Qahtan A.A., Tao W., Abedjan Z., Elmagarmid A., Ilyas I.F., Madden S., Ouzzani M., Stonebraker M., Tang N., A demo of the data civilizer system, in: Proceedings of the 2017 ACM international conference on management of data, ACM, 2017,.
[38]
Foggia P., Petkov N., Saggese A., Strisciuglio N., Vento M., Reliable detection of audio events in highly noisy environments, Pattern Recognition Letters 65 (2015) 22–28,.
[39]
Fonseca, E., Gong, R., Bogdanov, D., Slizovskaia, O., Gomez, E., & Serra, X. (2017). Acoustic Scene Classification by Ensembling Gradient Boosting Machine and Convolutional Neural Networks. In Detection and classification of acoustic scenes and events workshop (DCASE), Munich, Germany.
[40]
Fonseca E., Plakal M., Font F., Ellis D.P.W., Serra X., Audio tagging with noisy labels and minimal supervision, 2019, arXiv:1906.02975.
[41]
Fujisawa K., Hirabe Y., Suwa H., Arakawa Y., Yasumoto K., Automatic content curation system for multiple live sport video streams, in: 2015 IEEE international symposium on multimedia, IEEE, 2015,.
[42]
Furui S., Speaker-independent isolated word recognition based on emphasized spectral dynamics, in: ICASSP ’86. IEEE international conference on acoustics, speech, and signal processing, Institute of Electrical and Electronics Engineers, 1986,.
[43]
Gemmeke, J. F., Ellis, D. P. W., Freedman, D., Jansen, A., Lawrence, W., Moore, R. C., Plakal, M., & Ritter, M. (2017). Audio Set: An Ontology and Human-Labeled Dataset for Audio Events. In IEEE international conference on acoustics, speech and signal processing (ICASSP), New Orleans, la, USA, 776–780.
[44]
Goodfellow I., Bengio Y., Courville A., Deep learning, MIT Press, 2016.
[45]
Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y., Generative adversarial nets, in: Advances in neural information processing systems, Curran Associates,Inc., Red Hook, NY, USA, 2014, pp. 2672–2680.
[46]
Hakkani-Tur D., Riccardi G., Gorin A., Active learning for automatic speech recognition, in: IEEE international conference on acoustics speech and signal processing, IEEE, 2002,.
[47]
Han W., Coutinho E., Ruan H., Li H., Schuller B., Yu X., Zhu X., Semi-supervised active learning for sound classification in hybrid learning environments, Schwenker F. (Ed.), PLOS ONE 11 (9) (2016),.
[48]
Han, Y., Park, J., & Lee, K. (2017). Convolutional Neural Networks with Binaural Representations and Background Subtraction for Acoustic Scene Classification. In Detection and classification of acousticscenes and events workshop (DCASE), Munich, Germany.
[49]
He K., Zhang X., Ren S., Sun J., Deep residual learning for image recognition, in: 2016 IEEE conference on computer vision and pattern recognition, IEEE, 2016,.
[50]
He K., Zhang X., Ren S., Sun J., Identity mappings in deep residual networks, 2016, arXiv:1603.05027.
[51]
Heer, J., Hellerstein, J., & Kandel, S. (2015). Predictive Interaction for Data Transformation. In 7th Biennial conference on innovative data systems research (CIDR ’15), Asilomar, California, USA.
[52]
Heittola T., Mesaros A., Virtanen T., Acoustic scene classification in DCASE 2020 challenge: generalization across devices and low complexity solutions, 2020, arXiv:2005.14623.
[53]
Hershey S., Chaudhuri S., Ellis D.P.W., Gemmeke J.F., Jansen A., Moore R.C., Plakal M., Platt D., Saurous R.A., Seybold B., Slaney M., Weiss R.J., Wilson K., CNN architectures for large-scale audio classification, in: 2017 IEEE international conference on acoustics, speech and signal processing, IEEE, 2017,.
[54]
Hoshen Y., Weiss R.J., Wilson K.W., Speech acoustic modeling from raw multichannel waveforms, in: 2015 IEEE international conference on acoustics, speech and signal processing, IEEE, 2015,.
[55]
Huang G., Li Y., Pleiss G., Liu Z., Hopcroft J.E., Weinberger K.Q., Snapshot ensembles: Train 1, get M for free, 2017, arXiv:1704.00109.
[56]
Huang J., Lu H., Meyer P.L., Cordourier H., Ontiveros J.D.H., Acoustic scene classification using deep learning-based ensemble averaging, in: Proceedings of the detection and classification of acoustic scenes and events 2019 workshop (DCASE 2019), New York University, 2019,.
[57]
Huzaifah M., Comparison of time-frequency representations for environmental sound classification using convolutional neural networks, 2017, arXiv:1706.07156.
[58]
Imoto K., Shimauchi S., Acoustic scene analysis based on hierarchical generative model of acoustic event sequence, IEICE Transactions on Information and Systems E99.D (10) (2016) 2539–2549,.
[59]
Imoto K., Tonami N., Koizumi Y., Yasuda M., Yamanishi R., Yamashita Y., Sound event detection by multitask learning of sound events and scenes with soft scene labels, 2020, arXiv:2002.05848.
[60]
India M., Safari P., Hernando J., Self multi-head attention for speaker recognition, in: Interspeech 2019, ISCA, 2019,.
[61]
Jaitly N., Hinton G., Learning a better representation of speech soundwaves using restricted boltzmann machines, in: 2011 IEEE international conference on acoustics, speech and signal processing, IEEE, 2011,.
[62]
Jati A., Nadarajan A., Mundnich K., Narayanan S., Characterizing dynamically varying acoustic scenes from egocentric audio recordings in workplace setting, 2019, arXiv:1911.03843.
[63]
Jing L., Tian Y., Self-supervised visual feature learning with deep neural networks: A survey, 2019, arXiv:1902.06162.
[64]
Jung, J.-W., Heo, H.-S., Shim, H.-J., & Yu, H.-J. (2018). DNN based multi-level features ensemble for acoustic scene classification. In Proceedings of the detection and classification of acoustic scenes and events 2018 workshop.
[65]
Jung J.-W., Heo H.-S., jin Shim H., Yu H.-J., Distilling the knowledge of specialist deep neural networks in acoustic scene classification, in: Proceedings of the detection and classification of acoustic scenes and events 2019 workshop, New York University, 2019,.
[66]
Jung J.-W., Heo H.-S., Shim H.-J., Yu H.-J., Knowledge distillation in acoustic scene classification, IEEE Access 8 (2020) 166870–166879,.
[67]
Jung J.-W., jin Shim H., ho Kim J., bin Kim S., Yu H.-J., Acoustic scene classification using audio tagging, 2020, arXiv:2003.09164.
[68]
Jung J.-W., Shim H.-J., Kim J.-H., Yu H.-J., DcaseNet: An integrated pretrained deep neural network for detecting and classifying acoustic scenes and events, 2020, arXiv:2009.09642.
[69]
Khayyat Z., Ilyas I.F., Jindal A., Madden S., Ouzzani M., Papotti P., Quiané-Ruiz J.-A., Tang N., Yin S., BigDansing, in: Proceedings of the 2015 ACM SIGMOD international conference on management of data, ACM, 2015,.
[70]
Kim J.-H., Jung J.-W., Shim H.-J., Yu H.-J., Audio tag representation guided dual attention network for acousticscene classification, in: Detection and classification of acoustic scenes and events 2020, 2020.
[71]
Kolouri S., Nadjahi K., Simsekli U., Badeau R., Rohde G.K., Generalized sliced wasserstein distances, 2019, arXiv:1902.00434.
[72]
Kong, Q., Xu, Y., Iqbal, T., Cao, Y., Wang, W., & Plumbley, M. D. (2019). Acoustic Scene Generation with Conditional Sample RNN. In IEEE international conference on acoustics, speech and signal processing(ICASSP), Brighton, UK, 925–929.
[73]
Kong Q., Xu Y., Wang W., Plumbley M.D., A joint detection-classification model for audio tagging of weakly labelled data, in: 2017 IEEE international conference on acoustics, speech and signal processing, IEEE, 2017,.
[74]
Kosmider M., Calibrating neural networks for secondary recording devices, Detection and Classification of Acoustic Scenes and Events 2019, Samsung R&D Institute Poland Artificial Intelligence Warsaw, Poland, 2019.
[75]
Kotti M., Benetos E., Kotropoulos C., Computationally efficient and robust BIC-based speaker segmentation, IEEE Transactions on Audio, Speech, and Language Processing 16 (5) (2008) 920–933,.
[76]
Koutini K., Chowdhury S., Haunschmid V., Eghbal-zadeh H., Widmer G., Emotion and theme recognition in music with frequency-aware RF-regularized CNNs, 2019, arXiv:1911.05833.
[77]
Koutini K., Eghbal-zadeh H., Dorfer M., Widmer G., The receptive field as a regularizer in deep convolutional neural networks for acoustic scene classification, in: 2019 27th European signal processing conference, IEEE, 2019,.
[78]
Koutini K., Eghbal-Zadeh H., Widmer G., CP-JKU submissions to DCASE’19: Acoustic scene classification and audio tagging with receptive-field-regularized CNNs, Detection and Classification of Acoustic Scenes and Events 2019, Institute of Computational Perception (CP-JKU) and LIT Artificial Intelligence Lab,Johannes Kepler University Linz, Austria, 2019.
[79]
Koutini K., Eghbal-zadeh H., Widmer G., Receptive-field-regularized CNN variants for acoustic scene classification, in: Proceedings of the detection and classification of acoustic scenes and events 2019 workshop, New York University, 2019,.
[80]
Koutini K., Henkel F., Eghbal-Zadeh H., Widmer G., CP-JKU submissions to DCASE’20: Low-complexity cross-device acoustic scene classification with RF-regularized CNNs, Detection and Classification of Acoustic Scenes and Events 2020, Institute of Computational Perception (CP-JKU) and LIT Artificial Intelligence Lab,Johannes Kepler University Linz, Austria, 2020.
[81]
Kudo M., Maeda K., Satoh F., Adaptable privacy-preserving data curation for business process analysis services, in: 2016 IEEE international conference on services computing, IEEE, 2016,.
[82]
Kumar A., Khadkevich M., Fugen C., Knowledge transfer from weakly labeled audio using convolutional neural network for sound events and scenes, in: 2018 IEEE international conference on acoustics, speech and signal processing, IEEE, 2018,.
[83]
Kumpawat J., Dey S., Acoustic scene classification using auditory datasets, 2021, arXiv:2112.13450.
[84]
Lebedev V., Ganin Y., Rakhuba M., Oseledets I., Lempitsky V., Speeding-up convolutional neural networks using fine-tuned CP-decomposition, 2014, arXiv:1412.6553.
[85]
Lee M.L., Ling T.W., Low W.L., IntelliClean:A knowledge-based intelligent data cleaner, in: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, ACM Press, 2000,.
[86]
Lehner, B., Koutini, K., Schwarzlmüller, C. H., Gallien, T., & Widmer, G. (2019). Acoustic Scene Classification with Reject Option based on Resnets. In Detection and classification of acoustic scenes and events workshop (DCASE), New York, NY, USA.
[87]
Li Z., Hou Y., Xie X., Li S., Zhang L., Du S., Liu W., Multi-level attention model with deep scattering spectrum for acoustic scene classification, in: 2019 IEEE international conference on multimedia &amp expo workshops, IEEE, 2019,.
[88]
Li H., Kadav A., Durdanovic I., Samet H., Graf H.P., Pruning filters for efficient ConvNets, 2016, arXiv:1608.08710.
[89]
Lin T.-Y., Goyal P., Girshick R., He K., Dollar P., Focal loss for dense object detection, in: 2017 IEEE international conference on computer vision, IEEE, 2017,.
[90]
Liu S., Mallol-Ragolta A., Parada-Cabaleiro E., Qian K., Jing X., Kathan A., Hu B., Schuller B.W., Audio self-supervised learning: A survey, Patterns 3 (12) (2022),.
[91]
Liu M., Wang W., Li Y., The system for acoustic scene classification using resnet, (DCASE2019) School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China, 2019.
[92]
Lostanlen, V., & Cella, C.-E. (2016). Deep convolutional networks on the pitch spiral for music instrument recognition. In 17th International society for music information retrieval conference (ISMIR), New York City, United States, 612–618.
[93]
Lostanlen V., Salamon J., Cartwright M., McFee B., Farnsworth A., Kelling S., Bello J.P., Per-channel energy normalization: Why and how, IEEE Signal Processing Letters 26 (1) (2019) 39–43,.
[94]
Luo W., Li Y., Urtasun R., Zemel R., Understanding the effective receptive field in deep convolutional neural networks, 2017, arXiv:1701.04128.
[95]
Maka, T. (2018). Audio Feature Space Analysis for Acoustic Scene Classification. In Detectionand classification of acoustic scenes and events workshop (DCASE), Surrey, UK.
[96]
Marchi, E., Tonelli, D., Xu, X., Ringeval, F., Deng, J., Squartini, S., & Schuller, B. (2016). Pairwise Decomposition with Deep Neural Networks and Multiscale Kernel Subspace Learning for Acoustic Scene Classification. In Detection and classification of acoustic scenes and events workshop (DCASE),Budapest, Hungary.
[97]
Mariotti, O., Cord, M., & Schwander, O. (2018). Exploring Deep Vision Models for Acoustic Scene Classification. In Detection and classification of acoustic scenes and events workshop (DCASE), Surrey,UK.
[98]
Mars R., Pratik P., Nagisetty S., Lim C., Acoustic scene classification from binaural signals using convolutional neural networks, in: Proceedings of the detection and classification of acoustic scenes and events 2019 workshop, New York University, 2019,.
[99]
Mattys S.L., Davis M.H., Bradlow A.R., Scott S.K., Speech recognition in adverse conditions: A review, Language and Cognitive Processes 27 (7–8) (2012) 953–978,.
[100]
McDonnell M.D., Gao W., Acoustic scene classification using deep residual networks with late fusion of separated high and low frequency paths, Detection and Classification of Acoustic Scenes and Events 2019, Computational Learning Systems Laboratory,School of Information Technology and Mathematical Sciences,University of South Australia, Mawson Lakes SA 5095, Australia, 2019.
[101]
Mesaros A., Heittola T., Benetos E., Foster P., Lagrange M., Virtanen T., Plumbley M.D., Detection and classification of acoustic scenes and events: Outcome of the DCASE 2016 challenge, IEEE/ACM Transactions on Audio, Speech, and Language Processing 26 (2) (2018) 379–393,.
[102]
Mesaros A., Heittola T., Virtanen T., TUT database for acoustic scene classification and sound event detection, in: 2016 24th European signal processing conference, IEEE, 2016,.
[103]
Mesaros A., Heittola T., Virtanen T., Assessment of human and machine performance in acoustic scene classification: Dcase 2016 case study, in: 2017 IEEE workshop on applications of signal processing to audio and acoustics, IEEE, 2017,.
[104]
Mesaros A., Heittola T., Virtanen T., A multi-device dataset for urban acoustic scene classification, 2018, arXiv:1807.09840.
[105]
Mesaros A., Heittola T., Virtanen T., Acoustic scene classification in DCASE 2019 challenge: Closed and open set classification and data mismatch setups, in: Proceedings of the detection and classification of acoustic scenes and events 2019 workshop, New York University, 2019,.
[106]
Michael Mandel J.S., Ellis D.P., Proceedings of the detection and classification of acoustic scenes and events 2019 workshop, New York University, 2019,.
[107]
Mille, R. (2014). Big Data Curation. In 20th International conference on management of data (COMAD),17th-19th Dec 2014 At Hyderabad, India.
[108]
Miyamoto K., Koseki A., Ohno M., Effective data curation for frequently asked questions, in: 2017 IEEE international conference on service operations and logistics, and informatics, IEEE, 2017,.
[109]
Mohamed A.-R., Hinton G., Penn G., Understanding how deep belief networks perform acoustic modelling, in: ICASSP, 2012.
[110]
Mudgal S., Li H., Rekatsinas T., Doan A., Park Y., Krishnan G., Deep R., Arcaute E., Raghavendra V., Deep learning for entity matching, in: Proceedings of the 2018 international conference on management of data, ACM, 2018,.
[111]
Mun, S., Park, S., Han, D. K., & Ko, H. (2017). Generative Adversarial Networks based Acoustic Scene Training Set Augmentation and Selection using SVM Hyperplane. In Detection and classification of acoustic scenes and events workshop (DCASE), Munich, Germany.
[112]
Nanni L., Maguolo G., Paci M., Data augmentation approaches for improving animal audio classification, 2019, arXiv:1912.07756.
[113]
Nguyen T.N.T., Jones D.L., Gan W., DCASE 2020 task 3: Ensemble of sequence matching networks for dynamic sound event localization, detection, and tracking, Detection and Classification of Acoustic Scenes and Events, Nanyang Technological University, School of Electrical and Electronic Engineering, Singapore, University of Illinois at Urbana-Champaign, Dept. of Electrical and Computer Engineering,Illinois, USA, 2020.
[114]
Nguyen, T., & Pernkopf, F. (2018). Acoustic Scene Classification using a Convolutional Neural Network Ensemble and Nearest Neighbor Filters. In Detection and classification of acoustic scenesand events workshop (DCASE), Surrey, UK.
[115]
Nogueira A.F.R., Oliveira H.S., Machado J.J.M., Tavares J.M.R.S., Sound classification and processing of urban environments: A systematic literature review, Sensors 22 (22) (2022) 8608,.
[116]
Pezoulas V.C., Kourou K.D., Kalatzis F., Exarchos T.P., Venetsanopoulou A., Zampeli E., Gandolfo S., Skopouli F., Vita S.D., Tzioufas A.G., Fotiadis D.I., Medical data quality assessment: On the development of an automated framework for medical data curation, Computers in Biology and Medicine 107 (2019) 270–283,.
[117]
Phaye S.S.R., Benetos E., Wang Y., SubSpectralNet - using sub-spectrogram based convolutional neural networks for acoustic scene classification, 2018, arXiv:1810.12642.
[118]
Plumbley M.D., Kroos C., Bello J.P., Richard G., Ellis D.P., Mesaros A., Detection and classification of acoustic scenes and events 2018 workshop (DCASE2018), in: Detection and classification ofacoustic scenes and events 2018 workshop (DCASE2018), Tampere University of Technology. Laboratory of Signal Processing, 2018.
[119]
Primus P., Eghbal-zadeh H., Eitelsebner D., Koutini K., Arzt A., Widmer G., Exploiting parallel audio recordings to enforce device invariance in CNN-based acoustic scene classification, 2019, arXiv:1909.02869.
[120]
Primus P., Eitelsebner D., Acoustic scene classification with mismatched recording devices, Institute of Computational Perception (CP-JKU)Johannes Kepler University Linz, Austria, Detection and Classification of Acoustic Scenes and Events, 2019.
[121]
Purwins H., Li B., Virtanen T., Schluter J., Chang S.-Y., Sainath T., Deep learning for audio signal processing, IEEE Journal of Selected Topics in Signal Processing 13 (2) (2019) 206–219,.
[122]
Qian, K., Ren, Z., Pandit, V., Yang, Z., Zhang, Z., & Schuller, B. (2017). Wavelets Revisited for the Classification of Acoustic Scenes. In Detection and classification of acoustic scenes and events workshop (DCASE), Munich, Germany.
[123]
Rafii, Z., & Pardo, B. (2012). Music/Voice Separation using the Similarity Matrix. In 13th International society for music information retrieval conference (ISMIR), Porto, Portugal, 583–588.
[124]
Rahm E., Do H.H., Data cleaning: Problems and current approaches, in: IEEE bulletin of the technical committee on data engineering, 2000, pp. 3–13.
[125]
Ravanelli M., Bengio Y., Speaker recognition from raw waveform with SincNet, 2018, arXiv:1808.00158.
[126]
Ren Z., Kong Q., Han J., Plumbley M.D., Schuller B.W., Attention-based atrous convolutional neural networks: Visualisation and understanding perspectives of acoustic scenes, in: ICASSP 2019 - 2019 IEEE international conference on acoustics, speech and signal processing, IEEE, 2019,.
[127]
Ren Z., Kong Q., Qian K., Plumbley M.D., Schuller B.W., Attention-based convolutional neural networks for acoustic scene classification, in: Detection and classification of acoustic scenes and events, 2018.
[128]
Ren, Z., Pandit, V., Qian, K., Yang, Z., Zhang, Z., & Schuller, B. (2017). Deep Sequential Image Features for Acoustic Scene Classification. In Detection and classification of acoustic scenes and eventsworkshop (DCASE), Munich, Germany.
[129]
Riccardi G., Hakkani-Tur D., Active learning: theory and applications to automatic speech recognition, IEEE Transactions on Speech and Audio Processing 13 (4) (2005) 504–511,.
[130]
Ridzuan F., Zainon W.M.N.W., A review on data cleansing methods for big data, Procedia Computer Science 161 (2019) 731–738,.
[131]
Roletscheck, C., Watzka, T., Seiderer, A., Schiller, D., & Andre, E. (2019). Using an Evolutionary Approach To Explore Convolutional Neural Networks for Acoustic Scene Classification. In Detectionand classification of acoustic scenes and events workshop (DCASE), New York, NY, USA.
[132]
Saki F., Guo Y., Hung C.-Y., hoon Kim L., Deshpande M., Moon S., Koh E., Visser E., Open-set evolving acoustic scene classification system, in: Proceedings of the detection and classification of acoustic scenes and events 2019 workshop, New York University, 2019,.
[133]
Salah H., Al-Omari I., Alwidian J., Al-Hamadin R., Tawalbeh T., Data streams curation for better machine learning functionality and result to serve IoT and other applications: A survey, Journal of Computer Science 15 (10) (2019) 1572–1584,.
[134]
Salamon J., Bello J.P., Deep convolutional neural networks and data augmentation for environmental sound classification, IEEE Signal Processing Letters 24 (3) (2017) 279–283,.
[135]
Seo H., Park J., Park Y., Acoustic scene classification using various pre-processed features andconvolutional neural networks, in: Detection and classification of acoustic scenes and events 2019, 2019.
[136]
Sharma J., Granmo O.-C., Goodwin M., Environment sound classification using multiple feature channels and attention based deep convolutional neural network, 2019, arXiv:1908.11219.
[137]
Sharma G., Umapathy K., Krishnan S., Trends in audio signal feature extraction methods, Applied Acoustics 158 (2020),.
[138]
Shuyang Z., Heittola T., Virtanen T., Active learning for sound event classification by clustering unlabeled data, in: 2017 IEEE international conference on acoustics, speech and signal processing, IEEE, 2017,.
[139]
Shuyang Z., Heittola T., Virtanen T., An active learning method using clustering and committee-based sample selection for sound event classification, in: 2018 16th international workshop on acoustic signal enhancement, IEEE, 2018,.
[140]
Shuyang Z., Heittola T., Virtanen T., Active learning for sound event detection, IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020).
[141]
Sidi F., Panahy P.H.S., Affendey L.S., Jabar M.A., Ibrahim H., Mustapha A., Data quality: A survey of data quality dimensions, in: 2012 International conference on information retrieval &amp knowledge management, IEEE, 2012,.
[142]
Silla C.N., Freitas A.A., A survey of hierarchical classification across different application domains, Data Mining and Knowledge Discovery 22 (1–2) (2010) 31–72,.
[143]
Singh A., Kaur N., Kukreja V., Kadyan V., Kumar M., Computational intelligence in processing of speech acoustics: A survey, Complex &Amp Intelligent Systems 8 (3) (2022) 2623–2661,.
[144]
Singh A., Thakur A., Rajan P., Bhavsar A., A layer-wise score level ensemble framework for acoustic scene classification, in: 2018 26th European signal processing conference, IEEE, 2018,.
[145]
Bae S.H., Choi I., Kim N.S., Acoustic scene classification using parallel combination of LSTM and CNN, in: Detection and classification of acoustic scenes and events workshop (DCASE), Budapest, Hungary, 3 September 2016.
[146]
Sowe S.K., Zettsu K., The architecture and design of a community-based cloud platform for curating big data, in: 2013 International conference on cyber-enabled distributed computing and knowledge discovery, IEEE, 2013.
[147]
Spoorthy V., Mulimani M., Koolagudi S.G., Acoustic scene classification using deep learning architectures, in: 2021 6th international conference for convergence in technology, IEEE, 2021.
[148]
Stonebraker M., Ilyas I.F., Data integration: The current status and the way forward, IEEE Data Engineering Bulletin 41 (2) (2018) 3–9.
[149]
Stonebraker M., Bruckner D., Ilyas I.F., Beskales G., Cherniack M., Zdonik S., Data curation at scale: The Data Tamer system, in: 6th Biennial conference on innovative data systems research (CIDR '13), Asilomar, California, USA, 2013.
[150]
Suh S., Lim W., Park S., Jeong Y., Acoustic scene classification using SpecAugment and convolutional neural network with inception modules, Detection and Classification of Acoustic Scenes and Events 2019, Realistic AV Research Group, Electronics and Telecommunications Research Institute, 218 Gajeong-ro, Yuseong-gu, Daejeon, Korea, 2019.
[151]
Suh S., Park S., Jeong Y., Lee T., Designing acoustic scene classification models with CNN variants, Detection and Classification of Acoustic Scenes and Events 2020, Media Coding Research Section, Electronics and Telecommunications Research Institute, 2020.
[152]
Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., Erhan D., Vanhoucke V., Rabinovich A., Going deeper with convolutions, in: 2015 IEEE conference on computer vision and pattern recognition, IEEE, 2015.
[153]
Szegedy C., Zaremba W., Sutskever I., Bruna J., Erhan D., Goodfellow I., Fergus R., Intriguing properties of neural networks, in: International conference on learning representations, 2013, arXiv:1312.6199.
[154]
Takahashi G., Yamada T., Ono N., Makino S., Performance evaluation of acoustic scene classification using DNN-GMM and frame-concatenated acoustic features, in: 2017 Asia-Pacific signal and information processing association annual summit and conference, IEEE, 2017.
[155]
Thickstun J., Harchaoui Z., Kakade S., Learning features of music from scratch, in: International conference on learning representations, 2016, arXiv:1611.09827.
[156]
Thirumuruganathan S., Tang N., Ouzzani M., Doan A., Data curation with deep learning, Open Proceedings (2020).
[157]
Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L., Polosukhin I., Attention is all you need, 2017, arXiv:1706.03762.
[158]
Virtanen T., Mesaros A., Heittola T., Diment A., Vincent E., Benetos E., Elizalde B.M., Detection and classification of acoustic scenes and events 2017 workshop (DCASE2017), in: Proceedings of the detection and classification of acoustic scenes and events 2017 workshop, 2017.
[159]
Waldekar S., Saha G., Classification of audio scenes with novel features in a fused system framework, Digital Signal Processing 75 (2018) 71–82.
[160]
Wang Y., Getreuer P., Hughes T., Lyon R.F., Saurous R.A., Trainable frontend for robust and far-field keyword spotting, in: 2017 IEEE international conference on acoustics, speech and signal processing, IEEE, 2017.
[161]
Wang H., Li M., Bu Y., Li J., Gao H., Zhang J., Cleanix: A big data cleaning parfait, ACM SIGMOD Record 44 (4) (2016) 35–40.
[162]
Wang H., Zou Y., Wang W., SpecAugment++: A hidden space data augmentation method for acoustic scene classification, 2021, arXiv:2103.16858.
[163]
Wilkinghoff K., Kurth F., Open-set acoustic scene classification with deep convolutional autoencoders, in: Proceedings of the detection and classification of acoustic scenes and events 2019 workshop, New York University, 2019.
[164]
Wu T.T., Lange K., Coordinate descent algorithms for lasso penalized regression, The Annals of Applied Statistics 2 (1) (2008).
[165]
Wu Y., Lee T., Enhancing sound texture in CNN-based acoustic scene classification, in: ICASSP 2019 - 2019 IEEE international conference on acoustics, speech and signal processing, IEEE, 2019.
[166]
Xia X., Togneri R., Sohel F., Zhao Y., Huang D., A survey: Neural network-based deep learning for acoustic event detection, Circuits, Systems, and Signal Processing 38 (8) (2019) 3433–3453.
[167]
Xu J.-X., Lin T.-C., Yu T.-C., Tai T.-C., Chang P.-C., Acoustic scene classification using reduced MobileNet architecture, in: 2018 IEEE international symposium on multimedia, IEEE, 2018.
[168]
Yakout M., Berti-Équille L., Elmagarmid A.K., Don't be SCAREd, in: Proceedings of the 2013 international conference on management of data, ACM Press, 2013.
[169]
Yamaguchi O., Fukui E., Maeda K., Face recognition using temporal image sequence, in: Proceedings third IEEE international conference on automatic face and gesture recognition, IEEE Computer Society, 2002.
[170]
Yang L., Chen X., Tao L., Acoustic scene classification using multi-scale features, in: Detection and classification of acoustic scenes and events workshop (DCASE), Surrey, UK, 2018.
[171]
Yang C., Puthal D., Mohanty S.P., Kougianos E., Big-sensing-data curation for the cloud is coming: A promise of scalable cloud-data-center mitigation for next-generation IoT and wireless sensor networks, IEEE Consumer Electronics Magazine 6 (4) (2017) 48–56.
[172]
Yasumoto K., Yamaguchi H., Shigeno H., Survey of real-time processing technologies of IoT data streams, Journal of Information Processing 24 (2) (2016) 195–202.
[173]
Ye J., Kobayashi T., Toyama N., Tsuda H., Murakawa M., Acoustic scene classification using efficient summary statistics and multiple spectro-temporal descriptor fusion, Applied Sciences 8 (8) (2018) 1363.
[174]
Ye J., Kobayashi T., Wang X., Tsuda H., Murakawa M., Audio data mining for anthropogenic disaster identification: An automatic taxonomy approach, IEEE Transactions on Emerging Topics in Computing 8 (1) (2020) 126–136.
[175]
Zeinali H., Burget L., Černocký J.H., Acoustic scene classification using fusion of attentive convolutional neural networks for DCASE2019 challenge, 2019, arXiv:1907.07127.
[176]
Zhang H., Cisse M., Dauphin Y.N., Lopez-Paz D., mixup: Beyond empirical risk minimization, 2017, arXiv:1710.09412.
[177]
Zheng X., Acoustic scene classification combining log-mel CNN model and end-to-end model, National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei, China, 2019.
[178]
Zhong Z., Zheng L., Kang G., Li S., Yang Y., Random erasing data augmentation, 2017, arXiv:1708.04896.
[179]
Zieliński S., Lee H., Feature extraction of binaural recordings for acoustic scene classification, in: Proceedings of the 2018 federated conference on computer science and information systems, IEEE, 2018.

Cited By

  • (2024) Acoustic scene classification. Expert Systems with Applications: An International Journal, 238 (PB). DOI: 10.1016/j.eswa.2023.121902. Online publication date: 27-Feb-2024.

Published In

Expert Systems with Applications: An International Journal  Volume 229, Issue PA
Nov 2023
1358 pages

Publisher

Pergamon Press, Inc.

United States

Publication History

Published: 01 November 2023

Author Tags

  1. Acoustic scene classification
  2. Accuracy
  3. Audio sound
  4. CNN
  5. Data curation
  6. DCASE
  7. DNN
  8. Feature extraction
  9. ML
  10. MFCC
  11. Pre-processing
  12. Receptive field
  13. Sound event detection

Qualifiers

  • Review-article
