
How Deep Features Have Improved Event Recognition in Multimedia: A Survey

Published: 05 June 2019
Abstract

    Event recognition is one of the areas of multimedia research attracting considerable attention. Because it is relevant to a wide range of applications, covering both personal and collective events, a number of interesting solutions for event recognition from multimedia information sources have been proposed. Moreover, following its immense success in classification, object recognition, and detection, deep learning has also been shown to perform well in event recognition tasks. As a result, a large portion of the current literature on event analysis relies on deep learning architectures. In this article, we provide an extensive overview of the existing literature in this field, analyzing how deep features and deep learning architectures have changed the performance of event recognition frameworks. The literature on event-based analysis of multimedia content can be categorized into four groups: (i) event recognition in single images; (ii) event recognition in personal photo collections; (iii) event recognition in videos; and (iv) event recognition in audio recordings. We extensively review deep-learning-based frameworks for event recognition in each of these four domains. We also review several benchmark datasets made available to the scientific community for validating novel event recognition pipelines. In the final part of the manuscript, we provide a detailed discussion of the main insights gathered from the literature review and identify future trends and challenges.

    References

    [1]
    Sharath Adavanne, Giambattista Parascandolo, Pasi Pertilä, Toni Heittola, and Tuomas Virtanen. 2017. Sound event detection in multichannel audio using spatial and harmonic features. arXiv preprint arXiv:1706.02293 (2017).
    [2]
    Sharath Adavanne, Archontis Politis, and Tuomas Virtanen. 2018. Multichannel sound event detection using 3D convolutional neural networks for learning inter-channel features. arXiv preprint arXiv:1801.09522 (2018).
    [3]
    Kashif Ahmad, Nicola Conci, Giulia Boato, and Francesco G. B. De Natale. 2016. USED: A large-scale social event detection dataset. In Proceedings of the 7th International Conference on Multimedia Systems. ACM, 50.
    [4]
    Kashif Ahmad, Nicola Conci, Giulia Boato, and Francesco G. B. De Natale. 2017. Event recognition in personal photo collections via multiple instance learning-based classification of multiple images. Journal of Electronic Imaging 26, 6 (2017), 060502.
    [5]
    Kashif Ahmad, Nicola Conci, and F. G. B. De Natale. 2018. A saliency-based approach to event recognition. Signal Processing: Image Communication 60 (2018), 42--51.
    [6]
    Kashif Ahmad, Francesco De Natale, Giulia Boato, and Andrea Rosani. 2016. A hierarchical approach to event discovery from single images using MIL framework. In Proceedings of the 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 1223--1227.
    [7]
    Kashif Ahmad, M. L. Mekhalfi, Nicola Conci, Giliua Boato, F. Melgani, and F. G. B. De Natale. 2017. A pool of deep models for event recognition. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2886--2890.
    [8]
    Kashif Ahmad, Mohamed Lamine Mekhalfi, and Nicola Conci. 2018. Event recognition in personal photo collections: An active learning approach. Electronic Imaging 2018, 2 (2018), 1--5.
    [9]
    Kashif Ahmad, Mohamed Lamine Mekhalfi, Nicola Conci, Farid Melgani, and Francesco De Natale. 2018. Ensemble of deep models for event recognition. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 14, 2 (2018), 51.
    [10]
    Kashif Ahmad, Konstantin Pogorelov, Michael Riegler, Nicola Conci, and Pål Halvorsen. 2018. Social media and satellites. Multimedia Tools and Applications (2018), 1--39.
    [11]
    Kashif Ahmad, Konstantin Pogorelov, Michael Riegler, Nicola Conci, and H. Pal. 2017. CNN and GAN based satellite and social media data fusion for disaster detection. In Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland.
    [12]
    Kashif Ahmad, Amir Sohail, Nicola Conci, and Francesco De Natale. 2018. A comparative study of global and deep features for the analysis of user-generated natural disaster related images. In Proceedings of the 2018 IEEE 13th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP). IEEE, 1--5.
    [13]
    Sheharyar Ahmad, Kashif Ahmad, Nasir Ahmad, and Nicola Conci. 2017. Convolutional neural networks for disaster images retrieval. In Proceedings of the MediaEval 2017 Workshop (Sept. 13--15, 2017). Dublin, Ireland.
    [14]
    Siti Nor Khuzaimah Binti Amit, Soma Shiraishi, Tetsuo Inoshita, and Yoshimitsu Aoki. 2016. Analysis of satellite images for disaster detection. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). IEEE, 5189--5192.
    [15]
    Nazia Attari, Ferda Ofli, Mohammad Awad, Ji Lucas, and Sanjay Chawla. 2016. Nazr-CNN: Fine-grained classification of UAV imagery for damage assessment. arXiv preprint arXiv:1611.06474 (2016).
    [16]
    Konstantinos Avgerinakis, Anastasia Moumtzidou, Stelios Andreadis, Emmanouil Michail, Ilias Gialampoukidis, Stefanos Vrochidis, and Ioannis Kompatsiaris. 2017. Visual and textual analysis of social media and satellite images for flood detection@ multimedia satellite task MediaEval 2017. In Proceedings of the Working Notes Proceeding MediaEval Workshop, Dublin, Ireland. 13--15.
    [17]
    Yusuf Aytar, Carl Vondrick, and Antonio Torralba. 2016. Soundnet: Learning sound representations from unlabeled video. In Advances in Neural Information Processing Systems. 892--900.
    [18]
    Elham Babaee, Nor Badrul Anuar, Ainuddin Wahid Abdul Wahab, Shahaboddin Shamshirband, and Anthony T. Chronopoulos. 2018. An overview of audio event detection methods from feature extraction to classification. Applied Artificial Intelligence (2018), 1--54.
    [19]
    Siham Bacha, Mohand Said Allili, and Nadjia Benblidia. 2016. Event recognition in photo albums using probabilistic graphical models and feature relevance. Journal of Visual Communication and Image Representation 40 (2016), 546--558.
    [20]
    Lamberto Ballan, Alessio Bazzica, Marco Bertini, Alberto Del Bimbo, and Giuseppe Serra. 2009. Deep networks for audio event classification in soccer videos. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME’09). IEEE, 474--477.
    [21]
    Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. 2006. Surf: Speeded up robust features. In Proceedings of the European Conference on Computer Vision. Springer, 404--417.
    [22]
    Benjamin Bischke, Prakriti Bhardwaj, Aman Gautam, Patrick Helber, D. Borth, and A. Dengel. 2017. Detection of flooding events in social multimedia and satellite imagery using deep neural networks. In Proceedings of the Working Notes Proceeding MediaEval Workshop, Dublin, Ireland.
    [23]
    Benjamin Bischke, Damian Borth, Christian Schulze, and Andreas Dengel. 2016. Contextual enrichment of remote-sensed events with social media streams. In Proceedings of the 2016 ACM Multimedia Conference. ACM, 1077--1081.
    [24]
    Benjamin Bischke, Patrick Helber, Christian Schulze, Srinivasan Venkat, Andreas Dengel, and Damian Borth. 2017. The multimedia satellite task at MediaEval 2017: Emergence response for flooding events. In Proceedings of the MediaEval 2017 Workshop (Sept. 13-15, 2017). Dublin, Ireland.
    [25]
    Anna Bosch, Andrew Zisserman, and Xavier Munoz. 2007. Representing shape with a spatial pyramid kernel. In Proceedings of the 6th ACM International Conference on Image and Video Retrieval. ACM, 401--408.
    [26]
    Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. 2013. Event recognition in photo collections with a stopwatch HMM. In Proceedings of the IEEE International Conference on Computer Vision. 1193--1200.
    [27]
    Markus Brenner and Ebroul Izquierdo. 2011. MediaEval benchmark: Social event detection in collaborative photo collections. In MediaEval.
    [28]
    Emre Cakir, Toni Heittola, Heikki Huttunen, and Tuomas Virtanen. 2015. Polyphonic sound event detection using multi label deep neural networks. In Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN). IEEE, 1--7.
    [29]
    Emre Cakir, Ezgi Can Ozan, and Tuomas Virtanen. 2016. Filterbank learning for deep neural network based polyphonic sound event detection. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 3399--3406.
    [30]
    Liangliang Cao, Shih-Fu Chang, Noel Codella, Courtenay Cotton, Dan Ellis, Leiguang Gong, Matthew Hill, Gang Hua, John Kender, Michele Merler, Yadong Mu, Apostol Natsev, and John R. Smith. 2011. IBM research and Columbia University TRECVID-2011 multimedia event detection (MED) system. In NIST TRECVID Workshop, Vol. 28.
    [31]
    Xiaojun Chang, Yao-Liang Yu, Yi Yang, and Eric P. Xing. 2017. Semantic pooling for complex event analysis in untrimmed videos. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 8 (2017), 1617--1632.
    [32]
    S. Chatzichristofis, Y. Boutalis, and Mathias Lux. 2009. Selection of the proper compact composite descriptor for improving content based image retrieval. In Proceedings of the 6th IASTED International Conference, Vol. 134643. 064.
    [33]
    Savvas A. Chatzichristofis and Yiannis S. Boutalis. 2008. CEDD: Color and edge directivity descriptor: A compact descriptor for image indexing and retrieval. In Proceedings of the International Conference on Computer Vision Systems. Springer, 312--322.
    [34]
    Ming-yu Chen and Alexander Hauptmann. 2009. Mosift: Recognizing human actions in surveillance videos. (2009).
    [35]
    Tao Chen, Damian Borth, Trevor Darrell, and Shih-Fu Chang. 2014. DeepSentiBank: Visual sentiment concept classification with deep convolutional neural networks. arXiv preprint arXiv:1410.8586 (2014).
    [36]
    Inkyu Choi, Kisoo Kwon, Soo Hyun Bae, and Nam Soo Kim. 2016. DNN-based sound event detection with exemplar-based approach for noise reduction. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016). 16--19.
    [37]
    Selina Chu, Shrikanth Narayanan, C.-C. Jay Kuo, and Maja J. Mataric. 2006. Where am I? Scene recognition for mobile robots using audio features. In Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, IEEE, 885--888.
    [38]
    Courtenay V. Cotton and Daniel P. W. Ellis. 2011. Spectral vs. spectro-temporal features for acoustic event detection. In Proceedings of the 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 69--72.
    [39]
    Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V. Le. 2018. AutoAugment: Learning augmentation policies from data. arXiv preprint arXiv:1805.09501 (2018).
    [40]
    Juncheng Li Dai Wei, Phuong Pham, Samarjit Das, Shuhui Qu, and Florian Metze. 2016. Sound event detection for real life audio DCASE challenge. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events.
    [41]
    Minh-Son Dao, Duc-Tien Dang-Nguyen, and Francesco G. B. De Natale. 2014. Robust event discovery from photo collections using Signature Image Bases (SIBs). Multimedia Tools and Applications 70, 1 (2014), 25--53.
    [42]
    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’09). IEEE, 248--255.
    [43]
    Terrance DeVries and Graham W. Taylor. 2017. Dataset augmentation in feature space. arXiv preprint arXiv:1702.05538 (2017).
    [44]
    Shengyong Ding, Liang Lin, Guangrun Wang, and Hongyang Chao. 2015. Deep feature learning with relative distance comparison for person re-identification. Pattern Recognition 48, 10 (2015), 2993--3003.
    [45]
    Sergio Escalera, Junior Fabian, Pablo Pardo, Xavier Baró, Jordi Gonzalez, Hugo J. Escalante, Dusan Misevic, Ulrich Steiner, and Isabelle Guyon. 2015. Chalearn looking at people 2015: Apparent age and cultural event recognition datasets and results. In Proceedings of the IEEE International Conference on Computer Vision Workshops. 1--9.
    [46]
    Lijie Fan, Wenbing Huang, Stefano Ermon Chuang Gan, Boqing Gong, and Junzhou Huang. 2018. End-to-end learning of motion representation for video understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6016--6025.
    [47]
    Yachuang Feng, Yuan Yuan, and Xiaoqiang Lu. 2016. Deep representation for abnormal event detection in crowded scenes. In Proceedings of the 2016 ACM on Multimedia Conference. ACM, 591--595.
    [48]
    Jonathan G. Fiscus. 2010. TRECVID multimedia event detection 2010 evaluation. (2010).
    [49]
    Pasquale Foggia, Nicolai Petkov, Alessia Saggese, Nicola Strisciuglio, and Mario Vento. 2015. Reliable detection of audio events in highly noisy environments. Pattern Recognition Letters 65 (2015), 22--28.
    [50]
    Pasquale Foggia, Nicolai Petkov, Alessia Saggese, Nicola Strisciuglio, and Mario Vento. 2016. Audio surveillance of roads: A system for detecting anomalous sounds. IEEE Transactions on Intelligent Transportation Systems 17, 1 (2016), 279--288.
    [51]
    Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM International Conference on Multimedia. ACM, 411--412.
    [52]
    Alexandre R. J. Francois, Ram Nevatia, Jerry Hobbs, Robert C. Bolles, and John R. Smith. 2005. VERL: An ontology framework for representing and annotating video events. IEEE Multimedia 12, 4 (2005), 76--86.
    [53]
    Steve Frolking, Jianjun Qiu, Stephen Boles, Xiangming Xiao, Jiyuan Liu, Yahui Zhuang, Changsheng Li, and Xiaoguang Qin. 2002. Combining remote sensing and ground census data to develop new maps of the distribution of rice agriculture in China. Global Biogeochemical Cycles 16, 4 (2002).
    [54]
    Jianlong Fu, Yue Wu, Tao Mei, Jinqiao Wang, Hanqing Lu, and Yong Rui. 2015. Relaxing from vocabulary: Robust weakly-supervised deep learning for vocabulary-free image tagging. In Proceedings of the IEEE International Conference on Computer Vision. 1985--1993.
    [55]
    Chuang Gan, Chen Sun, Lixin Duan, and Boqing Gong. 2016. Webly-supervised video recognition by mutually voting for relevant web images and web video frames. In Proceedings of the European Conference on Computer Vision. Springer, 849--866.
    [56]
    Chuang Gan, Naiyan Wang, Yi Yang, Dit-Yan Yeung, and Alex G. Hauptmann. 2015. Devnet: A deep event network for multimedia event detection and evidence recounting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2568--2577.
    [57]
    Chuang Gan, Ting Yao, Kuiyuan Yang, Yi Yang, and Tao Mei. 2016. You lead, we exceed: Labor-free video concept learning by jointly exploiting web videos and images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 923--932.
    [58]
    Oguzhan Gencoglu, Tuomas Virtanen, and Heikki Huttunen. 2014. Recognition of acoustic events using deep neural networks. In Proceedings of the 2014 22nd European Signal Processing Conference (EUSIPCO). IEEE, 506--510.
    [59]
    D. Giannoulis, E. Benetos, D. Stowell, M. Rossignol, M. Lagrange, and M. Plumbley. 2013. IEEE AASP challenge: Detection and classification of acoustic scenes and events. Queen Mary University of London: London, UK (2013).
    [60]
    Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 580--587.
    [61]
    Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio. 2016. Deep Learning. Vol. 1. MIT Press, Cambridge.
    [62]
    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. 2672--2680.
    [63]
    Cong Guo and Xinmei Tian. 2015. Event recognition in personal photo collections using hierarchical model and multiple features. In Proceedings of the 2015 IEEE 17th International Workshop on Multimedia Signal Processing (MMSP). IEEE, 1--6.
    [64]
    Cong Guo, Xinmei Tian, and Tao Mei. 2017. Multi-granular event recognition of personal photo albums. IEEE Transactions on Multimedia (2017).
    [65]
    Aki Harma, Martin F. McKinney, and Janto Skowronek. 2005. Automatic surveillance of the acoustic activity in our living environment. In Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005 (ICME 2005). IEEE, 4--pp.
    [66]
    Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, and Kazuya Takeda. 2016. Bidirectional LSTM-HMM hybrid system for polyphonic sound event detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016). 35--39.
    [67]
    Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, and Kazuya Takeda. 2017. BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic sound event detection. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 766--770.
    [68]
    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015).
    [69]
    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770--778.
    [70]
    Toni Heittola, Annamaria Mesaros, Tuomas Virtanen, and Moncef Gabbouj. 2013. Supervised model training for overlapping sound events based on unsupervised source separation. In Proceedings of ICASSP. 8677--8681.
    [71]
    Somboon Hongeng, Ram Nevatia, and Francois Bremond. 2004. Video-based event recognition: Activity representation and probabilistic recognition methods. Computer Vision and Image Understanding 96, 2 (2004), 129--162.
    [72]
    Yuanbo Hou and Shengchen Li. 2017. Sound Event Detection in Real Life Audio Using Multimodel System. Technical Report. DCASE2017 Challenge, Tech. Rep.
    [73]
    Max Jaderberg, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2016. Reading text in the wild with convolutional neural networks. International Journal of Computer Vision 116, 1 (2016), 1--20.
    [74]
    I-Hong Jhuo and D. T. Lee. 2014. Video event detection via multi-modality deep learning. In Proceedings of the 2014 22nd International Conference on Pattern Recognition (ICPR). IEEE, 666--671.
    [75]
    Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 675--678.
    [76]
    Lu Jiang, Alexander G. Hauptmann, and Guang Xiang. 2012. Leveraging high-level and low-level features for multimedia event detection. In Proceedings of the 20th ACM International Conference on Multimedia. ACM, 449--458.
    [77]
    Yu-Gang Jiang, Zuxuan Wu, Jun Wang, Xiangyang Xue, and Shih-Fu Chang. 2018. Exploiting feature and class relationships in video categorization with regularized deep neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 2 (2018), 352--364.
    [78]
    Brendan Jou and Shih-Fu Chang. 2016. Deep cross residual learning for multitask visual recognition. In Proceedings of the ACM Conference on Multimedia. ACM, 998--1007.
    [79]
    Andreas Kamilaris and Francesc X. Prenafeta-Boldú. 2018. Disaster monitoring using unmanned aerial vehicles and deep learning. arXiv preprint arXiv:1807.11805 (2018).
    [80]
    Keiller Nogueira, Samuel G. Fadel, Ícaro C. Dourado, Rafael de O. Werneck, Javier A. V. Muñoz, Otávio A. B. Penatti, Rodrigo T. Calumby, Lin Tzy Li, Jefersson A. dos Santos, and Ricardo da S. Torres. 2017. Data-driven flood detection using neural networks. In Proceedings of the MediaEval 2017 Workshop (Sept. 13--15, 2017). Dublin, Ireland.
    [81]
    Zvi Kons and Orith Toledo-Ronen. 2013. Audio event classification using deep neural networks. In Interspeech. 1482--1486.
    [82]
    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097--1105.
    [83]
    Julian Kürby, Rene Grzeszick, Axel Plinge, and Gernot A. Fink. 2016. Bag-of-features acoustic event detection for sensor networks. In Proceedings on the Detection and Classification of Acoustic Scenes and Events Workshop (DCASE'16). 55--59.
    [84]
    Ying-Hui Lai, Chun-Hao Wang, Shi-Yan Hou, Bang-Yin Chen, Yu Tsao, and Yi-Wen Liu. 2016. DCASE report for task 3: Sound event detection in real life audio. IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events (2016).
    [85]
    Zhen-Zhong Lan, Lu Jiang, Shoou-I Yu, Shourabh Rawat, Yang Cai, Chenqiang Gao, Shicheng Xu, Haoquan Shen, Xuanchong Li, Yipei Wang, Waito Sze, Yan Yan, Zhigang Ma, Wei Tong, Yi Yang, Susanne Burger, Florian Metze, Rita Singh, Bhiksha Raj, Richard Stern, Teruko Mitamura, Eric Nyberg, and Alexander Hauptmann. 2013. CMU-informedia at TRECVID 2013 multimedia event detection. In TRECVID 2013 Workshop, Vol. 1. 5.
    [86]
    Donmoon Lee, Subin Lee, Yoonchang Han, and Kyogu Lee. 2017. Ensemble of Convolutional Neural Networks for Weakly-supervised Sound Event Detection Using Multiple Scale Input. Technical Report. Tech. Rep., DCASE2017 Challenge.
    [87]
    Li-Jia Li and Li Fei-Fei. 2007. What, where and who? Classifying events by scene and object recognition. In Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV’07). IEEE, 1--8.
    [88]
    Hyungui Lim, Jeongsoo Park, Kyogu Lee, and Yoonchang Han. 2017. Rare sound event detection using 1D convolutional recurrent neural networks. In Proceedings of the Detection and Classification of Acoustic Scenes and Events Workshop.
    [89]
    Min Lin, Qiang Chen, and Shuicheng Yan. 2013. Network in network. arXiv preprint arXiv:1312.4400 (2013).
    [90]
    Mengyi Liu, Xin Liu, Yan Li, Xilin Chen, Alexander G. Hauptmann, and Shiguang Shan. 2015. Exploiting feature hierarchies with convolutional neural networks for cultural event recognition. In Proceedings of the IEEE International Conference on Computer Vision Workshops. 32--37.
    [91]
    Xueliang Liu and Benoit Huet. 2013. Heterogeneous features and model selection for event-based media classification. In Proceedings of the 3rd ACM International Conference on Multimedia Retrieval. ACM, 151--158.
    [92]
    Ying Liu and Linzhi Wu. 2016. Geological disaster recognition on optical remote sensing images using deep learning. Procedia Computer Science 91 (2016), 566--575.
    [93]
    Xiang Long, Chuang Gan, Gerard de Melo, Xiao Liu, Yandong Li, Fu Li, and Shilei Wen. 2018. Multimodal keyless attention fusion for video classification. Thirty-Second AAAI Conference on Artificial Intelligence.
    [94]
    Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, and Shilei Wen. 2018. Attention clusters: Purely attention based local feature integration for video classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7834--7843.
    [95]
    Laura Lopez-Fuentes, Joost van de Weijer, Marc Bolanos, and Harald Skinnemoen. 2017. Multi-modal deep learning approach for flood detection. In Proceedings of the MediaEval 2017 Workshop (Sept. 13--15, 2017). Dublin, Ireland.
    [96]
    David G. Lowe. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, 2 (2004), 91--110.
    [97]
    Mathias Lux, Michael Riegler, Pål Halvorsen, Konstantin Pogorelov, and Nektarios Anagnostopoulos. 2016. LIRE: Open source visual information retrieval. In Proceedings of the 7th International Conference on Multimedia Systems. ACM, 30.
    [98]
    R. Mattivi, G. Boato, and F. G. B. De Natale. 2011. Event-based media organization and indexing. Infocommunications Journal 3, 3 (2011), 9--18.
    [99]
    Annamaria Mesaros, Toni Heittola, Aleksandr Diment, Benjamin Elizalde, Ankit Shah, Emmanuel Vincent, Bhiksha Raj, and Tuomas Virtanen. 2017. DCASE 2017 challenge setup: Tasks, datasets and baseline system. In Proceedings of the DCASE 2017-Workshop on Detection and Classification of Acoustic Scenes and Events.
    [100]
    Annamaria Mesaros, Toni Heittola, Antti Eronen, and Tuomas Virtanen. 2010. Acoustic event detection in real life recordings. In Proceedings of the 2010 18th European Signal Processing Conference. IEEE, 1267--1271.
    [101]
    Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. 2016. Metrics for polyphonic sound event detection. Applied Sciences 6, 6 (2016), 162.
    [102]
    Pascal Mettes, Dennis C. Koelma, and Cees G. M. Snoek. 2016. The imagenet shuffle: Reorganized pre-training for video event detection. In Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval. ACM, 175--182.
    [103]
    Matthias Meyer, Lukas Cavigelli, and Lothar Thiele. 2017. Efficient convolutional neural network for audio event detection. arXiv preprint arXiv:1709.09888 (2017).
    [104]
    Dao Minh-Son, Pham Quang-Nhat-Minh, and Dang-Nguyen Duc-Tien. 2017. A domain-based late-fusion for disaster image retrieval from social media. In Proc. of the MediaEval Workshop (Sept. 13--15, 2017). Dublin, Ireland.
    [105]
    Hanif Muhammad, Atif Muhammad, Khan Mahrukh, and Rafi Mohammad. 2017. Flood detection using social media data and spectral regression based kernel discriminant analysis. In Proceedings of the MediaEval 2017 Workshop (Sept. 13--15, 2017). Dublin, Ireland.
    [106]
    Milind Naphade, John R. Smith, Jelena Tesic, Shih-Fu Chang, Winston Hsu, Lyndon Kennedy, Alexander Hauptmann, and Jon Curtis. 2006. Large-scale concept ontology for multimedia. IEEE Multimedia 13, 3 (2006), 86--91.
    [107]
    Keiller Nogueira, Samuel G. Fadel, Ícaro C. Dourado, Rafael de O. Werneck, Javier A. V. Muñoz, Otávio A. B. Penatti, Rodrigo T. Calumby, Lin Tzy Li, Jefersson A. dos Santos, and Ricardo da S. Torres. 2017. Exploiting ConvNet diversity for flooding identification. arXiv preprint arXiv:1711.03564 (2017).
    [108]
    Dan Oneata, Matthijs Douze, Jérôme Revaud, Schwenninger Jochen, Danila Potapov, Heng Wang, Zaid Harchaoui, Jakob Verbeek, Cordelia Schmid, Robin Aly, Kevin Mcguiness, Shu Chen, Noel O'ConnorKen ChatfieldOmkar Parkhi, Relja Arandjelovic, Andrew Zisserman, Fernando Basura, and Tinne Tuytelaars. 2012. Axes at TRECVID 2012: KIS, INS, and MED. In TRECVID Workshop.
    [109]
    Paul Over, Jon Fiscus, Greg Sanders, David Joy, Martial Michel, George Awad, Alan Smeaton, Wessel Kraaij, and Georges Quénot. 2014. TRECVID 2014--An overview of the goals, tasks, data, evaluation mechanisms and metrics. In Proceedings of TRECVID. 52.
    [110]
    Symeon Papadopoulos, Raphael Troncy, Vasileios Mezaris, Benoit Huet, and Ioannis Kompatsiaris. 2011. Social event detection at MediaEval 2011: Challenges, dataset and evaluation. In Proceedings of MediaEval.
    [111]
    Giambattista Parascandolo, Heikki Huttunen, and Tuomas Virtanen. 2016. Recurrent neural networks for polyphonic sound event detection in real life recordings. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6440--6444.
    [112]
    Sungheon Park and Nojun Kwak. 2015. Cultural event recognition by subregion classification with convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 45--50.
    [113]
    Georgios Petkos, Symeon Papadopoulos, Vasileios Mezaris, Raphael Troncy, Philipp Cimiano, Timo Reuter, and Yiannis Kompatsiaris. 2014. Social event detection at MediaEval: A three-year retrospect of tasks and results. In Proceedings of the International Conference on Multimedia Retrieval Workshop on Social Events in Web Multimedia (SEWM).
    [114]
    Huy Phan, Lars Hertel, Marco Maass, and Alfred Mertins. 2016. Robust audio event recognition with 1-max pooling convolutional neural networks. arXiv preprint arXiv:1604.06338 (2016).
    [115]
    Karol J. Piczak. 2015. ESC: Dataset for environmental sound classification. In Proceedings of the 23rd ACM International Conference on Multimedia. ACM, 1015--1018.
    [116]
    Axel Plinge, Rene Grzeszick, and Gernot A, Fink. 2014. A bag-of-features approach to acoustic event detection. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 3704--3708.
    [117]
    Samira Pouyanfar and Shu-Ching Chen. 2016. Semantic event detection using ensemble deep learning. In Proceedings of the 2016 IEEE International Symposium on Multimedia (ISM). IEEE, 203--208.
    [118]
    Samira Pouyanfar and Shu-Ching Chen. 2017. Automatic video event detection for imbalance data using enhanced ensemble deep learning. International Journal of Semantic Computing 11, 1 (2017), 85--109.
    [119]
    Reza Fuad Rachmadi, Keiichi Uchimura, and Gou Koutaki. 2016. Combined convolutional neural network for event recognition. In Proceedings of the Korea-Japan Joint Workshop on Frontiers of Computer Vision. 85--90.
    [120]
    Timo Reuter, Symeon Papadopoulos, Giorgos Petkos, Vasileios Mezaris, Yiannis Kompatsiaris, Philipp Cimiano, Christopher de Vries, and Shlomo Geva. 2013. Social event detection at mediaeval 2013: Challenges, datasets, and evaluation. In Proceedings of the MediaEval Multimedia Benchmark Workshop Barcelona, Spain, October 18--19, 2013.
    [121]
    Jinyoung Rhee, Jungho Im, and Gregory J. Carbone. 2010. Monitoring agricultural drought for arid and humid regions using multi-sensor remote sensing data. Remote Sensing of Environment 114, 12 (2010), 2875--2887.
    [122]
    Seyed Morteza Safdarnejad, Xiaoming Liu, Lalita Udpa, Brooks Andrus, John Wood, and Dean Craven. 2015. Sports videos in the wild (SVW): A video dataset for sports analysis. In Proceedings of the 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG). Vol. 1. IEEE, 1--7.
    [123]
    Justin Salamon, Christopher Jacoby, and Juan Pablo Bello. 2014. A dataset and taxonomy for urban sound research. In Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 1041--1044.
    [124]
    Amaia Salvador, Matthias Zeppelzauer, Daniel Manchon-Vizuete, Andrea Calafell, and Xavier Giro-i Nieto. 2015. Cultural event recognition with visual convnets and temporal models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 36--44.
    [125]
    Emmanouil Schinas, Georgios Petkos, Symeon Papadopoulos, and Yiannis Kompatsiaris. 2012. CERTH@ MediaEval 2012 social event detection task. In MediaEval. Citeseer.
    [126]
    Yuhui Shi and Russell C. Eberhart. 1999. Empirical study of particle swarm optimization. In Proceedings of the 1999 Congress on Evolutionary Computation (CEC’99). Vol. 3. IEEE, 1945--1950.
    [127]
    Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
    [128]
    Bharat Singh, Xintong Han, Zhe Wu, Vlad I. Morariu, and Larry S. Davis. 2015. Selecting relevant web trained concepts for automated event retrieval. In Proceedings of the IEEE International Conference on Computer Vision. 4561--4569.
    [129]
    Waqas Sultani, Chen Chen, and Mubarak Shah. 2018. Real-world anomaly detection in surveillance videos. Center for Research in Computer Vision (CRCV). Technical Report. University of Central Florida (UCF).
    [130]
    Shuyang Sun, Zhanghui Kuang, Lu Sheng, Wanli Ouyang, and Wei Zhang. 2018. Optical flow guided feature: A fast and robust motion representation for video action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1390--1399.
    [131]
    Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
    [132]
    Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2818--2826.
    [133]
    Naoya Takahashi, Michael Gygli, Beat Pfister, and Luc Van Gool. 2016. Deep convolutional neural networks and data augmentation for acoustic event detection. arXiv preprint arXiv:1604.07160 (2016).
    [134]
    Naoya Takahashi, Michael Gygli, and Luc Van Gool. 2018. AENet: Learning deep audio features for video analysis. IEEE Transactions on Multimedia 20, 3 (2018), 513--524.
    [135]
    Planet Team. 2016. Planet application program interface: In Space for Life on Earth. San Francisco, CA.
    [136]
    Bart Thomee, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and Li-Jia Li. 2016. YFCC100M: The new data in multimedia research. Commun. ACM 59, 2 (2016), 64--73.
    [137]
    Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. 2015. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision. 4489--4497.
    [138]
    Shen-Fu Tsai, Thomas S. Huang, and Feng Tang. 2011. Album-based object-centric event recognition. In Proceedings of the 2011 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 1--6.
    [139]
    Christos Tzelepis, Zhigang Ma, Vasileios Mezaris, Bogdan Ionescu, Ioannis Kompatsiaris, Giulia Boato, Nicu Sebe, and Shuicheng Yan. 2016. Event-based media processing and analysis: A survey of the literature. Image and Vision Computing 53 (2016), 3--19.
    [140]
    Dmitrii Ubskii and Alexei Pugachev. 2016. Sound event detection in real-life audio. IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events (2016).
    [141]
    Jasper R. R. Uijlings, Koen E. A. Van De Sande, Theo Gevers, and Arnold W. M. Smeulders. 2013. Selective search for object recognition. International Journal of Computer Vision 104, 2 (2013), 154--171.
    [142]
    M. W. W. Van Grootel, Tjeerd C. Andringa, and J. D. Krijnders. 2009. DARES-G1: Database of annotated real-world everyday sounds. In Proceedings of the NAG/DAGA International Conference on Acoustics.
    [143]
    Heng Wang, Alexander Kläser, Cordelia Schmid, and Cheng-Lin Liu. 2011. Action recognition by dense trajectories. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 3169--3176.
    [144]
    Heng Wang and Cordelia Schmid. 2013. Action recognition with improved trajectories. In Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV). IEEE, 3551--3558.
    [145]
    Jun Wang and Jean-Daniel Zucker. 2000. Solving multiple-instance problem: A lazy learning approach. (2000).
    [146]
    Limin Wang, Zhe Wang, Sheng Guo, and Yu Qiao. 2015. Better exploiting OS-CNNs for better event recognition in images. In Proceedings of the IEEE International Conference on Computer Vision Workshops. 45--52.
    [147]
    Limin Wang, Zhe Wang, Yu Qiao, and Luc Van Gool. 2017. Transferring deep object and scene representations for event recognition in still images. International Journal of Computer Vision (2017), 1--20.
    [148]
    Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. 2018. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
    [149]
    Xiaolong Wang and Abhinav Gupta. 2018. Videos as space-time region graphs. arXiv preprint arXiv:1806.01810 (2018).
    [150]
    Xiaoyang Wang and Qiang Ji. 2015. Video event recognition with deep hierarchical context model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4418--4427.
    [151]
    Yun Wang and Florian Metze. 2016. Recurrent support vector machines for audio-based multimedia event detection. In Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval. ACM, 265--269.
    [152]
    Yun Wang and Florian Metze. 2017. A first attempt at polyphonic sound event detection using connectionist temporal classification. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2986--2990.
    [153]
    Yun Wang, Leonardo Neves, and Florian Metze. 2016. Audio-based multimedia event detection using deep recurrent neural networks. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2742--2746.
    [154]
    Xiu-Shen Wei, Bin-Bin Gao, and Jianxin Wu. 2015. Deep spatial pyramid ensemble for cultural event recognition. In Proceedings of the IEEE International Conference on Computer Vision Workshops. 38--44.
    [155]
    Sebastien C. Wong, Adam Gatt, Victor Stamatescu, and Mark D. McDonnell. 2016. Understanding data augmentation for classification: When to warp? arXiv preprint arXiv:1609.08764 (2016).
    [156]
    Zifeng Wu, Yongzhen Huang, and Liang Wang. 2015. Learning representative deep features for image set analysis. IEEE Transactions on Multimedia 17, 11 (2015), 1960--1968.
    [157]
    Yuanjun Xiong, Kai Zhu, Dahua Lin, and Xiaoou Tang. 2015. Recognize complex events from static images by fusing deep channels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1600--1609.
    [158]
    Dan Xu, Elisa Ricci, Yan Yan, Jingkuan Song, and Nicu Sebe. 2015. Learning deep representations of appearance and motion for anomalous event detection. arXiv preprint arXiv:1510.01553 (2015).
    [159]
    Zhongwen Xu, Yi Yang, and Alexander G. Hauptmann. 2015. A discriminative CNN video representation for event detection. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 1798--1807.
    [160]
    Ronald R. Yager and Dimitar P. Filev. 1999. Induced ordered weighted averaging operators. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 29, 2 (1999), 141--150.
    [161]
    Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, and Aaron Courville. 2015. Describing videos by exploiting temporal structure. In Proceedings of the IEEE International Conference on Computer Vision. 4507--4515.
    [162]
    Guangnan Ye, Yitong Li, Hongliang Xu, Dong Liu, and Shih-Fu Chang. 2015. EventNet: A large scale structured concept library for complex event detection in video. In Proceedings of the 23rd ACM International Conference on Multimedia. ACM, 471--480.
    [163]
    Serena Yeung, Olga Russakovsky, Ning Jin, Mykhaylo Andriluka, Greg Mori, and Li Fei-Fei. 2018. Every moment counts: Dense detailed labeling of actions in complex videos. International Journal of Computer Vision 126, 2--4 (2018), 375--389.
    [164]
    Litao Yu, Xiaoshuai Sun, and Zi Huang. 2016. Robust spatial-temporal deep model for multimedia event detection. Neurocomputing 213 (2016), 48--53.
    [165]
    Shoou-I Yu, Lu Jiang, Zexi Mao, Xiaojun Chang, Xingzhong Du, Chuang Gan, Zhenzhong Lan, Zhongwen Xu, Xuanchong Li, Yang Cai, Anurag Kumar, Yajie Miao, Lara Martin, Nikolas Wolfe, Shicheng Xu, Huan Li, Ming Lin, Zhigang Ma, Yi Yang, Deyu Meng, Shiguang Shan, Pinar Duygulu Sahin, Susanne Burger, Florian Metze, Rita Singh, Bhiksha Raj, Teruko Mitamura, Richard Stern, and Alexander Hauptmann. 2014. MER. In Proceedings of the NIST TRECVID Video Retrieval Evaluation Workshop, Vol. 24.
    [166]
    Joe Yue-Hei Ng, Fan Yang, and Larry S. Davis. 2015. Exploiting local features from deep networks for image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 53--61.
    [167]
    Shengxin Zha, Florian Luisier, Walter Andrews, Nitish Srivastava, and Ruslan Salakhutdinov. 2015. Exploiting image-trained CNN architectures for unconstrained video classification. arXiv preprint arXiv:1503.04144 (2015).
    [168]
    Dongqing Zhang and Dan Ellis. 2001. Detecting sound events in basketball video archive. Dept. Electronic Eng., Columbia Univ., New York (2001).
    [169]
    Xishan Zhang, Hanwang Zhang, Yongdong Zhang, Yang Yang, Meng Wang, Huanbo Luan, Jintao Li, and Tat-Seng Chua. 2016. Deep fusion of multiple semantic cues for complex event recognition. IEEE Transactions on Image Processing 25, 3 (2016), 1033--1046.
    [170]
    Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. 2014. Learning deep features for scene recognition using places database. In Advances in Neural Information Processing Systems. 487--495.
    [171]
    Linchao Zhu, Zhongwen Xu, Yi Yang, and Alexander G. Hauptmann. 2017. Uncovering the temporal context for video question answering. International Journal of Computer Vision 124, 3 (2017), 409--421.

      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 2
      May 2019
      375 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/3339884
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 05 June 2019
      Accepted: 01 January 2019
      Revised: 01 January 2019
      Received: 01 July 2018
      Published in TOMM Volume 15, Issue 2

      Author Tags

      1. Information retrieval
      2. audio event analysis
      3. deep features
      4. deep learning
      5. event detection
      6. natural disaster
      7. social events detection
      8. social media
      9. video analysis

      Qualifiers

      • Survey
      • Research
      • Refereed

      Cited By

      • (2024) From CNNs to Transformers in Multimodal Human Action Recognition: A Survey. ACM Transactions on Multimedia Computing, Communications, and Applications 20:8 (1-24). DOI: 10.1145/3664815. Online publication date: 13-May-2024.
      • (2024) WaRENet: A Novel Urban Waterlogging Risk Evaluation Network. ACM Transactions on Multimedia Computing, Communications, and Applications 20:7 (1-28). DOI: 10.1145/3651163. Online publication date: 5-Mar-2024.
      • (2024) Incomplete Multiview Clustering via Semidiscrete Optimal Transport for Multimedia Data Mining in IoT. ACM Transactions on Multimedia Computing, Communications, and Applications 20:6 (1-20). DOI: 10.1145/3625548. Online publication date: 8-Mar-2024.
      • (2023) Temporal Dynamic Concept Modeling Network for Explainable Video Event Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-22). DOI: 10.1145/3568312. Online publication date: 12-Jul-2023.
      • (2023) Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures. IEEE Transactions on Multimedia 25 (4573-4585). DOI: 10.1109/TMM.2022.3178591. Online publication date: 2023.
      • (2023) Recognizing British Sign Language Using Deep Learning: A Contactless and Privacy-Preserving Approach. IEEE Transactions on Computational Social Systems 10:4 (2090-2098). DOI: 10.1109/TCSS.2022.3210288. Online publication date: Aug-2023.
      • (2023) AI-Enabled IIoT for Live Smart City Event Monitoring. IEEE Internet of Things Journal 10:4 (2872-2880). DOI: 10.1109/JIOT.2021.3109435. Online publication date: 15-Feb-2023.
      • (2023) Performance Evaluation of CNN Models in Urban Acoustic Event Recognition Through MFCC Hyperparameter Search. 2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE) (186-193). DOI: 10.1109/CSCE60160.2023.00035. Online publication date: 24-Jul-2023.
      • (2023) Explainable event recognition. Multimedia Tools and Applications 82:26 (40531-40557). DOI: 10.1007/s11042-023-14832-0. Online publication date: 30-Mar-2023.
      • (2023) Role of Social Media Imagery in Disaster Informatics. International Handbook of Disaster Research (531-551). DOI: 10.1007/978-981-19-8388-7_170. Online publication date: 1-Oct-2023.