1. Introduction
Water is one of the most significant substances on Earth. A large amount of water is lost through the overflow of rivers, lakes, and streams. As a result, normally dry land receives an extensive flow of water, which causes flooding. Floods frequently cause loss of human life and severe destruction of valuable resources. An appropriate response to flooding requires a timely and accurate flow of information from the affected area to the responsible organizations. Flood response systems have evolved from manual record keeping to sensor-based monitoring of sensitive zones. However, complex sensor-based systems also require an enormous quantity of human and hardware resources to monitor and report flooding conditions [1].
Recently, the use of social media has increased greatly and has generated a large volume of valuable data. This data is key to resolving various challenging problems, specifically the monitoring of disastrous situations [2,3,4]. Disaster-related situations require timely information so that the damage can be minimized through appropriate measures from the responsible authorities. As it is very difficult to physically collect information from all flood-sensitive zones, social media data play a significant role in providing valuable and timely information. Different social networks, including Twitter and Flickr, allow their subscribers to upload and share text and images. This enormous quantity of data can be processed to extract useful information and obtain better solutions to various problems, including floods.
The use of social media data for flooding events has received growing attention owing to the MediaEval benchmark workshop [5,6,7]. The workshop aimed to encourage solutions based on social media and satellite data for different challenges of flooding events. MediaEval 2017 targeted the classification of flooding events with the help of text and images retrieved from social media [5]. MediaEval 2018 focused on the presence of roads and their passability status, using social media data [6]. MediaEval 2019 evaluated the severity of the flooding situation by predicting whether an image shows a person standing in water above knee level, and by predicting whether or not an online article is related to a flooding situation [7].
Deep learning-based approaches are widely used for flood classification using social media [4,8,9,10,11,12]. A significant effort has been invested in searching for better solutions using deep learning approaches. Ensemble techniques such as bagging are becoming increasingly significant, as they have frequently been shown to improve upon the generalization ability of a single deep learning model [3,4,11]. This paper describes our ensemble-based deep learning system VRBagged-Net, presented in MediaEval competitions for flood classification using social media. This system ranked first in the MediaEval 2020 flood classification task when evaluated independently by the organizers. The main novelty of this paper is the ensemble of several bagged Visual Geometry Group (VGG) and Residual Network (ResNet) models. Several deep learning models are trained using bagging, i.e., by sampling the training data with replacement. These models are later combined by aggregation in order to reduce the over-fitting and the error rate of the individual learners. In addition, the proposed ensemble-based system alleviates the inherent problem of class imbalance in some MediaEval tasks. The following are our main contributions:
A comprehensive literature survey of the state-of-the-art methods for the classification of different flooding events is provided in this paper. The discussion includes shallow learning- as well as deep learning-based methods for feature extraction and classification of flooding events.
A VRBagged-Net ensemble classification framework has been proposed for the successful classification of flooding events. Several accurate and diverse models were trained using VGG and ResNet-based deep learning models. These models were then combined using Majority Voting.
Experiments were conducted on several standard benchmarks for flood classification and compared with the state-of-the-art approaches.
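The bagging and majority-voting steps named in the contributions above can be illustrated with a minimal sketch. It is not the VRBagged-Net implementation: the VGG and ResNet learners are replaced by a hypothetical one-dimensional threshold classifier (`train_stump`) so that the example runs with the standard library only, and all function names and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement (one bag)."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Hypothetical stand-in for a deep model: a 1-D threshold classifier.

    Assumes class 1 tends to have larger feature values than class 0.
    The threshold is the midpoint between the two class means.
    """
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if pos and neg:
        threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    else:
        # Toy simplification: a bag with a single class falls back to
        # the mean feature value of the bag as its threshold.
        threshold = sum(x for x, _ in sample) / len(sample)
    return lambda x: 1 if x > threshold else 0


def train_bagged_ensemble(data, n_models, seed=0):
    """Train n_models learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def majority_vote(models, x):
    """Return the label predicted by the largest number of models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy data: (feature, label), label 1 = flood, label 0 = no flood.
toy_data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
ensemble = train_bagged_ensemble(toy_data, n_models=5, seed=0)
print(majority_vote(ensemble, 0.95), majority_vote(ensemble, 0.05))
```

An odd number of models avoids ties in a two-class vote. Because each learner sees a different resampled view of the training data, the aggregated vote is typically less sensitive to the errors of any single learner.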
This paper is organized as follows. Section 2 reviews existing work, followed by a description of the data sets in Section 4. Section 3 discusses the proposed methodology, with experiments and results in Section 5. Section 6 concludes the paper.
Ensemble-based deep learning models have provided useful outcomes in prediction tasks in different research fields, including cancer prediction, speech recognition, and crude oil price prediction. Stacking-based ensemble learning has been performed for speech recognition, in which various deep neural networks are stacked, including DNN, CNN, and RNN [13]. Researchers have combined the merits of the Stacked Denoising Autoencoder (SDAE) and Bootstrap Aggregation (Bagging) to formulate a method for the prediction of crude oil prices [14]. Another ensemble-based approach, which combines different machine learning and deep learning models, has been proposed for cancer prediction [15]. An ensemble-based framework, IBaggedFCNet, has been proposed for the detection of anomalies in videos; it utilizes Inception-v3 and a 3-layer Fully Connected (FC) neural network [16].
Social media data can also produce useful information for disaster response when combined with other data sources, including hydrological data. An effective multimodal neural network has been designed by combining the text of tweets with hydrological information, based on the timestamp and location mentioned in the tweets [17]. Another research effort has proposed a method for improving situation awareness during disasters by combining social media data with hydrological and sensor-based data [18]. A further study has combined crowdsourced photos and volunteered geographic data to produce an effective method for the estimation of flooding events and the identification of their affected regions [19].
2. Literature Review
In recent times, social media data has been used to identify clues of disastrous events. This section discusses methods used for the classification of various flooding events. The first part covers the literature on the use of social media data to detect the presence or absence of flooding in images. The second part covers the literature on identifying passable roads in images, and the final part discusses methods used to determine, from their accompanying images, whether articles are on flood-related topics.
At the MediaEval 2017 workshop, a specific task was designed to combine social media text and images, along with satellite images, to identify flooding events for emergency response [5]. The MediaEval 2017 subtask named Disaster Image Retrieval from Social Media (DIRSM) [5] provided a dataset of images, along with their associated text, taken from different social media networks. The participants of the workshop used diverse approaches to solve the challenge. Researchers [8] used a Deep Convolutional Neural Network (DCNN) to perform binary image classification on the basis of the presence or absence of flooding. Features were extracted from images with the help of GoogleNet [9], pretrained on Places205, and the extracted features were then merged with conventional features, which include AutoColor Correlation (AC), Edge Histogram (EH), and Tamura. Finally, a Support Vector Machine (SVM) classifier was used to perform binary classification. Another team of researchers [9] used the AlexNet model, pretrained on the Places and ImageNet [20] datasets, for feature extraction, and an SVM for the classification of images [9].
Many other researchers have applied different state-of-the-art methods to the DIRSM dataset [5]. Researchers [2] used Spectral Regression in combination with Kernel Discriminant Analysis (SRKDA) over an ensemble of conventional features to predict confidence scores for binary image classification on the DIRSM dataset. The MultiBrasil team [21] utilized GoogleNet [22], pretrained on the ImageNet dataset, and performed binary classification on the DIRSM dataset. In another research effort [23], X-ResNet [24], pretrained on DeepSentiBank [25], was used along with an SVM to conduct binary classification of the images of the DIRSM dataset. X-ResNet [24] is an extension of ResNet [26].
A prompt response to disastrous events depends heavily upon the availability of passable roads, particularly in a flooding situation. Research in this direction was initiated by MediaEval 2018 [6], which aimed to find evidence of passable roads. Each instance of the dataset comprises a tweet with both text and images. Classification was performed using text, images, and the combination of both [6]. Researchers have used text to find the status of roads, but the literature shows that the text-based part of the dataset did not produce any significant outcome that could help in finding either the evidence or the status of passable roads [3,10,11,27]. However, another dimension of research has been explored by using the images given in tweets. Research efforts have utilized various feature extraction and classification techniques to find the status of roads. Different pretrained networks, including VGG [28], DenseNet201 [29], Inception V3 [30], and ResNet50 [26], have been used in a variety of methods and have obtained promising results [3,10,11,27]. It has been observed that, in comparison to text, visual data have provided significantly better outcomes.
The response system for a flooding event greatly depends on knowledge of the severity of the situation. The MediaEval Benchmark Workshop 2019 released the dataset for “Multimodal Flood Level Estimation” [7]. The dataset consists of images related to flooding disasters. It was designed to support image classification on the basis of whether or not the image shows a standing person with the water level above their knees [7]. Researchers have utilized different techniques for the detection of such a person and of a water level above their knees, including a 22-layer GoogleNet [22], a five-fold cross-validation approach with VGG16 and Inception V3, and a combination of the Faster-RCNN, VGG16, and ResNet50 architectures [4,12,31]. In another task of the MediaEval Workshop 2019, articles were analyzed through their images and classified on the basis of whether or not the topic of the article is related to a flooding event [7]. The task is named “Image-based News Topic Disambiguation” [7]. Prominent research efforts have focused on pretrained networks for feature extraction and classification. These efforts include a cross-validation-based approach with VGG16 and Inception V3 pretrained networks, and an ensemble method using VGG16 pretrained on the ImageNet and Places365 datasets. State-of-the-art results have been evaluated using the F1 measure, and it has been shown that deep learning-based methods have produced highly successful outcomes [4,12,31].
Different ensemble-based approaches have been used in disaster response systems that exploit social media data. The majority of the approaches have utilized the weights of different pretrained Convolutional Neural Networks (CNN) together with various ensemble-based methods. Major challenges faced by the researchers include class imbalance and the use of pretrained weights from the ImageNet dataset, which primarily focuses on object-level information rather than scene-level information.
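The F1 measure referred to above is the harmonic mean of precision and recall, which makes it more informative than accuracy when classes are imbalanced. The following minimal sketch computes it for binary labels; the function names and toy labels are illustrative assumptions, and the exact averaging scheme used by each MediaEval task is not specified here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score_binary(y_true, y_pred, positive=1):
    """F1 for the positive class: 2 * P * R / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is reported as 0.0 by convention in this sketch.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Imbalanced toy labels: one flood image among ten.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_always_negative = [0] * 10
print(accuracy(y_true, y_always_negative))         # 0.9
print(f1_score_binary(y_true, y_always_negative))  # 0.0
```

In this toy case a classifier that never predicts the flood class reaches an accuracy of 0.9 but an F1 of 0.0, which is why F1 is the more suitable measure for imbalanced flood datasets.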