Report For Face Mask Detection Using Python and Deep Learning
on
BACHELOR OF TECHNOLOGY
in
Electronics and Communication Engineering [ECE]
by
J. TANVI [Roll No.18311A04L0]
K. SREEJA [Roll No.18311A04L2]
K. LIKHITHA [Roll No.18311A04L8]
SREENIDHI INSTITUTE OF SCIENCE & TECHNOLOGY
CERTIFICATE
This is to certify that the Project Work entitled “FACE MASK DETECTION USING
DECLARATION
We hereby declare that the work described in this report, entitled “FACE MASK
(Telangana) -500085 is the result of investigations carried out by us under the Guidance of
Technology, Hyderabad. The work is original and has not been submitted for any
Place: Hyderabad
ACKNOWLEDGEMENT
We hereby declare that the work described in the project report entitled “FACE MASK
DETECTION USING PYTHON AND DEEP LEARNING”, which is being submitted by
us in partial fulfilment of the requirements for the award of Bachelor of Technology in the
Dept. of Electronics & Communication Engineering, Sreenidhi Institute of Science &
Technology, affiliated to Jawaharlal Nehru Technological University Hyderabad, Kukatpally,
Hyderabad (Telangana), is the result of our own effort and has not been submitted elsewhere.
We are very thankful to Mr. P. VIKRAM, ECE Dept., Sreenidhi Institute of Science and
Technology, Ghatkesar, for providing the necessary guidance for this group project and for
his valuable and timely suggestions on the work.
We are very thankful to Dr. S. RAMANI, ECE Dept., Sreenidhi Institute of Science and
Technology, Ghatkesar, for providing the initiative for this group project and for valuable
and timely suggestions on the work.
We convey our sincere thanks to Dr. S.P.V. SUBBA RAO, Head of the Department (ECE),
Sreenidhi Institute of Science and Technology, Ghatkesar, for his kind cooperation in the
completion of this work.
We also convey our sincere thanks to Dr. CHAKKALAKAL TOMY, Executive Director,
and Dr. T. CH. SHIVA REDDY, Principal, Sreenidhi Institute of Science and Technology,
Ghatkesar, for their kind cooperation in the completion of the group project.
Finally, we extend our gratitude to all our friends and to the teaching and non-teaching staff
who directly or indirectly helped us in this endeavour.
ABSTRACT
Effective strategies to contain the COVID-19 pandemic need urgent attention in order to
mitigate its negative impact on public health and the global economy, the full extent of which
has yet to unfold. In the absence of effective antivirals and with limited medical resources,
WHO recommends many measures to control the infection rate and avoid exhausting those
resources. Wearing a mask is among the non-pharmaceutical interventions that can cut off the
primary source of SARS-CoV-2 transmission: droplets expelled by an infected individual.
Regardless of the debate on medical resources and the variety of masks, many countries are
mandating coverings over the nose and mouth in public. To contribute to public health, this
paper aims to devise a highly accurate, real-time technique that can efficiently detect
non-masked faces in public and thus help enforce mask wearing. The proposed technique is
an ensemble of one-stage and two-stage detectors, intended to achieve low inference time and
high accuracy. We start with ResNet50 as a baseline and apply transfer learning to fuse
high-level semantic information in multiple feature maps. In addition, we propose a bounding
box transformation to improve localization performance during mask detection. The
experiment is conducted with three popular baseline models, viz. ResNet50, AlexNet and
MobileNet. We explored the possibility of plugging these models into the proposed model so
that highly accurate results can be achieved in less inference time. It is observed that the
proposed technique achieves high accuracy (98.2%) when implemented with ResNet50.
Moreover, the proposed model yields 11.07% and 6.44% higher precision and recall,
respectively, in mask detection when compared to the recent public baseline model published
as the RetinaFaceMask detector. This performance makes the proposed model well suited to
video surveillance devices.
CONTENTS
CHAPTER 1: INTRODUCTION
1.1 Motivation
1.2 Flow
1.3 Image processing
INTRODUCTION
COVID-19, the disease caused by a coronavirus, has created an atmosphere of fear because it
is transmitted through the respiratory system. Currently, there is neither a medicine nor a
vaccine to fight this virus. Therefore, the only options people have are to maintain social
distancing, wash their hands regularly, and wear a mask. According to the World Health
Organization (WHO)’s official Situation Report – 205, coronavirus disease 2019 (COVID-19)
has infected over 20 million people globally, causing over 0.7 million deaths. Individuals
with COVID-19 have reported a wide range of symptoms, such as shortness of breath or
difficulty in breathing. Elderly people with lung disease are at higher risk from the
coronavirus than most. The importance of wearing masks lies in reducing the risk posed by an
infected individual during the “pre-symptomatic” period, thereby restraining the spread of the
virus. WHO stresses prioritizing medical masks and respirators for health care workers.
Therefore, face mask detection has become a crucial task in the present situation. Face mask
detection involves locating the face and then determining whether it has a mask on it or not.
The problem is closely related to general object detection, which detects classes of objects.
Face identification deals with distinguishing one specific class of entities, i.e., faces. It has
numerous applications, such as autonomous driving, education, and surveillance. In this
work, deep learning, in the form of a convolutional neural network (CNN), is used to find out
who is not wearing a face mask.
RELATED WORK
In face detection, a face is detected from an image that has several attributes in it.
According to earlier work, research into face detection requires expression recognition, face
tracking, and pose estimation. Given a single image, the challenge is to identify the face in
the picture. Face detection is a difficult task because faces vary in size, shape, colour, etc.,
and they are not immutable. It becomes even harder when the face is occluded by some other
object, is not facing the camera, and so forth. Previous authors identify two major challenges
in occluded face detection: first, the unavailability of sufficiently large datasets containing
both masked and unmasked faces; second, the absence of facial expression in the covered
area. Using the locally linear embedding (LLE) algorithm and dictionaries trained on a very
large pool of masked faces and synthesized normal faces, several lost expressions can be
recovered and the dominance of facial cues can be mitigated to a great extent. According to
other reported work, convolutional neural networks (CNNs) in computer vision come with a
strict constraint regarding the size of the input image. The prevalent practice is to resize the
images before feeding them into the network to overcome this restriction. Elsewhere, a robust
and efficient technique for liveness detection was proposed; the authors used the deep
learning Deb Net approach for feature extraction and classification. In another study, the
authors used an SVM to build a machine learning based face detection and recognition
system. That model was used to detect the faces of students in order to monitor their
activities during online examinations, and it used feature vectors from the input images to
detect faces faster. A multi-task deep learning method called F-DR Net has also been used
for recognizing and detecting faces.
Image Processing
BACKGROUND OR LITERATURE REVIEW
Two datasets have been used in the model. Dataset 1 consists of 1376 images, of which 690
show people wearing face masks and the remaining 686 show people without face masks. It
mostly contains frontal poses with a single face and the same type and colour of mask (white
only). Dataset 2, from Kaggle, consists of 853 images whose faces are annotated as either
with or without a mask. Some of its faces are turned, tilted or slanted, with multiple faces in
the frame and different types and colours of masks.
INCORPORATED PACKAGES
TensorFlow
TensorFlow, an interface for expressing machine learning (ML) algorithms, is used to
implement ML systems in various areas of computer science, including sentiment analysis,
voice recognition, geographic information extraction, computer vision, text summarization,
information retrieval, computational drug discovery and flaw detection. In the proposed
model, the whole Sequential CNN architecture (which consists of several layers) uses
TensorFlow as its backend. It is also used to reshape the data during data processing.
Keras
Keras provides fundamental abstractions and building blocks for creating and shipping ML
solutions with high iteration velocity. It takes full advantage of the scalability and
cross-platform capabilities of TensorFlow. The core data structures of Keras are layers and
models. All the layers used in the CNN model are implemented using Keras. Along with the
conversion of the class vector to the binary class matrix in data processing, it helps to
compile the overall model.
OpenCV
OpenCV (Open Source Computer Vision Library), an open-source computer vision and ML
software library, is used to detect and recognize faces, recognize objects, classify movements
in videos, track moving objects, follow eye movements, track camera motion, remove red
eyes from pictures taken with flash, find similar pictures in an image database, recognize
scenery and set up markers to overlay it with augmented reality, and so forth. The proposed
method uses these features of OpenCV for resizing and colour conversion of the data images.
BLOCK DIAGRAM
OpenCV (Open-Source Computer Vision Library)
It is an open-source computer vision and machine learning software library.
● The library has more than 2500 optimized algorithms.
● It has C++, Python, Java and MATLAB interfaces and supports Windows, Linux, Android
and macOS.
● It helps us load images in Python and convert them into arrays.
Features of OpenCV
● Face Detection
● Geometric Transformations
● Image Thresholding
● Smoothing Images
● Canny Edge Detection
● Background Removal
● Image Segmentation
FIGURE 3: Features for face detection
FIGURE 4: An example for face detection
Data Pre-Processing
Data pre-processing involves converting data from a given format into a more user-friendly,
desired and meaningful format. The data can be in any form, such as tables, images, videos
or graphs. This organized information fits an information model and captures the
relationships between different entities [6]. The proposed method deals with image and
video data using NumPy and OpenCV.
IMAGE RESHAPING
The input to image classification is a three-dimensional tensor in which each channel holds
its own pixel values. All the images must have the same size, corresponding to the 3D
feature tensor. However, images are not usually the same size, and neither are their
corresponding feature tensors [10]. Most CNNs can only accept images of a fixed size. This
introduces several problems throughout data collection and implementation of the model,
which can be solved by resizing the input images before feeding them into the network [11].
The images are normalized so that pixel values lie between 0 and 1. Then they are converted
to 4-dimensional arrays using data = np.reshape(data, (data.shape[0], img_size, img_size, 1)),
where 1 indicates a grayscale image. Since the final layer of the neural network has two
outputs, with mask and without mask (i.e., a categorical representation), the labels are
converted to categorical form.
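The normalization, reshaping and label conversion described above can be sketched with NumPy alone. This is a minimal illustration, not the report's actual script: the random array stands in for the loaded grayscale images, img_size = 100 is an assumed value (the report does not state it), and the one-hot step uses np.eye as a stand-in for the Keras class-vector conversion.

```python
import numpy as np

img_size = 100  # assumed value; the report does not state the image size

# Stand-in for the loaded grayscale dataset: 4 images with 8-bit pixel values.
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(4, img_size, img_size), dtype=np.uint8)
target = np.array([0, 1, 1, 0])  # 0 = with mask, 1 = without mask

# Normalize pixel values from [0, 255] to [0, 1].
data = data.astype("float32") / 255.0

# Add the trailing channel axis: (N, H, W) -> (N, H, W, 1) for grayscale.
data = np.reshape(data, (data.shape[0], img_size, img_size, 1))

# One-hot encode the labels so they match the two-output final layer.
new_target = np.eye(2, dtype="float32")[target]

print(data.shape)
print(new_target)
```

The resulting array has one image per row along the first axis, which is the layout a Keras convolutional input layer expects.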
TRAINING OF MODEL
Data Mapping
Data visualization is the process of transforming abstract data into meaningful
representations, using encodings for knowledge communication and insight discovery. It is
helpful for studying particular patterns in the dataset [7]. The total number of images in the
dataset is visualized in both categories, ‘with mask’ and ‘without mask’. The statement
categories = os.listdir(data_path) lists the directories in the specified data path. The variable
categories now looks like: [‘with mask’, ‘without mask’]. Then, to find the number of labels,
we need to distinguish those categories using labels = [i for i in range(len(categories))]. It
sets the labels as: [0, 1]. Now, each category is mapped to its respective label using
label_dict = dict(zip(categories, labels)), where zip first returns an iterator of tuples in which
the items from each passed iterable are paired together in order. The mapped variable
label_dict looks like: {‘with mask’: 0, ‘without mask’: 1}.
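The mapping steps can be tried end to end with a throwaway directory. The sketch below assumes a dataset folder containing one sub-folder per category; the temporary directory and the sorted() call are additions for illustration (os.listdir does not guarantee an order, so sorting makes the label assignment reproducible).

```python
import os
import tempfile

# Build a temporary stand-in for the dataset folder: one sub-folder per class.
with tempfile.TemporaryDirectory() as data_path:
    for name in ("with mask", "without mask"):
        os.mkdir(os.path.join(data_path, name))

    # List the category folders; sorting keeps the label order deterministic.
    categories = sorted(os.listdir(data_path))
    # One integer label per category.
    labels = [i for i in range(len(categories))]
    # Pair each category name with its label.
    label_dict = dict(zip(categories, labels))

print(label_dict)  # {'with mask': 0, 'without mask': 1}
```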
Splitting the data and training the CNN model
After setting the blueprint to analyse the data, the model needs to be trained using a specific
dataset and then tested against a different dataset. A proper model and an optimized
train-test split help to produce accurate results when making a prediction. The test size is set
to 0.1, i.e., 90% of the dataset is used for training and the remaining 10% for testing. The
validation loss is monitored using ModelCheckpoint. Next, the images in the training set and
the test set are fitted to the Sequential model. Here, 20% of the training data is used as
validation data. The model is trained for 20 epochs (iterations), which maintains a trade-off
between accuracy and the chance of overfitting. Fig. 4 depicts a visual representation of the
proposed model.
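To make the proportions concrete, the sketch below splits sample indices 90/10 and then holds out 20% of the training part for validation. It is a NumPy-only illustration under stated assumptions: split_indices is a hypothetical helper, the rounding rule is a choice made here, and a library splitter such as scikit-learn's train_test_split may round differently.

```python
import numpy as np

def split_indices(n_samples, test_size=0.1, val_size=0.2, seed=0):
    """Shuffle sample indices and split them into train, validation and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)

    # Hold out the test fraction first.
    n_test = int(round(n_samples * test_size))
    test_idx, train_val = order[:n_test], order[n_test:]

    # Then hold out a validation fraction of what remains.
    n_val = int(round(len(train_val) * val_size))
    val_idx, train_idx = train_val[:n_val], train_val[n_val:]
    return train_idx, val_idx, test_idx

# 1376 is the size of Dataset 1 in this report.
train_idx, val_idx, test_idx = split_indices(1376)
print(len(train_idx), len(val_idx), len(test_idx))  # 990 248 138
```

With these fractions only about 72% of the images are used to update the weights; the rest are reserved for checkpoint selection and final testing.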
shape - pool size + 1) / strides), where strides has the default value (1, 1). As shown in
Fig. 5, the second Convolution layer has 100 filters and its kernel size is set to 3 x 3. It is
followed by ReLU and MaxPooling layers. To feed the data into the fully connected part of
the CNN, the output of the convolutional layers is passed through a Flatten layer, which
transforms the matrix of features into a vector that can be fed into a fully connected neural
network classifier. To reduce overfitting, a Dropout layer with a 50% chance of setting inputs
to zero is added to the model. Then a Dense layer of 64 neurons with a ReLU activation
function is added. The final layer, with two outputs for the two categories, uses the Softmax
activation function.
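How the feature map shrinks before the Flatten layer can be traced with a few lines of arithmetic. The sketch below makes several assumptions: a 150 x 150 input (the size used by the training script later in this report), unpadded 3 x 3 convolutions with stride 1, 2 x 2 max pooling with stride 2, and 100 filters in the last convolution; out_size is a hypothetical helper.

```python
def out_size(size, window, stride=1):
    """Output length of an unpadded ('valid') convolution or pooling window."""
    return (size - window) // stride + 1

size = 150                     # assumed input height and width
size = out_size(size, 3)       # first 3x3 convolution  -> 148
size = out_size(size, 2, 2)    # first 2x2 max pooling  -> 74
size = out_size(size, 3)       # second 3x3 convolution -> 72
size = out_size(size, 2, 2)    # second 2x2 max pooling -> 36

filters = 100                  # filters in the second convolution layer
flat_length = size * size * filters
print(size, flat_length)       # 36 129600
```

The flattened vector of 129600 values is what the Dropout and Dense layers receive under these assumptions.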
The learning process needs to be configured first with the compile method. Here the “adam”
optimizer is used, and categorical cross-entropy, which is also known as multiclass log loss,
is used as the loss function (the objective that the model tries to minimize). As the problem
is a classification problem, the metric is set to “accuracy”.
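For readers unfamiliar with the loss, the following NumPy sketch computes softmax probabilities, categorical cross-entropy and accuracy for two made-up samples. The logits are invented for illustration, and the functions are simplified stand-ins rather than the Keras implementations.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -sum(y_true * log(y_prob)) over the samples."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=-1)))

# Two made-up samples: raw network outputs and their one-hot labels.
logits = np.array([[2.0, 0.0], [0.5, 1.5]])
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])

probs = softmax(logits)
loss = categorical_cross_entropy(y_true, probs)
accuracy = float(np.mean(np.argmax(probs, axis=1) == np.argmax(y_true, axis=1)))
print(round(loss, 4), accuracy)
```

The loss falls towards zero as the probability assigned to the correct class approaches 1, which is the quantity the optimizer minimizes during training.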
MACHINE LEARNING
Machine learning is a method of data analysis that automates analytical model building. It is a
branch of artificial intelligence based on the idea that systems can learn from data, identify
patterns and make decisions with minimal human intervention.
The types of machine learning algorithms are mainly divided into four categories:
● Supervised learning
● Unsupervised learning
● Semi-supervised learning
● Reinforcement learning
FIGURE 6: Algorithms for machine learning
SCIKIT-LEARN
Scikit-learn is a widely used and robust library for machine learning in Python. It features
various algorithms such as support vector machines, random forests, and k-nearest
neighbours, and it interoperates with Python numerical and scientific libraries such as NumPy
and SciPy.
FIGURE 7: An output with mask
FIGURE 9: An output without mask
Let’s dive into the code for the face mask detector project.
We are going to build this project in two parts. In the first part, we will write a Python
script using Keras to train the face mask detector model. In the second part, we will test the
results on a real-time webcam feed using OpenCV.
Create a Python file train.py to hold the code for training the neural network on our dataset.
Follow the steps:
1. Imports:
# Sequential and ImageDataGenerator are needed by the steps that follow.
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Conv2D, Input, ZeroPadding2D, BatchNormalization, Activation, MaxPooling2D, Flatten, Dense, Dropout
from keras.models import Model, Sequential, load_model
from keras.callbacks import TensorBoard, ModelCheckpoint
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.utils import shuffle
import imutils
import numpy as np
This convolutional network consists of two pairs of Conv and MaxPool layers to extract
features from the dataset. They are followed by a Flatten layer, which converts the data to
1D, and a Dropout layer, which reduces overfitting.
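model = Sequential([
    # Two convolution + pooling stages extract image features.
    Conv2D(100, (3,3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D(2,2),
    Conv2D(100, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    # Flatten to a vector, then drop half of the inputs at random during training.
    Flatten(),
    Dropout(0.5),
    Dense(50, activation='relu'),
    # Two outputs: one probability per class.
    Dense(2, activation='softmax')
])
# The generators below yield one-hot labels, so categorical cross-entropy
# is paired with the two-unit softmax output.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])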
TRAINING_DIR = "./train"
# Rescale pixels to [0, 1] and augment the training images with random
# rotations, shifts, shears, zooms and horizontal flips.
train_datagen = ImageDataGenerator(rescale=1.0/255,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   fill_mode='nearest')
train_generator = train_datagen.flow_from_directory(TRAINING_DIR,
                                                    batch_size=10,
                                                    target_size=(150, 150))
VALIDATION_DIR = "./test"
# Validation images are only rescaled, never augmented.
validation_datagen = ImageDataGenerator(rescale=1.0/255)
validation_generator = validation_datagen.flow_from_directory(VALIDATION_DIR,
                                                              batch_size=10,
                                                              target_size=(150, 150))
4. Initialize a checkpoint callback that saves the model after an epoch whenever the
validation loss improves:
# The .h5 file name matches the pattern loaded by the test script.
checkpoint = ModelCheckpoint('model-{epoch:03d}.h5', monitor='val_loss', verbose=0, save_best_only=True, mode='auto')
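The effect of saving only the best model can be illustrated without Keras. In this sketch best_only_checkpoints is a hypothetical helper, the validation losses are made-up numbers, and the file name template is an example rather than the one used in training; the helper returns the file names that would be written when a file is saved only after the monitored loss improves.

```python
def best_only_checkpoints(val_losses, template):
    """Return the file names written when saving only on a new best validation loss."""
    best = float("inf")
    saved = []
    # Epochs are numbered from 1 when the file name is formatted.
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            saved.append(template.format(epoch=epoch))
    return saved

val_losses = [0.60, 0.45, 0.50, 0.40, 0.42]  # made-up values for illustration
saved = best_only_checkpoints(val_losses, "ckpt-{epoch:03d}.h5")
print(saved)  # ['ckpt-001.h5', 'ckpt-002.h5', 'ckpt-004.h5']
```

One consequence is that a file for the final epoch exists only if that epoch produced the lowest validation loss so far, so the file loaded for testing should be one that was actually written.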
5. Train the model:
# Recent Keras releases accept generators in fit(); older releases used fit_generator().
history = model.fit(train_generator,
                    epochs=10,
                    validation_data=validation_generator,
                    callbacks=[checkpoint])
Now we will test the results of the face mask detector model using OpenCV.
import cv2
import numpy as np
from keras.models import load_model

# Load a checkpoint written during training.
model = load_model("./model-010.h5")

# Class index -> label text; this must match train_generator.class_indices.
results = {0: 'without mask', 1: 'mask'}
# Class index -> box colour in BGR (red = without mask, green = mask).
GR_dict = {0: (0, 0, 255), 1: (0, 255, 0)}

rect_size = 4  # faces are detected on a frame downscaled by this factor
cap = cv2.VideoCapture(0)

# Haar cascade file shipped with the opencv-python package.
haarcascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

while True:
    (rval, im) = cap.read()
    if not rval:
        break  # no frame could be read from the camera
    im = cv2.flip(im, 1)  # mirror the frame horizontally

    # Run the face detector on a smaller copy of the frame for speed.
    rerect_size = cv2.resize(im, (im.shape[1] // rect_size, im.shape[0] // rect_size))
    faces = haarcascade.detectMultiScale(rerect_size)

    for f in faces:
        # Scale the box back up to full-resolution coordinates.
        (x, y, w, h) = [v * rect_size for v in f]
        face_img = im[y:y+h, x:x+w]
        # OpenCV frames are BGR; the training generator loaded RGB images.
        face_img = cv2.cvtColor(face_img, cv2.COLOR_BGR2RGB)
        rerect_sized = cv2.resize(face_img, (150, 150))
        normalized = rerect_sized / 255.0
        reshaped = np.reshape(normalized, (1, 150, 150, 3))
        result = model.predict(reshaped)
        label = np.argmax(result, axis=1)[0]

        # Draw the face box, a filled label bar above it, and the label text.
        cv2.rectangle(im, (x, y), (x+w, y+h), GR_dict[label], 2)
        cv2.rectangle(im, (x, y-40), (x+w, y), GR_dict[label], -1)
        cv2.putText(im, results[label], (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)

    cv2.imshow('LIVE', im)
    key = cv2.waitKey(10)
    if key == 27:  # Esc key stops the loop
        break

cap.release()
cv2.destroyAllWindows()
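The script above detects faces on a downscaled copy of each frame and then maps the boxes back to the full-resolution frame. That coordinate mapping can be checked in isolation with the short sketch below; scale_box and clamp_box are hypothetical helpers, and the clamping step is an addition that the script itself does not perform.

```python
def scale_box(box, factor):
    """Map an (x, y, w, h) box from the downscaled frame to the original frame."""
    x, y, w, h = box
    return (x * factor, y * factor, w * factor, h * factor)

def clamp_box(box, frame_w, frame_h):
    """Trim a box so that it stays inside a frame of the given size."""
    x, y, w, h = box
    x, y = max(0, x), max(0, y)
    w = min(w, frame_w - x)
    h = min(h, frame_h - y)
    return (x, y, w, h)

# A face found at (40, 30) with size 25x25 in a frame downscaled by a factor of 4.
full_box = scale_box((40, 30, 25, 25), 4)
print(full_box)                        # (160, 120, 100, 100)
print(clamp_box(full_box, 640, 480))   # (160, 120, 100, 100)
```

Detecting on the smaller frame reduces the work done by the cascade, at the cost of missing faces that become too small after downscaling.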
FIGURE 10: A sample video with and without mask
The system can efficiently detect faces that are partially occluded (by a mask, hair or a
hand). Based on the degree of occlusion of four regions (nose, mouth, chin and eyes), it
differentiates between an annotated mask and a face covered by a hand. Therefore, only a
mask that fully covers the face, including the nose and chin, will be treated as “with mask”
by the model.
The main challenges faced by the method are varying angles and lack of clarity. The
movement of indistinct faces in the video stream makes detection more difficult. However,
following the trajectories across several frames of the video helps to reach a better decision:
“with mask” or “without mask”.
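One simple way to use several frames for a single decision is a majority vote over the most recent per-frame predictions. The sketch below is only an illustration of that idea, not the method used in this report: LabelSmoother is a hypothetical class, the window of 5 frames is an arbitrary choice, and it assumes a single tracked face.

```python
from collections import Counter, deque

class LabelSmoother:
    """Majority vote over the most recent per-frame labels of one face."""

    def __init__(self, window=5):
        # Only the last `window` labels are kept.
        self.history = deque(maxlen=window)

    def update(self, label):
        """Record the newest label and return the most common recent label."""
        self.history.append(label)
        return Counter(self.history).most_common(1)[0][0]

# Per-frame predictions with one spurious 0 before the label really changes.
frame_labels = [1, 1, 0, 1, 1, 0, 0, 0]
smoother = LabelSmoother(window=5)
smoothed = [smoother.update(label) for label in frame_labels]
print(smoothed)  # [1, 1, 1, 1, 1, 1, 0, 0]
```

The isolated misprediction in the third frame is suppressed, while a sustained change of label is followed after a short delay.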
BENEFITS
Manual monitoring makes it very difficult for officers to check whether people are wearing
masks or not. So, in our technique, we use a webcam to detect people’s faces and check for
masks, helping to prevent virus transmission.
FUTURE SCOPE
In this work, a deep learning-based approach for detecting masks over faces in public places
to curtail the community spread of coronavirus is presented. The proposed technique
efficiently handles occlusions in dense situations by making use of an ensemble of
single-stage and two-stage detectors at the pre-processing level.
The ensemble approach not only helps in achieving high accuracy but also improves
detection speed considerably. Furthermore, the application of transfer learning on pre-trained
models, with extensive experimentation over an unbiased dataset, resulted in a highly robust
and low-cost system. Identifying the faces of those violating the mask norms further
increases the utility of the system for public benefit.
Finally, the work opens interesting future directions for researchers. Firstly, the proposed
technique can be integrated into any high-resolution video surveillance device and is not
limited to mask detection only. Secondly, the model can be extended to detect facial
landmarks with a face mask for biometric purposes.
CONCLUSION
Wearing a face mask all the time is a difficult and exhausting task, but it has been obligatory
since the Covid-19 crisis began because face masks can help control the spread of the virus.
Many public service providers ask customers to wear masks in order to receive their services.
In this paper, we first briefly explained the motivation for the work. Then, we illustrated the
learning and performance tasks of the model. Using basic ML tools and simplified
techniques, the method has achieved reasonably high accuracy. In future, the model can be
extended to detect whether a person is wearing the mask properly (as instructed by WHO)
and also to detect the type of mask.
REFERENCES
1. World Health Organization et al. Coronavirus disease 2019 (covid-19): situation report, 96.
2020. (n.d.).
https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200816-covid-19-sitrep-209.pdf?sfvrsn=5dde1ca2_2.
2. Social distancing, surveillance, and stronger health systems as keys to controlling
COVID-19 Pandemic, PAHO Director says - PAHO/WHO | Pan American Health
Organization. (n.d.).
https://www.paho.org/en/news/2-6-2020-social-distancing-surveillance-and-stronger-health-systems-keys-controlling-covid-19.
3. Garcia Godoy L.R. Facial protection for healthcare workers during pandemics: a scoping
review. BMJ Glob. Heal. 2020;5(5). doi: 10.1136/bmjgh-2020-002553.
4. Eikenberry S.E. To mask or not to mask: Modeling the potential for face mask use by the
general public to curtail the COVID-19 pandemic. Infect. Dis. Model. 2020;5:293–308.
doi: 10.1016/j.idm.2020.04.001.
5. Wearing surgical masks in public could help slow COVID-19 pandemic’s advance: Masks
may limit the spread diseases including influenza, rhinoviruses and coronaviruses --
ScienceDaily. (n.d.). https://www.sciencedaily.com/releases/2020/04/200403132345.htm.
6. Nanni L., Ghidoni S., Brahnam S. Handcrafted vs. non-handcrafted features for computer
vision classification. Pattern Recogn. 2017;71:158–172.
doi: 10.1016/j.patcog.2017.05.025.
7. Y. Jia et al., Caffe: Convolutional architecture for fast feature embedding, in: MM 2014 -
Proceedings of the 2014 ACM Conference on Multimedia, 2014, doi:
10.1145/2647868.2654889.
8. P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. Lecun, OverFeat:
Integrated Recognition, Localization and Detection using Convolutional Networks, 2014.
9. Erhan D., Szegedy C., Toshev A., Anguelov D. Scalable Object Detection using Deep
Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition. 2014; pp. 2147–2154.
10. J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time
object detection, in: Proceedings of the IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, 2016, vol. 2016-Decem, pp. 779–788, doi:
10.1109/CVPR.2016.91.
11. M. Jiang, X. Fan, and H. Yan, RetinaMask: A Face Mask detector, 2020,
http://arxiv.org/abs/2005.03950.
12. Inamdar M., Mehendale N. Real-Time Face Mask Identification Using Facemasknet
Deep Learning Network. SSRN Electron. J. 2020. doi: 10.2139/ssrn.3663305.
13. Qiao S., Liu C., Shen W., Yuille A. Few-Shot Image Recognition by Predicting
Parameters from Activations. Proceedings of the IEEE Computer Society Conference on
Computer Vision and Pattern Recognition. 2018.
14. Kumar A., Zhang Z.J., Lyu H. Object detection in real time based on improved single
shot multi-box detector algorithm. J. Wireless Com. Netw. 2020;2020:204.
doi: 10.1186/s13638-020-01826-x.
15. Morera Á., Sánchez Á., Moreno A.B., Sappa Á.D., Vélez J.F. SSD vs. YOLO for
detection of outdoor urban advertising panels under multiple variabilities. Sensors
(Switzerland) 2020. doi: 10.3390/s20164587.
16. Girshick R., Donahue J., Darrell T., Malik J. Region-based Convolutional Networks for
Accurate Object Detection and Segmentation. IEEE Trans. Pattern Anal. Mach.
Intell. 2015;38(1):142–158. doi: 10.1109/TPAMI.2015.2437384.
17. He K., Zhang X., Ren S., Sun J. Spatial Pyramid Pooling in Deep Convolutional
Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015.
doi: 10.1109/TPAMI.2015.2389824.
18. R. Girshick, Fast R-CNN, in: Proc. IEEE Int. Conf. Comput. Vis., vol. 2015 Inter, 2015,
pp. 1440–1448, doi: 10.1109/ICCV.2015.169.
19. Nguyen N.D., Do T., Ngo T.D., Le D.D. An Evaluation of Deep Learning Methods for
Small Object Detection. J. Electr. Comput. Eng. 2020;2020. doi: 10.1155/2020/3189691.
20. Cai Z., Fan Q., Feris R.S., Vasconcelos N. A unified multi-scale deep convolutional
neural network for fast object detection. Lect. Notes Comput. Sci. (including subseries
Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2016.
doi: 10.1007/978-3-319-46493-0_22.
21. C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, A.C. Berg, DSSD : Deconvolutional Single Shot
Detector, 2017, arXiv preprint arXiv:1701.06659 (2017).
22. A. Shrivastava, R. Sukthankar, J. Malik, A. Gupta, Beyond Skip Connections: Top-Down
Modulation for Object Detection, 2016, arXiv preprint arXiv:1612.06851 (2016).
23. N. Dvornik, K. Shmelkov, J. Mairal, C. Schmid, BlitzNet: A Real-Time Deep Network
for Scene Understanding, in: Proceedings of the IEEE International Conference on Computer
Vision, 2017, doi: 10.1109/ICCV.2017.447.
24. Z. Liang, J. Shao, D. Zhang, L. Gao, Small object detection using deep feature pyramid
networks, in: Lecture Notes in Computer Science (including subseries Lecture Notes in
Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, vol. 11166 LNCS, pp.
554–564, doi: 10.1007/978-3-030-00764-5_51.
25. K. He, G. Gkioxari, P. Dollar, R. Girshick, Mask R-CNN, in: Proc. IEEE Int. Conf.
Comput. Vis., vol. 2017-Octob, 2017, pp. 2980–2988, doi: 10.1109/ICCV.2017.322.