A cascaded deep-learning-based model for face mask detection

DOI: https://doi.org/10.1108/DTA-02-2022-0076
Published date: 28 June 2022
Pages: 84-107
Subject Matter: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet
Author: Akhil Kumar
Department of Computer Science, Himachal Pradesh University, Shimla, India
Abstract
Purpose: This work aims to present a deep learning model for face mask detection in surveillance
environments such as automatic teller machines (ATMs), banks, etc. to identify persons wearing face
masks. In surveillance environments, complete visibility of the face area is a guideline, and criminals and
law offenders commit crimes by hiding their faces behind a face mask. The face mask detector model
proposed in this work can be used as a tool and integrated with surveillance cameras in autonomous
surveillance environments to identify and catch law offenders and criminals.
Design/methodology/approach: The proposed face mask detector is developed by integrating the
residual network (ResNet) 34 feature extractor on top of three You Only Look Once (YOLO) detection
layers, together with a spatial pyramid pooling (SPP) layer that extracts a rich and dense feature
map. Furthermore, at training time, data augmentation operations such as Mosaic and MixUp have been
applied to the feature extraction network so that it is trained with images of varying complexities. The
proposed detector is trained and tested on a custom face mask detection dataset consisting of 52,635
images. For validation, its performance is compared with that of YOLO v1, v2, tiny YOLO
v1, v2, v3 and v4 and other benchmark work in the literature by evaluating performance metrics
such as precision, recall, F1 score, mean average precision (mAP) for the overall dataset and average
precision (AP) for each class of the dataset.
Findings: The proposed face mask detector achieved 4.75-9.75 per cent higher detection accuracy in terms
of mAP, 5-31 per cent higher AP for detection of faces with masks and, specifically, 2-30 per cent higher AP
for detection of face masks on the face region as compared to the tested baseline variants of YOLO.
Furthermore, the usage of the ResNet34 feature extractor and SPP layer in the proposed detection model
reduced the training time and the detection time. The proposed face mask detection model can perform
detection over an image in 0.45 s, which is 0.2-0.15 s less than that for other tested YOLO variants, thus
making the proposed detection model perform detections at a higher speed.
Research limitations/implications: The proposed face mask detector model can be utilized as a tool to
detect persons with face masks who are a potential threat to automatic surveillance environments such as
ATMs, banks, airport security checks, etc. The other research implication of the proposed work is that it can
be trained and tested for other object detection problems such as cancer detection in images, fish species
detection, vehicle detection, etc.
Practical implications: The proposed face mask detector can be integrated with automatic surveillance
systems and used as a tool to detect persons with face masks who are potential threats to ATMs, banks, etc.
During the present COVID-19 period, it can also be used to detect whether people in public areas are
following the COVID-appropriate behavior of wearing a face mask.
Originality/value: The novelty of this work lies in the usage of the ResNet34 feature extractor with YOLO
detection layers, which makes the proposed model a compact and powerful convolutional-neural-network-based
face mask detector model. Furthermore, the SPP layer has been applied to the ResNet34 feature extractor to
make it able to extract a rich and dense feature map. The other novelty of the present work is the
implementation of Mosaic and MixUp data augmentation in the training network that provided the feature
extractor with 3× images of varying complexities and orientations and further aided in achieving higher
detection accuracy. The proposed model is novel in terms of extracting rich features, performing augmentation
at the training time and achieving high detection accuracy while maintaining the detection speed.
Keywords Face mask detection, ResNet34, YOLO, SPP layer, Deep learning
Paper type Research paper
Availability of data and material: The face mask detection dataset is available on request at:
https://drive.google.com/drive/folders/1_lBlgHXewEadtLFKtw8hnug_cOr6erbj?usp=sharing
The network architectures of YOLO detectors used in this work are available at:
https://github.com/AlexeyAB/darknet
Competing interests: The authors declare that they have no competing interests.
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/2514-9288.htm
Received 21 February 2022
Revised 2 May 2022
Accepted 22 May 2022
Data Technologies and Applications
Vol. 57 No. 1, 2023
pp. 84-107
© Emerald Publishing Limited
2514-9288
DOI 10.1108/DTA-02-2022-0076
1. Introduction
Reports show that criminals and offenders increasingly use face masks as a tool to deceive
surveillance systems (Wen et al., 2005; Babwin and Dazio, 2020; Southall and Van Syckle, 2020;
Gaiss, 2021). Furthermore, at places like ATMs (automatic teller machines) and banks, where
absolute clarity of the face area is a regulation and a criterion to recognize a person's identity,
criminals and law offenders use face masks as a tool to spoof ATMs and commit theft and robbery.
In the pre-COVID-19 world, there were certain rules for face-mask-wearing status at security zones
like ATMs, airports and biometric systems. However, with the ongoing pandemic of COVID-19,
covering faces with masks has emerged as a new norm and a government guideline. In the
post-COVID-19 world, the old rules will again come into place and there will be a dire need for
face mask detectors at ATMs, airports and other security zones to identify the hackers, criminals
and law offenders who use face masks to commit crimes. To address the problem of detecting
persons with face masks at security zones where absolute visibility of the complete face area is a
guideline, the present work proposes a novel face mask detector that can detect faces with masks
and without masks with high detection accuracy and precision. Furthermore, the proposed face
mask detector can be integrated with low-end devices like biometric systems and surveillance
systems to detect persons wearing face masks who can be a potential threat, highlight mask regions
on the face area and prevent thefts, robberies, crimes, breach of law, etc. by raising an early
alarm. The proposed face mask detector is developed by fusing a deep-learning-based residual
network (ResNet) 34 feature extractor, a spatial pyramid pooling (SPP) layer and You Only Look Once
(YOLO) detection layers. Furthermore, Mosaic and MixUp data augmentation have been applied to the
training feature extraction network of the proposed detector model to improve its performance.
Adding data augmentation operations at training time scaled up the training data by three times
and provided images with different complexities, which helped the detector model to learn new
features beyond those obtained from the original images of the employed dataset. The proposed face
mask detector is a solution to both problems, i.e. detection of persons wearing face masks at
security zones and detection of persons wearing and not wearing face masks during the ongoing
COVID-19 pandemic, to enforce government guidelines where wearing a face mask in public spaces is
a mandate.
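As an illustration of the MixUp operation mentioned above, the following minimal sketch blends two training images with a weight drawn from a Beta distribution and keeps the annotations of both. It is not the implementation used in this work: the function name mixup_images, the box tuple layout, the grayscale nested-list image format and the value of alpha are all assumptions made for the example.

```python
import random


def mixup_images(img_a, img_b, boxes_a, boxes_b, alpha=1.5, rng=None):
    """Blend two equally sized grayscale images (nested lists of floats).

    Hypothetical helper for illustration only. Boxes are assumed to be
    tuples such as (x_min, y_min, x_max, y_max, class_name).
    """
    if len(img_a) != len(img_b) or len(img_a[0]) != len(img_b[0]):
        raise ValueError("MixUp needs two images of the same size")
    rng = rng or random.Random()
    # Mixing weight in [0, 1]; alpha controls how often it is near 0.5.
    lam = rng.betavariate(alpha, alpha)
    mixed = [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]
    # For detection, a common choice is to keep the boxes of both images.
    return mixed, list(boxes_a) + list(boxes_b), lam


if __name__ == "__main__":
    masked = [[1.0, 1.0], [1.0, 1.0]]
    unmasked = [[0.0, 0.0], [0.0, 0.0]]
    image, boxes, lam = mixup_images(
        masked, unmasked,
        [(0, 0, 1, 1, "mask")], [(0, 0, 2, 2, "no_mask")],
        rng=random.Random(0),
    )
    print(lam, image, boxes)
```

Keeping the boxes of both source images is one common convention for detection tasks; how the labels are combined in the proposed detector is not stated in this section.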
In recent years, deep-learning-based object classifiers and detectors have proven their
competency in general object classification and detection tasks. In order to address the
above-specified problem, in this work, we have utilized the deep-learning-based ResNet34
architecture proposed by He et al. (2016), which is a smaller variant of ResNet50, ResNet101
and ResNet152. ResNet architectures are based on residual learning and have shown strong
results in classification tasks on the ImageNet dataset. We have utilized ResNet34 as
a feature extractor on the employed face mask detection dataset and passed the
generated features to the YOLO detection layers for predictions. As an improvement to
the ResNet34 feature extractor, we have additionally applied the SPP layer proposed in He
et al. (2015), which helped the ResNet34 network to obtain a rich feature map and generate
a pool of diverse features. The YOLO detector proposed in Redmon et al. (2015), Redmon
and Farhadi (2016, 2018) and Bochkovskiy et al. (2020) is an advanced
object detector and has achieved benchmark detection accuracy on the MS COCO dataset
for general object detection. The backbone network of the YOLO object detector is
built on the DarkNet 19, DarkNet 53 and CSPDarkNet 53 feature extractors.
However, in this work, we have replaced the backbone DarkNet extractor of the YOLO
object detector with SPP-ResNet34 and utilized the detection layers of the tiny YOLO v4
object detector.
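To make the idea of spatial pyramid pooling concrete, the sketch below max-pools a single-channel feature map over 1×1, 2×2 and 4×4 grids and concatenates the results, so that the output length is the same for any input size. It is a simplified illustration in the spirit of He et al. (2015), not the layer used in the proposed detector: the function name, the pyramid levels and the bin-boundary rule are assumptions, and YOLO-family detectors commonly use a different variant built from parallel stride-1 max-pooling layers.

```python
def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool one 2-D feature map (nested lists) into a fixed-length list.

    Hypothetical helper for illustration only. For each level n the map
    is split into an n x n grid of bins and the maximum of every bin is
    kept, giving sum(n * n for n in levels) values in total.
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin rows: floor(i*h/n) up to ceil((i+1)*h/n), integer maths.
            r0, r1 = (i * h) // n, -(-((i + 1) * h) // n)
            for j in range(n):
                c0, c1 = (j * w) // n, -(-((j + 1) * w) // n)
                pooled.append(
                    max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
                )
    return pooled


if __name__ == "__main__":
    small = [[r * 4 + c for c in range(4)] for r in range(4)]
    large = [[float(r + c) for c in range(13)] for r in range(9)]
    # Both maps yield 1 + 4 + 16 = 21 values despite different sizes.
    print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

In a full network the same pooling would be applied to every channel of the backbone output before the pooled features are passed on to the detection layers.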
