Unmasking Face Embeddings by Self-restrained Triplet Loss for Accurate Masked Face Recognition
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
1
Unmasking Face Embeddings by Self-restrained Triplet Loss for
Accurate Masked Face Recognition
Fadi Boutrosa,b , Naser Damera,b,∗ , Florian Kirchbuchnera , Arjan Kuijpera,b
a Fraunhofer Institute for Computer Graphics Research IGD, Darmstadt, Germany
b Mathematical and Applied Visual Computing, TU Darmstadt, Darmstadt, Germany
Email: fadi.boutros@igd.fraunhofer.de
Using the face as a biometric identity trait is motivated by the does not require any modification or training of the existing
contactless nature of the capture process and the high accuracy
arXiv:2103.01716v1 [cs.CV] 2 Mar 2021
face recognition model. We achieved this goal by proposing
of the recognition algorithms. After the current COVID-19 the Embedding Unmasking Model (EUM) operated on the
pandemic, wearing a face mask has been imposed in public places
to keep the pandemic under control. However, face occlusion embedding space. The input for EUM is feature embedding
due to wearing a mask presents an emerging challenge for face extracted from the masked face, and its output is new feature
recognition systems. In this paper, we presented a solution to im- embedding similar to an embedding of a unmasked face of
prove the masked face recognition performance. Specifically, we the same identity, whereas, it is dissimilar from any other
propose the Embedding Unmasking Model (EUM) operated on embedding of any other identity. To achieve that through our
top of existing face recognition models. We also propose a novel
loss function, the Self-restrained Triplet (SRT), which enabled the EUM, we propose a novel loss function, Self-restrained Triplet
EUM to produce embeddings similar to these of unmasked faces Loss (SRT) to guide the EUM during the training phase. The
of the same identities. The achieved evaluation results on two face SRT shares the same learning objective with the triplet loss
recognition models and two real masked datasets proved that i.e. it enables the model to minimize the distance between
our proposed approach significantly improves the performance genuine pairs and maximize the distance between imposter
in most experimental settings.
pairs. Nonetheless, unlike triplet loss, the SRT can dynamical
self-adjust its learning objective by focusing on minimizing the
Index Terms—COVID-19, Biometric recognition, Identity ver- distance between the genuine pairs when the distance between
ification, Masked face recognition.
the imposter pairs is deemed to be sufficient.
The presented approach is evaluated on top of two face
I. I NTRODUCTION recognition models, ResNet-50 [9] and MobileFaceNet [10]
Face recognition is one of the preferable biometric recogni- trained with the loss function, Arcface loss [11], to validate
tion solutions due to its contactless nature and the high accu- the feasibility of adopting our solution on top of different deep
racy achieved by face recognition algorithms. Face recognition neural network architectures. With a detailed evaluation of the
systems have been widely deployed in many application sce- proposed EUM and SRT, we reported the verification per-
narios such as automated border control, surveillance, as well formance gain by the proposed approach on two real masked
as convenience applications [1], [2]. However, these systems face datasets [7], [3]. We further experimentally supported our
are mostly designed to operate on none occluded faces. After theoretical motivation behind our SRT loss by comparing its
the current COVID-19 pandemic, wearing a protective face performance with the conventional triplet loss. The overall ver-
mask has been imposed in public places by many governments ification result showed that our proposed approach improved
to reduce the rate of COVID-19 spread. This new situation the performance in most of the experimental settings. For
raises a serious unusually challenge for the current face example, when the probes are masked, the achieved FMR100
recognition systems. Recently, several studies have evaluated measures (the lowest false non-match rate (FNMR) for false
the effect of wearing a face mask on face recognition accuracy match rate (FMR) ≤ 1.0 %) by our approach on top of
[3], [4], [5], [6]. The listed studies have reported the negative MobileFaceNet are reduced by ∼ 28% and 26% on the two
impact of masked faces on the face recognition performance. evaluated datasets
The main conclusion of these studies [3], [4], [5], [6] is that In the rest of the paper, we discuss first the related works
the accuracy of face recognition algorithm with masked faces focusing on masked face recognition in Section II. Then, we
is significantly degraded, in comparison to unmasked face. present our proposed EUM architecture and our SRT loss in
Motivate by this new circumstance we propose in this paper Section III. In Section IV, we present the experimental setups
a new approach to reduce the negative impact of wearing a and implementation details applied in this work. Section V
facial mask on face recognition performance. The presented presents and discuss the achieved results. Finally, a set of
solution is designed to operate on top of existing face recog- conclusions are drawn in Section VI.
nition models, and thus avoid retraining existing solutions .
used for unmasked faces. Recent works either proposed to
train face recognition models with simulated masked faces II. R ELATED W ORK
[7], or to train a model to learn the periocular area of the face In recent years, significant progress has been made to
images exclusively [8]. Unlike these, our proposed solution improve face recognition verification performance with essen-2
tially none-occluded face. Several previous works [12], [13] purpose. Recently, a rapid number of researches are published
addressed general face occlusion e.g. wearing sunglasses or a to address the detection of wearing a face mask [16], [17],
scarf. Nonetheless, they did not directly address facial mask [18]. These studies did not directly address the effect of
occlusion (before the current COVID-19 situation). wearing a mask on the performance of face recognition or
After the current COVID-19 situation, four major studies presenting a solution to improve masked face recognition.
have evaluated the effect of wearing a facial mask on face As reported in a previous study [3], [4], face recognition
recognition performance [3], [4], [5], [6]. The National In- systems might fail in detecting a masked face. Thus, a face
stitute of Standards and Technology (NIST) has published recognition system could benefit from the detection of face
two specific studies on the effect of masked faces on the mask to improve the face detection, alignment and cropping as
performance of face recognition solutions submitted by ven- they are an essential preprocessing stpdf for feature extraction.
dors using pre-COVID-19 [5] algorithms, and post-COVID- Motivated by the recent evaluations efforts on the negative
19 [6] algorithms. These studies are part of the ongoing Face effect of wearing a facial mask on the face recognition
Recognition Vendor Test (FRVT). The studies by the NIST performance [3], [4], [5], [6] and driven by the need for ex-
concluded that wearing a face mask has a negative effect clusively developing an effective solution to improve masked
on the face recognition performance. However, the evaluation face recognition, we present in this work a novel approach to
by NIST is conducted using synthetically generated masks, improve masked face recognition performance. The proposed
which may not fully reflect the actual effect of wearing a solution is designed to run on top of existing face recognition
protective face mask on the face recognition performance. The models. Thus, it does not require any retraining the existing
recent study by Damer et al. [3] has tackled this limitation face recognition models as presented in next Section III.
by evaluating the effect of wear mask on two academic
face recognition algorithms and one commercial solution III. M ETHODOLOGY
using a specific collected database for this purpose from In this section, we present our proposed approach to im-
24 participants over three collaborative sessions. The study prove the verification performance of masked face recognition.
indicates the significant effect of wearing a face mask on face The presented solution is designed to be built on top of existing
recognition performance. A similar study was carried out by face recognition models. To achieve this goal, we propose an
the Department of Homeland Security (DHS) [4]. In this study, EUM. The input for our proposed model is a face embedding
several face recognition systems (using six face acquisition extracted from a masked face image and the output is what
systems and 10 matching algorithms) were evaluated on a we call an ”unmasked face embedding”, which aims at being
specifically collected database of 582 individuals. The main more similar to the embedding of the same identity when not
conclusion from this study is that the accuracy of most best- wearing a mask. Thus, the proposed solution does not require
performing face recognition system is degraded from 100% to any modification or training of the existing face recognition
96% when the subject is wearing a facial mask. solution. Figure 1 shows an overview of the proposed approach
Li et al. [8] proposed to use an attention-based method to workflow in training and in operation modes.
train a face recognition model to learn from the periocular area Furthermore, we propose the SRT to control the model
of masked faces. The presented method show improvement during the training phase. Similar to the well-known triplet-
in the masked face recognition performance, however, the based learning, the SRT loss has two learning objectives:
proposed approach is only tested on simulated masked face 1) Minimizing the inter-class variation i.e. minimizing the
datasets and essentially only maps the problem into a perioc- distance between genuine pairs of unmasked and masked
ular recognition problem. A recent preprint by [7] presented face embeddings. 2) Maximizing the intra-class variation i.e.
a small dataset of 269 unmasked and masked face images of maximizing the distance between imposter pairs of masked
53 identities crawled from the internet. The work proposed face embeddings. However, unlike the traditional triplet loss,
to fine-tune Facenet model [14] using simulated masked face the proposed SRT loss function can self-adjust its learning ob-
images to improve the recognition accuracy. However, the jective by only focusing on optimizing the inter-class variation
proposed solution is only tested using a small database (269 when the intra-class variation is deemed to be sufficient. When
images). Wang et al. [15] presented three datasets crawled the gap on intra-class variation is violated, our proposed loss
from the internet for face recognition, detection, and simulated behave as traditional triplet loss. The theoretical motivation
masked faces. The face recognition dataset contains 5000 behind our SRT loss will be presented along with the function
masked face images of 525 identities, and 90000 unmasked formulation later in this section.
face images of the same 525 identities. The authors claim to In the following, this section presents our proposed EUM
improve the verification accuracy from 50% to 95% on the architecture and the SRT loss.
masked face. However, they did not provide any information
about the evaluation protocol, proposed solution, or imple-
mentation details. Moreover, the published part of the dataset A. Embedding Unmasking Model Architecture
does not contain pairs of unmasked-masked images for most The architecture of the EUM is based on a Fully Connected
of the identities 1 . Thus, such a dataset could be more suitable Neural Network (FCNN). Having a fully connected layer,
for face mask detection [16], [17], [18] than face recognition where all neurons in two consecutive layers are connected to
each other, enables us to demonstrate a generalized EUM de-
1 https://github.com/X-zhangyang/Real-World-Masked-Face-Dataset sign. This is the case because this structure is easily adaptable3
backpropagation
Embedding unmasking model
CNN
Anchor Feature extractor
(Pretrained) Face
embedding Unmasked face
embedding
Positive
d1
Anchor
CNN d2
Negative
Self-restrained triplet loss
Positive Feature extractor
(Pretrained) Face
embedding
CNN
Negative
Feature extractor
(Pretrained)
Face
embedding
Fig. 1: The workflow of the face recognition model with our proposed EUM. The top part of the figure (inside the dashed
rectangle) shows the proposed solution in operation mode. Given an embedding obtained from the masked face, the proposed
model is trained with SRT loss to output a new embedding that is similar to the one of the unmasked face of the same identity
and different from the masked face embedding from all other identities.
to different input shapes, and thus can be adapted on the top report concluded that FMR seems to be less affected when the
of different face recognition models, motivating our choice of probes are masked. A similar observation from the study by
the FCNN. The model input is a masked feature embedding Damer et al. [3] stated that the genuine score distributions are
(i.e. resulting from a masked face image) of the size d (d significantly affected by masked probes. The study reported
depends on the used face recognition network), and the model that the genuine score distribution strongly shifts towards the
output is a features vector of the same size d. The proposed imposter score distributions. On the other hand, the imposter
model consists of four fully connected layers (FC)- one input score distributions do not seem to be strongly affected by
layer, two hidden layers, and one output layer. The input size masked face probes.
for all FC layers is of the size d. Each of the layers 1, 2, One of the main observations of the previous studies in
and 3 is followed by batch normalization(BN) [19] and Leaky [3], [5], is that wearing a face mask significantly increase the
ReLU non-linearity activation function [20]. The last FC layer FNMR, whereas the FMR seem to be less affected by wearing
is followed by BN. a mask. Similar remarks have been also reported in our result
(see Section V). Based on these observations, we motivate
B. Unmasked Face Embedding Learning our proposed SRT loss function to focus on increasing the
The learning objective of our model is to reduce the FNMR similarity between genuine pairs of unmasked and masked face
of genuine unmasked-masked pairs. The main motivation embeddings, while maintaining the imposter similarity at an
behind this learning goal is inspired by the latest reports on acceptable level. In the following section, we briefly present
evaluating the effect of the masked faces on face recogni- the naive triplet loss followed by our proposed SRT loss.
tion performance by the National Institute of Standards and 1) Self-restrained Triplet Loss
Technology (NIST) [5] and the recent work by Damer et Previous works [14], [21] indicated that utilizing triplet-
al. [3]. The NIST report [5] stated that the false non-match based learning is beneficial for learning discriminative face
rates (FNMR) are increased in all evaluated algorithms when embeddings. Let x ∈ X represents a batch of training samples,
the probes are masked. For the most accurate algorithms, and f (x) is the face embeddings obtained from the face
the FNMR increased from 0.3% to 5% at FMR of 0.001% recognition model. Training with triplet loss requires a triplet
when the probes are masked. On the other hand, the NIST of samples in the form {xai , xpi , xni } ∈ X, where xai , the4
2.5
2.5
2.0
2.0
Distance
Distance
Triplet loss-d1 Triplet loss-d1
Triplet loss-d2 Triplet loss-d2
1.5 SRT loss-d1 SRT loss-d1
1.5
SRT loss-d2 SRT loss-d2
1.0 1.0
0.5 0.5
0 20000 40000 60000 80000 100000 0 20000 40000 60000 80000 100000
Iteration Iteration
(a) ResNet-50: Triplet loss vs. SRT loss (b) MobileFaceNet: Triplet loss vs. SRT loss
Fig. 2: Naive triplet loss vs. SRT loss distances learning over training iterations. The plots show the learned d1 (distance
between genuine pairs) and d2 (distance between imposter pairs) by each loss over training iterations. It can be clearly noticed
that the anchor (model output) of model trained with SRT loss is more similar to the positive than the anchor of the model
trained with naive triple loss.
anchor, and xpi , the positive, are two different samples of the a huge number of training triplet with large computational
same identity, and xni , the negative, is a sample belonging to resources for selecting the optimal triplets for training. Given
a different identity. The learning objective of the triplet loss a masked face embedding, our model is trained to generate
is that the distance between f (xai ) and f (xpi ) (genuine pairs) a new embedding such as it is similar to the unmasked face
with the addition of a fixed margin value (m) is smaller than embedding of the same identity and dissimilar from other face
the distance between f (xai ) and any face embedding f (xpi ) of embeddings of any other identities. As discussed earlier in
any other identities (imposter pairs). In FaceNet [14], triplet this section, the distance between imposter pairs have been
loss is proposed to learn face embeddings using euclidean found to be less affected by wearing a mask [3], [5]. Thus,
distance applied on normalized face embeddings. Formally, we aim to ensure that our proposed loss focuses on minimizing
the triplet loss `t for a min-batch of N samples is defined as the distance between the genuine pairs (similar to scenario 2)
following: while maintaining the distance between imposter pairs.
N Training EUM with SRT loss requires a triple to be defined
1 X
`t = max{d(f (xai ), f (xpi )) − d(f (xai ), f (xni )) + m, 0}, as follows: f (xai ) is an anchor of masked face embedding,
N i EU M (f (xai )) is the anchor given as an output of the EUM,
(1) f (xpi ) is a positive of unmasked embedding, and f (xni ) is a
where m is a margin applied to impose the separability negative embedding of a different identity than anchor and
between genuine and imposter pairs. An d is the euclidean positive. This triplet is illustrated in Figure 1. We want to
distance applied on normalized features, and is given by: ensure that the distance (d1) between EU M (f (xai )) and f (xpi )
2 in addition to a predefined margin is smaller than the distance
d(xi , yi ) = kxi − yi k2 . (2)
(d2) between EU M (f (xai )) and f (xni ). Our goal is to train
Figure 3 visualize two triplet loss learning scenarios. Figure EUM to have more focus on minimizing d1, as d2 is less
3.a shows the initial training triplet, and Figure 3.b and 3.c affected by mask.
illustrate two scenarios that can be learnt using triplet loss. Under the assumption that the distance between the positive
In both scenarios, the goal of the triplet loss is achieved and the negative embeddings (d3) are close to optimal and
i.e. d(f (xai ), f (xni )) > d(f (xai ), f (xpi )) + m. In Figure 3.b it does not contribute to the back-propagation of the EUM
(scenario 1), both distances are optimized. However, in this model, we propose to use this distance as a reference to control
scenario, the optimization of d2 distance is greater than the the triplet loss. The main idea is to train the model as a naive
optimization on d1 distance. Whereas, in Figure 3.c (scenario triplet loss when d2 (anchor-negative distance) is smaller than
2), the triplet loss enforces the model to focus on minimizing d3 (positive-negative distance). In this case, the SRT guides the
the distance between the anchor and the positive. The optimal model to maximize d2 distance and to minimize d1 distance.
state for the triplet loss is achieved when both distance are When d2 is equal or greater than d3, we replace d2 by d3 in
fully optimized i.e. d(f (xai ), f (xpi )) is equal to zero and the loss calculation. This distance swapping allows the SRT
d(f (xai ), f (xni )) is greater than the predefined margin. How- to learn only, at this point, to minimize d1 distance. At any
ever, achieving such a state may not be feasible, and it requires point of the training, when the condition on d2 is violated i.e5
d(d2) < d(d3), the SRT behave again as a naive triplet loss. version of MS-Celeb-1M [24] dataset. Our choice to employ
We opt to compare the d2 and d3 distances on the batch level Arcface loss as it achieved state-of-the-art performance of
to avoid swapping the distance on every minor update on the several face recognition testing datasets such as Labeled Face
distance between the imposter pairs. In this case, we want to in the Wild(LFW) [25]. The achieved accuracy on LFW by
ensure that the d1 distance, with the addition of a margin m, ResNet-50 and MobileFaceNet trained with Arcface loss using
is smaller than the mean of the d3 distances calculated on the MS1MV2 dataset are 99.80% and 99.55%, respectively. The
mini-batch of a triplet. Thus, our loss is less sensitive to the considered face recognition models are evaluate with cosine-
outliers resulted from comparing imposter pairs. We define our distance for comparison. The Multi-task Cascaded Convo-
SRT loss for a mini-batch of the size N as follow: lutional Networks (MTCNN) solution [26] is employed to
1 PN a p a n
detect and align the input face image. Both models process
N i max{d(f (xi ), f (xi )) − d(f (xi ), f (xi )) aligned and cropped face image of size 112 × 112 pixels to
+m, 0} if µ(d2) < µ(d3)
`SRT = 1 PN a p (3) produce 512−D embedding feature by ResNet-50 and 128−D
max{d(f (x i ), f (x i )) − µ(d3) embedding feature by MobileFaceNet.
N i
+m, 0} otherwise,
where µ(d2) is the mean of the distances between the anchor B. Synthetic Mask Generation
and the
Pnegative pairs calculated on the mini-batch level, given As there is no large scale database with pairs of unmasked
N
as N1 i (d(f (xai ), f (xni )). µ(d3) is the mean of the distances and masked face images, we opted to use a synthetically
between the positive and thePnegative pairs calculated on the generated mask to train our proposed approach. Specifically,
N
mini-batch level, given as N1 i (d(f (xpi ), f (xni )). An d is the we employ the synthetic mask generation method proposed
euclidean distance computed on normalized feature embedding by NIST [5]. The synthetic generation method depends on
(Equation 2). the Dlib [27] toolkit to detect and extract 68 facial landmarks
Figure 2 illustrates the optimization of d1 (distance between from a face image. Based on the extracted landmark points,
genuine pairs) and d2 (distance between imposter pairs) by a face mask of different shapes, heights, and colors can be
naive triplet loss and SRT loss over the training iterations of drawn on the face images. The detailed implementation of
two EUM models. In Figure 2a, the EUM model is trained the synthetic mask generation method is described in [5]. The
on top of feature embeddings obtained from a ResNet-50 face synthetic mask generation method provided six mask types
recognition model. In Figure 2b, the EUM model is trained with different high and coverage: A) wide-high coverage, B)
on top of feature embeddings obtained from MobileFaceNet- round-high coverage, C) wide-medium coverage, D) round-
50 face recognition model. Details on the training is presented medium coverage, E) wide-low coverage, and F) round-low
in Section IV. It can be clearly noticed that the d1 distance coverage. Figure 4 shows samples of simulated face mask
(anchor-positive distance) learned by SRT is significantly of different type and coverage. The mask color is randomly
smaller than the one learned by naive triplet loss. This in- selected from the RGB color space. To synthetically generate a
dicates that the output embedding of the EUM trained with masked face image, we extract first the facial landmarks of the
SRT is more similar to the embedding of the same identity input face image. Then, a masked of specific color and type
(the positive) than the output embedding of EUM trained with can be drawn on the face image using the x, y coordination
triplet loss. of the facial landmarks points.
IV. E XPERIMENTAL SETUP
C. Database
This section presents the experimental setups and the im-
We used MS1MV2 dataset [11] to train our proposed
plementation details applied in the paper.
approach. The MS1MV2 is a refined version of MS-Celeb-
1M [24] dataset. The MS1MV2 contains 58m images of 85k
A. Face Recognition Model different identities. We generated a masked version of the
To provide a deep evaluation of the performance of the MS1MV2 noted as MS1MV2-Masked as described in Section
proposed solution, we evaluated our proposed solution on top IV-C. The mask type (as described in Section ) and color
of two face recognition models - ResNet-50 [9] and Mobile- are randomly selected for each image to add more diversity
FaceNet [10]. ResNet is one of the widely used Convolutional of mask color and coverage to the training dataset. The Dlib
Neural Network (CNN) architecture employed in several face failed in extracting the facial landmarks from 426k images.
recognition models e.g. ArcFace [11] and VGGFace2 [22]. These images are neglected from the training database. A
MobileFaceNet is a compact model designed for low subset of 5k images are randomly selected from MS1MV2-
computational powered devices. MobileFaceNet model archi- Masked to validate the model during the training phase.
tecture is based on residual bottlenecks proposed by Mo- The proposed solution is evaluated using two real masked
bileNetV2 [23] and depth-wise separable convolutions layer, face datasets: Masked Faces in Real World for Face Recog-
which allows building a CNN model with a much smaller set nition (MRF2) [7] and the extended masked face recognition
of parameters in comparison to standard CNNs. To provide (MFR) dataset [3]. To the best of our knowledge, these are
fair and comparable evaluation results, both models are trained the only datasets available for research including pairs of
using the same loss function, Arcface loss [11], and the same real masked face images and unmasked face images. Table
training database, MS1MV2 [11]. The MS1MV2 is a refined I summarize the evaluation datasets used in this work. The6
Positive
d1
Anchor d3
Positive
d2
d1
Learning Negative
d3 b) Scenario 1
Anchor
Positive
d2 d1
Negative Anchor d3
d2
Negative
a) Training triplet c) Scenario 2
Fig. 3: Triplet loss guide the model to maximize the distance between the anchor and negative such as it is greater than
the distance between the anchor and positive with the addition of a fixed margin value. One can be clearly noticed the high
similarity between the anchor and positive (d1) learned in scenario 2, in comparison to the d1 learned one in scenario 1,
whereas, the distance, d2, between the anchor and the negative (imposter pairs) in scenario 1 is extremely large than the d2
in scenario 2.
(a) Wide-high coverage (b) Round-high coverage (c) Wide-medium coverage
(d) Round-medium coverage (e) Wide-low coverage (f) Round-low coverage
Fig. 4: Samples of the synthetically generated face masks of different shape and coverage.
MFR2 contains 269 images of 53 identities crawled from the and triplet loss noted as F-M(SRT) and M-M(SRT) and F-
internet. Therefore, the images in the MRF2 dataset can be M(T) and M-M(T), respectively.
considered to be captured under in-the-wild conditions. The
database contains images of masked and unmasked faces with Also, We deploy an extended version of the MFR dataset
an average of 5 images per identity. The baseline evaluation [3]. The extended version of MFR is collected from 48 par-
of the considered face recognition on MFR2 is done by ticipants using their webcams under three different sessions-
performing an N:N comparisons between the unmasked face session 1 (reference) and session 2 and 3 (probes). The
noted as F-F, the unmasked faces and masked faces noted sessions are captured on three different days. Each session
as F-M, the masked faces noted as M-M to evaluate the contains data captured using three videos. In each session,
influence of wearing a facial mask on the face recognition the first video is recorded when the subject is not wearing a
performance. Finally, to evaluate our proposed solution, we facial mask in the daylight without additional electric lighting.
report the verification performance of the M-M and F-M The second and third videos are recorded when the subject is
settings based on our proposed EUM model with SRT loss wearing a facial mask and with no additional electric lighting
in the second video and with electric lighting in the third7
video (room light is turned on). The baseline reference (BLR) settings. FDR is a class separability criterion described in [28],
contains 480 images from the first video of the first session and it is given by:
(day) as described in [3]. The mask reference (MR) contains
(µG − µI )2
960 images from the second and third videos of the first F DR = ,
(σG )2 + (σI )2
session. The baseline probe (BLP) contains 960 images from
the first video of the second and third sessions and contains where µG and µI are the genuine and imposter scores mean
face images with no mask. The mask probe (MP) contains values and σG and σI are their standard deviations values.
1920 images from the second and third videos of the second The larger FDR value, the higher is the separation between
and third sessions. The baseline evaluation of face recognition the genuine and imposters scores and thus better expected
performance on the MFR database is done by performing verification performance.
N:N between BLR and BLP (unmasked face), noted as BLR-
BLP. To evaluate the effect of wearing a facial mask on face V. R ESULT
recognition performance, we perform an N:N comparisons In this section, we present and discuss our achieved results.
between BLR and MP data splits, noted as BLR-MP, and We experimentally present first the negative impact of wearing
an N:N comparisons between MR and MP data splits, noted a face mask on face recognition performance. Then, we present
as MR-MP. Finally, we evaluate and report the verification and discuss the impact of our EUM trained with SRT on
performance of the MR-MP and BLR-MP settings achieved by enhancing the separability between the genuine and imposter
our EUM model trained with SRT and, the naive triplet loss (as comparison scores. Then, we present the gain in the masked
a baseline of our SRT loss), noted as MR-MP(SRT) and BLR- face verification performance achieved by our proposed EUM
MP(SRT), and MR-MP(T) and BLR-MP(T), respectively. trained with SRT on the collaborative and in-the-wild masked
face recognition. Finally, we present an ablation study on SRT
D. Model Training Setup to experimentally supported our theoretical motivation behind
the SRT loss by comparing its performance with the triplet
We trained four instances of the EUM model. The first and loss.
second instances, model1 and model2, are trained with SRT
loss using feature embeddings obtained from ResNet-50 and
MobileFaceNet, respectively. The third and fourth instances, A. Impact of Wearing a Protective Face Mask on Face
model3 and model4, are trained with triplet loss using feature Recognition Performance
embeddings obtained from ResNet-50 and MobileFaceNet, Tables II, III, IV, and V present a comparison between
respectively as an ablation study to our proposed SRT. The the baseline evaluation where both reference and probe are
proposed EUM models in this paper are implemented by unmasked (F-F, BLR-BLP), the case where only the probe
Pytorch and trained on Nvidia GeForce RTX 2080 GPU. All is masked (F-M,BLR-MP), and the case where reference and
models are trained using SGD optimizer with initial learning probe are masked (M-M,MR-MP). Using unmasked images,
rate of 1e-1 and batch size of 512. The learning rate is divided the considered face recognition models, ResNet-50 and Mo-
by 10 at 30k, 60k, , 90k training iterations. The early-stopping bileFaceNet, achieve a very high verification performance.
patience parameter is set to 3 (around 30k training iteration) This is demonstrated by scoring 0.0% and 0.0% EER on the
causing model1, model2, model3, and model4 to stop after MFR database and 0% and 0.0106% on the MRF2 database,
80k, 70k, 60k, 10k training iterations, respectively. respectively by model ResNet-50 and MobileFaceNet. The
verification performances of the considered models is substan-
tially degraded when evaluated on masked face images. This
E. Evaluation Metric is indicated by the degradation in verification performance
The verification performance is reported as Equal Error measures and FDR values, in comparison to the case where
Rate (EER), FMR100 and FMR1000 which are the lowest probe and reference are unmasked. Furthermore, the consid-
FNMR for a FMR≤1.0% and ≤0.1%, respectively. We also ered models achieved lower verification performance when a
report the mean of the genuine scores (G-mean) and the mean masked probe is compared to a masked reference than the case
of imposter scores (I-mean) to analysis the shifts in genuine where only the probe is masked, as seen in Tables II, III, IV,
and imposter scores distributions induced by wearing a face and V. For example, using MFR dataset, when the probe is
mask and to demonstrate the improvement in the verification masked, the achieved EER by ResNet-50 model is 1.2492%
performance achieved by our proposed solution. For each (BLR-MP). This error rate is raised to 1.2963% when both
of the evaluation settings, we plot the receiver operating probe and reference are masked (MR-MP) as seen in Table II.
characteristic (ROC) curves. Also, for each of the conducted This results is also supported by G-mean, I-mean and FDR as
experiment, we report the failure to extract rate (FTX) to shown in Tables II, III, IV and V.
capture the effect of wearing a face mask on face detection. We also make two general observations. 1) The compact
FTX measures is proportion of comparisons where the feature model, MobileFaceNet, achieved lower verification perfor-
extraction was not possible. Further, we enrich our reported mance than the ResNet-50 model when comparing masked
evaluation results by reporting the Fisher Discriminant Ratio probes to unmasked/masked references. One of the reasons
(FDR) to provide an in-depth analysis of the separability for this performance degradation might be due to the smaller
of genuine and imposters scores for different experimental embedding size of MobileFaceNet (128-D), in comparison to8
Dataset Multi-sessions No. images No. identities Capture scenario
MFR [3] Yes 269 53 Collaborative
MRF2 [7] No 4320 48 In-the-wild
TABLE I: An overview of the evaluation datasets employed in this work. We evaluate our solution on real (not simulated)
masked datasets in two different capture scenarios, collaborative and in-the-wild.
Resnet50 EER % FMR100 % FMR1000 % G-mean I-mean FDR FTX %
BLR-BLP 0.0 0.0 0.0 0.8538 0.0349 55.9594 0.0
BLR-MP 1.2492 1.4251 3.778 0.5254 0.0251 12.6189 4.4237
BLR-MP(T) 1.9789 2.9533 7.9988 0.4401 0.0392 9.4412 4.4237
BLR-MP(SRT) 0.9611 0.946 2.5652 0.5447 0.0272 13.4045 4.4237
MR-MP 1.2963 1.4145 2.6311 0.7232 0.0675 15.1356 4.4736
MR-MP(T) 1.3091 1.456 2.8259 0.8269 0.4169 13.0528 4.4736
MR-MP(SRT) 1.1207 1.1367 2.4523 0.7189 0.0557 15.1666 4.4736
TABLE II: The achieved verification performance of different experimental settings achieved by ResNet-50 model along with
EUM trained with triplet loss and EUM trained with SRT loss. The result is reported using MFR dataset. It can be clearly
noticed the significant improvement in the verification performance induced by our proposed approach (SRT).
Mobilefacenet EER% FMR100% FMR1000% G-mean I-mean FDR FTX %
BLR-BLP 0.0 0.0 0.0 0.8432 0.0488 37.382 0.0
BLR-MP 3.4939 6.507 20.564 0.468 0.0307 7.1499 4.4237
BLR-MP(T) 5.2759 12.7835 28.8175 0.3991 0.0501 5.9623 4.4237
BLR-MP(SRT) 2.8805 4.6331 13.4384 0.5013 0.0383 8.6322 4.4237
MR-MP 3.506 6.8842 17.3479 0.6769 0.1097 7.9614 4.4736
MR-MP(T) 4.2947 7.9124 16.3772 0.8082 0.4716 6.6455 4.4736
MR-MP(SRT) 3.1866 5.6166 13.529 0.6636 0.0837 8.0905 4.4736
TABLE III: The achieved verification performance of different experimental settings achieved by MobileFaceNet model along
with EUM trained with triplet loss and EUM trained with SRT loss. The result is reported using MFR dataset. It can be clearly
noticed the significant improvement in the verification performance induced by our proposed approach (SRT).
the embedding size of 512-D in ResNet-50. 2) The considered measures by considered face recognition models, as shown in
models achieved lower performance when evaluated on the Table II, III, V and IV. This indicates a general expected im-
MRF2 dataset than the case when evaluated on the MRF provement in verification performance of the face recognition
dataset. This result was expected as the images in the MRF2 and enhancing the general trust in the verification decision.
dataset are crawled from the internet with large variations in For example, when the ResNet-50 model is evaluated on the
facial expression, pose, illumination. On the other hand, the MFR dataset, and the probe is masked, the FDR is increased
images in the MFR dataset are collected in a collaborative from 12.6189 (BLR-MP) to 13.4045 (BLR-MP(SRT)) using
environment. our proposed approach. This improvement in the separability
To summarize, wearing a protective face mask has negative between the genuine and the imposter samples by our proposed
effect on the face recognition verification performance. This approach is consistent in all reported results.
result supports and complements (by evaluating masked-to-
masked pairs in this work) to the previous findings in the C. Impact of our EUM with SRT solution on the collabora-
studies in [3], [4], [5], [6] evaluated the impact of wearing a tive masked face recognition
mask on face recognition performance.
When the considered models are evaluated on the MFR
dataset, it can be observed that our proposed approach sig-
B. Impact of our EUM with SRT on the Separability between nificantly enhanced the masked face verification performance
Genuine and Imposter Comparison Scores as shown in Table II and III. For example, when comparing
The proposed approach significantly enhanced the separa- masked probes to unmasked references, the achieved EER
bility between the genuine and imposter comparison scores in by ResNet-50 model is 1.2492% (BLR-MP). This error rate
both considered face recognition models and on both evaluated is decreased to 0.9611% by our proposed approach (BLR-
datasets. This improvement is noticeable from the increment MP(SRT)) indicating a clear improvement in the verification
in the FDR separability measure achieved by our proposed performance induced by our proposed approach as shown in.
EUM trained with SRT, in comparison to the achieved FDR Table II. Similar enhancement in the verification performance9
Resnet50 EER% FMR100% FMR1000% G-mean I-mean FDR FTX %
F-F 0.0 0.0 0.0 0.7477 0.0038 37.9345 0.0
F-M 4.3895 6.7568 10.473 0.4263 0.0005 8.2432 0.9497
F-M(T) 6.4169 7.7703 12.1622 0.3567 -0.0066 6.8853 0.9497
F-M(SRT) 4.7274 7.4324 9.4595 0.4553 0.0014 8.4507 0.9497
M-M 6.8831 10.0156 13.7715 0.6496 0.0301 4.7924 1.203
M-M(T) 6.8831 9.7027 14.0845 0.7759 0.3663 4.8791 1.203
M-M(SRT) 6.2578 9.0767 11.8936 0.6488 0.0144 4.9381 1.203
TABLE IV: The achieved verification performance of different experimental settings achieved by ResNet-50 model along
with EUM trained with triplet loss and EUM trained with SRT loss. The result is reported using MRF2 dataset. Our proposed
approach (SRT) achieved competitive performance on F-M experimental setting and the best performance on M-M experimental
setting.
1.00 1.00
0.98 0.98
1 - FNMR
0.96 1 - FNMR 0.96
0.94 0.94
0.92 BLR-MP AUC = 0.9975 0.92 MR-MP AUC = 0.9968
BLR-MP(T) AUC = 0.9960 MR-MP(T) AUC = 0.9969
BLR-MP(SRT) AUC = 0.9971 MR-MP(SRT) AUC = 0.9968
0.90 0.90
0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5
FMR FMR
(a) ResNet-50: BLR-MP (b) ResNet-50: MR-MP
1.00 1.00
0.98 0.98
1 - FNMR
1 - FNMR
0.96 0.96
0.94 0.94
0.92 BLR-MP AUC = 0.9908 0.92 MR-MP AUC = 0.9916
BLR-MP(T) AUC = 0.9878 MR-MP(T) AUC = 0.9875
BLR-MP(SRT) AUC = 0.9949 MR-MP(SRT) AUC = 0.9918
0.90 0.90
0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5
FMR FMR
(c) MobileFaceNet: BLR-MP (d) MobileFaceNet: MR-MP
Fig. 5: The verification performance for the considered face recognition models, our proposed EUM trained with naive triplet
loss, and our proposed EUM trained with SRT loss, are presented as ROC curves. The curves are plot using MFR dataset for
the experimental settings BLR-MP (probe is masked) and MR-MP (reference and probe are masked). For each ROC curve,
the area under the curve (AUC) is listed inside the plot. The improvement in the performance is noticeable by our proposed
SRT. This improvement is very clear for the cases where masked probe significantly affected the verification performance.
is observed by our approach when comparing masked probes to masked references. In this case, the EER is decreased from10
M-M AUC = 0.9766
1.00 1.00 M-M(T) AUC = 0.9836
M-M(SRT) AUC = 0.9780
0.98 0.98
1 - FNMR
1 - FNMR
0.96 0.96
0.94 0.94
0.92 F-M AUC = 0.9937 0.92
F-M(T) AUC = 0.9869
F-M(SRT) AUC = 0.9904
0.90 0.90
0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5
FMR FMR
(a) ResNet-50: F-M (b) ResNet-50: M-M
F-M AUC = 0.9717 M-M AUC = 0.9609
1.00 F-M(T) AUC = 0.9669 1.00 M-M(T) AUC = 0.9671
F-M(SRT) AUC = 0.9753 M-M(SRT) AUC = 0.9675
0.98 0.98
1 - FNMR
1 - FNMR
0.96 0.96
0.94 0.94
0.92 0.92
0.90 0.90
0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5
FMR FMR
(c) MobileFaceNet: F-M (d) MobileFaceNet: M-M
Fig. 6: The verification performance for the considered face recognition models, our proposed EUM trained with naive triplet
loss, and our proposed EUM trained with SRT loss, are presented as ROC curves. The curves are plot using MRF2 dataset
for the experimental settings F-M (probe is masked) and M-M (reference and probe are masked). For each ROC curve, the
area under the curve (AUC) is listed inside the plot. The improvement in the performance is noticeable by our proposed SRT
in the plots c and d, in comparison to the achieved performance by the base face recognition model.
1.2963% (MR-MP) to 1.1207% (MR-MP(SRT)). EER by ResNet-50 model is 4.3895% (F-M) as shown in Table
When the probes are masked, the achieved EER by the II. Only in this case, the EER did not improve by our proposed
MobileFaceNet model is 3.4939% (BLR-MP). This error is approach where the achieved EER by our proposed approach
reduced to 2.8805% using our proposed approach (BLR- is 4.7274% (F-M(SRT)). Nonetheless, a notable improvement
MP(SRT)). Also, when comparing masked probes to masked in the FMR1000 and the FDR separability measures can be
references, the EER is decreased from 3.506% (MR-MP) to observed from the reported result. The increase in FDR points
3.1866 (MR-MP(SRT)) by our approach. The improvement in out the possibility that given a larger more representative eval-
the masked face recognition verification performance is also uation data, the consistent enhancement in verification accu-
noticeable from the improvement in FMR100 and FMR1000 racy will be apparent here as well. A significant improvement
measures. in the verification performance is achieved by our approach
when comparing masked probes to masked references. In this
D. Impact of our EUM with SRT on in-the-wild masked face case, the achieved EER is decreased from 6.8831% (M-M)
recognition to 6.2578% (M-M(SRT)). A Similar conclusion can be made
from the improvement from the other performance verification
The achieved evaluation results on the MRF2 dataset by
measures and the FDR measure.
ResNet-50 and MobileFaceNet models are presented in Tables
IV and V, respectively. Using masked probes, the achieved Using masked probes, the achieved verification performance11
Mobilefacenet EER% FMR100% FMR1000% G-mean I-mean FDR FTX %
F-F 0.0106% 0.0 0.0 0.7318 0.0078 26.4276 0.0
F-M 6.4169 16.8919 24.3243 0.3803 -0.0019 4.6457 0.9497
F-M(T) 7.7685 15.8784 34.4595 0.3304 -0.0027 4.2067 0.9497
F-M(SRT) 6.079 12.5 21.9595 0.4157 -0.0018 5.2918 0.9497
M-M 8.4777 18.1534 28.795 0.6087 0.0509 3.2505 1.203
M-M(T) 8.7634 17.5274 26.2911 0.7638 0.3966 3.5408 1.203
M-M(SRT) 7.8232 15.0235 22.5352 0.6087 0.0241 3.5815 1.203
TABLE V: The achieved verification performance of different experimental settings achieved by MobileFaceNet model along
with EUM trained with triplet loss and EUM trained with SRT loss. The result is reported using MRF2 dataset. In both F-M
and M-M experimental settings, our proposed approach (SRT) achieved the best performances.
by MobileFaceNet is significantly enhanced by our proposed masked anchor is similar (to some degree) to the positive
approach. Similar improvement in the verification performance (unmasked embedding) and it is dissimilar (to some degree)
is achieved when comparing masked probe to masked refer- from the negative. Therefore, finding triplets that violate the
ence by our approach as shown in Table V. For example,when triplet condition is not trivia and it could not be possible for
the probes and references are masked, the achieved EER many of triplet in the training dataset. This explains the poor
by MobileFaceNet is 8.4777%. This error rate is reduced to result achieved when the EUM model is trained with triplet
7.8232% using our proposed approach. loss, as there are only few triplets violating the triplet loss
condition. One can assume that using a larger margin value
allows the EUM model to further optimizing the genuine pairs
E. Ablation Study on Self-restrained Triplet loss
distance and imposter pairs distance as the triplet condition
In this subsection we experimentally prove, and theoretically can be violated by increasing the margin value. However, by
discuss, the advantage of our proposed SRT solution over the increasing the margin value, we increase the upper bound
common naive triplet loss. Using masked face dataset, we of the loss function. Thus, we ignore the fact that distance
explore first the validity of training EUM model with triplet between imposter pairs is sufficient respect to the distance
loss. It is noticeable that training EUM with naive triplet between genuine pairs in the embedding space. For example,
is inefficient for learning from masked face embedding as using unmasked data, the mean of the imposter scores achieved
presented in in Table II, IV III and V. For example, when the by ResNet-50 on MFR dataset is 0.0349. When the probe is
probe is masked, the achieved EER by EUM with triplet loss masked, the mean of imposter scores is 0.0251 as shown in
on top of ResNet-50 is 1.9789%, in comparison to 0.9611% Table II. Therefore, any further optimization on the distance
EER achieved by EUM with our SRT as shown in Table II. It is between the imposter pairs will effect the discriminative fea-
crucial for learning with triplet loss that the input triplet violate tures learned by the base face recognition model and there
the condition d(f (xai ), f (xni )) > d(f (xai ), f (xpi )) + m. Thus, is no restriction on the learning process that insures that the
the model can learn to minimize the distance between the model will maintain the distance between the imposter pairs.
genuine pairs and maximize the distance between the imposter Alternatively, training the EUM model with our SRT loss
pairs. When the previous condition is not violated, the loss achieved significant improvement on minimizing the distance
value will be close to zero and the model will not be able between the genuine pairs, simultaneously, it maintains the dis-
to further optimizing the distances of the genuine pairs and tance between the imposter pairs to be close to the one learned
imposter pairs. This motivated our SRT solution. by the base face recognition model. It is noticeable from the
Given that our proposed EUM solution is built on top of reported result that the I-mean achieved by our SRT is closer to
a pre-trained face recognition model. The feature embeddings the I-mean achieved when the model is evaluated on unmasked
of the genuine pairs are similar (to a large degree), and the data, in comparison to the one achieved by naive triplet loss as
ones of imposter pairs are dissimilar. However, this similarity shown in Tables II, III, IV and V. The achieved result pointed
is affected (to some degree) when the faces are masked and out the efficiency of our proposed EUM trained with SRT
our main goal is to reduce this effect. This statement can be in improving the masked face recognition, in comparison to
observed from the achieved results presented in Tables II, III, the considered face recondition models. Also, it supported our
IV and V. For example, using MFR dataset, the achieved theoretical motivation behind SRT where training the EUM
G-mean and I-mean by ResNet-50 is 0.8538 and 0.0349, with SRT significantly outperformed the EUM trained with
respectively. When the probe is masked, the achieved G- naive triplet loss.
mean and I-mean shift to 0.5254 and 0.0251, respectively a) :
as shown in Table II. The shifting in G-mean pointed out
that the similarity between the genuine pairs is reduced (to
some degree) when the probe is masked. Training EUM with VI. C ONCLUSION
naive triplet loss requires selecting a triplet of embeddings In this paper, we presented and evaluated a novel solution to
violated the triplet condition. As we discussed earlier, the reduce the negative impact of wearing a protective face mask12
on face recognition performance. This work was motivated [12] L. Song, D. Gong, Z. Li, C. Liu, and W. Liu, “Occlusion robust
by the recent evaluation efforts on the effect of masked faces face recognition based on mask learning with pairwise differential
siamese network,” in 2019 IEEE/CVF International Conference on
on the face recognition performance [3], [4], [5], [6]. The Computer Vision, ICCV 2019, Seoul, Korea (South), October 27
presented solution is designed to operate on top of existing - November 2, 2019, 2019, pp. 773–782. [Online]. Available:
face recognition models, thus avoid the need for retraining https://doi.org/10.1109/ICCV.2019.00086
[13] M. Opitz, G. Waltner, G. Poier, H. Possegger, and H. Bischof, “Grid
existing face recognition solutions used for unmasked faces. loss: Detecting occluded faces,” CoRR, vol. abs/1609.00129, 2016.
This goal has been accomplished by proposing the EUM op- [Online]. Available: http://arxiv.org/abs/1609.00129
erated on the embedding space. The learning objective of our [14] F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified
embedding for face recognition and clustering,” pp. 815–823, 2015.
EUM is to increase the similarity between genuine unmasked- [Online]. Available: https://doi.org/10.1109/CVPR.2015.7298682
masked pairs and to decrease the similarity between imposter [15] Z. Wang, G. Wang, B. Huang, Z. Xiong, Q. Hong, H. Wu, P. Yi,
pairs. We achieved this learning objective by proposing a K. Jiang, N. Wang, Y. Pei et al., “Masked face recognition dataset and
application,” arXiv preprint arXiv:2003.09093, 2020.
novel loss function, the SRT. Through ablation study and [16] M. Loey, G. Manogaran, M. H. N. Taha, and N. E. M. Khalifa, “A
experiments on two real masked face dataset and two face hybrid deep transfer learning model with machine learning methods for
recognition models, we demonstrated that our proposed EUM face mask detection in the era of the covid-19 pandemic,” Measurement,
vol. 167, p. 108288, 2021.
with SRT significantly improved the masked face verification [17] Z. Wang, P. Wang, P. C. Louis, L. E. Wheless, and Y. Huo, “Wearmask:
performance in most experimental settings. Fast in-browser face mask detection with serverless edge computing for
covid-19,” arXiv preprint arXiv:2101.00784, 2021.
[18] B. Qin and D. Li, “Identifying facemask-wearing condition using
ACKNOWLEDGMENT image super-resolution with classification network to prevent covid-19,”
Sensors, vol. 20, no. 18, p. 5236, 2020.
This research work has been funded by the German Federal [19] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep
Ministry of Education and Research and the Hessen State network training by reducing internal covariate shift,” in Proceedings
of the 32nd International Conference on Machine Learning, ICML
Ministry for Higher Education, Research and the Arts within 2015, Lille, France, 6-11 July 2015, ser. JMLR Workshop and
their joint support of the National Research Center for Applied Conference Proceedings, F. R. Bach and D. M. Blei, Eds.,
Cybersecurity ATHENE. vol. 37. JMLR.org, 2015, pp. 448–456. [Online]. Available:
http://proceedings.mlr.press/v37/ioffe15.html
[20] A. L. Maas, “Rectifier nonlinearities improve neural network acoustic
R EFERENCES models,” 2013.
[21] Y. Feng, H. Wang, H. R. Hu, L. Yu, W. Wang, and S. Wang,
[1] D. Gorodnichy, S. Yanushkevich, and V. Shmerko, “Automated border “Triplet distillation for deep face recognition,” in IEEE International
control: Problem formalization,” in Computational Intelligence in Bio- Conference on Image Processing, ICIP 2020, Abu Dhabi, United Arab
metrics and Identity Management (CIBIM), 2014 IEEE Symposium on. Emirates, October 25-28, 2020. IEEE, 2020, pp. 808–812. [Online].
IEEE, 2014, pp. 118–125. Available: https://doi.org/10.1109/ICIP40778.2020.9190651
[2] G. Lovisotto, R. Malik, I. Sluganovic, M. Roeschlin, P. Trueman, and [22] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman, “Vggface2:
I. Martinovic, “Mobile biometrics in financial services: A five factor A dataset for recognising faces across pose and age,” in 13th IEEE
framework,” Tech. Rep. International Conference on Automatic Face & Gesture Recognition, FG
[3] N. Damer, J. H. Grebe, C. Chen, F. Boutros, F. Kirchbuchner, and 2018, Xi’an, China, May 15-19, 2018. IEEE Computer Society, 2018,
A. Kuijper, “The effect of wearing a mask on face recognition pp. 67–74. [Online]. Available: https://doi.org/10.1109/FG.2018.00020
performance: an exploratory study,” in BIOSIG 2020 - Proceedings of [23] M. Sandler, A. G. Howard, M. Zhu, A. Zhmoginov, and L. Chen,
the 19th International Conference of the Biometrics Special Interest “Inverted residuals and linear bottlenecks: Mobile networks for
Group, online, 16.-18. September 2020, ser. LNI, A. Brömme, C. Busch, classification, detection and segmentation,” CoRR, vol. abs/1801.04381,
A. Dantcheva, K. B. Raja, C. Rathgeb, and A. Uhl, Eds., vol. P-306. 2018. [Online]. Available: http://arxiv.org/abs/1801.04381
Gesellschaft für Informatik e.V., 2020, pp. 1–10. [Online]. Available: [24] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao, “Ms-celeb-1m: A dataset
https://dl.gi.de/20.500.12116/34316 and benchmark for large-scale face recognition,” in Computer Vision -
[4] Department of Homeland Security, “Biometric Technology Rally at ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands,
MDTF,” https://mdtf.org/Rally2020, 2020, last accessed: March 3, 2021. October 11-14, 2016, Proceedings, Part III, ser. Lecture Notes in
[5] M. L. Ngan, P. J. Grother, and K. K. Hanaoka, “Ongoing face recognition Computer Science, B. Leibe, J. Matas, N. Sebe, and M. Welling,
vendor test (frvt) part 6b: Face recognition accuracy with face masks Eds., vol. 9907. Springer, 2016, pp. 87–102. [Online]. Available:
using post-covid-19 algorithms,” 2020. https://doi.org/10.1007/978-3-319-46487-9 6
[6] ——, “Ongoing face recognition vendor test (frvt) part 6b: Face recog- [25] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, “Labeled
nition accuracy with face masks using post-covid-19 algorithms,” 2020. faces in the wild: A database for studying face recognition in uncon-
[7] A. Anwar and A. Raychowdhury, “Masked face recognition for secure strained environments,” University of Massachusetts, Amherst, Tech.
authentication,” 2020. Rep. 07-49, October 2007.
[8] Y. Li, K. Guo, Y. Lu, and L. Liu, “Cropping and attention based approach [26] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and
for masked face recognition,” Applied Intelligence, pp. 1–14, 2021. alignment using multitask cascaded convolutional networks,” IEEE
[9] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016.
image recognition,” in 2016 IEEE Conference on Computer Vision [27] D. E. King, “Dlib-ml: A machine learning toolkit,” J. Mach.
and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June Learn. Res., vol. 10, pp. 1755–1758, 2009. [Online]. Available:
27-30, 2016. IEEE Computer Society, 2016, pp. 770–778. [Online]. https://dl.acm.org/citation.cfm?id=1755843
Available: https://doi.org/10.1109/CVPR.2016.90 [28] N. Poh and S. Bengio, “A study of the effects of score normalisation
[10] S. Chen, Y. Liu, X. Gao, and Z. Han, “Mobilefacenets: Efficient cnns prior to fusion in biometric authentication tasks,” IDIAP, Tech. Rep.,
for accurate real-time face verification on mobile devices,” CoRR, vol. 2004.
abs/1804.07573, 2018. [Online]. Available: http://arxiv.org/abs/1804.
07573
[11] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “Arcface: Additive angular
margin loss for deep face recognition,” in IEEE Conference on
Computer Vision and Pattern Recognition, CVPR 2019, Long Beach,
CA, USA, June 16-20, 2019. Computer Vision Foundation / IEEE,
2019, pp. 4690–4699. [Online]. Available: http://openaccess.thecvf.com/
content CVPR 2019/html/Deng ArcFace Additive Angular Margin
Loss for Deep Face Recognition CVPR 2019 paper.htmlYou can also read