DetectorGuard: Provably Securing Object Detectors against Localized Patch Hiding Attacks

Chong Xiang (Princeton University, cxiang@princeton.edu)
Prateek Mittal (Princeton University, pmittal@princeton.edu)

arXiv:2102.02956v1 [cs.CV] 5 Feb 2021

Abstract

State-of-the-art object detectors are vulnerable to localized patch hiding attacks, in which an adversary introduces a small adversarial patch to make detectors miss the detection of salient objects. In this paper, we propose DetectorGuard, the first general framework for building provably robust detectors against the localized patch hiding attack. First, we propose a general approach for transferring robustness from image classifiers to object detectors, building a bridge between robust image classification and robust object detection: we apply a provably robust image classifier to a sliding window over the image and aggregate the robust window classifications at different locations into a robust object detection. Second, to mitigate the notorious trade-off between clean performance and provable robustness, we use a prediction pipeline that compares the outputs of a conventional detector and a robust detector to catch an ongoing attack. When no attack is detected, DetectorGuard outputs the precise bounding boxes predicted by the conventional detector to achieve high clean performance; otherwise, DetectorGuard triggers an attack alert for security. Notably, our prediction strategy ensures that the robust detector incorrectly missing objects will not hurt the clean performance of DetectorGuard. Moreover, our approach allows us to formally prove the robustness of DetectorGuard on certified objects, i.e., it either detects the object or triggers an alert, against any patch hiding attacker. Our evaluation on the PASCAL VOC and MS COCO datasets shows that DetectorGuard has almost the same clean performance as conventional detectors and, more importantly, that DetectorGuard achieves the first provable robustness against localized patch hiding attacks.

1   Introduction

While object detection is widely deployed in critical applications like autonomous driving, video surveillance, and identity verification, conventional detectors have been shown to be vulnerable to a number of real-world adversarial attacks [7, 14, 47, 53, 56]. Eykholt et al. [14] and Chen et al. [7] demonstrate successful physical attacks against YOLOv2 [40] and Faster R-CNN [42] detectors for traffic sign recognition. Wu et al. [53] and Xu et al. [56] succeed in evading object detection by wearing a T-shirt printed with adversarial perturbations. Unfortunately, securing object detectors is extremely challenging: only a limited number of defenses [8, 43, 59] have been proposed, and they all suffer from at least one of the following issues: limited clean performance, lack of provable robustness, and inability to adapt to localized patch attacks (see Section 7).

In this paper, we investigate countermeasures against the localized patch hiding attack in object detection. The localized patch attacker can arbitrarily modify image pixels within a restricted region and easily mount a physical-world attack by printing and attaching the adversarial patch to the object. The practical nature of patch attacks has made them the first choice of physical-world attacks against object detectors [7, 14, 47, 53, 56]. The focus of our work is on hiding attacks that aim to make the object detector fail to detect the victim object. This attack can cause serious consequences in scenarios like an autonomous vehicle missing an upcoming car and ending up in a car crash. To secure real-world object detectors from these threats, we propose DetectorGuard as the first general framework for building provably robust object detectors against localized patch hiding attacks. We design DetectorGuard with the following two key insights.

Insight I: Transferring robustness from image classifiers to object detectors. There has been significant advancement in robust image classification research in recent years [9, 10, 16, 20, 21, 30, 33, 34, 38, 44, 52, 54, 60], while object detectors remain vulnerable to attacks. In DetectorGuard, we aim to make use of well-studied robust image classifiers and transfer their robustness to object detectors. To achieve this, we leverage a key observation: almost all state-of-the-art image classifiers and object detectors use Convolutional Neural Networks (CNNs) as their backbone for feature extraction. The major difference lies in that an image classifier makes a prediction based on all extracted features (or all image pixels) while an object detector predicts each object using a small portion of features (or image pixels) at each location.

Figure 1: DetectorGuard overview. Base Detector predicts precise bounding boxes on clean images, and Objectness Predictor outputs a robust objectness map. Detection Matcher compares the outputs of Base Detector and Objectness Predictor to determine the final output. In the clean setting (left figure), the dog on the left is detected by both Base Detector and Objectness Predictor. This leads to a match, and DetectorGuard outputs the bounding box predicted by Base Detector. Meanwhile, the dog on the right is detected only by Base Detector. Detection Matcher considers this a benign mismatch, and DetectorGuard trusts Base Detector in this case by outputting its predicted bounding box. In the adversarial setting (right figure), a patch makes Base Detector fail to detect any object while Objectness Predictor still robustly outputs high activation. Detection Matcher detects a malicious mismatch and triggers an attack alert.

This observation suggests that we can build a robust object detector by doing robust image classification on every subset of extracted features (or image pixels). Towards this end, we build an Objectness Predictor by using a sliding window over the whole image or feature map and applying a robust image classifier for robust window classification at each location. We then securely aggregate and post-process all window classifications to generate a robust objectness map, in which each element indicates the objectness at its corresponding location. In Section 4.2, we prove the robustness of Objectness Predictor using the provable analysis of the robust image classifier.

Insight II: Mitigating the trade-off between clean performance and provable robustness. The robustness of security-critical systems usually comes at the cost of clean performance, making defense deployment less appealing. To mitigate this common trade-off, we design DetectorGuard in a manner such that our defense achieves substantial provable robustness and also maintains a clean performance that is close to state-of-the-art detectors. We provide our defense overview in Figure 1. DetectorGuard has three modules: Base Detector, Objectness Predictor, and Detection Matcher. Base Detector can be any state-of-the-art object detector that makes precise predictions on clean images but is vulnerable to patch hiding attacks. We build Objectness Predictor on top of a provably robust image classifier and use it for robust objectness predictions. We then use Detection Matcher to compare the outputs of Base Detector and Objectness Predictor, which will trigger an attack alert if and only if Objectness Predictor detects an object while Base Detector misses it. When no attack is detected, DetectorGuard outputs the predictions of Base Detector and thus has a high clean performance. When a hiding attack occurs, Base Detector could miss the object while Objectness Predictor can still robustly output high objectness activation. This mismatch will trigger an attack alert, and DetectorGuard will abstain from making predictions. Our design ensures that Objectness Predictor incorrectly missing objects (false negatives) will not hurt the clean performance of DetectorGuard (Figure 1 left), while Objectness Predictor robustly detecting objects provides a provable security guarantee for DetectorGuard (Figure 1 right). This approach mitigates the trade-off between clean performance and provable robustness.¹ In Section 4, we will rigorously show that DetectorGuard can achieve a similarly high clean performance as conventional detectors and prove the robustness of DetectorGuard on certified objects against any patch hiding attack considered in our threat model.

Desirable properties of DetectorGuard. DetectorGuard is the first provably robust defense for object detection against localized patch hiding attacks. Notably, DetectorGuard has four desirable properties. First, DetectorGuard has high detection performance in the clean setting because its clean predictions come from state-of-the-art detectors (when no false alert is triggered). Second, DetectorGuard is agnostic to attack algorithms and can provide strong provable robustness against any adaptive attack considered in our threat model. Third, DetectorGuard is agnostic to the design of Base Detector and is therefore compatible with any conventional object detector. Fourth, DetectorGuard is compatible with any robust image classification technique and can benefit from any progress in the relevant research.

We evaluate DetectorGuard performance on the PASCAL VOC [13] and MS COCO [23] datasets. In our evaluation, we instantiate Base Detector with a hypothetical perfect clean detector, YOLOv4 [2, 49], and Faster R-CNN [42].

¹ In contrast, the clean performance of traditional attack-detection-based defenses [30, 57] is bottlenecked by the errors of the defense module.

We implement Objectness Predictor using PatchGuard [54] as the building-block robust image classifier. Our evaluation shows that our defense has a minimal impact on clean performance and achieves the first provable robustness against patch hiding attacks.

Our contributions can be summarized as follows.

• We propose a general approach for transferring robustness from image classifiers to object detectors. Specifically, we build an Objectness Predictor using a robust image classifier and prove its robustness against any patch hiding attack within our threat model.

• We design a prediction pipeline that uses a combination of Base Detector and Objectness Predictor to catch an ongoing attack, and use it to mitigate the trade-off between clean performance and provable robustness.

• We extensively evaluate our defense on the PASCAL VOC [13] and MS COCO [23] datasets and demonstrate the first provable robustness against patch hiding attacks, as well as its high clean performance.

2   Problem Formulation

In this section, we first introduce the object detection task, followed by the localized patch hiding attack and defense formulation.

2.1   Object Detection

Detection objective. The goal of object detection is to predict a list of bounding boxes for all objects in the input image x ∈ [0, 1]^{W×H×C}, where pixel values are rescaled into [0, 1], and W, H, and C are the width, height, and number of channels of the image, respectively. Each bounding box b is represented as a tuple (xmin, ymin, xmax, ymax, l), where xmin, ymin, xmax, ymax together specify the coordinates of the bounding box, and l ∈ L = {0, 1, · · · , N − 1} denotes the predicted object label (N is the number of object classes).²

Conventional detectors. Object detection models can be categorized into two-stage and one-stage detectors depending on their detection pipelines. A two-stage detector first generates proposals for regions that might contain objects and then uses the proposed regions for object classification and bounding-box regression. Representative examples include Faster R-CNN [42] and Mask R-CNN [18]. In contrast, a one-stage detector performs detection directly on the input image without any region proposal step. SSD [26], YOLO [2, 39-41, 49], RetinaNet [22], and EfficientDet [46] are representative one-stage detectors.

Conventionally, a detection is considered correct when 1) the predicted label matches the ground truth and 2) the overlap between the predicted bounding box and the ground-truth box, measured by Intersection over Union (IoU), exceeds a certain threshold τ. We term a correct detection a true positive (TP). Any predicted bounding box that fails to satisfy both TP criteria is considered a false positive (FP). Finally, if a ground-truth object is not detected by any TP bounding box, it is a false negative (FN). Research on object detection aims to minimize FP and FN errors.

² Conventional object detectors usually output objectness scores and prediction confidences as well; we discard them in notation for simplicity.
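As a concrete reading of the TP criterion above, the following minimal Python sketch (ours, not from the paper; τ = 0.5 is an assumed threshold choice) checks whether one prediction counts as a true positive:

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)          # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, pred_label, gt_box, gt_label, tau=0.5):
    # A TP requires both a matching label and sufficient box overlap.
    return pred_label == gt_label and iou(pred_box, gt_box) > tau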
2.2   Attack Formulation

Attack objective. The hiding attack, also referred to as the false-negative (FN) attack, aims to make object detectors miss the detection of certain objects (which increases FN).³ The hiding attack can cause serious consequences in scenarios like an autonomous vehicle missing a pedestrian. Therefore, defending against patch hiding attacks is of great importance.

Attacker capability. The localized adversary is allowed to arbitrarily manipulate pixels within one restricted region.⁴ Formally, we can use a binary pixel mask pm ∈ {0, 1}^{W×H} to represent this restricted region, where the pixels within the region are set to 1. The adversarial image can then be represented as x′ = (1 − pm) ⊙ x + pm ⊙ x″, where ⊙ denotes the element-wise product operator and x″ ∈ [0, 1]^{W×H×C} is the content of the adversarial patch. pm is a function of patch size and patch location. The patch size should be limited such that the object remains recognizable by a human (otherwise, the attack is meaningless). For patch locations, we consider three different threat models: over-patch, close-patch, and far-patch, where the patch is over, close to (partial overlap), or far away from (no overlap) the victim object, respectively.

Previous works [27, 43] have shown that attacks against object detectors can succeed even when the patch is far away from the victim object. Therefore, defending against all three threat models is of interest.

³ We use "hiding attack" and "FN attack" interchangeably in this paper.
⁴ Provably robust defenses against one single patch are currently an open/unsolved problem, and hence the focus of this paper. In Appendix C, we will justify our one-patch threat model and discuss the implication of multiple patches.
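The mask-based formulation above can be made concrete in a few lines of NumPy; the image size and patch placement below are illustrative assumptions of ours, not values from the paper:

import numpy as np

W, H, C = 416, 416, 3
x = np.random.rand(W, H, C)               # stand-in clean image, pixels in [0, 1]
x_patch = np.random.rand(W, H, C)         # x'': arbitrary adversarial patch content

pm = np.zeros((W, H, 1))                  # binary pixel mask, broadcast over channels
px, py = 64, 64                           # assumed patch size
ox, oy = 100, 150                         # assumed patch location
pm[ox:ox + px, oy:oy + py] = 1.0

# x' = (1 - pm) ⊙ x + pm ⊙ x'': pixels inside the region are fully replaced.
x_prime = (1 - pm) * x + pm * x_patch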
2.3   Defense Formulation

Defense objective. We focus on defenses against patch hiding attacks. We consider our defense to be robust if 1) its detection on the clean image is correct and 2) the defense can detect part of the object or send out an attack alert on the adversarial image.⁵

Crucially, we design our defense to be provably robust: our defense can either detect the certified object or issue an alert regardless of what the adversary does (including any adaptive attacks within our threat model). This robustness property is agnostic to the attack algorithm and holds against an adversary that has full knowledge of our defense as well as access to the parameters of our defense model.

⁵ We note that in the adversarial setting, we only require the predicted bounding box to cover part of the object. This is because it is likely that only a small part of the object is recognizable due to the adversarial patch (e.g., the left dog in the right part of Figure 1). We provide additional justification for our defense objective in Appendix E.

Remark: primary focus on hiding attacks. In this paper, we focus on the hiding attack because it is the most fundamental and notorious attack against object detectors. We can visualize dividing the object detection task into two steps: 1) detecting the object bounding box and then 2) classifying the detected object. If the first step is compromised by the hiding attack, there is no hope for robust object detection. On the other hand, securing the first step against the patch hiding attack lays a foundation for robust object detection; we can design effective remediation for the second step if needed.

Take the application domain of autonomous vehicles (AVs) as an example: an AV missing the detection of an upcoming car could end up in a serious car accident. However, if the AV detects the upcoming object but predicts an incorrect class label (e.g., mistaking a car for a pedestrian), it can still make the correct decision of stopping and avoiding the collision. Moreover, in challenging application domains where the predicted class label is of great importance (e.g., traffic sign recognition), we can feed the detected bounding box to an auxiliary image classifier to re-determine the class label. The defense problem is then reduced to robust image classification, which has been studied by several previous works [21, 33, 54, 60]. Therefore, we make the hiding attack the primary focus of this paper and will also discuss the extension of DetectorGuard against other attacks in Section 6.

3   DetectorGuard

In this section, we first introduce the key insights and overview of DetectorGuard. We then detail the design of our defense components (Objectness Predictor, Detection Matcher) and our choice of the underlying robust image classifier.

3.1   Defense Overview

We leverage two key insights to design the DetectorGuard framework.

Insight I: exploiting the tight connection between image classification and object detection tasks to transfer the robustness from classifiers to detectors. We observe that almost all state-of-the-art image classifiers and object detectors use CNNs as their backbone for feature extraction. An image classifier makes a prediction based on all extracted features (or image pixels) while an object detector predicts each object using a partial feature map (or image pixels) at different locations. This observation motivates our design of a robust object detector using a robust image classifier. We use a sliding window over the entire image or feature map and perform robust classification on each window to determine whether there is an object. We then securely aggregate all window classifications for a robust object detection output. Our general approach transfers the robustness of image classifiers to object detectors so that robust object detection can also benefit from ongoing advances in robust image classification.

Insight II: using an ensemble prediction strategy to mitigate the trade-off between clean performance and provable robustness. It is well known that the robustness of machine-learning-based systems usually comes at the cost of clean performance (measured by TP, FP, and FN in object detection, as introduced in Section 2.1). To mitigate this common trade-off, we propose an ensemble prediction strategy that uses a robust detector and a state-of-the-art conventional object detector for catching an ongoing attack. We use the conventional detector to make precise predictions when no attack is detected, and use the robust detector to provide substantial robustness in the adversarial setting. The clean performance of this ensemble stays close to state-of-the-art detectors and can also be improved given any advances in benign/conventional object detection research.

DetectorGuard design. Recall that Figure 1 provides an overview of DetectorGuard, which will either output a list of bounding box predictions (left figure; clean setting) or an attack alert (right figure; adversarial setting). There are three major modules in DetectorGuard: Base Detector, Objectness Predictor, and Detection Matcher. Base Detector is responsible for making precise detections in the clean setting and can be any popular high-performance object detector such as YOLOv4 [2, 49] and Faster R-CNN [42]. Objectness Predictor is built on our first insight and aims to output a robust objectness map in the adversarial environment; the robustness is derived from its building block, a robust image classifier. Detection Matcher leverages the detection outputs of Base Detector and Objectness Predictor to catch a malicious attack using defined rules. When no attack is detected, DetectorGuard will output the detection results of Base Detector (i.e., a conventional detector), so that our clean performance is close to state-of-the-art detectors. When a patch hiding attack occurs, Base Detector can miss the object while Objectness Predictor is likely to robustly detect the presence of an object. This malicious mismatch will be caught by Detection Matcher, and DetectorGuard will send out an attack alert.

Algorithm pseudocode. We provide the pseudocode of DetectorGuard in Algorithm 1. The main procedure DG(·) has three sub-procedures: BASEDETECTOR(·), OBJPREDICTOR(·), and DETMATCHER(·). The sub-procedure BASEDETECTOR(·) can be any off-the-shelf detector, as discussed previously. We introduce the remaining two sub-procedures in the following subsections. All tensors/arrays are represented with bold symbols and scalars are in italic. All tensor/array indices start from zero; tensor/array slicing is in Python style (e.g., [i : j] means all indices k satisfying i ≤ k < j). We assume that the "background" class corresponds to the largest class index. We give a summary of important notation in Table 1.

Table 1: Summary of important notation

Notation    Description
x           input image
b           bounding box
om          objectness map
v           classification logits
l           classification label
N           number of object classes
(wx, wy)    window size
(px, py)    patch size
T           binarizing threshold
D           detection results
u, l        upper/lower bounds of the classification logits of each class
Algorithm 1 DetectorGuard

Input: input image x, window size (wx, wy), binarizing threshold T, Base Detector BASEDETECTOR(·), robust classification procedure RC(·), cluster detection procedure DETCLUSTER(·)
Output: robust detection D∗ or ALERT

1:  procedure DG(x, wx, wy, T)
2:      D ← BASEDETECTOR(x)                  ▷ Conventional detection
3:      om ← OBJPREDICTOR(x, wx, wy, T)      ▷ Objectness
4:      a ← DETMATCHER(D, om)                ▷ Detect hiding attacks
5:      if a == True then                    ▷ Malicious mismatch
6:          D∗ ← ALERT                       ▷ Trigger an alert
7:      else
8:          D∗ ← D                           ▷ Return Base Detector's predictions
9:      end if
10:     return D∗
11: end procedure

12: procedure OBJPREDICTOR(x, wx, wy, T)
13:     X, Y, _ ← SHAPE(x)
14:     ōm ← ZEROARRAY[X, Y, N + 1]          ▷ Initialization
15:     for each valid (i, j) do             ▷ Every window location
16:         l, v ← RC(x[i : i + wx, j : j + wy])      ▷ Classify
17:         ōm[i : i + wx, j : j + wy] ← ōm[i : i + wx, j : j + wy] + v      ▷ Add classification logits
18:     end for
19:     om ← BINARIZE(ōm, T · wx · wy)       ▷ Binarization
20:     return om
21: end procedure

22: procedure DETMATCHER(D, om)              ▷ Match each detected box to the objectness map
23:     for i ∈ {0, 1, · · · , |D| − 1} do
24:         xmin, ymin, xmax, ymax, l ← b ← D[i]
25:         if SUM(om[xmin : xmax, ymin : ymax]) > 0 then
26:             om[xmin : xmax, ymin : ymax] ← 0
27:         end if
28:     end for
29:     if DETCLUSTER(om) is None then
30:         return False                     ▷ All objectness explained
31:     else
32:         return True                      ▷ Unexplained objectness
33:     end if
34: end procedure
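For readers who prefer running code, the top-level dispatch of Algorithm 1 translates almost directly into Python. The following sketch is ours and assumes base_detector, obj_predictor, and det_matcher callables matching the pseudocode, with ALERT modeled as a sentinel string:

def detector_guard(x, base_detector, obj_predictor, det_matcher, wx, wy, T):
    detections = base_detector(x)                   # conventional detection D
    om = obj_predictor(x, wx, wy, T)                # binary objectness map
    attack_detected = det_matcher(detections, om)   # True on a malicious mismatch
    return "ALERT" if attack_detected else detections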
3.2   Objectness Predictor

Objectness Predictor is built using our Insight I and aims to output a robust objectness prediction map in an adversarial environment. In doing so, we use a sliding window over the image (or feature map) to make robust window classifications, and then post-process the window classifications to generate the objectness map. Objectness Predictor is designed to be provably robust against patch hiding attacks. We introduce this prediction pipeline in this subsection and analyze its provable robustness in Section 4.2.

Robust window classification. The pseudocode of Objectness Predictor is presented as OBJPREDICTOR(·) in Algorithm 1. The key operation is to use a sliding window and make window classifications at different locations.⁶ Each window classification aims to predict the object class or "background" based on all pixels (or features) within the window. To make the window classification robust even when some pixels (or features) are corrupted by the adversarial patch, we apply a robust classification technique (Line 16). For each window location, represented as (i, j), we feed the corresponding window x[i : i + wx, j : j + wy] to the robust classification sub-procedure RC(·) to get the classification label l and the classification logits v ∈ R^{N+1} for the N object classes and the "background" class. DetectorGuard is compatible with any robust classification technique, and we treat RC(·) as a black-box procedure in DetectorGuard. We postpone the discussion of RC(·) until Section 3.4 for ease of presentation.

Objectness map generation. Given the robust window classification results, we aim to output an objectness map that indicates the objectness (i.e., a confidence score indicating the likelihood of the presence of an object) at each location. First, we generate an all-zero array ōm for holding the objectness scores (Line 14); each objectness vector in ōm has N + 1 elements for all object classes plus the "background" class. Next, for each window classification, we add the logits v to every objectness vector located within the window (Line 17). After accumulating objectness scores from all sliding windows, we binarize ōm to obtain the binary objectness map om ∈ {0, 1}^{X×Y} as the final output (Line 19). In BINARIZE(·), we examine each location in ōm. If the maximum objectness score over the non-background classes at that location is larger than the threshold T · wx · wy, we set the objectness score in om to one; otherwise, it is set to zero. We note that we discard the information of the classification label l in this binarization operation. This helps reduce FPs when the model correctly detects the object but fails to predict the correct label, which can happen frequently between similar object classes like bicycle-vs-motorbike.

⁶ We note that the sliding window can be either in the pixel space or the feature space; we abuse the notation x to let it represent either an input image or an extracted feature map in OBJPREDICTOR(·). A discussion of pixel-space and feature-space windows is available in Appendix F.
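The accumulate-then-binarize loop of OBJPREDICTOR(·) is easy to mirror in NumPy. The sketch below is ours; it treats the robust window classifier rc as a black box returning a logits vector over N + 1 classes, with the background class assumed to sit at the last index:

import numpy as np

def obj_predictor(fmap, rc, wx, wy, num_classes, T):
    """fmap: (X, Y, ...) image or feature map; rc(window) -> logits of shape (N+1,)."""
    X, Y = fmap.shape[0], fmap.shape[1]
    om_bar = np.zeros((X, Y, num_classes + 1))        # accumulated logits
    for i in range(X - wx + 1):                       # every valid window location
        for j in range(Y - wy + 1):
            v = rc(fmap[i:i + wx, j:j + wy])          # robust window classification
            om_bar[i:i + wx, j:j + wy] += v           # add logits to covered cells
    # Binarize: max over non-background classes, threshold scaled by window area.
    fg_max = om_bar[:, :, :num_classes].max(axis=-1)
    return (fg_max > T * wx * wy).astype(np.uint8)    # binary objectness map om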

Remark: limitation of Objectness Predictor. We note that the underlying robust image classifier RC(·) in Objectness Predictor usually suffers from a trade-off between robustness and clean performance; therefore, Objectness Predictor can sometimes be imprecise on clean images (e.g., missing objects). However, as we discuss next, this limitation will not significantly hurt the clean performance of DetectorGuard due to our special ensemble structure inspired by Insight II.

3.3   Detection Matcher

Detection Matcher leverages our Insight II to mitigate the trade-off between provable robustness and clean performance. It takes as inputs the predicted bounding boxes of Base Detector and the generated objectness map of Objectness Predictor, and tries to match each predicted bounding box to a high-activation region in the objectness map. Detection Matcher will label each matching attempt as either a match, a malicious mismatch, or a benign mismatch. The matching results determine the final prediction of DetectorGuard. We first introduce the high-level matching rules and then elaborate on the matching algorithm.

Matching rules. A match corresponds to both Base Detector and Objectness Predictor detecting an object at a certain location, while a mismatch corresponds to only one of them detecting an object. There are three possible matching outcomes, each leading to a different prediction strategy:

• A match happens when Base Detector and Objectness Predictor reach a consensus on an object at a specific location. In this simplest case, our defense will assume the detection is correct and output the precise bounding box predicted by Base Detector.

• A malicious mismatch will be flagged when only Objectness Predictor detects the object. This is most likely to happen when a hiding attack succeeds in fooling the conventional detector to miss the object while our Objectness Predictor still makes robust predictions. In this case, our defense will send out an attack alert.

• A benign mismatch occurs when only Base Detector detects the object. This can happen when Objectness Predictor incorrectly misses the object due to its limitations (recall the trade-off between robustness and clean performance). In this case, we trust Base Detector and output its predicted bounding box. We note that this mismatch can also be caused by other attacks that are orthogonal to the focus of this paper (we focus on the hiding attack). We will discuss strategies for defending against other attacks in Section 6.

Next, we discuss the concrete procedure for determining matching outcomes and applying the corresponding prediction strategies.

Processing detected bounding boxes. Lines 23-28 of Algorithm 1 demonstrate the matching process for each detected bounding box. For each box b, we get its coordinates xmin, ymin, xmax, ymax and calculate the sum of objectness scores within the same box on the objectness map. If the objectness sum is larger than zero, we assume that the bounding box b correctly matches the objectness map om. Next, we zero out the corresponding region in om to indicate that this region of objectness has been explained by the detected bounding box. On the other hand, if all objectness scores are zero, we assume it is a benign mismatch, and the algorithm does nothing.

Processing the objectness map. The final step of the matching is to analyze the objectness map om. We use the sub-procedure DETCLUSTER(·) to determine if any non-zero points in om form a large cluster. Specifically, we choose DBSCAN [12] as the cluster detection algorithm, which will assign each point to a certain cluster or label it as an outlier based on the point density in its neighborhood. If DETCLUSTER(om) returns None, it means that no cluster is found and that all objectness activations predicted by Objectness Predictor are explained by the predicted bounding boxes of Base Detector, and DETMATCHER(·) returns False. On the other hand, receiving a non-empty cluster set indicates that there are clusters of unexplained objectness activations in om (i.e., Base Detector misses an object but Objectness Predictor detects an object). Detection Matcher will regard this as a sign of patch hiding attacks and return True.

Final output. Lines 5-10 demonstrate the strategy for the final prediction. If the alert flag a is True (i.e., a malicious mismatch is detected), DetectorGuard returns D∗ = ALERT. In other cases, DetectorGuard returns the detection D∗ = D.
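Putting the two matching steps together, here is a hedged Python sketch of DETMATCHER(·) written by us; note that scikit-learn's DBSCAN names the density parameter min_samples rather than min_points, and the defaults below are placeholders:

import numpy as np
from sklearn.cluster import DBSCAN

def det_matcher(boxes, om, eps=3, min_samples=28):
    """boxes: list of (xmin, ymin, xmax, ymax, label); om: binary objectness map."""
    om = om.copy()
    for (xmin, ymin, xmax, ymax, _label) in boxes:
        if om[xmin:xmax, ymin:ymax].sum() > 0:    # box explains high objectness
            om[xmin:xmax, ymin:ymax] = 0          # zero out the explained region
    points = np.argwhere(om > 0)                  # leftover (unexplained) activations
    if len(points) < min_samples:
        return False                              # too few points to form a cluster
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(points).labels_
    return bool((labels != -1).any())             # any cluster => malicious mismatch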
3.4   Robust Image Classifier in Objectness Predictor

In this subsection, we discuss the design choice of the robust classifier in Objectness Predictor. Our approach is compatible with any image classifier that is provably robust against adversarial patch attacks. In this paper, we follow PatchGuard [54] to build the robust image classifier RC(·), as it is a general defense framework and it subsumes several defense instances [21, 33, 54, 60] that have state-of-the-art provable robustness and clean accuracy.

PatchGuard: backbone CNNs with small receptive fields. The PatchGuard framework [54] proposes to use a CNN with small receptive fields to limit the impact of a localized adversarial patch. The receptive field of a CNN is the input pixel region that each extracted feature looks at, or is affected by. If the receptive field of a CNN is too large, then a small adversarial patch has the potential to corrupt most extracted features and easily manipulate the model behavior [27, 43, 54].

There are two main design choices for CNNs with a small receptive field: the BagNet architecture [3] and an ensemble architecture using small pixel patches [21]. In our evaluation, we select BagNet as the backbone CNN for our Objectness Predictor since it achieves state-of-the-art performance on high-resolution images and is also more efficient [54].

PatchGuard: secure feature aggregation. The use of BagNet ensures that a small adversarial patch is able to corrupt only a small number of extracted features. The second step in PatchGuard is to perform a secure aggregation technique on the extracted features; design choices include clipping [54, 60], masking [54], and majority voting [21, 33]. In this paper, we use robust masking due to its state-of-the-art provable robustness for high-resolution image classification [54]. We provide more details of robust masking as well as its provable classification analysis in Appendix H. We will also discuss and implement other aggregation techniques in Appendix B to demonstrate the generality of our framework. Next, we discuss how to specifically adapt and train these building blocks in the context of object detection.

Training image classifiers with object detection datasets. Each image in an object detection dataset has multiple objects with different class labels. To train an image classifier given a list of bounding boxes and labels, we first map pixel-space bounding boxes to the feature space and get a list of cropped feature maps and labels (details of box mapping are in Appendix F). We then teach BagNet to make a correct prediction on each cropped feature map by minimizing the cross-entropy loss between the aggregated feature prediction and the one-hot encoded label vector. In addition, we aggregate all features outside any feature box as the "negative" feature vector for the "background" classification.
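The following PyTorch-style sketch illustrates this training signal under our own simplifying assumptions: a batch size of one, feature-space boxes precomputed by the box mapping, and an assumed aggregate head that turns any cropped feature map into class logits:

import torch
import torch.nn.functional as F

def classifier_loss(bagnet, aggregate, image, feat_boxes, labels, background_cls):
    """feat_boxes: object boxes already mapped into feature-space coordinates."""
    fmap = bagnet(image.unsqueeze(0))[0]              # (C, Hf, Wf) feature map
    outside = torch.ones(fmap.shape[1:], dtype=torch.bool)
    loss = fmap.new_zeros(())
    for (x0, y0, x1, y1), y in zip(feat_boxes, labels):
        crop = fmap[:, x0:x1, y0:y1]                  # features inside one object box
        logits = aggregate(crop)                      # aggregated class prediction
        loss = loss + F.cross_entropy(logits.unsqueeze(0), torch.tensor([y]))
        outside[x0:x1, y0:y1] = False                 # mark region as foreground
    # Features outside every box supervise the "background" class.
    neg = fmap[:, outside].unsqueeze(-1)              # (C, K, 1) negative features
    loss = loss + F.cross_entropy(aggregate(neg).unsqueeze(0),
                                  torch.tensor([background_cls]))
    return loss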
4   Theoretical Defense Analysis

In this section, we theoretically analyze the defense performance in the clean and adversarial settings. In the clean setting, we analyze the impact of false positives and false negatives in the Objectness Predictor module and show how DetectorGuard can achieve clean performance that is only slightly lower than state-of-the-art detectors. In the adversarial setting, we formally show that DetectorGuard can achieve certified/provable robustness against patch hiding attacks.

4.1   Clean Performance

Here, we analyze the performance of the defense in the clean setting. Recall that DetectorGuard is an ensemble of Base Detector and Objectness Predictor. When we instantiate Base Detector with a state-of-the-art object detector that rarely makes mistakes on clean images (i.e., D is typically correct), Objectness Predictor becomes the major source of errors in DetectorGuard.

A false negative (FN) of Objectness Predictor will not hurt the clean performance of DetectorGuard. Objectness Predictor has an FN when it fails to output high objectness activation for certain objects. Fortunately, this FN of Objectness Predictor will not hurt the performance of DetectorGuard because our defense will label it as a benign mismatch and trust the high-performance Base Detector by taking D as the final output (as introduced in Section 3.3).

A false positive (FP) of Objectness Predictor will trigger a false alert of DetectorGuard. Objectness Predictor has an FP when it incorrectly outputs high objectness activation for regions that do not contain any real object. The FP will result in unexplained objectness activation in Detection Matcher and cause a false alert. Let tp, fp, fn be the TP, FP, FN of Base Detector (i.e., the vanilla undefended object detector), and let fa be the number of objects within clean images on which DetectorGuard has false alerts. The TP, FP, and FN of DetectorGuard then satisfy: tp′ ≥ tp − fa, fp′ ≤ fp, fn′ ≤ fn + fa. Therefore, we aim to optimize for a low fa in DetectorGuard, or equivalently a low FP in Objectness Predictor, which can be achieved with properly chosen hyper-parameters, as will be shown in Section 5.5.
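As a worked example with assumed counts (not from the paper's evaluation): if Base Detector achieves tp = 90, fp = 5, fn = 10 on some clean dataset and DetectorGuard falsely alerts on fa = 2 objects, the bounds guarantee tp′ ≥ 88, fp′ ≤ 5, and fn′ ≤ 12; as fa approaches zero, DetectorGuard's clean performance approaches that of Base Detector.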
In summary, DetectorGuard has a slightly lower clean performance compared with state-of-the-art detectors when we optimize for a low FP in Objectness Predictor (resulting in few false alerts in DetectorGuard). This small clean performance drop is worthwhile given the provable robustness of DetectorGuard, which we discuss in the next subsection.

4.2   Provable Robustness

Recall that we consider DetectorGuard to be provably robust for a given object (in a given image) when it can make a correct detection on the clean image and will either detect part of the object or issue an alert in the presence of any patch hiding attacker within our threat model. In this subsection, we first show the sufficient condition for the provable robustness of DetectorGuard, then present our provable analysis algorithm, and finally prove its soundness.

Sufficient condition for DetectorGuard's robustness. First, we show in Lemma 1 that the robustness of Objectness Predictor implies the robustness of DetectorGuard. We abuse the notation "∈" by letting b ∈ D denote that one predicted box b̄ in D matches the ground-truth box b, and letting b ∈ om denote that the objectness map om has high objectness activation that matches b.

Lemma 1. Consider a given object in an image, which is represented as a bounding box b and can be correctly detected by DetectorGuard in a clean image x. DetectorGuard has provable robustness to any valid adversarial image x′, i.e., b ∈ D∗ or D∗ = ALERT for D∗ = DG(x′), if Objectness Predictor is robust to any valid adversarial image x′, i.e., b ∈ om = OBJPREDICTOR(x′).

Proof. We prove by contradiction. Suppose that DetectorGuard is vulnerable to an adversarial image x′.

Then we have that 1) D∗ ≠ ALERT and 2) b ∉ D∗.

From b ∈ om = OBJPREDICTOR(x′) and D∗ ≠ ALERT, we must have b ∈ D = BASEDETECTOR(x′) to avoid an ALERT. Since no alert is triggered, DG(·) returns D∗ = D. We then have b ∈ D = D∗, which contradicts condition 2) b ∉ D∗. Thus, DetectorGuard must not be vulnerable to any adversarial image x′ when Objectness Predictor is robust.

Provable robustness of DetectorGuard. We use the provable analysis of the robust image classifier, denoted as RC-PA(·), as the building block to prove the robustness of DetectorGuard. Given the provable analysis procedure RC-PA(·), we can reason about the objectness map output by Objectness Predictor. If its worst-case output still has high objectness activation, we can certify the provable robustness of Objectness Predictor. Finally, using Lemma 1, we can derive the robustness of DetectorGuard.

We present the provable analysis of DetectorGuard in Algorithm 2. The algorithm takes a clean image x, a ground-truth object bounding box b, and a set of valid patch locations P as inputs, and will determine whether the object in bounding box b in the image x has provable robustness against any patch at any location in P. We state the correctness of Algorithm 2 in Theorem 1 and explain the algorithm details by proving the theorem.

Algorithm 2 Provable Analysis of DetectorGuard

Input: input image x, window size (wx, wy), binarizing threshold T, the set of patch locations P, the object bounding box b, provable analysis of the robust classifier RC-PA(·), cluster detection procedure DETCLUSTER(·)
Output: whether the object b in x has provable robustness

1:  procedure DG-PA(x, wx, wy, T, P, b)
2:      if b ∉ DG(x, wx, wy, T) then
3:          return False                     ▷ Clean detection is incorrect
4:      end if
5:      for each p ∈ P do                    ▷ Check every patch location
6:          x, y, px, py ← p
7:          r ← DG-PA-ONE(x, x, y, wx, wy, px, py, b, T)
8:          if r == False then
9:              return False                 ▷ Possibly vulnerable
10:         end if
11:     end for
12:     return True                          ▷ Provably robust
13: end procedure

14: procedure DG-PA-ONE(x, x, y, wx, wy, px, py, b, T)
15:     X, Y, _ ← SHAPE(x)
16:     ōm∗ ← ZEROARRAY[X, Y, N + 1]         ▷ Initialization
        ▷ Generate the worst-case objectness map for analysis
17:     for each valid (i, j) do             ▷ Every window location
18:         u, l ← RC-PA(x, x − i, y − j, px, py, mx, my)
19:         ōm∗[i : i + wx, j : j + wy] ← ōm∗[i : i + wx, j : j + wy] + l      ▷ Add worst-case (lower-bound) logits
20:     end for
21:     om∗ ← BINARIZE(ōm∗, T · wx · wy)     ▷ Binarization
22:     xmin, ymin, xmax, ymax, l ← b
23:     if DETCLUSTER(om∗[xmin : xmax, ymin : ymax]) is None then
24:         return False                     ▷ No high objectness left
25:     else
26:         return True                      ▷ High worst-case objectness
27:     end if
28: end procedure

Theorem 1. Given an object bounding box b in a clean image x, a set of patch locations P, window size (wx, wy), and binarizing threshold T (used in DG(·)), if Algorithm 2 returns True, i.e., DG-PA(x, wx, wy, T, P, b) = True, then DetectorGuard has provable robustness for the object b against any patch hiding attack using any patch location in P.

Proof. DG-PA(·) first calls DG(·) of Algorithm 1 to determine whether DetectorGuard can detect the object bounding box b in the clean image x. The algorithm proceeds only when the clean detection is correct (Lines 2-4).

Next, we iterate over each patch location in P and call the sub-procedure DG-PA-ONE(·), which analyzes worst-case behavior over all possible adversarial strategies, to determine the model robustness. If any call of DG-PA-ONE(·) returns False, the algorithm returns False, indicating that at least one patch location might bypass our defense. On the other hand, if the algorithm tries all valid patch locations and never returns False, then DetectorGuard is provably robust to all patch locations in P, and the algorithm returns True.

In the sub-procedure DG-PA-ONE(·), we analyze the robustness of Objectness Predictor against the given patch location. We use the provable analysis of the robust image classifier (i.e., RC-PA(·)) to determine the lower/upper bounds of the classification logits for each window. If the aggregated worst-case (i.e., lower-bound) objectness map still has high activation for the object of interest, we can certify the robustness of Objectness Predictor and then of DetectorGuard (by Lemma 1).

As shown in the DG-PA-ONE(·) pseudocode, we first initialize a zero array ōm∗ to hold the worst-case objectness scores. We then iterate over each sliding window and call RC-PA(·), which takes the image x (or feature map, as discussed in Section 3.2), relative patch coordinates (x − i, y − j), and patch size (px, py) as inputs, and outputs the upper bound u and lower bound l of the classification logits.⁷ Since the goal of the hiding attack is to minimize the objectness scores, we add the lower bound of the classification logits to ōm∗. After we analyze all valid windows, we call BINARIZE(·) to get the worst-case objectness map om∗ (recall that the logit values for "background" are discarded in binarization).

⁷ We treat RC-PA(·) as a black-box sub-procedure in Algorithm 2; more details for RC-PA(·) are available in Appendix H.

                                                                    8
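To make the worst-case analysis concrete, the sketch below mirrors the structure of DG-PA-ONE(·) in NumPy. This is a minimal illustration under our own simplifying assumptions, not the released implementation: rc_pa_lower_bound (standing in for RC-PA(·)) and det_cluster are assumed helper callables, and the max-over-classes reading of BINARIZE(·) is one plausible simplification.

    import numpy as np

    def binarize(om_bar, thres):
        # Drop the "background" channel (last), then mark a location as
        # activated if its largest aggregated class logit exceeds the
        # threshold (a simplified reading of BINARIZE).
        return (om_bar[..., :-1].max(axis=-1) > thres).astype(np.uint8)

    def dg_pa_one(fmap, patch_xy, patch_size, window, bbox, T,
                  rc_pa_lower_bound, det_cluster):
        # fmap: [X, Y, N+1] image or feature map; window: (wx, wy);
        # bbox: (xmin, ymin, xmax, ymax); T: binarizing threshold.
        X, Y, n_cls = fmap.shape
        wx, wy = window
        x, y = patch_xy
        om_bar = np.zeros((X, Y, n_cls))         # worst-case objectness map
        for i in range(X - wx + 1):              # every valid window location
            for j in range(Y - wy + 1):
                # Lower bound of this window's class logits over all possible
                # patch contents at the given relative patch location.
                l = rc_pa_lower_bound(fmap, (x - i, y - j), patch_size)
                om_bar[i:i + wx, j:j + wy] += l  # accumulate lower-bound logits
        om_star = binarize(om_bar, T * wx * wy)
        xmin, ymin, xmax, ymax = bbox
        # Certified only if a cluster of high worst-case objectness survives
        # inside the ground-truth bounding box.
        return det_cluster(om_star[xmin:xmax, ymin:ymax]) is not None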
5     Evaluation

In this section, we provide a comprehensive evaluation of DetectorGuard on the PASCAL VOC [13] and MS COCO [23] datasets. We first introduce the datasets and models used in our evaluation, followed by our evaluation metrics. We then report our main evaluation results on different models and datasets, and finally discuss the effect of hyper-parameters.

5.1    Datasets and Models

Dataset: PASCAL VOC [13]. The detection challenge of the PASCAL Visual Object Classes (VOC) project is a popular object detection benchmark with annotations for 20 different classes. We take trainval2007 (5k images) and trainval2012 (11k images) as our training set and evaluate our defense on test2007 (5k images), which is a conventional usage of the PASCAL VOC dataset [26, 59].

Dataset: MS COCO [23]. The Microsoft Common Objects in COntext (COCO) dataset is an extremely challenging object detection dataset with 80 annotated common object categories. We use the training and validation sets of COCO2017 for our experiments. The training set has 117k images, and the validation set has 5k images.

Base Detector model: YOLOv4 [2, 49]. YOLOv4 [2] is a state-of-the-art one-stage detector that achieves optimal speed and accuracy of object detection. We choose Scaled-YOLOv4-P5 [49] in our evaluation. We adopt the same image pre-processing pipeline and network architecture as proposed in the original paper. For MS COCO, we use the pre-trained model. For PASCAL VOC, we do transfer learning by fine-tuning the model previously trained on MS COCO.

Base Detector model: Faster R-CNN [42]. Faster R-CNN is a representative two-stage detector. We use ResNet101-FPN as its backbone network. The image pre-processing and model architecture follow the original paper. We use pre-trained models for MS COCO and do transfer learning to train a PASCAL VOC detector.

Base Detector model: a perfect clean detector (PCD). We use the ground-truth annotations to simulate a perfect clean detector. The perfect clean detector always makes correct detections in the clean setting but is assumed vulnerable to patch hiding attacks. This hypothetical detector ablates the errors of Base Detector and helps us better understand the behavior of Objectness Predictor and Detection Matcher.

Objectness Predictor model: BagNet-33 [3]. We use BagNet-33, which has a 33×33 receptive field, as the backbone network of Objectness Predictor. We zero-pad each image to a square and resize it to 416×416 before feeding it to BagNet. We take a BagNet model that is pre-trained on ImageNet [11] and fine-tune it on our detection datasets.

Default hyper-parameters. In Objectness Predictor, we use a sliding window in the feature space and set the default feature-space window size to 14. We discuss the mapping between pixel space and feature space in Appendix F. In the Detection Matcher, we set the default threshold to 10. In our DETCLUSTER(·), we use the DBSCAN [12] algorithm with eps = 3 and min_points = 28. We analyze the effect of different hyper-parameters in Section 5.5. We will also release our source code upon publication.
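For reference, DETCLUSTER(·) with these default parameters maps naturally onto an off-the-shelf DBSCAN implementation. The sketch below uses scikit-learn (whose API names the min_points parameter min_samples); it is our minimal reading of the procedure, not necessarily the exact released code:

    import numpy as np
    from sklearn.cluster import DBSCAN

    def det_cluster(binary_map, eps=3, min_points=28):
        # Cluster the coordinates of activated locations in a binarized
        # objectness map; return None when no high-objectness cluster exists.
        points = np.argwhere(binary_map > 0)
        if len(points) < min_points:
            return None
        labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(points)
        clusters = [points[labels == k] for k in set(labels) if k != -1]
        return clusters or None                  # label -1 marks DBSCAN noise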
5.2    Metrics

Clean performance: precision and recall. We calculate precision as TP/(TP+FP) and recall as TP/(TP+FN). For clean images without a false alert, we follow previous works [8, 59] in setting the IoU threshold τ = 0.5 and count TPs, FPs, and FNs in the conventional manner. For images with false alerts, we set TP and FP to zero, and FN to the number of ground-truth objects, since no bounding box is predicted. We note that conventional detectors use a confidence threshold to filter out bounding boxes with low confidence values. As a result, different confidence thresholds give different precision and recall values; we therefore plot the entire precision-recall curve to show the model performance.
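This per-image counting rule can be summarized in a short sketch. Here iou_match, standing in for conventional IoU-based matching at τ = 0.5, is an assumed helper, and one-to-one matching is assumed for simplicity:

    def count_tp_fp_fn(pred_boxes, gt_boxes, false_alert, iou_match, tau=0.5):
        # TP/FP/FN counts for one image under DetectorGuard's evaluation rule.
        if false_alert:
            # An alerted image predicts no boxes: every ground truth is a miss.
            return 0, 0, len(gt_boxes)
        tp = sum(1 for box in pred_boxes if iou_match(box, gt_boxes, tau))
        fp = len(pred_boxes) - tp
        fn = len(gt_boxes) - tp
        return tp, fp, fn

    # Aggregated over all images: precision = TP/(TP+FP), recall = TP/(TP+FN).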
Clean performance: average precision (AP). To remove the dependence on the confidence threshold and to have a global view of model performance, we also report AP, as done in object detection research [13, 23]. We vary the confidence threshold from 0 to 1, record the precision and recall at the different thresholds, and calculate AP as the averaged precision at different recall levels.
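A minimal sketch of this computation is given below; it averages precision over recall levels via all-point interpolation, while the exact interpolation conventions of VOC [13] and COCO [23] differ in their details:

    import numpy as np

    def average_precision(precisions, recalls):
        # Sort the recorded operating points by recall.
        order = np.argsort(recalls)
        r = np.asarray(recalls, dtype=float)[order]
        p = np.asarray(precisions, dtype=float)[order]
        # Monotone envelope: best precision achievable at recall >= r[i].
        for i in range(len(p) - 2, -1, -1):
            p[i] = max(p[i], p[i + 1])
        # Area under the envelope: precision summed over recall increments.
        return float(r[0] * p[0] + np.sum((r[1:] - r[:-1]) * p[1:]))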
Clean performance: false alert rate (FAR@0.x). FAR is defined as the percentage of clean images on which DetectorGuard triggers a false alert. We note that FAR is also closely tied to the confidence threshold of Base Detector: a higher confidence threshold leads to fewer predicted bounding boxes, which leaves more high objectness activation unexplained and thus yields a higher FAR. We report FAR at different recall levels for a global evaluation, and use FAR@0.x to denote the FAR at a clean recall of 0.x.

Provable robustness: certified recall (CR@0.x). We use certified recall as the robustness metric against patch hiding attacks. The certified recall is defined as the percentage of ground-truth objects that have provable robustness against any patch hiding attack. Recall that an object has provable robustness when DetectorGuard can detect the object in the clean setting and Objectness Predictor outputs high objectness activation in the worst case (as discussed in Section 2.3 and Section 4.2). Note that CR is affected by the performance of Base Detector (e.g., its confidence threshold), and we use CR@0.x to denote the certified recall at a clean recall of 0.x.
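In code, the metric reduces to a simple ratio over ground-truth objects; is_certified(·) below stands for the two conditions checked by Algorithm 2 (correct clean detection plus high worst-case objectness):

    def certified_recall(objects, is_certified):
        # Fraction of ground-truth objects certified robust by DG-PA(·).
        return sum(1 for obj in objects if is_certified(obj)) / len(objects)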
5.3    Clean Performance

In this subsection, we evaluate the clean performance of DetectorGuard with three different base detectors and two datasets. In Table 2, we report the AP of the vanilla Base Detector (AP w/o defense), the AP of DetectorGuard (AP w/ defense), and the FAR at a clean recall of 0.8 or 0.6 (FAR@0.8 or FAR@0.6). We also plot the precision-recall and FAR-recall curves for PASCAL VOC in Figure 2 for detailed model analysis; a similar plot for MS COCO is in Appendix D.

Table 2: Clean performance of DetectorGuard

                                          PASCAL VOC                                  MS COCO
                            AP w/o defense  AP w/ defense  FAR@0.8     AP w/o defense  AP w/ defense  FAR@0.6
    Perfect clean detector       100%           98.3%        1.5%           100%           96.3%        3.8%
    YOLOv4                       92.6%          91.3%        4.1%           73.4%          71.2%        4.1%
    Faster R-CNN                 90.0%          88.7%        2.7%           66.7%          64.7%        3.5%

[Figure 2 omitted: precision-recall and FAR-recall curves.]
Figure 2: Clean performance of DetectorGuard on PASCAL VOC (V – vanilla; DG – DetectorGuard; PCD – perfect clean detector; FRCNN – Faster R-CNN)
DetectorGuard has a low FAR and a high AP. We can see from Table 2 that DetectorGuard has a low FAR of 1.5% and a high AP of 98.3% on PASCAL VOC when we use a perfect clean detector as Base Detector. This result shows that DetectorGuard has only a minimal impact on the clean performance.

DetectorGuard is highly compatible with different conventional detectors. From Table 2 and Figure 2, we can see that when we use YOLOv4 or Faster R-CNN as Base Detector, the clean AP as well as the precision-recall curve of DetectorGuard are close to those of its vanilla Base Detector. Furthermore, the FAR@0.8 for PASCAL VOC is as low as 4.1% for YOLOv4 and 2.7% for Faster R-CNN. These results show that DetectorGuard is highly compatible with different conventional detectors.

DetectorGuard works well across different datasets. The observation of high clean performance holds across both datasets: DetectorGuard achieves a low FAR and an AP similar to that of the vanilla Base Detector on both PASCAL VOC and MS COCO (the precision-recall plot for MS COCO is available in Appendix D). These results show that DetectorGuard is a general approach and can be used for both easier and more challenging detection tasks.

5.4    Provable Robustness

In this subsection, we first introduce the robustness evaluation setup and then report the provable robustness of our defense against any patch hiding attack within our threat model.

Setup. We use a 32×32 adversarial pixel patch on the re-scaled and padded 416×416 images to evaluate the provable robustness.8 We consider all possible image locations as candidate locations for the adversarial patch. We categorize our results into three categories depending on the distance between an object and the patch location. When the patch is totally over the object, we consider it over-patch. When the patch partially overlaps with the object, we consider it close-patch. All other patch locations are considered far-patch. For each patch location and each object, we use Algorithm 2 to determine the robustness. We note that this algorithm already considers all possible adaptive attacks (attacker strategies) within our threat model. We use CR@0.x as the robustness metric, and we also report the percentage of objects that can be detected by Objectness Predictor in the clean setting as Max-CR. We call it Max-CR because DetectorGuard can only certify the robustness of objects that are detected by Objectness Predictor. Given the large number of possible patch locations, we only use a 400-image subset of the test/validation datasets for evaluation (due to computational constraints).

8 DPatch [27] demonstrates that even a 20×20 adversarial patch at the image corner can have a malicious effect. In Appendix A, we show that more than 15% of PASCAL VOC objects and 44% of MS COCO objects are smaller than a 32×32 patch. We also provide robustness results for different patch sizes as well as visualizations in Appendix A.
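The far/close/over categorization can be written directly in terms of patch-object geometry. The sketch below assumes axis-aligned boxes in (xmin, ymin, xmax, ymax) form:

    def categorize_patch(patch_box, obj_box):
        # Classify a patch location relative to an object bounding box.
        px0, py0, px1, py1 = patch_box
        ox0, oy0, ox1, oy1 = obj_box
        inside = px0 >= ox0 and py0 >= oy0 and px1 <= ox1 and py1 <= oy1
        overlaps = px0 < ox1 and px1 > ox0 and py0 < oy1 and py1 > oy0
        if inside:
            return "over-patch"     # patch entirely over the object
        if overlaps:
            return "close-patch"    # partial overlap with the object
        return "far-patch"          # no overlap with the object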
Table 3: Provable robustness of DetectorGuard

                               PASCAL VOC (CR@0.8)                   MS COCO (CR@0.6)
                            far-patch  close-patch  over-patch   far-patch  close-patch  over-patch
    Perfect clean detector    29.6%       21.9%        7.4%        9.5%        4.9%         2.4%
    YOLOv4                    26.6%       19.9%        7.1%        8.0%        4.7%         2.4%
    Faster R-CNN              27.9%       21.2%        6.7%        8.6%        4.9%         2.4%

DetectorGuard achieves the first non-trivial provable robustness against patch hiding attacks. We report the certified recall at a clean recall of 0.8 or 0.6 (CR@0.8 or CR@0.6) in Table 3. As shown in the table, DetectorGuard can certify the robustness of around 30% of PASCAL VOC objects when the patch is far away from the object, which means that no attack within our threat model can successfully attack these certified objects. We also plot the CR-recall curve for PASCAL VOC in Figure 3 (a similar plot for MS COCO is in Appendix D). The figure shows that the provable robustness improves as the clean recall increases, and that the performance of YOLOv4 and Faster R-CNN is close to that of a perfect clean detector when the recall is close to one.

[Figure 3 omitted: certified recall vs. clean recall for each base detector and patch threat model, together with Max-CR.]
Figure 3: Provable robustness of DetectorGuard on PASCAL VOC

DetectorGuard is especially effective when the patch is far away from the objects. From Table 3 and Figure 3, we can clearly see that the provable robustness of DetectorGuard is especially good when the patch is far away from the object. This behavior aligns with our intuition that a localized adversarial patch should only have a spatially constrained adversarial effect. Moreover, this observation shows that DetectorGuard has made the attack much more difficult: to have a chance of bypassing DetectorGuard, the adversary has to put the patch close to, or even over, the victim object, which is not always feasible in real-world scenarios. We also note that in the over-patch threat model, we allow the patch to be anywhere over the object. This means that the patch can be placed over the most salient part of the object (e.g., the face of a person), which makes robust detection extremely difficult.

Larger objects are more robust than small objects in DetectorGuard. To better understand DetectorGuard's provable robustness, we plot the histogram of object sizes for PASCAL VOC in Figure 4. We categorize all objects into three groups: 1) objects that are missed by Objectness Predictor in the clean setting (missed); 2) objects that are detected by Objectness Predictor but are not provably robust (vulnerable); 3) objects that are provably robust (robust). As shown in the figure, most of the missed and vulnerable objects are small. This is expected, because it is hard even for humans to perfectly detect all small objects. Moreover, considering that missing a large object is much more serious than missing a small one in real-world applications, we believe that DetectorGuard has strong practical potential.

[Figure 4 omitted: histograms of object sizes for the missed, robust, and vulnerable groups.]
Figure 4: Histograms of object sizes for PASCAL VOC (close-patch; results for far-patch and over-patch are in Appendix D)

5.5    Analysis of Hyper-parameters

In this subsection, we take the hypothetical perfect clean detector (PCD) as Base Detector and use the PASCAL VOC dataset to analyze the performance of DetectorGuard under different hyper-parameter settings. Note that using PCD helps us focus on the behavior of Objectness Predictor, which is the most important component in this paper.

Effect of the binarizing threshold. We first vary the binarizing threshold T in OBJPREDICTOR(·) to see how the model performance changes. For each threshold, we report the CR for the three patch threat models as well as the Max-CR. We also include AP and 1-FAR to understand the effect of the threshold on clean performance. We report these results in the leftmost sub-figure of Figure 5. We can see that when the binarizing threshold is low, the CR is high because more objectness is retained after the binarization. However, more objectness also makes it more likely to trigger a false alert in the clean setting, and both AP and 1-FAR degrade greatly as we decrease the threshold T. Therefore, we need to balance the trade-off between clean performance and provable robustness. In our default parameter setting, we set T = 10 to obtain a FAR lower than 2% while maintaining decent provable robustness.
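The monotone effect of T follows directly from the binarization rule: raising the threshold can only turn activated locations off. Below is a small self-contained illustration that reuses the simplified binarize(·) from our earlier sketch; the random map is a stand-in for a real aggregated objectness map, not actual model output:

    import numpy as np

    def binarize(om_bar, thres):
        # Simplified BINARIZE: drop the background channel, then threshold
        # the maximum aggregated class logit at each location.
        return (om_bar[..., :-1].max(axis=-1) > thres).astype(np.uint8)

    rng = np.random.default_rng(0)
    om_bar = rng.uniform(0, 4000, size=(52, 52, 2))  # one class + background
    wx = wy = 14                                     # default window size
    for T in (2, 5, 10, 20):
        om_star = binarize(om_bar, T * wx * wy)
        # The fraction of activated locations shrinks as T grows, trading
        # provable robustness (CR) against clean performance (AP, 1-FAR).
        print(T, om_star.mean())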
Effect of window size. We consider the effect of using different window sizes in the second sub-figure of Figure 5. The