DetectorGuard: Provably Securing Object Detectors against Localized Patch Hiding Attacks
Chong Xiang, Princeton University, cxiang@princeton.edu
Prateek Mittal, Princeton University, pmittal@princeton.edu

arXiv:2102.02956v1 [cs.CV] 5 Feb 2021

Abstract

State-of-the-art object detectors are vulnerable to localized patch hiding attacks, where an adversary introduces a small adversarial patch to make detectors miss the detection of salient objects. In this paper, we propose DetectorGuard, the first general framework for building provably robust detectors against the localized patch hiding attack. To start with, we propose a general approach for transferring robustness from image classifiers to object detectors, which builds a bridge between robust image classification and robust object detection. We apply a provably robust image classifier to a sliding window over the image and aggregate robust window classifications at different locations for a robust object detection. Second, in order to mitigate the notorious trade-off between clean performance and provable robustness, we use a prediction pipeline in which we compare the outputs of a conventional detector and a robust detector to catch an ongoing attack. When no attack is detected, DetectorGuard outputs the precise bounding boxes predicted by the conventional detector to achieve high clean performance; otherwise, DetectorGuard triggers an attack alert for security. Notably, our prediction strategy ensures that the robust detector incorrectly missing objects will not hurt the clean performance of DetectorGuard. Moreover, our approach allows us to formally prove the robustness of DetectorGuard on certified objects, i.e., it either detects the object or triggers an alert, against any patch hiding attacker. Our evaluation on the PASCAL VOC and MS COCO datasets shows that DetectorGuard has almost the same clean performance as conventional detectors and, more importantly, that DetectorGuard achieves the first provable robustness against localized patch hiding attacks.

1 Introduction

While object detection is widely deployed in critical applications like autonomous driving, video surveillance, and identity verification, conventional detectors have been shown vulnerable to a number of real-world adversarial attacks [7, 14, 47, 53, 56]. Eykholt et al. [14] and Chen et al. [7] demonstrate successful physical attacks against YOLOv2 [40] and Faster R-CNN [42] detectors for traffic sign recognition. Wu et al. [53] and Xu et al. [56] succeed in evading object detection by wearing a T-shirt printed with adversarial perturbations. Unfortunately, securing object detectors is extremely challenging: only a limited number of defenses [8, 43, 59] have been proposed, and they all suffer from at least one of the following issues: limited clean performance, lack of provable robustness, and inability to adapt to localized patch attacks (see Section 7).

In this paper, we investigate countermeasures against the localized patch hiding attack in object detection. The localized patch attacker can arbitrarily modify image pixels within a restricted region and can easily mount a physical-world attack by printing and attaching the adversarial patch to the object. The practical nature of patch attacks has made them the first choice of physical-world attacks against object detectors [7, 14, 47, 53, 56]. The focus of our work is on hiding attacks that aim to make the object detector fail to detect the victim object. This attack can cause serious consequences in scenarios like an autonomous vehicle missing an upcoming car and ending up in a car crash. To secure real-world object detectors from these threats, we propose DetectorGuard as the first general framework for building provably robust object detectors against localized patch hiding attacks. We design DetectorGuard with the following two key insights.

Insight I: Transferring robustness from image classifiers to object detectors. There has been significant advancement in robust image classification research in recent years [9, 10, 16, 20, 21, 30, 33, 34, 38, 44, 52, 54, 60] while object detectors remain vulnerable to attacks. In DetectorGuard, we aim to make use of well-studied robust image classifiers and transfer their robustness to object detectors. To achieve this, we leverage a key observation: almost all state-of-the-art image classifiers and object detectors use Convolutional Neural Networks (CNNs) as their backbone for feature extraction. The major difference lies in that an image classifier makes a prediction based on all extracted features (or all image pixels)
Clean Adversarial Setting Setting dog dog dog dog Base Detector Base Detector ALERT! dog dog Input Image Detection Output Input Image Detection Output (clean) (adversarial) Objectness Objectness Predictor Detection Matcher Predictor Detection Matcher Figure 1: DetectorGuard Overview. Base Detector predicts precise bounding boxes on clean images, and Objectness Predictor outputs robust objectness feature map. Detection Matcher compares the outputs of Base Detector and Objectness Predictor to determine the final output. In the clean setting (left figure), the dog on the left is detected by both Base Detector and Objectness Predictor. This leads to a match and DetectorGuard outputs the bounding box predicted by Base Detector. In the meantime, the dog on the right is only detected by Base Detector. Detection Matcher will consider this as a benign mismatch, and DetectorGuard will trust Base Detector in this case by outputting the predicted bounding box from Base Detector. In the adversarial setting (right figure), a patch makes Base Detector fail to detect any object while Objectness Predictor still robustly outputs high activation. Detection Matcher detects a malicious mismatch and triggers an attack alert. while an object detector predicts each object using a small while Objectness Predictor can still robustly output high ob- portion of features (or image pixels) at each location. This jectness activation. This mismatch will trigger an attack alert, observation suggests that we can build a robust object detec- and DetectorGuard will abstain from making predictions. Our tor by doing robust image classification on every subset of design ensures that Objectness Predictor incorrectly missing extracted features (or image pixels). Towards this end, we objects (false negatives) will not hurt the clean performance build an Objectness Predictor by using a sliding window over of DetectorGuard (Figure 1 left) while Objectness Predictor the whole image or feature map and applying a robust image robustly detecting objects provides provable security guaran- classifier for robust window classification at each location. tee for DetectorGuard (Figure 1 right). This approach miti- We then securely aggregate and post-process all window clas- gates the trade-off between clean performance and provable sifications to generate a robust objectness map, in which each robustness.1 In Section 4, we will rigorously show that De- element indicates the objectness at its corresponding location. tectorGuard can achieve a similarly high clean performance In Section 4.2, we prove the robustness of Objectness Predic- as conventional detectors and prove the robustness of Detec- tor using the provable analysis of the robust image classifier. torGuard on certified objects against any patch hiding attack considered in our threat model. Insight II: Mitigating the trade-off between clean per- Desirable properties of DetectorGuard. DetectorGuard formance and provable robustness. The robustness of is the first provably robust defense for object detection against security-critical systems usually comes at the cost of clean localized patch hiding attacks. Notably, DetectorGuard has performance, making the defense deployment less appealing. four desirable properties. 
First, DetectorGuard has a high de- To mitigate this common trade-off, we design DetectorGuard tection performance in the clean setting because its clean in a manner such that our defense achieves substantial prov- predictions come from state-of-the-art detectors (when no able robustness and also maintains a clean performance that false alert is triggered). Second, DetectorGuard is agnostic to is close to state-of-the-art detectors. We provide our defense attack algorithms and can provide strong provable robustness overview in Figure 1. DetectorGuard has three modules: Base against any adaptive attack considered in our threat model. Detector, Objectness Predictor, and Detection Matcher. Base Third, DetectorGuard is agnostic to the design of Base De- Detector can be any state-of-the-art object detector that can tector and therefore compatible with any conventional object make precise predictions on clean images but is vulnerable to detector. Fourth, DetectorGuard is compatible with any ro- patch hiding attacks. We build Objectness Predictor on top of bust image classification technique, and can benefit from any a provably robust image classifier and use it for robust object- progress in the relevant research. ness predictions. We then use Detection Matcher to compare We evaluate DetectorGuard performance on the PASCAL the outputs of Base Detector and Objectness Predictor, which VOC [13] and MS COCO [23] datasets. In our evaluation, will trigger an attack alert if and only if Objectness Predic- we instantiate the Base Detector with a hypothetical perfect tor detects an object while Base Detector misses. When no clean detector, YOLOv4 [2, 49], and Faster R-CNN [42]. We attack is detected, DetectorGuard outputs the predictions of Base Detector and thus has a high clean performance. When 1 Incontrast, the clean performance of traditional attack-detection-based a hiding attack occurs, Base Detector could miss the object defenses [30, 57] is bottlenecked by the errors of the defense module. 2
implement Objectness Predictor using PatchGuard [54] as between the predicted bounding box and the ground-truth box, the building-block robust image classifier. Our evaluation measured by Intersection over Union (IoU), exceeds a certain shows that our defense has a minimal impact on the clean threshold τ. We term a correct detection a true positive (TP). performance and achieves the first provable robustness against On the other hand, any predicted bounding box that fails to patch hiding attacks. satisfy both two TP criteria is considered as a false positive Our contributions can be summarized as follows. (FP). Finally, if a ground-truth object is not detected by any TP bounding box, it is a false negative (FN). Research on • We propose a general approach for transferring robust- object detection aims to minimize FP and FN errors. ness from image classifiers to object detectors. Specif- ically, we build an Objectness Predictor using a robust image classifier and prove its robustness against any 2.2 Attack Formulation patch hiding attack within our threat model. Attack objective. The hiding attack, also referred to as the • We design a prediction pipeline that uses a combination false-negative (FN) attack, aims to make object detectors miss of Base Detector and Objectness Predictor to catch an on- the detection of certain objects (which increases FN).3 The going attack and use it to mitigate the trade-off between hiding attack can cause serious consequences in scenarios clean performance and provable robustness. like an autonomous vehicle missing a pedestrian. Therefore, defending against patch hiding attacks is of great importance. • We extensively evaluate our defense on the PASCAL Attacker capability. The localized adversary is allowed to VOC [13] and MS COCO [23] datasets and demonstrate arbitrarily manipulate pixels within one restricted region.4 the first provable robustness against patch hiding attacks, Formally, we can use a binary pixel mask pm ∈ {0, 1}W ×H as well as its high clean performance. to represent this restricted region, where the pixels within the region are set to 1. The adversarial image then can be 2 Problem Formulation represented as x0 = (1 − pm) x + pm x00 where denotes the element-wise product operator, and x00 ∈ [0, 1]W ×H×C is In this section, we first introduce the object detection task, the content of the adversarial patch. pm is a function of patch followed by the localized patch hiding attack and defense size and patch location. The patch size should be limited such formulation. that the object is recognizable by a human (otherwise, the attack is meaningless). For patch locations, we consider three 2.1 Object Detection different threat models: over-patch, close-patch, far-patch, where the patch is over, close to (partial overlap), or far away Detection objective. The goal of object detection is to predict from (no overlap) the victim object, respectively. a list of bounding boxes for all objects in the input image Previous works [27, 43] have shown that attacks against x ∈ [0, 1]W ×H×C , where pixel values are rescaled into [0, 1], object detectors can succeed even when the patch is far away and W, H,C is the width, height, and channels of the image, from the victim object. Therefore, defending against all three respectively. Each bounding box b is represented as a tuple threat models is of our interest. 
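To make the patch threat model concrete, the sketch below applies a localized patch to an image with a binary pixel mask, following x' = (1 − pm) ⊙ x + pm ⊙ x'' from the attack formulation; the patch size and location used in the example are arbitrary placeholders rather than values prescribed by the paper.

```python
import numpy as np

def apply_localized_patch(x, patch_content, top, left):
    """Apply a localized adversarial patch: x' = (1 - pm) * x + pm * x''.

    x: clean image, shape (W, H, C), values in [0, 1].
    patch_content: adversarial pixels x'', shape (pw, ph, C), values in [0, 1].
    (top, left): upper-left corner of the restricted region.
    """
    pw, ph, _ = patch_content.shape
    pm = np.zeros(x.shape, dtype=x.dtype)       # binary pixel mask
    pm[top:top + pw, left:left + ph, :] = 1.0   # restricted region set to 1

    x_full = np.zeros_like(x)                   # x'' embedded in a full-size canvas
    x_full[top:top + pw, left:left + ph, :] = patch_content

    return (1.0 - pm) * x + pm * x_full         # element-wise blend

# Example: a 32x32 patch at an arbitrary location on a 416x416 image.
x_clean = np.random.rand(416, 416, 3)
x_adv = apply_localized_patch(x_clean, np.random.rand(32, 32, 3), top=100, left=200)
```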
(xmin , ymin , xmax , ymax , l), where xmin , ymin , xmax , ymax together illustrate the coordinates of the bounding box, and l ∈ L = {0, 1, · · · , N − 1} denotes the predicted object label (N is the 2.3 Defense Formulation number of object classes).2 Defense objective. We focus on defenses against patch hiding Conventional detector. Object detection models can be cate- attacks. We consider our defense to be robust if 1) its detection gorized into two-stage and one-stage detectors depending on on the clean image is correct and 2) the defense can detect their detection pipelines. A two-stage detector first generates part of the object or send out an attack alert on the adversarial proposal for regions that might contain objects and then uses image.5 the proposed regions for object classification and bounding- Crucially, we design our defense to be provably robust: box regression. Representative examples include Faster R- our defense can either detect the certified object or issue an CNN [42] and Mask R-CNN [18]. On the other hand, a one- stage detector does detection directly on the input image with- 3 We use “hiding attack" and “FN attack" interchangeably in this paper. 4 Provably out any region proposal step. SSD [26], YOLO [2, 39–41, 49], robust defenses against one single patch are currently an RetinaNet [22], and EfficientDet [46] are representative one- open/unsolved problem, and hence the focus of this paper. In Appendix C, we will justify our one-patch threat model and discuss the implication of stage detectors. multiple patches. Conventionally, a detection is considered correct when 1) 5 We note that in the adversarial setting, we only require the predicted the predicted label matches the ground truth and 2) the overlap bounding box to cover part of the object. This is because that it is likely that only a small part of the object is recognizable due to the adversarial 2 Conventionalobject detectors usually output objectness score and pre- patch (e.g., the left dog in the right part of Figure 1). We provide additional diction confidence as well—we discard them in notation for simplicity. justification for our defense objective in Appendix E. 3
alert regardless of what the adversary does (including any whether there is an object. We then securely aggregate all win- adaptive attacks within our threat model). This robustness dow classifications for a robust object detection output. Our property is agnostic to the attack algorithm and holds against general approach transfers the robustness of image classifiers an adversary that has full knowledge of our defense as well to object detectors so that robust object detection can also as access to the parameters of our defense model. benefit from ongoing advances in robust image classification. Remark: primary focus on hiding attacks. In this paper, Insight II: using an ensemble prediction strategy to mit- we focus on the hiding attack because it is the most funda- igate the trade-off between clean performance and prov- mental and notorious attack against object detectors. We can able robustness. It is well known that the robustness of ma- visualize dividing the object detection task into two steps: 1) chine learning based systems usually comes at the cost of detecting the object bounding box and then 2) classifying the clean performance (measured by TP, FP, and FN in object detected object. If the first step is compromised by the hiding detection as introduced in Section 2.1). To mitigate this com- attack, there is no hope for robust object detection. On the mon trade-off, we propose an ensemble prediction strategy other hand, securing the first step against the patch hiding that uses a robust detector and a state-of-the-art conventional attack lays a foundation for the robust object detection; we object detector for catching an ongoing attack. We use a con- can design effective remediation for the second step if needed. ventional detector to make precise predictions when no attack Take the application domain of autonomous vehicles (AV) is detected, and use a robust detector to provide substantial as an example: an AV missing the detection of an upcom- robustness in the adversarial setting. The clean performance ing car could end up with a serious car accident. However, of this ensemble is maintained close to state-of-the-art de- if the AV detects the upcoming object but predicts an in- tectors and can also be improved given any advances in be- correct class label (e.g., mistaking a car for a pedestrian), it nign/conventional object detection research. can still make the correct decision of stopping and avoiding DetectorGuard design. Recall that Figure 1 provides an the collision. Moreover, in challenging applications domains overview of DetectorGuard, which will either output a list where the predicted class label is of great importance (e.g., bounding box predictions (left figure; clean setting) or an traffic sign recognition), we can feed the detected bound box attack alert (right figure; adversarial setting). There are three to an auxiliary image classifier to re-determine the class la- major modules in DetectorGuard: Base Detector, Objectness bel. The defense problem is then reduced to the robust im- Predictor, and Detection Matcher. Base Detector is respon- age classification and has been studied by several previous sible for making precise detections in the clean setting and works [21, 33, 54, 60]. Therefore, we make the hiding attack can be any popular high-performance object detector such as as the primary focus of this paper and will also discuss the YOLOv4 [2,49] and Faster R-CNN [42]. Objectness Predictor extension of DetectorGuard against other attacks in Section 6. 
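As a preview of the pipeline just described (formalized later as Algorithm 1), the following minimal sketch wires the three modules together; the module interfaces and function names are illustrative placeholders, not the authors' released code.

```python
# Minimal sketch of the DetectorGuard prediction pipeline described above.
# base_detector, objectness_predictor, and detection_matcher are placeholder
# callables standing in for the three modules.
ALERT = "ALERT"

def detector_guard(image, base_detector, objectness_predictor, detection_matcher):
    boxes = base_detector(image)                  # conventional detector: precise boxes
    objectness_map = objectness_predictor(image)  # robust objectness prediction
    attack_detected = detection_matcher(boxes, objectness_map)
    if attack_detected:                           # malicious mismatch: unexplained objectness
        return ALERT
    return boxes                                  # otherwise trust the conventional detector
```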
is built on our first insight and aims to output robust objectness feature map in the adversarial environment; the robustness 3 DetectorGuard is derived from its building block—a robust image classi- fier. Detection Matcher leverages the detection outputs of In this section, we first introduce the key insights and overview Base Detector and Objectness Predictor to catch a malicious of DetectorGuard. We then detail the design of our defense attack using defined rules. When no attack is detected, Detec- components (Objectness Predictor, Detection Matcher) and torGuard will output the detection results of Base Detector our choice of the underlying robust image classifier. (i.e., a conventional detector), so that our clean performance is close to state-of-the-art detectors. When a patch hiding attack occurs, Base Detector can miss the object while Objectness 3.1 Defense Overview Predictor is likely to robustly detect the presence of an ob- We leverage two key insights to design the DetectorGuard ject. This malicious mismatch will be caught by Detection framework. Matcher, and DetectorGuard will send out an attack alert. Insight I: exploiting the tight connection between image Algorithm Pseudocode. We provide the pseudocode classification and object detection tasks to transfer the of DetectorGuard in Algorithm 1. The main procedure robustness from classifiers to detectors. We observe that DG(·) has three sub-procedures: BASE D ETECTOR(·), almost all state-of-the-art image classifiers and object detec- O BJ P REDICTOR(·), and D ET M ATCHER(·). The sub- tors use CNNs as their backbone for feature extraction. An procedure BASE D ETECTOR(·) can be any off-the-shelf image classifier makes a prediction based on all extracted detector as discussed previously. We introduce the remaining features (or image pixels) while an object detector predicts two sub-procedures in the following subsections. All each object using a partial feature map (or image pixels) at tensors/arrays are represented with bold symbols and scalars different locations. This observation motivates our design of a are in italic. All tensor/array indices start from zeros; the robust object detector using a robust image classifier. We use tensor/array slicing is in Python style (e.g., [i : j] means a sliding window over the entire image or feature map and all indices k satisfying i ≤ k < j). We assume that the perform robust classification on each window to determine “background" class corresponds to the largest class index. We 4
Table 1: Summary of important notation Algorithm 1 DetectorGuard Notation Description Notation Description Input: input image x, window size (wx , wy ), binarizing x Input image b bounding box threshold T , Base Detector BASE D ETECTOR(·), robust om Objectness map v classification logits l classification label N number of object classes classification procedure RC(·), cluster detection proce- (wx , wy ) window size (px , py ) patch size dure D ET C LUSTER(·) T binarizing threshold D detection results Output: robust detection D ∗ or ALERT u, l upper/lower bound of classification logits values of each class 1: procedure DG(x, wx , wy , T ) 2: D ← BASE D ETECTOR(x) . Conventional detection 3: om ← O BJ P REDICTOR(x, wx , wy , T ) . Objectness give a summary of important notation in Table 1. 4: a ← D ET M ACTHER(D , om) . Detect hiding attacks 5: if a == True then . Malicious mismatch 3.2 Objectness Predictor 6: D ∗ ← ALERT . Trigger an alert 7: else Objectness Predictor is built using our Insight I and aims to 8: D ∗ ← D . Return Base Detector’s predictions output a robust objectness prediction map in an adversarial 9: end if environment. In doing so, we use a sliding window over the 10: return D ∗ image (or feature map) to make robust window classification, 11: end procedure and then post-process window classifications to generate the objectness map. Objectness Predictor is designed to be prov- 12: procedure O BJ P REDICTOR(x, wx , wy , T ) ably robust against patch hiding attacks. We introduce this 13: X,Y, _ ← S HAPE(x) prediction pipeline in this subsection and analyze its provable 14: ¯ ← Z EROA RRAY[X,Y, N + 1] om . Initialization robustness in Section 4.2. 15: for each valid (i, j) do . Every window location Robust window classification. The pseudocode of Object- 16: l, v ← RC(x[i : i + wx , j : j + wy ]) . Classify ness Predictor is presented as O BJ P REDICTOR(·) in Algo- 17: ¯ : i + wx , j : j + wy ] ← om[i om[i ¯ : i + wx , j : j + rithm 1. The key operation is to use a sliding window and wy ] + v . Add classification logits make window classifications at different locations.6 Each win- 18: end for dow classification aims to predict the object class or “back- 19: om ← B INARIZE(om, ¯ T · wx · wy ) . Binarization ground" based on all pixels (or features) within the window. 20: return om To make the window classification robust even when some 21: end procedure pixels (or features) are corrupted by the adversarial patch, we apply the robust classification technique (Line 16). For each 22: procedure D ET M ATCHER(D , om) window location, represented as (i, j), we feed the correspond- . Match each detected box to objectness map ing window x[i : i + wx , j : j + wy ] to the robust classification 23: for i ∈ {0, 1, · · · , |D | − 1} do sub-procedure RC(·) to get the classification label l and the 24: xmin , ymin , xmax , ymax , l ← b ← D [i] classification logits v ∈ RN+1 for N object classes and the 25: if S UM(om[xmin : xmax , ymin : ymax ]) > 0 then “background" class. DetectorGuard is compatible with any 26: om[xmin : xmax , ymin : ymax ]) ← 0 robust classification technique, and we treat RC(·) as a black- 27: end if box procedure in DetectorGuard. We postpone the discussion 28: end for of RC(·) until Section 3.4 for ease of presentation. 29: if D ET C LUSTER(om) is None then Objectness map generation. Given the robust window clas- 30: return False . 
All objectness explained sification results, we aim to output an objectness map that 31: else indicates the objectness (i.e., a confidence score indicating 32: return True . Unexplained objectness the likelihood of the presence of an object) of each location. 33: end if First, we generate an all-zero array om ¯ for holding the object- 34: end procedure ness score (Line 14); each objectness vector in om ¯ has N + 1 elements for all object classes plus the “background" class. Next, for each window classification, we added logits v to om ∈ {0, 1}X×Y as the final output (Line 19). In B INARIZE(·), every objectness vector located within the window (Line 17). we examine each location in om. ¯ If the maximum objectness After accumulating objectness scores from all sliding win- scores for the non-background class at that location is larger dows, we binarize om ¯ to obtain the binary objectness map than the threshold T · wx · wy , we set the objectness score in 6 We note that the sliding window can be either in the pixel space or om to one; otherwise, it is set to zero. We note that we discard feature space; we abuse the notation of x to let it represent either an input the information of classification label l in this binarization image or an extracted feature map in O BJ P REDICTOR(·). Discussion on operation. This helps reduce FPs when the model correctly pixel-space and feature-space windows is available in Appendix F. detects the object but fails to predict the correct label, which 5
could happen frequently between similar object classes like strategies. bicycle-vs-motorbike. Processing detected bounding boxes. Line 23-28 of Al- Remark: Limitation of Objectness Predictor. We note that gorithm 1 demonstrate the matching process for each de- the underlying robust image classifier RC(·) in Objectness tected bounding box. For each box b, we get its coordi- Predictor usually suffers from a trade-off between robustness nates xmin , ymin , xmax , ymax , and calculate the sum of object- and clean performance; therefore, Objectness Predictor can ness scores within the same box on the objectness map. If the sometimes be imprecise on the clean images (e.g., missing objectness sum is larger than zero, we assume that the bound- objects). However, as we discuss next, this limitation will not ing box b correctly matches the objectness map om. Next, significantly hurt the clean performance of DetectorGuard we zero out the corresponding region in om, to indicate that due to our special ensemble structure inspired by Insight II. this region of objectness has been explained by the detected bounding box. On the other hand, if all objectness scores are 3.3 Detection Matcher zeros, we assume it is a benign mismatch, and the algorithm does nothing. Detection Matcher leverages our Insight II to mitigate the Processing the objectness map. The final step of the match- trade-off between provable robustness and clean performance. ing is to analyze the objectness map om. We use the sub- It takes as inputs the predicted bounding boxes of Base Detec- procedure D ET C LUSTER(·) to determine if any non-zero tor and the generated objectness map of Objectness Predictor, points in om form a large cluster. Specifically, we choose and tries to match each predicted bounding box to a high DBSCAN [12] as the cluster detection algorithm, which activation region in the objectness map. Detection Matcher will assign each point to a certain cluster or label it as an will label each matching attempt as either a match, a mali- outlier based on the point density in its neighborhood. If cious mismatch, or a benign mismatch. The matching results D ET C LUSTER(om) returns None, it means that no cluster is determine the final prediction of DetectorGuard. We will first found, and that all objectness activations predicted by Object- introduce the high-level matching rules and then elaborate on ness Predictor are explained by the predicted bounding boxes the matching algorithm. of Base Detector, and D ET M ATCHER(·) returns False. On Matching rules. A match corresponds to both Base Detec- the other hand, receiving a non-empty cluster set indicates tor and Objectness Predictor detecting an object at a certain that there are clusters of unexplained objectness activations location while a mismatch corresponds to only one of them de- in om (i.e, Base Detector misses an object but Objectness tecting an object. There are three possible matching outcomes, Predictor detects an object). Detection Matcher will regard each of them leading to a different prediction strategy: this as a sign of patch hiding attacks, and return True. Final output. Line 5-10 demonstrates the strategy for final • A match happens when Base Detector and Objectness prediction. If the alert flag a is True (i.e., a malicious mis- Predictor reach a consensus on an object at a specific match is detected), DetectorGuard returns D ∗ = ALERT. In location. In this simplest case, our defense will assume other cases, DetectorGuard returns the detection D ∗ = D . 
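The sketch below summarizes the two sub-procedures of Algorithm 1 described in Sections 3.2 and 3.3: sliding-window robust classification with binarization (Objectness Predictor) and box-to-objectness matching with DBSCAN cluster detection (Detection Matcher). The robust classifier rc is treated as a black box returning a (label, logits) pair, bounding boxes are assumed to already be in the objectness map's coordinate space, and the DBSCAN parameters follow the defaults reported in Section 5.1; everything else is a simplified approximation of the pseudocode.

```python
import numpy as np
from sklearn.cluster import DBSCAN  # cluster detection, as in DetCluster

def objectness_predictor(fmap, rc, wx, wy, T, num_classes):
    """Sliding-window robust classification -> binary objectness map (Sec. 3.2).

    fmap: image or feature map with spatial shape (X, Y, ...).
    rc: black-box robust classifier returning (label, logits) for one window;
        logits has num_classes + 1 entries, the last one being "background".
    """
    X, Y = fmap.shape[0], fmap.shape[1]
    om_bar = np.zeros((X, Y, num_classes + 1))
    for i in range(X - wx + 1):                    # every valid window location
        for j in range(Y - wy + 1):
            _, logits = rc(fmap[i:i + wx, j:j + wy])
            om_bar[i:i + wx, j:j + wy] += logits   # accumulate window logits
    # Binarize: keep locations whose best non-background score exceeds T * wx * wy.
    om = (om_bar[:, :, :num_classes].max(axis=-1) > T * wx * wy).astype(np.uint8)
    return om

def detection_matcher(boxes, om, eps=3, min_samples=28):
    """Return True if unexplained objectness remains (malicious mismatch, Sec. 3.3)."""
    om = om.copy()
    for (xmin, ymin, xmax, ymax, _label) in boxes:
        if om[xmin:xmax, ymin:ymax].sum() > 0:     # box explains this objectness
            om[xmin:xmax, ymin:ymax] = 0
    points = np.argwhere(om > 0)
    if len(points) < min_samples:
        return False                               # too few points to form a cluster
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(points).labels_
    return bool((labels != -1).any())              # any non-outlier cluster -> alert
```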
the detection is correct and output the precise bounding box predicted by Base Detector. 3.4 Robust Image Classifier in Objectness Pre- • A malicious mismatch will be flagged when only Ob- jectness Predictor detects the object. This is most likely dictor to happen when a hiding attack succeeds in fooling the In this subsection, we discuss the design choice of robust conventional detector to miss the object while our Ob- classifier in Objectness Predictor. Our approach is compati- jectness Predictor still makes robust predictions. In this ble with any image classifier that is provably robust against case, our defense will send out an attack alert. adversarial patch attacks. In this paper, we follow Patch- Guard [54] to build robust image classifier RC(·) as it is a • A benign mismatch occurs when only Base Detector general defense framework and it subsumes several defense detects the object. This can happen when Objectness Pre- instances [21, 33, 54, 60] that have state-of-the-art provable dictor incorrectly misses the object due to its limitations robustness and clean accuracy. (recall the trade-off between robustness and clean perfor- PatchGuard: backbone CNNs with small receptive fields. mance). In this case, we trust Base Detector and output The PatchGuard framework [54] proposes to use a CNN with its predicted bounding box. We note that this mismatch small receptive fields to limit the impact of a localized adver- can also be caused by other attacks that are orthogonal sarial patch. The receptive field of a CNN is the input pixel to the focus of this paper (we focus on the hiding attack). region where each extracted feature is looking at, or affected We will discuss strategies for defending against other by. If the receptive field of a CNN is too large, then a small attacks in Section 6. adversarial patch has the potential to corrupt most extracted Next, we will discuss the concrete procedure for determining features and easily manipulate the model behavior [27,43,54]. matching outcomes and applying corresponding prediction There are two main design choices for CNNs with a small 6
receptive field: the BagNet architecture [3] and an ensemble Predictor has an FN when it fails to output high objectness architecture using small pixel patches [21]. In our evaluation, activation for certain objects. Fortunately, this FN of Object- we select the BagNet as the backbone CNN for our Objectness ness Predictor will not hurt the performance of DetectorGuard Predictor since it achieves state-of-the-art performance on because our defense will label it as a benign mismatch and high resolution images and is also more efficient [54]. trust the high-performance Base Detector by taking D as the PatchGuard: secure feature aggregation. The use of Bag- final output (as introduced in Section 3.3). Net ensures that a small adversarial patch is able to corrupt A false-positive (FP) of Objectness Predictor will trigger only a small number of extracted features. The second step in a false alert of DetectorGuard. Objectness Predictor has an PatchGuard is to perform a secure aggregation technique on FP when it incorrectly outputs high objectness activation for extracted features; design choices include clipping [54, 60], regions that do not contain any real object. The FP will result masking [54], majority voting [21, 33]. In this paper, we use in unexplained objectness activation in Detection Matcher robust masking due to its state-of-the-art provable robust- and cause a false alert. Let tp, fp, fn be the TP, FP, FN of ness for high-resolution image classification [54]. We provide Base Detector (i.e., the vanilla undefended object detector), more details of robust masking as well as its provable clas- and fa be the number of objects within the clean image on sification analysis in Appendix H. We will also discuss and which DetectorGuard has false alerts. The TP, FP, and FN of implement other aggregation techniques in Appendix B to DetectorGuard satisfy: tp0 ≥ tp−fa, fp0 ≤ fp, fn0 ≤ fn+fa. demonstrate the generality of our framework. Next, we dis- Therefore, we aim to optimize for a low fa in DetectorGuard, cuss how to specifically adapt and train these building blocks or equivalently a low FP in Objectness Predictor, which can in the context of object detection. be achieved with properly chosen hyper-parameters as will Training image classifiers with object detection datasets. be shown in Section 5.5. Each image in an object detection dataset has multiple ob- In summary, DetectorGuard has a slightly lower clean per- jects with different class labels. To train an image classifier formance compared with state-of-the-art detectors when we given a list of bounding boxes and labels, we first map pixel- optimize for a low FP in Objectness Predictor (resulting in space bounding boxes to the feature space and get a list of few false alerts in DetectorGuard). This small clean perfor- cropped feature maps and labels (details of box mapping are mance drop is worthwhile given the provable robustness of in Appendix F). We then teach BagNet to make a correct DetectorGuard, which we will discuss in the next subsection. prediction on each cropped feature map by minimizing the cross-entropy loss between the aggregated feature prediction 4.2 Provable Robustness and the one-hot encoded label vector. In addition, we aggre- gate all features outside any feature boxes as the “negative" Recall that we consider DetectorGuard to be provably robust feature vector for the “background" classification. 
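The training procedure above can be sketched as follows; the pixel-to-feature-space box mapping (whose details the paper defers to its Appendix F) is replaced here by a placeholder downsampling ratio, and `classify` stands in for the secure feature aggregation plus classification head, so this is an illustrative approximation rather than the authors' training code.

```python
import torch
import torch.nn.functional as F

def window_classifier_loss(fmap, boxes, labels, classify, background_class, scale=8):
    """Illustrative training loss for the window classifier on detection data (Sec. 3.4).

    fmap: BagNet feature map of one image, shape (C, X, Y).
    boxes: pixel-space (xmin, ymin, xmax, ymax) ground-truth boxes; labels: class ids.
    classify: placeholder for feature aggregation + classification head,
              mapping a (C, h, w) crop to a logits vector of length N + 1.
    scale: placeholder pixel-to-feature downsampling ratio.
    """
    loss = torch.zeros(())
    occupied = torch.zeros(fmap.shape[1:], dtype=torch.bool)
    for (xmin, ymin, xmax, ymax), label in zip(boxes, labels):
        fx0, fy0, fx1, fy1 = xmin // scale, ymin // scale, xmax // scale, ymax // scale
        crop = fmap[:, fx0:fx1, fy0:fy1]           # feature-space crop of one object
        loss = loss + F.cross_entropy(classify(crop).unsqueeze(0),
                                      torch.tensor([label]))
        occupied[fx0:fx1, fy0:fy1] = True
    # Features outside every object box act as one "background" training example.
    background = fmap * (~occupied).float().unsqueeze(0)
    loss = loss + F.cross_entropy(classify(background).unsqueeze(0),
                                  torch.tensor([background_class]))
    return loss
```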
for a given object (in a given image) when it can make correct detection on the clean image and will either detect part of the object or issue an alert in the presence of any patch hiding at- 4 Theoretical Defense Analysis tacker within our threat model. In this subsection, we will first In this section, we theoretically analyze the defense model per- show the sufficient condition for the provable robustness of formance in clean and adversarial settings. In the clean setting, DetectorGuard, then present our provable analysis algorithm, we analyze the impact of false positives and false negatives and finally prove its soundness. in the Objectness Predictor module, and how DetectorGuard Sufficient condition for DetectorGuard’s robustness. can achieve clean performance that is only slightly lower than First, we show in Lemma 1 that the robustness of Object- state-of-the-art detectors. In the adversarial setting, we for- ness Predictor implies the robustness of DetectorGuard. We mally show that DetectorGuard can achieve certified/provable abuse the notation “∈" by letting b ∈ D denote that one pre- robustness against patch hiding attacks. dicted box b̄ in D matches the ground-truth box b, and letting b ∈ om denote that the objectness map om has high object- ness activation that matches b. 4.1 Clean Performance Lemma 1. Consider a given an object in an image, which is Here, we analyze the performance of the defense in the clean represented as a bounding box b and can be correctly detected setting. Recall that DetectorGuard is an ensemble of Base De- by DetectorGuard in a clean image x. DetectorGuard has tector and Objectness Predictor. When we instantiate Base De- provable robustness to any valid adversarial image x0 , i.e., tector with a state-of-the-art object detector that rarely makes b ∈ D ∗ or D ∗ = ALERT for D ∗ = DG(x0 ), if Objectness mistake on the clean images (i.e., D is typically correct), Predictor is robust to any valid adversarial image x0 , i.e., Objectness Predictor becomes the major source of errors in b ∈ om = O BJ P REDICTOR(x0 ). DetectorGuard. A false-negative (FN) of Objectness Predictor will not Proof. We prove by contradiction. Suppose that Detec- hurt the clean performance of DetectorGuard. Objectness torGuard is vulnerable to an adversarial image x0 . Then we 7
have that 1) D ∗ 6= ALERT and 2) b 6∈ D ∗ . Algorithm 2 Provable Analysis of DetectorGuard From b ∈ om = O BJ P REDICTOR(x0 ) and D ∗ 6= ALERT, Input: input image x, window size (wx , wy ), matching thresh- we will have b ∈ D = BASE D ETECTOR(x) to avoid ALERT. old T , the set of patch locations P , the object bound- Since no alert is triggered, DG(·) returns D ∗ = D . We then ing box b, provable analysis of the robust classifier have b ∈ D = D ∗ , which contradicts with the condition 2) RC-PA(·), cluster detection procedure D ET C LUSTER(·) b 6∈ D ∗ . Thus, DetectorGuard must not be vulnerable to any Output: whether the object b in x has provable robustness adversarial image x0 when Objectness Predictor is robust. 1: procedure DG-PA(x, wx , wy , T, P , b) Provable robustness of DetectorGuard. We will use the 2: if b 6∈ DG(x, wx , wy , T ) then provable analysis of the robust image classifier, denoted as 3: return False . Clean detection is incorrect RC-PA(·), as the analysis building block to prove the robust- 4: end if ness of DetectorGuard. Given the provable analysis procedure 5: for each p ∈ P do . Check every patch location RC-PA(·), we can reason about the objectness map output in 6: x, y, px , py ← p Objectness Predictor. If its worse-case output still has high ob- 7: r ← DG-PA-O NE(x, x, y, wx , wy , px , py , b, T ) jectness activation, we can certify the provable robustness of 8: if r == False then Objectness Predictor. Finally, using Lemma 1, we can derive 9: return False . Possibly vulnerable the robustness of DetectorGuard. 10: end if We present the provable analysis of DetectorGuard in Algo- 11: end for rithm 2. The algorithm takes a clean image x, a ground-truth 12: return True . Provably robust object bounding box b, and a set of valid patch locations P as 13: end procedure inputs, and will determine whether the object in bounding box b in the image x has provable robustness against any patch at 14: procedure DG-PA-O NE(x, x, y, wx , wy , px , py , b, T ) any location in P . We state the correctness of Algorithm 2 in 15: X,Y, _ ← S HAPE(x) Theorem 1, and will explain the algorithm details by proving 16: om¯ ∗ ← Z EROA RRAY[X,Y, N + 1] . Initialization the theorem. . Generates worse-case objectness map for analysis 17: for each valid (i, j) do . Every window location Theorem 1. Given an object bounding box b in a clean 18: u, l ← RC-PA(x, x − i, y − j, px , py , mx , my ) image x, a set of patch locations P , window size (wx , wy ), 19: om ¯ ∗ [i : i + wx , j : j + wy ] ← om ¯ ∗ [i : i + wx , j : j + and binarizing threshold T (used in DG(·)), if Algorithm 2 wy ] + l . Add worst-case (lower-bound) logits returns True, i.e., DG-PA(x, wx , wy , T, b, P ) = True, De- 20: end for tectorGuard has provable robustness for the object b against 21: om∗ ← B INARIZE(om ¯ ∗ , T · wx · wy ) . Binarization any patch hiding attack using any patch location in P . 22: xmin , ymin , xmax , ymax , l ← b Proof. DG-PA(·) first calls DG(·) of Algorithm 1 to deter- 23: if D ET C LUSTER(om∗ [xmin : xmax , ymin : ymax ]) is mine if DetectorGuard can detect the object bounding box b None then on the clean image x. The algorithm will proceed only when 24: return False . No high objectness left the clean detection is correct (Line 2-4). 25: else Next, we iterate over each patch location in P and call the 26: return True . 
High worst-case objectness sub-procedure DG-PA-O NE(·), which analyzes worst-case 27: end if behavior over all possible adversarial strategies, to determine 28: end procedure the model robustness. If any call of DG-PA-O NE(·) returns False, the algorithm returns False, indicating that at least one patch location can bypass our defense. On the other hand, We then iterate over each sliding window and call RC-PA(·), if the algorithm tries all valid patch locations and does not re- which takes the image x (or feature map as discussed in Sec- turn False, this means that DetectorGuard is provably robust tion 3.2), relative patch coordinates (x − i, y − j), patch size to all patch locations in P and the algorithm returns True. (px , py ) as inputs and outputs the upper bound u and lower In sub-procedure DG-PA-O NE(·), we analyze the robust- bound l of the classification logits.7 Since the goal of the hid- ness of Objectness Predictor against the given patch location. ing attack is to minimize the objectness scores, we add the We use the provable analysis of the robust image classifier lower bound of classification logits to om¯ ∗ . After we analyze (i.e., RC-PA(·)) to determine the lower/upper bounds of clas- all valid windows, we call B INARIZE(·) for the worse-case sification logits for each window. If the aggregated worse-case objectness map om∗ (recall that the logits values for “back- (i.e., lower bound) objectness map still has high activation ground" is discarded in binarization). We then get the cropped for the object of interest, we can certify the robustness of feature map that corresponds to the object of interest (i.e., Objectness Predictor and then DetectorGuard (by Lemma 1). As shown in DG-PA-O NE(·) pseudocode, we first initial- 7 We treat RC-PA(·) as a black-box sub-procedure in Algorithm 2; more ize a zero array om¯ ∗ to hold the worse-case objectness scores. details for RC-PA(·) are available in Appendix H. 8
om∗ [xmin : xmax , ymin : ymax ]) and feed it to the cluster detec- Objectness Predictor model: BagNet-33 [3]. We use tion algorithm D ET C LUTSER(·). If None is returned, a hiding BagNet-33, which has a 33×33 receptive field, as the back- attack using this patch location might succeed, and the sub- bone network of Objectness Predictor. We zero-pad each im- procedure returns False. Otherwise, Objectness Predictor has age to a square and resize it to 416×416 before feeding it a high worse-case object activation and is thus robust to any to BagNet. We take a BagNet model that is pre-trained on attacked using this patch location. This implies the provable ImageNet [11] and fine-tune it on our detection datasets. robustness, and the sub-procedure returns True. Default hyper-parameters. In Objectness Predictor, we choose to use a sliding window in the feature space, and we set the default feature-space window size to 14. We discuss 5 Evaluation the mapping between pixel space and feature space in Ap- pendix F. In the Detection Matcher, we set the default thresh- In this section, we provide a comprehensive evaluation of old to 10. In our D ET C LUSTER(·), we use DBSCAN [12] DetectorGuard on PASCAL VOC [13] and MS COCO [23] algorithm with eps = 3, min_points = 28. We will analyze datasets. We will first introduce the datasets and models used the effect of different hyper-parameters in Section 5.5. We in our evaluation, followed by our evaluation metrics. We then will also release our source code upon publication. report our main evaluation results on different models and datasets, and finally discuss the effect of hyper-parameters. 5.2 Metric 5.1 Datasets and Models Clean performance: precision and recall. We calculate pre- cision as TP/(TP+FP) and recall as TP/(TP+FN). For the clean Dataset: PASCAL VOC [13]. The detection challenge of images without a false alert, we follow previous works [8, 59] PASCAL Visual Object Classes (VOC) project is a popular setting the IoU threshold τ = 0.5 and count TPs, FPs, FNs in object detection benchmark dataset with annotations for 20 the conventional manner. For images that have false alerts, different classes. We take trainval2007 (5k images) and we set TP and FP to zeros, and FN to the number of ground- trainval2012 (11k images) as our training set and evaluate truth objects since no bounding box is predicted. We note our defense on test2007 (5k images), which is a conven- that conventional detectors use a confidence threshold to filter tional usage of the PASCAL VOC dataset [26, 59]. out bounding boxes with low confidence values. As a result, Dataset: MS COCO [23]. The Microsoft Common Objects different confidence thresholds will give different precision in COntext (COCO) dataset is an extremely challenging ob- and recall values; we will plot the entire precision-recall curve ject detection dataset with 80 annotated common object cat- to show the model performance. egories. We use the training and validation set of COCO2017 Clean performance: average precision (AP). To remove for our experiments. The training set has 117k images, and the dependence on the confidence threshold and to have a the validation set has 5k images. global view of model performance, we also report AP as done Base Detector model: YOLOv4 [2, 49]. YOLOv4 [2] is the in object detection research [13, 23]. 
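The clean-performance metrics defined above reduce to simple counting once predicted boxes have been matched against ground truth (label match and IoU above the threshold τ); the sketch below assumes that matching has already been done upstream and only applies the false-alert convention of zeroing TP and FP and counting every ground-truth object as an FN.

```python
def image_counts(num_tp, num_fp, num_gt, false_alert):
    """Per-image (TP, FP, FN) counts for the clean metrics (Sec. 5.2).

    On a clean image where DetectorGuard raises a false alert, no boxes are
    output, so TP = FP = 0 and every ground-truth object becomes an FN.
    """
    if false_alert:
        return 0, 0, num_gt
    return num_tp, num_fp, num_gt - num_tp

def precision_recall(per_image_counts):
    """Aggregate per-image counts into precision = TP/(TP+FP) and recall = TP/(TP+FN)."""
    tp = sum(c[0] for c in per_image_counts)
    fp = sum(c[1] for c in per_image_counts)
    fn = sum(c[2] for c in per_image_counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Example: two clean images, the second one triggering a false alert.
counts = [image_counts(3, 1, 4, False), image_counts(2, 0, 2, True)]
print(precision_recall(counts))  # (0.75, 0.5)
```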
We vary the confidence state-of-the-art one-stage detector that achieves the optimal threshold from 0 to 1, record the precision and recall at differ- speed and accuracy of object detection. We choose Scaled- ent thresholds, and calculate AP as the averaged precision at YOLOv4-P5 [49] in our evaluation. We adopt the same image different recall levels. pre-processing pipeline and network architecture as proposed Clean performance: false alert rate (FAR@0.x). FAR is in the original paper. For MS COCO, we use the pre-trained defined as the percentage of clean images on which Detec- model. For PASCAL VOC, we do transfer learning by fine- torGuard will trigger a false alert. We note that FAR is also tuning the model previously trained on MS COCO. closely tied to the confidence threshold of Base Detector: a Base Detector model: Faster R-CNN [42]. Faster R-CNN is higher confidence threshold leads to fewer predicted bounding a representative two-stage detector. We use ResNet101-FPN boxes, leading to higher unexplained high objectness activa- as its backbone network. Image pre-processing and model tion, and finally higher FAR. We will report FAR at different architecture follows the original paper. We use pre-trained recall levels for a global evaluation, and use FAR@0.x to models for MS COCO and do transfer learning to train a denote FAR at a clean recall of 0.x. PASCAL VOC detector. Provable robustness: certified recall (CR@0.x). We use Base Detector model: a perfect clean detector (PCD). We certified recall as the robustness metric against patch hiding use the ground-truth annotations to simulate a perfect clean attacks. The certified recall is defined as the percentage of detector. The perfect clean detector can always make correct ground-truth objects that have provable robustness against detection in the clean setting but is assumed vulnerable to any patch hiding attack. Recall that an object has provable patch hiding attacks. This hypothetical detector ablates the robustness when DetectorGuard can detect the object in the errors of Base Detector and helps us better understand the clean setting and Objectness Predictor can output high object- behavior of Objectness Predictor and Detection Matcher. ness activation in the worst case (as discussed in Section 2.3 9
Table 2: Clean performance of DetectorGuard PASCAL VOC MS COCO AP w/o defense AP w/ defense FAR@0.8 AP w/o defense AP w/ defense FAR@0.6 Perfect clean detector 100% 98.3% 1.5% 100% 96.3% 3.8% YOLOv4 92.6% 91.3% 4.1% 73.4% 71.2% 4.1% Faster R-CNN 90.0% 88.7% 2.7% 66.7% 64.7% 3.5% 1.0 Precision-PCD-V DetectorGuard works well across different datasets. We 0.9 Precision-PCD-DG can see that the observation of high clean performance is FAR-PCD-DG 0.8 Precision-YOLO-V similar across two different datasets: DetectorGuard achieves 0.7 Precision-YOLO-DG a low FAR and a similar AP as the vanilla Base Detector on FAR-YOLO-DG Precision / FAR 0.6 Precision-FRCNN-V both PASCAL VOC and MS COCO (the precision-recall plot 0.5 Precision-FRCNN-DG FAR-FRCNN-DG for MS COCO is available in Appendix D). These similar 0.4 results show that DetectorGuard is a general approach and 0.3 can be used for both easier and challenging detection tasks. 0.2 0.1 0.0 5.4 Provable Robustness 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Recall In this subsection, we first introduce the robustness evaluation Figure 2: Clean performance of DetectorGuard on PASCAL VOC setup and then report the provable robustness of our defense (V – vanilla; DG – DetectorGuard; PCD – perfect clean detector; against any patch hiding attack within our threat model. FRCNN – Faster R-CNN) Setup. We use a 32×32 adversarial pixel patch on the re- scaled and padded 416×416 images to evaluate the provable robustness.8 We consider all possible image locations as can- and Section 4.2). Note that CR is affected by the performance didate locations for the adversarial patch to evaluate the model of Base Detector (e.g., confidence threshold), and we use robustness. We categorize our results into three categories de- CR@0.x to denote the certified recall at a clean recall of 0.x. pending on the distance between an object and the patch loca- tion. When the patch is totally over the object, we consider it 5.3 Clean Performance as over-patch. When the patch partially overlaps with the ob- ject, we consider it as close-patch. The other patch locations In this subsection, we evaluate the clean performance of are considered as far-patch. For each patch location and each DetectorGuard with three different base detectors and two object, we use Algorithm 2 to determine the robustness. We datasets. In Table 2, we report AP of vanilla Base Detector note that the above algorithm already considers all possible (AP w/o defense), AP of DetectorGuard (AP w/ defense), and adaptive attacks (attacker strategies) within our threat model. FAR at a clean recall of 0.8 or 0.6 (FAR@0.8 or FAR@0.6). We use CR@0.x as the robustness metric, and we also report We also plot the precision-recall and FAR-recall curve for the percentage of objects that can be detected by Objectness PASCAL VOC in Figure 2 for detailed model analysis, and a Predictor in the clean setting as Max-CR. We call it Max-CR similar plot for MS COCO is in Appendix D. because DetectorGuard can only certify the robustness for DetectorGuard has a low FAR and a high AP. We can see objects that are detected by Objectness Predictor. Given the from Table 2 that DetectorGuard has a low FAR of 1.5% large number of all possible patch locations, we only use a and a high AP of 98.3% on PASCAL VOC when we use 400-image subset of the test/validation datasets for evaluation a perfect clean detector as Base Detector. The result shows (due to computational constraints). 
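For the robustness evaluation setup described above, the sketch below shows how a candidate patch location could be assigned to the far-/close-/over-patch categories and how per-object certification outcomes could be aggregated into certified recall. `certify_one` stands in for one invocation of Algorithm 2's per-location check; the containment test for "over-patch" and the omission of the clean-recall operating point in CR@0.x are simplifying assumptions, since the paper only states the categories as "over", "partial overlap", and "no overlap".

```python
def categorize_patch_location(patch_box, object_box):
    """Assign a patch location to a threat-model category (Sec. 5.4).

    Boxes are (xmin, ymin, xmax, ymax). No overlap -> "far-patch";
    partial overlap -> "close-patch"; patch entirely over the object -> "over-patch".
    """
    px0, py0, px1, py1 = patch_box
    ox0, oy0, ox1, oy1 = object_box
    if min(px1, ox1) <= max(px0, ox0) or min(py1, oy1) <= max(py0, oy0):
        return "far-patch"                       # no overlap at all
    if px0 >= ox0 and py0 >= oy0 and px1 <= ox1 and py1 <= oy1:
        return "over-patch"                      # patch lies entirely within the object box
    return "close-patch"                         # partial overlap

def certified_recall(objects, patch_locations, certify_one):
    """Aggregate per-object certification outcomes into CR per category.

    certify_one(object_box, patch_box) -> bool stands in for one call of
    Algorithm 2's per-location analysis; an object counts as certified for a
    category only if it survives every patch location in that category.
    """
    objects = list(objects)
    certified = {"far-patch": 0, "close-patch": 0, "over-patch": 0}
    for obj in objects:
        robust = {cat: True for cat in certified}
        for patch in patch_locations:
            cat = categorize_patch_location(patch, obj)
            if robust[cat] and not certify_one(obj, patch):
                robust[cat] = False              # one bypassing location voids the certificate
        for cat, ok in robust.items():
            certified[cat] += int(ok)
    total = max(len(objects), 1)
    return {cat: count / total for cat, count in certified.items()}
```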
that DetectorGuard only has a minimal impact on the clean DetectorGuard achieves the first non-trivial provable ro- performance. bustness against patch hiding attack. We report the certi- DetectorGuard is highly compatible with different con- fied recall at a clean recall of 0.8 or 0.6 (CR@0.8 or CR@0.6) ventional detectors. From Table 2 and Figure 2, we can see in Table 3. As shown in Table 3, DetectorGuard can certify the that when we use YOLOv4 or Faster R-CNN as Base De- robustness for around 30% of PASCAL VOC objects when tector, the clean AP as well as the precision-recall curve of the patch is far away from the object; which means no attack DetectorGuard is close to that of its vanilla Base Detector. 8 DPatch [27] demonstrates that even a 20×20 adversarial patch at the Furthermore, the FAR@0.8 for PASCAL VOC is as low as image corner can have a malicious effect. In Appendix A, we show that 4.1% for YOLOv4 and 2.7% for Faster R-CNN. These results more than 15% of PASCAL VOC objects and 44% of MS COCO objects are show that DetectorGuard is highly compatible with different smaller than a 32×32 patch. We also provide robustness results for different conventional detectors. patch sizes as well as visualizations in Appendix A. 10
Table 3: Provable robustness of DetectorGuard PASCAL VOC (CR@0.8) MS COCO (CR@0.6) far-patch close-patch over-patch far-patch close-patch over-patch Perfect clean detector 29.6% 21.9% 7.4% 9.5% 4.9% 2.4% YOLOv4 26.6% 19.9% 7.1% 8.0% 4.7% 2.4% Faster R-CNN 27.9% 21.2% 6.7% 8.6% 4.9% 2.4% within our threat model can successfully attack these certified 0.45 objects. We also plot the CR-recall curve for PASCAL VOC 0.40 in Figure 3 (a similar plot for MS COCO is in Appendix D). 0.35 The figures show that the provable robustness improves as 0.30 Max-CR Certified Recall the clean recall increases, and the performance of YOLOv4 0.25 CR-PCD-far and Faster R-CNN is close to that of a perfect clean detector CR-PCD-close 0.20 CR-PCD-in when the recall is close to one. CR-YOLO-far 0.15 CR-YOLO-close DetectorGuard is especially effective when the patch is CR-YOLO-over 0.10 CR-FRCNN-far far away from the objects. From Table 3 and Figure 3, we 0.05 CR-FRCNN-close can clearly see that the provable robustness of DetectorGuard CR-FRCNN-over is especially good when the patch gets far away from the 0.00 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Clean Recall object. This model behavior aligns with our intuition that a localized adversarial patch should only have a spatially con- Figure 3: Provable robustness of DetectorGuard on PASCAL VOC strained adversarial effect. Moreover, this observation shows that DetectorGuard has made the attack much more difficult: PCD-missed to have a chance to bypass DetectorGuard, the adversary has 0.25 PCD-robust to put the patch close to or even over the victim object, which PCD-vulnerable is not always feasible in real-world scenarios. We also note 0.20 that in the over-patch threat mode, we allow the patch to be % Objects 0.15 anywhere over the object. This means that the patch can be placed over the most salient part of the object (e.g., the face 0.10 of a person), and makes robust detection extremely difficult. Larger objects are more robust than small objects in De- 0.05 tectorGuard. To better understand DetectorGuard’s provable robustness, we plot the histogram of object sizes for PASCAL 0.00 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 VOC in Figure 4. We categorize all objects into three groups: Object size (%) 1) objects that are missed by Objectness Predictor in the clean Figure 4: Histograms of object sizes for PASCAL VOC (close-patch; setting (missed); 2) objects that are detected by Objectness results for far-patch and over-patch are in Appendix D) Predictor but are not provably robust (vulnerable); 3) objects that are provably robust (robust). As shown in the figure, most of the missed and vulnerable objects are in small sizes. This ing threshold T in O BJ P REDICTOR(·) to see how the model is an expected behavior because it is hard for even humans performance changes. For each threshold, we report CR for to perfectly detect all small objects. Moreover, considering three patch threat models as well as the Max-CR. We also that missing a big object is much more serious than miss- include AP and 1-FAR to understand the effect of threshold ing a small object in real-world applications, we believe that on clean performance. We report these results in the leftmost DetectorGuard has strong foundational potential. sub-figure in Figure 5. We can see that when the binarizing threshold is low, the CR is high because more objectness is retained after the binarization. 
However, more objectness also 5.5 Analysis of Hyper-parameters makes it more likely to trigger a false alert in the clean setting, In this subsection, we take the hypothetical perfect clean and we can see both AP and 1-FAR are affected greatly as we detector (PCD) as Base Detector and use the PASCAL VOC decrease the threshold T . Therefore, we need to balance the dataset to analyze the performance of DetectorGuard under trade-off between clean performance and provable robustness. different hyper-parameter settings. Note that using PCD helps In our default parameter setting, we set T = 10 to have a FAR us to focus on the behavior of Objectness Predictor, which is lower than 2% while maintaining decent provable robustness. the most important component in this paper. Effect of window size. We consider the effect of using differ- Effect of the binarizing threshold. We first vary the binariz- ent window sizes in the second sub-figure in Figure 5. The 11