Secure Watermark for Deep Neural Networks with Multi-task Learning


Fangqi Li, Shilin Wang
School of Electronic Information and Electrical Engineering,
Shanghai Jiao Tong University
{solour_lfq,wsl}@sjtu.edu.cn

arXiv:2103.10021v2 [cs.CR] 27 Mar 2021

Abstract

Deep neural networks are playing an important role in many real-life applications. After being trained with abundant data and computing resources, a deep neural network model providing service is endowed with economic value. An important prerequisite for commercializing and protecting deep neural networks is the reliable identification of their genuine author. To meet this goal, watermarking schemes that embed the author's identity information into the networks have been proposed. However, current schemes can hardly meet all the necessary requirements for securely proving authorship, and they mostly focus on models for classification. To explicitly meet the formal definitions of the security requirements and to increase the applicability of deep neural network watermarking schemes, we propose a new framework based on multi-task learning. By treating the watermark embedding as an extra task, most of the security requirements are explicitly formulated and met with well-designed regularizers; the rest are guaranteed with components from cryptography. Moreover, a decentralized verification protocol is proposed to standardize ownership verification. The experimental results show that the proposed scheme is flexible, secure, and robust, hence a promising candidate for deep learning model protection.

1   Introduction

Deep neural networks (DNNs) are spearheading artificial intelligence, with broad applications in assorted fields including computer vision [19, 36, 58], natural language processing [10, 17, 53], the internet of things [14, 30, 41], etc. Increasing computing resources and improved algorithms have established the DNN as a trustworthy agent that outperforms humans in many disciplines.

Training a DNN is much more expensive than using it for inference. A large amount of data has to be collected, preprocessed, and fed into the model. Data preparation is followed by designing the regularizers, tuning the (hyper)parameters, and optimizing the DNN structure. Each round of tuning involves thousands of epochs of backpropagation, each epoch costing about $0.005 in electricity on average.1 Using a published DNN, by contrast, is easy: a user simply propagates the input forward. Such an imbalance between DNN production and deployment calls for recognizing DNN models as intellectual property and designing better mechanisms for authorship identification against piracy.

   1 Assume that one kilowatt-hour of electricity costs $0.1. One epoch of training a DNN can take over three minutes on four GPUs, each operating at around 300 W.

DNN models, like other multimedia objects, are usually transmitted over public channels. Hence the most influential method for protecting DNNs as intellectual property is the digital watermark [59]. To prove possession of an image, a piece of music, or a video, the owner resorts to a watermarking method that encodes the owner's identity information into the media. After compression, transmission, and slight distortion, a decoder should still be able to recognize the identity from the carrier [4].

For DNN watermarking, researchers have followed a similar line of reasoning [48]. In this paper, we use host to denote the genuine author of a DNN model. The adversary is one who steals the model and publishes it as if the adversary were the host. To add a watermark to a DNN, some information is embedded into the network along with the normal training data. After adversaries manage to steal the model and pretend to have built it themselves, a verification process reveals the hidden information in the DNN and identifies the authentic host.

In the DNN setting, the watermark, as an additional security insurance, should not sacrifice the model's performance. This is called the functionality-preserving property. Meanwhile, the watermark should be robust against the adversaries' modifications to the model. Many users fine-tune (FT) the downloaded model on a smaller dataset to fit their own tasks. In cases where computational resources are restricted (especially in the internet of things), a user is expected to conduct neuron pruning (NP) to save energy. A prudent user can conduct fine-pruning (FP) [31] to eliminate potential backdoors that have been inserted into the model. These basic requirements, together with other concerns for integrity, privacy, etc., make the DNN watermark a challenge for both the machine learning and security communities.

The diversity of current watermarking schemes originates from assumptions on whether or not the host or the notary has white-box access to the stolen model.

If the adversary has stolen the model and only provides an API as a service, then the host has only black-box access to the possibly stolen model. In this case, backdoor-based watermarking schemes are preferred. A DNN with a backdoor yields special outputs on specific inputs. For example, it is possible to train an image classification DNN to classify all images with a triangle stamp in the upper-left corner as cats. The backdoor-based watermark was pioneered by [59], where a collection of images is selected as the trigger set to actuate misclassifications. It was indicated in [3, 60] that cryptological protocols can be combined with the backdoor-based watermark to prove the integrity of the host's identity. For a more principled way of generating triggers, Li et al. proposed in [29] to adopt a variational autoencoder (VAE), while Le Merrer et al. used adversarial samples as triggers [26]. Li et al. proposed the Wonder Filter, which assigns some pixels values in [−2000, 2000], and adopted several tricks to guarantee the robustness of watermark embedding in [27]. In [57], Yao et al. studied the performance of the backdoor-based watermark in transfer learning and concluded that it is better to embed information in the feature extraction layers.

Backdoor-based watermarking schemes are essentially insecure given the various methods of backdoor elimination [9, 28, 32]. Liu et al. showed in [33] that a heuristic and biomorphic method can detect a backdoor in a DNN. In [44], Shafieinejad et al. claimed to be able to remove watermarks given black-box access to the model. Namba et al. proposed another defense using a VAE against backdoor-based watermarking methods in [35]. Even without these specialized algorithms, model tuning such as FP [31, 47] can efficiently block a backdoor and hence the backdoor-based watermark.

If the host can obtain all the parameters of the model, known as white-box access, then weight-based watermarking schemes are favored. Although this assumption is strictly stronger than that of the black-box setting, it remains practically significant. For example, the sponsor of a model competition can detect plagiarists who submit models slightly tuned from those of other contestants by examining the watermark. This legitimate method is better than checking whether two models perform significantly differently on a batch of data, which is still the practice in many competitions.2 As another example, the investor in a project can verify the originality of a submitted model from its watermark. Such verification prevents tenderers from submitting a (modified) copy or an outdated and potentially backdoored model.

   2 http://host.robots.ox.ac.uk:8080/leaderboard/main_bootstrap.php

Uchida et al. first demonstrated the feasibility of incorporating the host's identity information into the weights of a DNN in [48]. The encoding is done through a regularizer that minimizes the distance between a specific weight vector and a string encoding the author's identity. The method in [16] embeds a message into the model's weights in a reversible manner, so that a trusted user can eliminate the watermark's influence and obtain the clean model. Instead of weights, Davish et al. proposed DeepSigns [12], which embeds the host's identity into the statistical means of the feature maps of a selected collection of samples, hence achieving better protection.

So far, the performance of a watermarking method has mainly been measured by the decline of the watermarked model's performance on normal inputs and the decline of the identity verification accuracy under model fine-tuning and neuron pruning. However, many of the results are empirical and lack an analytic basis [12, 48]. Most watermarking methods are designed and examined only for image classification DNNs, whose backdoors can be generated easily. This fact challenges the universality of DNN watermarks for practical use. Moreover, some basic security requirements against adversarial attacks have been overlooked by most existing watermarking schemes. For example, the method in [59] can detect piracy, but it cannot prove to any third party that the model belongs to the host. As indicated by Auguste Kerckhoffs's principle [24], the security of a system should rely on the secret key rather than the secrecy of the algorithm. The methods in [12, 48, 59] are insecure in this sense, since an adversary knowing the watermark algorithm can effortlessly claim authorship. The influence of watermark overwriting is discussed only in [3, 12, 27]. Security against ownership piracy is studied only in [16, 27, 60].

To overcome these difficulties, we propose a new white-box watermarking model for DNNs based on multi-task learning (MTL) [7, 22, 43]. By turning the watermark embedding into an extra task, most security requirements can be satisfied with well-designed regularizers. This extra task has a classifier independent of the backend of the original model, hence it can verify the ownership of models designed for tasks other than classification. Cryptological protocols are adopted to instantiate the watermarking task, making the proposed scheme more secure against watermark detection and ownership piracy. To ensure the integrity of authorship identification, a decentralized verification protocol is designed to authorize the time stamp of the ownership and invalidate the watermark overwriting attack. The major contributions of our work are three-fold:

   1. We examine the security requirements for DNN watermarks in a comprehensive and formal manner.

   2. A DNN watermarking model based on MTL, together with a decentralized protocol, is proposed to meet all the security requirements. Our proposal can be applied to DNNs for tasks other than image classification, which was the only focus of previous works.

   3. Compared with several state-of-the-art watermarking schemes, the proposed method is more robust and secure.

2   Threat Model and Security Requirements

It is reasonable to assume that the adversary possesses fewer resources than the host: e.g., the entire training dataset is not exposed to the adversary, and/or the adversary's computational resources are limited; otherwise, the adversary would not need to steal the model. Moreover, we assume that the adversary can only tune the model by methods such as FT, NP, or FP. Such modifications are common attacks, since the training code is usually published along with the trained model. Meanwhile, such tuning is effective against systems that only use the hash of the model for verification. On the other hand, it is hard and laborious to modify the internal computational graph of a model. It is harder still to adopt model extraction or distillation, which demands much data and computation [23, 40] yet risks performance and generalization ability. Assume that the DNN model M is designed to fulfil a primary task Tprimary with dataset Dprimary, data space X, label space Y, and a metric d on Y.

2.1   Threat Model

We consider five major threats to DNN watermarks.

2.1.1   Model tuning

An adversary can tune M by methods including: (1) FT, running backpropagation on a local dataset; (2) NP, cutting out the less important links in M; and (3) FP, pruning unnecessary neurons in M and then fine-tuning it. The adversary's local dataset is usually much smaller than the original training dataset for M, and fewer epochs are needed. FT and NP can compromise watermarking methods that encode information into M's weights in a reversible way [16]. Meanwhile, [31] suggested that FP can efficiently eliminate backdoors, and the watermarks within them, from image classification models.

2.1.2   Watermark detection

If the adversary can distinguish a watermarked model from a clean one, then the watermark is of less use, since the adversary can use the clean models and escape copyright regulation. The adversary can adopt backdoor screening methods [49, 50, 56] or reverse engineering [5, 20] to detect and possibly eliminate backdoor-based watermarks. For weight-based watermarks, the host has to ensure that the weights of a watermarked model do not deviate too much from those of a clean model; otherwise, a property inference attack [15] can distinguish the two models.

2.1.3   Privacy concerns

As an extension of detection, we consider an adversary capable of identifying the host of a model without its permission to be a threat to privacy. A watermarked DNN should expose no information about its host unless the host chooses to. Otherwise, models might be evaluated not by their performance but by their authors.

2.1.4   Watermark overwriting

Having obtained the model and the watermarking method, the adversary can embed its own watermark into the model and declare ownership afterward. Embedding an extra watermark only requires redundancy in the model's parameter representation. Therefore, new watermarks can always be embedded unless one proves that such redundancy has been depleted, which is generally impossible. A concrete requirement is: the insertion of a new watermark should not erase the previous watermarks.

For a model with multiple watermarks, an incontrovertible time stamp must be included in the ownership verification to break this redeclaration dilemma.

2.1.5   Ownership piracy

Even without tuning the parameters, model theft is still possible. Similar to [29], we define ownership piracy as attacks by which the adversary claims ownership over a DNN model without tuning its parameters or training extra learning modules. For zero-bit watermarking schemes (where no secret key is involved and the security depends on the secrecy of the algorithm), the adversary can claim ownership by publishing a copy of the scheme. For a backdoor-based watermarking scheme that is not carefully designed, the adversary can detect the backdoor and claim it as its own watermark.

Secure watermarking schemes usually make use of cryptological protocols [27, 60]. In these schemes, it is almost impossible for the adversary to impersonate the host using any probabilistic machine that terminates within time polynomial in the security parameters (PPT).

2.2   Formulating the Watermarking Scheme

We define a watermarking scheme with security parameter N as a probabilistic algorithm WM that maps Tprimary (the description of the task, together with the training dataset Dprimary), a description of the structure of the DNN model M, and a secret key denoted by key to a pair (MWM, verify):

   WM : (MWM, verify) ← (N, Tprimary, M, key),

where MWM is the watermarked DNN model and verify is a probabilistic algorithm with binary output for verifying ownership. To verify the ownership, the host provides verify and key.
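To fix ideas, the pair returned by WM can be rendered as the following minimal Python sketch (ours, for illustration only; all names are hypothetical and the training logic is deferred to Section 3):

    from dataclasses import dataclass
    from typing import Any, Callable

    Model = Any                              # a trained DNN, e.g. a torch.nn.Module
    Verify = Callable[[Model, bytes], int]   # verify(model, key) -> 1 or 0

    @dataclass
    class WatermarkedPair:
        model_wm: Model   # M_WM: the only artifact that is published
        verify: Verify    # disclosed by the host during an ownership dispute

    def WM(N: int, task_primary: Any, model_struct: Any, key: bytes) -> WatermarkedPair:
        """Probabilistic embedding: train model_struct on T_primary jointly with
        a key-derived watermarking task, then return (M_WM, verify)."""
        raise NotImplementedError  # instantiated by the MTL construction in Section 3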

A watermarking scheme should satisfy the following basic requirements for correctness:

   Pr{verify(MWM, key) = 1} ≥ 1 − ε,    (1)

   Pr_{M′ irrelevant to MWM, or key′ ≠ key}{verify(M′, key′) = 0} ≥ 1 − ε,    (2)

where ε ∈ (0, 1) reflects the security level. Condition (1) says that the verifier should always correctly identify the authorship, while (2) says that it accepts only the correct key as proof and should not mistake irrelevant models for the host's.

The original model trained without being watermarked is denoted by Mclean. Some researchers [16] define WM as a mapping from (N, Mclean, key) to (MWM, verify), which is a subclass of our definition.

2.3   Security Requirements

Having examined the adversary's toolkit, we formally define the security requirements for a watermarking scheme.

2.3.1   Functionality-preserving

The watermarked model should perform only slightly worse than, if not as well as, the clean model. The definition of this property is:

   Pr_{(x,y)∼Tprimary}{d(Mclean(x), MWM(x)) ≤ δ} ≥ 1 − ε,    (3)

which can be examined a posteriori. However, it is hard to explicitly incorporate this definition into the watermarking scheme. Instead, we resort to the following definition:

   ∀x ∈ X, d(Mclean(x), MWM(x)) ≤ δ.    (4)

Although it is stronger than (3), (4) is a tractable definition. We only have to ensure that the parameters of MWM do not deviate too much from those of Mclean.

2.3.2   Security against tuning

After being tuned on the adversary's dataset Dadversary, the model's parameters shift and the verification accuracy of the watermark might decline. Let M′ ← tuning(MWM, Dadversary) denote a model M′ obtained by tuning MWM on Dadversary. A watermarking scheme is secure against tuning iff:

   Pr_{Dadversary; M′ ← tuning(MWM, Dadversary)}{verify(M′, key) = 1} ≥ 1 − ε.    (5)

To meet (5), the host has to simulate the effects of tuning and make verify(·, key) insensitive to them in the neighbourhood of MWM.

2.3.3   Security against watermark detection

According to [52], one definition of security against watermark detection is: no PPT can distinguish a watermarked model from a clean one with non-negligible probability. Although this definition is impractical due to the lack of a universal backdoor detector, it is crucial that the watermark not differentiate a watermarked model from a clean model too much. Moreover, the host should be able to control the level of this difference by tuning the watermarking method. Let θ be a parameter within WM that regulates this difference; it is desirable that

   MWM^∞ = Mclean,    (6)

where MWM^∞ is the model returned by WM as θ → ∞.

2.3.4   Privacy-preserving

To protect the host's privacy, it is sufficient that no adversary can distinguish between two models watermarked with different keys. Fixing the primary task Tprimary and the structure of the model M, we first introduce an experiment Exp_A^detect in which an adversary A tries to identify the host of a model:

Algorithm 1 Exp_A^detect.
Require: N, WM, key0 ≠ key1.
 1: Randomly select b ← {0, 1};
 2: Generate MWM from WM(N, Tprimary, M, keyb);
 3: A is given MWM, N, WM, key0, key1 and outputs b̂.
 4: A wins the experiment if b̂ = b.

Definition 1. If for every PPT adversary A the probability that A wins Exp_A^detect is upper bounded by 1/2 + ε(N), where ε is a negligible function, then WM is privacy-preserving.

The intuition behind this definition is: an adversary cannot identify the host from the model, even when the number of candidates has been reduced to two. Almost all backdoor-based watermarking schemes are insecure under this definition. To protect privacy, it is crucial that WM be a probabilistic algorithm and that verify depend on key.
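The experiment can be read as the following harness (our illustrative sketch; WM follows the interface sketched in Section 2.2, and the adversary is supplied by the caller):

    import secrets

    def exp_detect(WM, N, task_primary, model_struct,
                   key0: bytes, key1: bytes, adversary) -> bool:
        """One round of Exp_detect; returns True iff the adversary wins."""
        b = secrets.randbelow(2)                                   # 1: b <- {0, 1}
        pair = WM(N, task_primary, model_struct, (key0, key1)[b])  # 2: embed key_b
        b_hat = adversary(pair.model_wm, N, WM, key0, key1)        # 3: adversary guesses
        return b_hat == b                                          # 4: win iff b_hat = b

A scheme is privacy-preserving exactly when no efficient adversary wins such rounds with probability noticeably above 1/2.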
2.3.5   Security against watermark overwriting

Assume that the adversary has watermarked MWM with another secret key keyadv, using a subprocess of WM, and obtained Madv ← overwriting(MWM, keyadv). The overwriting watermark should not invalidate the original one; formally, for any legal keyadv:

   Pr_{keyadv; Madv ← overwriting(MWM, keyadv)}{verify(Madv, key) = 1} ≥ 1 − ε,    (7)

where the randomness in choosing keyadv, generating Madv, and computing verify is integrated out. A watermarking scheme that meets (7) is defined to be secure against watermark overwriting. This property is usually examined empirically in the literature [3, 12, 27].
Table 1: Security requirements and established watermarking schemes.

  Security requirement                    | Zhu [60] | Adi [3] | Le Merrer [26] | Zhang [59] | Davish [12] | Li [27] | Li [29] | Uchida [48] | Guan [16] | Ours
  ----------------------------------------|----------|---------|----------------|------------|-------------|---------|---------|-------------|-----------|-----
  Functionality-preserving                |    E     |    E    |       P        |     E      |      P      |    E    |    E    |      P      |     E     |  P
  Security against tuning                 |    N     |    E    |       E        |     E      |      E      |    E    |    N    |      E      |     N     |  P
  Security against watermark detection    |    N     |    N    |       N        |     N      |      N      |    P    |    P    |      N      |     N     |  P
  Privacy-preserving                      |    N     |    N    |       N        |     N      |      N      |    N    |    N    |      N      |     N     |  P
  Security against watermark overwriting  |    N     |    E    |       N        |     N      |      E      |    E    |    N    |      N      |     N     |  E
  Security against ownership piracy       |   III    |   II    |       I        |     I      |     II      |   III   |   II    |      I      |    III    | III

  P means the security requirement is claimed to hold by proof or proper regularizers. E means an empirical evaluation of the security was provided. N means no discussion was given or the scheme is insecure.

2.3.6   Security against ownership piracy

In an ownership piracy attack, the adversary pirates a model by recovering key and forging verify through querying MWM (or verify, if available). We define three levels of security according to the effort needed to pirate a model.

   1. Level I: The adversary only needs to wrap MWM or query it a constant number of times. All zero-bit watermarking schemes belong to this level.

   2. Level II: The adversary has to query MWM a number of times polynomial in the security parameter. The more the adversary queries, the more likely it is to succeed in impersonating the host. The key and verify are in this case generally simple. For example, [3, 12] are of this level of security.

   3. Level III: It is almost impossible for the adversary to pirate ownership of the model with a number of queries polynomial in the security parameter. Such schemes usually borrow methods from cryptography to generate pseudorandomness. The methods in [27, 60] are examples of this level.

Watermarking schemes of levels I and II can be adopted as theft detectors, but the host can hardly use a level I/II scheme to convince a third party of its ownership. Using a watermarking scheme of level III, a host can prove to any third party that it possesses the model. This is the only case in which the watermark has forensic value.

The scheme in [26] is a zero-bit watermarking scheme. The method proposed by Zhang et al. in [59] adopts marked images or noise as the backdoor triggers, but only a few marks that are easily forgeable were examined. The protocol of Uchida et al. [48] can be enhanced to level III security against ownership piracy only if an authority is responsible for distributing the secret key, as in [55]; but it lacks covertness and the privacy-preserving property.

The VAE adopted in [29] has to be used conjugately with a secret key that enhances the robustness of the backdoor. The adversary can collect a set of mistaken samples from one class, slightly disturb them, and claim to have watermarked the neural network. To claim the ownership of a model watermarked by Adi et al. [3], the adversary samples its collection of triggers from the mistaken samples, encrypts them with a key, and submits the encrypted pairs. The perfect security of their scheme depends on the model performing nearly perfectly on the primary task, which is unrealistic in practice. As for DeepSigns [12], an adversary can choose one class, compute the empirical mean of the outputs of the activation functions (since the outliers are easy to detect), then generate a random matrix as the mask and claim ownership.

The scheme in [60] has level III security against ownership piracy, as proved in the original paper. So does the method in [27], since it is generally hard to guess the actual pattern of the Wonder Filter mask from a space of size 2^P, where P is the number of pixels of the mask. The scheme by Guan et al. in [16] is secure but extremely fragile, hence it is out of the scope of practical watermarking schemes.

[Figure 1 shows a block diagram: Dprimary feeds the backbone of MWM; the backend cp produces predictions for Tprimary; the key-generated dataset DWM and the outputs of several backbone layers feed the classifier cWM, which produces predictions for TWM; the watermarking branch is denoted fWM.]

Figure 1: Architecture of the MTL-based watermarking scheme. The orange blocks are the backbone, the pink block is the backend for Tprimary, the blue block is the classifier for TWM.

A comprehensive summary of established watermarking schemes, judged according to the enumerated security requirements, is given in Table 1.

3   The Proposed Method

3.1   Motivation

It is difficult for the backdoor-based or weight-based watermarking methods to formally meet all the proposed security requirements. Hence, we design a new white-box watermarking method for DNN model protection using multi-task learning. The watermark embedding is designed as an additional task TWM. A classifier for TWM is built independently of the backend for Tprimary. After training and watermark embedding, only the network for Tprimary is published.

Reverse engineering or backdoor detection as in [49] cannot find any evidence of the watermark, since no trigger is embedded in the published model's backend. Moreover, common FT methods such as fine-tune last layer (FTLL) or re-train last layers (RTLL) [3], which modify only the backend layers of the model, have no impact on our watermark.

Under this formulation, the functionality-preserving property, the security against tuning, the security against watermark detection, and privacy preservation can be formally addressed. A decently designed TWM ensures the security against ownership piracy as well, making the MTL-based watermarking scheme a secure and sound option for model protection.

To better handle the forensic difficulties involving overwritten watermarks and key management, we introduce a decentralized consensus protocol to authorize the time stamps embedded with the watermarks.

3.2   Overview

The proposed model consists of the MTL-based watermarking scheme and the decentralized verification protocol.

3.2.1   The MTL-based watermarking scheme

The structure of our watermarking scheme is illustrated in Fig. 1. The entire network consists of the backbone network and two independent backends: cp and cWM. The published model MWM is the backbone followed by cp, while fWM is the branch for the watermarking task, in which cWM takes the outputs of different layers of the backbone as its input. By having cWM monitor the outputs of different layers of the backbone network, it becomes harder for an adversary to design modifications that invalidate cWM completely.

To produce a watermarked model, a host should:

   1. Generate a collection of N samples D^key_WM = {(xi, yi)}^N_{i=1} using a pseudo-random algorithm with key as the random seed.

   2. Optimize the entire DNN to jointly minimize the loss on D^key_WM and Dprimary. During the optimization, a series of regularizers are designed to meet the security requirements enumerated in Section 2.

3. Publishes MWM .                                                   Publish message from s by hash(cWM ). The last steps follow
                                                                       Section. 3.2.1. After finishing the verification, this client can
  To prove its ownership over a model M to a third-party:              broadcast its result as the proof for s’s ownership over the
  1. The host submits M, cWM and key.                                  model in lM .
                                key
  2. The third-party generates DWM  with key and combines
     cWM with M’s backbone to build a DNN for TWM .
                                                                       3.3      Security Analysis of the Watermark Task
                                                                       We now elaborate the design of the watermarking task TWM
  3. If the statistical test indicates that cWM with M’s back-
                                  key                                  and analyze its security. For simplicity, TWM is instantiated as
     bone performs well on DWM         then the third-party con-
                                                                       a binary classification task, i.e., the output of the watermarking
     firms the host’s ownership over M.                                                                               key
                                                                       branch has two channels. To generate DWM            , key is used as
                                                                       the seed of a pseudo-random generator (e.g., a stream cipher)
3.2.2   The decentralized verification protocol

To enhance the reliability of the ownership protection, it is necessary to use a protocol to authorize the watermark of the model's host. Otherwise, any adversary who has downloaded MWM can embed its own watermark into it and pirate the model.

One option is to use a trusted key distribution center or a timing agency in charge of authorizing the time stamps of the hosts' watermarks. However, such centralized protocols are vulnerable and expensive. For this reason we resort to decentralized consensus protocols such as Raft [37] or PBFT [8], which were designed to synchronize messages within a distributed community. Under these protocols, a message from a user is responded to and recorded by a majority of clients within the community, so the message becomes authorized and unforgeable.

Concretely, a client s under this DNN watermarking protocol is given a pair of public and private keys. s can publish a watermarked model or claim its ownership over some model by broadcasting:

Publishing a model: After finishing training a model watermarked with key, s obtains MWM and cWM. Then s signs and broadcasts the following message to the entire community:

   ⟨Publish: key ∥ time ∥ hash(cWM)⟩,

where ∥ denotes string concatenation, time is the time stamp, and hash is a preimage-resistant hash function that maps a model into a string and is accessible to all clients. Other clients within the community verify this message using s's public key, verify that time lies within a recent time window, and write the message into their memory. Once s has confirmed that a majority of clients have recorded its broadcast (e.g., when s receives a confirmation from the current leader under the Raft protocol), it publishes MWM.

Proving ownership over a model M: s signs and broadcasts the following message:

   ⟨Claim: lM ∥ hash(M) ∥ lcWM⟩,

where lM and lcWM are pointers to M and cWM. Upon receiving this request, any client can independently conduct the ownership proof. It first downloads the model from lM and examines its hash. Then it downloads cWM and retrieves the Publish message from s by hash(cWM). The remaining steps follow Section 3.2.1. After finishing the verification, the client can broadcast its result as proof of s's ownership over the model at lM.
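As a concrete rendering of the two message types (our sketch: SHA-256 stands in for the preimage-resistant hash, and the signing and broadcasting routines are assumed to be supplied by the community's infrastructure):

    import hashlib, time

    def model_hash(serialized: bytes) -> str:
        # hash(.): a preimage-resistant map from a serialized model to a string
        return hashlib.sha256(serialized).hexdigest()

    def publish_message(key: bytes, c_wm: bytes) -> str:
        # <Publish: key || time || hash(c_WM)>, with '||' realized as '|'
        return "Publish:" + key.hex() + "|" + str(int(time.time())) + "|" + model_hash(c_wm)

    def claim_message(l_m: str, m: bytes, l_c_wm: str) -> str:
        # <Claim: l_M || hash(M) || l_c_WM>; l_M and l_c_WM are pointers (e.g., URLs)
        return "Claim:" + l_m + "|" + model_hash(m) + "|" + l_c_wm

    # The client s would then sign and broadcast, e.g.:
    #   broadcast(sign(s_private_key, publish_message(key, c_wm_bytes)))

Under Raft or PBFT, the broadcast is considered authorized once a majority of clients has recorded it.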
3.3   Security Analysis of the Watermark Task

We now elaborate on the design of the watermarking task TWM and analyze its security. For simplicity, TWM is instantiated as a binary classification task, i.e., the output of the watermarking branch has two channels. To generate D^key_WM, key is used as the seed of a pseudo-random generator (e.g., a stream cipher) to generate πkey, a sequence of N different integers from the range [0, ..., 2^m − 1], and a binary string lkey of length N, where m = 3⌈log2(N)⌉.

For each type of data space X, a deterministic and injective function is adopted to map each integer in πkey to an element of X. For example, when X is the image domain, the mapping could be a QR code encoder; when X consists of sequences of English words, the mapping could map an integer n to the n-th word of the dictionary.3 Without loss of generality, let πkey[i] denote the datum mapped from the i-th integer in πkey. Both the pseudo-random generator and the functions that map integers into specialized data spaces should be accessible to all clients within the intellectual property protection community. Now we set:

   D^key_WM = {(πkey[i], lkey[i])}^N_{i=1},

where lkey[i] is the i-th bit of lkey. We now merge the security requirements raised in Section 2 into this framework.

   3 We suggest not using functions that encode integers into terms similar to the data in Tprimary, especially to data of the same class. This increases the difficulty for cWM to achieve perfect classification.
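One possible instantiation of this generation step (our sketch; the paper fixes neither the cipher nor the encoder, so SHA-256 in counter mode stands in for the stream cipher and encode is left abstract):

    import hashlib
    from typing import Callable, List, Tuple

    def generate_watermark_set(key: bytes, N: int,
                               encode: Callable[[int], object]) -> List[Tuple[object, int]]:
        """Derive pi_key (N distinct integers in [0, 2^m - 1], m = 3*ceil(log2 N))
        and the N-bit string l_key from `key`, then map integers into the data space."""
        m = 3 * max(1, (N - 1).bit_length())          # m = 3 * ceil(log2 N)
        pi_key, seen, ctr = [], set(), 0
        while len(pi_key) < N:                        # rejection-sample distinct integers
            digest = hashlib.sha256(key + b"pi" + ctr.to_bytes(8, "big")).digest()
            ctr += 1
            v = int.from_bytes(digest, "big") % (2 ** m)
            if v not in seen:
                seen.add(v)
                pi_key.append(v)
        l_key = [hashlib.sha256(key + b"label" + i.to_bytes(8, "big")).digest()[0] & 1
                 for i in range(N)]                   # the binary string l_key
        return [(encode(v), bit) for v, bit in zip(pi_key, l_key)]

For images, encode could render each integer as a QR code; any deterministic, injective, community-agreed mapping works.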

3.3.1   The correctness

To verify the ownership of a model M for a host with key, given cWM, the process verify operates as in Algo. 2.

Algorithm 2 verify(·, ·|cWM, γ)
Require: M, key.
Ensure: The verification of M's ownership.
 1: Build the watermarking branch f from M and cWM;
 2: Generate D^key_WM from key;
 3: if f correctly classifies at least γ · N terms within D^key_WM then
 4:    return 1.
 5: else
 6:    return 0.
 7: end if

If M = MWM, then M has been trained to minimize the binary classification loss on TWM; hence the test in Algo. 2 is likely to succeed, which justifies requirement (1).
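In code, Algo. 2 amounts to the following check (sketch; generate_watermark_set is the generator above, and build_branch, a hypothetical helper, attaches cWM to M's backbone):

    def verify(M, key: bytes, c_wm, gamma: float, N: int, encode) -> int:
        """Return 1 iff the branch built from M and c_WM classifies at least
        gamma * N of the key-derived watermark samples correctly (Algo. 2)."""
        f = build_branch(M, c_wm)                     # assumed given: backbone of M + c_WM
        d_wm = generate_watermark_set(key, N, encode)
        correct = sum(1 for x, y in d_wm if f(x) == y)
        return 1 if correct >= gamma * N else 0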
For an arbitrary key′ ≠ key, the induced watermark training data D^key′_WM can hardly be similar to D^key_WM. To formulate this intuition, consider the event in which D^key′_WM shares q · N terms with D^key_WM, q ∈ (0, 1). With a pseudorandom generator, it is computationally impossible to distinguish πkey from a sequence of N randomly selected integers. The same argument holds for lkey and a random binary string of length N. Therefore the probability of this event can be upper bounded by:

   C(N, qN) · r^{qN} · (1 − r)^{(1−q)N} ≤ (1 + (1 − q)N) · (r/(1 − r))^{qN},

where r = N/2^{m+1}. For an arbitrary q, let r < 1/(2 + (1 − q)N); then the probability that D^key′_WM overlaps with D^key_WM in a proportion q declines exponentially.

For numbers that do not appear in πkey, the watermarking branch is expected to output a random guess. Therefore, if q is smaller than a threshold τ, then D^key′_WM can hardly pass the statistical test in Algo. 2 for N big enough. So letting

   m ≥ log2[2N(2 + (1 − τ)N)]

and N be large enough makes an effective collision in the watermark dataset almost impossible. For simplicity, setting m = 3 · ⌈log2(N)⌉ ≥ log2(N³) is sufficient. (For instance, N = 1024 gives m = 30, comfortably above the requirement log2[2N(2 + (1 − τ)N)] ≈ 20 at τ = 0.5.)

In case MWM is replaced by an arbitrary model whose backbone structure happens to be consistent with cWM, the output of the watermarking branch remains a random guess. This justifies the second requirement for correct verification, (2).

To select the threshold γ, assume that the random-guess strategy achieves an average accuracy of at most p = 0.5 + α, where α ≥ 0 is a bias term assumed to decline with the growth of N. The verification process returns 1 iff the watermark classifier achieves a binary classification accuracy of no less than γ. The demand for security is that the probability of an adversary passing the test by random guessing decline exponentially with N. Let X denote the number of correct guesses with average accuracy p; an adversary succeeds only if X ≥ γ · N. By the Chernoff theorem:

   Pr{X ≥ γ · N} ≤ ((1 − p + p · e^λ) / e^{γ·λ})^N,

where λ is an arbitrary nonnegative number. If γ is larger than p by a constant independent of N, then (1 − p + p · e^λ)/e^{γ·λ} is less than 1 for a proper λ, reducing the probability of a successful attack to negligibility. (For instance, with p = 0.5 and γ = 0.75, the choice λ = ln 3 gives a per-sample factor of 2/3^{0.75} ≈ 0.877, so the bound at N = 256 is about 3 × 10⁻¹⁵.)

3.3.2   The functionality-preserving regularizer

Denote the trainable parameters of the DNN model by w. The optimization target for Tprimary takes the form:

   L0(w, Dprimary) = Σ_{(x,y)∈Dprimary} l(M^w_WM(x), y) + λ0 · u(w),    (8)

where l is the loss defined by Tprimary and u(·) is a regularizer reflecting the prior knowledge on w. The normal training process computes the empirical loss in (8) by stochastically sampling batches and adopting gradient-based optimizers.

The proposed watermarking task adds an extra data-dependent term to the loss function:

   L(w, Dprimary, DWM) = L0(w, Dprimary) + λ · Σ_{(x,y)∈DWM} lWM(f^w_WM(x), y),    (9)

where lWM is the cross-entropy loss for binary classification. We omit the dependency of DWM on key in this section for conciseness.

To train the multiple tasks, we can minimize the multi-task loss (9) directly or train the watermarking task and the primary task alternately [7]. Since DWM is much smaller than Dprimary, it is possible that TWM does not properly converge when learned simultaneously with Tprimary. Hence we first optimize w according to the loss on the primary task (8) to obtain w0:

   w0 = argmin_w {L0(w, Dprimary)}.

Next, instead of directly optimizing the network w.r.t. (9), the following loss function is minimized:

   L1(w, Dprimary, DWM) = Σ_{(x,y)∈DWM} lWM(f^w_WM(x), y) + λ1 · Rfunc(w),    (10)

where

   Rfunc(w) = ‖w − w0‖₂².    (11)

By introducing the regularizer Rfunc in (11), w is confined to the neighbourhood of w0. Given this constraint and the continuity of MWM as a function of w, we can expect the functionality-preserving property defined in (4). Then the weaker version of functionality-preserving, (3), is tractable as well.
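The two-stage procedure of (8)-(11) can be sketched in PyTorch as follows (our illustration; model is assumed to expose a watermark_branch head, and the data loader is supplied by the caller):

    import torch

    def watermark_stage(model, wm_loader, w0, lambda1, epochs=10, lr=1e-4):
        """Stage 2: minimize L1 = watermark loss + lambda1 * ||w - w0||_2^2 (Eqs. 10-11).
        Stage 1 (obtaining w0) is ordinary training on the primary task, Eq. (8)."""
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        ce = torch.nn.CrossEntropyLoss()              # l_WM: cross entropy, two classes
        for _ in range(epochs):
            for x, y in wm_loader:                    # D_WM, generated from key
                opt.zero_grad()
                loss = ce(model.watermark_branch(x), y)
                r_func = sum(((w - w_init) ** 2).sum()        # R_func(w) = ||w - w0||^2
                             for w, w_init in zip(model.parameters(), w0))
                (loss + lambda1 * r_func).backward()
                opt.step()

Here w0 is a frozen copy of the parameters after stage 1, e.g. w0 = [p.detach().clone() for p in model.parameters()].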

3.3.3   The tuning regularizer

To be secure against the adversary's tuning, it is sufficient to make cWM robust against tuning, by the definition in (5). Although Dadversary is unknown to the host, we assume that Dadversary has a distribution similar to that of Dprimary; otherwise the stolen model would not have state-of-the-art performance on the adversary's task. To simulate the influence of tuning, a subset of Dprimary is first sampled as an estimate of Dadversary:

   D′primary ← sample(Dprimary).

Let w be the current configuration of the model's parameters. Tuning is usually tantamount to minimizing the empirical loss on D′primary starting from w, which results in an updated parameter w_t ← tune(w, D′primary). In practice, w_t is obtained by replacing Dprimary in (8) with D′primary and conducting a few rounds of gradient descent from w.

To achieve the security against tuning defined in (5), it is sufficient that the parameter w satisfy:

   ∀ D′primary ← sample(Dprimary), ∀ w_t ← tune(w, D′primary), ∀ (x, y) ∈ DWM : f^{w_t}_WM(x) = y.    (12)

Condition (12) and Algo. 1, together with the assumption that Dadversary is similar to Dprimary, imply (5).

To exert the constraint in (12) during the training process, we design a new regularizer as follows:

   RDA(w) = Σ_{D′primary ← sample(Dprimary), w_t ← tune(w, D′primary), (x,y) ∈ DWM} lWM(f^{w_t}_WM(x), y).    (13)

Then the loss to be optimized is updated from (10) to:

   L2(w, Dprimary, DWM) = L1(w, Dprimary, DWM) + λ2 · RDA(w).    (14)

RDA defined by (13) can be understood as a kind of data augmentation for TWM. Data augmentation aims to improve the model's robustness against specific perturbations of the input, which is done by proactively adding such perturbations to the training data. According to [45], data augmentation can be formulated as an additional regularizer:

   Σ_{(x,y)∈D, x′ ← perturb(x)} l(f^w(x′), y).    (15)
                                                
                        ∑                                  (15)         It is possible to meet the definition of the security against
                                    perturb
                   (x,y)∈D ,x0    ←−−−−x                                watermark overwriting in (7) by adding the perturbation of
                                                                        embedding other secret keys into RDA . But this requires build-
   Unlike in the ordinary data domain of Tprimary , it is hard          ing other classifier structures and is expensive even for the
to explicitly define augmentation for TWM against tuning.               host. For an adversary with insufficient training data, it is
However, a regularizer with the form of (15) can be derived             common to freeze the weights in the backbone layers as in
from (13) by interchanging the order of summation so the                transfer learning [38], hence (7) is satisfied. For general cases,
perturbation takes the form:                                            an adversary would not disturb the backbone of the DNN too
                            t       perturb                           much for the sake of its functionality on the primary task.
                      w −1
              x0 ∈ [ fWM ]    w
                             fWM (x) ←−−−− x.                           Hence we expect the watermarking branch to remain valid
                                                                        after overwriting.
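In implementation, the sum behind (13) can be estimated stochastically inside the training loop. Below is a minimal PyTorch-style sketch of one such estimate; PyTorch itself, the `backbone` attribute on the model, and the single simulated tuning trajectory per call are our illustrative assumptions, not the paper's prescribed implementation.

```python
import copy
import itertools
import torch

def r_da(model, wm_head, primary_loader, wm_loader, loss_fn,
         tune_steps=5, tune_lr=1e-3):
    """Monte-Carlo estimate of the tuning regularizer R_DA in (13):
    simulate the adversary's tuning w -> w_t on a sampled subset of
    D_primary, then measure the watermark loss at the tuned weights."""
    tuned = copy.deepcopy(model)               # throwaway copy: the host's w stays untouched
    opt = torch.optim.SGD(tuned.parameters(), lr=tune_lr)
    tuned.train()
    for x, y in itertools.islice(primary_loader, tune_steps):
        opt.zero_grad()                        # a few rounds of gradient descent on
        loss_fn(tuned(x), y).backward()        # D'_primary, starting from w
        opt.step()

    reg = 0.0                                  # sum of l(f^{w_t}_WM(x), y) over D_WM
    for x, y in wm_loader:
        with torch.no_grad():                  # w_t is a fixed, simulated adversary ...
            feats = tuned.backbone(x)
        reg = reg + loss_fn(wm_head(feats), y) # ... so gradients flow into c_WM only
    return reg
```

In this sketch the gradient of the estimate reaches only the watermarking classifier cWM, which matches the sufficiency argument above (it is cWM that must be robust against tuning); differentiating through the simulated tuning trajectory itself is possible but considerably more expensive.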
3.3.4   Security against watermark detection

Consider the extreme case where λ1 → ∞. Under this configuration, the parameters of MWM are frozen and only the parameters in cWM are tuned. Therefore MWM is exactly the same as Mclean, and it seems that we have inserted no information into the model. However, by broadcasting the designed message, the host can still prove that it obtained white-box access to the model at an early time, a fact that is enough for ownership verification. This justifies the security against watermark detection by the definition of (6), where λ1 plays the role of θ.

3.3.5   Privacy-preserving

Recall the definition of privacy-preserving in Section 2.3.4. We prove that, under certain configurations, the proposed watermarking method is privacy-preserving.

Theorem 1. Let cWM take the form of a linear classifier whose input dimensionality is L. If N ≤ (L + 1) then the watermarking scheme is secure against assignment detection.

Proof. The VC-dimension of a linear classifier with L channels is (L + 1). Therefore, for N ≤ (L + 1) inputs with arbitrary binary labels, there exists one cWM that can almost always classify them perfectly. Given M and an arbitrary key′, it is possible to forge c′WM such that c′WM with M's backbone performs perfectly on D^key′_WM: we only have to plug the parameters of M into (14), set λ1 → ∞, λ2 = 0, and minimize the loss. This step ends up with a watermarked model MWM = M and an evidence, c′WM, for key′. Hence, in the experiment defined in Algo. 1, an adversary cannot identify the host's key, since the evidence for either option is equally plausible. The adversary can only conduct a random guess, whose probability of success is 1/2.

This theorem indicates that the MTL-based watermarking scheme can protect the host's privacy. Moreover, given N, it is crucial to increase the input dimensionality of cWM, or to use a sophisticated structure for cWM, in order to increase its VC-dimension.
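For concreteness, the forging step used in the proof can be sketched as follows. This is a minimal PyTorch sketch under our own naming: `feat_dim` stands for the input dimensionality L of the linear classifier, and the optimizer settings are arbitrary choices, not part of the paper's construction.

```python
import torch
import torch.nn as nn

def forge_evidence(backbone, feat_dim, wm_inputs, wm_labels, epochs=500):
    """Fit a linear c'_WM for an arbitrary key' without touching M:
    the lambda_1 -> infinity, lambda_2 = 0 configuration of (14)."""
    for p in backbone.parameters():         # plug M's parameters in as-is
        p.requires_grad_(False)
    c_wm = nn.Linear(feat_dim, 2)           # linear classifier: VC-dimension L + 1
    opt = torch.optim.Adam(c_wm.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    with torch.no_grad():                   # features of the N <= L + 1 key samples
        feats = backbone(wm_inputs)
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(c_wm(feats), wm_labels).backward()
        opt.step()
    return c_wm                             # equally plausible evidence for key'
```

Since N ≤ L + 1, this fit succeeds for almost any label assignment, so a transcript containing (M, c′WM, key′) is as plausible as the host's genuine one.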
3.3.6   Security against watermark overwriting

It is possible to meet the definition of the security against watermark overwriting in (7) by adding the perturbation of embedding other secret keys into RDA. But this requires building other classifier structures and is expensive even for the host. For an adversary with insufficient training data, it is common to freeze the weights of the backbone layers, as in transfer learning [38]; hence (7) is satisfied. In general, an adversary would not disturb the backbone of the DNN too much, for the sake of its functionality on the primary task. Hence we expect the watermarking branch to remain valid after overwriting. We leave the examination of the security against watermark overwriting as an empirical study.

3.3.7   Security against ownership piracy

Recall that in ownership piracy, the adversary is not allowed to train its own watermark classifier. Instead, it can only forge a key given a model MWM and a legal cWM; this is possible if the adversary has participated in the proof for some other
Figure 2: The network architecture for sentiment analysis. A chain of LSTM units processes either a sentence from the original dataset or a sentence of random words generated from the key by indexing; the head cp outputs the sentiment label, while cWM outputs the 0/1 watermark decision.

client. Now the adversary has to find a new key keyadv such that D^keyadv_WM can pass the statistical test defined by the watermarking branch MWM and cWM. Although it is easy to find a set of N integers with half of them classified as 0 and half as 1 by querying the watermarking branch as an oracle, it is hard to restore a legal keyadv from this set. The protocol should adopt a stream cipher secure against key recovery attacks [42], which, by definition, blocks this sort of ownership piracy and makes the proposed watermarking scheme of level III secure against ownership piracy. If cWM is kept secret then ownership piracy is impossible. After all, ownership piracy is invalid when an authorized time stamp is available.
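To make the requirement concrete, the key-to-dataset expansion can be sketched with a keyed pseudorandom function. Here HMAC-SHA256 in counter mode stands in for the stream cipher, which is our assumption; the paper only requires security against key recovery, and the sentence-oriented sample format mirrors Fig. 2.

```python
import hmac
import hashlib

def expand_key(key: bytes, n: int, vocab_size: int, sent_len: int = 8):
    """Derive N pseudorandom watermark samples from a secret key.
    Observing the samples (e.g., by querying c_WM as an oracle) does
    not reveal the key unless the underlying PRF is broken."""
    samples = []
    for i in range(n):
        stream = hmac.new(key, i.to_bytes(4, "big"), hashlib.sha256).digest()
        words = [int.from_bytes(stream[2 * j:2 * j + 2], "big") % vocab_size
                 for j in range(sent_len)]   # "random words generated by key by indexing"
        label = stream[-1] & 1               # roughly balanced 0/1 assignment
        samples.append((words, label))
    return samples
```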
                                                                         4      Experiments and Discussions
3.4   Analysis of the Verification Protocol

We now conduct a security analysis of the consensus protocol and solve the redeclaration dilemma.

To pirate a model under this protocol, an adversary must submit a legal key and the hash of a cWM. If the adversary does not have a legal cWM, then this attack is impossible: the preimage resistance of the hash implies that the adversary cannot forge such a watermark classifier afterwards, so this broadcast is invalid. If the adversary has managed to build a legal cWM and compute its hash, but has not obtained the target model, then the verification can hardly succeed, since the output of cWM with the backbone of an unknown network on the watermark dataset is random guessing. The final case is that the adversary has obtained the target model, conducted watermark overwriting, and redeclared the ownership. Recall that a model is published only after its host has successfully broadcast its Publish message and notarized its time. Hence the redeclaration dilemma can be solved by comparing the time stamps inside contradictory broadcasts.
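A minimal sketch of the commitment underlying this argument, assuming cWM is a PyTorch module and SHA-256 as the preimage-resistant hash; the paper fixes neither choice, and deterministic serialization across library versions is glossed over here.

```python
import hashlib
import io
import torch

def commit(c_wm: torch.nn.Module) -> str:
    """Digest of c_WM's serialized parameters, broadcast at Publish
    time; preimage resistance prevents forging a matching classifier
    after the fact."""
    buf = io.BytesIO()
    torch.save(c_wm.state_dict(), buf)
    return hashlib.sha256(buf.getvalue()).hexdigest()

def verify(c_wm: torch.nn.Module, digest: str) -> bool:
    """At verification time the host reveals c_WM and anyone can
    recompute the digest and check the notarized broadcast."""
    return commit(c_wm) == digest
```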
As an adaptive attack, an adversary participating in the proof of a host's ownership over a model M obtains the corresponding key and cWM, with which it can erase weight-based watermarks [48, 55]. Embedding information into the outputs of the network rather than its weights makes the MTL-based watermark harder to erase: the adversary has to identify the decision boundary from cWM and tune M so that samples drawn from key violate this boundary. This attack risks the model's performance on the primary task, requires a huge amount of data and computational resources, and is beyond the competence of a model thief.

The remaining security risks lie within the cryptological components and are beyond the scope of our discussion.

4   Experiments and Discussions

4.1   Experiment Setup

To illustrate the flexibility of the proposed watermarking model, we considered four primary tasks: image classification (IC), malware classification (MC), image semantic segmentation (SS) and sentiment analysis (SA) for English. We selected four datasets for image classification, one dataset for malware classification, two datasets for semantic segmentation and two datasets for sentiment classification. The descriptions of these datasets and the corresponding DNN structures are listed in Table 2. ResNet [18] is a classical model for image processing. For the VirusShare dataset, we compiled a collection of 26,000 malware samples into images and adopted ResNet as the classifier [11]. Cascade mask RCNN (CMRCNN) [6] is a network architecture specialized for semantic segmentation. Glove [39] is a pre-trained word embedding that maps English words into numerical vectors, while the bidirectional long short-term memory (Bi-LSTM) [21] is commonly used to analyze natural languages.

For the first seven image datasets, cWM was a two-layer perceptron that took the outputs of the first three layers of the ResNet as input. QRcode was adopted to generate D^key_WM. For the NLP datasets, the network took the structure in Fig. 2.
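As a sketch of this image-side watermarking branch, the following assumes torchvision's ResNet-18 layout, interprets the "first three layers" as the stem plus the first two residual stages, and picks global average pooling and the hidden width ourselves; none of these details are fixed by the paper.

```python
import torch.nn as nn
from torchvision.models import resnet18

class WatermarkBranch(nn.Module):
    """Two-layer perceptron c_WM over early ResNet-18 features."""

    def __init__(self, hidden=256):
        super().__init__()
        net = resnet18()
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.stage1, self.stage2 = net.layer1, net.layer2   # early backbone layers
        self.pool = nn.AdaptiveAvgPool2d(1)                 # flatten feature maps
        self.mlp = nn.Sequential(nn.Linear(128, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))      # 0/1 watermark decision

    def forward(self, x):
        f = self.stage2(self.stage1(self.stem(x)))          # (B, 128, H/8, W/8)
        return self.mlp(self.pool(f).flatten(1))
```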
Table 2: Datasets and their DNN structures.

Dataset                      Description        DNN structure
MNIST [13]                   IC, 10 classes     ResNet-18
Fashion-MNIST [54]           IC, 10 classes     ResNet-18
CIFAR-10 [25]                IC, 10 classes     ResNet-18
CIFAR-100 [25]               IC, 100 classes    ResNet-18
VirusShare [1]               MC, 10 classes     ResNet-18
Penn-Fudan-Pedestrian [51]   SS, 2 classes      ResNet-50+CMRCNN
VOC [2]                      SS, 20 classes     ResNet-50+CMRCNN
IMDb [34]                    SA, 2 classes      Glove+Bi-LSTM
SST [46]                     SA, 5 classes      Glove+Bi-LSTM
Throughout the experiments we set N = 600. To set the verification threshold γ in Algo. 2, we tested the classification accuracy of fWM across the nine datasets over 5,000 DWMs different from the host's. The result is visualized in Fig. 3, from which we observed that in almost all cases p fell within [0.425, 0.575]. We selected γ = 0.7, so the probability of a successful piracy is less than 2.69 × 10⁻⁸ with λ = 0.34 in the Chernoff bound.
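The quoted failure probability can be checked directly from the Chernoff bound of Section 3.3; since the exact p behind the quoted 2.69 × 10⁻⁸ is not stated, we evaluate the bound near the upper end of the observed range.

```python
from math import exp

def chernoff_bound(p: float, gamma: float, lam: float, n: int) -> float:
    """((1 - p + p * e^lam) / e^(gamma * lam)) ** n"""
    return ((1 - p + p * exp(lam)) / exp(gamma * lam)) ** n

print(chernoff_bound(0.575, 0.7, 0.34, 600))  # ~3.4e-8
print(chernoff_bound(0.574, 0.7, 0.34, 600))  # ~2.8e-8, the order quoted in the text
```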

Figure 3: The empirical distribution of p.

We conducted three tuning attacks, FT, NP and FP, as well as the overwriting attack, against the proposed watermarking framework.
4.2   Ablation Study

To examine the efficacy of Rfunc and RDA, we compared the performance of the model under different combinations of the two regularizers. We are interested in four metrics: (1) the performance of MWM on Tprimary; (2) the performance of fWM on TWM after FT; (3) the performance of fWM on TWM after FP; and (4) the decline of the performance of MWM on Tprimary when NP made fWM's accuracy on TWM lower than γ. The first metric reflects the decline of a model's performance after being watermarked. The second and third metrics measure the watermark's robustness against an adversary's tuning. The last metric reflects the decrease of the model's utility when an adversary is determined to erase the watermark using NP. The model for each dataset was trained by minimizing the MTL loss defined by (14), where we adopted FT, NP and FP for tuning and chose the optimal λ1 and λ2 by grid search. Then we attacked each model by FT with a smaller learning rate, by FP [31], and by NP. The results are collected in Table 3.

We observe that by using Rfunc and RDA, it is possible to preserve the watermarked model's performance on the primary task and on the watermarking task simultaneously. Therefore we suggest that, whenever possible, the two regularizers be incorporated in training the model.

4.3   Watermark Detection

As an illustration of the security against watermark detection, we conducted the property inference attack [15]. The distributions of the parameters of a clean model, a model watermarked by our method, and a model watermarked by a weight-based method [12] for CIFAR-10 are visualized in Fig. 4 and Fig. 5.

Figure 4: The difference between Mclean and a weight-based watermarked model [12].

Figure 5: The difference between MWM and Mclean.
Table 3: Ablation study on regularizer configurations. Each entry contains the four metrics of Section 4.2, in order. Semantic segmentation tasks were measured by average precision, and these two models would not converge without Rfunc. The optimal/second-optimal configuration for each dataset and each metric is highlighted/underlined.

Dataset                 Mclean   No regularizers             Rfunc                       RDA                         Rfunc and RDA
MNIST                   99.6%    98.7%, 75.5%, 85.0%, 1.3%   99.5%, 81.5%, 85.5%, 0.7%   99.3%, 88.0%, 92.0%, 2.0%   99.5%, 95.5%, 92.5%, 2.3%
Fashion-MNIST           93.3%    92.0%, 85.0%, 60.5%, 9.6%   92.8%, 92.0%, 74.5%, 11.6%  91.4%, 96.5%, 85.5%, 54.6%  93.1%, 95.5%, 86.0%, 54.9%
CIFAR-10                91.5%    88.3%, 91.5%, 74.5%, 19.8%  90.8%, 88.5%, 79.5%, 21.0%  88.8%, 96.0%, 95.0%, 56.0%  88.8%, 92.5%, 91.5%, 53.0%
CIFAR-100               67.7%    59.9%, 90.5%, 88.0%, 23.6%  65.4%, 90.0%, 87.0%, 23.3%  58.7%, 99.0%, 98.0%, 35.0%  63.8%, 92.5%, 97.0%, 33.3%
VirusShare              97.4%    97.0%, 65.0%, 88.0%, 6.4%   97.2%, 81.5%, 86.5%, 6.5%   96.8%, 99.5%, 100%, 9.4%    97.3%, 100%, 100%, 19.5%
Penn-Fudan-Pedestrian   0.79     —                           0.79, 90.0%, 54.5%, 0.70    —                           0.78, 100%, 100%, 0.78
VOC                     0.69     —                           0.67, 74.0%, 98.0%, 0.65    —                           0.69, 100%, 100%, 0.68
IMDb                    85.0%    67.3%, 66.8%, 83.5%, 12.0%  85.0%, 66.0%, 86.3%, 12.2%  69.2%, 81.3%, 88.3%, 29.5%  85.0%, 80.0%, 90.8%, 30.5%
SST                     75.4%    71%, 77.3%, 95.8%, 12.5%    75.4%, 62.5%, 95.0%, 13.0%  70.8%, 90.5%, 98.3%, 29.4%  75.4%, 86.8%, 99.0%, 31.9%

In these figures we adopted λ1 = 0.05. Unlike the weight-based watermarking method analyzed in [15], our method did not result in a significant difference between the parameter distributions of the two models. Hence an adversary can hardly distinguish a model watermarked by the MTL-based method from a clean one.

4.4   The Overwriting Attack

After adopting both regularizers, we performed the overwriting attack on the models for all nine tasks, where a key different from the host's was embedded into each model. In all cases the adversary's watermark could be successfully embedded into the model, as we predicted. The metric, as indicated by (7), is the fluctuation of the watermarking branch's accuracy on the watermarking task after overwriting. We recorded this fluctuation across overwriting epochs; the results are collected in Table 4.

Table 4: Fluctuation of the accuracy of the host's watermarking branch.

                        Number of overwriting epochs
Dataset                 50      150     250     350
MNIST                   1.0%    1.5%    1.5%    2.0%
Fashion-MNIST           2.0%    2.5%    2.5%    2.5%
CIFAR-10                4.5%    4.5%    4.5%    4.5%
CIFAR-100               0.0%    0.5%    0.9%    0.9%
VirusShare              0.0%    0.5%    0.5%    0.5%
Penn-Fudan-Pedestrian   0.5%    1.0%    1.0%    1.0%
VOC                     1.3%    2.0%    2.1%    2.1%
IMDb                    3.0%    3.0%    3.0%    3.0%
SST                     2.5%    3.0%    3.0%    2.5%

The impact of watermark overwriting is uniformly bounded by 4.5% in our settings, and the accuracy of the watermarking branch remained above the threshold γ = 0.7. Combined with Table 3, we conclude that the MTL-based watermarking method is secure against watermark overwriting.
Table 5: The comparison between our method and [27, 60] with respect to: (1) the model's performance on the primary task; (2) the accuracy of the watermarking task/backdoor after FP; (3) the decline of the model's accuracy on the primary task when NP erases the watermark. The optimal method for each dataset with respect to each metric is highlighted.

                 Ours, Rfunc and RDA          Li et al. [27]               Zhu et al. [60]
Dataset          Primary   FP      NP         Primary   FP      NP         Primary   FP      NP
MNIST            99.5%     92.5%   2.2%       99.0%     14.5%   0.9%       98.8%     7.0%    1.4%
Fashion-MNIST    93.1%     86.0%   54.7%      92.5%     13.5%   17.5%      91.8%     11.3%   5.8%
CIFAR-10         88.8%     91.5%   50.3%      88.5%     14.5%   13.6%      85.0%     10.0%   17.1%
CIFAR-100        63.8%     97.0%   29.4%      63.6%     1.2%    5.5%       65.7%     0.8%    0.9%
VirusShare       97.3%     100%    9.6%       95.1%     8.8%    1.5%       96.3%     9.5%    1.1%

4.5   Comparison and Discussion

We implemented the watermarking methods in [60] and [27], which are both backdoor-based methods of level III secure against ownership piracy. We randomly generated 600 trigger samples for [60] and assigned them proper labels. For [27], we randomly selected Wonder Filter patterns and exerted them onto 600 randomly sampled images.

As a comparison, we list in Table 5 the performance of their watermarked models on the primary task, the verification accuracy of their backdoors after FP (whose damage to backdoors is larger than FT's), and the decline of the performance of the watermarked models when NP was adopted to invalidate the backdoors (i.e., when the accuracy of the backdoor triggers falls under 15%). We used the ResNet-18 DNN and conducted all experiments on the image classification tasks, since otherwise the backdoor is undefined.

We observe that our method achieved the optimal performance for all metrics. This is due to the following:

1. Extra regularizers are adopted to explicitly meet the security requirements.

2. The MTL-based watermark does not incorporate backdoors into the model, so adversarial modifications such as FP, which are designed to eliminate backdoors, can hardly reduce our watermark.

3. The MTL-based watermark relies on an extra module, cWM, as a verifier. As an adversary cannot tamper with this module, universal tunings such as NP have less impact.

Apart from these metrics, our proposal is better than other backdoor-based DNN watermarking methods since:

1. Backdoor-based watermarking methods are not privacy-preserving.

2. So far, backdoor-based watermarking methods can only be applied to image classification DNNs. This fact challenges the generality of backdoor-based watermarks.

3. It is hard to design an adaptive backdoor against a specific screening algorithm. In contrast, the MTL-based watermark can easily adapt to new tuning operators, simply by incorporating such a tuning operator into RDA.

5   Conclusion

This paper presents an MTL-based DNN watermarking model for ownership verification. We formally summarize the basic security requirements for DNN watermarking and raise the privacy concern. Then we propose to embed the watermark as an additional task parallel to the primary task. The proposed scheme explicitly meets various security requirements by using corresponding regularizers. Those regularizers and the design of the watermarking task grant the MTL-based DNN watermarking scheme tractable security. With a decentralized consensus protocol, the entire framework is secure against all the attacks considered.

We look forward to using cryptological protocols such as zero-knowledge proof to improve the ownership verification process, so that one secret key can be used for multiple notarizations.

Acknowledgments

This work receives support from anonymous reviewers.

Availability

Materials of this paper, including source code and part of the dataset, are available at http://github.com/a_new_account/xxx.
References

[1] VirusShare dataset. https://virusshare.com/.

[2] VOC dataset. http://host.robots.ox.ac.uk/pascal/VOC/voc2012/.

[3] Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, and Joseph Keshet. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In 27th USENIX Security Symposium (USENIX Security 18), pages 1615–1631, 2018.

[4] M. Arnold, M. Schmucker, and S. Wolthusen. Digital Watermarking and Content Protection: Techniques and Applications. 2003.

[5] Lejla Batina, Shivam Bhasin, Dirmanto Jap, and Stjepan Picek. CSI NN: Reverse engineering of neural network architectures through electromagnetic side channel. In 28th USENIX Security Symposium (USENIX Security 19), pages 515–532, 2019.

[6] Zhaowei Cai and Nuno Vasconcelos. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6154–6162, 2018.

[7] Rich Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.

[8] Miguel Castro, Barbara Liskov, et al. Practical Byzantine fault tolerance. In OSDI, volume 99, pages 173–186, 1999.

[9] Xinyun Chen, Wenxiao Wang, Chris Bender, Yiming Ding, Ruoxi Jia, Bo Li, and Dawn Song. REFIT: A unified watermark removal framework for deep learning systems with limited data. arXiv preprint arXiv:1911.07205, 2019.

[10] Jen-Tzung Chien. Deep Bayesian natural language processing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 25–30, 2019.

[11] Qianfeng Chu, Gongshen Liu, and Xinyu Zhu. Visualization feature and CNN based homology classification of malicious code. Chinese Journal of Electronics, 29(1):154–160, 2020.

[12] Bita Darvish Rouhani, Huili Chen, and Farinaz Koushanfar. DeepSigns: An end-to-end watermarking framework for ownership protection of deep neural networks. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 485–497, 2019.

[13] Li Deng. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine, 29(6):141–142, 2012.

[14] Mohamed Elhoseny, Mahmoud Mohamed Selim, and K. Shankar. Optimal deep learning based convolution neural network for digital forensics face sketch synthesis in internet of things (IoT). International Journal of Machine Learning and Cybernetics, pages 1–12, 2020.

[15] Karan Ganju, Qi Wang, Wei Yang, Carl A. Gunter, and Nikita Borisov. Property inference attacks on fully connected neural networks using permutation invariant representations. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 619–633, 2018.

[16] Xiquan Guan, Huamin Feng, Weiming Zhang, Hang Zhou, Jie Zhang, and Nenghai Yu. Reversible watermarking in deep convolutional neural networks for integrity authentication. In Proceedings of the 28th ACM International Conference on Multimedia, pages 2273–2280, 2020.

[17] Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, et al. GluonCV and GluonNLP: Deep learning in computer vision and natural language processing. Journal of Machine Learning Research, 21(23):1–7, 2020.

[18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[19] Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, and Mu Li. Bag of tricks for image classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 558–567, 2019.

[20] Weizhe Hua, Zhiru Zhang, and G. Edward Suh. Reverse engineering convolutional neural networks through side-channel information leaks. In 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pages 1–6. IEEE, 2018.

[21] Zhiheng Huang, Wei Xu, and Kai Yu. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991, 2015.

[22] Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7482–7491, 2018.