Facial Manipulation Detection Based on the Color Distribution Analysis in Edge Region


Dong-Keon Kim, SungKyunKwan University, kdk1996@skku.edu
Donghee Kim, Hippo T&C, ym.dhkim@skku.edu
Kwangsu Kim, SungKyunKwan University, kim.kwangsu@skku.edu
arXiv:2102.01381v1 [cs.CV] 2 Feb 2021

Abstract

In this work, we present a generalized and robust facial manipulation detection method based on color distribution analysis of the region perpendicular to edges in a manipulated image. Most contemporary facial manipulation methods involve a pixel correction procedure that reduces the awkwardness of pixel value differences along the facial boundary of a synthesized image. Because of this procedure, there are distinctive differences in the facial boundary between a face-manipulated image and an unforged natural image. A forged image should also show distinctive, unnatural features in the gap between the distributions at the facial boundary and at background edge regions, because the manipulation tends to damage the natural effect of lighting. We design a neural network that detects face-manipulated images from these distinctive features of the facial boundary and background edges. Our extensive experiments show that our method outperforms other existing face manipulation detection methods at detecting synthesized face images on various datasets, regardless of whether a dataset participated in training.

Figure 1. Summarized steps of the face manipulation procedure. A generated image is created from a source face image and a generative model. To paste the generated image into the target image realistically, post-processing is applied after the manipulation.

1. Introduction

With the rapid development of AI-based generative models [16, 29, 11], facial manipulation methods have also improved at generating realistic fake face images and at changing one person's facial identity to another's. As face synthesizing algorithms become more and more sophisticated, abuse cases and social issues derived from forged images have recently emerged. Realistically synthesized videos made for malicious purposes, e.g., politicians' fake news and celebrity pornography, have already become a threat to the credibility of public media and to individual privacy [37]. For these reasons, effectively detecting facially manipulated images is a crucial issue in the computer vision field.

Existing works focus only on the features that appear in synthesized face images from datasets generated by specific manipulation algorithms [23, 2, 38], or on designing artificial neural network architectures [25, 1, 31] that refer to the whole image rather than to the face manipulation procedure. Such methods work well on the specific datasets that were analyzed before the detection method was designed, but they do not work well when classifying synthesized video without prior information. Focusing on specific manipulation algorithms, or referring to whole-image features, is inefficient in the real-world scenario where we must determine the authenticity of an arbitrary image without any prior information about the source data and manipulation procedure.

Most face synthesizing algorithms nowadays commonly apply face boundary post-processing after transferring a face from one image to another, to reduce the awkwardness of the pixel distribution difference between the original image and the altered face image [19], as shown in Figure 1. Recent studies [18, 19, 33] pay attention to these shared procedures of face manipulation algorithms in order to detect synthesized images with generalization ability across various datasets. This post-processing, whether applied with a face retouching application or a generative model, damages the
unique fingerprint of the image and creates a pixel distribution difference along the face boundary area between a post-processed manipulated image and an unforged natural image.

Our work focuses on extracting pixel value difference features along the boundary area to detect face-manipulated images. Drawing on existing edge detector algorithms [27, 3], we gather a set of lines perpendicular to the facial boundary and compute the corresponding pixel differences along those lines. If a facial image has been altered and then post-processed to smooth the facial boundary, there will be little difference in pixel values near the face, while an unforged natural image shows relatively large differences there because the pixels on the facial boundary were never modified.

Additionally, we gather boundary pixel differences along lines perpendicular to background edges. To analyze how these differ from the features extracted at the face boundary, the pixel difference values in the direction perpendicular to background edges are obtained by a procedure similar to the facial feature extraction. The gap between the extracted background boundary features and the facial boundary features differs significantly between an unforged natural image and a face-manipulated image.

To classify the gathered features with high performance, we use a one-dimensional convolutional neural network, which is widely used for extracting distinctive features from sequential data [12], because the gathered features are pixel difference values ordered along the direction perpendicular to the boundaries of the face and the background. Rather than developing a complex and heavy classification model, we developed our own lightweight model that classifies the extracted features with very good performance. We show that our method, with the suggested classification model, outperforms other existing manipulation detectors in terms of training stability and universality across various datasets.

The contribution of this paper to the field of face manipulation detection is its outstanding performance and generalization regardless of dataset and generation algorithm. Our proposed method uses pixel difference values as features and additionally uses information about the background of the image, so it is not biased toward specific source data or generation algorithms and achieves remarkably high detection accuracy regardless of the face manipulation dataset.

2. Related Work

In this section, we briefly introduce several previous image manipulation detection technologies, from early studies of image forgery detection to the latest works related to our proposed method.

AI-generated image classification. In early research on detecting AI-generated images, existing studies [39, 22, 24, 17] showed that there is a difference in color distribution between real images and entirely generated fake images made by AI-based generative models. A limitation of generative models is that a totally generated image leaves its own artificial imprints: an AI-based generative model cannot perfectly reproduce the original fingerprints of an authentic natural image, which arise from intrinsic hardware properties such as the surface of the camera lens and the direction of the light entering the camera. However, the problem of detecting manipulated images in which only the face region is altered must be approached somewhat differently from the problem of determining the authenticity of entirely generated images.

AI-Based Face Manipulation Detection. With rapidly progressing generative models and synthesizing techniques, there has been a great deal of malicious usage, such as realistically synthesizing one person's face onto another's body. Against the rising threats of such image forgery to privacy and trust, there have been many studies [8, 21, 28] on detecting face-manipulated images. For instance, Matern et al. [23] present a detection method based on visual artifacts, especially of the eyes and nose. Yang et al. [38] suggested detecting face-manipulated images through inconsistencies in head pose. While these works pay attention to awkward features of synthesized images, recent works [25, 1, 31, 40] focus on detection with well-designed, state-of-the-art artificial neural networks, such as recurrent neural networks and capsule networks.

All of these works mainly focus on features appearing in the synthesized image, or design model architectures around the synthesized image itself on specific datasets. They demonstrate only specific algorithms on limited source datasets, and they fail when verified in extensive experiments that include the real-world scenario of detecting unseen data with no prior information.

Generalized Face Manipulation Detection. As the need for a methodology that applies better to real situations increases, the latest studies [4, 8] have begun to focus on the characteristics that appear during the face image synthesis process rather than on the synthesized result. In particular, Li et al. [19] pay attention to the common face manipulation procedure that includes post-processing with boundary smoothing. Face X-ray [18], the most recent generalized detection method, focuses on the blending procedure common to various face manipulation algorithms and shows generalization ability on unseen datasets for the real-world scenario. It performs well on the FaceForensics++ dataset, which has several sub-datasets of different manipulation types sharing the same source data, but suffers a performance drop on totally different datasets. Our study derives higher performance with more generality by extracting not only facial features
but also the innate fingerprints of background edges and comparing them with the facial features.

Figure 2. Brief illustration of an edge window. The points are equally spaced along the line segments. The center point of the window is the midpoint Mi,i+1 of the two neighboring landmark points Li and Li+1.

3. Methods

In this section, we explain the motivation of our methods, the reason for gathering boundary pixel features, and the detailed process of boundary feature extraction.

Motivation from Edge Detector Algorithm. We focus on existing edge detector algorithms to define boundaries. The edge detected by an edge detector is a boundary point where the change of intensity in the gray-scaled image is large [14]. The Canny edge detector [3], the most widely used edge detector nowadays, first computes 1st-order derivatives in the X and Y directions of a given image by convolution with the Sobel masks [32] before detecting edges. The gradient G, which marks where pixel values change rapidly, is obtained from the gathered 1st-order derivative values in the X and Y directions, and its direction is perpendicular to the boundary appearing in the image. We focus on this process by which an edge detector finds boundaries through the gradient: since the distribution of pixel values in the face region differs from the distribution in non-face regions, pixel values change sharply in the direction perpendicular to the boundary of the face.
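To make this step concrete, the sketch below computes the Sobel derivatives and the resulting gradient magnitude and orientation with OpenCV. It is an illustrative snippet rather than part of our pipeline, and the file name sample.jpg is a placeholder.

import cv2
import numpy as np

# Edge detectors operate on intensity, so gray-scale the image first.
image = cv2.imread("sample.jpg")  # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# 1st-order derivatives in the X and Y directions via the Sobel masks.
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)

# Gradient magnitude G: large where pixel values change rapidly,
# e.g., across the facial contour.
magnitude = np.sqrt(gx ** 2 + gy ** 2)

# Gradient orientation: perpendicular to the boundary direction,
# which is the direction our perpendicular line segments follow.
orientation = np.arctan2(gy, gx)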
Artificial Imprints Harming Natural Fingerprints. Early research in image forensics [10] mentions that an unforged natural image has a unique fingerprint generated by the image processing procedure and by hardware characteristics such as the camera lens. Recent studies [39, 22] show that forged images created by generative models such as GANs likewise carry unique imprints from the synthesizing procedure. Regardless of the type of generative model, the face replacing process and the post-processing that reduces the difference in pixel values between the source image and the target image damage the natural image fingerprints across the facial boundary. For this reason, it is obvious that there is a difference between the synthesized facial boundary and the unforged natural image in terms of the pixel value distribution. In addition, we emphasize the difference between the unmanipulated image and the forged image by contrasting the original fingerprint appearing on the background, which is not involved in the facial synthesis process, with the fingerprint on the face. Taking all of this into account, we can detect face-manipulated images from the boundary pixel features of both the entire image and the facial area.

Two-way feature extraction. We suggest a two-way feature extraction method that captures detailed information in two respects: the post-processing procedure shared by face manipulation technologies, and the boundary characteristics of the whole image. One branch is facial boundary feature extraction, which reflects the pixel-level distribution produced by the post-processing procedure; the other is background boundary feature extraction, which captures the pixel-level distribution along boundaries in the unforged region of the image.

3.1. Facial Boundary Feature Extraction

Given a frame image I extracted from the video, we detect the face area with the frontal face detector in dlib [13]; it is assumed that only a single face appears in the video. We then extract 68 feature points of the detected face with the 68-point face landmark shape predictor in dlib. Among the detected landmarks, the 17 points representing the outer boundary of the face are selected; these chosen landmark points define the facial boundary in the image. The line segment Pi,i+1 perpendicular to the facial boundary is defined by the perpendicular direction at the midpoint of two neighboring landmark points Li and Li+1 on the facial boundary.

Note that the perpendicular line segment Pi,i+1 is a parametric equation in a parameter t that takes 20 integer values in the range -10 to 9, running from the inside of the face to the outside. There are 16 perpendicular line segments, made from the 17 face landmark points representing the facial boundary. Each Pi,i+1 thus has 20 corresponding coordinates at equal intervals along the direction perpendicular to the boundary line.
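Written out, one way to express this parametrization is the following, where M_{i,i+1} = (L_i + L_{i+1}) / 2 is the midpoint, \hat{n}_{i,i+1} is the unit normal of the boundary at that midpoint, and s is a step length; the symbols \hat{n} and s are our own notation, with s absorbing the scale factor described next:

    P_{i,i+1}(t) = M_{i,i+1} + t \cdot s \cdot \hat{n}_{i,i+1}, \qquad t \in \{-10, -9, \ldots, 9\}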
To better fit the perpendicular line segment to a specific area of the face, we tune the line segment size with a scale factor that is inversely proportional to the facial area. By tuning with the scale factor, the perpendicular line segment Pi,i+1 passes through a consistent range of the facial boundary regardless of the size of the face in the image.

To make window-shaped coordinate features, a set of horizontal line segments Vi,i+1 is generated, parallel to the facial boundary line and passing through the t-th point of the gathered line segment Pi,i+1. The horizontal line segments are also represented by a parametric equation, in a parameter u that takes 20 integer values in the range -10 to 9. A window of 20×20 corresponding coordinates is specified by the set of 20 horizontal lines; Figure 2 illustrates an example of an extracted edge window. We call these 400 coordinate features an edge window Wi, the set of horizontal lines Vi,i+1 indexed by the two parameters t and u. A total of 16 windows are created per face image from the 17 facial boundary landmarks. The pipeline for generating windows from one facial image is summarized in Figure 3.
Figure 3. Facial boundary window generation pipeline. (a) Capture image I from the video. (b) Get a facial boundary line from the face landmark points detected in (a). (c) Make the perpendicular line segment Pi,i+1 from neighboring landmark points. (d) Create the set of horizontal lines Vi,i+1 perpendicular to the line segment Pi,i+1 at equal intervals.

To find out how the corresponding RGB pixel values in an edge window Wi change from the inside to the outside of the face, we compute the absolute differences of the RGB pixel values along the parameter t. We then average the pixel difference values over the 20 horizontal lines to obtain a statistical difference for the boundary region. These extracted 16×19×3 color difference features are the facial boundary features: 16 is the number of edge windows W in one facial image, 19 is the number of differences along the parameter t, and 3 stands for RGB.
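The sketch below shows our reading of this window sampling and differencing for a single pair of landmarks; the per-face scale factor, the nearest-neighbor sampling, and the function name are assumptions, and a full implementation would loop over all 16 landmark pairs and handle image borders.

import numpy as np

def edge_window_feature(img, L_i, L_j, scale):
    """Color difference feature (19 x 3) for one edge window.

    img: H x W x 3 float array; L_i, L_j: neighboring boundary
    landmarks as (x, y); scale: assumed per-face step length."""
    L_i, L_j = np.asarray(L_i, float), np.asarray(L_j, float)
    mid = (L_i + L_j) / 2.0                          # midpoint M_{i,i+1}
    tang = (L_j - L_i) / np.linalg.norm(L_j - L_i)   # along the boundary
    norm = np.array([-tang[1], tang[0]])             # perpendicular to it

    # 20 x 20 window: t crosses the boundary, u runs parallel to it.
    window = np.zeros((20, 20, 3))
    for a, t in enumerate(range(-10, 10)):
        for b, u in enumerate(range(-10, 10)):
            x, y = mid + scale * (t * norm + u * tang)
            # nearest-neighbor sampling for brevity; border checks omitted
            window[a, b] = img[int(round(y)), int(round(x))]

    # Absolute RGB differences along t, averaged over the 20 lines (u).
    diff = np.abs(np.diff(window, axis=0))           # 19 x 20 x 3
    return diff.mean(axis=1)                         # 19 x 3

# Stacking the features of the 16 windows of one face gives the
# 16 x 19 x 3 facial boundary feature described above.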
Figure 4. Background edge image generation procedure.

3.2. Background Boundary Feature Extraction

Prior to extracting the features of background edge points, we make a background edge image EI from the frame image I. To remove background noise and extract prominent boundary parts, we blur the image I with a Gaussian filter. A rough edge image E′I, which still includes edge points on the face, is then generated with the Canny edge detector.

The facial edge points in E′I must be excluded in order to extract the natural fingerprint of the background edges. To remove them, we consider a convex image CI that covers the face contour. CI is a binary image of the same size as the original image I, in which the points inside the convex hull are 0 and the points outside are 1. To ensure that the edges appearing on the face are excluded, a big-face convex image C′I is created by blurring CI with a Gaussian filter and rounding every value of the blurred CI down to 0 except the pixels with a value of 1. By element-wise multiplication of C′I and E′I, we obtain the edge image EI, which excludes the facial edge points. The pipeline for making the edge image EI is shown in Figure 4.
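A minimal sketch of this masking step, assuming OpenCV, the dlib landmark array from Section 3.1, and hand-picked blur kernels and Canny thresholds (the paper does not specify these parameters):

import cv2
import numpy as np

def background_edge_image(img, landmarks):
    """Build E_I: background edges with the facial region masked out.

    img: H x W x 3 uint8 frame; landmarks: N x 2 integer array of
    dlib face landmark points. Kernel sizes and thresholds are
    illustrative assumptions."""
    # Rough edge image E'_I from the blurred frame (blur removes noise).
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    rough_edges = cv2.Canny(blurred, 100, 200)

    # Convex image C_I: 0 inside the face hull, 1 outside.
    convex = np.ones(img.shape[:2], dtype=np.uint8)
    hull = cv2.convexHull(landmarks.astype(np.int32))
    cv2.fillConvexPoly(convex, hull, 0)

    # Big-face convex image C'_I: blur, then keep only (near-)exact 1s,
    # so the masked-out region grows past the facial boundary.
    big_face = cv2.GaussianBlur(convex.astype(np.float32), (31, 31), 0)
    big_face = (big_face >= 1.0 - 1e-6).astype(np.uint8)

    # Element-wise product removes the facial edge points from E'_I.
    return rough_edges * big_face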

To make background boundary features from the edge points in the background edge image EI, we randomly sample 10% of the total edge points in EI to obtain statistical information along the background edges. Using the sampled edge points, background windows are extracted in the same way as in the facial boundary feature extraction, with Li and Li+1 corresponding to the two edge points nearest to each sampled point. The background edge window generation procedure is summarized in Figure 5.

Figure 5. Background boundary feature extraction pipeline. (a) Capture image I from the video. (b) Make a background edge image EI without the facial boundary. (c) Sample edge points from the edge image EI. (d) Make perpendicular lines and horizontal lines in the same way as the facial line segments were extracted.

While the edge windows on the facial boundary have a specified direction, from the inside of the face to the outside, the direction of the parameter t in background edge windows cannot be specified beyond being perpendicular to the boundary lines. For this reason, we fold the background boundary features and unfold them symmetrically about t = 10 to remove the directional information in each edge window; e.g., all feature values at t = 1 and t = 19 are averaged along that axis and replaced by the averaged values. The final mirrored background boundary color difference features have shape N×19×3, where N is the number of sampled edge points in the background edge image EI, and 19 and 3 are the same as in Section 3.1.
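This mirroring can be written compactly. A sketch under the indexing above, where the 19 positions t = 1..19 map to array indices 0..18 with the boundary at the center index:

import numpy as np

def mirror_background_feature(feat):
    """Remove directional information from background edge windows.

    feat: N x 19 x 3 color difference features. Positions t and 20 - t
    are averaged, leaving the features symmetric about t = 10."""
    return (feat + feat[:, ::-1, :]) / 2.0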
4. Analysis with Extracted Features

In this section, we briefly analyze how the features extracted from an image behave according to the authenticity of the image, prior to experimenting.

Datasets. We analyze various up-to-date face manipulation datasets built with diverse generation algorithms in order to demonstrate our method's generalization ability. We adopt several datasets: FaceForensics++ [30], Celeb-DF [20], DFD [6] and DFDC [7]. All of these datasets contain genuine video data and face-manipulated video data derived from the genuine videos with reduced awkwardness of pixel values. We randomly choose 750 genuine videos and 750 manipulated videos each from FaceForensics++, DFD and DFDC. Exceptionally, since the Celeb-DF dataset has fewer than 750 genuine videos, we choose 563 genuine videos and 750 manipulated videos from Celeb-DF.

We plotted a graph to statistically analyze how the color features obtained by our method change along the parameter t. Figure 6 shows the difference value distributions along the t-direction for pre-processed facial features and background features in three datasets: Celeb-DF, FaceForensics++ and DFDC. Note that the boundary edge points are at t = 9. We observe that there is a distribution difference between the facial features derived from face-manipulated images and the facial features extracted from real images. The difference is significant at distribution points close to t = 9, the point on the facial boundary line. Since a generated face image may have an unusual pixel distribution in the face boundary region, owing to the post-processing methods that correct the difference in color values along the synthesized parts, this difference in color distribution is inevitable.

Also, the distributions in these three datasets have in common that the difference between facial features and background features is relatively larger for the unforged image than for the face-manipulated image. This observation shows that reflecting the facial information and the background information together can help solve the problem of generality.

Figure 6. Color difference distribution of extracted features from 3 different datasets.

5. Experiment

In this section, we introduce our experiment setup and implementation details, and the results of extensive experiments on overall performance and universality, with comparisons against other contemporary works to verify our method.

For conclusive evaluation of detection performance, we mainly train with the Celeb-DF dataset, on which other papers have found it difficult to derive remarkable detection performance [36], and additionally use the FaceForensics++, DFD and DFDC datasets. When testing, we cross-validate through all of these datasets to demonstrate the universality of our method.

Additional preprocessing for experiment. We want to capture the difference between the distributions indicated by the color difference values at the edge regions of the face and of the background. We obtain the difference values with Eq. (1):

    D_{I,n} = F_{I,n} - \hat{B}_I    (1)

where F_{I,n}, \hat{B}_I and D_{I,n} denote the facial boundary feature of the n-th window, the averaged background boundary feature, and the difference between the two, respectively. The final features derived from the background edge region are D_{I,n}, with a shape of 16×19×3. We then flatten the facial features extracted in Section 3.1 from shape 16×19×3 to a 304-length linear feature with 3 RGB channels; the background features extracted in Section 3.2 are flattened in the same way. The input of the classification model is thus a 6-channel, 304-length linear feature consisting of 3 channels of flattened face features and 3 channels of flattened background features.
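A sketch of this preprocessing step, assuming the feature arrays produced in Sections 3.1 and 3.2 (the function and variable names are ours):

import numpy as np

def build_model_input(face_feat, bg_feat):
    """Assemble the 6-channel, 304-length classifier input.

    face_feat: 16 x 19 x 3 facial boundary features (Section 3.1).
    bg_feat:   N x 19 x 3 mirrored background features (Section 3.2)."""
    # Averaged background boundary feature B_hat (19 x 3), then Eq. (1).
    b_hat = bg_feat.mean(axis=0)
    d = face_feat - b_hat[None, :, :]        # D_{I,n}: 16 x 19 x 3

    # Flatten 16 x 19 -> 304 per RGB channel for both branches; the
    # "background" channels are the differences D_{I,n} from Eq. (1).
    face_flat = face_feat.transpose(2, 0, 1).reshape(3, 304)
    bg_flat = d.transpose(2, 0, 1).reshape(3, 304)

    # 6 channels of length 304: 3 face channels + 3 background channels.
    return np.concatenate([face_flat, bg_flat], axis=0)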
Classification Model. To further highlight the distinctive features between unforged images and face-manipulated images, we adopt a 1-dimensional convolutional neural network (1D-CNN), which is widely used in the field of natural language processing for extracting features from sequential inputs [12]; this is unlike other existing manipulation detection models [30, 7], which use 2-dimensional convolutional neural networks (2D-CNN) to process image features, and is motivated by the fact that the extracted color difference features are sequential data along the parameter t. While advanced face-manipulation classification models are mainly designed to be complex and heavy in order to extract significant information from whole images, we constructed a lightweight, simple model to process our pre-processed distinctive color features. The total number of epochs is set to 1000 and the batch size is set to 20. The learning rate is set to 0.00075 with the Adam optimizer [15]. Additionally, we schedule the optimizer to multiply the learning rate by 0.5 every 200 epochs for learning rate decay. The cross-entropy function is used as the loss function.
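The paper does not publish the exact architecture, so the following PyTorch sketch is only a plausible lightweight 1D-CNN over the 6×304 input; the layer widths are our assumptions, while the hyper-parameters (epochs, batch size, learning rate, decay schedule, loss) are the ones stated above.

import torch
import torch.nn as nn

# Plausible lightweight 1D-CNN over the 6-channel, 304-length input;
# only the hyper-parameters below come from the paper.
model = nn.Sequential(
    nn.Conv1d(6, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool1d(2),                        # 304 -> 152
    nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool1d(2),                        # 152 -> 76
    nn.Flatten(),
    nn.Linear(64 * 76, 2),                  # unforged vs. manipulated
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.00075)
# Multiply the learning rate by 0.5 every 200 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)

# Training skeleton: 1000 epochs, batch size 20 (data loader not shown).
# for epoch in range(1000):
#     for x, y in loader:                  # x: (20, 6, 304), y: (20,)
#         optimizer.zero_grad()
#         loss = criterion(model(x), y)
#         loss.backward()
#         optimizer.step()
#     scheduler.step()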

5.1. Detection Performance with a Specific Dataset

We first experiment with one specific dataset used for both training and testing. To evaluate our experiment results, the results of several existing methods [23, 38, 19, 40, 25] that have been validated on all four state-of-the-art datasets (FaceForensics++, DFD, DFDC and Celeb-DF), cited from their original papers, are shown together with ours. The performance compared with the five referenced manipulation detection methods, in terms of AUC (the area under the Receiver Operating Characteristic curve), is summarized in Table 1. We observe that our results outperform those of the other papers, especially when verifying on the Celeb-DF dataset (95.2%), on which the performance of the other advanced manipulation detection methods drops drastically.

    Study                 FF++ / DFD    DFDC    Celeb-DF
    Matern et al. [23]    78.0          66.2    55.1
    Yang et al. [38]      47.3          55.9    54.6
    Li et al. [19]        93.0          75.5    64.6
    Zhou et al. [40]      70.1          61.4    53.8
    Nguyen et al. [25]    96.6          53.3    57.5
    Ours                  99.8          97.9    95.2

Table 1. Performance (AUC) compared with recent works.

5.2. Cross-Validation with Various Datasets

We design a cross-validation experiment to test the robustness of our approach in real-world scenarios. In this experiment, we train our model with one dataset and test with the other datasets, making scenarios that detect manipulated images from an unseen dataset. Table 3 shows the cross-validation performance of our approach in terms of AUC. We observe that the model trained on Celeb-DF features obtains more generalized cross-validation results than the models trained on the other datasets. This inconsistency across training sets may occur because the features extracted from the Celeb-DF dataset show a relatively less distinctive difference between unforged and manipulated images than the other three datasets. We can see that a model trained on a dataset with relatively less prominent features, such as Celeb-DF, performs well when verifying a dataset with more prominent features. Nevertheless, every experiment with DFDC or Celeb-DF as the test dataset outperforms all the other referenced face manipulation detection methods.

                           Test set
    Training set    FF++    DFD     DFDC    Celeb-DF
    FF++            99.8    95.1    89.5    87.7
    DFD             95.9    99.8    88.5    87.6
    DFDC            98.5    93.0    97.9    88.3
    Celeb-DF        99.8    99.5    97.1    95.2

Table 3. Cross-validation performance (AUC).

5.3. Comparison with Latest Works

For a more precise evaluation of universality, we compare our performance with the latest works [30, 19, 18, 4, 8, 26] in terms of generalization ability. Table 4 compares the best performance of ours and Face X-ray [18] when training with one sub-dataset of FaceForensics++ [30] and testing with another dataset. There are 4 sub-datasets in the FaceForensics++ dataset: DeepFake [5], Face2Face [35], FaceSwap [9] and NeuralTextures [34]. We cite the two compared results directly from [18]: the XceptionNet results from [30] and the Face X-ray results. Table 2 summarizes the AUC results of these existing contemporary works and ours. Although we cannot reproduce exactly the same BI dataset, which was additionally generated from the FaceForensics++ data with the method mentioned in [18], we can see that our framework outperforms the other two approaches in terms of best AUC.

                                          Test set
    Study              Training set    FF++     DFD     DFDC     Celeb-DF
    Xception [30]      FF++            -        87.8    54.3     40.9
    Face X-ray [18]    FF++ and BI     98.52    95.4    80.92    80.58
    Ours               FF++            99.8     95.1    89.5     87.7

Table 2. Generalization ability comparison (AUC) among the Xception network, Face X-ray and ours on several datasets. BI indicates an additional dataset generated from the FaceForensics++ data by applying the method of Face X-ray [18].

We also present the AUC of verifying with one dataset, compared with Face Warping Artifacts (FWA) [19] and Face X-ray [18], in Table 4. Our approach shows better results both in detection performance with a specific dataset and in universality across various face synthesizing algorithms.

    Study              FF++ / DF    Celeb-DF
    FWA [19]           79.2         53.8
    Face X-ray [18]    99.2         74.8
    Ours               99.8         95.2

Table 4. Performance (AUC) comparison with FWA and Face X-ray.

Lastly, we compare with 3 other recent works that suggest different approaches to universality. These studies focus on learning an innate representation of the manipulated image [4, 8] or on detecting and localizing synthesized features simultaneously [26]. Table 5 shows the generalization ability results compared with these 3 referenced works [4, 8, 26]. We observe that our approach shows better performance in terms of universality for different types of synthesizing algorithms, compared with other studies that do not approach the problem through the common manipulation procedure of synthesizing algorithms.

                             Test set
    Study                Face2Face    FaceSwap
    Du et al. [8]        90.9         63.2
    Nguyen et al. [26]   92.8         54.1
    Riess et al. [4]     94.5         72.6
    Ours                 99.8         95.5

Table 5. Performance (AUC) comparison with 3 other recent works, training with the Face2Face dataset.
6. Conclusion

We present a face manipulation detection method that leverages facial boundary features, which correspond to the color features modified by the boundary post-processing common to generation procedures. In addition to the facial features, background features corresponding to unchanged characteristics, containing the image's innate fingerprints, are used for detecting face-synthesized images. We reveal the importance of utilizing both facial edge information and background edge information, with detection performance better than other existing advanced face manipulation detection methods. Cross-validation with state-of-the-art datasets demonstrates that our approach not only performs remarkably with specific datasets but can also be applied in real-world situations.
References

[1] Darius Afchar et al. "Mesonet: a compact facial video forgery detection network". In: 2018 IEEE International Workshop on Information Forensics and Security (WIFS). IEEE, 2018, pp. 1–7.
[2] Shruti Agarwal et al. "Protecting World Leaders Against Deep Fakes". In: CVPR Workshops. 2019, pp. 38–45.
[3] John Canny. "A computational approach to edge detection". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (1986), pp. 679–698.
[4] Davide Cozzolino et al. "Forensictransfer: Weakly-supervised domain adaptation for forgery detection". In: arXiv preprint arXiv:1812.02510 (2018).
[5] Deepfakes. deepfakes/faceswap. URL: http://www.github.com/deepfakes/faceswap.
[6] DFD. Contributing Data to Deepfake Detection Research. Sept. 2019. URL: https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html.
[7] Brian Dolhansky et al. The Deepfake Detection Challenge (DFDC) Preview Dataset. 2019. arXiv: 1910.08854 [cs.CV].
[8] Mengnan Du et al. "Towards generalizable forgery detection with locality-aware autoencoder". In: arXiv preprint arXiv:1909.05999 (2019).
[9] Faceswap. MarekKowalski/FaceSwap. URL: http://www.github.com/MarekKowalski/FaceSwap.
[10] Hany Farid. "A survey of image forgery detection". In: IEEE Signal Processing Magazine 2.26 (2009), pp. 16–25.
[11] Ian Goodfellow et al. "Generative Adversarial Nets". In: Advances in Neural Information Processing Systems. Ed. by Z. Ghahramani et al. Vol. 27. Curran Associates, Inc., 2014, pp. 2672–2680. URL: https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf.
[12] Yoon Kim. "Convolutional neural networks for sentence classification". In: arXiv preprint arXiv:1408.5882 (2014).
[13] Davis E. King. "Dlib-ml: A machine learning toolkit". In: The Journal of Machine Learning Research 10 (2009), pp. 1755–1758.
[14] Paul H. King. "Digital image processing and analysis: Human and computer applications with CVIPtools (Umbaugh, S.; 2011) [book reviews]". In: IEEE Pulse 3.4 (2012), pp. 84–85.
[15] Diederik P. Kingma and Jimmy Ba. "Adam: A method for stochastic optimization". In: arXiv preprint arXiv:1412.6980 (2014).
[16] Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes. 2014. arXiv: 1312.6114 [stat.ML].
[17] Haodong Li et al. "Detection of deep network generated images using disparities in color components". In: arXiv preprint arXiv:1808.07276 (2018).
[18] L. Li et al. "Face X-Ray for More General Face Forgery Detection". In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020, pp. 5000–5009. DOI: 10.1109/CVPR42600.2020.00505.
[19] Yuezun Li and Siwei Lyu. "Exposing deepfake videos by detecting face warping artifacts". In: arXiv preprint arXiv:1811.00656 (2018).
[20] Yuezun Li et al. Celeb-DF: A Large-scale Challenging Dataset for DeepFake Forensics. 2020. arXiv: 1909.12962 [cs.CR].
[21] Francesco Marra et al. "Detection of gan-generated fake images over social networks". In: 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). IEEE, 2018, pp. 384–389.
[22] Francesco Marra et al. "Do gans leave artificial fingerprints?" In: 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). IEEE, 2019, pp. 506–511.
[23] Falko Matern, Christian Riess, and Marc Stamminger. "Exploiting visual artifacts to expose deepfakes and face manipulations". In: 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW). IEEE, 2019, pp. 83–92.
[24] Scott McCloskey and Michael Albright. "Detecting GAN-generated imagery using saturation cues". In: 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019, pp. 4584–4588.
[25] H. H. Nguyen, J. Yamagishi, and I. Echizen. "Capsule-forensics: Using Capsule Networks to Detect Forged Images and Videos". In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2019, pp. 2307–2311. DOI: 10.1109/ICASSP.2019.8682602.
[26] Huy H. Nguyen et al. "Multi-task learning for detecting and segmenting manipulated facial images and videos". In: arXiv preprint arXiv:1906.06876 (2019).
[27] Judith M. S. Prewitt. "Object enhancement and extraction". In: Picture Processing and Psychopictorics 10.1 (1970), pp. 15–19.
[28] Weize Quan et al. "Distinguishing between natural and computer-generated images using convolutional neural networks". In: IEEE Transactions on Information Forensics and Security 13.11 (2018), pp. 2772–2787.
[29] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. "Stochastic Backpropagation and Approximate Inference in Deep Generative Models". In: Proceedings of the 31st International Conference on Machine Learning. Ed. by Eric P. Xing and Tony Jebara. Vol. 32. Proceedings of Machine Learning Research. Beijing, China: PMLR, 2014, pp. 1278–1286. URL: http://proceedings.mlr.press/v32/rezende14.html.
[30] Andreas Rossler et al. "FaceForensics++: Learning to Detect Manipulated Facial Images". In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Oct. 2019.
[31] Ekraam Sabir et al. "Recurrent convolutional strategies for face manipulation detection in videos". In: Interfaces (GUI) 3.1 (2019).
[32] Irwin Sobel and Gary Feldman. "A 3x3 isotropic gradient operator for image processing". In: A talk at the Stanford Artificial Intelligence Project (1968), pp. 271–272.
[33] Joel Stehouwer et al. "On the detection of digital face manipulation". In: arXiv preprint arXiv:1910.01717 (2019).
[34] Justus Thies, Michael Zollhöfer, and Matthias Nießner. "Deferred Neural Rendering: Image Synthesis Using Neural Textures". In: ACM Transactions on Graphics 38.4 (July 2019). DOI: 10.1145/3306346.3323035. URL: https://doi.org/10.1145/3306346.3323035.
[35] Justus Thies et al. "Face2Face: Real-Time Face Capture and Reenactment of RGB Videos". In: Communications of the ACM 62.1 (Dec. 2018), pp. 96–104. DOI: 10.1145/3292039. URL: https://doi.org/10.1145/3292039.
[36] Ruben Tolosana et al. "Deepfakes and beyond: A survey of face manipulation and fake detection". In: arXiv preprint arXiv:2001.00179 (2020).
[37] Cristian Vaccari and Andrew Chadwick. "Deepfakes and disinformation: exploring the impact of synthetic political video on deception, uncertainty, and trust in news". In: Social Media + Society 6.1 (2020), p. 2056305120903408.
[38] Xin Yang, Yuezun Li, and Siwei Lyu. "Exposing deep fakes using inconsistent head poses". In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 8261–8265.
[39] Ning Yu, Larry S. Davis, and Mario Fritz. "Attributing fake images to gans: Learning and analyzing gan fingerprints". In: Proceedings of the IEEE International Conference on Computer Vision. 2019, pp. 7556–7566.
[40] Peng Zhou et al. "Two-stream neural networks for tampered face detection". In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2017, pp. 1831–1839.