Singe Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective - Vijay ...

Page created by Lisa Williams
 
CONTINUE READING
Singe Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective - Vijay ...
Singe Image Rain Removal with Unpaired Information:
                            A Differentiable Programming Perspective

                      Hongyuan Zhu1 , Vijay Chanderasekh1 , Liyuan Li1 Joo-Hwee Lim1
                                      1
                                          Institute for Infocomm Research, A*STAR, Singapore,
                                             {zhuh, vijay, lyli, joo-hwee}@i2r.a-star.edu.sg

                           Abstract
  Single image rain-streak removal is an extremely challenging
  problem due to the presence of non-uniform rain densities
  in images. Previous works solve this problem using various
  hand-designed priors or by explicitly mapping synthetic rain
  to paired clean image in a supervised way. In practice, how-             (a) Input         (b) Ground truth       (c) Our result
  ever,the pre-defined priors are easily violated and the paired     Figure 1: A visual illustration of single image deraining. The
  training data are hard to collect. To overcome these limi-         target is to recover a clean image from the input rain image.
  tations, in this work, we propose RainRemoval-GAN (RR-             Our method produces a recovered image with faithful color
  GAN), the first end-to-end adversarial model that generates        and structure without using any paired training data.
  realistic rain-free images using only unpaired supervision.
  Our approach alleviates the paired training constraints by in-     where X and R denote the desired clean image and the rain
  troducing a physical-model which explicitly learns a recov-
  ered images and corresponding rain-streaks from the differen-
                                                                     streaks, respectively. Single image rain-streak removal aims
  tiable programming perspective. The proposed network con-          to estimate the hidden X and R from a given I, which is
  sists of a novel multiscale attention memory generator and a       an unconstrained problem because we need to estimate two
  novel multiscale deeply supervised discriminator. The multi-       unknown variables and there are infinitely solutions if no
  scale attention memory generator uses a memory with atten-         further regularization are imposed.
  tion mechanism to capture the latent rain streaks context at          To make the rain streak removal problem trackable, exist-
  different stages to recover the clean images. The deeply su-       ing methods can be roughly grouped into two categories:
  pervised multiscale discriminator imposes constraints at the       prior-based and data-driven. The prior-based methods es-
  recovered output in terms of local details and global appear-
                                                                     timate the rain-streak based on various priors or assump-
  ance to the clean image set. Together with the learned rain-
  streaks, a reconstruction constraint is employed to ensure         tions. Typical priors include but not limited to sparse-coding
  the appearance consistent with the input image. Experimen-         prior (Luo et al. 2015), low-rank prior (Chang et al. 2017)
  tal results on public benchmark demonstrates our promising         and Gaussian prior (Li et al. 2016). Despite remarkable
  performance compared with nine state-of-the-art methods in         progress achieved by adopting these priors, they are easily
  terms of PSNR, SSIM, visual qualities and running time.            violated in practice given the rain-streak does not strictly fol-
                                                                     low the Gaussian or sparse distribution and the background
                       Introduction                                  scenes is cluttered and contains complex illuminations.
Rain, snow, and fog are common visual artifacts which                   In recent, the interest has shifted to data-driven ap-
affects many vision-based applications (Zhu et al. 2017a;            proaches which utilize labeled data with paired rain im-
2016a; 2018a; 2015; 2016b), such as drone-based video                age and rain streak to learn a deep neural network (Fu
surveillance and self-driving cars. The performance of many          et al. 2017; Yang et al. 2017b; Zhang and Patel 2018;
computer vision systems often exhibits significant drop              Li et al. 2018). Given an input image, the rain streaks or
when they are presented with images that contain these ar-           clean image can be regressed with given input rain im-
tifacts. Hence, it is highly practical and expected to de-           age. More recently, inspired by the huge success of gener-
velop automatic artifacts removal methods (Li et al. 2017;           ative adversarial networks (GAN) in image-to-image trans-
Zhang et al. 2017c). In this paper, we mainly focus on the           lation tasks (Goodfellow et al. 2014; Isola et al. 2017),
problem of rain streak removal from a single image.                  (Zhang et al. 2017b) propose a GAN-based image derain-
   A rain image I can be modeled as an image composition             ing method which employs a generator to map input image
problem (Luo et al. 2015):                                           to the ground-truth clean image. One major limitation of the
                                                                     method is the necessity of paired training data and it is ex-
                          I =X +R                              (1)   tremely challenging to collect paired training data given the
Copyright c 2019, Association for the Advancement of Artificial      changing environment.
Intelligence (www.aaai.org). All rights reserved.                       However, collecting paired training data is extremely
Singe Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective - Vijay ...
challenging given the changing environment. One popular                                    Related Work
solution is simulating rain in controlled environments. How-
                                                                     In this section, we briefly review several recent related works
ever, most existing simulations are usually too simplified to
                                                                     on single image de-raining and generative adversarial net-
depict the complexity of real-world. Another common solu-
                                                                     works.
tion is using photo editing tools to add streak to clean nat-
ural images with a various rain-density levels with different
orientations and scales, but this method limits the scale of         Single Image De-raining
training data.                                                       Image deraining has been a classic image restoration prob-
   In fact, one could observe that it is easier to collect a large   lem for years. Some early methods (Zhang et al. 2006;
number of rain/rain-free images from Internet despite these          Garg and Nayar 2007; Santhaseelan and Asari 2015; Tri-
images are not in pair. Thus, it is highly excepted to develop       pathi and Mukhopadhyay 2014) exploit photometric con-
derain model which can utilize unpaired supervision. Note            sistency and temporal dynamics for rain removal. Although
that, CycleGAN (Zhu et al. 2017b) recently has become                these methods achieve promising performance, they are not
popular to learn cross-style image translation by using bidi-        applicable to the single image setting.
rectional constrains with adversarial learning. Although Cy-            Unlike video-based methods, prior-based methods have
cleGAN has achieved impressive performance in style trans-           been proposed to handle single image deraining problem by
lation, it is designed for style translation problem and may         assuming the rain-streaks follows certain assumption. There
not preserve the appearance consistency in the translated re-        are many hand-crafted priors have been proposed, e.g. sparse
sult.                                                                coding-based prior (Luo et al. 2015), low-rank prior (Chang
   Based on the above observations, we propose a novel               et al. 2017) and gaussian prior (Li et al. 2018), just to name a
single image deraining method called RainRemoval-GAN                 few. One major limitation of these methods is that the priors
(RR-GAN) which is specifically designed based on rain im-            are easily violated which results in over-/under-estimation
age composition model in Eq.1. The proposed RR-GAN is                of the rain-streaks (Zhang and Patel 2018).
significantly different from CycleGAN and its variants in               Recently, deep learning has become popular in both high-
structure and application. More specifically, we propose a           level and low-level vision tasks (Cai et al. 2016; Ren et al.
novel generator which uses an attention memory to cap-               2016; Yang et al. 2017a), several CNN-based methods have
tures the latent rain streaks contexts in a recurrent fashion        also been proposed for image de-raining (Fu et al. 2017;
to recover clean image X. Moreover, we propose a novel               Yang et al. 2017b; Zhang and Patel 2018; Li et al. 2018).
deeply-supervised multiscale discriminator to regularize the         In these methods, the idea is to learn a mapping between
recovered image to look as realistic as possible to the rain-        input rainy images and their corresponding rain-streaks us-
free images. Furthermore, these latent factors are compos-           ing a CNN structure. (Yang et al. 2017b) design a deep di-
ited together as in Eq.1 to reconstruct the original rain im-        lated network to joint detect and remove rain streaks. (Li et
age to preserve faithful color and structure, thus avoiding          al. 2018) propose incorporate recurrent neural network to
the inefficient CycleGAN’s bi-directional consistency train-         the dilaeted network of Yang et al. to preserve multi-stage
ing paradigm.                                                        contexts. (Zhang and Patel 2018) propose to use multi-scale
   The major contributions of this work are summarized as            densely connected network trained with additional rain den-
follows:                                                             sity classifier to predict the rain streaks. In summary, these
                                                                     methods aim to learn a mapping using synthesized rain/clean
• To the best of our knowledge, this is one of first works to        image pairs which are hard to collect in large scale.
  marriage CycleGAN for single image deraining, so that                 Our methods’s generator is similar to (Zhang and Pa-
  makes rain-removal training with unpaired data possible.           tel 2018) as we combines coarse- and fine-scale network
  The novel GAN consists of a novel generator and discrim-           for feature extraction, however we introduce a novel at-
  inator which is specifically designed by incorporating the         tention memory network to learn the rain-streaks automati-
  rain image generation model with un-paired training in-            cally from data in a recurrent fashion. The attention memory
  formation.                                                         network is inspired by recent work in language translation
• A novel multi-scale attention memory generator is pro-             (Gehring et al. 2017) which shows that the performance of
  posed with an attention memory to fuse the contexts from           machine translation models could be significantly improved
  coarse-scale and fine-scale densely connected network to           by solely using an attention model instead of using addi-
  recurrently learns the rain-streaks to recover the clean im-       tional gate in recurrent networks as in (Li et al. 2018), which
  age using un-paired training data.                                 significantly improve the inference speed. Moreover, our
                                                                     generator can explicitly learns two disentangled latent pa-
• A novel multiscale deeply-supervised discriminator is              rameters which corresponds to clean image and rain streaks
  proposed to regularize the generated recover image as re-          from data without paired information, hence significantly re-
  alistic as possible to the target image in terms of both low-      duce the labeling efforts.
  level details and high-level structures.
• Extensive experiments on public benchmark demon-                   Generative Adversarial Networks
  strates our method’s efficiency and effectiveness in terms         Recently generative adversarial networks (Goodfellow et al.
  of quantitative and qualitative performance.                       2014; Arjovsky et al. 2017; Zhao et al. 2017) have achieved
Singe Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective - Vijay ...
Multiscale Attention
                                Memory Generator

                                                Gf                       Gf
                                     Dense                     Dense                                                     Recovered
                                     Block                     Block                                                       Image

                                       …....

                                                                 …....
                                                                                                  Adversarial Regularization
                                     Dense                     Dense
                                     Block                     Block
         Input Image                                                                                                  Real/Fake
                                                                                                                      Real/Fake
                                                             Attention                                                Real/Fake
                                    Attention          …..                                                            Real/Fake
                                    Memory                   Memory

                                                                                                        Multiscale Deeply
                                                             Mask n                                  Supervised Discriminator

                                                              Mask

                 Consistency Regularization                               Reconstructed
                                                                             Image

Figure 2: The pipeline of our method. Our RR-GAN consists of a multi-scale attention memory generator and a multiscale
deeply-supervised discriminator. The generator uses un-paired rain and clean images to train an attention memory to aggregate
the latent rain streak context from different stages. The rain image together with the latent rain-streak imasks are used as an
input to the U-Net to generate the recovered image. This recovered image is regularized by a deeply-supervised multiscale
discriminator so that its appearance is as close as possible to the clean images used for training in terms of both low-level
details and global-level structures.

significant progress. The basic idea of GANs is transform-               attention memory generator using multiscale densely con-
ing the white noise (or other specified prior) through a para-           nected networks and attention memory in a recurrent fashion
metric model to generate candidate samples with the help of              to learn discriminative rain features and latent rain streaks
a discriminator and a generator. By optimizing a minimax                 to recover the clean image. Furthermore, our network con-
two-player game, the generator aims to learn the training                tains of a novel deeply-supervised multi-scale discriminator
data distribution, and the discriminator aims to judge that              training regime, so that the local details to global image ap-
a sample comes from the training data or the generator. An               pearance is enforced to look realistic to the clean image set.
image can be translated into an output by the conditional                Furthermore, these latent factors are composited together as
generator, which has various applications, such as image su-             in Eq.1 to reconstruct the original rain image to preserve
per resolution (Ledig et al. 2017), text2image (Zhang et al.             faithful color and structure, thus avoiding the inefficient Cy-
2017a), image2image (Yi et al. 2017). However, condition-                cleGAN’s bi-directional consistency training paradigm.
alGAN requires paired training images, whose ground-truth
can be very difficult to get.                                            Differentiable Programming
   This paper is inspired by the recent popular Cycle-                   Our work belongs to the family of differentiable program-
GAN (Zhu et al. 2017b) which can translate one image from                ming which treats a program as neural network such that the
one domain to another domain without paired information.                 program can be parametrized, automatically differentiated
However, our work is remarkably different from CycleGAN.                 and optimizable. The first well-known work of differentiable
In brief, our architecture is specifically designed for image            programming is the Learned ISTA (LISTA) (Gregor and Le-
deraining so that the original image’s color and structure               Cun 2010), which unfolds the popular l1 solver ISTA as a
could be well preserved. To the best of our knowledge, this              simple RNN such that the number of layers corresponds to
could one of the first works to develop specific single im-              the iteration number and the weight corresponds to dictio-
age deraining model that incorporates un-paired adversar-                nary. The LISTA’s RNN paradigm have been applied in a
ial learning. Our network is remarkably distinct from Cy-                wide range of tasks, e.g. hashing (Wang et al. 2016a), clas-
cleGAN in following aspects. First, we proposes a novel                  sification (Wang et al. 2016b), sparse coding (Zhou et al.
Singe Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective - Vijay ...
2018), image dehazing (Zhu et al. 2018b) and etc.                 is as follows:
   Different from conventional differentiable programming                     at = σ(W1 [Xc , Xf , Mt−1 , St−1 ] + b1 )
using RNN with difficult-to-interpret variables, our method                  St = at · St−1                                (2)
reformulate the image degradation model using a feed-
                                                                             Mt = tanh(W2 ∗ St + b2 )
forward convolutional neural network with prior knowledge.
Therefore, our formulation is more interpreatable and effi-       where W1 and b1 are the weight and bias of a convolutional
cient then conventional RNN based DP solver.                      layer with 3 × 3 kernels of one neuron, σ denotes the sig-
                                                                                              1
                                                                  moid function σ(x) = 1+exp(−x)    . W2 and b2 are the weight
                                                                  and bias of a deconvolutional layer with 3 × 3 kernels of
                       The Method                                 one neuron, which are used to output the rain-streak mask.
                                                                  In Fig.2, one can observe that the rain mask is recurrently
As shown in Fig.2, our RR-GAN consists of two networks,           improved by using the contexts from the previous stages.
namely, a multiscale attention memory generator (MAMG)            The input image together the final attention map are con-
and a multiscale deeply-supervised discriminator (MDSD)           catenated to feed into the U-Net auto-encoder for generat-
are specifically designed for single image deraining. In brief,   ing the rain-free image. Our deep autoencoder includes 16
MAMG uses an attention memory to recurrently attend to            convolutional (relu) blocks, which adopts skip connections
the rain-streak regions for the purpose of recovery of the        to prevent blurred outputs. To show the effectiveness of our
clean image. MDSD employs a deeply supervision to re-             method, Fig. 3 shows the visual results of different gener-
cover image from the generator by enforcing it look as re-        ators, the one which combines attention memory and skip
alistic as possible to the clean image set.                       U-Net auto-encoder yielded best visual result.

Multiscale Attention Memory Network
The multiscale attention memory network consists of an at-
tentive memory network and a U-Net autoencoder with skip-
connection. The former uses a multiscale feature extractor
Gf to extract rain-relevant regions features. To achieve bet-
ter performance, Gf combines the complementary coarse-
scale and dense-scale context using recently proposed dense                      (a)                        (b)
connection (Huang et al. 2017). The fine-scale dense net-
work applies densely connected modular with small ker-
nels to capture fine-scale structures relevant to rain re-
gions with a structure of C(1, 3)-C(3, 3)-C(5, 3)-C(7, 3) ,
where C(k1 , k2 ) denotes the convolution with a filter of size
k1 × k1 , an output channel number of k2 and a stride of
1 with ReLu output (Krizhevsky et al. 2012). Each layer
are densely connected with other layers. The coarse-scale                        (c)                        (d)
branch applies a similar architecture but with larger kernels
                                                                  Figure 3: A visual comparison on the effectiveness of dif-
to capture longer-range context, whose structure is of C(7,
                                                                  ferent generators. (a) Input; (b) the result of using context
3)-C(9, 3)-C(11, 3)-C(1, 3). The feature maps Xc and Xf
                                                                  auto-encoder alone; (c) the result of our generator without
from coarse-scale and fine-scale networks respectively, are
                                                                  using attention memory; (d) the result of our multiscale at-
concatenated and fed through the attention memory network
                                                                  tention memory generator.
to learn rain regions.
   Visual attention models are adopted to localize relevant       Multiscale Deeply-Supervised Discriminator
regions in an image so that the task-specific features are ob-    The function of discriminator is to regularize the output from
tained. In this paper, we introduce visual attention to learn     the generator so that the generated image looks as realistic as
where the rain-regions should be focused on. As shown in          possible to the target set clean images. Recently, a shallowly-
our architecture in Fig. 2, we present an attention memory        supervised discriminator with discriminator with supervi-
network which is inspired by recent convolutional sequence-       sion at the last layer has become popular in recent popular
to-sequence learning task in machine translation (Gehring et      gan architecrtures (e.g CycleGAN (Zhu et al. 2017c) and
al. 2017). Our attention memory network sequentially uses         ConditionalGAN (Isola et al. 2017)) which achieves good
attention to replace the recurrent networks and the attention     image translation performance. However, such shallowly-
map is iteratively refined given the accumulated context. At      supervised discriminator overlooks the low-level details and
each time stamp t, our attention memory network concate-          global structures which are useful for image enhancement
nates Xc , Xf , Mt−1 , and St−1 , where Xc is the input from      tasks.
coarse-scale network, Xf is from fine-scale network, Mt−1            To overcome the above disadvantage, we propose a novel
is the rain streak mask at the previous stage t − 1, and St       deeply-supervised discriminator which consists of four con-
denotes the state memory in previous stage. Note that, the        volutional layers, which is with the structure of C(3, 64)-
initial M0 and S0 are set to zero. The detailed updating rule     C(3, 128)-C(3, 256)-C(3, 512), where each convolutional
Singe Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective - Vijay ...
layer has a stride of two. Each convolutional feature map         Training Details
will be passed through an instance normalization layer with
                                                                  During training, a 286 × 286 image is randomly cropped
Leaky-ReLu activations and then be fed into the next con-
                                                                  from the input image (or its horizontal flip) of size 256256.
volutional layer. Moreover, each side-output follows by an
                                                                  Adam is used as optimization algorithm with a mini-batch
additional convolution layer of C(1, 1) with sigmoid to out-
                                                                  size of 1. The learning rate starts from 0.001. The models
put the probability of each patch to the clean images. Hence
                                                                  are trained for up to 10 epochs to ensure convergence. We
we will have four predictions which provide multiple level
                                                                  use a weight decay of 0.0001 and a momentum of 0.9. The
supervision to provide regularization at to make the gener-
                                                                  entire network is trained using the Pytorch framework. Dur-
ated image as realistic as the ground-truth image in terms of
                                                                  ing training, we set γ = 1. All the parameters are defined
low-level details and high-level structures.
                                                                  via cross-validation using the validation set.
Objective Function
                                                                                        Experiments
To stabilize the training of discriminator D, we minimize an
objective function Ld that consists of two terms LDreal and       We evaluate our method on public benchmark. We quan-
LDf ake as in (Mao et al. 2017):                                  titatively evaluate the rain-removal performance using two
                                                                  commonly used metrics, including peak signal to noise ratio
        Ld (G, D) =LDreal + LDf ake                               (PSNR) (Huynh-Thu and Ghanbari 2008) and structure sim-
                            K                                     ilarity index (SSIM) (Wang et al. 2004) for evaluation. We
                         1 X                                      also provide ablation study of each component proposed to
                    =          {Ey (1 − Di (x, y))2 +      (3)
                        2K i=1                                    demonstrate their effectiveness.
                        Ex (Di (G(x))2 }                             We use Rain800 (Zhang and Patel 2018) for benchmark-
                                                                  ing. The Rain800 dataset contains 700 synthesized images
which aims to train the discriminator Di so that it can try to    for training and 100 images for testing using randomly sam-
differentiate the fake examples G(x) from generator G and         pled outdoor images.
the real clean image y, where Ex is the expectation of x and
Di∈{1,2,...,K} is the classifier at ith level side-output.        Comparing Methods We compare our proposed ap-
   On the other hand, when we train the generator G, we           proach with 7 state-of-the-art methods, including image de-
optimize an objective function Lg which consists of a MSE         composition (ID) (Kang et al. 2012), discriminative sparse
loss Lr (Ren et al. 2016) and an adversarial learning loss        coding (DSC) (Luo et al. 2015), layer priors and (LP) (Li et
Ladv :                                                            al. 2016), DetailsNet (Fu et al. 2017), and joint rain detec-
                                                                  tion and removal (JORDER) (Yang et al. 2017b) and RES-
                      Lg = Lr + γLadv                      (4)    CAN (Li et al. 2018).
These three terms are designed for minimizing the recon-
struction error and improving the perceptual quality, re-         Results on Synthetic Datasets Table 1 shows quantita-
spectively. γ is a trade-off factors set according to cross-      tive results on the Rain800. One can observe that our RR-
validation.                                                       GAN considerably achieves promising results in terms of
   The MSE Loss Lm encourages the network to enforce              both PSNR and SSIM on Rain800.
the appearance consistency between the recovered image               From Table 1, data-driven methods, especially the deep
X and input image Il using the estimated rain mask R by           learning based methods (Fu et al. 2017; Yang et al. 2017b;
minimizing the discrepancy between the composite image            Zhang and Patel 2018; Li et al. 2018) significantly outper-
It = X + R and the input image Il :                               form the prior-based methods (Kang et al. 2012; Luo et al.
                                                                  2015; Li et al. 2016). The comparison demonstrates that the
                      W X H X C                                   features learned by the deep neural networks is much helpful
               1     X
    Lr =                        kI i,j,c − Ili,j,c k2      (5)    for the rain-removal task.
           C × W × H i=1 j=1 c=1 t
                                                                     On the one hand, the proposed RR-GAN is trained with
                                                                  unpaired training data, which outperform the JORDER
The W , H, and C are the width, height, and channel number
                                                                  method on the Rain800 dataset. Note that, JORDER is
of the input image It .
                                                                  trained using the paired supervised data. In other words, it
   The Adversarial Loss Ladv encourages the generator G           utilizes a stronger supervisor than our method does. The re-
to recover image G(x) as realistic as the ground-truth image      sult demonstrates that it is feasible and promising to develop
y by assigning a real label 1 to recovered example G(x):          deraining model using unpaired data which could be easily
                             K                                    accessed in practice.
                          1 X                                        In terms of running time, our method is also much faster
        Ladv (G, D) =           Ex (1 − Di (G(x)))2        (6)
                          K i=1                                   than other deep learning methods and achieve a running time
                                                                  of 0.03s per image, which demonstrates our method’s real-
where K is the layer number of discriminator, Di is i-th          time performance.
level classifier to classify whether an image is from the clean
image or generator G(x). The term penalizes the details of        Analysis of RR-GAN To demonstrate the effectiveness of
recovered image that looks different from clean images.           our generator used in RR-GAN, we compare it with other
Singe Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective - Vijay ...
Input               RainCNN                 DetailNet        DID-MDN                        Our                      Ground-Truth

                 Figure 4: Qualitative comparison with the state-of-the-arts on two randomly sampled Rain800.

           Dataset     R800                Time (s)                  From the table, one can observe that the best deraining
          Measure      PSNR     SSIM                              results can be obtain when both corresponding rain and rain-
             ID        18.88    0.5832       120s                 free images are used for training. However, other settings
            DSC        18.56    0.5996      189.3s                can still achieve comparable perfor- mance. This implies that
             LP        20.46    0.7297      674.8s                it is not necessary to have paired rain and rain-free scenes
          DetailNet    21.16    0.7320       0.3s                 during training (see setting 2).
          JORDER       22.24    0.7763       1.5s
          RESCAN       24.09    0.8410       0.4s                                           Pair(P)/
                                                                             Setting                           PSNR          SSIM
          RR-GAN       23.51    0.7566       0.03s                                         UnPair(U)
                                                                                     1         P                23.79        0.7951
Table 1: Quantitative experiments on Rain800. Best results                           2         U                23.51        0.7566
are marked in bold and the second best results are under-
lined.
                                                                  Analysis of Loss Function To better demonstrate the ef-
                                                                  fectiveness of our objective function, we conduct an abla-
network architectures. The first baseline is dense connec-        tion study by considering the combinations of the proposed
tion. To be specific, we replace the dense block with resid-      MSE loss Lr and the adversarial loss Ladv . Figure 5 and Ta-
ual block (ResNet) and fix other blocks in our model. One         ble 3 demonstrate qualitative and quantitative results on an
could find that the performance was dropped significantly         sample image, respectively. One can observe that by using
from 23.51/0.7566 (PSNR/SSIM) to 22.82/0.6991.                    MSE loss alone without regularization from the adversarial
   Besides the ablation study in generator, we also test the      loss, the quality of the recover image is very poor due to that
effectiveness of our deeply supervised multiscale discrim-        there is not sufficient infomation to allow generator to learn
inator. To the end, we adopt a new discriminator which            what makes the clean image. Using adversarial loss alone
only enforces the supervision at the last layer (shallowly-       can significantly improve the visual quality as the informa-
supervised discriminator). One could see that the perfor-         tion from clean image set help the generator lean use cues
mance is dropped to 21.43/0.7254.                                 to recover clean image, however the tone recovered image is
                                                                  bit yellowish as there is no consistency constraint between
  Component              Baseline          PSNR        SSIM       the recover image and the input image. By combining these
   Generator             ResNet            22.82       0.6991     two terms together, our method can produce recovered im-
 Discriminator     Shallow-supervision     21.43       0.7254     ages with realistic color and structures.
 Our Method             RR-GAN             23.51       0.7566

            Table 2: Ablation study on RR-GAN.

Analysis of Training Paradigm We analyze how the
presence of corresponding rain/rain-free scenes in the train-
ing samples can aeffect the model training. We first ran-                                (a) Input                  (b) Lr

domly split the Rain100H dataset into two halves (split 1 and
2). We use different combinations of the images for model
training, and then test the model on the rain images in split
2. Specifically, we train under 2 settings: the rain and clean
image are paired from split1 (setting 1), the clean images are
randomly drawn from split 1 (setting 2), whose testing per-               (c) Ladv                    (d) Lr+Ladv               (e) GroundTruth
formance on split 2’s rain images can be observed in Table..
                                                                         Figure 5: Qualitative studies on different loss.
Singe Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective - Vijay ...
Karol Gregor and Yann LeCun. Learning fast approxima-
      Table 3: Quantitative studies on different losses.         tions of sparse coding. In ICML, pages 399–406, USA,
         Metrics      Lr       Ladv     Lr + Ladv                2010.
          PSNR      17.08      21.33        23.51
          SSIM     0.3760 0.7328           0.7566                Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kil-
                                                                 ian Q. Weinberger. Densely connected convolutional net-
                                                                 works. In CVPR, 2017.
                       Conclusion                                Q. Huynh-Thu and M. Ghanbari. Scope of validity of
                                                                 psnr in image/video quality assessment. Electronics Letters,
This paper proposed a novel RR-GAN for end-to-end single
                                                                 44(13):800–801, June 2008.
image deraining without using paired training data. The pro-
posed method includes a novel multiscale attention memory        Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A.
network and a novel deeply supervied multiscale discrim-         Efros. Image-to-image translation with conditional adver-
inator. The attention memory network uses a state mem-           sarial networks. In CVPR, pages 5967–5976, 2017.
ory to learn a latent rain-streak mask in a recurrent fash-      Li-Wei Kang, Chia-Wen Lin, and Yu-Hsiang Fu. Automatic
ion to aggreate the rain context information from diffrent       single-image-based rain streaks removal via image decom-
stages. Together with the input image, the generator will try    position. IEEE TIP, 21(4):1742–1755, 2012.
to generate the recovered image by iusing un-paired rain and     Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton.
clean images training data. The proposed deeply supervised       Imagenet classification with deep convolutional neural net-
multiscale discriminator can effectively regularize the out-     works. In NIPS, pages 1106–1114, 2012.
put from the generator to look as realistic as possible to the
clean image sets in terms of local details and global struc-     C. Ledig, L. Theis, F. Huszr, J. Caballero, A. Cunning-
tures. Extensive experiments have demonstrated the promis-       ham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and
ing performance of our method in terms of PSNR, SSIM,            W. Shi. Photo-realistic single image super-resolution using
running time and visual quality.                                 a generative adversarial network. In IEEE Conference on
                                                                 Computer Vision and Pattern Recognition, pages 105–114,
                                                                 Honolulu, HI, Jul. 2017.
                  Acknowledgement
                                                                 Yu Li, Robby T. Tan, Xiaojie Guo, Jiangbo Lu, and
This was funded by the Deep learning 2.0. program at
                                                                 Michael S. Brown. Rain streak removal using layer priors.
I2R, A*STAR. The work of X. Peng was supported by the
                                                                 In CVPR, 2016.
Fundamental Research Funds for the Central Universities
under Grant YJ201748, by NFSC under Grant 61806135,              Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, and
61432012 and Grant U1435213, and by the Fund of Sichuan          Dan Feng. An all-in-one network for dehazing and beyond.
UniversityTomorrow Advancing Life.                               CoRR, abs/1707.06543, 2017.
                                                                 Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, and Hongbin
                       References                                Zha. Recurrent squeeze-and-excitation context aggregation
                                                                 net for single image deraining. In ECCV, 2018.
Martin Arjovsky, Soumith Chintala, and Léon Bottou.
Wasserstein GAN. In International Conference on Machine          Yu Luo, Yong Xu, and Hui Ji. Removing rain from a single
Learning, Jan. 2017.                                             image via discriminative sparse coding. In ICCV, 2015.
B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao. Dehazenet:           Xudong Mao, Qing Li, Haoran Xie, Raymond Y. K. Lau,
An end-to-end system for single image haze removal. IEEE         Zhen Wang, and Stephen Paul Smolley. Least squares gen-
Transactions on Image Processing, 25(11):5187–5198, Nov.         erative adversarial networks. In ICCV, 2017.
2016.                                                            Wenqi Ren, Si Liu, Hua Zhang, Jinshan Pan, Xiaochun Cao,
Yi Chang, Luxin Yan, and Sheng Zhong. Transformed low-           and Ming-Hsuan Yang. Single image dehazing via multi-
rank model for line pattern noise removal. In ICCV, 2017.        scale convolutional neural networks. In Bastian Leibe, Jiri
Xueyang Fu, Jiabin Huang, Delu Zeng, Yue Huang, Xinghao          Matas, Nicu Sebe, and Max Welling, editors, ECCV, 2016.
Ding, and John Paisley. Removing rain from single images         Varun Santhaseelan and Vijayan K. Asari. Utilizing lo-
via a deep detail network. In CVPR, 2017.                        cal phase information to remove rain from video. IJCV,
Kshitiz Garg and Shree K. Nayar. Vision and rain. IJCV,          112(1):71–89, 2015.
75(1):3–27, 2007.                                                Abhishek Kumar Tripathi and Sudipta Mukhopadhyay. Re-
Jonas Gehring, Michael Auli, David Grangier, Denis Yarats,       moval of rain from videos: a review. Signal, Image and
and Yann N. Dauphin. Convolutional sequence to sequence          Video Processing, 8(8):1421–1430, 2014.
learning. In ICML, 2017.                                         Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simon-
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing            celli. Image quality assessment: from error visibility to
Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville,          structural similarity. IEEE Transactions on Image Process-
and Yoshua Bengio. Generative Adversarial Nets. In Ad-           ing, 13(4):600–612, April 2004.
vances in Neural Information Processing Systems, pages           Zhangyang Wang, Qing Ling, and Thomas S. Huang. Learn-
2672–2680, Montreal, Canada, Dec. 2014.                          ing deep l0 encoders. In AAAI, pages 2194–2200, 2016.
Singe Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective - Vijay ...
Zhangyang Wang, Yingzhen Yang, Shiyu Chang, Qing Ling,         Hongyuan Zhu, Romain Vial, and Shijian Lu. TOR-
and Thomas S. Huang. Learning A deep l∞ encoder for            NADO: A spatio-temporal convolutional regression network
hashing. In IJCAI, pages 2174–2180, 2016.                      for video action proposal. In IEEE International Conference
Hui Yang, Jinshan Pan, Qiong Yan, Wenxiu Sun, Jimmy            on Computer Vision, ICCV 2017, Venice, Italy, October 22-
Ren, and Yu-Wing Tai. Image dehazing using bilinear com-       29, 2017, pages 5814–5822, 2017.
position loss function. arXiv preprint arXiv:1710.00279,       Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A.
2017.                                                          Efros. Unpaired image-to-image translation using cycle-
Wenhan Yang, Robby T. Tan, Jiashi Feng, Jiaying Liu,           consistent adversarial networks. In ICCV, 2017.
Zongming Guo, and Shuicheng Yan. Deep joint rain de-           Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A.
tection and removal from a single image. In CVPR, 2017.        Efros. Unpaired image-to-image translation using cycle-
                                                               consistent adversarial networks. In ICCV, 2017.
Zili Yi, Hao Zhang, and Ping Tan Minglun Gong. DualGAN:
Unsupervised Dual Learning for Image-to-Image Transla-         H. Zhu, R. Vial, S. Lu, X. Peng, H. Fu, Y. Tian, and X. Cao.
tion. In IEEE International Conference on Computer Vision,     Yotube: Searching action proposal via recurrent and static
pages 502–510, Venice, Italy, October 2017.                    regression networks. IEEE Transactions on Image Process-
                                                               ing, 27(6):2609–2622, June 2018.
He Zhang and Vishal M. Patel. Density-aware single image
de-raining using a multi-stream dense network. In CVPR,        Hongyuan Zhu, Xi Peng, Vijay Chandrasekhar, Liyuan Li,
2018.                                                          and Joo-Hwee Lim. Dehazegan: When image dehazing
                                                               meets differential programming. In Proceedings of the
Xiaopeng Zhang, Hao Li, Yingyi Qi, Wee Kheng Leow, and         Twenty-Seventh International Joint Conference on Artificial
Teck Khim Ng. Rain removal in video by combining tem-          Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Swe-
poral and chromatic properties. In ICME, 2006.                 den., pages 1234–1240, 2018.
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei
Huang, Xiaogang Wang, and Dimitris N Metaxas. Stack-
GAN - Text to Photo-realistic Image Synthesis with Stacked
Generative Adversarial Networks. In IEEE International
Conference on Computer Vision, Venice, Italy, Oct. 2017.
He Zhang, Vishwanath Sindagi, and Vishal M. Patel. Image
de-raining using a conditional generative adversarial net-
work. CoRR, abs/1701.05957, 2017.
Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and
Lei Zhang. Beyond a gaussian denoiser: Residual learning
of deep CNN for image denoising. IEEE Trans. Image Pro-
cessing, 26(7):3142–3155, 2017.
Junbo Zhao, Michael Mathieu, and Yann LeCun. Energy-
based Generative Adversarial Network. In International
Conference on Learning Representations, Toulon, France,
Apr 2017.
Joey Tianyi Zhou, Kai Di, Jiawei Du, Xi Peng, Hao Yang,
Sinno Jialin Pan, Ivor Tsang, Yong Liu, Zheng Qin, and Rick
Siow Mong Goh. Sc2net: Sparse lstms for sparse coding. In
AAAI, 2018.
Hongyuan Zhu, Shijian Lu, Jianfei Cai, and Guangqing Lee.
Diagnosing state-of-the-art object proposal methods. In Pro-
ceedings of the British Machine Vision Conference 2015,
BMVC 2015, Swansea, UK, September 7-10, 2015, pages
11.1–11.12, 2015.
Hongyuan Zhu, Jiangbo Lu, Jianfei Cai, Jianmin Zheng,
Shijian Lu, and Nadia Magnenat-Thalmann. Multiple hu-
man identification and cosegmentation: A human-oriented
CRF approach with poselets. IEEE Trans. Multimedia,
18(8):1516–1530, 2016.
Hongyuan Zhu, Jean-Baptiste Weibel, and Shijian Lu. Dis-
criminative multi-modal feature fusion for RGBD indoor
scene recognition. In 2016 IEEE Conference on Computer
Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV,
USA, June 27-30, 2016, pages 2969–2976, 2016.
Singe Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective - Vijay ... Singe Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective - Vijay ...
You can also read