Singe Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective - Vijay ...

Page created by Lisa Williams

Shopping

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Singe Image Rain Removal with Unpaired Information:
A Differentiable Programming Perspective

Hongyuan Zhu1 , Vijay Chanderasekh1 , Liyuan Li1 Joo-Hwee Lim1
1
Institute for Infocomm Research, A*STAR, Singapore,
{zhuh, vijay, lyli, joo-hwee}@i2r.a-star.edu.sg

Abstract
Single image rain-streak removal is an extremely challenging
problem due to the presence of non-uniform rain densities
in images. Previous works solve this problem using various
hand-designed priors or by explicitly mapping synthetic rain
to paired clean image in a supervised way. In practice, how- (a) Input (b) Ground truth (c) Our result
ever,the pre-defined priors are easily violated and the paired Figure 1: A visual illustration of single image deraining. The
training data are hard to collect. To overcome these limi- target is to recover a clean image from the input rain image.
tations, in this work, we propose RainRemoval-GAN (RR- Our method produces a recovered image with faithful color
GAN), the first end-to-end adversarial model that generates and structure without using any paired training data.
realistic rain-free images using only unpaired supervision.
Our approach alleviates the paired training constraints by in- where X and R denote the desired clean image and the rain
troducing a physical-model which explicitly learns a recov-
ered images and corresponding rain-streaks from the differen-
streaks, respectively. Single image rain-streak removal aims
tiable programming perspective. The proposed network con- to estimate the hidden X and R from a given I, which is
sists of a novel multiscale attention memory generator and a an unconstrained problem because we need to estimate two
novel multiscale deeply supervised discriminator. The multi- unknown variables and there are infinitely solutions if no
scale attention memory generator uses a memory with atten- further regularization are imposed.
tion mechanism to capture the latent rain streaks context at To make the rain streak removal problem trackable, exist-
different stages to recover the clean images. The deeply su- ing methods can be roughly grouped into two categories:
pervised multiscale discriminator imposes constraints at the prior-based and data-driven. The prior-based methods es-
recovered output in terms of local details and global appear-
timate the rain-streak based on various priors or assump-
ance to the clean image set. Together with the learned rain-
streaks, a reconstruction constraint is employed to ensure tions. Typical priors include but not limited to sparse-coding
the appearance consistent with the input image. Experimen- prior (Luo et al. 2015), low-rank prior (Chang et al. 2017)
tal results on public benchmark demonstrates our promising and Gaussian prior (Li et al. 2016). Despite remarkable
performance compared with nine state-of-the-art methods in progress achieved by adopting these priors, they are easily
terms of PSNR, SSIM, visual qualities and running time. violated in practice given the rain-streak does not strictly fol-
low the Gaussian or sparse distribution and the background
Introduction scenes is cluttered and contains complex illuminations.
Rain, snow, and fog are common visual artifacts which In recent, the interest has shifted to data-driven ap-
affects many vision-based applications (Zhu et al. 2017a; proaches which utilize labeled data with paired rain im-
2016a; 2018a; 2015; 2016b), such as drone-based video age and rain streak to learn a deep neural network (Fu
surveillance and self-driving cars. The performance of many et al. 2017; Yang et al. 2017b; Zhang and Patel 2018;
computer vision systems often exhibits significant drop Li et al. 2018). Given an input image, the rain streaks or
when they are presented with images that contain these ar- clean image can be regressed with given input rain im-
tifacts. Hence, it is highly practical and expected to de- age. More recently, inspired by the huge success of gener-
velop automatic artifacts removal methods (Li et al. 2017; ative adversarial networks (GAN) in image-to-image trans-
Zhang et al. 2017c). In this paper, we mainly focus on the lation tasks (Goodfellow et al. 2014; Isola et al. 2017),
problem of rain streak removal from a single image. (Zhang et al. 2017b) propose a GAN-based image derain-
A rain image I can be modeled as an image composition ing method which employs a generator to map input image
problem (Luo et al. 2015): to the ground-truth clean image. One major limitation of the
method is the necessity of paired training data and it is ex-
I =X +R (1) tremely challenging to collect paired training data given the
Copyright c 2019, Association for the Advancement of Artificial changing environment.
Intelligence (www.aaai.org). All rights reserved. However, collecting paired training data is extremely

challenging given the changing environment. One popular Related Work
solution is simulating rain in controlled environments. How-
In this section, we briefly review several recent related works
ever, most existing simulations are usually too simplified to
on single image de-raining and generative adversarial net-
depict the complexity of real-world. Another common solu-
works.
tion is using photo editing tools to add streak to clean nat-
ural images with a various rain-density levels with different
orientations and scales, but this method limits the scale of Single Image De-raining
training data. Image deraining has been a classic image restoration prob-
In fact, one could observe that it is easier to collect a large lem for years. Some early methods (Zhang et al. 2006;
number of rain/rain-free images from Internet despite these Garg and Nayar 2007; Santhaseelan and Asari 2015; Tri-
images are not in pair. Thus, it is highly excepted to develop pathi and Mukhopadhyay 2014) exploit photometric con-
derain model which can utilize unpaired supervision. Note sistency and temporal dynamics for rain removal. Although
that, CycleGAN (Zhu et al. 2017b) recently has become these methods achieve promising performance, they are not
popular to learn cross-style image translation by using bidi- applicable to the single image setting.
rectional constrains with adversarial learning. Although Cy- Unlike video-based methods, prior-based methods have
cleGAN has achieved impressive performance in style trans- been proposed to handle single image deraining problem by
lation, it is designed for style translation problem and may assuming the rain-streaks follows certain assumption. There
not preserve the appearance consistency in the translated re- are many hand-crafted priors have been proposed, e.g. sparse
sult. coding-based prior (Luo et al. 2015), low-rank prior (Chang
Based on the above observations, we propose a novel et al. 2017) and gaussian prior (Li et al. 2018), just to name a
single image deraining method called RainRemoval-GAN few. One major limitation of these methods is that the priors
(RR-GAN) which is specifically designed based on rain im- are easily violated which results in over-/under-estimation
age composition model in Eq.1. The proposed RR-GAN is of the rain-streaks (Zhang and Patel 2018).
significantly different from CycleGAN and its variants in Recently, deep learning has become popular in both high-
structure and application. More specifically, we propose a level and low-level vision tasks (Cai et al. 2016; Ren et al.
novel generator which uses an attention memory to cap- 2016; Yang et al. 2017a), several CNN-based methods have
tures the latent rain streaks contexts in a recurrent fashion also been proposed for image de-raining (Fu et al. 2017;
to recover clean image X. Moreover, we propose a novel Yang et al. 2017b; Zhang and Patel 2018; Li et al. 2018).
deeply-supervised multiscale discriminator to regularize the In these methods, the idea is to learn a mapping between
recovered image to look as realistic as possible to the rain- input rainy images and their corresponding rain-streaks us-
free images. Furthermore, these latent factors are compos- ing a CNN structure. (Yang et al. 2017b) design a deep di-
ited together as in Eq.1 to reconstruct the original rain im- lated network to joint detect and remove rain streaks. (Li et
age to preserve faithful color and structure, thus avoiding al. 2018) propose incorporate recurrent neural network to
the inefficient CycleGAN’s bi-directional consistency train- the dilaeted network of Yang et al. to preserve multi-stage
ing paradigm. contexts. (Zhang and Patel 2018) propose to use multi-scale
The major contributions of this work are summarized as densely connected network trained with additional rain den-
follows: sity classifier to predict the rain streaks. In summary, these
methods aim to learn a mapping using synthesized rain/clean
• To the best of our knowledge, this is one of first works to image pairs which are hard to collect in large scale.
marriage CycleGAN for single image deraining, so that Our methods’s generator is similar to (Zhang and Pa-
makes rain-removal training with unpaired data possible. tel 2018) as we combines coarse- and fine-scale network
The novel GAN consists of a novel generator and discrim- for feature extraction, however we introduce a novel at-
inator which is specifically designed by incorporating the tention memory network to learn the rain-streaks automati-
rain image generation model with un-paired training in- cally from data in a recurrent fashion. The attention memory
formation. network is inspired by recent work in language translation
• A novel multi-scale attention memory generator is pro- (Gehring et al. 2017) which shows that the performance of
posed with an attention memory to fuse the contexts from machine translation models could be significantly improved
coarse-scale and fine-scale densely connected network to by solely using an attention model instead of using addi-
recurrently learns the rain-streaks to recover the clean im- tional gate in recurrent networks as in (Li et al. 2018), which
age using un-paired training data. significantly improve the inference speed. Moreover, our
generator can explicitly learns two disentangled latent pa-
• A novel multiscale deeply-supervised discriminator is rameters which corresponds to clean image and rain streaks
proposed to regularize the generated recover image as re- from data without paired information, hence significantly re-
alistic as possible to the target image in terms of both low- duce the labeling efforts.
level details and high-level structures.
• Extensive experiments on public benchmark demon- Generative Adversarial Networks
strates our method’s efficiency and effectiveness in terms Recently generative adversarial networks (Goodfellow et al.
of quantitative and qualitative performance. 2014; Arjovsky et al. 2017; Zhao et al. 2017) have achieved

Multiscale Attention
Memory Generator

Gf Gf
Dense Dense Recovered
Block Block Image

…....

…....
Adversarial Regularization
Dense Dense
Block Block
Input Image Real/Fake
Real/Fake
Attention Real/Fake
Attention ….. Real/Fake
Memory Memory

Multiscale Deeply
Mask n Supervised Discriminator

Mask

Consistency Regularization Reconstructed
Image

Figure 2: The pipeline of our method. Our RR-GAN consists of a multi-scale attention memory generator and a multiscale
deeply-supervised discriminator. The generator uses un-paired rain and clean images to train an attention memory to aggregate
the latent rain streak context from different stages. The rain image together with the latent rain-streak imasks are used as an
input to the U-Net to generate the recovered image. This recovered image is regularized by a deeply-supervised multiscale
discriminator so that its appearance is as close as possible to the clean images used for training in terms of both low-level
details and global-level structures.

significant progress. The basic idea of GANs is transform- attention memory generator using multiscale densely con-
ing the white noise (or other specified prior) through a para- nected networks and attention memory in a recurrent fashion
metric model to generate candidate samples with the help of to learn discriminative rain features and latent rain streaks
a discriminator and a generator. By optimizing a minimax to recover the clean image. Furthermore, our network con-
two-player game, the generator aims to learn the training tains of a novel deeply-supervised multi-scale discriminator
data distribution, and the discriminator aims to judge that training regime, so that the local details to global image ap-
a sample comes from the training data or the generator. An pearance is enforced to look realistic to the clean image set.
image can be translated into an output by the conditional Furthermore, these latent factors are composited together as
generator, which has various applications, such as image su- in Eq.1 to reconstruct the original rain image to preserve
per resolution (Ledig et al. 2017), text2image (Zhang et al. faithful color and structure, thus avoiding the inefficient Cy-
2017a), image2image (Yi et al. 2017). However, condition- cleGAN’s bi-directional consistency training paradigm.
alGAN requires paired training images, whose ground-truth
can be very difficult to get. Differentiable Programming
This paper is inspired by the recent popular Cycle- Our work belongs to the family of differentiable program-
GAN (Zhu et al. 2017b) which can translate one image from ming which treats a program as neural network such that the
one domain to another domain without paired information. program can be parametrized, automatically differentiated
However, our work is remarkably different from CycleGAN. and optimizable. The first well-known work of differentiable
In brief, our architecture is specifically designed for image programming is the Learned ISTA (LISTA) (Gregor and Le-
deraining so that the original image’s color and structure Cun 2010), which unfolds the popular l1 solver ISTA as a
could be well preserved. To the best of our knowledge, this simple RNN such that the number of layers corresponds to
could one of the first works to develop specific single im- the iteration number and the weight corresponds to dictio-
age deraining model that incorporates un-paired adversar- nary. The LISTA’s RNN paradigm have been applied in a
ial learning. Our network is remarkably distinct from Cy- wide range of tasks, e.g. hashing (Wang et al. 2016a), clas-
cleGAN in following aspects. First, we proposes a novel sification (Wang et al. 2016b), sparse coding (Zhou et al.

2018), image dehazing (Zhu et al. 2018b) and etc. is as follows:
Different from conventional differentiable programming at = σ(W1 [Xc , Xf , Mt−1 , St−1 ] + b1 )
using RNN with difficult-to-interpret variables, our method St = at · St−1 (2)
reformulate the image degradation model using a feed-
Mt = tanh(W2 ∗ St + b2 )
forward convolutional neural network with prior knowledge.
Therefore, our formulation is more interpreatable and effi- where W1 and b1 are the weight and bias of a convolutional
cient then conventional RNN based DP solver. layer with 3 × 3 kernels of one neuron, σ denotes the sig-
1
moid function σ(x) = 1+exp(−x) . W2 and b2 are the weight
and bias of a deconvolutional layer with 3 × 3 kernels of
The Method one neuron, which are used to output the rain-streak mask.
In Fig.2, one can observe that the rain mask is recurrently
As shown in Fig.2, our RR-GAN consists of two networks, improved by using the contexts from the previous stages.
namely, a multiscale attention memory generator (MAMG) The input image together the final attention map are con-
and a multiscale deeply-supervised discriminator (MDSD) catenated to feed into the U-Net auto-encoder for generat-
are specifically designed for single image deraining. In brief, ing the rain-free image. Our deep autoencoder includes 16
MAMG uses an attention memory to recurrently attend to convolutional (relu) blocks, which adopts skip connections
the rain-streak regions for the purpose of recovery of the to prevent blurred outputs. To show the effectiveness of our
clean image. MDSD employs a deeply supervision to re- method, Fig. 3 shows the visual results of different gener-
cover image from the generator by enforcing it look as re- ators, the one which combines attention memory and skip
alistic as possible to the clean image set. U-Net auto-encoder yielded best visual result.

Multiscale Attention Memory Network
The multiscale attention memory network consists of an at-
tentive memory network and a U-Net autoencoder with skip-
connection. The former uses a multiscale feature extractor
Gf to extract rain-relevant regions features. To achieve bet-
ter performance, Gf combines the complementary coarse-
scale and dense-scale context using recently proposed dense (a) (b)
connection (Huang et al. 2017). The fine-scale dense net-
work applies densely connected modular with small ker-
nels to capture fine-scale structures relevant to rain re-
gions with a structure of C(1, 3)-C(3, 3)-C(5, 3)-C(7, 3) ,
where C(k1 , k2 ) denotes the convolution with a filter of size
k1 × k1 , an output channel number of k2 and a stride of
1 with ReLu output (Krizhevsky et al. 2012). Each layer
are densely connected with other layers. The coarse-scale (c) (d)
branch applies a similar architecture but with larger kernels
Figure 3: A visual comparison on the effectiveness of dif-
to capture longer-range context, whose structure is of C(7,
ferent generators. (a) Input; (b) the result of using context
3)-C(9, 3)-C(11, 3)-C(1, 3). The feature maps Xc and Xf
auto-encoder alone; (c) the result of our generator without
from coarse-scale and fine-scale networks respectively, are
using attention memory; (d) the result of our multiscale at-
concatenated and fed through the attention memory network
tention memory generator.
to learn rain regions.
Visual attention models are adopted to localize relevant Multiscale Deeply-Supervised Discriminator
regions in an image so that the task-specific features are ob- The function of discriminator is to regularize the output from
tained. In this paper, we introduce visual attention to learn the generator so that the generated image looks as realistic as
where the rain-regions should be focused on. As shown in possible to the target set clean images. Recently, a shallowly-
our architecture in Fig. 2, we present an attention memory supervised discriminator with discriminator with supervi-
network which is inspired by recent convolutional sequence- sion at the last layer has become popular in recent popular
to-sequence learning task in machine translation (Gehring et gan architecrtures (e.g CycleGAN (Zhu et al. 2017c) and
al. 2017). Our attention memory network sequentially uses ConditionalGAN (Isola et al. 2017)) which achieves good
attention to replace the recurrent networks and the attention image translation performance. However, such shallowly-
map is iteratively refined given the accumulated context. At supervised discriminator overlooks the low-level details and
each time stamp t, our attention memory network concate- global structures which are useful for image enhancement
nates Xc , Xf , Mt−1 , and St−1 , where Xc is the input from tasks.
coarse-scale network, Xf is from fine-scale network, Mt−1 To overcome the above disadvantage, we propose a novel
is the rain streak mask at the previous stage t − 1, and St deeply-supervised discriminator which consists of four con-
denotes the state memory in previous stage. Note that, the volutional layers, which is with the structure of C(3, 64)-
initial M0 and S0 are set to zero. The detailed updating rule C(3, 128)-C(3, 256)-C(3, 512), where each convolutional

layer has a stride of two. Each convolutional feature map Training Details
will be passed through an instance normalization layer with
During training, a 286 × 286 image is randomly cropped
Leaky-ReLu activations and then be fed into the next con-
from the input image (or its horizontal flip) of size 256256.
volutional layer. Moreover, each side-output follows by an
Adam is used as optimization algorithm with a mini-batch
additional convolution layer of C(1, 1) with sigmoid to out-
size of 1. The learning rate starts from 0.001. The models
put the probability of each patch to the clean images. Hence
are trained for up to 10 epochs to ensure convergence. We
we will have four predictions which provide multiple level
use a weight decay of 0.0001 and a momentum of 0.9. The
supervision to provide regularization at to make the gener-
entire network is trained using the Pytorch framework. Dur-
ated image as realistic as the ground-truth image in terms of
ing training, we set γ = 1. All the parameters are defined
low-level details and high-level structures.
via cross-validation using the validation set.
Objective Function
Experiments
To stabilize the training of discriminator D, we minimize an
objective function Ld that consists of two terms LDreal and We evaluate our method on public benchmark. We quan-
LDf ake as in (Mao et al. 2017): titatively evaluate the rain-removal performance using two
commonly used metrics, including peak signal to noise ratio
Ld (G, D) =LDreal + LDf ake (PSNR) (Huynh-Thu and Ghanbari 2008) and structure sim-
K ilarity index (SSIM) (Wang et al. 2004) for evaluation. We
1 X also provide ablation study of each component proposed to
= {Ey (1 − Di (x, y))2 + (3)
2K i=1 demonstrate their effectiveness.
Ex (Di (G(x))2 } We use Rain800 (Zhang and Patel 2018) for benchmark-
ing. The Rain800 dataset contains 700 synthesized images
which aims to train the discriminator Di so that it can try to for training and 100 images for testing using randomly sam-
differentiate the fake examples G(x) from generator G and pled outdoor images.
the real clean image y, where Ex is the expectation of x and
Di∈{1,2,...,K} is the classifier at ith level side-output. Comparing Methods We compare our proposed ap-
On the other hand, when we train the generator G, we proach with 7 state-of-the-art methods, including image de-
optimize an objective function Lg which consists of a MSE composition (ID) (Kang et al. 2012), discriminative sparse
loss Lr (Ren et al. 2016) and an adversarial learning loss coding (DSC) (Luo et al. 2015), layer priors and (LP) (Li et
Ladv : al. 2016), DetailsNet (Fu et al. 2017), and joint rain detec-
tion and removal (JORDER) (Yang et al. 2017b) and RES-
Lg = Lr + γLadv (4) CAN (Li et al. 2018).
These three terms are designed for minimizing the recon-
struction error and improving the perceptual quality, re- Results on Synthetic Datasets Table 1 shows quantita-
spectively. γ is a trade-off factors set according to cross- tive results on the Rain800. One can observe that our RR-
validation. GAN considerably achieves promising results in terms of
The MSE Loss Lm encourages the network to enforce both PSNR and SSIM on Rain800.
the appearance consistency between the recovered image From Table 1, data-driven methods, especially the deep
X and input image Il using the estimated rain mask R by learning based methods (Fu et al. 2017; Yang et al. 2017b;
minimizing the discrepancy between the composite image Zhang and Patel 2018; Li et al. 2018) significantly outper-
It = X + R and the input image Il : form the prior-based methods (Kang et al. 2012; Luo et al.
2015; Li et al. 2016). The comparison demonstrates that the
W X H X C features learned by the deep neural networks is much helpful
1 X
Lr = kI i,j,c − Ili,j,c k2 (5) for the rain-removal task.
C × W × H i=1 j=1 c=1 t
On the one hand, the proposed RR-GAN is trained with
unpaired training data, which outperform the JORDER
The W , H, and C are the width, height, and channel number
method on the Rain800 dataset. Note that, JORDER is
of the input image It .
trained using the paired supervised data. In other words, it
The Adversarial Loss Ladv encourages the generator G utilizes a stronger supervisor than our method does. The re-
to recover image G(x) as realistic as the ground-truth image sult demonstrates that it is feasible and promising to develop
y by assigning a real label 1 to recovered example G(x): deraining model using unpaired data which could be easily
K accessed in practice.
1 X In terms of running time, our method is also much faster
Ladv (G, D) = Ex (1 − Di (G(x)))2 (6)
K i=1 than other deep learning methods and achieve a running time
of 0.03s per image, which demonstrates our method’s real-
where K is the layer number of discriminator, Di is i-th time performance.
level classifier to classify whether an image is from the clean
image or generator G(x). The term penalizes the details of Analysis of RR-GAN To demonstrate the effectiveness of
recovered image that looks different from clean images. our generator used in RR-GAN, we compare it with other

Input               RainCNN                 DetailNet        DID-MDN                        Our                      Ground-Truth

                 Figure 4: Qualitative comparison with the state-of-the-arts on two randomly sampled Rain800.

           Dataset     R800                Time (s)                  From the table, one can observe that the best deraining
          Measure      PSNR     SSIM                              results can be obtain when both corresponding rain and rain-
             ID        18.88    0.5832       120s                 free images are used for training. However, other settings
            DSC        18.56    0.5996      189.3s                can still achieve comparable perfor- mance. This implies that
             LP        20.46    0.7297      674.8s                it is not necessary to have paired rain and rain-free scenes
          DetailNet    21.16    0.7320       0.3s                 during training (see setting 2).
          JORDER       22.24    0.7763       1.5s
          RESCAN       24.09    0.8410       0.4s                                           Pair(P)/
                                                                             Setting                           PSNR          SSIM
          RR-GAN       23.51    0.7566       0.03s                                         UnPair(U)
                                                                                     1         P                23.79        0.7951
Table 1: Quantitative experiments on Rain800. Best results                           2         U                23.51        0.7566
are marked in bold and the second best results are under-
lined.
                                                                  Analysis of Loss Function To better demonstrate the ef-
                                                                  fectiveness of our objective function, we conduct an abla-
network architectures. The first baseline is dense connec-        tion study by considering the combinations of the proposed
tion. To be specific, we replace the dense block with resid-      MSE loss Lr and the adversarial loss Ladv . Figure 5 and Ta-
ual block (ResNet) and fix other blocks in our model. One         ble 3 demonstrate qualitative and quantitative results on an
could find that the performance was dropped significantly         sample image, respectively. One can observe that by using
from 23.51/0.7566 (PSNR/SSIM) to 22.82/0.6991.                    MSE loss alone without regularization from the adversarial
   Besides the ablation study in generator, we also test the      loss, the quality of the recover image is very poor due to that
effectiveness of our deeply supervised multiscale discrim-        there is not sufficient infomation to allow generator to learn
inator. To the end, we adopt a new discriminator which            what makes the clean image. Using adversarial loss alone
only enforces the supervision at the last layer (shallowly-       can significantly improve the visual quality as the informa-
supervised discriminator). One could see that the perfor-         tion from clean image set help the generator lean use cues
mance is dropped to 21.43/0.7254.                                 to recover clean image, however the tone recovered image is
                                                                  bit yellowish as there is no consistency constraint between
  Component              Baseline          PSNR        SSIM       the recover image and the input image. By combining these
   Generator             ResNet            22.82       0.6991     two terms together, our method can produce recovered im-
 Discriminator     Shallow-supervision     21.43       0.7254     ages with realistic color and structures.
 Our Method             RR-GAN             23.51       0.7566

            Table 2: Ablation study on RR-GAN.

Analysis of Training Paradigm We analyze how the
presence of corresponding rain/rain-free scenes in the train-
ing samples can aeffect the model training. We first ran-                                (a) Input                  (b) Lr

domly split the Rain100H dataset into two halves (split 1 and
2). We use different combinations of the images for model
training, and then test the model on the rain images in split
2. Specifically, we train under 2 settings: the rain and clean
image are paired from split1 (setting 1), the clean images are
randomly drawn from split 1 (setting 2), whose testing per-               (c) Ladv                    (d) Lr+Ladv               (e) GroundTruth
formance on split 2’s rain images can be observed in Table..
                                                                         Figure 5: Qualitative studies on different loss.

Karol Gregor and Yann LeCun. Learning fast approxima-
Table 3: Quantitative studies on different losses. tions of sparse coding. In ICML, pages 399–406, USA,
Metrics Lr Ladv Lr + Ladv 2010.
PSNR 17.08 21.33 23.51
SSIM 0.3760 0.7328 0.7566 Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kil-
ian Q. Weinberger. Densely connected convolutional net-
works. In CVPR, 2017.
Conclusion Q. Huynh-Thu and M. Ghanbari. Scope of validity of
psnr in image/video quality assessment. Electronics Letters,
This paper proposed a novel RR-GAN for end-to-end single
44(13):800–801, June 2008.
image deraining without using paired training data. The pro-
posed method includes a novel multiscale attention memory Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A.
network and a novel deeply supervied multiscale discrim- Efros. Image-to-image translation with conditional adver-
inator. The attention memory network uses a state mem- sarial networks. In CVPR, pages 5967–5976, 2017.
ory to learn a latent rain-streak mask in a recurrent fash- Li-Wei Kang, Chia-Wen Lin, and Yu-Hsiang Fu. Automatic
ion to aggreate the rain context information from diffrent single-image-based rain streaks removal via image decom-
stages. Together with the input image, the generator will try position. IEEE TIP, 21(4):1742–1755, 2012.
to generate the recovered image by iusing un-paired rain and Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton.
clean images training data. The proposed deeply supervised Imagenet classification with deep convolutional neural net-
multiscale discriminator can effectively regularize the out- works. In NIPS, pages 1106–1114, 2012.
put from the generator to look as realistic as possible to the
clean image sets in terms of local details and global struc- C. Ledig, L. Theis, F. Huszr, J. Caballero, A. Cunning-
tures. Extensive experiments have demonstrated the promis- ham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and
ing performance of our method in terms of PSNR, SSIM, W. Shi. Photo-realistic single image super-resolution using
running time and visual quality. a generative adversarial network. In IEEE Conference on
Computer Vision and Pattern Recognition, pages 105–114,
Honolulu, HI, Jul. 2017.
Acknowledgement
Yu Li, Robby T. Tan, Xiaojie Guo, Jiangbo Lu, and
This was funded by the Deep learning 2.0. program at
Michael S. Brown. Rain streak removal using layer priors.
I2R, A*STAR. The work of X. Peng was supported by the
In CVPR, 2016.
Fundamental Research Funds for the Central Universities
under Grant YJ201748, by NFSC under Grant 61806135, Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, and
61432012 and Grant U1435213, and by the Fund of Sichuan Dan Feng. An all-in-one network for dehazing and beyond.
UniversityTomorrow Advancing Life. CoRR, abs/1707.06543, 2017.
Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, and Hongbin
References Zha. Recurrent squeeze-and-excitation context aggregation
net for single image deraining. In ECCV, 2018.
Martin Arjovsky, Soumith Chintala, and Léon Bottou.
Wasserstein GAN. In International Conference on Machine Yu Luo, Yong Xu, and Hui Ji. Removing rain from a single
Learning, Jan. 2017. image via discriminative sparse coding. In ICCV, 2015.
B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao. Dehazenet: Xudong Mao, Qing Li, Haoran Xie, Raymond Y. K. Lau,
An end-to-end system for single image haze removal. IEEE Zhen Wang, and Stephen Paul Smolley. Least squares gen-
Transactions on Image Processing, 25(11):5187–5198, Nov. erative adversarial networks. In ICCV, 2017.
2016. Wenqi Ren, Si Liu, Hua Zhang, Jinshan Pan, Xiaochun Cao,
Yi Chang, Luxin Yan, and Sheng Zhong. Transformed low- and Ming-Hsuan Yang. Single image dehazing via multi-
rank model for line pattern noise removal. In ICCV, 2017. scale convolutional neural networks. In Bastian Leibe, Jiri
Xueyang Fu, Jiabin Huang, Delu Zeng, Yue Huang, Xinghao Matas, Nicu Sebe, and Max Welling, editors, ECCV, 2016.
Ding, and John Paisley. Removing rain from single images Varun Santhaseelan and Vijayan K. Asari. Utilizing lo-
via a deep detail network. In CVPR, 2017. cal phase information to remove rain from video. IJCV,
Kshitiz Garg and Shree K. Nayar. Vision and rain. IJCV, 112(1):71–89, 2015.
75(1):3–27, 2007. Abhishek Kumar Tripathi and Sudipta Mukhopadhyay. Re-
Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, moval of rain from videos: a review. Signal, Image and
and Yann N. Dauphin. Convolutional sequence to sequence Video Processing, 8(8):1421–1430, 2014.
learning. In ICML, 2017. Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simon-
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing celli. Image quality assessment: from error visibility to
Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, structural similarity. IEEE Transactions on Image Process-
and Yoshua Bengio. Generative Adversarial Nets. In Ad- ing, 13(4):600–612, April 2004.
vances in Neural Information Processing Systems, pages Zhangyang Wang, Qing Ling, and Thomas S. Huang. Learn-
2672–2680, Montreal, Canada, Dec. 2014. ing deep l0 encoders. In AAAI, pages 2194–2200, 2016.

Zhangyang Wang, Yingzhen Yang, Shiyu Chang, Qing Ling,         Hongyuan Zhu, Romain Vial, and Shijian Lu. TOR-
and Thomas S. Huang. Learning A deep l∞ encoder for            NADO: A spatio-temporal convolutional regression network
hashing. In IJCAI, pages 2174–2180, 2016.                      for video action proposal. In IEEE International Conference
Hui Yang, Jinshan Pan, Qiong Yan, Wenxiu Sun, Jimmy            on Computer Vision, ICCV 2017, Venice, Italy, October 22-
Ren, and Yu-Wing Tai. Image dehazing using bilinear com-       29, 2017, pages 5814–5822, 2017.
position loss function. arXiv preprint arXiv:1710.00279,       Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A.
2017.                                                          Efros. Unpaired image-to-image translation using cycle-
Wenhan Yang, Robby T. Tan, Jiashi Feng, Jiaying Liu,           consistent adversarial networks. In ICCV, 2017.
Zongming Guo, and Shuicheng Yan. Deep joint rain de-           Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A.
tection and removal from a single image. In CVPR, 2017.        Efros. Unpaired image-to-image translation using cycle-
                                                               consistent adversarial networks. In ICCV, 2017.
Zili Yi, Hao Zhang, and Ping Tan Minglun Gong. DualGAN:
Unsupervised Dual Learning for Image-to-Image Transla-         H. Zhu, R. Vial, S. Lu, X. Peng, H. Fu, Y. Tian, and X. Cao.
tion. In IEEE International Conference on Computer Vision,     Yotube: Searching action proposal via recurrent and static
pages 502–510, Venice, Italy, October 2017.                    regression networks. IEEE Transactions on Image Process-
                                                               ing, 27(6):2609–2622, June 2018.
He Zhang and Vishal M. Patel. Density-aware single image
de-raining using a multi-stream dense network. In CVPR,        Hongyuan Zhu, Xi Peng, Vijay Chandrasekhar, Liyuan Li,
2018.                                                          and Joo-Hwee Lim. Dehazegan: When image dehazing
                                                               meets differential programming. In Proceedings of the
Xiaopeng Zhang, Hao Li, Yingyi Qi, Wee Kheng Leow, and         Twenty-Seventh International Joint Conference on Artificial
Teck Khim Ng. Rain removal in video by combining tem-          Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Swe-
poral and chromatic properties. In ICME, 2006.                 den., pages 1234–1240, 2018.
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei
Huang, Xiaogang Wang, and Dimitris N Metaxas. Stack-
GAN - Text to Photo-realistic Image Synthesis with Stacked
Generative Adversarial Networks. In IEEE International
Conference on Computer Vision, Venice, Italy, Oct. 2017.
He Zhang, Vishwanath Sindagi, and Vishal M. Patel. Image
de-raining using a conditional generative adversarial net-
work. CoRR, abs/1701.05957, 2017.
Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and
Lei Zhang. Beyond a gaussian denoiser: Residual learning
of deep CNN for image denoising. IEEE Trans. Image Pro-
cessing, 26(7):3142–3155, 2017.
Junbo Zhao, Michael Mathieu, and Yann LeCun. Energy-
based Generative Adversarial Network. In International
Conference on Learning Representations, Toulon, France,
Apr 2017.
Joey Tianyi Zhou, Kai Di, Jiawei Du, Xi Peng, Hao Yang,
Sinno Jialin Pan, Ivor Tsang, Yong Liu, Zheng Qin, and Rick
Siow Mong Goh. Sc2net: Sparse lstms for sparse coding. In
AAAI, 2018.
Hongyuan Zhu, Shijian Lu, Jianfei Cai, and Guangqing Lee.
Diagnosing state-of-the-art object proposal methods. In Pro-
ceedings of the British Machine Vision Conference 2015,
BMVC 2015, Swansea, UK, September 7-10, 2015, pages
11.1–11.12, 2015.
Hongyuan Zhu, Jiangbo Lu, Jianfei Cai, Jianmin Zheng,
Shijian Lu, and Nadia Magnenat-Thalmann. Multiple hu-
man identification and cosegmentation: A human-oriented
CRF approach with poselets. IEEE Trans. Multimedia,
18(8):1516–1530, 2016.
Hongyuan Zhu, Jean-Baptiste Weibel, and Shijian Lu. Dis-
criminative multi-modal feature fusion for RGBD indoor
scene recognition. In 2016 IEEE Conference on Computer
Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV,
USA, June 27-30, 2016, pages 2969–2976, 2016.