sensors
Article
Pixel-Wise Crack Detection Using Deep Local Pattern
Predictor for Robot Application
Yundong Li 1, Hongguang Li 2,* and Hongren Wang 3
 1   School of Electronic and Information Engineering, North China University of Technology,
     Beijing 100144, China; liyundong@ncut.edu.cn
 2   Unmanned Systems Research Institute, Beihang University, Beijing 100191, China
 3   School of Electronic and Information Engineering, Beihang University, Beijing 100191, China;
     wanghongren616@hotmail.com
 *   Correspondence: lihongguang@buaa.edu.cn

 Received: 14 July 2018; Accepted: 7 September 2018; Published: 11 September 2018
 Sensors 2018, 18, 3042; doi:10.3390/s18093042

 Abstract: Robotic vision-based crack detection in concrete bridges is an essential task for preserving
 these assets and their safety. The conventional human visual inspection method is time-consuming
 and cost-inefficient. In this paper, we propose a robust algorithm to detect cracks in a pixel-wise
 manner from real concrete surface images. In practice, crack detection remains challenging in the
 following respects: (1) detection performance is disturbed by noise and environmental clutter;
 and (2) the required pixel-wise accuracy is difficult to achieve. To address these limitations, three steps
 are taken in the proposed scheme. First, a local pattern predictor (LPP) is constructed using
 convolutional neural networks (CNN), which can extract discriminative features of images. Second,
 each pixel is efficiently classified into the crack or non-crack category by the LPP, using as context
 a patch centered on the pixel. Lastly, the output of the CNN (i.e., the confidence map) is post-processed
 to obtain the crack areas. We evaluate the proposed algorithm on samples captured from several
 concrete bridges. The experimental results demonstrate the good performance of the proposed method.

 Keywords: local pattern predictor; crack detection; bridge inspection; convolutional neural networks;
 robotic vision

1. Introduction
     In robot applications, bridge inspection is essential for setting up strategies of bridge repair
and rehabilitation, especially when maintenance resources are limited. Cracks are the most important
indicators of the health condition of concrete structures. It is reported that more than 100,000 bridges
suffer from cracking in the United States [1]. Therefore, the evaluation of cracking plays a vital role in
bridge inspection. Traditional crack detection relies on human visual inspection, which is
time-consuming and labor-intensive when examining a large-span bridge. To address these limitations,
computer-vision based methods are a promising way to enhance human visual inspection or to be
combined with mechanical and aerospace platforms, such as aerial robots [2], to conduct robotic
crack inspection.
     Conventional computer-vision based methods for crack detection proceed in a two-phase fashion:
feature extraction and feature classification. The key issue lies in designing distinguishing features.
Basically, feature design principles may be divided into two groups: handcrafted methods and learned
methods. In the past decades, handcrafted features, such as histograms of oriented gradients (HOG) [3]
and the scale-invariant feature transform (SIFT) [4], have achieved great success in face recognition,
pedestrian detection, vision inspection, and so on. However, these remain rather rudimentary forms
of visual inspection with several disadvantages, including limited accuracy and a narrow field of view
over the whole bridge deck. Additionally, it is a dangerous task for humans to conduct visual crack
inspection, especially with passing traffic [5]. As a result, a robotic system in which robots are combined
with visual detection methods to implement crack assessment has large development prospects.
Despite being widely used, the existing robotic systems still employ the manual approaches mentioned
above for feature analysis, which leaves the problem of low accuracy unsolved. Recently, it has been
demonstrated that the hierarchical features extracted with deep learning can outperform handcrafted
features in most cases [6]. Crack detection based on deep learning has therefore become a novel and
promising research direction.

2. Related Works
      The combination of robotic systems with visual techniques is no longer a new topic in the field of
computer vision, spanning visual perception, object recognition, detection, and tracking. A human-robot
interactive framework was proposed to endow robots with the ability to localize a person by processing
visual and audio data [7], so that the robot can direct its attention to the person with whom it is
interacting. Robotic applications are also extensively developed for 3D vision problems. Amatya et al.
developed a method for locating shaking positions based on objects' pixel locations in the images [8].
An enhanced CHT [9] was employed to estimate the trajectory of a spherical target in three dimensions
and improve tracking accuracy. For robotic navigation in unstructured environments, a method based
on ToF cameras [10] was provided for 3D obstacle detection and classification. Meanwhile, in the
area of industrial and daily applications [11], robot mechanisms present a promising prospect.
A real-time collision detection algorithm for marine robotic manipulation was presented as a useful
pilot-assisting tool for subsea intervention operations [12]. In addition, Leite et al. proposed a new
skill-based architecture for MR, which is used to overcome the difficulties in current autonomous robot
design [13] and to improve robotic performance.
      Many crack detection algorithms have been developed before now. Koch et al. summarized the
computer-vision based methods, and roughly categorized them into pre-processing, feature-based,
model-based, pattern-based, and 3D reconstruction types [14]. Herein, we divide the existing
algorithms into non-learning-based and learning-based from the perspective of feature representation
and classification methods.
      Most of the non-learning-based methods employ image processing technologies, such as noise
filtering, edge detection, binary thresholding, and morphological operations [15–19]. Adhikari et al.
proposed a crack detection method by combining histogram enhancement, median filtering,
and thresholding [20]. Images are enhanced and smoothed by histogram equalization and median filter
first, then the candidate cracks are obtained by subtracting smoothed images from original images.
Dinh et al. proposed a histogram thresholding-based method [21]. Moving average filters were
employed to remove noise from images, then images were segmented with threshold determined
using histogram peak and valley information. However, this method can only be used in situations
of high-contrast images. Lim et al. presented an automatic inspection system using a mobile robot
equipped with cameras [22]. A Laplacian of Gaussian (LoG) algorithm was utilized in their
scheme. Yamaguchi et al. proposed a crack detection method based on percolation image processing
technique [23]. In 2014, Nguyen et al. proposed a two-stage detection method [24]. Images were
enhanced by phase symmetry-based filters first, then binary images were obtained using thresholding
and morphological operations. In the second stage, the center line of a crack was fitted by cubic
splines, and crack edges were determined by pixel intensity perpendicular to the splines. In 2015,
Kapela et al. proposed a HOG-based crack detection scheme used for pavement inspection [25].
Although there are many studies, the bulk of non-learning-based approaches use only low-level image
processing techniques, which easily fail to separate cracks from complicated backgrounds because
they are noise sensitive, and the surface images often suffer from dirt, shadows, and other factors
causing noise [1]. Furthermore, the binary segmentation with a fixed threshold in non-learning-based
methods lacks flexibility with respect to variations of the environment.
     Compared with non-learning methods, learning-based approaches learn patterns from surface
images and use patterns instead of pixels to predict cracks, which can alleviate negative effects of noise.
The potentials of machine learning for crack detection—such as support vector machines (SVM) [26],
neural networks [27], and principal component analysis (PCA)—were explored by researchers [28].
Lattanzi et al. presented a crack detection method using the Canny operator and K-means clustering
algorithm [29]. In 2015, Bu et al. proposed an automatic crack detection scheme [30] where wavelet
features were firstly extracted using a sliding window texture analysis technique, then features
were classified by SVM. Prasanna et al. presented an automatic crack detection algorithm, called
spatially tuned robust multifeature (STRUM) classifier [1]. In their scheme, images were tiled as
patches, and cracks of each patch were approximated by curve fitting using random sample consensus
(RANSAC) algorithm. To eliminate parameter adjustment, several popular classifiers—such as SVM,
AdaBoost, and random forest—were used to remove false fitting results. In 2017, Chaudhury et al.
proposed a spatial-temporal non-linear filtering method combined with conditional random fields
(CRF) [31]. Experimental results showed that their method can obtain high accuracy. In 2017, Li et al.
developed a region-based active contour model combined with the Canny operator for concrete crack
segmentation, and SVM was used to eliminate noise [32]. Sparse autoencoder (SAE) and tensor voting
were introduced in a pavement crack inspection by Qian et al. [33]. Features of potential patches
were extracted by SAE, and then classified by a softmax classifier. In 2016, Schmugge et al. proposed
a deep neural network-based method for crack detection in nuclear power plants [34]. Both training
and testing images were divided into patches of 224 × 224 pixels, and a GoogLeNet was used to classify
each patch into crack or non-crack objects. The methods in [33,34] can only obtain rather coarse
detection results because they are block-wise and cannot locate cracks at the pixel level.
     As discussed above, on the basis of the existing research, crack detection algorithms still face two
challenges: achieving strong adaptability to noisy environments and meeting the requirement of high
pixel-wise accuracy.

3. Work in This Paper
      In this study, a completely different solution for crack detection is explored. The idea was
motivated by the fact that adjacent pixels are highly spatially correlated. Whether a pixel belongs to
a crack depends upon its surroundings. Basically, pixels of cracks are located on regular geometrical
lines, while pixels of noise are not. Therefore, this distribution pattern could facilitate crack
identification. The crucial point is how to capture this distribution pattern. In our scheme, a CNN is
employed to extract an abstract representation of each pixel's pattern, and each pixel is then classified
as crack or non-crack using these patterns as inputs to the CNN classifier. Experiments show that the
patterns extracted by the CNN are robust to noise and to variations of the environment.
      This research has three main contributions. (1) We provide a local pattern predictor (LPP)
which reveals an underlying mapping between a pixel's probability of belonging to a crack and its
context. (2) A CNN is designed to learn this mapping function of the LPP in our scheme, and it is used
to predict the probability of each pixel according to its context. To the authors' best knowledge,
this represents a first attempt to use deep learning to forge an LPP and detect cracks in a pixel-wise
way. (3) A typical database of concrete crack images is collected and made publicly available to fill
a gap in the databases available for crack detection. The image samples are available at
https://github.com/VivianWang0616/CrackDetectDLPP/tree/master.

4. Proposed Method

4.1. Scheme of Proposed Method
     In this paper, we propose a pixel-wise method leveraging the pattern of each pixel. In other
words, the probability that each pixel belongs to a crack is predicted by a local pattern predictor
according to its pattern. The local pattern predictor is constructed based on a CNN. It classifies each
pixel as crack or non-crack, using as context a patch centered on that pixel.

     The overall diagram of the method is shown in Figure 1. In the scheme, a seven-layer CNN is
customized to facilitate the crack classification task. It is trained on a dataset collected from different
bridges. Subsequently, the well-trained CNN is used as a basis to construct a local pattern predictor.
An 18 × 18 patch centered on each pixel, tiled from the concrete surface images, is fed into the local
pattern predictor. Thus, the confidence map of the test image is obtained from the output of the LPP.
Finally, the confidence map is post-processed to locate crack areas.

                              Figure 1. Overview of the proposed algorithm.
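
The patch-to-confidence-map step of Figure 1 can be sketched in a few lines of Python. This is a minimal
illustration only: it assumes a trained classifier lpp that maps a batch of 1 × 18 × 18 grayscale patches to
two class scores, and the function name, the batching, and the reflection padding at the image border are
our own choices rather than details given in the paper.

    import numpy as np
    import torch

    PATCH = 18          # context size used in the paper
    HALF = PATCH // 2   # pad the image so every pixel has a full 18 x 18 context

    def confidence_map(gray, lpp, batch_size=1024, device="cpu"):
        """Slide an 18 x 18 window over every pixel and collect crack probabilities."""
        h, w = gray.shape
        padded = np.pad(gray.astype(np.float32), HALF, mode="reflect")
        coords = [(r, c) for r in range(h) for c in range(w)]
        probs = np.zeros(h * w, dtype=np.float32)

        lpp.eval()
        with torch.no_grad():
            for start in range(0, len(coords), batch_size):
                chunk = coords[start:start + batch_size]
                patches = np.stack([padded[r:r + PATCH, c:c + PATCH] for r, c in chunk])
                x = torch.from_numpy(patches).unsqueeze(1).to(device)   # (B, 1, 18, 18)
                scores = lpp(x)
                probs[start:start + len(chunk)] = torch.softmax(scores, dim=1)[:, 1].cpu().numpy()

        return probs.reshape(h, w)

The resulting 2D array is the confidence map that the post-processing step of Section 4.4 thresholds at 0.5.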

4.2. Local Pattern Predictor

     As mentioned before, learning-based methods always divide test images into small patches,
and features extracted from the patches are compared with those of reference images to identify the
patches which contain cracks. These kinds of methods can only label the blocks that contain cracks,
but cannot locate the crack pixels accurately. In this paper, we take a different approach, by predicting
the probability of each pixel according to its context, which is called the local pattern predictor (LPP).
     The context of a pixel is defined as a rectangular area centered at this pixel, with width w and
height h. Whether a pixel belongs to a cracked area is related to its context because the concrete surface
image has a strong 2D local structure. Spatially adjacent pixels are always highly correlated.
The grayscale values of the pixels in the centering rectangle are arranged to construct a local pattern.
Let q_i be the vector of the local pattern of the central i-th pixel, and p_i be the probability of the
central i-th pixel belonging to a cracked area; then the mapping between q_i and p_i can be expressed as

                                          f(q_i) = p_i                                              (1)
     The key point is to find a non-linear mapping to approximate the relationship presented by
Equation (1). In this paper, a CNN is trained to learn this mapping from a large set of samples. Studies
have shown that the response of the human brain cortex to external visual stimulus is layered. The
cortex responds to line information, such as the edges of objects, and then perceives shape information
and the objects themselves. Feature representation of a CNN is abstracted layer by layer, which
conforms to the learning process of the human brain. It also has a capability of shift, scale, and
distortion invariance. These merits make CNN well-suited to be a basis of the local pattern predictor
and to deal with degraded image quality affected by noise, variation of illumination, angles, and
positions of cameras.
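
For concreteness, the learning problem can be written down as follows. This is a standard formulation,
assuming the usual softmax cross-entropy objective; it is consistent with the softmax output layer
described in Section 4.3 but is not stated in this form in the paper.

    % CNN f_theta maps a local pattern q_i to a crack probability p_i = f_theta(q_i).
    % With labels y_i in {0, 1} (1 = crack) over N training patches, the weights theta
    % are chosen by minimizing the cross-entropy loss:
    \theta^{*} = \arg\min_{\theta}\; -\frac{1}{N}\sum_{i=1}^{N}
        \Big[ y_i \log f_{\theta}(q_i) + (1 - y_i)\log\big(1 - f_{\theta}(q_i)\big) \Big]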
4.3. Convolutional Neural Networks for LPP

     Since Hinton [35,36] proposed a greedy layer-wise pre-training algorithm to initialize the weights
of deep architectures, artificial neural networks have been revived. Deep neural networks have become
a popular topic and have advanced in image classification, object tracking [37] and recognition [38],
gesture recognition [39], action recognition [40], defect inspection [41–44], voice recognition, natural
language understanding, etc. Popular deep learning frameworks include stacked autoencoders,
convolutional neural networks, and restricted Boltzmann machines.
     In the proposed method, a seven-layer CNN based on LeNet-5 [45] was designed (Figure 2).
The input is an 18 × 18 image, corresponding to the size of the patch centered on each pixel. Layer C1
is a convolutional layer with six feature maps. Each point of the C1 feature maps is connected to
a 3 × 3 neighborhood in the input image by a 3 × 3 convolutional filter. Consequently, the size of each
map of layer C1 is 16 × 16. There are 60 trainable parameters and 15,360 connections in layer C1.

                    Figure 2. Architecture of the CNN used in the proposed method.
     Layer S1 is a sub-sampling layer with six feature maps. The size of each map of layer S1 is 8 × 8
because each unit in S1 corresponds to a 2 × 2 neighborhood in layer C1 through a max-pooling
operation, which selects the maximum within the 2 × 2 neighborhood. Layer C2 is a convolutional
layer with 12 feature maps. Each unit of the C2 feature maps is connected to a 3 × 3 neighborhood in
layer S1 by a 3 × 3 convolutional filter. The size of each map of layer C2 is 6 × 6. There are 120 trainable
parameters and 25,920 connections in layer C2. Layer S2 is also a sub-sampling layer, with 12 feature
maps. The size of each map of layer S2 is 3 × 3 because each unit in S2 corresponds to a 2 × 2
neighborhood in layer C2 using the same max-pooling technique. Layer C3 is again a convolutional
layer, with 54 feature maps. Since the size of each map in layer S2 is 3 × 3, the size of each feature map
of layer C3 is 1 × 1 after a 3 × 3 convolution. Layer C3 can therefore be treated as a fully connected
layer with 54 hidden units. The output of layer C3 is a 1 × 54 vector, which is called the feature vector
of the input image extracted by the CNN. There are 540 trainable parameters and 6480 connections in
layer C3. The last layer is a softmax classifier consisting of two units, which divides each feature vector
into the crack or non-crack category.
     To train the seven-layer CNN, a training dataset comprising 326,000 samples was collected from
45 images of different bridges; it includes 56,000 positive samples and 270,000 negative samples.
The training parameters are presented in Section 5.
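
A compact PyTorch sketch of the architecture described above is given below. It reproduces the stated
layer sizes (18 × 18 input, 3 × 3 convolutions, 2 × 2 max-pooling, a 54-unit feature vector, and a two-unit
softmax output), but the class name is hypothetical and the convolutions are full cross-channel ones,
so the parameter counts differ slightly from the partially connected feature maps counted in the text.

    import torch
    import torch.nn as nn

    class CrackLPPNet(nn.Module):
        """Seven-layer CNN in the spirit of the LeNet-5 variant described above."""

        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=3),    # C1: 18x18 -> 6 maps of 16x16
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),                   # S1: 16x16 -> 8x8
                nn.Conv2d(6, 12, kernel_size=3),   # C2: 8x8 -> 12 maps of 6x6
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),                   # S2: 6x6 -> 3x3
                nn.Conv2d(12, 54, kernel_size=3),  # C3: 3x3 -> 54 maps of 1x1 (feature vector)
                nn.ReLU(inplace=True),
            )
            self.classifier = nn.Linear(54, 2)     # softmax layer: non-crack vs. crack

        def forward(self, x):                      # x: (batch, 1, 18, 18)
            f = self.features(x).flatten(1)        # (batch, 54) feature vector
            return self.classifier(f)              # raw scores; apply softmax for probabilities

Calling torch.softmax(model(x), dim=1)[:, 1] on a batch of patches then yields the per-pixel crack
probabilities used to build the confidence map.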
4.4. Post-Processing

     A confidence map is generated when a test image is fed into the LPP; it contains each pixel's
probability, indicating whether the pixel belongs to crack or non-crack. Subsequently, post-processing
is conducted to segment the confidence map with a fixed threshold, and isolated noisy points are
removed to obtain the detection results. The binary threshold is simply fixed to 0.5.
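
A minimal version of this post-processing step is sketched below. The fixed threshold of 0.5 follows the
text; the use of scipy.ndimage connected-component labelling and the minimum-area cut-off are our own
assumptions, since the paper only states that isolated noisy points are removed.

    import numpy as np
    from scipy import ndimage

    def postprocess(conf_map, threshold=0.5, min_area=20):
        """Binarize the confidence map and remove isolated noisy points."""
        binary = conf_map > threshold                  # fixed binary threshold of 0.5
        labels, num = ndimage.label(binary)            # connected components of candidate crack pixels
        if num == 0:
            return binary
        sizes = ndimage.sum(binary, labels, range(1, num + 1))
        keep = np.zeros(num + 1, dtype=bool)
        keep[1:] = sizes >= min_area                   # drop blobs smaller than the assumed area cut-off
        return keep[labels]                            # boolean crack mask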
5. Experimental Results

5.1. Data Set
     A data set is provided using images of real bridge cracks under different imaging environments
and different bridge surfaces. Some challenging factors exist in the data set, which are illustrated in
Figure 3. Figure 3a is a clear and distinct standard crack image. In Figure 3b, cracks are weak and
disorderly. In Figure 3c, background noise is strong. In Figure 3d, the background is dirty and blurred.
In Figure 3e, the illumination is dark. In Figure 3f, the crack is wide.

     Figure 3. Examples of crack images: (a) clear and standard; (b) weak and disorderly; (c) noise;
     (d) dirty and blurred; (e) dark; (f) wide crack.
     In our study, all the images are divided into two parts: 45 images for training and 5 for testing.
Based on the 45 training images, 326,000 patches are sampled, which are used to train the seven-layer
CNN. Among these samples, 56,000 patches are positive samples which contain cracks, and the other
270,000 are negative samples chosen from the non-crack background. We will discuss how these
patches are sampled later in Section 6.
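
One straightforward way to draw such patches, assuming a binary ground-truth mask is available for
each training image, is sketched below. The helper name, the per-image quotas, and the rule that a patch
is positive when its centre pixel lies on a crack are illustrative assumptions, not the paper's exact
sampling procedure (which is discussed in Section 6).

    import numpy as np

    PATCH, HALF = 18, 9

    def sample_patches(gray, mask, n_pos, n_neg, rng=np.random.default_rng(0)):
        """Draw positive (centre pixel on a crack) and negative 18 x 18 patches.

        gray: 2-D grayscale image; mask: boolean array, True on labeled crack pixels.
        """
        padded = np.pad(gray, HALF, mode="reflect")

        def cut(rows, cols, n):
            idx = rng.choice(len(rows), size=min(n, len(rows)), replace=False)
            return np.stack([padded[r:r + PATCH, c:c + PATCH]
                             for r, c in zip(rows[idx], cols[idx])])

        pos_r, pos_c = np.nonzero(mask)      # candidate centres for positive samples
        neg_r, neg_c = np.nonzero(~mask)     # candidate centres for negative samples
        return cut(pos_r, pos_c, n_pos), cut(neg_r, neg_c, n_neg)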
5.2. Metrics
     A group of metrics including accuracy, recall, and precision were employed to quantify the
classification accuracy. The definitions of accuracy, recall, and precision are given in Equations (2)–(4),
where true positive (TP), true negative (TN), false positive (FP), and false negative (FN) are labeled in
Figure 4 [38].

                         accuracy = (TP + TN) / (TP + TN + FP + FN)                                 (2)

                         recall = TP / (TP + FN)                                                    (3)

                         precision = TP / (TP + FP)                                                 (4)

                              Figure 4. Definitions of TN, FN, TP, and FP.
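
Equations (2)–(4) translate directly into code. The sketch below assumes that the prediction and the
manually labeled ground truth are boolean masks of the same shape.

    import numpy as np

    def crack_metrics(pred, gt):
        """Pixel-wise accuracy, recall, and precision (Equations (2)-(4))."""
        tp = np.sum(pred & gt)
        tn = np.sum(~pred & ~gt)
        fp = np.sum(pred & ~gt)
        fn = np.sum(~pred & gt)
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        return accuracy, recall, precision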

5.3. Performance Comparisons
     To evaluate the performance of the proposed LPP method, we compared it with two state-of-the-art
algorithms: STRUM, proposed by Prasanna et al. [1], and the block-wise CNN method used by
Schmugge [34]. The testing code was implemented under PyTorch (version 0.3.0) and MATLAB
(version R2012b). The testing computer was configured with an Intel i7 processor running at 3.2 GHz,
64 GB of memory, and a GT730 GPU.
     The architecture of the CNN utilized in this paper is described in Section 4.3. The training
parameters are as follows: the learning rate is set to 0.001, the momentum to 0.9, the batch size to 100,
and ReLU is used as the activation function. We also implemented a CNN using the sigmoid activation
function; experimental results demonstrated that ReLU performs much better than the sigmoid function.
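
Under the settings above, a training loop might look like the following sketch. It reuses the hypothetical
CrackLPPNet module sketched in Section 4.3 and assumes the sampled patches are wrapped in a standard
DataLoader with batch_size=100; the number of epochs is arbitrary, since it is not reported in the paper.

    import torch
    import torch.nn as nn

    def train_lpp(model, loader, epochs=20, device="cpu"):
        """SGD training with the stated hyper-parameters (lr 0.001, momentum 0.9)."""
        model.to(device).train()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
        criterion = nn.CrossEntropyLoss()       # softmax + negative log-likelihood
        for _ in range(epochs):
            for patches, labels in loader:      # loader built with batch_size=100
                patches, labels = patches.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(patches), labels)
                loss.backward()
                optimizer.step()
        return model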
     Parts of the experimental results are shown in Figure 5. The first row of Figure 5 contains the original concrete surface images, which are contaminated by noise and clutter. It is difficult to separate cracks from the background using traditional image processing techniques in such scenarios. The second row shows the ground truths labeled manually. The third row shows the confidence maps predicted by LPP, and the fourth row shows the final detection results of the proposed LPP method after post-processing. Rows 5 and 6 are the results of the STRUM method. Row 5 gives the line fitting results obtained with the RANSAC algorithm. After fitting, patches containing lines are classified by SVM, AdaBoost, and random forest in the literature [1]. Only the classification results of AdaBoost are listed here, because AdaBoost was superior to SVM in our experiments. We can see that the STRUM method is effective only for the example in the left column. It is apparent that the STRUM method is more suitable for thin cracks and fails when cracks are thick. It also fails when the test image is seriously contaminated, for example, the test image in the middle column. Conversely, LPP performs well in such scenarios. The last row shows the detection results of the block-wise method using CNN. In the literature, Schmugge et al. [34] used the LeNet architecture with input images of 224 × 224 pixels. Since a 224 × 224 block is too big for crack detection, we use a smaller block of 18 × 18 pixels instead; the CNN architecture used in the block-wise method is the same as that of the LPP method. Obviously, the detection results of the block-wise CNN method are much rougher than those of the proposed LPP method.
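     To make the difference in granularity concrete, the sketch below contrasts the two inference schemes on a single-channel image. The function classify stands for any patch classifier that returns 0 (background) or 1 (crack); the names and the padding strategy are our own illustrative choices, not code from either paper.

         import numpy as np

         def blockwise_map(image, classify, block=18):
             # Block-wise baseline: every pixel in an 18 x 18 block inherits the block's label.
             h, w = image.shape
             out = np.zeros((h, w), dtype=np.uint8)
             for y in range(0, h - block + 1, block):
                 for x in range(0, w - block + 1, block):
                     out[y:y + block, x:x + block] = classify(image[y:y + block, x:x + block])
             return out

         def pixelwise_map(image, classify, patch=18):
             # LPP-style inference: each pixel is labeled from the patch centered on it.
             r = patch // 2
             padded = np.pad(image, r, mode='reflect')
             h, w = image.shape
             out = np.zeros((h, w), dtype=np.uint8)
             for y in range(h):                      # naive loop kept for clarity;
                 for x in range(w):                  # real implementations batch the patches
                     out[y, x] = classify(padded[y:y + patch, x:x + patch])
             return out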
     The locating accuracies are listed in Table 1, and the best results are marked in bold. LPP gets the
highest accuracy scores in all testing items. According to the analyses above, the LPP method appears
superior to the STRUM and block-wise CNN methods in locating accuracy.

                 Table 1. Crack locating accuracy comparison (the best results are marked in bold).

                         Test Images         Method           Acc (%)    Recall    Precision
                                       STRUM+AdaBoost          99.05      24.87      83.19
                            No. 1       Block-wise CNN         93.87      86.97      14.73
                                              LPP              99.67      84.17      73.38
                                       STRUM+AdaBoost          97.91      13.61      78.42
                           No. 2        Block-wise CNN         95.48      58.59      27.56
                                              LPP              99.52      78.83      79.86
                                       STRUM+AdaBoost          96.64      14.75      48.03
                           No. 3        Block-wise CNN         90.88      56.90      15.94
                                              LPP              99.15      83.15      49.20
                                       STRUM+AdaBoost          99.38       3.24      25.00
                           No. 4        Block-wise CNN         97.17      30.01      6.81
                                              LPP              99.90      74.78      82.89
                                       STRUM+AdaBoost          97.08       8.20      20.13
                           No. 5        Block-wise CNN         93.67      93.42      26.22
                                              LPP              99.64      89.99      95.17

      Figure 5. Crack locating results comparison. Row 1: Concrete surface images. Row 2: The ground-truths
      labeled manually. Row 3: Confidence maps of LPP method. Row 4: Results of LPP method after
      post-processing. Row 5: Line fitting results of STRUM method. Row 6: Results of STRUM method using
      AdaBoost classification. Row 7: Results of block-wise CNN method.
6. Discussion
6.1. Principles of Training Sample Selection
     The training dataset is crucial to the performance of deep neural networks, especially for visual defect detection, because there are far more background pixels than defect pixels in the images. Positive samples are collected from all the pixels in crack areas. However, selecting negative samples requires more care. In the preliminary experiments, the training dataset was composed of positive samples collected from cracks and negative samples selected uniformly from the background. The detection results of a CNN trained on such a dataset are shown in Figure 6b. It is obvious that the detected cracks are much thicker than the ground truths. The reason is that points near cracks are incorrectly categorized as cracks by the local pattern predictor. A uniform sampling method with nearest neighbors is proposed in this paper to address this issue, as shown in Figure 7: besides points sampled uniformly from the background, the points nearest to cracks are also counted as negative samples. The detection results are greatly improved when using uniform sampling with nearest neighbors, as shown in Figure 6c. The locating accuracies of the different sampling processes are listed in Table 2.
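     A minimal sketch of this sampling rule, assuming a binary ground-truth crack mask, is given below: positives are all crack pixels, while negatives combine uniformly sampled background pixels with the background pixels bordering the cracks (the nearest neighbors). The function name, the dilation radius, and the sample count are illustrative assumptions, not values from the paper.

         import numpy as np
         from scipy import ndimage

         def sample_training_pixels(crack_mask, n_uniform=1000, ring=3, rng=None):
             # Return (positive, negative) pixel coordinates used to extract training patches.
             if rng is None:
                 rng = np.random.default_rng()
             crack = crack_mask.astype(bool)
             pos = np.argwhere(crack)                              # every pixel inside crack areas
             dilated = ndimage.binary_dilation(crack, iterations=ring)
             near = np.argwhere(dilated & ~crack)                  # background pixels next to cracks
             background = np.argwhere(~dilated)                    # background far from cracks
             picked = rng.choice(len(background), size=min(n_uniform, len(background)), replace=False)
             neg = np.concatenate([background[picked], near], axis=0)
             return pos, neg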

      Figure 6. Crack-locating results comparison of different sampling processes: (a) The ground-truths
      labeled manually; (b) detection result of uniform sampling; (c) detection result of uniform sampling
      with nearest neighbors.

                            Figure 7. Illustration of training dataset sampling processes.
              Table 2. Accuracy of different sampling processes (the best results are marked in bold).

                 Test Images       Sampling Processes        Acc (%)    Recall    Precision
                                        Uniform               97.83      95.40      34.89
                    No. 1          Nearest neighbors          99.67      84.17      73.38
                                        Uniform               97.58      80.19      48.59
                    No. 2          Nearest neighbors          99.52      78.83      79.86
                                        Uniform               96.72      85.52      50.40
                    No. 3          Nearest neighbors          99.15      83.15      49.20
                                        Uniform               99.17      47.90      34.91
                    No. 4          Nearest neighbors          99.90      74.78      82.89
                                        Uniform               96.42      99.93      39.64
                    No. 5          Nearest neighbors          99.64      89.99      95.17
6.2. Performance Improvement when Dataset Is Limited
     As mentioned before, the numbers of positive and negative samples are imbalanced because most parts of concrete surface images represent background. This situation is common in visual inspection. To address the issue, we proposed a Fisher criterion based stacked denoising autoencoder (FCSDA) to enhance feature discrimination when only limited positive samples are available [38]. A preliminary study is conducted in this paper to bring the idea into CNN.

     Suppose the samples are divided into $L$ classes and each class has $m_i$ samples, $i = 1, 2, \ldots, L$. $J_{intra}$ and $J_{inter}$ are the intra-class and inter-class distances of the features, which are defined in (5) and (6), respectively.

$$ J_{intra} = \frac{1}{2}\sum_{i}^{L}\sum_{k=1}^{m_i}\left\| h_{w,b}(x) - M^{(i)} \right\|^{2} = \frac{1}{2}\sum_{i}^{L}\sum_{k=1}^{m_i}\sum_{j=1}^{s_{n_l}}\left( a_{j}^{(k,n_l)} - M_{j}^{(i)} \right)^{2} \qquad (5) $$

$$ J_{inter} = \frac{1}{2}\sum_{i=1}^{L}\sum_{j=i+1}^{L}\left\| M^{(i)} - M^{(j)} \right\|^{2} \qquad (6) $$

where $h_{w,b}(x)$ is the feature vector extracted from input $x$, $n_l$ indicates the output layer of the CNN, and $a_{j}^{(k,n_l)} = f\big(z_{j}^{(k,n_l)}\big)$ is the $j$-th element of the $n_l$-layer feature. $M^{(i)}$ is the average feature of the $i$-th class, which is defined as

$$ M^{(i)} = \frac{\sum_{k=1}^{m_i} a^{(k,n_l)}}{m_i}, \quad i = 1, 2, \ldots, L \qquad (7) $$
     Adding the Fisher criterion term to the loss function of the CNN, the loss function can be rewritten as

$$ J(w,b) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{2}\left\| h_{w,b}\big(x^{(i)}\big) - y^{(i)} \right\|^{2} + \lambda\,\frac{J_{intra}}{J_{inter}} \qquad (8) $$

where $J_{intra}/J_{inter}$ is the Fisher criterion in the feature space and $\lambda$ is a weighting ratio. Minimizing the loss function shortens the intra-class distance while increasing the inter-class distance, which makes input patches easier to classify.
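     A rough PyTorch sketch of the loss in Equation (8) is given below; it reflects our reading of Equations (5)-(8) rather than released code, and assumes the network outputs, the regression targets, and the integer class labels are available per batch. The default lam = 0.01 matches the Fisher ratio used in the experiment that follows.

         import torch

         def fisher_criterion(features, labels):
             # J_intra / J_inter computed on a batch of feature vectors (Equations (5) and (6)).
             classes = labels.unique()
             means = torch.stack([features[labels == c].mean(dim=0) for c in classes])
             j_intra = features.new_zeros(())
             for idx, c in enumerate(classes):
                 j_intra = j_intra + ((features[labels == c] - means[idx]) ** 2).sum()
             j_inter = features.new_zeros(())
             for i in range(len(classes)):
                 for j in range(i + 1, len(classes)):
                     j_inter = j_inter + ((means[i] - means[j]) ** 2).sum()
             return (0.5 * j_intra) / (0.5 * j_inter + 1e-8)

         def fisher_loss(outputs, targets, labels, lam=0.01):
             # Equation (8): mean squared data term plus lambda * J_intra / J_inter.
             data_term = 0.5 * ((outputs - targets) ** 2).sum(dim=1).mean()
             return data_term + lam * fisher_criterion(outputs, labels)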
     In this experiment, a reduced training dataset sub-sampled from the normal dataset is used to train the CNN. It includes 20,000 positive samples and 140,000 negative samples. All the other parameters are the same as those in Section 4, except the Fisher ratio $\lambda$, which is set to 0.01. The experimental comparison is listed in Table 3, where CNN refers to a normal CNN trained on the reduced dataset and Fisher-based CNN indicates that the Fisher criterion is added to the loss function of the CNN. The preliminary experiment shows that the performance of the Fisher-based CNN is superior to that of the normal CNN. Exploiting the potential of the Fisher criterion in future CNN applications is a worthwhile research direction.

               Table 3. Accuracy improved by Fisher criterion (the best results are marked in bold).

                          Test Images             Methods                Acc (%)         Recall   Precision
                                                   CNN                     99.27          49.74     81.68
                             No. 1
                                            Fisher-based CNN               99.29          53.83     80.20
                                                   CNN                     98.34          31.51     91.05
                             No. 2
                                            Fisher-based CNN               98.40          35.47     88.90
                                                   CNN                     96.84           5.79     76.92
                             No. 3
                                            Fisher-based CNN               97.02          12.79     83.33
                                                   CNN                     99.37           2.88     22.22
                             No. 4
                                            Fisher-based CNN               99.69           5.74     34.78
                                                   CNN                     99.01          93.10     72.66
                             No. 5
                                            Fisher-based CNN               99.05          93.23     73.40

6.3. Computational Time Analysis
     The training and testing times of the three methods are shown in Table 4. The testing time is measured on test image No. 1. Compared with the traditional Method 1, the training processes of the CNN-based Methods 2 and 3 are time-consuming. However, the proposed LPP requires the fewest training epochs.

                                          Table 4. Time spent comparison.

            Index            Methods            Training Epoch        Training Time (s)      Testing Time (s)
               1         STRUM AdaBoost               1000                   138                     5.1
               2          Block-wise CNN              600                   33,084                   0.2
               3                LPP                   200                   15,411                  10.7

     In application, the training time of LPP can be greatly shortened by using a GPU accelerator. Moreover, the training phase can be carried out offline. In that case, the proposed method can be employed for real-time inspection applications.

7. Conclusions
     In this paper, we propose a robotic vision-based crack detection method. A local pattern predictor, designed to detect concrete surface cracks pixel-wise, establishes the underlying mapping between a pixel's context and its probability of belonging to a crack. Based on a typical CNN, the local pattern predictor can be trained and then used to detect cracks in images. A database of concrete crack images is collected and made openly available to fill the gap in crack detection databases. To the authors' best knowledge, this represents a first attempt to use deep learning to forge an LPP and detect cracks in a pixel-wise way. In addition, training dataset selection principles are investigated and a uniform sampling method with nearest neighbors is proposed.
     The experimental results show that the proposed method is effective and robust. In the future, the authors would like to explore the potential of the Fisher criterion in CNNs, as well as integrate the proposed method into an automatic inspection system.

Author Contributions: Y.L. and H.L. wrote the program and the manuscript. H.W. revised the paper. H.L. is the
corresponding author.
Funding: This work was supported by Beijing Natural Science Foundation (4182020) and Open Fund of
State Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University
(Grant No. 17E01).
Conflicts of Interest: The authors declare no conflict of interest.

References
1.    Prasanna, P.; Dana, K.J.; Gucunski, N.; Basily, B.B.; La, H.M.; Lim, R.S.; Parvardeh, H. Automated Crack
      Detection on Concrete Bridges. IEEE Trans. Autom. Sci. Eng. 2016, 13, 591–599. [CrossRef]
2.    Molina, M.; Frau, P.; Maravall, D. A Collaborative Approach for Surface Inspection Using Aerial Robots and
      Computer Vision. Sensors 2018, 18, 893. [CrossRef] [PubMed]
3.    Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE
      Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA,
      20–25 June 2005; Volume 1, pp. 886–893.
4.    Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International
      Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 1150–1157.
5.    Lim, R.S.; La, H.M.; Shan, Z.; Sheng, W. Developing a crack inspection robot for bridge maintenance.
      In Proceedings of the IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13
      May 2011; pp. 6288–6293.
6.    Hasan, M.; Roy-Chowdhury, A.K. A Continuous Learning Framework for Activity Recognition Using Deep
      Hybrid Feature Models. IEEE Trans. Multimed. 2015, 17, 1909–1922. [CrossRef]
7.    Viciana-Abad, R.; Marfil, R.; Perez-Lorenzo, J.M.; Bandera, J.P.; Romero-Garces, A.; Reche-Lopez, P. Audio-visual
      perception system for a humanoid robotic head. Sensors 2014, 14, 9522–9545. [CrossRef] [PubMed]
8.    Amatya, S.; Karkee, M.; Zhang, Q.; Whiting, M.D. Automated Detection of Branch Shaking Locations for
      Robotic Cherry Harvesting Using Machine Vision. Robotics 2017, 6, 31. [CrossRef]
Sensors 2018, 18, 3042                                                                                            12 of 13

9.    Alzarok, H.; Fletcher, S.; Longstaff, A.P. 3D Visual Tracking of an Articulated Robot in Precision Automated
      Tasks. Sensors 2017, 17, 104. [CrossRef] [PubMed]
10.   Yu, H.; Zhu, J.; Wang, Y.; Jia, W.; Sun, M.; Tang, Y. Obstacle classification and 3D measurement in unstructured
      environments based on ToF cameras. Sensors 2014, 14, 10753–10782. [CrossRef] [PubMed]
11.   Indri, M.; Trapani, S.; Lazzero, I. Development of a Virtual Collision Sensor for Industrial Robots. Sensors
      2017, 17, 1148. [CrossRef] [PubMed]
12.   Sivcev, S.; Rossi, M.; Coleman, J.; Omerdic’, E.; Dooly, G.; Toal, D. Collision Detection for Underwater ROV
      Manipulator Systems. Sensors 2018, 18, 1117. [CrossRef] [PubMed]
13.   Leite, A.; Pinto, A.; Matos, A. A Safety Monitoring Model for a Faulty Mobile Robot. Robotics 2018, 7, 32.
      [CrossRef]
14.   Koch, C.; Georgieva, K.; Kasireddy, V.; Akinci, B.; Fieguth, P.W. A review on computer vision based defect
      detection and condition assessment of concrete and asphalt civil infrastructure. Adv. Eng. Inform. 2015, 29, 196–210.
      [CrossRef]
15.   Abdel-Qader, I.; Abudayyeh, O.; Kelly, M.E. Analysis of edge-detection techniques for crack identification in
      bridges. J. Comput. Civ. Eng. 2003, 17, 255–263. [CrossRef]
16.   Li, G.; He, S.; Ju, Y.; Du, K. Long-distance precision inspection method for bridge cracks with image processing.
      Autom. Constr. 2014, 41, 83–95. [CrossRef]
17.   Tong, X.; Guo, J.; Ling, Y.; Yin, Z. A new image-based method for concrete bridge bottom crack detection.
      In Proceedings of the 2011 International Conference on Image Analysis and Signal Processing, Wuhan,
      Hubei, China, 21–23 October 2011; pp. 568–571.
18.   Oh, J.K.; Jang, G.; Oh, S.; Lee, J.H.; Yi, B.J.; Moon, Y.S.; Lee, J.S.; Choi, Y. Bridge inspection robot system with
      machine vision. Autom. Constr. 2009, 18, 929–941. [CrossRef]
19.   Hutchinson, T.C.; Chen, Z.Q. Improved Image Analysis for Evaluating Concrete Damage. J. Comput. Civ. Eng.
      2006, 20, 210–216. [CrossRef]
20.   Adhikari, R.; Moselhi, O.; Bagchi, A. Image-based retrieval of concrete crack properties for bridge inspection.
      Autom. Constr. 2014, 39, 180–194. [CrossRef]
21.   Dinh, T.H.; Ha, Q.P.; La, H.M. Computer vision-based method for concrete crack detection. In Proceedings of the
      2016 14th International Conference on Control, Automation, Robotics and Vision (ICARCV), Phuket, Thailand,
      13–15 November 2016; pp. 1–6.
22.   Lim, R.S.; La, H.M.; Sheng, W. A Robotic Crack Inspection and Mapping System for Bridge Deck Maintenance.
      IEEE Trans. Autom. Sci. Eng. 2014, 11, 367–378. [CrossRef]
23.   Yamaguchi, T.; Hashimoto, S. Fast crack detection method for large-size concrete surface images using
      percolation-based image processing. Mach. Vis. Appl. 2010, 21, 797–809. [CrossRef]
24.   Nguyen, H.N.; Kam, T.Y.; Cheng, P.Y. An Automatic Approach for Accurate Edge Detection of Concrete
      Crack Utilizing 2D Geometric Features of Crack. Signal Process. Syst. 2014, 77, 221–240. [CrossRef]
25.   Kapela, R.; Sniatala, P.; Turkot, A.; Rybarczyk, A.; Pozarycki, A.; Rydzewski, P.; Wyczalek, M.; Bloch, A. Asphalt
      surfaced pavement cracks detection based on histograms of oriented gradients. In Proceedings of the 2015 22nd
      International Conference Mixed Design of Integrated Circuits Systems (MIXDES), Torun, Poland, 25–27 June 2015;
      pp. 579–584.
26.   Wi, H.; Nguyen, V.; Lee, J.; Guan, H.; Loo, Y.C.; Blumenstein, M. Enhancing Visual-based Bridge Condition
      Assessment for Concrete Crack Evaluation Using Image Processing Techniques. IABSE Symp. Rep. 2013, 101, 479–480.
      [CrossRef]
27.   Zhao, H.; Ge, W.; Li, X. Detection of crack defect based on minimum error and pulse coupled neural networks.
      Chin. J. Sci. Instrum. 2012, 33, 637–642.
28.   Abdel-Qader, I.; Pashaie-Rad, S.; Abudayyeh, O.; Yehia, S. PCA-Based algorithm for unsupervised bridge
      crack detection. Adv. Eng. Softw. 2006, 37, 771–778. [CrossRef]
29.   Lattanzi, D.; Miller, G.R. Robust Automated Concrete Damage Detection Algorithms for Field Applications.
      J. Comput. Civ. Eng. 2014, 28, 253–262. [CrossRef]
30.   Bu, G.P.; Chanda, S.; Guan, H.; Jo, J.; Blumenstein, M.; Loo, Y.C. Crack detection using a texture analysis-based
      technique for visual bridge inspection. Electron. J. Struct. Eng. 2015, 14, 41–48.
31.   Chaudhury, S.; Nakano, G.; Takada, J.; Iketani, A. Spatial-Temporal Motion Field Analysis for Pixelwise
      Crack Detection on Concrete Surfaces. In Proceedings of the 2017 IEEE Winter Conference on Applications
      of Computer Vision (WACV), Santa Rosa, CA, USA, 24–31 March 2017; pp. 336–344.
Sensors 2018, 18, 3042                                                                                             13 of 13

32.   Li, G.; Zhao, X.; Du, K.; Ru, F.; Zhang, Y. Recognition and evaluation of bridge cracks with modified active
      contour model and greedy search-based support vector machine. Autom. Constr. 2017, 78, 51–61. [CrossRef]
33.   Qian, B.; Tang, Z.; Xu, W. Pavement crack detection based on sparse autoencoder. Trans. Beijing Inst. Technol.
      2015, 35, 800–804.
34.   Schmugge, S.J.; Rice, L.; Nguyen, N.R.; Lindberg, J.; Grizzi, R.; Joffe, C.; Shin, M.C. Detection of cracks in nuclear
      power plant using spatial-temporal grouping of local patches. In Proceedings of the 2016 IEEE Winter Conference
      on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; pp. 1–7.
35.   Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006,
      313, 504–507. [CrossRef] [PubMed]
36.   Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006,
      18, 1527–1554. [CrossRef] [PubMed]
37.   Maher, A.; Taha, H.; Zhang, B. Realtime multi-aircraft tracking in aerial scene with deep orientation network.
      J. Real-Time Image Process. 2018, 1–13. [CrossRef]
38.   Zhang, B.; Luan, S.; Chen, C.; Han, J.; Wang, W.; Perina, A.; Shao, L. Latent Constrained Correlation Filter.
      IEEE Trans. Image Process. 2018, 27, 1038–1048. [CrossRef] [PubMed]
39.   Luan, S.; Zhang, B.; Zhou, S.; Chen, C.; Han, J.; Yang, W.; Liu, J. Gabor Convolutional Networks. IEEE Trans.
      Image Process. 2017, 27, 4357–4366. [CrossRef] [PubMed]
40.   Zhang, B.; Yang, Y.; Chen, C.; Yang, L.; Han, J.; Shao, L. Action Recognition Using 3D Histograms of Texture
      and A Multi-Class Boosting Classifier. IEEE Trans. Image Process. 2017, 26, 4648–4660. [CrossRef] [PubMed]
41.   Li, Y.; Zhao, W.; Pan, J. Deformable Patterned Fabric Defect Detection with Fisher Criterion-Based Deep
      Learning. IEEE Trans. Autom. Sci. Eng. 2017, 14, 1256–1264. [CrossRef]
42.   Li, Y.; Ai, J.; Sun, C. Online Fabric Defect Inspection Using Smart Visual Sensors. Sensors 2013, 13, 4659–4673.
      [CrossRef] [PubMed]
43.   Li, Y.; Zhang, C. Automated vision system for fabric defect inspection using Gabor filters and PCNN.
      SpringerPlus 2016, 5, 765. [CrossRef] [PubMed]
44.   Li, Y.; Zhang, J.; Lin, Y. Combining Fisher Criterion and Deep Learning for Patterned Fabric Defect Inspection.
      IEICE Trans. Inform. Syst. 2016, 99, 2840–2842. [CrossRef]
45.   LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE
      1998, 86, 2278–2324. [CrossRef]

                         © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
                         article distributed under the terms and conditions of the Creative Commons Attribution
                         (CC BY) license (http://creativecommons.org/licenses/by/4.0/).