A Thrifty Annotation Generation Approach for Semantic Segmentation of Biofilms

2020 IEEE 20th International Conference on BioInformatics and BioEngineering (BIBE)
DOI: 10.1109/BIBE50027.2020.00103
Adithi D. Chakravarthy, Parvathi Chundi, Mahadevan Subramaniam
College of IS&T, University of Nebraska at Omaha, Omaha, NE, USA
achakravarthy@unomaha.edu, pchundi@unomaha.edu, msubramaniam@unomaha.edu

Shankarachary Ragi
Department of Electrical Engineering, South Dakota School of Mines & Technology, Rapid City, SD, USA
shankarachary.ragi@sdsmt.edu

Venkata R. Gadhamshetty
Civil & Environmental Engineering, South Dakota School of Mines & Technology, Rapid City, SD, USA
venkata.gadhamshetty@sdsmt.edu

Abstract— Recent advances in semantic segmentation using deep learning methods have achieved promising results on several benchmark datasets. However, the primary challenge involved in such segmentation approaches is the availability of applicable training data. Since only experts are equipped to effectively annotate (or label) any available data for training semantic segmentation networks, the effort and cost involved can be considerable, especially for larger datasets. In this paper, we aim to address this problem by proposing a Thrifty Annotation Generation (TAG) approach that records high performance on segmentation networks with minimal expert effort and cost (intervention). We present a deep active learning framework that combines the marker-controlled watershed (MC-WS) algorithm, used to generate pseudo labels for a segmentation network (U-Net), with active learning, used to significantly minimize effort and cost by selecting only the most impactful training data for labeling. We build the initial U-Net model by generating pseudo labels for the training data using MC-WS. We then use the uncertainty information (entropy) of each image provided by the U-Net to determine the most uncertain, and therefore most effective, images for expert labeling. We evaluated the TAG approach using the 2012 ISBI Challenge dataset for 2D segmentation and a novel biofilm dataset. Our approach achieved promising segmentation accuracy (IoU) and classification accuracy with minimal expert intervention. The results of our experiments also indicate that the TAG approach can be generalized to achieve high-performance segmentation results on any dataset using minimal expert effort and cost.

Keywords— Watershed algorithm, Semantic segmentation, Pseudo labels, Biofilms, Active learning.

I. INTRODUCTION

Semantic segmentation of two-dimensional (2D) images is one of the key problems in computer vision applications in the medical and bioengineering fields. Recently, semantic segmentation has made much progress due to the design and performance of deep convolutional models for image segmentation [1], [2]. However, these advancements require large, high-quality annotated datasets, which are expensive to acquire, particularly in medical and bioengineering domains where images need expensive equipment to be generated and must be further annotated by multiple physicians or engineering scientists. Annotating multiple entities in each image, such as bacterial cells or biofilms on materials (technologically relevant metals, polymers, and in certain cases living substances such as human skin and tissue), is a time-consuming and tedious task. Consequently, a common limitation of image segmentation in these domains is that datasets include scarce annotations (not enough training examples) or weak image annotations (training examples are annotated at the image level and no annotation is available at the pixel level), resulting in limited training data. In these settings, even the most advanced image segmentation models may fail to generalize from training examples to real-world scenarios. Therefore, it is important to develop solutions that can deal with scarce or weak image annotations for semantic segmentation.

In this paper, we propose a technique called thrifty annotation generation (TAG), a cost-effective annotation approach for building a semantic segmentation model for datasets for which no manual annotations are available. We focus on datasets where the foreground is an object of interest (such as neural cells or a biofilm on a background material surface). The TAG approach is based on semi-supervised learning with pseudo labels. It first generates pseudo labels by applying the popular watershed segmentation algorithm to a given unlabeled dataset. The pseudo labels are then used to train a model for semantic segmentation of that dataset.

The TAG approach uses a cost-effective active learning method based on entropy to choose the images for which labels are obtained from experts. If the model trained on the pseudo labels were successful in identifying distinct features within an image, then the classification probabilities output by the model on that image would not be noisy. So, the TAG approach chooses those images whose classification probabilities have large entropy as candidates for expert annotation. The pseudo labels of these images are replaced by expert labels to train another classifier for semantic segmentation.
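To make the overall procedure concrete, the sketch below outlines one way such a loop could be organized. It is a minimal illustration, not the authors' implementation: the helper callables (generate_pseudo_label, train_segmentation_model, request_expert_label), the predict() interface of the trained model, and the use of mean binary entropy as the stopping signal are assumptions drawn from the description in this section.

```python
import numpy as np

def mean_entropy(prob_maps, eps=1e-12):
    """Mean pixel-wise binary entropy of a batch of predicted probability maps."""
    p = np.clip(np.asarray(prob_maps), eps, 1.0 - eps)
    ent = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return ent.reshape(len(p), -1).mean(axis=1)

def tag_loop(images, generate_pseudo_label, train_segmentation_model,
             request_expert_label, k=1, max_iters=10):
    """Hypothetical TAG-style loop: pseudo labels -> train -> pick the k most
    uncertain images -> replace their labels with expert labels -> retrain,
    stopping once the mean entropy of the predictions stops decreasing."""
    labels = [generate_pseudo_label(img) for img in images]   # initial pseudo labels (e.g., MC-WS)
    model = train_segmentation_model(images, labels)
    history = [model]
    prev_entropy = np.inf
    for _ in range(max_iters):
        # Assumes the trained model exposes predict() returning a foreground probability map.
        probs = np.stack([model.predict(img) for img in images])
        per_image = mean_entropy(probs)
        if per_image.mean() >= prev_entropy:        # model no longer improving
            break
        prev_entropy = per_image.mean()
        for idx in np.argsort(per_image)[-k:]:      # k most uncertain images
            labels[idx] = request_expert_label(images[idx])
        model = train_segmentation_model(images, labels)
        history.append(model)
    return history
```

In the paper's terms, request_expert_label corresponds to a lookup of existing ground truth in the scarce annotation simulation and to an actual expert query in the no annotation study.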

We study whether the proposed TAG approach would be effective at all by first simulating it in a scarce annotation setting. Here, the dataset contains a large number of labeled images with ground truth labels (GT) obtained from experts. However, only a small number (up to 10%) of the images from GT are used each time, on demand, during training to mimic a scarce annotation setting. The scarce annotation simulation study establishes the legitimacy of the proposed TAG approach, which is then used on datasets with no annotations.

So, we conduct the following two studies:

- Scarce Annotation Simulation Study: Let M_GT be the model constructed using the labeled dataset GT and acc(M_GT) be its (classification or segmentation) accuracy. Generate the pseudo-labeled training set T_1 by applying watershed segmentation to all the unlabeled images. Let M_1 be the model built using T_1. Iterative improvement: evolve model M_i to M_(i+1). This is done only while the accuracy of the model built using the current training set is less than acc(M_GT). Each iteration selects a few images with maximum noise from the output of model M_i, produces the next training set T_(i+1) by replacing the pseudo labels of those images in T_i with labels from GT, and generates the model M_(i+1).

- No Annotation Study: In this case there are no annotations, i.e., GT is empty. Generate T_1 using watershed segmentation and M_1 from T_1, then iteratively generate the next model and the next training set by identifying the images with the highest noise in the current model output and asking an expert to provide labels for these images. The expert-annotated images are added to GT. We continue the process of successive refinement as long as the average noise in the classification probabilities of the model output decreases. Finally, the model with the best accuracy, determined using the accumulated expert labels, is output.

In both situations, the TAG approach is guaranteed to terminate since only a finite number of pseudo-label-to-expert-label replacements are possible. We consider the TAG approach to be legitimate in a scarce annotation setting, and therefore applicable to a no annotation study, if in the scarce annotation simulation i) the model achieves accuracy within a small chosen threshold of acc(M_GT), ii) the average noise in the successive model outputs is non-increasing, and iii) the set of ground truth labels used across iterations, GT_used, is much smaller than GT, i.e., |GT_used| is much smaller than |GT|.

We applied the TAG approach to two datasets (one scarce annotation simulation study and one no annotation study). The models built using the TAG approach were able to achieve greater than 80% segmentation accuracy with less than 7% expert effort.

The rest of this paper is organized as follows. Section II discusses related work on semantic segmentation with scarce labels. The datasets, the TAG approach, the pre-processing of images using the watershed method, and the network architectures used for building models are described in Section III. Experimental results are presented in Section IV and conclusions are discussed in Section V.

This work is partially supported by the NSF grant #1920954.

II. RELATED WORK

A. Watershed Transform
Due to the interesting properties of the watershed transform [3], it has proven very useful, especially in medical image segmentation [4]-[6]. However, a well-known challenge of the watershed transform reported in earlier works is over-segmentation. In this paper, we utilize a marker-controlled watershed algorithm (MC-WS) [7] to alleviate over-segmentation and obtain initial pseudo labels for training images without labels.

B. Active Learning
Similar to active learning frameworks [8], [9], TAG employs an iterative approach in which, at each iteration, the current model is applied to classify (segment) a set of unlabeled instances, out of which a few are selected for manual annotation based on the uncertainty of the model and added to the training set to generate the next model. Recently there has been a lot of interest in developing deep active learning approaches with CNN-based networks for semantic segmentation of medical images [10]-[12], given the high cost and potential variability of manual image annotations.

The TAG active deep learning approach differs from these deep active learning approaches in using an automated segmentation method, the watershed, to generate a preliminary set of annotations for the entire training dataset. Unlike the above approaches, TAG can choose for correction either the output of the watershed method or the output of the model, whichever has a lower entropy. The TAG approach allows for varying amounts of expert annotation, resulting in a model that has been trained mostly on pseudo annotations, unlike the above approaches that require some form of human input for each training data item. Further, the use of automatically segmented images for training the model not only reduces the burden on human annotators for deep networks but also has the potential to reduce inter-rater variability. To the best of our knowledge, our work is the first application of active deep learning for semantic segmentation of biofilms in the material science domain.

III. APPROACH

A. Datasets
1) EM Dataset: The Electron Microscope (EM) data is a set of grayscale images (512 x 512 pixels) from a serial section Transmission Electron Microscopy dataset of the Drosophila first instar larva ventral nerve cord [13]. This dataset was published as part of the IEEE ISBI 2012 challenge on 2D segmentation. The goal of the challenge was to determine the boundary map (or binary label) of each grayscale image, where "1" or white indicates a pixel inside a cell and "0" indicates a pixel at the boundary between cross sections. A binary label was considered equivalent to a segmentation of the image. The ground truth binary labels for the training images were provided as part of the challenge.
2) BF Dataset: The Biofilm (BF) dataset consists of Scanning Electron Microscope (SEM) images of Desulfovibrio alaskensis G20 (DA-G20), a sulfate reducing bacterium (SRB), and its biofilms grown on bare mild steel surfaces in batch microbiologically influenced corrosion (MIC) experiments. The details of the growth procedures and biocorrosion tests were discussed in [14]. Owing to its high ductility, weldability, and low cost, mild steel remains a popular choice of metal in civil infrastructure, transportation, oil and gas industry applications, and routine applications. However, under aqueous conditions, mild steel is susceptible to MIC caused by microorganisms including SRB. The goal of semantic segmentation of the BF dataset is to identify the shape and size of each bacterial cell or cluster of cells in each image in order to detect and track metal corrosion.

Fig. 1. (A)-(C) depict an EM dataset original unlabeled image, the corresponding watershed binary label of (A), and the corresponding ground truth binary label of (A), respectively. (D), (E) depict a BF dataset original unlabeled patch and the corresponding watershed binary label of the original patch.

B. Pre-Processing and Watershed Segmentation
Every input image in the EM and BF datasets was treated as an unlabeled image. Contrast limited adaptive histogram equalization [15] was applied to improve edge definition and contrast. To account for the low data volume in the BF dataset, each unlabeled training image was divided into non-overlapping patches of 128 x 128 pixels. Next, a marker-controlled watershed algorithm (MC-WS) [7], together with the distance transform, was applied to the processed EM images and the BF patches, respectively, to automatically generate a binary label corresponding to each image and each patch. Finally, every patch and its corresponding binary label from the BF dataset was resized to 512 x 512 pixels. We use the term image to refer to both images and patches in the remainder of the paper.

Noise and local irregularities often lead to over-segmentation when using the watershed transform. The MC-WS enhancement floods the topographic image surface from a pre-defined set of markers, thereby preventing over-segmentation. To apply MC-WS to each image, an approximate estimate of the foreground objects in the image was first found using binarization. White noise and small holes in the image were removed using morphological opening and closing, respectively. To extract the sure foreground region of the image, a threshold was applied to the distance transform. To extract the sure background region of the image, dilation was applied to the binarized image. The boundaries of the foreground objects were then computed as the difference between the sure foreground and sure background regions. Marker labeling was implemented by labeling all sure regions with positive integers and labeling all unknown (or boundary) regions with 0. Finally, the watershed was applied on the marker image to refine the boundary region and obtain the watershed segmentation mask, or binary label, of the image.
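The pipeline above maps closely onto the standard OpenCV marker-controlled watershed recipe. The sketch below is a plausible reconstruction under that assumption; the binarization threshold and the 0.3 factor on the distance transform are illustrative parameters, not values reported in the paper.

```python
import cv2
import numpy as np

def mcws_pseudo_label(gray, bin_threshold=120, fg_ratio=0.3):
    """Marker-controlled watershed pseudo label for a grayscale image.
    Returns a binary mask (1 = foreground object, 0 = background/boundary)."""
    # Contrast limited adaptive histogram equalization to sharpen edges.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    eq = clahe.apply(gray)

    # Rough foreground estimate by binarization.
    _, binary = cv2.threshold(eq, bin_threshold, 255, cv2.THRESH_BINARY)

    # Remove white noise and small holes with morphological opening/closing.
    kernel = np.ones((3, 3), np.uint8)
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel, iterations=2)
    cleaned = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel, iterations=2)

    # Sure background by dilation, sure foreground by thresholding the distance transform.
    sure_bg = cv2.dilate(cleaned, kernel, iterations=3)
    dist = cv2.distanceTransform(cleaned, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, fg_ratio * dist.max(), 255, 0)
    sure_fg = sure_fg.astype(np.uint8)

    # Unknown (boundary) region is the difference between the sure regions.
    unknown = cv2.subtract(sure_bg, sure_fg)

    # Markers: positive integers for sure regions, 0 for the unknown region.
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1
    markers[unknown == 255] = 0

    # Watershed refines the boundary region; boundary pixels are marked -1.
    markers = cv2.watershed(cv2.cvtColor(eq, cv2.COLOR_GRAY2BGR), markers)
    return (markers > 1).astype(np.uint8)
```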
C. U-Net for Segmentation
The U-Net [16] is an improved FCN consisting of an encoder (contracting path) and a decoder (expansive path), designed specifically to perform segmentation tasks on medical images. The contracting path is a stack of convolutional and max-pooling layers in which high-level semantic information is acquired at each layer, while the expansive path recovers the spatial information of the image at each layer using transposed convolutions. Skip connections combine the information from the contracting and expansive paths by concatenating their feature maps, resulting in a symmetrical network in contrast to traditional FCNs. The U-Net architecture used in this paper is similar to the one proposed by Ronneberger et al. [16] and accepts a set of unlabeled images with corresponding binary labels as input to train a model.
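For concreteness, the sketch below shows a small U-Net-style encoder-decoder in Keras with concatenating skip connections and a sigmoid output for binary labels. It is a simplified stand-in, with fewer filters than the original U-Net of Ronneberger et al., and is not the exact architecture used in the paper.

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions with ReLU, as in the standard U-Net building block.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(512, 512, 1), base_filters=16, depth=4):
    inputs = layers.Input(input_shape)
    x, skips = inputs, []

    # Contracting path: convolutions followed by max pooling.
    for d in range(depth):
        x = conv_block(x, base_filters * 2 ** d)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)

    x = conv_block(x, base_filters * 2 ** depth)   # bottleneck

    # Expansive path: transposed convolutions and skip concatenations.
    for d in reversed(range(depth)):
        x = layers.Conv2DTranspose(base_filters * 2 ** d, 2, strides=2, padding="same")(x)
        x = layers.concatenate([x, skips[d]])
        x = conv_block(x, base_filters * 2 ** d)

    # Per-pixel probability of belonging to the foreground class.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return Model(inputs, outputs, name="unet")
```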
D. TAG Algorithm
The inputs to the TAG algorithm are a set of unlabeled (original) images D and an optional set of ground truth binary labels GT, with |GT| at most |D|. The output of the algorithm is a model and a set of binary labels that semantically segment the unlabeled images in D. In this paper, we focus on the binary segmentation of image pixels. TAG employs an iterative algorithm that uses a sequence of training sets of pseudo labels (T_1, ..., T_n) to build a sequence of models (M_1, ..., M_n), which in turn produce a sequence of sets of binary labels (Y_1, ..., Y_n) for D. The model M_i is generated at the i-th iteration using the training set T_i. The model is applied to the set D to generate a set of binary labels Y_i, which are used to segment (annotate) the corresponding images in D. The algorithm uses a set of ground truth labels to successively refine the pseudo labels. Let GT_used denote the set of ground truth labels used across all the iterations; initially, GT_used = {}.

The TAG algorithm also takes a parameter k as its input, which specifies the number of images for which ground truth labels are obtained (from experts) in each step of the iteration.

The main steps of the TAG algorithm are given below:

1. Generate Initial Model (M_1): Create an ensemble of three watershed segmentations and apply it to the set D to generate three candidate label sets. Use majority voting to determine the initial set of pseudo binary labels, T_1. Train a segmentation network on the pair <D, T_1> to obtain the initial model M_1.

2. Generate Next Models using Experts: Apply model M_i (1 <= i <= n) to D to generate the next set of binary labels Y_i. Each element of Y_i is a binary label, one per image in D. Identify the k elements of Y_i with the highest entropy values, calculated using the prediction confidence values obtained from the output of M_i. Obtain expert-annotated binary labels corresponding to each of these k elements and add them to GT_used. Generate the next training set T_(i+1) by replacing the binary labels of these k images in T_i with the expert-annotated ground truth binary labels, and generate the next model M_(i+1) by training the segmentation network on the pair <D, T_(i+1)>.

3. Test and Terminate: Apply M_(i+1) to D to generate the next set of masks Y_(i+1). Stop when MeanEntropy(Y_(i+1)) > MeanEntropy(Y_i), i.e., when the confidence of model M_(i+1) is lower than that of M_i. The decrease in model confidence indicates that the model is unable to learn any new patterns during training at the (i+1)-th iteration. Evaluate the performance of all i+1 models using intersection over union (IoU) and accuracy. The accuracy of the models is calculated using all available ground truth labels (i.e., GT together with GT_used). Choose the model with the highest mean IoU and mean accuracy as the best, or most thrifty, model to obtain binary labels for D using the least expert intervention.

Entropy, a measure of image information content, can be understood as the average degree of uncertainty in the image. Higher entropy values highlight images in the data that are important or interesting in the sense of exhibiting more variation or change in their local neighborhood compared with other images. The entropy of an image is found by applying the following formula to the entire image:

    H = - sum_i p_i log_b p_i                    (1)

where the sum ranges over the gray levels (usually 256 for 8-bit images, but in this paper we bucketed the 256 levels further into 10 levels), p_i is the probability of a pixel having gray level i, and b is the base of the logarithm (here b = 2). MeanEntropy denotes the mean entropy over a set of images.
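A small sketch of how this image entropy could be computed, assuming the 10-bucket histogram and base-2 logarithm described above and an input (a predicted label or confidence map) scaled to the 8-bit range; the exact binning used by the authors is not specified beyond that.

```python
import numpy as np

def image_entropy(image, n_buckets=10):
    """Entropy of an 8-bit image with its gray levels bucketed into n_buckets."""
    hist, _ = np.histogram(image, bins=n_buckets, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                      # ignore empty buckets (0 * log 0 = 0)
    return float(-np.sum(p * np.log2(p)))

def mean_entropy_of_set(images, n_buckets=10):
    """Mean entropy over a set of images, as used in the termination test (Step 3)."""
    return float(np.mean([image_entropy(img, n_buckets) for img in images]))
```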
Note that in the scarce annotation simulation study, the inputs to the algorithm include the optional set GT, with |GT| = |D|. In this case, the ground truth labels needed in each iteration are obtained by a simple lookup of GT, and these labels are accumulated in GT_used. In the no annotation study, |GT| = 0 and the ground truth labels in each iteration are obtained by querying the expert. The size of the set GT_used provides a measure of the human annotation effort involved.

IV. EXPERIMENTS & RESULTS

A. Setup
Training of the U-Net models was implemented using Keras with a TensorFlow backend as the deep learning framework on an Ubuntu workstation with a 12-core Intel i9-9920X and 128 GB RAM. A random selection of 30% of the training set was used in each iteration for validation, with 25 epochs and a batch size of 16, and the prediction of the model was tested on D. The model was compiled with the Adam optimizer [17] and the binary cross-entropy loss function, since each pixel gets either a "0" or a "1" value. We used an early-stop mechanism on the validation set to avoid over-fitting.

The experiments were carried out in a two-stage approach, by 1) evaluating the TAG approach using the EM dataset and 2) studying the effectiveness of the TAG approach on the BF dataset. For both the EM and BF datasets, thresholds of 100, 110 and 120 were used while implementing MC-WS. Note that model M_1, built on the <D, T_1> pair, used the initial pseudo labels T_1 for any value of k. Fig. 1 (A) and (B) show an unlabeled image from the EM dataset and its pseudo label in T_1. Fig. 1 (D) and (E) show an unlabeled image from the BF dataset and its pseudo label in T_1.
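The training configuration described above might look roughly like the following in Keras; the patience value for early stopping and the validation split mechanics are assumptions, since the paper only states that an early-stop mechanism on the validation set was used.

```python
import numpy as np
import tensorflow as tf

def train_unet(model, images, labels, epochs=25, batch_size=16, val_fraction=0.3):
    """Compile and fit a U-Net on images with binary pixel labels, as in Section IV-A."""
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    # 30% of the training pairs held out for validation in each iteration.
    early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                                  patience=3,            # assumed patience
                                                  restore_best_weights=True)
    model.fit(np.asarray(images), np.asarray(labels),
              validation_split=val_fraction,
              epochs=epochs, batch_size=batch_size,
              callbacks=[early_stop])
    return model
```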
B. Evaluation Metrics
We evaluated the results of the TAG approach using intersection over union (IoU) and classification accuracy. IoU (also known as segmentation accuracy) measures the percentage of overlap between the ground truth labels and the predicted outputs, as given by (2) below. IoU is preferred over classification accuracy when only a few pixels in an image represent objects; in such a case, the overlap between the ground truth and the predicted pixels measures how many of the pixels representing objects were classified correctly by the model. Classification accuracy, given in (3), includes both true negatives and true positives, giving a more balanced measure of model performance.

    IoU = TP / (TP + FP + FN)                          (2)
    Accuracy = (TP + TN) / (TP + TN + FP + FN)         (3)

where TP, TN, FP, and FN are the pixel-wise true positives, true negatives, false positives, and false negatives, respectively.
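A direct NumPy rendering of (2) and (3) for binary masks, included for clarity; it treats any nonzero pixel as foreground.

```python
import numpy as np

def iou_and_accuracy(pred, truth):
    """Pixel-wise IoU (2) and classification accuracy (3) for a pair of binary masks."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    iou = tp / (tp + fp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return float(iou), float(accuracy)
```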

C. EM Dataset
For the EM dataset, we used the values k = {1, 3} to apply the TAG algorithm, generating three more models (M_2, M_3 and M_4) for every value of k. The values of k were chosen to reflect the minimum possible value (k = 1) and 10% of the training set (k = 3).

Fig. 2. Output of the TAG approach on Fig. 1(A). (A) Binary label of Fig. 1(A) produced by the initial model M_1 (k = 1) and (B) segmentation of Fig. 1(A) using that binary label. (C) Binary label produced in the iterative step (k = 1) and (D) segmentation of Fig. 1(A) using that binary label.

In order to generate the models for k = {1, 3}, we first computed the entropy of all binary labels obtained from M_1 using the entropy formula given by (1). The k binary labels with the highest entropy were then picked to be replaced with the corresponding expert annotations in the training set, i.e., for k = 1, the binary label with the highest entropy was replaced with the corresponding expert annotation to generate training set T_2 for training M_2. Similarly, for k = 3, the binary labels with the top-3 highest entropies were replaced with the corresponding expert annotations to generate training set T_2 for training M_2. To generate training set T_3 for training M_3, the k binary labels with the highest entropy in the output of M_2 are replaced with the corresponding expert annotations.
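The selection-and-replacement step described above reduces to ranking the predicted labels by entropy and swapping the top k pseudo labels for expert labels. A minimal sketch, treating expert_labels as a lookup of available ground truth (or as the result of an expert query in the no annotation study); the entropy values are assumed to be precomputed per image.

```python
import numpy as np

def replace_top_k(pseudo_labels, entropies, expert_labels, k):
    """Replace the k pseudo labels with the highest prediction entropy by expert labels.
    pseudo_labels: list of binary masks; entropies: one value per image;
    expert_labels: dict mapping image index -> ground truth mask."""
    labels = list(pseudo_labels)
    top_k = np.argsort(entropies)[-k:]          # indices of the k most uncertain images
    for idx in top_k:
        labels[idx] = expert_labels[idx]        # expert annotation replaces the pseudo label
    return labels, set(int(i) for i in top_k)   # new training labels and the queried indices
```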
Fig. 3. (A) Classification accuracy of all models on the EM dataset; (B) segmentation accuracy (IoU) and (C) classification accuracy of all models on the BF dataset while applying the TAG approach.

By picking the binary labels with higher entropy, we intuitively replaced the labeled images exhibiting high classification uncertainty with the corresponding ground truth labels to train the next model, thereby reducing 1) the overall uncertainty of the segmentation output and 2) the need for expert annotations for all input images.

Fig. 2 illustrates the output of the models constructed in an iterative manner for one image, depicted in Fig. 1(A), from the EM dataset. Fig. 2(A) and Fig. 2(B) show the binary label of this image produced by M_1 (k = 1) and the segmentation of the image using that label. Fig. 2(D) shows the label generated by the model constructed in the iterative step, Step 2, of the TAG approach (k = 1).

TABLE I. ENTROPY CHANGE ON APPLYING THE TAG APPROACH

              EM Dataset (M_1 = 3.029)      BF Dataset (M_1 = 1.048)
    Model       k = 1        k = 3            k = 1        k = 16
    M_2         2.568        2.658            1.855        1.721
    M_3         2.423        2.465            1.589        1.712
    M_4         2.576        2.897            1.581        2.917
    M_5           -            -              1.883          -

Table I shows the entropy values of the segmentation labels for each model constructed iteratively by the TAG approach. Initially, the binary labels computed from M_1 had the highest mean entropy, 3.029. Model M_3 (k = 1) has the lowest mean entropy, 2.423. However, the mean entropy of the predicted labels of M_4 (k = 1) increases to 2.576, and the TAG approach terminates. We also observe how the mean entropy values decrease for models M_2 and M_3 and increase for M_4 when k = 3; the TAG approach terminates at that point as well.

Fig. 3(A) shows the classification accuracy of all the models computed by the TAG approach for the EM dataset. The mean classification accuracy of model M_GT (constructed using all of the ground truth labels) is 0.828. Model M_3 (k = 1) recorded the highest mean classification accuracy of all the models computed by the TAG approach, at 0.832. This is slightly higher (by 0.004) than that of M_GT, which may be somewhat surprising and needs further study.

Although the mean entropies of M_4 (k = 1) and M_3 (k = 3) are close to that of M_3 (k = 1) (2.576 and 2.465, respectively), their mean classification accuracies are significantly lower (0.649 and 0.799, respectively) than that of M_3 (k = 1), and they involve more replacements (additional expert intervention). Hence their performance is not optimal in line with our 'thrifty' approach.

The image-level comparison of IoU values between the binary labels generated by the model M_3 (k = 1) and by M_GT, built using all of the ground truth labels, can be found in Fig. 4. The figure shows two bar plots for each training image in the EM dataset; the Y-axis plots the IoU values, and the height of each bar shows the IoU between the binary label generated by the model (M_3 or M_GT) and the ground truth label. The mean IoU of M_3 (k = 1) surpassed the mean IoU of M_GT by 0.7%. However, the IoU of M_GT is higher than that of M_3 (k = 1) for 12 out of 30 images and lower for 9 out of 30 images. For the remaining 9 images, the IoU values computed from both models are approximately the same.

Fig. 4. Segmentation accuracy (IoU) of M_GT vs. M_3 (k = 1) on the EM dataset while applying the TAG approach.

Although Fig. 3(A) and Fig. 4 show a weak link between mean entropy and mean classification accuracy, i.e., the higher the entropy, the lower the classification accuracy, the link did not hold when we ran more experiments. More study is required to establish the presence or absence of a relationship between these two measures.

From these experimental results, we established that the model computed by the TAG approach generated binary labels with optimal IoU values for 70% of the training images. It also has the lowest entropy as well as the highest IoU and mean accuracy, higher than the mean accuracy obtained from using all of the available ground truth labels (M_GT). Thus, we achieved the best performance using < 7% expert intervention (only 2 labels out of 30 ground truth labels, i.e., |GT_used| = 2). These experiments establish the legitimacy of the TAG approach.
D. BF Dataset
For the BF dataset, since its size differs from that of the EM dataset, we adjusted the values to k = {1, 16}, again consistent with using the minimum possible value and 10% of the training set. We first computed model M_1 using the pseudo labels obtained from the MC-WS algorithm. We then generated three more models (M_2, M_3 and M_4) for k = 16 and four more models (M_2, M_3, M_4 and M_5) for k = 1 to reach the terminating condition of the TAG approach. Note that we could not compute M_GT for the BF dataset as we did not have access to ground truth labels for the entire dataset.

Table I also shows the entropy results of the TAG approach on the BF dataset. M_1 had the lowest mean entropy, 1.048. Since we could not compute M_GT, we needed to construct more models to observe the gradient of the mean entropy. For k = 1, the mean entropy increases to 1.855 at M_2, drops to 1.581 at M_4, and then spikes to 1.883 at M_5, meeting the terminating condition. Similarly, for k = 16, the mean entropy decreases from 1.721 (M_2) to 1.712 (M_3) and increases to 2.917 at M_4, meeting the terminating condition.

Although all of these models had higher mean entropy values when compared to M_1, we needed additional measures such as IoU and classification accuracy to establish the best model. In contrast to the EM dataset, the BF dataset came with ground truth binary labels for only 50% of the images. Hence, we used this 50% to estimate IoU and accuracy for all the models, as shown in Fig. 3(B) and Fig. 3(C). M_4 (k = 1) recorded the highest mean IoU, 0.809, and the highest mean accuracy, 0.626, across all models. Though the k = 16 setting obtained a high mean IoU of 0.772, its mean accuracy was only 0.464.

The difference in the segmentation outputs for the two values of k can be seen in Fig. 5. Moreover, k = 16 took a total of 48 label replacements over four iterations to achieve a mean IoU comparable to that of k = 1, whereas k = 1 took only 3 label replacements (|GT_used| = 3) over four iterations to record the highest mean IoU and mean accuracy. Thus, we chose M_4 (k = 1) as the thriftiest model to obtain binary labels for segmentation, requiring < 2% expert intervention, i.e., experts provided ground truth labels for only 3 out of the 160 training images.

Fig. 5. Output of the TAG approach on Fig. 1(D). (A) Binary label produced with k = 16 and (B) segmentation of Fig. 1(D) using that binary label. (C) Binary label produced with k = 1 and (D) segmentation of Fig. 1(D) using that binary label.

V. CONCLUSIONS & FUTURE WORK

In this paper, we presented a new framework for biomedical image and biofilm segmentation that combines semi-supervised learning and active learning with minimal expert intervention. Our method provides two main contributions: (1) an MC-WS based approach that can generate pseudo labels for images without annotation labels, for building segmentation models; and (2) a cost-effective (thrifty) annotation generation approach that directs expert intervention to the most effective labels to achieve high-performance segmentation output. We first validated the TAG approach using the 2012 ISBI Challenge dataset for 2D segmentation and achieved a mean IoU of 0.807 using < 7% expert intervention. Next, we applied the TAG approach to a novel biofilm dataset and attained an IoU of 0.809 using < 2% expert intervention. To the best of our knowledge, this is the first application of active deep learning for semantic segmentation of biofilms, specifically in the microbial corrosion domain. The results of our extensive experiments using the TAG approach demonstrate that high-performance segmentation output can be achieved on any dataset with limited or minimal expert effort and cost.

We plan to study the proposed TAG approach further by evaluating it on more benchmark datasets and fine-tuning the U-Net architecture to achieve state-of-the-art performance. We also plan to evaluate model performance in terms of other metrics such as pixel error and Rand error.

REFERENCES

[1] Y. Guo, Y. Liu, T. Georgiou, and M. S. Lew, "A review of semantic segmentation using deep neural networks," Int. J. Multimed. Inf. Retr., vol. 7, no. 2, pp. 87–93, Jun. 2018.
[2] N. Tajbakhsh, L. Jeyaseelan, Q. Li, J. N. Chiang, Z. Wu, and X. Ding, "Embracing Imperfect Datasets: A Review of Deep Learning Solutions for Medical Image Segmentation."
[3] A. Kornilov and I. Safonov, "An Overview of Watershed Algorithm Implementations in Open Source Libraries," J. Imaging, vol. 4, no. 10, p. 123, Oct. 2018.
[4] H. P. Ng, S. H. Ong, K. W. C. Foong, P. S. Goh, and W. L. Nowinski, "Medical Image Segmentation Using K-Means Clustering and Improved Watershed Algorithm," 2006 IEEE Southwest Symp. Image Anal. Interpret., pp. 61–65, 2006.
[5] V. Grau, R. Kikinis, M. Alcañiz, and S. K. Warfield, "Cortical gray matter segmentation using an improved watershed transform," in Annual International Conference of the IEEE Engineering in Medicine and Biology - Proceedings, 2003, vol. 1, pp. 618–621.
[6] V. Grau, A. U. J. Mewes, M. Alcañiz, R. Kikinis, and S. K. Warfield, "Improved watershed transform for medical image segmentation using prior information," IEEE Trans. Med. Imaging, vol. 23, no. 4, pp. 447–458, Apr. 2004.
[7] F. Meyer and S. Beucher, "Morphological segmentation," J. Vis. Commun. Image Represent., vol. 1, no. 1, pp. 21–46, Sep. 1990.
[8] D. A. Cohn, Z. Ghahramani, and M. I. Jordan, "Active learning with statistical models," J. Artif. Intell. Res., vol. 4, pp. 129–145, Mar. 1996.
[9] B. Settles, "Active Learning Literature Survey," Computer Sciences Technical Report, University of Wisconsin–Madison, 2009.
[10] L. Yang, Y. Zhang, J. Chen, S. Zhang, and D. Z. Chen, "Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation," Lect. Notes Comput. Sci., vol. 10435 LNCS, pp. 399–407, Jun. 2017.
[11] M. L. di Scandalea, C. S. Perone, M. Boudreau, and J. Cohen-Adad, "Deep Active Learning for Axon-Myelin Segmentation on Histology Data," Jul. 2019.
[12] T. Kim et al., "Active learning for accuracy enhancement of semantic segmentation with CNN-corrected label curations: Evaluation on kidney segmentation in abdominal CT," Sci. Rep., vol. 10, no. 1, pp. 1–7, Dec. 2020.
[13] I. Arganda-Carreras et al., "Crowdsourcing the creation of image segmentation algorithms for connectomics," Front. Neuroanat., vol. 9, pp. 1–13, Nov. 2015.
[14] G. Chilkoor et al., "Maleic anhydride-functionalized graphene nanofillers render epoxy coatings highly resistant to corrosion and microbial attack," Carbon, vol. 159, pp. 586–597, Apr. 2020.
[15] K. Zuiderveld, "Contrast Limited Adaptive Histogram Equalization," in Graphics Gems, 1994, pp. 474–485.
[16] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Lecture Notes in Computer Science, 2015, vol. 9351, pp. 234–241.
[17] D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," in 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 2015.
