Quality Control of Weather Radar Data Using Texture Features and a Neural Network

V Lakshmanan1, Kurt Hondl2, Gregory Stumpf1, Travis Smith1

Abstract— Weather radar data is subject to many contaminants, mainly due to non-precipitating targets (such as insects and wind-borne particles) and due to anomalous propagation (AP) or ground clutter. Although weather forecasters can usually identify, and account for, the presence of such contamination, automated weather algorithms are affected drastically. We discuss several local texture features and image processing steps that can be used to discriminate some of these types of contaminants. None of these features by themselves can discriminate between precipitating and non-precipitating areas. A neural network is used for this purpose. We discuss training this neural network using a million-point data set, and accounting for the fact that even this data set is necessarily incomplete.

1 V Lakshmanan, Gregory Stumpf and Travis Smith are with the Cooperative Institute of Mesoscale Meteorological Studies (CIMMS), University of Oklahoma. 2 Kurt Hondl is with the National Severe Storms Laboratory, Norman, OK.

I. Introduction

From the point of view of automated applications operating on weather data, echoes in radar reflectivity may be contaminated. These applications require that echoes in the radar reflectivity moment correspond, broadly, to "weather". By removing ground clutter contamination, estimates of rainfall from the radar data using the National Weather Service (NWS) Weather Surveillance Radar-1988 Doppler (WSR-88D) can be improved [1], [2]. A large number of false positives for the Mesocyclone Detection Algorithm [3] are caused in regions of clear-air return [4]. A hierarchical motion estimation technique segments and forecasts poorly in regions of ground clutter [5], [6]. Hence, a completely automated algorithm that can remove regions of ground clutter, anomalous propagation and clear-air returns from the radar reflectivity field would be very useful in improving the performance of other automated weather algorithms.

For a good review of the literature on ground clutter contamination, the interested reader is referred to [7]. Local neighborhoods in the vicinity of every pixel in the three weather radar moments were examined by [2] and used for automated removal of non-precipitating echoes. They achieved success by examining some local statistical features (the mean, median, and standard deviation within a local neighborhood of each gate in the moment fields) and a few heuristic features. [7] introduced the "SPIN", the ratio of the number of gate-to-gate differences in an 11x21 local neighborhood that exceed a certain threshold (2 dBZ in practice) to the total number of such differences. [2] introduced the "SIGN", the average of the signs of the gate-to-gate difference field within the local neighborhood. [7] used a decision tree to classify pixels into two categories – precipitating and non-precipitating – while [2] used a fuzzy rule base with features that included the SPIN feature introduced by [7]. In addition to these elevation-based features, some vertical-profile features were also used – the maximum height of a 5 dBZ echo was used by [7], and [2] discussed the use of vertical differences between the two lowest reflectivity scans.

Neural networks (NNs) have been utilized in a variety of meteorological applications. For example, NNs have been used for prediction of rainfall amounts by [8] and for identification of tornadoes by [9]. In fact, [10] attempted to solve the radar quality problem using neural networks. However, the performance of the neural network was no better than a fuzzy logic classifier (Kessinger, personal correspondence), and the neural network attempt was dropped in favor of the much more transparent fuzzy logic approach described in [2]. In Section III-A, we propose some rationale for why our neural network approach achieves significantly better results than the network developed by [10].

II. The Neural Networks

A. Inputs

Based on the extensive literature on descriptions of AP and ground clutter [7], we chose as inputs to the neural network the following: the data value, the mean, the median and the variance of each of the three moments (reflectivity, velocity, spectrum width) at the lowest tilt of the radar.
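As an illustration, the SPIN and SIGN textures reviewed in the introduction can be computed from gate-to-gate differences along the radial direction of a local window. This is our own sketch (function and variable names are ours), not the implementation of [7] or [2]:

```python
import numpy as np

def spin_sign(window, thresh=2.0):
    """SPIN and SIGN over a local reflectivity window (radials x gates).

    window: a 2D dBZ array centered on the gate of interest, e.g. an
    11x21 slice for the neighborhood of [7]. Names are illustrative.
    """
    # gate-to-gate differences along each radial
    diffs = np.diff(window, axis=1)
    # SPIN: fraction of gate-to-gate differences exceeding the
    # threshold (2 dBZ in practice) out of all such differences
    spin = np.mean(np.abs(diffs) > thresh)
    # SIGN: average of the signs of the gate-to-gate difference field
    sign = np.mean(np.sign(diffs))
    return spin, sign
```

In practice these would be evaluated at every gate by sliding the window over the polar reflectivity field.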

In addition, we took the same four values for the second lowest tilt of the radar. Finally, we computed some of the textural features that have been found to be useful in discriminating between precipitation and AP/GC. These were the SPIN [7], the gate-to-gate average square difference [2] and the SIGN [2]. We included the vertical gradient (the difference between the reflectivities at the two lowest scans) as a separate input to the neural network.

In addition to these discriminants described in the literature, we considered a few others:
1. The maximum vertical reflectivity, over all the elevations.
2. The maximum reflectivity in the local neighborhood.
3. A weighted average of the reflectivity values over all the elevations, where the weight of each data point is given by the height of that pixel above the radar. This takes into account the entire vertical profile instead of just the first two elevations.
4. The sum of all the heights at which an echo exists (reflectivity value greater than 0 dBZ) at the pixel.
5. The homogeneity of the reflectivity field, defined as:

$$\mathrm{hom}_{xy} = \frac{\sum_{i \in N_{xy}} \frac{1}{1 + \left((I_{xy} - I_i)/I_{xy}\right)^2}}{\mathrm{card}(N_{xy}) - 1} \qquad (1)$$

where N_xy is the set of valid pixels (I_i) in the neighborhood of the pixel at (x, y) in the image, I_xy is the pixel value and card(N_xy) is the number of such neighbors.
6. Echo size, defined as the fraction of neighbors whose values are within 10 dBZ of this pixel's reflectivity value.
7. The fraction of inflection points, with inflections at 5, 10 and 15 dBZ thresholds. An inflection point is defined similarly to the SPIN [7] except that the inflection is defined not in a polar neighborhood, but along the entire radial until that point.
8. Echo-top height, defined as the maximum height of reflectivity above a certain threshold. We used both 5 dBZ and 10 dBZ thresholds.
9. To decorrelate the data value from the mean and median, the difference between the data value and the local mean was used.

B. Computation of Inputs

Velocity data can be range-folded (aliased). In the WSR-88D, at the lowest tilt, the velocity scan has a shorter range than the reflectivity one. We therefore divided the training pixels into two groups – one where velocity data were available and another where there was no Doppler velocity (or spectrum width) information. Thus, two separate neural networks were trained. In real-time operation, the appropriate network was invoked for each pixel depending on whether there were velocity data at that point. All the neural network inputs were scaled such that each feature in the training data exhibited a zero mean and a unit variance when the mean and variance are computed across all patterns.

Histograms of a few selected features are shown in Figure 1. It should be noted that these features are not linear discriminants by any means – it is the combination of features that gives the neural network its discriminating ability. The histogram of Figure 1d illustrates the result of several strategies we adopt during the training, so that higher reflectivities are not automatically accepted.

C. Network Architecture

We used a resilient backpropagation neural network (RPROP) as described in [11]. There was one hidden layer. Every input unit was connected to every hidden unit, and every hidden unit to the output unit. In addition, there was a short-circuit connection from the input units directly to the output unit, to capture any linear relationships, i.e. the network was "fully connected" and completely "feed-forward". Every hidden node had a "tanh" activation function, chosen because of its signed range. The output unit had a sigmoidal activation function, g(a) = (1 + e^{-a})^{-1}, so that the outputs of the networks could be interpreted as posterior probabilities [12]. Each non-input node had, associated with it, a bias value which was also part of the training.

The error function that was minimized was a weighted sum of the cross-entropy (which [12] suggests is the best measure of error in binary classification problems) and the squared sum of all the weights in the network:

$$E = E_e + \lambda \sum_{ij} w_{ij}^2 \qquad (2)$$

The first term is a variation of the cross-entropy error suggested by [12] and is defined as:

$$E_e = -\sum_{n=1}^{N} c_n \left( t^n \ln y^n + (1 - t^n) \ln (1 - y^n) \right) \qquad (3)$$

where t^n is the target value of the nth training pattern (0 if non-precipitating and 1 if precipitating) while y^n is the actual output of the neural network for that pattern input.
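A minimal NumPy sketch of the error in Equations 2 and 3 (variable names are ours; the per-pattern costs c_n enter exactly as in Equation 3):

```python
import numpy as np

def total_error(y, t, c, weights, lam):
    """Weighted cross-entropy plus weight decay, as in Equations 2-3.

    y: network outputs in (0, 1); t: targets (0 or 1); c: per-pattern
    costs c_n; weights: flat array of all network weights; lam: the
    relative weight lambda of the decay term.
    """
    eps = 1e-12  # guard the logarithms against y exactly 0 or 1
    ce = -np.sum(c * (t * np.log(y + eps) + (1 - t) * np.log(1 - y + eps)))
    return ce + lam * np.sum(weights ** 2)
```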

Fig. 1. Histograms of selected features on the training data set, after the features have been normalized to be of zero mean and unit variance. (a) Homogeneity (b) Radial inflections (c) Mean spectrum width (d) Mean reflectivity (e) SPIN. Note in (d) that, as a result of careful construction of the training set and selective emphasis, the mean reflectivity histograms of the two classes are nearly identical – this is not the a priori distribution of the two classes, since AP is rare, and clear-air return tends to have smaller reflectivity values.

N is the total number of patterns. The cost, c_n, captures the importance of that pattern. The second, squared-weights, term attempts to reduce the size of the weights, and thus improves generalization [13]. The relative weight, λ, of the two measures is computed every 50 epochs within a Bayesian framework, with the assumption that the weights and the errors have Gaussian distributions, such that the ratio of their variances gives a measure of how much to decay the weights [14], [12]. We started by weighing the sum-of-weights twice as much as the cross-entropy term (λ = 2), updated λ based on the distribution of the weights and errors every 50 epochs, and stopped the learning process at 800 epochs. We chose the final weights of the network from the epoch at which the validation entropy error was minimum, as will be discussed shortly.

The with-velocity network had 22 inputs, 5 hidden nodes and one output, while the reflectivity-only network had 16 inputs, 4 hidden nodes and one output.

C.1 Validation

A validation set can ensure a network's generalization, typically through the use of early stopping methods [12]. In the neural network literature, a validation set is also utilized to select the architecture of the neural network [15]. We used a validation set that consisted of features derived from three volume scans that exhibited AP, convection and clear-air return.

We trained each network with three different numbers of hidden nodes. For each training run, we picked the result of training at the epoch at which the validation error was at its minimum (see Figure 2). Thus, we used the validation set both to determine when to stop and to pick the final architecture of the neural network. Other than to choose the number of hidden nodes, we did not consider any alternate network topologies since, in theory at least, a single hidden layer is enough to interpolate any continuous function to arbitrary accuracy [12].

Fig. 2. Using a validation set to decide when to stop the training, and to decide on the number of hidden nodes. The y-axis is E_e/N – see Equation 3. (a) Validation error when training the without-velocity neural network. The final choice was 4 hidden nodes and the weights from the 310th epoch. (b) Validation error when training the with-velocity neural network. The final choice was 5 hidden nodes and the weights from the 210th epoch. (c) Training error vs. validation error for the final choices of hidden nodes. Note that the training error continues to decrease but the validation error starts to increase after a while, showing that the training is becoming counter-productive.

We used a testing set, independent of the training and validation sets, as described in Section III, and it is on this independent set that the results are reported.

D. Training

Eight volumes of WSR-88D data were selected. They covered a wide variety of weather and no-weather scenarios. A human interpreter examined these volume scans and drew polygons using the WDSS-II display [16] to select "bad" echo regions. An automated procedure used these human-generated polygons to classify every pixel into the two categories (precipitating and non-precipitating).

The data we have are not representative of the true a priori probabilities, since each of the scenarios is a rare event. Patterns are assigned different importance factors c_n (see Equation 3). It is easy to see that if the cost factors are positive integers, the cost factor can be moved out of the error equation by simply repeating the nth pattern c_n − 1 times. In addition to assigning different costs, we also wished to train the network with approximately the same number of patterns in both classes. Because our dataset is necessarily incomplete, we repeat the patterns so as to have a balanced distribution of patterns at every reflectivity value. In the velocity network (a proxy for pixels close to the radar), precipitating echoes are repeated d/20 times while non-precipitating echoes are repeated d/10 times, where d is the reflectivity value. Thus, AP with high reflectivity (examples of which are hard to find when training with very few radar volumes) is emphasized, as are strong reflectivity cores. In the no-velocity network, non-precipitating echoes are repeated 3d/5 times. As can be seen from Equation 3, the repeating of patterns has the same effect as imposing a cost factor on each pattern. We are, in essence, assigning a higher cost to misclassifying high-dBZ pixels than to misclassifying low-dBZ pixels. The histogram in Figure 1d shows the effect of this selective emphasis.

Some input vectors can be classified very easily because they are obvious. To avoid wasting CPU cycles both in the training stage and in the running stage, we pre-classify such pixels. Such pixels are not presented to the neural network in training, and pixels that match these criteria are pre-classified the same way at run-time as well. We discard shallow, low-reflectivity echoes and accept fast-moving and high-topped echoes.

In addition to emphasizing some pixels and pre-classifying others, we remove a third set of pixels from training altogether. In effect, we move them to an "ignore" category. These pixels are not presented to the network. The ignored pixels are those for which the echo size is less than 0.2. Because of the way the echo size is defined, small echo sizes are points associated with speckle and points at the boundaries of storms, where spatial statistics such as the mean and variance break down. To avoid the network expending cycles on these pixels, whose correct classification is not of paramount interest, these pixels are not part of the training at all.

In the process of training the networks, some of the computed inputs were removed and the neural network re-optimized. The probability of detection of precipitating echoes and the false alarm rates for both networks (with-velocity and reflectivity-only) were noted. If removing the input had no significant negative effect on the four statistics, the input was permanently removed.

Using this process, it was found that retaining just the mean and variance in the local neighborhood was enough – use of the median did not improve the capability of the neural network to learn the data, as measured by the probability of detection of precipitation and the false alarm rate. We also found that the use of the maximum in the local neighborhood hurt trainability.

This pruning was not done in a rigorous manner. In particular, the numerous textural features were not pruned. We did not experiment with varying the set of features used for each moment – it is likely that we could use a different subset of features for the velocity than for the spectrum width, for example. Examination of the histograms did not yield many insights, since it is likely that it is a combination of features that possesses actual discrimination ability.

The final set of features used in the network for which results are reported were:
1. Lowest scan of velocity, spectrum width and the second lowest scan of reflectivity: local mean, local variance, difference between the data value and the mean.
2. The lowest scan of reflectivity: local mean, local variance, difference between the data value and the local mean, REC texture [2], homogeneity, SPIN [7], number of inflections at a 2 dBZ threshold, SIGN [2], echo size.
3. Vertical profile of reflectivity: maximum value, weighted average, difference between data values at the two lowest scans, echo-top height at a 5 dBZ threshold.

During the process of training, we also discovered that one of the training cases was essentially untrainable. Rather than increase the complexity of the network, and risk poor generalization, we chose to omit part of this data case from the training. The original reflectivity data, and the trained network's output on the data, are shown in Figure 3.
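The reflectivity-dependent repetition scheme of Section II-D can be transcribed as follows. Treating the factors as integer truncations of d/20, d/10 and 3d/5, with a minimum of one presentation per pattern, is our assumption; the paper does not specify the rounding:

```python
def repetition_count(dbz, precipitating, has_velocity):
    """Times a training pattern is presented, per the rules in the text.

    dbz: the reflectivity value d; precipitating: the class label;
    has_velocity: whether the pattern goes to the with-velocity network.
    """
    d = max(int(dbz), 0)
    if has_velocity:
        # velocity network: precipitating repeated d/20, others d/10
        return max(d // 20, 1) if precipitating else max(d // 10, 1)
    # no-velocity network: only non-precipitating echoes are repeated
    return max(3 * d // 5, 1) if not precipitating else 1
```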

Fig. 3. Lowest scan of reflectivity from the KFWS radar at 1995/04/19 03:58:51 UTC and the resulting classification of a network that included this data set in its training regimen. The network can learn to distinguish the AP, but not the clear-air return to the south-east of the radar. The unlearnable part of this volume scan (shown by the polygon) was removed from the training of the neural network.

Finally, to improve the robustness of the local statistics being computed, we set to zero all pixels in the reflectivity fields which could conceivably have had a radar return (those pixels with a height below 12 km) but whose radar return was below the noise threshold (and was therefore set to missing). Thus, the only missing data values correspond to atmospheric regions which are not sensed by the radar at all.

Although the neural network computes the posterior probability that, given the input vector, the pixel corresponds to precipitating echoes, adjacent pixels are not truly independent. Hence, the final 2D polar grid of posterior probabilities is mean filtered, and it is this mean field that is used to perform quality control on the radar data. If the mean-field value is greater than 0.5, the pixel is assumed to have good precipitating data, and all elevations at that location are accepted. Bad data values are wiped out en masse, although some researchers (e.g. [7]) use data from higher elevations in such cases.

III. Results and Conclusions

A diverse set of volume scans of weather data was chosen and bad echoes were marked on these volume scans by a human observer. The volume scans were processed using the trained neural network and using the Radar Echo Classifier [2]. Comparisons were made on a pixel-by-pixel basis over all pixels for which at least one of the elevations had a reflectivity value greater than zero dBZ. Performance is evaluated using the Receiver Operating Characteristics (ROC) curve [15], shown in Figure 4. In the ROC curve, the area under the curve can be taken as a measure of classifier skill (with areas above 0.5 showing considerable skill). Several thresholds are marked, so the sensitivity of classifier performance to the choice of threshold, as well as the effect of different thresholds, may be gauged immediately.

Fig. 4. A ROC curve showing the performance of the neural network on the training and testing data sets. Also shown, for comparison, is the performance of the Radar Echo Classifier. Three thresholds are marked on each of the curves – a indicates a 0.25 threshold, x a 0.5 threshold and c a 0.75 threshold. Classifiers with curves above the dashed diagonal can be considered skilled. The closer a classifier is squashed to the left and top boundaries of the graph, the better it is.

A. Comparison with Cornelius

As mentioned in the introduction, [10] utilized a neural network to solve the same radar quality problem. The network developed in this paper has significantly better performance. The reasons probably include:
1. The choice of error function: we minimized a combination of cross-entropy and squared weights, whereas [10] minimized the mean absolute error, using the cross-entropy only as a stopping criterion. The cross-entropy is a better measure of performance for a classifier [12] and the use of weight decay allows greater generalization [13].
2. Our use of a separate validation set to determine stopping criteria, whereas [10] did not, relying instead on two measures of performance on the same data set.
3. We used 4 or 5 hidden nodes, whereas [10], with fewer inputs than our network, used 15 hidden nodes.
4. Our use of nearly all pixels (other than the pre-classified ones) in our data cases for training, whereas the pixels were chosen by hand or by random sampling by [10]. While a smaller selection improves training speed, the network is then not trained on the full diversity of the data.
5. Our use of costs (c_n in Equation 3) to direct the network to expend its training where the errors are less tolerable.
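The POD/FAR pairs that make up a ROC curve such as Figure 4 can be computed generically as below; this is a standard sketch with our own names, not the paper's evaluation code:

```python
import numpy as np

def roc_points(scores, labels, thresholds):
    """Probability of detection vs. false-alarm rate at given thresholds.

    scores: classifier posteriors in [0, 1]; labels: 1 for precipitation,
    0 for contamination. Returns one (FAR, POD) pair per threshold.
    """
    pts = []
    for th in thresholds:
        pred = scores >= th
        pod = np.mean(pred[labels == 1])  # probability of detection
        far = np.mean(pred[labels == 0])  # false-alarm rate
        pts.append((far, pod))
    return pts
```

Sweeping the threshold from 0 to 1 traces the curve; a skilled classifier stays above the FAR = POD diagonal.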

Fig. 5. Testing cases: (a) A data case with significant AP (b) Edited using the neural network (c) Edited using REC.
Note that some very high-reflectivity AP values remain. (d) Typical spring precipitation (e) Edited using the neural
network (f) Edited using REC. Note that quite a few good echoes have been removed from the stratiform rain region.

In contrast, the additional local neighborhood and vertical profile features we used provide only a small, incremental benefit.

As can be readily seen, the neural network outperforms the fuzzy logic automated technique of [2], one of a number of algorithms that perform similarly [17]. The first three images of Figure 5 show a case of significant AP/GC while the last three show a significant precipitation event. Looking at these images, it is possible to put the quantitative measures in context. We see that a lot of good data is misclassified by the Radar Echo Classifier. At the same time, the neural network makes its mistakes on lower reflectivity values, but gets higher reflectivity values (whether AP/GC or good data) correct more often. This is a consequence of how the network was trained.

IV. Acknowledgements

Funding for this research was provided under NOAA-OU Cooperative Agreement NA17RJ1227, the FAA Phased Array Research MOU, and National Science Foundation Grants 9982299 and 0205628.

References

[1] R. Fulton, D. Breidenbach, D. Miller, and T. O'Bannon, "The WSR-88D rainfall algorithm," Weather and Forecasting, vol. 13, pp. 377–395, 1998.
[2] C. Kessinger, S. Ellis, and J. Van Andel, "The radar echo classifier: A fuzzy logic algorithm for the WSR-88D," in 19th IIPS Conference, (Long Beach, CA), Amer. Meteo. Soc., 2003.
[3] G. Stumpf, C. Marzban, and E. Rasmussen, "The new NSSL mesocyclone detection algorithm: a paradigm shift in the understanding of storm-scale circulation detection," in 27th Conference on Radar Meteorology, 1995.
[4] K. McGrath, T. Jones, and J. Snow, "Increasing the usefulness of a mesocyclone climatology," in 21st Conference on Severe Local Storms, (San Antonio, TX), Amer. Meteo. Soc., 2002.
[5] V. Lakshmanan, R. Rabin, and V. DeBrunner, "Multiscale storm identification and forecast," J. Atmospheric Research, pp. 367–380, July 2003.
[6] V. Lakshmanan, A Hierarchical, Multiscale Texture Segmentation Algorithm for Real-World Scenes. PhD thesis, U. Oklahoma, Norman, OK, 2001.
[7] M. Steiner and J. Smith, "Use of three-dimensional reflectivity structure for automated detection and removal of non-precipitating echoes in radar data," J. Atmos. Ocea. Tech., no. 19, pp. 673–686, 2002.
[8] C. Venkatesan, S. Raskar, S. Tambe, B. Kulkarni, and R. Keshavamurty, "Prediction of all India summer monsoon rainfall using error-back-propagation neural networks," Meteorology and Atmospheric Physics, vol. 62, pp. 225–240, 1997.
[9] C. Marzban and G. Stumpf, "A neural network for tornado prediction," J. App. Meteo., vol. 35, p. 617, 1996.
[10] R. Cornelius, R. Gagon, and F. Pratte, "Optimization of WSR-88D clutter processing and AP clutter mitigation," tech. rep., Forecast Systems Laboratory, Apr. 1995.
[11] M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," in Proc. IEEE Conf. on Neural Networks, 1993.
[12] C. Bishop, Neural Networks for Pattern Recognition. Oxford, 1995.
[13] A. Krogh and J. Hertz, "A simple weight decay can improve generalization," in Advances in Neural Information Processing Systems (J. Moody, S. Hanson, and R. Lippmann, eds.), vol. 4, pp. 950–957, Morgan Kaufmann, 1992.
[14] D. J. C. MacKay, "A practical Bayesian framework for backprop networks," in Advances in Neural Information Processing Systems 4 (J. E. Moody, S. J. Hanson, and R. P. Lippmann, eds.), pp. 839–846, 1992.
[15] T. Masters, Practical Neural Network Recipes in C++. San Diego: Morgan Kaufmann, 1993.
[16] K. Hondl, "Current and planned activities for the warning decision support system-integrated information (WDSS-II)," in 21st Conference on Severe Local Storms, (San Antonio, TX), Amer. Meteo. Soc., 2002.
[17] M. Robinson, M. Steiner, D. Wolff, C. Kessinger, and R. Fulton, "Radar data quality control: Evaluation of several algorithms based on accumulating rainfall statistics," in 30th International Conference on Radar Meteorology, (Munich), pp. 274–276, Amer. Meteo. Soc., July 2001.