Recognition of Traffic Lights in Live Video Streams on Mobile Devices

SUBMISSION TO IEEE-TCSVT - SPECIAL ISSUE ON VIDEO ANALYSIS ON RESOURCE-LIMITED SYSTEMS

Recognition of Traffic Lights in Live Video Streams on Mobile Devices
Jan Roters, Xiaoyi Jiang, and Kai Rothaus

J. Roters, X. Jiang, and K. Rothaus are with the Department of Mathematics and Computer Science, University of Münster, Einsteinstrasse 62, 48149 Münster, Germany. E-mail: {jan.roters;xjiang;kai.rothaus}@uni-muenster.de

Copyright (c) 2011 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.

Abstract: A mobile computer vision system is presented that helps visually impaired pedestrians cross roads. The system detects pedestrian lights in the environment and gives feedback about the current phase of the crucial light. For this purpose, the live video stream of a mobile phone is analyzed in four steps: localization, classification, video analysis, and time-based verification. In particular, the temporal analysis helps to alleviate inherent problems such as occlusions (by vehicles) and falsified colors, and further increases the decision certainty over a period of time. Due to the limited resources of mobile devices, very efficient and precise algorithms have to be developed to ensure the reliability and the interactivity of the system. A prototype system was implemented on a Nokia N95 mobile phone and tested in real environments. It was trained to detect German traffic lights. For the prototype training and testing, we generated image and video databases including manually specified ground truth meta-data. These databases are described in this paper and are publicly available for the research community. A quantitative performance analysis is provided to demonstrate the reliability and interactivity of the prototype system.

I. INTRODUCTION

Sightless people are limited in mobility. In Germany alone, the number of people with visual disabilities increased by about 50% from 1985 to 2007. It is thus more important than ever to develop assistance systems that help visually impaired people participate in everyday life.

In this work, a system for mobile devices is presented that helps people with visual impairment cross roads at nearby traffic lights. Since guide dogs are too expensive and pedestrian lights are rarely equipped with acoustic or haptic signals, small mobile devices offer a cheap and handy alternative. Our contacts with an organization of visually impaired people clearly confirmed such a need, which initiated and motivated our work.

Our research has been motivated by two aspects: (1) the demand for cheap and easy-to-use assistance systems that help visually impaired people participate in everyday life, and (2) the possibilities of mobile vision offered by modern mobile computing devices equipped with cameras (e.g. smartphones or PDAs with cameras).

Mobile phones are becoming ubiquitous [1]. According to the International Telecommunication Union, mobile subscriptions rose from 1 billion in 2002 to approximately 4.6 billion at the end of 2009. In the Western world there are more mobile phones than inhabitants, and almost every current mobile device has a built-in camera. Recently, mobile devices have attracted substantial attention in the computer vision and multimedia community and have become an active research field [2]–[4]. Due to increasing computational power and memory capacity, more and more complex algorithms can run directly on mobile devices.

In this work we present a mobile vision system that detects pedestrian lights in live video streams to help pedestrians with visual impairment cross roads. In doing so, we face several challenges. Pedestrian lights stand on the opposite side of the street in an unknown environment. There can be more than one traffic light, or there may even be distracting lights from surrounding objects. Moreover, general issues typical of real-world applications, such as awkward light and weather conditions, must be handled by our system as well. Due to the limited resources of mobile devices, very efficient algorithms have to be developed, and the traffic lights have to be recognized in low-resolution, low-quality video streams.

For this purpose we define features of traffic lights, which we use to analyze single video frames and obtain the locations of all visible lights in the field of view. Afterwards, we classify them to identify the crucial light, i.e. the light that matters to the user. To increase the detection performance, we extend our approach to consecutive frames using video analysis on the live video stream. For these steps we pursue two main objectives:
   1) Interactivity: The system should perform fast, so that the user learns within a short time whether it is safe to cross at the pedestrian crossing.
   2) Reliability: A false positive green-light feedback (i.e. a red traffic light is shown but the user receives positive feedback to walk) should be avoided under all circumstances.
As a proof of concept, a prototype system was developed on a Nokia N95 mobile phone that gives the user feedback within a few seconds in real field tests. We tested the prototype in real environments, i.e. in normal situations, under different lighting (sunlight and dusk), and in awkward weather conditions (rainfall and snowfall).

Several interesting works have been reported in the field of mobile vision for supporting people with visual disabilities. For instance, Liu [5] presented a currency reader that can identify the value of U.S. paper currency. Wachenfeld et al. [6] used a mobile phone to read barcodes and to obtain related product information from the internet. A system that helps blind people choose clothes is presented in [7]. The prototype reported in [8] is a machine that reads a document for a person with visual impairment and responds to voice commands for control. Many such works have been reported at the conference series on Computers and Accessibility (ASSETS), Computers Helping People with Special Needs (ICCHP), Human-Computer Interaction (HCI), and other relevant conferences.
An early system that detects traffic lights was presented in 2004 by Aranda and Mares [9]. It used a portable PC in a backpack, equipped with a digital camera and a pair of headphones. The mobile system ‘Crosswatch’ [10] helps pedestrians at traffic intersections with zebra crossings orient themselves in the correct direction. In that work a prototype was developed to run at interactive frame rates on a Nokia camera phone. There are also approaches for maintaining the mobility of sightless people in indoor environments. In [11] a mobile navigation system is presented that uses special color markers to guide the user in a prepared indoor environment. Traffic light detection is not only helpful for pedestrians but is also an important task for driver assistance systems. In [12] traffic light detection is used to estimate crossroads in order to give the driver additional guidance information. Another vision-based traffic light detection system is presented in [13], where, however, a static camera position is assumed to simplify the detection approach.

This paper extends our previous work [14] from single-frame analysis to video analysis and time-based verification. To the best of our knowledge, this work is the first prototype of traffic light analysis reported in the literature that runs on a mobile phone and thus has the potential to be used by visually impaired people. In addition to the overall system architecture, we need to carefully design all system components to cope with the very limited resources of current mobile phones on the one hand, and to achieve sufficient performance in both accuracy and time on the other hand. In particular, the video analysis and time-based verification introduced in this paper substantially improve the performance of traffic light identification.

The remainder of this paper is organized as follows. In Section II, we specify the external restrictions as well as the challenges and problems of real-world conditions. Furthermore, we define the design of the traffic lights to be detected. The system architecture is presented in Section III, where we also discuss the mobile device used for the prototype and give further information about the image and video databases, which we generated for training and testing purposes and made publicly available. Thereafter, the four steps of the algorithm are described: localization (Sec. IV) and classification (Sec. V) on single frames of the video stream, the extension to video analysis (Sec. VI), and time-based verification (Sec. VII). In Section VIII we give example results and a quantitative performance assessment of our traffic light detection approach. Possible extensions are outlined and conclusions are given in Section IX.

II. PROBLEM SPECIFICATION

In this section we discuss how to use the program on the mobile device to get traffic lights into the field of view. Furthermore, we discuss the problems and restrictions one faces when detecting pedestrian lights. Finally, we specify the German pedestrian lights used in our design and field tests.

A. Program Usage

To use the program, we assume that the pedestrian knows the path to walk. For instance, such paths are trained with orientation and mobility specialists, who help their clients improve their skills in walking independently with a white cane and in becoming aware of their surroundings and their orientation.

At traffic intersections, orientation and mobility specialists usually teach their clients the locations of traffic lights that have acoustic or haptic signals. Without such signals, however, the pedestrian has to know the location of the traffic light pole on his side of the street. To use the traffic light detection, the pedestrian has to be within range of the traffic light pole (see Fig. 1).

To get the traffic light into the field of view, the mobile device is held in an upright position, pointing approximately toward the traffic light on the other side of the street. Since the user only knows the approximate direction, the mobile device may be panned slowly a few degrees left or right until the device tells the user that a traffic light has been detected. Furthermore, the user may take a few steps left or right to get another perspective.

An example of the program usage is shown in a demonstration video on the authors’ website at http://cvpr.uni-muenster.de/research/pedestrianlights.

Fig. 1. Program usage: pedestrian standing at the traffic light pole holding the mobile device in an upright position.

B. Real World Conditions

The development of a mobile vision system that detects pedestrian lights very accurately is challenging due to several real-world conditions. The chosen mobile capture device limits the possibilities of computer vision algorithms in several respects:
   1) The resolution of the capture device is relatively low.
   2) Mobile devices often provide only poor image quality, e.g. falsified colors and blurred images due to automatic white balance and autofocus.
   3) Computation power and memory resources are restricted.
Not only the capture device but also the objects to be captured impose restrictions by design and location:
   4) Pedestrian lights have different appearances in different countries and even for different manufacturers.
   5) The distance to the pedestrian lights can vary between approximately 4 and 24 meters. Therefore, the scale of a traffic light in an image is very small (see Fig. 2(a) and (b)).
   6) There may be many traffic lights in the image, but only one is crucial (see Fig. 2(c) and (d)).
Sight and light conditions may complicate traffic light detection:
   7) Traffic lights can be temporarily occluded by vehicles (see Fig. 2(e)).
   8) Traffic lights can be hardly visible in bad weather such as fog, heavy rain, or snowfall.
   9) The illumination conditions vary between night and daylight. Thus, the captured colors of a traffic light depend on the capture time (see Fig. 3).
Finally, the user of the system could hold the mobile capture device in an unfavorable position:
  10) The image could be captured with a non-negligible rotation (see Fig. 2(f)).

Problems related to camera failure and awkward ambient light situations will not be discussed because they are beyond the scope of this work. Furthermore, awkward weather conditions are excluded.

Fig. 2. Challenges of detecting traffic lights in images: (a) minimal distance, (b) maximal distance, (c+d) two traffic lights, (e) occlusion, and (f) rotation (from [14]).

Fig. 3. Difficult illumination: (a) dusk, (b) frontlighting, and (c) night.

C. Specification of Pedestrian Lights in Germany

Due to the different appearances of pedestrian lights in different countries or even, to some extent, different cities, we restrict our system to detecting one chosen pedestrian light design. It should be possible to adapt the system to other designs and to choose the correct pedestrian light recognition system according to the GPS signal of the mobile phone. The necessary adaptations will be discussed in Section IX.

Our prototype system was trained to detect the pedestrian lights that occur in most German cities (see Fig. 4). For the remainder of this paper, the following features of a pedestrian light are assumed as preconditions:
   1) Shape: rectangular with an aspect ratio of 1/2, 1/3, or 1/4.
   2) Color arrangement: at the bottom there is one green light; at the top/middle there are one or two red lights. At the top there is an optional blinking white light, which our approach ignores.
   3) Circuitry: either the red or the green light is switched on.
   4) Background: the majority of the traffic light is dark.
   5) Design: the possible shapes of the green or red lights are limited.
   6) Installation: mounted on a vertical pole at a height of approximately 2.15 meters, at a distance between 4 and 24 meters.

Fig. 4. Examples of pedestrian light design in Germany: (a) two lights, (b) three lights, (c) three lights and an additional “please wait” sign.

III. SYSTEM ARCHITECTURE

Our traffic light detection pipeline (see Fig. 5) consists of two concurrent steps and an additional step that combines them. In the first concurrent step, we identify the crucial pedestrian light in the field of view in the most recent frame of the video stream. It consists of localization followed by classification. In the localization step we try to filter out all image regions that may contain traffic lights. The classification step decides which regions contain pedestrian lights and which light is crucial to the user.

The video analysis concurrently computes the crucial light location independently of the traffic light identification. For this purpose, the location of the crucial light in the previous frame is used to track its location in the most recent frame.

Time-based verification helps us improve the pedestrian’s safety. Since we expect the crucial light locations computed by the two approaches to be similar, this verification step compares the results of the concurrent steps. After a short period of successful comparisons, feedback for the user is generated.

[Fig. 5 diagram: captured frames t0, ..., ti feed the identification branch (localization, Sec. IV; classification, Sec. V) and the video analysis branch (frames ti-1, ti; Sec. VI); both results enter time-based verification (Sec. VII), which outputs the feedback "Green".]

Fig. 5. Overview of the traffic light detection pipeline. On the left, the input frames of the live video stream are presented. In the middle, the single-frame analysis (top) and the extension to video analysis (bottom) are shown. On the right, both results are compared and feedback is generated.

A. Prototype System

As a proof of concept, a prototype system was developed for a Nokia N95 mobile phone. In the community of blind people, Nokia mobile phones are very common due to the large variety of available software, e.g. screen readers and mobile reading and shopping assistants.

The N95 is equipped with a 330 MHz ARM processor and 18 MB of available RAM. A built-in autofocus camera takes photographs with up to 5 MP. This device offers three capture modes:
   1) Take photographs (up to 2592 × 1944) automatically when the previous one is finished.
   2) Use the video stream with a resolution of up to 640 × 480.
   3) Take the viewfinder video stream with 320 × 240 resolution. This is the stream that is shown on the display while recording videos or taking pictures.
Due to the preparation time between two photographs, mode 1) cannot provide interactivity, even at a low capture resolution. The video stream 2) is encoded in YUV 4:2:0 planar format, whose major drawback is chroma subsampling: luminance is stored for every pixel, but only one color (chrominance) sample is shared by each block of 2 × 2 pixels. As we will see later, we need correct color values during localization (Sec. IV-A). Our work is directly based on the detection of the red and green traffic light colors. Thus, the RGB color model is more intuitive, and we use the video stream of the viewfinder 3), which provides RGB data.

B. Pedestrian Light Databases

We have built two databases for training and testing purposes. One holds images and the other holds video sequences. Both contain pedestrian crossings with traffic lights and were captured from positions where pedestrians have to wait for a green signal. Both databases are publicly available for the research community at http://cvpr.uni-muenster.de/research/pedestrianlights.

A ground truth segmentation was made manually, storing all visible pedestrian lights. Furthermore, the crucial light is marked and the phases of the traffic lights (red or green) are given. Table I presents the statistics of the databases: the total number of images is divided into the number of images with a crucial red light, the number with a crucial green light, and the number without a crucial light.

The images without a crucial light comprise the images without any traffic light and the images with at least one traffic light but without a crucial one.

Furthermore, each database contains images with a dangerous constellation, i.e. a crucial red light and an additional green light.

The video database is made up of 14 image sequences. Each sequence represents a video stream with approximately 8 frames per second and between 99 and 853 images.

IV. LOCALIZATION OF PEDESTRIAN LIGHTS

The localization approach presented in this section can be considered as a sequence of filter and refinement operations on single frames of the video stream. As mentioned before, traffic lights have specific features (i.e. shape, arrangement, circuitry, design, background, installation). Each of these features could be used in a dedicated filter algorithm to localize traffic light candidates.

Although a parallel combination of these filters can achieve a highly accurate recognition rate, i.e. high reliability, it would require too much computational power to ensure interactivity. Note that it is much faster to verify whether a feature is valid for a specific candidate than to inspect all
possible image regions according to that feature. In this section we thus present an approach to localizing possible traffic lights in low-resolution images in a sequential architecture (see Fig. 6). This architecture provides interactivity but also high reliability. Furthermore, it is robust against the scale of traffic lights and, to some degree, against rotation.

As a first step of our localization procedure, a red and a green color filter are applied (Sec. IV-A). After a connected component analysis, we evaluate the size and the circuitry of each component to reduce false positives (Sec. IV-B). In Section IV-C we explain the next step: examination of the background color. The optional last step is a shape-based segmentation of the pedestrian light (see Sec. IV-D).

At the end of this section we optimize the parameters of the traffic light localization (see Sec. IV-E) and investigate its rotational robustness (see Sec. IV-F).

TABLE I
GROUND TRUTH STATISTICS OF THE IMAGE AND THE VIDEO DATABASE

Statistic                                               Image database   Video database
images (total)                                                     501             5635
images with red crucial light                                      309             3822
images with green crucial light                                    184             1675
images without a crucial light                                       8              138
images without a crucial light but with another light                5               20
images without any traffic light                                     3              118
images with dangerous constellations                                 9              127
images with more than one traffic light                            165             4262
red lights (total)                                                 424             6891
green lights (total)                                               244             2888

A. Red and Green Color Filter

The most significant feature of traffic lights is the bright color of the lamps. Due to the increased use of LED lights in traffic lights, this color is very specific. In this step we search for such colors in the region of interest, i.e. the limited region when the vertical line filter is applied, or otherwise the whole image. To this end, the color of each pixel is checked against a set of filter rules. We use the RGB color space, since it is the default color space on most mobile devices and a conversion to another color space is time-consuming.

itself). Another cluster is located along the red color direction, and the rest of the samples are introduced by noise. We therefore estimate a Gaussian mixture model in 3D with four components: a black cluster, a gray cluster, a red cluster, and a noise cluster (see Fig. 7(b)). Since red is the most significant color for detecting red lights, we only keep the Gaussian distribution of the red cluster (see Fig. 7(c)).

The green color samples (see Fig. 7(d)) are distributed in three significant portions. Similar to the red distribution, we estimate a Gaussian mixture model in 3D with three components. One cluster is near the gray axis of the RGB cube, and another contains values with low intensities (see Fig. 7(e)). Only the remaining cluster contains the green colors that occur in the lamps of the traffic lights. Thus, only this Gaussian component is kept for the green light (see Fig. 7(f)).

(2) Design the filter rules: Here we only discuss the color filter for red traffic lights; similar filter rules apply to green traffic lights. The Gaussian distribution of the red cluster is defined by its mean color µ = (0.48, 0.06, 0.07) and the three eigenvectors v1, v2, and v3 corresponding to the eigenvalues λ1 = 0.0590, λ2 = 0.0032, and λ3 = 0.0005.

A color c = (r, g, b) is considered a red traffic light color if and only if the following three rules are fulfilled:

$$ I_{\mathrm{red}}(c) := c \cdot v_1 \ge th_{\mathrm{red},1} \tag{1} $$

$$ (c - \mu) \cdot v_2 \le th_{\mathrm{red},2} \cdot I_{\mathrm{red}}(c) \tag{2} $$

$$ \left| (c - \mu) \cdot v_3 \right| \le th_{\mathrm{red},3} \tag{3} $$

This means that the red intensity I_red, i.e. the projection onto the dominant axis v1, should be bounded from below (Eq. (1)). Furthermore, the distance from the red intensity axis along v2, toward the gray diagonal, should be limited relative to the red intensity (Eq. (2)). The third rule is motivated by the observation that the distribution along v3 is very tight; hence the deviation of c from µ along this direction is thresholded (Eq. (3)).

The resulting red traffic light region in the RGB cube is wedge-shaped with a missing apex. Figure 7 shows examples for the red (Fig. 7(c)) and the green (Fig. 7(f)) color clusters with thresholds th1 = 0.20, th2 = 0.25, and th3 = 0.07.

(3) Optimize parameters: The image database was divided into two disjoint sets, the training set and the validation set. To optimize the parameters, we apply the whole localization approach to the training data with different parameter settings and take the best setting (see Sec. IV-E).

The responses of the color filters are represented by a binary image, where 1 corresponds to a positive filter result and 0 to a pixel that, according to its color, is not part of a traffic light lamp. As a post-processing step, we apply a morphological
   Figure 7 shows a plot of red (a) and green (d) traffic light
                                                                      closing and compute the connected components.
colors, which are extracted from the ground truth. In the
following we explain how to establish the color filters for
the traffic lights based on the extracted colors in three steps:      B. Segmentation using Size and Circuitry
(1) analyze the color distribution of ground truth, (2) design           During the last step we have identified pixels that have the
fast and valuable parameterized filter rules, (3) optimize the        desired color to be part of a traffic light lamp. These pixels
parameters.                                                           are already grouped to connected components.
   (1) Analyze the data: One portion of the red color samples            We assume that the crucial traffic light is between 4 and 24
in Figure 7(a) is distributed along the gray axis of the RGB-         meters away (see Sec. II-B). In our setting with a small and
cube (one cluster near black and one cluster along the axis           fixed focal length of the mobile cameras this range corresponds
to a width of the traffic light between 2.5 and 15 pixels. Due to the known possible aspect ratios of 1/2, 1/3, or 1/4 (see Sec. II-C), we also know the possible corresponding heights. These parameters can be used to filter out regions that are too small or too large by thresholding the size of the connected components.
   Because of the circuitry, only one of the two lamps, red or green, is switched on at a time. Connected components containing both red and green pixels therefore cannot be part of a valid traffic light. Furthermore, vertically adjacent connected components of different colors represent dangerous constellations. Thus, all such candidates are rejected.
   As a post-processing step we merge two vertically adjacent red connected components, since a red light may consist of two lamps. The size of the merged component is again checked against the size constraint.

Fig. 6. Sequential combination scheme for localization from left to right: (1) input color image, (2) color filter response in green and red, resp., (3) color regions after pruning, (4) dark filter response in black, search region in blue, initial bounding boxes in light blue, (5) localized traffic lights. [Diagram: Image → Red/Green Color Filter (Sec. IV-A) → Connected Components → Size/Circuitry Filter (Sec. IV-B) → Light Spot Candidates → Background Color Filter (Sec. IV-C) → Traffic Light Locations and Regions.]

Fig. 7. Red (a) and green (d) traffic light colors from ground truth. Clustering of the red (b) and green (e) samples visualized by the mean colors of the respective cluster. Complete filter for red (c) and green (f) colors.

C. Background Color Filter

   The results of the previous step are connected components of adequate sizes and colors. We know that the green lamp below a lit red lamp is switched off, and vice versa. This fact enables us to implement a background filter, which inspects the image region below a red light candidate and above a green light candidate. In our system the search region has the same size as the connected component it belongs to (half its height if two red components were merged). If this search region contains no dark pixels, the candidate is rejected.
   In our implementation a pixel $p$ counts as dark if

$$ I(p) \le th_{red,dark} \quad \text{or, resp.,} \quad I(p) \le th_{green,dark} \tag{4} $$

where $I(p) = (R(p) + G(p) + B(p))/3$ is the intensity of pixel $p$, and $th_{red,dark}$ and $th_{green,dark}$ are darkness thresholds. The result of this step is a so-called initial bounding box around each traffic light candidate. The
candidate box is spanned by the lamp-colored connected components and the search region of the background color filter.
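The per-pixel red test of Eqs. (1)-(3) from Sec. IV-A takes only a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation:

- RGB values are assumed to be scaled to [0, 1].
- The mean MU and the orthonormal axes V1, V2, V3 are hypothetical placeholders. The paper obtains them from the ground-truth color distribution, and their values are not reproduced here.
- The default thresholds are the optimized red values reported in Sec. IV-E.

```python
import math

# Hypothetical PCA result for red lamp colors (RGB scaled to [0, 1]).
# The paper derives the mean and the axes from ground truth; these values
# are placeholders that are merely orthonormal and roughly "red".
MU = (0.6, 0.2, 0.2)
V1 = tuple(x / math.sqrt(11) for x in (3, 1, 1))   # dominant (red intensity) axis
V2 = tuple(x / math.sqrt(22) for x in (-2, 3, 3))  # points toward the gray diagonal
V3 = tuple(x / math.sqrt(2) for x in (0, -1, 1))   # narrowest direction


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def is_red_lamp_color(c, th1=0.3, th2=0.15, th3=0.028):
    """Return True if the color c = (r, g, b) fulfills rules (1)-(3)."""
    intensity = dot(c, V1)                       # Eq. (1): I_red(c) = c . v1
    if intensity < th1:
        return False                             # too dark
    d = tuple(ci - mi for ci, mi in zip(c, MU))  # c - mu
    if dot(d, V2) > th2 * intensity:             # Eq. (2): bound grows with I_red
        return False
    return abs(dot(d, V3)) <= th3                # Eq. (3): tight along v3
```

In this sketch the single-projection rule (1) is evaluated first, so dark pixels are rejected before the mean-centered projections are computed. The green test would have the same structure, with its own mean, axes and thresholds.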

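Sections IV-B and IV-C operate on connected components of the binary filter response. The minimal Python sketch below uses hypothetical function names throughout. It does two things:

- It labels 4-connected components by breadth-first search.
- It applies the darkness test of Eq. (4) to a search region of the same size directly below a red component, or above a green one.

It rests on these assumptions and omissions:

- Images are nested lists of RGB triples scaled to [0, 1].
- The default threshold is the optimized value 0.19 from Sec. IV-E, interpreted on that assumed scale.
- The morphological closing, the size and circuitry checks, and the half-height search region for merged red components are left out for brevity.

```python
from collections import deque


def connected_components(mask):
    """Return inclusive bounding boxes (x0, y0, x1, y1) of the 4-connected
    regions of 1-pixels in a binary mask given as a list of rows."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:  # breadth-first flood fill of one component
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def intensity(pixel):
    """I(p) = (R(p) + G(p) + B(p)) / 3, as in Eq. (4)."""
    r, g, b = pixel
    return (r + g + b) / 3.0


def passes_background_filter(image, box, color, th_dark=0.19):
    """Keep a lamp candidate only if the equally sized region below a red
    lamp (above a green lamp) contains at least one dark pixel."""
    x0, y0, x1, y1 = box
    height = y1 - y0 + 1
    if color == "red":
        rows = range(y1 + 1, y1 + 1 + height)
    else:
        rows = range(y0 - height, y0)
    for y in rows:
        if 0 <= y < len(image):  # rows outside the image are skipped
            for x in range(x0, x1 + 1):
                if intensity(image[y][x]) <= th_dark:
                    return True
    return False
```

In a full pipeline, each box returned by connected_components would first pass the size and circuitry checks of Sec. IV-B before the background test is applied.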
D. Shape-Based Segmentation

   We have already localized possible traffic light candidates by their lamp color, size, arrangement and background color. In this last step we aim to segment the traffic lights according to their rectangular shape. We assume that the rotation angle of the capture is small (about ±10°). A traffic light region should fulfill the following constraints:
   1) Traffic light and background are contained.
   2) The aspect ratio is between 1/4 and 1/2.
   3) Most pixels (e.g. 80%) belong either to the light or to the background.
   4) The width of the region lies between 2 and 15 pixels.
To ease the computation we consider axis-parallel rectangular regions only. The task can then be modeled as an optimization problem: find the region of maximal size that fulfills all constraints.
   This optimization is, however, time-consuming, since many possible regions have to be considered for each traffic light. Therefore, our implementation uses a fast but suboptimal region-growing approach. The initial bounding box (see Sec. IV-C) is first expanded simultaneously to the left and to the right. We stop as soon as the left or right border contains too many non-background pixels. After computing the vertical borders, we apply an analogous technique to find the top and the bottom of the traffic light.
   Even with this suboptimal but fast strategy, the last step slows the system down so much that an interactive application is impossible on our hardware. Furthermore, the computation of the borders is not very robust. Since the benefit of this segmentation is negligible compared to its computational cost, we omit the segmentation step. In future settings the segmentation might become profitable; for instance, a model-based verification [14] requires a segmented region. Therefore, we keep the segmentation as an optional step in our localization pipeline.

E. Parameter Optimization of Traffic Light Localization

   In this section we discuss the optimization of the parameters of our localization approach. Our traffic light detection algorithm depends on eight main parameters, four for each color (red and green light, resp.). These two parameter groups are optimized separately. In our experiments we sample each parameter in 10 steps, which yields $10^4$ different parameter settings for each color. Using our ground truth, we measure the quality of a setting by counting the number of correctly detected traffic lights (TP), falsely detected traffic lights (FP), and missed traffic lights (FN). A traffic light is detected correctly if its initial bounding box lies completely inside the segmented bounding box from ground truth. For this comparison the ground-truth box is extended by 2 pixels in each direction to tolerate small deviations.
   We have divided the image database into two disjoint sets. The first set (300 images) is used for training. With the remaining 201 images we verify the performance of our approach.

Fig. 8. Recall and precision for the localization of (a) red and (b) green traffic lights (from [14]). [Plots: recall in % over precision in %.]

   In the following we optimize the parameter groups for red and subsequently for green traffic lights using the training set of our ground truth database. Finally, we validate these optimizations on the validation set.
   1) Optimize Parameters for Red Traffic Lights: Missing a red light could cause serious problems. Our optimization criterion is therefore to maximize the precision subject to a bounded miss rate. Fig. 8(a) shows the performance of the investigated red parameter settings. We require a recall¹

$$ R = \frac{TP}{TP + FN} \tag{5} $$

of at least 75% and choose the setting with the best precision²

$$ P = \frac{TP}{TP + FP}. \tag{6} $$

The optimization yields the parameters $th_{red,1} = 0.3$, $th_{red,2} = 0.15$, $th_{red,3} = 0.028$, and $th_{red,dark} = 0.19$. With a recall of 76.0%, a precision of 89.5% is achieved. This optimized performance is marked by a black asterisk in Fig. 8(a).

¹ also called true positive rate
² also called positive predictive value or, occasionally, detection rate

   2) Optimize Parameters for Green Traffic Lights: For the green parameter set, the optimization is instead based on a bounded precision, since reporting a false green light is the dangerous error (cf. Sec. V-A). The precision equals 100% if and only if no false green light is detected. We allow at most 1.5% of the green detections to be false positives (i.e. P ≥ 98.5%) and choose the parameter vector yielding the best recall. Fig. 8(b) shows the performance of the investigated green parameter settings. The best thresholds of the green filter are $th_{green,1} = 0.2$, $th_{green,2} = 0.15$, $th_{green,3} = 0.05$, and $th_{green,dark} = 0.19$. With these parameters we achieve a recall of about 85.0% (see black asterisk in Fig. 8(b)).
   3) Validate the Localization Results: As mentioned before, the validation set consists of 201 images that are not used during the parameter optimization. We validate the localization approach with the optimized parameters on all visible traffic lights in the images of the validation set. For the red lights we achieve a recall of R = 71.8% and a precision of P = 87%. For the green traffic lights a recall of R = 83.3% and a precision of P = 92.6% are achieved. The true positives, false positives and false negatives are listed in Table II.
   Overall, 90 false negative and false positive detections occur for the 267 traffic lights in the validation set of 201 images, which corresponds to an error of 33.7%. This error seems very high. It is mostly caused by very small, undetected
traffic lights in the background, which were mostly not the crucial light. To gain deeper insight, we investigated the error ratio for the crucial light. The error of missing a crucial red traffic light is about 3.8% (5 of 132 crucial lights missed). In comparison, 8 of 66 crucial green lights were missed (approx. 12.1%). Consequently, the error of missing the crucial traffic light is considerably lower than the overall error.

TABLE II
TRUE POSITIVES, FALSE POSITIVES AND FALSE NEGATIVES OF THE LOCALIZATION STEP FOR 177 RED AND 90 GREEN TRAFFIC LIGHTS OF THE VALIDATION SET CONSISTING OF 201 IMAGES

                    red    green
true positive       127       75
false positive       19        6
false negative       50       15

Fig. 9. Rotational robustness of the localization approach for red traffic lights (similar for green lights). [Plot: count of true positive detections over rotation angle in degrees, from −60° to 60°.]

F. Rotational Robustness

   Experiments on rotational robustness showed that rotation angles of up to ±10° only slightly affect the performance of our approach (see Fig. 9). For these tests we rotated the images in both directions with linear subsampling. We report the angular range in which the result remains stable.
   Including all images (training and validation set) in this test scenario, we identify 328 (i.e. 77.4%) of the red and 206 (i.e. 84.4%) of the green traffic lights without rotation. For rotations of up to ±10°, 254 red and 180 green traffic lights are recognized. This means that the localization remains stable for 77.4% (254/328) of the red and 87.4% (180/206) of the green lights compared to the case without rotation. There are several reasons why rotation affects the localization result:
   1) The search region of the background color filter (see Sec. IV-C) contains more (bright) pixels that do not belong to the traffic light region. This situation occurs mostly when the traffic lights are far away and the search region is small.
   2) When two red components are merged, image rotation increases the width of the merged component, so that the size filter (see Sec. IV-B) may reject the candidate.

V. CLASSIFICATION OF PEDESTRIAN LIGHTS

   The localization procedure (Sec. IV) results in a set of traffic light candidates $TLC_1, \ldots, TLC_k$. In this section we discuss how to select the correct candidate (see Fig. 10) in the current frame of the video stream. The available features are the position and size of each traffic light candidate in the image. If the segmentation step of the localization pipeline is left out, we use the initial bounding box as segmentation.

Fig. 10. Sequential combination scheme for classification. Selection of the crucial traffic light in the image. [Diagram: Traffic Light Locations → Selection Filter (Sec. V-A) → Crucial Traffic Light.]

   In this section we describe how to select the traffic light that is crucial for the pedestrian (Sec. V-A). Furthermore, the performance of the identification approach is presented in Section V-B. At the end of this section some example results are shown and discussed (Sec. V-C).

A. Selection of the Crucial Light

   Due to perspective, the crucial traffic light should be the largest and highest of all traffic lights in the image. These two simple criteria are used to select the crucial traffic light. More precisely, we report a traffic light candidate $TLC_i$ as crucial if all of the following constraints are true:
   • $TLC_i$ is the widest traffic light.
   • $TLC_i$ has the smallest distance from the top of the image.
   • No other traffic light has a distance from the top of the image similar to that of $TLC_i$.
For the third condition, two traffic lights are considered to have a similar distance from the top of the image if their distances differ by less than 10 pixels.
   The color of such a traffic light $TLC_i$ is obvious, since its region contains exactly one type of traffic light color, either red or green. If no $TLC_i$ fulfills all constraints, no pedestrian light has been found. Different failures are possible. The catastrophic error is reporting a green light during a red phase. Reporting no traffic light or a false red light are errors that reduce convenience but do not affect the user's safety.

B. Performance of Classification

   In Section IV-E we optimized the parameters of the localization based on recall and precision. In this section the performance of identifying the crucial pedestrian light is presented on the training set.
   The performance for detecting the crucial traffic light is presented in Fig. 11 using ROC curves. Here, the true positive rate is plotted against the number of false positives, and the standard deviation is visualized by vertical lines. Our optimized parameter setting (the black asterisk) leads to a stable recognition of the crucial traffic light. As desired, the number of false positives is very small for green light detection: a wrong crucial green light is reported in only 2 cases (precision of 98.1%), while a recall (i.e. true positive rate) of 86.3% is maintained. The performance of red traffic light detection is similar: a wrong red traffic light is reported in 4 cases (precision of 97.4%) at a recall of 86.3%.
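The three selection rules of Sec. V-A translate almost directly into code. The Candidate record and its fields in the minimal Python sketch below are hypothetical. The box width stands in for "widest", and the top coordinate y is the distance from the top of the image. The 10-pixel similarity margin is the value given in the text. The function returns None when no candidate satisfies all three rules, i.e. when no pedestrian light is reported.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    x: int       # left edge of the (initial) bounding box in pixels
    y: int       # top edge, i.e. distance from the top of the image
    width: int
    height: int
    color: str   # "red" or "green"


def select_crucial(cands: Sequence[Candidate], margin: int = 10) -> Optional[Candidate]:
    """Return the crucial traffic light, or None if no candidate qualifies."""
    if not cands:
        return None
    widest = max(c.width for c in cands)
    highest = min(c.y for c in cands)
    for i, c in enumerate(cands):
        if c.width != widest or c.y != highest:
            continue  # rules 1 and 2: widest and closest to the top
        others = (o for j, o in enumerate(cands) if j != i)
        if all(abs(o.y - c.y) >= margin for o in others):
            return c  # rule 3: no other light at a similar height
    return None
```

Because the returned candidate contains exactly one lamp color by construction, a caller could read the current phase directly from its color field.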
Fig. 11. ROC curve for detecting the crucial (a) red and (b) green traffic light. The light gray markers represent the performance of each parameter set, the black line the mean values and the vertical gray lines the standard deviations. The black asterisk marks the optimized parameter set. [Plots: true positive in % over number of false positives; x-axis 0 to 100 in (a) and 0 to 50 in (b).]
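The constrained parameter search of Sec. IV-E, whose per-setting results appear as markers in Figs. 8 and 11, can be sketched as an exhaustive grid search. This minimal Python sketch rests on these assumptions:

- evaluate is a hypothetical callback that runs the localization on the training set and returns the counts (TP, FP, FN).
- The parameter ranges passed to optimize are also assumptions, since the sampled ranges are not stated here.

Each parameter is sampled in 10 steps, so four parameters give $10^4$ settings per color. The red criterion maximizes precision subject to R ≥ 75%. The green criterion maximizes recall subject to P ≥ 98.5%.

```python
import itertools


def recall(tp, fn):
    """Eq. (5): R = TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp, fp):
    """Eq. (6): P = TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def grid(lo, hi, steps=10):
    """Sample [lo, hi] in `steps` equidistant values."""
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


def optimize(evaluate, ranges, objective, constraint, steps=10):
    """Exhaustive search over the Cartesian product of the sampled ranges.
    Returns (best_params, best_score), or (None, None) if nothing is feasible."""
    best, best_score = None, None
    for params in itertools.product(*(grid(lo, hi, steps) for lo, hi in ranges)):
        tp, fp, fn = evaluate(params)
        r, p = recall(tp, fn), precision(tp, fp)
        if not constraint(r, p):
            continue  # setting violates the bound on recall or precision
        score = objective(r, p)
        if best_score is None or score > best_score:
            best, best_score = params, score
    return best, best_score


# Selection criteria from Sec. IV-E.
RED = dict(objective=lambda r, p: p, constraint=lambda r, p: r >= 0.75)
GREEN = dict(objective=lambda r, p: r, constraint=lambda r, p: p >= 0.985)
```

As a consistency check, the validation counts of Table II give recall(127, 50) ≈ 0.718 and precision(127, 19) ≈ 0.870 for red. For green they give recall(75, 15) ≈ 0.833 and precision(75, 6) ≈ 0.926. These match the values reported in Sec. IV-E.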

C. Results of Traffic Light Identification

   Our validation set consists of 201 images that are not used during the parameter optimization. We fixed the parameters and applied the approach to this validation set. For red traffic lights we obtain a precision of 96.5% and a recall of 83.3%. The precision for green traffic lights is 98.3% and the recall is 90.8%. We report a wrong crucial traffic light 5 times and falsely report no traffic light in 28 of the validation images. This corresponds to an overall miss rate of 16.4% (33 of 201).
   Fig. 12 depicts some results produced with our traffic light identification approach. All traffic light candidates are marked with a white frame, and the reported crucial one additionally with a blue frame. The first two results (a-b) show perfect recognitions, even under dark illumination (a) or with a bright traffic light color (b).
   However, there are still some limitations, which we present in Figure 12(c-d). If the traffic lamps are captured with low saturation (c), the traffic light may be missed. Sometimes the scene is contradictory (d).
   Sometimes noisy objects are detected as traffic light candidates (see Fig. 12(e-f)). For instance, objects in trees (e) may be identified as traffic light candidates. Such situations are much more difficult, since these objects may be located above the crucial traffic light. Template matching could reduce such false positives, but it is currently not integrated in our system. An additional template matching step could also help with transversely mounted street traffic lights (see Fig. 12(f)).
   Some problems (e.g. (d) and (f)) are caused by a poor perspective angle and can be corrected by changing the viewpoint, as shown in (g-h). In the next section we discuss an extension to the video stream which, among other things, reduces the effect of poor perspective.

VI. VIDEO ANALYSIS

   The identification of the crucial traffic light in single images was described in Sections IV and V. The traffic lights in the image were localized and, thereafter, the crucial light was selected. In this section the traffic light detection is extended

vehicles. After a few moments these vehicles will have passed the crossing and the detection can be repeated.
   2) Falsified Colors
      In some situations the automatic illumination correction falsifies the traffic light colors. By moving the camera and repeating the traffic light identification, a result may still be obtained. Even slight movements give the camera the chance to readjust its automatic settings, such as white balance and exposure.
   3) Contradictory Scene
      Two traffic lights close to each other may be contradictory (see Fig. 2(d)). In such situations no feedback can be given, since wrong feedback could be very dangerous. By changing the perspective, the scene may be resolved so that a decision becomes possible.
   4) Repeating Results
      In a video stream the same identification result may repeat. If this happens several times in succession, the certainty that it is correct increases.
To make use of the video stream, we want to track the crucial light between consecutive frames to improve the performance of the system. The tracking could be used in two different ways: (1) Track the crucial traffic light in the following frames to save the computation time of re-localization and re-classification. (2) Apply the localization and classification in every frame and, in addition, track the crucial light between two consecutive frames; afterwards, compare the two determined positions of the traffic light. Whereas the first approach improves interactivity, the second improves reliability. In our system we choose way (2), the time-based verification, since false positive detections should be avoided under all circumstances.
   In our setting the distance between the mobile device and the traffic light is at least 4 meters. Since we assume that the user does not change position very fast and that the rotation angle is at most ±10°, we can neglect 3D perspective changes, scaling, and rotation in the tracking approach. Thus, we only have to deal with translation between two frames, and we can use motion estimation algorithms to obtain the position of the crucial light in a new frame.
from single images to video streams due to the following                                     For the remainder of this section the objective is to estimate
reasons:                                                                                  a motion vector which defines the translation between two
   1) Temporary Occlusion                                                                 consecutive frames in the proximity of the crucial light. With
      Objects that occlude the crucial traffic light are big                              this vector the location of the crucial light in the new frame is
Recognition of Traffic Lights in Live Video Streams on Mobile Devices
SUBMISSION TO IEEE-TCSVT - SPECIAL ISSUE ON VIDEO ANALYSIS ON RESOURCE-LIMITED SYSTEMS                                                                              10

               (a)                                         (b)                                          (c)                                         (d)

                (e)                                          (f)                                         (g)                                         (h)
Fig. 12. Results of the localization and classification. The found traffic lights are marked with a white border. An additional blue border marks the crucial
light. (a-b) perfect result, the crucial traffic light was located and classified correctly. (c) no traffic light reported due to failure of localization. (d) decision
could not be made due to classification. (e-f) noisy objects. Change of perspective with different result between (g) and (h). More results are presented in [14].

determined easily. Approaches that estimate the motion vector using phase correlation [15], or more complex methods such as the computation of optical flow [16], cannot be used interactively on mobile devices because of their high computational burden. To estimate the translation between two frames, we therefore compute feature points in the first frame around the crucial light location and search for corresponding points in the following frame. A suitable algorithm for motion estimation on hand-held devices was presented in [17], in which a multi-resolution scheme is used to search for features in the image. However, since in our system another complex algorithm (for traffic light localization and classification) has to run on each frame, an even faster approach is needed. For our purpose we use the KLT tracker [18], because it detects features that are good to track. We reduced the computation time of the tracker by searching for good features only in a small area around the crucial traffic light candidate (30 pixels in each direction). To match the feature points, we define the features as small fixed-size areas around the points. These features are searched for in the second frame within a specified radius around the initial position of the feature point, and they are correlated using the sum of absolute differences.
   In our setting we use several thresholds. To find good parameter values, we tested different settings. These thresholds were not optimized using the ground truth meta-data, but were set according to the following trade-off: the whole approach should run at between 5 and 10 frames per second, and the tracking should be subjectively stable. The final parameter set was determined in live field tests of our prototype system. The size of the small areas around the points is 5 × 5 pixels. We search for the 5 best feature points and, for each of them, search within a small radius of 30 pixels in each direction for the matched position. The displacement vectors are the differences between the old positions and the new ones. These displacement vectors are combined into a single displacement vector that describes the image translation: the resulting vector is the mean of all similar displacement vectors if and only if at least 3 displacement vectors have nearly the same value, i.e. a maximum Euclidean difference of 4 pixels. Otherwise, if fewer than 3 such vectors are found, the motion estimation has failed.
   With the presented approach and the given thresholds, a stable motion estimation was realized. After this computation we obtain the location of the crucial traffic light in the current frame independently of the traffic light identification step (see Fig. 5).

                    VII. TIME-BASED VERIFICATION

   In this section we verify the results of the concurrent steps of our traffic light detection system (see Sections IV, V and VI). The main focus is the reduction of false
positive detections. For this purpose we introduce the state queue, which allows a verification over time.
   We have to combine two results: (1) the identified crucial pedestrian light from the localization (Sec. IV) and classification (Sec. V) steps; (2) the tracked traffic light location from the video analysis (Sec. VI). Based on our observations, we assume that the two locations match if their distance is less than 5 pixels. Four scenarios are possible:
   1) Traffic light identification and video analysis are successful, and the location of the crucial light from the identification step matches the one estimated by the video analysis.
   2) Traffic light identification and video analysis are successful, but the locations differ.
   3) Video analysis succeeds, but the traffic light identification step fails (i.e. localization or classification).
   4) Video analysis fails (i.e. the motion could not be estimated).
   These scenarios are mapped to the state queue in the following way. Case 1) is the only positive case; it is mapped to a red or green state, depending on the traffic light phase in the current frame.
   Case 3) is not critical even though the traffic light identification fails in the current frame, because the motion estimation succeeds and the traffic light can possibly be verified in the following frames. Such results are represented by a black state.
   The remaining two cases are critical and are mapped to a blue state. In case 4), the identification result cannot be verified because the motion estimation, which is the basis of our verification approach, has failed; thus we cannot be sure enough that the detected pedestrian light is the crucial one. If case 2) occurs, either the identification or the tracking has detected a false crucial light. For example, the motion estimation may point to the current crucial traffic light, which is red, whereas the identification result points to a green light in the background.
   With these states we can verify the traffic light detection over time. For this purpose we build a queue, called the state queue, that stores the states of the last SQsize combination results. It differs from a normal queue in two ways: access to all elements is allowed, and when the queue is full and a new state is pushed, the oldest element is removed automatically.
   A feedback to stay (color c = red) or to walk (color c = green) is given if and only if the crucial light of color c is identified in the current video frame and the following conditions are fulfilled:
   1) At least SQmin correct traffic light detections with the same color c are required, counted from the last inserted red or green state.
   2) These occurrences must not be interrupted by a blue state.
   3) Between these occurrences, color c must not switch from red to green or from green to red.
If at least one of these conditions is not fulfilled, no feedback is given and the pedestrian should wait for a feedback.
   Consequently, black states do not directly influence the feedback decision derived from the state queue. Only when there are more than SQsize − SQmin black states is no feedback given as a result of these states, since fewer than SQmin slots of the queue then remain for red or green states.

                        VIII. RESULTS

   In this section we present results of our traffic light recognition system. In particular, we compare the results of the traffic light identification on single images with those obtained with the additional video analysis and time-based verification.
   The state queue for the presented results was configured with SQsize = 10 and SQmin = 5 for the following reasons. On the one hand, a feedback should be given within one second (with optimal detection), since we want to process at least 5 frames per second. On the other hand, a larger state queue size SQsize may store outdated feedbacks. With SQsize = 10 and the minimum rate of 5 frames per second, the given feedback is at most one second old.
   Example results of the whole traffic light detection approach with video analysis and time-based verification are shown in Figures 13, 14 and 15. In these visualizations, the dark box marks the crucial traffic light detected by the localization and classification steps, and the blue box represents the result of the motion estimation. Below each image, the state queue is shown with its color states, and the feedback appears in the bottom area.
   In the introduction of the paper we declared our two main objectives: reliability and interactivity. In the following we discuss the results with regard to these two objectives. A short video demonstrating our system in a real environment can be found at http://cvpr.uni-muenster.de/research/pedestrianlights.

A. Reliability
   As mentioned before, reliability is the most important design criterion for the traffic light localization and classification in single images. We optimized the parameter values to prevent false positive green light detections under all circumstances and thus to achieve a high precision, which reduces dangerous feedbacks to a minimum. Moreover, we achieved a high recall for red traffic lights, so that red lights are not missed.
   With the additional video analysis, our system reached a better precision for green and red traffic lights, but a lower recall (see Table III). With video analysis and time-based verification, we observed no false positive feedback in 5635 frames, for neither red nor green traffic lights, i.e. the system responds very safely. The drawback of this reliability improvement is the number of false negative feedbacks, which limits the interactivity (see Sec. VIII-B).

                              TABLE III
     RECALL AND PRECISION OF THE WHOLE SYSTEM FOR THE VIDEO DATABASE

                        red       green
        recall          52.4%     55.3%
        precision       100%      100%

   Table IV shows another representation of the results. The numbers printed in bold show the reliability improvement of the system with video analysis and time-based verification over the system without them. In 93.1% of all images, neither the single image analysis nor the video analysis gave a false positive feedback
of a green crucial light. In 6.9% of the images, the single image analysis falsely detected a green crucial traffic light, whereas the video analysis rejected it. Since the precision is 100%, both remaining values are 0%. This means that the video analysis did not produce any extra error when the single image analysis correctly detected green lights. Moreover, there is no image in which both detected a false candidate.

                              TABLE IV
     COMPARISON OF FALSE POSITIVE GREEN LIGHT DETECTIONS BETWEEN VIDEO ANALYSIS AND SINGLE IMAGE ANALYSIS. THE BOLD MARKED NUMBERS SHOW THE IMPROVEMENT OF VIDEO ANALYSIS. 6.9% OF THE DETECTED GREEN LIGHTS WERE FALSELY DETECTED WITH SINGLE IMAGE ANALYSIS AND REJECTED BY VIDEO ANALYSIS. THE VIDEO ANALYSIS DID NOT PRODUCE EXTRA ERROR (0%) WHEN THE SINGLE IMAGE ANALYSIS CORRECTLY DETECTS GREEN LIGHTS.

                                          single images          single images
                                          not false positive     false positive
     video sequences not false positive   93.1%                  6.9%
     video sequences false positive       0%                     0%

   This power of temporal analysis was observed not only with the video database, but also in many additional tests under real conditions using the prototype system. In numerous such field tests we did not observe a single situation in which a false positive response was produced.
   An example with very good detection results is shown in Figure 13. During the phase switch, no feedback is given for SQmin frames.
   Figure 14 shows a situation in which the system would have failed without video analysis and verification. In 3 of the 6 frames the system would have given a feedback to walk, although the crucial light is red. The video analysis prevents these false positive responses.

B. Interactivity
   The second main objective of our system is interactivity. We measure the interactivity by the number of feedbacks of the system. With video analysis and time-based verification, feedbacks are given as described in Section VII. For the bare traffic light identification step, each result is interpreted as a feedback in our interactivity results.
   The additional steps of the temporal analysis potentially reduce the interactivity, since we have to wait for at least SQmin verified frames before giving a feedback (see Fig. 13). Furthermore, if the motion estimation fails, the verification starts from scratch.
   The interactivity of the system is presented in Table V, which reports the number of frames between two consecutive feedbacks. For the system with the additional video analysis and time-based verification, the overall frame count between two feedbacks is 1.8 on average, with a standard deviation of 9.2 frames. Assuming between 5 and 10 frames per second and a stable traffic light recognition, a feedback is normally given within 2 seconds. The mean interactivity of the bare traffic light identification step is similar, at 1.1 frames, but its standard deviation is 18 times smaller. Whereas the whole system normally provides a feedback within 2 seconds, the traffic light identification alone normally gives between 4 and 8 feedbacks per second.
   Of the 14 available sequences, 2 appear to be outliers (see Table V). Except for sequences 1 and 4, the means and standard deviations are small; furthermore, the maximum frame count between two feedbacks is less than 38.
   Sequences 1 and 4 indicate that there are situations in which our system does not provide interactive feedback. In sequence 1, the user would not get a response from the system for 397 frames, i.e. between 40 and 80 seconds. This is caused by rapidly alternating false positive and false negative detections and, furthermore, by the reactions of the motion estimation; see Figure 15 for an example from sequence 1. It is important to note that although the interactivity is decreased in this case, no false positive feedbacks are given. For safety reasons, this is the correct behavior with respect to our main design criterion.
   Figure 14 (seq. 4) shows another benefit of the video analysis. In (b) and (c), false crucial green lights were identified and rejected by the video analysis. The false candidate of (c) is even confirmed once more in (d), but it is rejected by the time-based verification. Without video analysis, 3 false feedbacks would have been given (a-d). Instead, our system gives no feedback and waits for more reliable detections.

                    IX. DISCUSSION AND CONCLUSIONS

   A system was presented for detecting traffic lights for visually impaired pedestrians on a mobile device. As a proof of concept, a prototype designed for German pedestrian lights was developed for a Nokia N95 mobile phone and tested in a real environment. It runs at about 5 to 10 frames per second, so that in general a feedback is given within a few seconds. We tested this prototype in several situations, e.g. rainfall, snowfall, dusk, frontlighting, etc. On the one hand, we did not observe a false positive feedback; on the other hand, the number of missed traffic lights increased considerably. In our field tests, power consumption turned out not to be a problem: the mobile device running our prototype system was active for about 2 hours without running out of battery.
   Several challenges have been tackled: low image quality and resolution, restricted computational power and memory resources, scalability, rotational robustness, temporary occlusion, and the selection of the crucial traffic light. In particular, the temporal analysis turns out to be powerful in enhancing the system performance. Overall, a good trade-off between interactivity and reliability has been achieved.
   Used with sufficient caution, the presented prototype in its current state would in fact improve the safety of visually impaired pedestrians. For instance, if the user gets the signal to walk, he could signal his intention to cross to the drivers by holding his white cane over the street at a raised angle that the drivers can see. With this kind of signaling, the user would be safer using the prototype than having no information about the phase of the traffic lights.
   Working with devices of limited resources like mobile phones is always an art of making compromises between