Location Privacy in Pervasive Computing

SECURITY & PRIVACY
As location-aware applications begin to track our movements in the name of convenience, how can we protect our privacy? This article introduces the mix zone—a new construction inspired by anonymous communication techniques—together with metrics for assessing user anonymity.

Alastair R. Beresford and Frank Stajano
University of Cambridge

Many countries recognize privacy as a right and have attempted to codify it in law. The first known piece of privacy legislation was England's 1361 Justices of the Peace Act, which legislated for the arrest of eavesdroppers and stalkers. The Fourth Amendment to the US Constitution proclaims citizens' right to privacy, and in 1890 US Supreme Court Justice Louis Brandeis stated that "the right to be left alone" is one of the fundamental rights of a democracy.1 The 1948 Universal Declaration of Human Rights2 declares that everyone has a right to privacy at home, with family, and in correspondence. Other pieces of more recent legislation follow this principle. Although many people clearly consider their privacy a fundamental right, comparatively few can give a precise definition of the term. The Global Internet Liberty Campaign3 has produced an extensive report that discusses personal privacy at length and identifies four broad categories: information privacy, bodily privacy, privacy of communications, and territorial privacy.

This article concentrates on location privacy, a particular type of information privacy that we define as the ability to prevent other parties from learning one's current or past location. Until recently, the very concept of location privacy was unknown: people did not usually have access to reliable and timely information about the exact location of others, and therefore most people could see no privacy implications in revealing their location, except in special circumstances. With pervasive computing, though, the scale of the problem changes completely. You probably do not care if someone finds out where you were yesterday at 4:30 p.m., but if this someone could inspect the history of all your past movements, recorded every second with submeter accuracy, you might start to see things differently. A change of scale of several orders of magnitude is often qualitative as well as quantitative—a recurring problem in pervasive computing.4

We shall focus on the privacy aspects of using location information in pervasive computing applications. When location systems track users automatically on an ongoing basis, they generate an enormous amount of potentially sensitive information. Privacy of location information is about controlling access to this information. We do not necessarily want to stop all access—because some applications can use this information to provide useful services—but we want to be in control.

Some goals are clearly mutually exclusive and cannot be simultaneously satisfied: for example, wanting to keep our position secret and yet wanting colleagues to be able to locate us. Despite this, there is still a spectrum of useful combinations to be explored.

Our approach to this tension is a privacy-protecting framework based on frequently changing pseudonyms so users avoid being identified by the locations they visit. We further develop this framework by introducing the concept of mix zones and showing how to map the problem of location privacy onto that of anonymous communication. This gives us access to a growing body of theoretical tools from the information-hiding community. In this context, we describe two metrics that we have developed for measuring location privacy, one based on anonymity sets and the other based on entropy. Finally, we move from theory to practice by applying our methods to a corpus of more than three million location sample points obtained from the Active Bat installation at AT&T Labs Cambridge.5

Related Work: Location Awareness and Privacy

The first wide-scale outdoor location system, GPS,1 lets users calculate their own position, but the flow of information is unidirectional; because there is no back-channel from the GPS receiver to the satellites, the system cannot determine the user's computed location, and does not even know whether anyone is accessing the service. At the level of raw sensing, GPS implicitly and automatically gives its users location privacy.

In contrast, the first indoor location system, the Active Badge,2 detects the location of each user and broadcasts the information to everyone in the building. The system as originally deployed assumes anyone in the building is trustworthy; it therefore provides no mechanisms to limit the dissemination of individuals' location information. Ian W. Jackson3 modified the Active Badge system to address this issue. In his version, a badge does not reveal its identity to the sensor detecting its position but only to a trusted personal computer at the network edge. The system uses encrypted and anonymized communication, so observation of the traffic does not reveal which computer a given badge trusts. The badge's owner can then use traditional access control methods to allow or disallow other entities to query the badge's location.

More recent location systems, such as Spirit4 and QoSDream,5 have provided applications with a middleware event model through which entities entering or exiting a predefined region of space generate events. Applications register their interest in a particular set of locations and locatables and receive callbacks when the corresponding events occur. Current location-aware middleware provides open access to all location events, but it would be possible to augment this architecture to let users control the dissemination of their own location information.

So far, few location systems have considered privacy as an initial design criterion. The Cricket location system6 is a notable exception: location data is delivered to a personal digital assistant under the sole control of the user.

Commercial wide-area location-based services will initially appear in mobile cellular systems such as GSM. BTexact's Erica system7 delivers sensitive customer information to third-party applications. Erica provides an API for third-party software to access customer billing information, micropayment systems, customer preferences, and location information. In this context, individual privacy becomes much more of a concern. Users of location-aware applications in this scenario could potentially have all their daily movements traced.

In a work-in-progress Internet draft that appeared after we submitted this article for publication, Jorge R. Cuellar and colleagues also explore location privacy in the context of mobile cellular systems.8 As we do, they suggest using pseudonyms to protect location privacy. Unlike us, however, they focus on policies and rules—they do not consider attacks that might break the unlinkability that pseudonyms offer.

REFERENCES

1. I. Getting, "The Global Positioning System," IEEE Spectrum, vol. 30, no. 12, Dec. 1993, pp. 36-47.

2. R. Want et al., "The Active Badge Location System," ACM Trans. Information Systems, vol. 10, no. 1, Jan. 1992, pp. 91-102.

3. I.W. Jackson, "Anonymous Addresses and Confidentiality of Location," Information Hiding: First Int'l Workshop, R. Anderson, ed., LNCS, vol. 1174, Springer-Verlag, Berlin, 1996, pp. 115-120.

4. N. Adly, P. Steggles, and A. Harter, "SPIRIT: A Resource Database for Mobile Users," Proc. ACM CHI'97 Workshop on Ubiquitous Computing, ACM Press, New York, 1997.

5. H. Naguib, G. Coulouris, and S. Mitchell, "Middleware Support for Context-Aware Multimedia Applications," Proc. 3rd Int'l Conf. Distributed Applications and Interoperable Systems, Kluwer Academic Publishers, Norwell, Mass., 2001, pp. 9-22.

6. N.B. Priyantha, A. Chakraborty, and H. Balakrishnan, "The Cricket Location-Support System," Proc. 6th Int'l Conf. Mobile Computing and Networking (Mobicom 2000), ACM Press, New York, 2000, pp. 32-43.

7. P. Smyth, Project Erica, 2002, www.btexact.com/erica.

8. J.R. Cuellar, J.B. Morris, and D.K. Mulligan, Geopriv Requirements, Internet draft, Nov. 2002, www.ietf.org/internet-drafts/draft-ietf-geopriv-reqs-01.txt.

Problem, threat model, and application framework

In the pervasive computing scenario, location-based applications track people's movements so they can offer various useful services. Users who do not want such services can trivially maintain location privacy by refusing to be tracked—assuming they have the choice. This has always been the case for our Active Badge (see the "Related Work" sidebar) and Active Bat systems but might not be true for, say, a nationwide network of face-recognizing CCTV cameras—an Orwellian dystopia now dangerously close to reality. The more challenging problem we explore in this article is to develop techniques that let users benefit from location-based applications while at the same time retaining their location privacy.

To protect the privacy of our location information while taking advantage of location-aware services, we wish to hide our true identity from the applications receiving our location; at a very high level, this can be taken as a statement of our security policy.


Users of location-based services will not, in general, want information required by one application to be revealed to another. For example, patients at an AIDS testing clinic might not want their movements (or even evidence of a visit) revealed to the location-aware applications in their workplace or bank. We can achieve this compartmentalization by letting location services register for location callbacks only at the same site where the service is provided. But what happens if the clinic and workplace talk about us behind our back? Because we do not trust these applications and have no control over them, we must assume they might collude against us to discover the information we aim to hide. We therefore regard all applications as one global hostile observer.

Some location-based services, such as "when I am inside the office building, let my colleagues find out where I am," cannot work without the user's identity. Others, such as "when I walk past a coffee shop, alert me with the price of coffee," can operate completely anonymously. Between those extremes are applications that cannot be accessed anonymously—but do not require the user's true identity either. An example of this third category might be "when I walk past a computer screen, let me teleport my desktop to it." Here, the application must know whose desktop to teleport, but it could conceivably do this using an internal pseudonym rather than the user's actual name.

Clearly, we cannot use applications requiring our true identity without violating our security policy. Therefore, in this article, we concentrate on the class of location-aware applications that accept pseudonyms, and our aim will be the anonymization of location information.

In our threat model, the individual does not trust the location-aware application but does trust the raw location system (the sensing infrastructure that can position locatables). This trust is certainly appropriate for systems such as GPS and Cricket, in which only the user can compute his or her own location. Its applicability to other systems, such as the Active Bat, depends on the trust relationships between the user and the entity providing the location service.

We assume a system with the typical architecture of modern location-based services, which are based on a shared event-driven middleware system such as Spirit or QoSDream (see the "Related Work" sidebar). In our model, the middleware, like the sensing infrastructure, is trusted and might help users hide their identity. Users register interest in particular applications with the middleware; applications receive event callbacks from the middleware when the user enters, moves within, or exits certain application-defined areas. The middleware evaluates the user's location at regular periodic intervals, called update periods, to determine whether any events have occurred, and issues callbacks to applications when appropriate.

Users cannot communicate with applications directly—otherwise they would reveal their identity straight away. We therefore require an anonymizing proxy for all communication between users and applications. The proxy lets applications receive and reply to anonymous (or, more correctly, pseudonymous) messages from the users. The middleware system is ideally placed to perform this function, passing user input and output between the application and the user. For example, to benefit from the application-level service of "when I walk past a coffee shop, alert me with the price of coffee," the user requests the service, perhaps from the coffee shops' trade-association Web site, but using her location system middleware as the intermediary so as not to reveal her identity. The coffee shop application registers with the user's location system and requests event callbacks for positive containment in the areas in front of each coffee shop. When the user steps into a coffee shop event callback area, the coffee shop application receives a callback from the user's location system on the next location update period. The coffee shop service then sends the current price of coffee as an event message to the registered user via the middleware without having to know the user's real identity or address.
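To make this interaction concrete, here is a minimal sketch, in Python, of the pseudonymous callback pattern just described. The class and method names (LocationMiddleware, register_callback, send_to_pseudonym, and so on) are our own illustrative inventions, not part of Spirit, QoSDream, or any real middleware API; the point is only that the application ever sees a pseudonym and a zone, never a real identity or address.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Zone:
    """An application zone, modeled here as an axis-aligned rectangle."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

class LocationMiddleware:
    """Trusted middleware acting as the anonymizing proxy between users
    and applications: applications only ever see pseudonyms."""

    def __init__(self):
        self._callbacks = []   # (zone, handler) registrations
        self._inbox = {}       # pseudonym -> messages relayed back to the user

    def register_callback(self, zone, handler):
        self._callbacks.append((zone, handler))

    def update(self, pseudonym, x, y):
        """Evaluate one sighting at the end of an update period."""
        for zone, handler in self._callbacks:
            if zone.contains(x, y):
                handler(pseudonym, zone)

    def send_to_pseudonym(self, pseudonym, message):
        self._inbox.setdefault(pseudonym, []).append(message)

middleware = LocationMiddleware()
shop_front = Zone("coffee-shop-front", 0.0, 0.0, 2.0, 2.0)

def on_containment(pseudonym, zone):
    # The application replies via the proxy, never learning a real identity.
    middleware.send_to_pseudonym(pseudonym, f"{zone.name}: coffee is 1.50 today")

middleware.register_callback(shop_front, on_containment)
middleware.update("p-7f3a", x=1.2, y=0.8)   # inside the zone, so the callback fires
print(middleware._inbox["p-7f3a"])
```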
Using a long-term pseudonym for each user does not provide much privacy—even if the same user gives out different pseudonyms to different applications to avoid collusion. This is because certain regions of space, such as user desk locations in an office, act as "homes" and, as such, are strongly associated with certain identities. Applications could identify users by following the "footsteps" of a pseudonym to or from such a "home" area. At an earlier stage in this research, we tried this kind of attack on real location data from the Active Bat and found we could correctly de-anonymize all users by correlating two simple checks: First, where does any given pseudonym spend most of its time? Second, who spends more time than anyone else at any given desk? Therefore, long-term pseudonyms cannot offer sufficient protection of location privacy.

Our countermeasure is to have users change pseudonyms frequently, even while they are being tracked: users adopt a series of new, unused pseudonyms for each application with which they interact. In this scenario, the purpose of using pseudonyms is not to establish and preserve reputation (if we could work completely anonymously instead, we probably would) but to provide a return address. So the problem, in this context, is not whether changing pseudonyms causes a loss of reputation but more basically whether the application will still work when the user changes under its feet. Our preliminary analysis convinced us that many existing applications could be made to work within this framework by judicious use of anonymizing proxies. However, we decided to postpone a full investigation of this issue (see the "Directions for Further Research" sidebar) and concentrate instead on whether this approach, assuming it did not break applications, could improve location privacy. If it could not, adapting the applications would serve no purpose.

Users can therefore change pseudonyms while applications track them; but then, if the system's spatial and temporal resolution were sufficiently high (as in the Active Bat), applications could easily link the old and new pseudonyms, defeating the purpose of the change. To address this point, we introduce the mix zone.

Directions for Further Research

During our research on location privacy, several interesting open problems emerged.

Managing application use of pseudonyms
A user who does not wish to be tracked by an application will want to use different pseudonyms on each visit. Many applications can offer better services if they retain per-user state, such as personal preferences; but if a set of preferences were accessed by several different pseudonyms, the application would easily guess that these pseudonyms map to the same user. We therefore need to ensure that the user state for each pseudonym looks, to the application, different from that of any other pseudonym.

There are two main difficulties. First, the state for a given user (common to all that user's pseudonyms) must be stored elsewhere and then supplied to the application in an anonymized fashion. Second, the application must not be able to determine that two sets of preferences map to the same user, so we might have to add small, random variations. However, insignificant variations might be recognizable to a hostile observer, whereas significant ones could adversely affect semantics and therefore functionality.

Reacting to insufficient anonymity
You can use the methods described in this article to compute a quantitative measure of anonymity. How should a user react to a warning that measured anonymity is falling below a certain threshold? The simple-minded solution would be for the middleware to stop divulging location information, but this causes applications to become unresponsive.

An alternative might be to reduce the size or number of the application zones in which a user has registered. This is hard to automate: either users will be bombarded with warnings, or they will believe they are still receiving service when they are not. In addition, removing an application zone does not instantaneously increase user anonymity; it just increases the size or number of available mix zones. The user must still visit them and mix with others before anonymity increases.

Reconciling privacy with application functionality is still, in general, an open problem.

Improving the models
It would be interesting to define anonymity sets and entropy measurements from the perspective of what a hostile observer can see and deduce, instead of counting the users in the mix zone. The formalization, which we have attempted, is rather more complicated than what is presented in this article, but it might turn out to provide a more accurate location privacy metric.

Dummy users
We could increase location privacy by introducing dummy users, similar to the way cover traffic is used in mix networks, but there are side effects when we apply this technique to mix zones. Serving dummy users, like processing dummy messages, is a waste of resources. Although the overhead might be acceptable in the realm of bits, the cost might be too high with real-world services such as those found in a pervasive computing environment. Dummy users might have to control physical objects—opening and closing doors, for example—or purchase services with electronic cash. Furthermore, realistic dummy user movements are much more difficult to construct than dummy messages.

Granularity
The effectiveness of mixing depends not only on the user population and on the mix zone's geometry, but also on the sensing system's update rate and spatial resolution. At very low spatial resolutions, any "location" will be a region as opposed to a point and so might be seen as something similar to a mix zone. What are the requirements for various location-based applications? How does their variety affect the design of the privacy protection system? Several location-based applications would not work at all if they got updates only on a scale of hours and kilometers, as opposed to seconds and meters.

Scalability
Our results are based on experimental data from our indoor location system, covering a relatively small area and user population. It would be interesting to apply the same techniques to a wider area and larger population—for example, on the scale of a city—where we expect to find better anonymity. We are currently trying to acquire access to location data from cellular telephony operators to conduct further experiments.

Mix zones

Most theoretical models of anonymity and pseudonymity originate from the work of David Chaum, whose pioneering contributions include two constructions for anonymous communications (the mix network6 and the dining cryptographers algorithm7) and the notion of anonymity sets.7

In this article, we introduce a new entity for location systems, the mix zone, which is analogous to a mix node in communication systems. Using the mix zone to model our spatiotemporal problem lets us adopt useful techniques from the anonymous communications field.

Mix networks and mix nodes
A mix network is a store-and-forward network that offers anonymous communication facilities. The network contains normal message-routing nodes alongside special mix nodes. Even hostile observers who can monitor all the links in the network cannot trace a message from its source to its destination without the collusion of the mix nodes.


Figure 1. A sample mix zone arrangement with three application zones. The airline agency (A) is much closer to the bank (B) than the coffee shop (C). Users leaving A and C at the same time might be distinguishable on arrival at B.
In its simplest form, a mix node collects n equal-length packets as input and reorders them by some metric (for example, lexicographically or randomly) before forwarding them, thus providing unlinkability between incoming and outgoing messages. For brevity, we must omit some essential details about layered encryption of packets; interested readers should consult Chaum's original work. The number of distinct senders in the batch provides a measure of the unlinkability between the messages coming in and going out of the mix.
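The batching behaviour just described can be sketched as follows. This toy MixNode is our own illustration, assuming fixed-length packets and a random reordering; it deliberately omits the layered encryption that Chaum's construction requires.

```python
import secrets

class MixNode:
    """Toy mix node: collect a batch of equal-length packets, then flush them
    in random order so that input and output positions cannot be correlated.
    The layered encryption used in Chaum's construction is omitted."""

    def __init__(self, batch_size: int, packet_len: int = 512):
        self.batch_size = batch_size
        self.packet_len = packet_len
        self._batch = []

    def submit(self, packet: bytes) -> list:
        """Queue one packet; return a reordered batch once it is full."""
        if len(packet) != self.packet_len:
            raise ValueError("packets must be padded to equal length")
        self._batch.append(packet)
        if len(self._batch) < self.batch_size:
            return []
        out, self._batch = self._batch, []
        # Fisher-Yates shuffle driven by a cryptographic RNG
        for i in range(len(out) - 1, 0, -1):
            j = secrets.randbelow(i + 1)
            out[i], out[j] = out[j], out[i]
        return out
```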
Chaum later abstracted this last observation into the concept of an anonymity set. As paraphrased by Andreas Pfitzmann and Marit Köhntopp, whose recent survey generalizes and formalizes the terminology on anonymity and related topics, the anonymity set is "the set of all possible subjects who might cause an action."8 In anonymous communications, the action is usually that of sending or receiving a message. The larger the anonymity set's size, the greater the anonymity offered. Conversely, when the anonymity set reduces to a singleton—the anonymity set cannot become empty for an action that was actually performed—the subject is completely exposed and loses all anonymity.

Mix zones
We define a mix zone for a group of users as a connected spatial region of maximum size in which none of these users has registered any application callback; for a given group of users there might be several distinct mix zones. In contrast, we define an application zone as an area where a user has registered for a callback. The middleware system can define the mix zones a priori or calculate them separately for each group of users as the spatial areas currently not in any application zone.

Because applications do not receive any location information when users are in a mix zone, the identities are "mixed." Assuming users change to a new, unused pseudonym whenever they enter a mix zone, applications that see a user emerging from the mix zone cannot distinguish that user from any other who was in the mix zone at the same time and cannot link people going into the mix zone with those coming out of it.

If a mix zone has a diameter much larger than the distance the user can cover during one location update period, it might not mix users adequately. For example, Figure 1 provides a plan view of a single mix zone with three application zones around the edge: an airline agency (A), a bank (B), and a coffee shop (C). Zone A is much closer to B than C, so if two users leave A and C at the same time and a user reaches B at the next update period, an observer will know the user emerging from the mix zone at B is not the one who entered the mix zone at C. Furthermore, if nobody else was in the mix zone at the time, the user can only be the one from A. If the maximum size of the mix zone exceeds the distance a user covers in one period, mixing will be incomplete. The amount to which the mix zone anonymizes users is therefore smaller than one might believe by looking at the anonymity set size. The two "Results" sections later in this article discuss this issue in more detail.

A quantifiable anonymity measure: The anonymity set
For each mix zone that user u visits during time period t, we define the anonymity set as the group of people visiting the mix zone during the same time period.

The anonymity set's size is a first measure of the level of location privacy available in the mix zone at that time. For example, a user might decide a total anonymity set size of at least 20 people provides sufficient assurance of the unlinkability of their pseudonyms from one application zone to another. Users might refuse to provide location updates to an application until the mix zone offers a minimum level of anonymity.

Knowing the average size of a mix zone's anonymity set, as opposed to its instantaneous value, is also useful. When a user registers for a new location-aware service, the middleware can calculate from historical data the average anonymity set size of neighboring mix zones and therefore estimate the level of location privacy available. The middleware can present users with this information before they accept the services of a new location-aware application.

There is an additional subtlety here: introducing a new application could result in more users moving to the new application zone. Therefore, a new application's average anonymity set might be an underestimate of the anonymity available because users who have not previously visited the new application zone's geographical area might venture there to gain access to the service.
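As an illustration of how such instantaneous and average values could be computed, the following sketch groups a log of sightings into update periods. The (timestamp, user, zone) tuple layout and the function names are assumptions of ours, not the format used by any particular location system; empty periods are excluded, for the reasons discussed in the "Results" section later on.

```python
from collections import defaultdict

def anonymity_sets(sightings, zone, period_s):
    """Group sightings of one mix zone into update periods and return, for each
    period, the set of users seen in the zone during that period.
    `sightings` is an iterable of (timestamp_s, user_id, zone_name) tuples."""
    per_period = defaultdict(set)
    for t, user, z in sightings:
        if z == zone:
            per_period[int(t // period_s)].add(user)
    return per_period

def mean_anonymity_set_size(sightings, zone, period_s):
    """Average cardinality over occupied periods only: empty periods say nothing
    about how well the zone mixes its occupants, so they are excluded."""
    sizes = [len(users) for users in anonymity_sets(sightings, zone, period_s).values()]
    return sum(sizes) / len(sizes) if sizes else 0.0

# Tiny made-up log: two users share the hallway in the first period.
log = [(0, "u1", "hallway"), (3, "u2", "hallway"), (70, "u1", "hallway")]
print(mean_anonymity_set_size(log, "hallway", period_s=60))   # -> 1.5
```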

Figure 2. Floor plan layout of the laboratory (labeled areas: north zone, south zone, west zone, east zone and stairwell, corridor, and hallway). Combinations of the labeled areas form the mix zones z1, z2, and z3, described in the main text.

Experimental data
We applied the anonymity set technique to location data collected from the Active Bat system5 installed at AT&T Labs Cambridge, where one of us was sponsored and the other employed. Our location privacy experiments predate that lab's closure, so the data we present here refers to the original Active Bat installation at AT&T. The Laboratory for Communications Engineering at the University of Cambridge, our current affiliation, has since redeployed the system.

The system locates the position of bats: small mobile devices each containing an ultrasonic transmitter and a radio transceiver. The laboratory ceilings house a matrix of ultrasonic receivers and a radio network; the timing difference between ultrasound reception at different receivers lets the system locate bats with less than 3 cm error 95 percent of the time. While in use, typical update rates per bat are between one and 10 updates per second.

Almost every researcher at AT&T wore a bat to provide position information to location-aware applications and colleagues located within the laboratory. For this article, we examined location sightings of all the researchers, recorded over a two-week period between 9 a.m. and 5 p.m.—a total of more than 3.4 million samples. All the location sightings used in the experiment come from researchers wearing bats in normal use. Figure 2 provides a floor plan view of the building's first floor. The second- and third-floor layouts are similar.

Results
Using the bat location data, we simulated longer location update periods and measured the size of the anonymity sets for three hypothetical mix zones:

• z1: first-floor hallway
• z2: first-floor hallway and main corridor
• z3: hallway, main corridor, and stairwell on all three floors

We chose to exclude zero-occupancy cases from our measure of anonymity set size, for the following reason. In general, a high number of people in the mix zone means greater anonymity and a lower number means less. However, this is only true for values down to one, which is the least-anonymous case (with the user completely exposed); it is not true for zero, when no users are present. While it is sensible to average anonymity set size values from one, it is rather meaningless to average values that also include zero.

Figure 3 plots the size of the anonymity set for the hallway mix zone, z1, for update periods of one second to one hour. A location update period of at least eight minutes is required to provide an anonymity set size of two. Expanding z1 to include the corridor yields z2, but the results (not plotted) for this larger zone do not show any significant increase in the anonymity level provided.

Mix zone z3, encompassing the main corridors and hallways of all three floors as well as the stairwell, considerably improves the anonymity set's cardinality. Figure 4 plots the number of people in z3 for various location update periods. The mean anonymity set size reaches two for a location update period of 15 seconds. This value is much better than the one obtained using mix zone z1, although it is still poor in absolute terms.

The anonymity set measurements tell us that our pseudonymity technique cannot give users adequate location privacy in this particular experimental situation (because of user population, resolution of the location system, geometry of the mix zones, and so on). However, two positive results counterbalance this negative one.


Figure 3. Anonymity set size for z1: maximum, mean, and minimum cardinality of the anonymity set for location update periods from one second to one hour.
First, we have a quantitative measurement technique that lets us perform this assessment. Second, the same protection technique that did not work in the AT&T setting might fare better in another context, such as wide-area tracking of cellular phones.

Apart from its low anonymity set size, mix zone z3 presents other difficulties. The maximum walking speed in this area, calculated from bat movement data, is 1.2 meters per second. To move from one extreme of z3 to another, users require approximately 50 seconds. For update periods shorter than this value, we cannot consider users to be "mixed." Intuitively, it seems much more likely that a user entering the mix zone from an office on the third floor will remain on the third floor rather than move to an office on the first. Analysis of the data reveals this to be the case: in more than half the movements in this mix zone, users travel less than 10 meters. This means that the users in the mix zone are not all equal from the hostile observer's point of view: If I am in the mix zone with 20 other people, I might consider myself well protected. But if all the others are on the third floor while I am on the second, when I go in and out of the mix zone the observer will strongly suspect that those lonely pseudonyms seen on the second floor one at a time actually belong to the same individual. This discovery motivated the work we describe next.

Figure 4. Anonymity set size for z3: maximum, mean, and minimum cardinality of the anonymity set for location update periods from one second to one hour.

Accounting for user movement
So far we have implicitly assumed the location of entry into a mix zone is independent of the location of exit. This in general is not the case, and there is a strong correlation between ingress position and egress position, which is a function of the mix zone's geography. For example, consider two people walking into a mix zone from opposite directions: in most cases people will continue walking in the same direction.

Anonymity sets do not model user entry and exit motion. Andrei Serjantov and George Danezis9 propose an information-theoretic approach to consider the varying probabilities of users sending and receiving messages through a network of mix nodes. We apply the same principle here to mix zones.

The crucial observation is that the anonymity set's size is only a good measure of anonymity when all the members of the set are equally likely to be the one of interest to the observer; this, according to Shannon's definition,10 is the case of maximal entropy. For a set of size n, if all elements are equiprobable, we need log2 n bits of information to identify one of them. If they are not equiprobable, the entropy will be lower, and fewer bits will suffice. We can precisely compute how many in the following way.

Figure 5. The movement matrix M records the frequency of movements through the hallway mix zone. Each element represents the frequency of movements from the preceding zone p, at time t − 1, through zone z, at time t, to the subsequent zone s, at time t + 1.

                       Subsequent zone, s
Preceding zone, p    East   North   South   West   Mix zone
Mix zone               96      47      66    101        814
West                  125      13      17      1         43
South                  29      55       7     22         30
North                  39       6      64     39         66
East                   24      47      74    176        162

Consider user movements through a mix zone z. For users traveling through z at time t, we can record the preceding zone, p, visited at time t − 1, and the subsequent zone, s, visited at time t + 1. The user doesn't necessarily change zones on every location update period: the preceding and subsequent zones might all refer to the same mix zone z. Using historical data, we can calculate, for all users visiting zone z, the relative frequency of each pair of preceding and subsequent points (p, s) and record it in a movement matrix M. M records, for all possible (p, s) pairs, the number of times a person who was in z at t was in p at t − 1 and in s at t + 1. The entries of M are proportional to the joint probabilities, which we can obtain by normalization:

   P(prec = p, subs = s) = M(p, s) / Σ_{i,j} M(i, j).

The conditional probability of coming out through zone s, having gone in through zone p, follows from the product rule:

   P(subs = s | prec = p) = M(p, s) / Σ_j M(p, j).

We can now apply Shannon's classic measure of entropy10 to our problem:

   h = −Σ_i p_i · log p_i.

This gives us the information content, in bits, associated with a set of possible outcomes with probabilities p_i. The higher it is, the more uncertain a hostile observer will be about the true answer, and therefore the higher our anonymity will be.
                                                      movement in any other direction. Figure 5                other into west. What the observer wants
  The conditional probability of coming               shows M, the frequency of all possible out-              is to link the pseudonyms—that is, to find
out through zone s, having gone in through            comes for users entering the hallway mix                 out whether they both went straight ahead
zone p, follows from the product rule:                zone z1, for a location update period of five            or each made a U-turn. Without any a pri-
                                                      seconds. From Figure 5 we observe that                   ori knowledge, the information content of
                                  M ( p, s )          users do not often return to the same mix                this alternative is one full bit. We will now
  P ( subs = s prec = p) =                    .
                               ∑j  M ( p, j )         zone they came from. In other words, it is
                                                      unlikely that the preceding zone, p, is the
                                                                                                               show with a numerical example how the
                                                                                                               knowledge encoded in movement matrix
                                                      same as the subsequent zone, s, in every case            M lets the observer “guess” the answer
 We can now apply Shannon’s classic                   except a prolonged stay in the mix zone                  with a chance of success that is better than
measure of entropy10 to our problem:                  itself. It is quite common for users to remain           random, formalizing the intuitive notion
                                                      in z1 for more than one location update                  that the U-turn is the less likely option.
   h =−  ∑ pi ⋅ log pi.                               period; the presence of a public computer in
                                                      the hallway might explain this.
                                                                                                                  We are observing two users, each mov-
                                                                                                               ing from a preceding zone p through the
          i
                                                         We can use the data from Figure 5 to                  mix zone z1 to a subsequent zone s. If p and
This gives us the information content, in             determine how modeling user movements                    s are limited to {E, W}, we can reduce M
bits, associated with a set of possible out-          affects the anonymity level that the mix                 to a 2 × 2 matrix M′. There are 16 possible
comes with probabilities pi. The higher it is,        zone offers. As an example, consider two                 cases, each representable by a four-letter
the more uncertain a hostile observer will            users walking in opposite directions—one                 string of Es and Ws: EEEE, EEEW, …,
be about the true answer, and therefore the           east to west and the other west to east—                 WWWW. The four characters in the string
higher our anonymity will be.                         passing through the initially empty hall-                represent respectively the preceding and


We are observing two users, each moving from a preceding zone p through the mix zone z1 to a subsequent zone s. If p and s are limited to {E, W}, we can reduce M to a 2 × 2 matrix M′. There are 16 possible cases, each representable by a four-letter string of Es and Ws: EEEE, EEEW, …, WWWW. The four characters in the string represent respectively the preceding and subsequent zone of the first user and the preceding and subsequent zone of the second user.

We identify two events of interest. The first, observed, corresponds to the situation we observed: two users coming into z1 from opposite directions and then again coming out of it in opposite directions. We have

   observed = EEWW ∨ EWWE ∨ WEEW ∨ WWEE,

where ∨ means logical OR. The second event, uturn, corresponds to both users doing a U-turn:

   uturn = EEWW ∨ WWEE.

We see that, in fact, the first event includes the second. We can easily compute the probabilities of these two events by summing the probabilities of the respective four-letter subcases. We obtain the probability of a four-letter subcase by multiplying together the probability of the paths taken by the two users. For example, we obtain the probability of EWWE by calculating P(EW) × P(WE). We obtain these terms, in turn, from the reduced movement matrix M′ by normalization.
     probability of the paths taken by the two          tities, and this is one route toward greater
     users. For example, we obtain the proba-           location privacy. Because different appli-
     bility of EWWE by calculating P(EW) ×              cations can collude and share information
     P(WE). We obtain these terms, in turn, from        about user sightings, users should adopt
     the reduced movement matrix M′ by nor-             different pseudonyms for different appli-
     malization. Numerically, P(observed) =             cations. Furthermore, to thwart more            REFERENCES
     0.414. Similarly, P(uturn) = 0.0005.               sophisticated attacks, users should change       1. S. Warren and L. Brandeis, “The Right to
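
The calculation can be reproduced mechanically, as in the sketch below. The four path probabilities P(EE), P(EW), P(WE), and P(WW) are hypothetical placeholders, not the values measured from Figure 5 (which the text does not reproduce); we chose them only so that the results come out close to the figures quoted above.

    from itertools import product

    # Hypothetical path probabilities obtained by normalizing the reduced
    # movement matrix M'; invented for illustration, not the Figure 5 data.
    p_path = {"EE": 0.088, "EW": 0.455, "WE": 0.454, "WW": 0.003}

    def case_probability(case):
        """Probability of a four-letter case: the first user takes the path
        case[0:2] and the second, independently, the path case[2:4]."""
        return p_path[case[:2]] * p_path[case[2:]]

    all_cases = ["".join(c) for c in product("EW", repeat=4)]  # the 16 strings
    observed = {"EEWW", "EWWE", "WEEW", "WWEE"}  # in and out in opposite directions
    uturn = {"EEWW", "WWEE"}                     # both users turned back

    p_observed = sum(case_probability(c) for c in observed)
    p_uturn = sum(case_probability(c) for c in uturn)
    print(len(all_cases), round(p_observed, 3), round(p_uturn, 4))  # 16 0.414 0.0005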

As the hostile observer, what we want to know is who was who, which is formally expressible as the conditional probability of whether each did a U-turn given what we saw: P(uturn | observed). From the product rule, this is equal to P(uturn ∧ observed) / P(observed). This, because the second event is included in the first, reduces to P(uturn) / P(observed) = 0.001.

So the observer now knows that they either did a U-turn, with probability 0.1 percent, or they went straight, with probability 99.9 percent. The U-turn is extremely unlikely, so the information content of the outcome is not one bit, as it would be if the choices were equiprobable, but much less, because even before hearing the true answer we are almost sure that it will be “they both went straight ahead.” Numerically, the entropy of this choice is 0.012 bits.
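
These last figures can be checked directly from the probabilities quoted above; the sketch below applies the product rule and then Shannon's formula. It prints an entropy of roughly 0.013 bits rather than exactly 0.012, a small discrepancy presumably due to rounding of the quoted intermediate probabilities.

    from math import log2

    p_observed, p_uturn = 0.414, 0.0005  # values quoted in the text

    # Product rule; since uturn is included in observed, the conjunction is uturn itself.
    p_uturn_given_observed = p_uturn / p_observed  # ~0.0012, i.e. about 0.1 percent
    p_straight = 1 - p_uturn_given_observed        # about 99.9 percent

    h = -sum(p * log2(p) for p in (p_uturn_given_observed, p_straight))
    print(round(p_uturn_given_observed, 4), round(h, 3))  # 0.0012 0.013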

This value is much lower than 1 because the choices are not equiprobable, and a hostile observer can therefore unlink pseudonyms with much greater success than the mere anonymity set size would suggest. The entropy therefore gives us a tighter and more accurate estimate of the available anonymity.9

Technologies for locating and tracking individuals are becoming increasingly commonplace, and they will become more pervasive and ubiquitous in the future. Location-aware applications will have the potential to follow your every move, from leaving your house to visiting the doctor's office, recording everything from the shelves you view in a supermarket to the time you spend beside the coffee machine at work. We must address the issue of protecting location information before the widespread deployment of the sensing infrastructure.

Applications can be built or modified to use pseudonyms rather than true user identities, and this is one route toward greater location privacy. Because different applications can collude and share information about user sightings, users should adopt different pseudonyms for different applications. Furthermore, to thwart more sophisticated attacks, users should change pseudonyms frequently, even while being tracked.

Drawing on the methods developed for anonymous communication, we use the conceptual tools of mix zones and anonymity sets to analyze location privacy. Although anonymity sets provide a first quantitative measure of location privacy, this measure is only an upper-bound estimate. A better, more accurate metric, developed using an information-theoretic approach, uses entropy, taking into account the a priori knowledge that an observer can derive from historical data. The entropy measurement shows a more pessimistic picture, in which the user has less privacy, because it models a more powerful adversary who might use historical data to de-anonymize pseudonyms more accurately.

Applying the techniques we presented in this article to data from our Active Bat system demonstrated that, because the temporal and spatial resolution of the location data generated by the bats is high, location privacy is low, even with a relatively large mix zone. However, we anticipate that the same techniques will show a much higher degree of unlinkability between pseudonyms over a larger and more populated area, such as a city center, in the context of locating people through their cellular telephones.

ACKNOWLEDGMENTS

We thank our colleagues Andrei Serjantov, Dave Scott, Mike Hazas, Tom Kelly, and Gray Girling, as well as Tim Kindberg and the other IEEE Computer Society referees, for their thoughtful advice and comments. We also thank all of our former colleagues at AT&T Laboratories Cambridge for providing their movement data, as well as EPSRC and AT&T Laboratories Cambridge for financial support.

REFERENCES

 1. S. Warren and L. Brandeis, "The Right to Privacy," Harvard Law Rev., vol. 4, no. 5, Dec. 1890, pp. 193–200.

 2. United Nations, Universal Declaration of Human Rights, General Assembly Resolution 217 A (III), 1948; www.un.org/Overview.

 3. D. Banisar and S. Davies, Privacy and Human Rights, www.gilc.org/privacy/survey.

 4. F. Stajano, Security for Ubiquitous Computing, John Wiley & Sons, New York, 2002.

 5. A. Ward, A. Jones, and A. Hopper, "A New Location Technique for the Active Office," IEEE Personal Communications, vol. 4, no. 5, Oct. 1997, pp. 42–47.

 6. D. Chaum, "Untraceable Electronic Mail, Return Addresses and Digital Pseudonyms," Comm. ACM, vol. 24, no. 2, 1981, pp. 84–88.

 7. D. Chaum, "The Dining Cryptographers Problem: Unconditional Sender and Recipient Untraceability," J. Cryptology, vol. 1, no. 1, 1988, pp. 66–75.

 8. A. Pfitzmann and M. Köhntopp, "Anonymity, Unobservability and Pseudonymity—A Proposal for Terminology," Designing Privacy Enhancing Technologies: Proc. Int'l Workshop Design Issues in Anonymity and Observability, LNCS, vol. 2009, Springer-Verlag, Berlin, 2000, pp. 1–9.

 9. A. Serjantov and G. Danezis, "Towards an Information Theoretic Metric for Anonymity," Proc. Workshop on Privacy Enhancing Technologies, 2002; www.cl.cam.ac.uk/users/aas23/papers_aas/set.ps.

10. C.E. Shannon, "A Mathematical Theory of Communication," Bell System Tech. J., vol. 27, July 1948, pp. 379–423, and Oct. 1948, pp. 623–656.

THE AUTHORS

Alastair Beresford is a PhD student at the University of Cambridge. His research interests include ubiquitous systems, computer security, and networking. After completing a BA in computer science at the University of Cambridge, he was a researcher at BT Labs, returning to study for a PhD in the Laboratory for Communications Engineering in October 2000. Contact him at the Laboratory for Communication Engineering, Univ. of Cambridge, William Gates Building, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK; arb33@cam.ac.uk.

Frank Stajano is a faculty member at the University of Cambridge's Laboratory for Communication Engineering, where he holds the ARM Lectureship in Ubiquitous Computing. His research interests include security, mobility, ubicomp, scripting languages, and middleware. He consults for industry on computer security and he is the author of Security for Ubiquitous Computing (Wiley, 2002). Contact him at the Laboratory for Communication Engineering, Univ. of Cambridge, William Gates Building, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK; fms27@cam.ac.uk; www-lce.eng.cam.ac.uk/~fms27.
