Pretty Good Phone Privacy - Barath Raghavan
Pretty Good Phone Privacy

Paul Schmitt, Princeton University, pschmitt@cs.princeton.edu
Barath Raghavan, University of Southern California, barath.raghavan@usc.edu

Abstract

To receive service in today’s cellular architecture, phones uniquely identify themselves to towers and thus to operators. This is now a cause of major privacy violations, as operators sell and leak identity and location data of hundreds of millions of mobile users.

In this paper, we take an end-to-end perspective on the cellular architecture and find key points of decoupling that enable us to protect user identity and location privacy with no changes to physical infrastructure, no added latency, and no requirement of direct cooperation from existing operators. In our architecture, we alter commonly attacked permanent identifiers that are widely used in today’s mobile networks to values that no longer individually identify users, while maintaining connectivity and compatibility with existing infrastructure.

We describe Pretty Good Phone Privacy (PGPP) and demonstrate how our modified backend stack (NGC) works with real phones to provide ordinary yet privacy-preserving connectivity. We explore inherent privacy and efficiency tradeoffs in a simulation of a large metropolitan region. We show how PGPP maintains today’s control overheads while significantly improving user identity and location privacy.

1 Introduction

Cellular phone and data networks are an essential part of the global communications infrastructure. In the United States, there are 124 cellular subscriptions for every 100 people and the total number of cellular subscriptions worldwide now stands at over 8.2 billion [5]. Unfortunately, today’s cellular architecture embeds privacy assumptions of a bygone era. In decades past, providers were highly regulated and centralized, few users had mobile devices, and data broker ecosystems were undeveloped. As a result, except for law enforcement access to phone records, user privacy was generally preserved. Protocols that underpin cellular communication embed an assumption of trusted hardware and infrastructure [2], and specifications for cellular backend infrastructure contain few formal prescriptions for preserving user data privacy. The result is that the locations of all users are constantly tracked as they simply carry a phone in their pocket, without even using it.

Much has been made of privacy enhancements in recent cellular standards (e.g., 5G), but such changes do nothing to prevent cellular carriers from tracking user locations. Worse still, the 5G push toward small cells results in much finer-grained location information, and thus tracking, than previous generations.

Privacy violations by carriers. In recent years it has been extensively reported that mobile carriers have been routinely selling and leaking mobile location data and call metadata of hundreds of millions of users [18,19,40,71,75]. Unfortunately for users, this behavior by the operators appears to have been legal, and has left mobile users without a means of recourse due to the confluence of a deregulated industry, high mobile use, and the proliferation of data brokers in the landscape. As a result, in many countries every mobile user can be physically located by anyone with a few dollars to spend. This privacy loss is ongoing and is independent of leakage by apps that users choose to install on their phones (which is a related but orthogonal issue).

While this major privacy issue has long been present in the architecture, the practical reality of the problem and lack of technical countermeasures against bulk surveillance is beyond what was known before. However there is a fundamental technical challenge at the root of this problem: even if steps were taken to limit the sale or disclosure of user data, such as by passing legislation, the cellular architecture generally and operators specifically would still seemingly need to know where users are located in order to provide connectivity. Thus, as things stand, users must trust that cellular network operators will do the right thing with respect to privacy despite not having done so to date.

Architectural, deployable solution. We identify points of decoupling in the cellular architecture to protect user pri-
vacy in a way that is immediately deployable. In this, we are aided by the industry-wide shift toward software-based cellular cores. Whereas prior generations of cellular networks ran on highly-specific hardware, many modern cellular core functions are run in software, making it more amenable to key changes.

In our approach, users are protected against location tracking, even by their own carrier. We decouple network connectivity from authentication and billing, which allows the carrier to run Next Generation Core (NGC) services that are unaware of the identity or location of their users while still authenticating them for network use. Our architectural change allows us to nullify the value of the user’s SUPI, an often targeted identifier in the cellular ecosystem, as a unique identifier¹. We shift authentication and billing functionality to outside of the cellular core and separate traditional cellular credentials from credentials used to gain global connectivity.

¹ The SUPI is the 5G equivalent for the well-known IMSI from previous cellular generations.

Since it will take time for infrastructure and legislation to change, our work is explicitly not clean slate. We anticipate that our solution is most likely to be deployed by Mobile Virtual Network Operators (MVNOs), where the MVNO operates the core (NGC) while the base stations (gNodeBs) are operated by a Mobile Network Operator (MNO). This presents us with architectural independence as the MVNO can alter its core functionality, so long as the NGC conforms to LTE / 5G standards. While it is not strictly necessary for PGPP to be adopted by an MVNO, we assume that existing industry players (e.g., MNOs) are unlikely to adopt new technologies or have an interest in preserving user privacy unless legal remedies are instituted. As a result, we consider how privacy can be added on top of today’s mobile infrastructure by new industry entrants.

Contributions. In this work we refactor the cellular core in order to decouple billing and authentication functionality from connectivity, allowing for enhanced privacy for commonly leveraged permanent identifiers (e.g., SUPIs). Our main goal is to thwart bulk passive surveillance that is commonplace in today’s networks. Additionally, we create a mechanism for enhancing location privacy from local-scale targeted attacks.

We describe our prototype implementation, Pretty Good Phone Privacy (PGPP). In doing so, we examine several key challenges in achieving privacy in today’s cell architecture. In particular, we consider: 1) which personal identifiers are stored and transmitted within the cellular infrastructure; 2) which core network entities have visibility into them (and how this can be mitigated); 3) which entities have the ability to provide privacy and with what guarantees; and 4) how we can provide privacy while maintaining compatibility with today’s infrastructure and without requiring the cooperation of established providers.

Our prototype implements privacy-enhancing changes that impact the control traffic load of an operator. As such, we show PGPP’s impact on control traffic and on user anonymity. We show that by altering the network coverage map we are able to gain control traffic headroom compared with today’s networks; we then consume that headroom in exchange for improved anonymity. We analyze the privacy improvements against a variety of common cellular attacks, including those based on bulk surveillance as well as targeted attacks. We find that PGPP significantly increases anonymity where there is none today. We find that an example PGPP network is able to increase the geographic area that an attacker could believe a victim to be within by ~1,200% with little change in control load.

Our contributions are as follows:

• We design a new architecture that decouples connectivity from authentication and billing functionality, allowing us to alter the identifiers used to gain connectivity (§4.1) and enable PGPP-based operators to continue to authenticate and bill users (§4.1) without identifying them.

• We adapt existing mechanisms to grow control traffic broadcast domains, thus enhancing user location privacy while maintaining backwards compatibility (§4.2).

• We quantify the impacts of PGPP on both user privacy and network control traffic through simulation (§5) and demonstrate PGPP’s feasibility in a lab testbed.

2 Background

Here we provide a brief overview of the cellular architecture and describe the inherent privacy challenges. For simplicity and brevity we focus on 5G, though the fundamental challenges also exist in legacy standards.

2.1 Cellular architecture overview

The 5G architecture can be divided into two areas: the Next Generation Radio Access Network (NG-RAN), which is responsible for radio access; and the Next Generation Core (NGC), which includes the entities responsible for authentication and connectivity to the network core. Figure 1 shows a simplified architecture for both conventional cellular as well as with PGPP. PGPP moves authentication and billing to a new entity, the PGPP-GW, that is external to the NGC. We detail PGPP’s specific changes in Section 4. We include a glossary of cellular terms in Appendix A.

NG-RAN. The NG-RAN is the network that facilitates connectivity between user devices (UEs)—commonly a cell phone with a SIM card installed—and the serving base station (gNodeB). The NG-RAN is responsible for providing UEs a means of connecting to the NGC via gNodeBs.

NGC. The NGC is the core of the 5G cellular network and includes entities that provide authentication, billing, voice, SMS, and data connectivity. The NGC entities relevant to our
discussion are the Access and Mobility Management Function (AMF), the Authentication Server Function (AUSF), the Session Management Function (SMF), and the User Plane Function (UPF). The AMF is the main point of contact for a UE and is responsible for orchestrating mobility and connectivity. UEs authenticate to the network by sending an identifier that is stored in the SIM to the AMF. The AUSF is then queried to verify that the UE is a valid subscriber. Once the UE is authenticated, the AMF assigns the UE to an SMF and UPF, which offer an IP address and connectivity to the Internet. Note that 5G networks can include many copies of these entities and contain many more entities; however, for the purposes of our discussion this simplified model suffices.

Figure 1: Simplified 5G architecture with and without PGPP. PGPP decouples authentication and connectivity credentials and shifts authentication to a new, external entity, the PGPP-GW. Details of the PGPP-GW are found in Section 4.1.

MVNOs. We design our solution to be implemented by a Mobile Virtual Network Operator (MVNO). MVNOs are virtual in that they offer cellular service without owning the infrastructure itself. Rather, MVNOs pay to share capacity on the infrastructure that an underlying carrier operates. MVNOs can choose whether they wish to operate their own core entities such as the AMF, AUSF, and UPF, which is the type of operation we propose. MVNOs that run their own core network are often called “full” MVNOs.

5G Network slicing. The 5G architecture includes the concept of network slicing, whereby infrastructure is shared by multiple, virtual networks that operate in isolation. Slicing enables each virtual network to define its requirements in terms of performance (e.g., latency demands, bandwidth, etc.). Network slicing, along with the industry shift toward “whitebox” gNodeBs that connect to central offices that act as datacenters with virtualized NGC services, as in the Open Networking Foundation’s M-CORD project [26], lowers the barrier to entry for our PGPP architectural changes. Other recent work has shown that dramatic architectural changes are now possible as the cellular core functionality is increasingly implemented as software components rather than through specialized hardware, enabling significant performance gains [58, 59].

2.2 Privacy in the cellular architecture

Maintaining user privacy is challenging in cellular networks, both past and present, as it is not a primary goal of the architecture. In order to authenticate users for access and billing purposes, networks use globally unique client identifiers. Likewise, the cellular infrastructure itself must always “know” the location of a user in order to minimize latency when providing connectivity. We briefly discuss cellular identifiers as well as location information available from the perspective of the cell network in this section. We use acronyms from the 5G architecture as it is the newest standard; however, similar entities exist in all generations (2G, 3G, 4G LTE).

User and device identifiers. There are multiple identifiers that can be used to associate network usage with a given subscriber. Identifiers can be assigned by various actors in the ecosystem, they can vary in degree of permanence, and they can be globally unique across all cellular operators or they can be locally unique within a given network. Table 1 shows these identifiers, their allocators, and their permanence in 5G.

Identifier              Allocator   Duration
SUPI                    Operator    Permanent
GUTI                    AMF         Temporary
IP Address (static)     Operator    Permanent
IP Address (dynamic)    UPF         Temporary
RNTI                    gNodeB      Temporary

Table 1: User identifiers in 5G.

The Subscription Permanent Identifier (SUPI) is the identifier used to gain access to the network when a phone (UE) performs initial attachment. The SUPI is globally unique, permanent, and is stored on the SIM card. Carriers maintain an AUSF database containing the list of SUPIs that are provisioned for use on the network and subscription details for each. Because the SUPI is globally unique and permanent, it is seen as a high-value target for those who wish to surveil cellular users. For example, in recent years there has been a rise of cell-site simulators, also known as IMSI catchers². These devices offer what appears to be a legitimate base station (gNodeB) signal. Since UE baseband radios are naïve and automatically connect to the strongest signal, they will attempt to attach to the IMSI catcher and offer their IMSI. IMSI catchers have been used extensively by law enforcement as well as nation-state adversaries to identify and eavesdrop on cellular users [54].

² The SUPI is a replacement for the International Mobile Subscriber Identity (IMSI) from previous cellular generations. The SUPI and IMSI have equivalent functionality. In 5G networks, the SUPI is encrypted before transmission, creating a Subscription Concealed Identifier (SUCI). However, when connecting to legacy generation networks, the SUPI or IMSI can be visible.
Given the SUPI’s importance and sensitivity, temporary identifiers are often used instead. The Globally Unique Temporary Identifier (GUTI) can be thought of as a temporary replacement for a SUPI. Once a phone attaches to the network, the Access and Mobility Management Function (AMF) generates a GUTI value that is sent to the UE, which stores the value. The UE uses the GUTI rather than the SUPI when it attaches to the network in the future. The GUTI can be changed by the AMF periodically. Ephemeral identifiers such as GUTIs may be susceptible to de-anonymization attacks given weaknesses in their implementations. For instance, prior work has found that GUTIs are often predictable with consistent patterns, thus offering little privacy [33], but this can be remedied with a lightweight fix that we expect will be used going forward. We view such mitigations as complementary to PGPP in order to construct a layered solution for user privacy.

The 5G network is IP-based, meaning UEs must be given IP addresses in order to connect. IPs can be either statically or dynamically assigned to UEs. Statically assigned IPs are stored in a backend core database. During the attach procedure, the AMF retrieves the static IP address assigned to the UE from the backend. Conversely, dynamic addresses are assigned by the SMF when the UE attaches. Providers can associate a user with an IP address in the network by monitoring traffic at the UPF, which offers a convenient location to place a network tap.

In order to connect with the gNodeB over the NG-RAN, UEs must be assigned radio resources at layer 2, including a temporary unique identifier, the RNTI. Prior work has shown that layer 2 information used on the NG-RAN can be used to link RNTIs with temporary identifiers at higher layers (e.g., GUTIs) provided the attacker knows the GUTI beforehand [65]. This attack is specific to the coverage area of a single cell, and can be mitigated by changing the GUTI frequently, as discussed in [33].

User location information. Cellular networks maintain knowledge of the physical location of each UE. Location information is necessary to support mobility and to quickly find the UE when there is an incoming call, SMS, or data for a user. The mechanism used to locate a UE is known as “paging” and it relies on logical groupings of similarly located gNodeBs known as “tracking areas” (TAs). Each gNodeB is assigned to a single TA. TAs can be thought of as broadcast domains for paging traffic. If there is incoming data for an idle UE, the paging procedure is used, where the network broadcasts a paging message to all gNodeBs in the user’s last-known TA. Prior work has shown that the paging mechanism can be leveraged by attackers that know an identifier of the victim (e.g., phone number, WhatsApp ID) to generate paging messages intended for the victim, which enables an unprivileged attacker to identify a specific user’s location [44]. From an external perspective, the vantage point of remote servers on the web can also be leveraged to localize mobile users given timing information from applications on their devices [69].

Cellular operators often store location metadata for subscribers, giving them the ability to trace user movement and location history. This bulk surveillance mechanism has been used to establish a user’s past location by law enforcement [9].

3 Scope

We believe that many designs are possible to increase privacy in mobile networks, and no architecture, today or in the future, is likely to provide perfect privacy. Nevertheless, below we discuss various properties that PGPP strives to achieve.

Prior work examined the security vulnerabilities in modern cell networks [35,44,68] and revealed a number of flaws in the architecture itself. In addition, data brokers and major operators alike have taken advantage of the cellular architecture’s vulnerabilities to profit off of revealing sensitive user data. We believe mobile networks should aim to, at a minimum, provide one or both of the following privacy properties:

• Identity privacy. A network can aim to protect users’ identity. Networks—as well as third party attackers—identify users through SUPIs, which are intended to be uniquely identifying.

• Location privacy. A network can aim to protect information about the whereabouts of a phone.

Naturally, these privacy properties do not exist in isolation; they intersect in critical ways. For example, attackers often aim to learn not only who a user is but where a specific user is currently located, or where a user was when a specific call was made. Also, the definition of an attacker or adversary is a complex one, and depending on context may include individuals aiming to steal user data, mobile carriers and data brokers looking to profit off of user data, governments seeking to perform bulk surveillance, law enforcement seeking to monitor a user with or without due process, and many others. Due to context dependence, we do not expect all privacy-focused mobile networks to make the same choice of tradeoffs.

3.1 Cellular privacy threat model

Given the above discussion, we distinguish between bulk and targeted data collection. We define bulk collection to be the collection of information from existing cellular architecture traffic without the introduction of attack traffic; thus, bulk collection is passive. Bulk attacks commonly target user identities (e.g., SUPIs). PGPP’s core aim is to protect against bulk attacks. Targeted attacks are active and require injection of traffic to attack specific targets. Targeted attacks are often aimed at discovering a victim’s location. We also delineate attacks by the adversary’s capabilities, as they may have visibility into an entire network (global) versus, for an unprivileged attacker, some smaller subset of a network’s infrastructure (local). Note that we assume trust that the PGPP
operator itself will not actively attempt to thwart PGPP’s privacy mechanisms (i.e., a global-targeted attack, described below, or maliciously generated non-unique TALs (Section 4.2), or TALs containing unlikely or non-adjacent TAs). Table 2 gives the taxonomy of attacks.

                         Attack type
Visibility   Bulk                                Targeted
Global       Carrier logs [18, 19, 40, 75] /     Carrier Paging
             Government Surveillance [9]
Local        SDR [3, 52, 74] /                   Paging attack [36, 44]
             IMSI Catcher [27, 54]

Table 2: Common cellular attacks.

Mobile carriers and governments are the most common global-bulk attackers. Such bulk surveillance is commonplace in cellular networks, and has been at the center of recent lawsuits and privacy concerns. Attacks that employ IMSI catchers or passively listen to broadcasts using software-defined radios are considered local-bulk. Here, an IMSI catcher is only able to monitor phones that connect directly to it, so its visibility is limited to its radio range. Similarly, SDR-based passive snooping is only able to monitor nearby base stations and will miss portions of the network. We design PGPP with a primary focus on thwarting bulk attacks by nullifying the value of SUPIs (Section 4.1).

Local-targeted attacks can be carried out by ordinary users by generating traffic that causes a network to page a victim (e.g., phone call or data sent to the victim). As local-targeted attackers do not have visibility into the entire network, they must rely upon knowledge of the geographic area that is encompassed by a tracking area. Due to the prevalence of such attacks, as an enhancement, an operator can provide functionality, in cooperation with the user, that reduces the efficacy of local-targeted attacks through the use of TALs (Section 4.2).

Global-targeted attacks represent a very powerful attacker who can actively probe a victim while having global visibility of the network. We envision defenses against such attacks would require fundamental changes to communication models. PGPP does not mitigate global-targeted attacks as we focus on immediately deployable solutions; we leave this to future work.

3.2 Aims

Next we discuss the aims of PGPP by considering several common questions that arise.

What sort of privacy does PGPP provide? As its name suggests, PGPP aims to provide “pretty good” privacy; we don’t believe there is a solution that provides perfect privacy, causes no service changes (i.e., does not increase latency), and is incrementally deployable on today’s cellular networks. The main focus is to offer privacy against global-bulk surveillance of mobility and location, a practice by carriers that is widespread and pernicious. We thwart this via eliminating the SUPI as an individual identifier and decoupling the authentication and connectivity mechanisms in the cellular architecture.

Isn’t 5G more secure than legacy generations? The 5G standard includes enhancements focused on user privacy and system performance over legacy cellular generations. However, the enhancements do not offer location privacy benefits from the carriers.

Encrypted SUPIs. 5G includes the addition of encrypted SUPIs, where public key cryptography, along with ephemeral keys generated on the SIM, is used to encrypt the SUPI when sending it to the network in the form of a Subscription Concealed Identifier (SUCI). This protects user SUPIs from eavesdroppers. SUCI usage negates the effectiveness of local attacks such as IMSI catchers. Thus, our techniques to increase location privacy from local attackers (Section 4.2) will become less necessary once a full transition to 5G is complete³. However, SUCIs do not prevent the cellular provider itself from knowing the user’s identity. An analogy for encrypted SUPIs can be found in DNS over HTTPS (DoH): eavesdroppers cannot see unencrypted traffic, yet the endpoints (the DNS resolver for DoH, the cellular core in 5G) still can. The goal of this work is to not only thwart local-bulk attacks, but also protect user privacy from mobile operators that would otherwise violate it (i.e., global-bulk attacks).

³ Unfortunately, we do not anticipate a complete transition to 5G in the near future. For example, AT&T is scheduled to phase out their 3G network in the U.S. in 2022 (https://www.att.com/support/article/wireless/KM1324171/).

Small cell location privacy. The 5G standard strives for reduced latencies as well as much higher data throughputs. This necessitates the use of cells that cover smaller areas in higher frequency spectrum in order to overcome interference compared with previous cellular generations that used macrocells to provide coverage to large areas. A (likely unintended) byproduct of 5G’s use of smaller cells is a dramatic reduction in location privacy for users. As the 5G network provider maintains state pertaining to the location in the network for a given user for the purposes of paging, smaller cells result in the operator, or attacker, knowing user locations at a much higher precision compared with previous generations.

What about active | traffic analysis | signaling attacks? While active, targeted attacks aren’t our main focus, we improve privacy in the face of them by leveraging TALs to increase and randomize the broadcast domain for paging traffic, making it more difficult for attackers to know where a victim is located (analyzed in Section 5.2). Further, the goal of many active attacks is to learn users’ SUPIs, and our nullification of SUPIs renders such attacks meaningless.

An attacker with a tap at the network edge could use traffic analysis attacks to reduce user privacy. We largely view this as out of scope as users can tunnel traffic and use other means to hide their data usage patterns.

Cellular networks rely on signaling protocols such as Signaling System 7 (SS7) and Diameter when managing mobility
as well as voice and SMS setup and teardown. These protocols enable interoperability between carriers needed for roaming and connectivity across carriers. Unfortunately, these protocols were designed with inherent trust in the network players, and have thus been used to reduce user privacy and disrupt connectivity [24, 43, 51, 55, 67]. We design PGPP for 4G/5G data only, which renders legacy SS7 compatibility moot. Our PGPP design expects users to use outside messaging services rather than an in-NGC IMS system.

Can PGPP support roaming? Yes. While we envision that many PGPP users would explicitly not wish to roam, as roaming partners may not provide privacy guarantees, roaming is possible using a Diameter edge agent that only allows for home routed roaming, forcing traffic to route from the visited network’s SMF back to the PGPP operator’s UPF, rather than local breakout due to our authentication mechanism (Section 4.1). Roaming, and international roaming in particular, adds billing complexities for the PGPP operator. Typically, the visited network collects call data records for each roaming user on its network and calculates the wholesale charges payable by the home network. The visited network then sends a Transferred Account Procedure (TAP) file to the home network via a data clearing house. The home network then pays the visited network. In PGPP, the individual identity of the user that roamed is not known, yet the PGPP operator remains able to pay the appropriate fees to visited networks.

How does PGPP protect user privacy for voice or text service? Out of the box, PGPP doesn’t provide protection for such service. Instead, PGPP aims to provide privacy from the cellular architecture itself, and in doing so users are free to use a third party VoIP provider (in which case the phone will operate identically to a normal phone for telephony service from a user’s perspective) or use recent systems by Lazar et al. [46, 47] that provide strong metadata privacy guarantees for communications, or similar systems such as [16,17,48,73]. We view PGPP as complementary to such systems.

How does PGPP protect users against leaky apps? PGPP doesn’t, as it is about providing protection in the cellular infrastructure. Even without leaky apps, users can always intentionally or inadvertently reveal their identity and location. Leaky apps make this worse as they collect and, sometimes, divulge sensitive user information. We see PGPP as complementary to work that has targeted privacy in mobile app ecosystems. Further, apps are not as fundamental as connectivity—users can choose whether to install and run a leaky app, and can constrain app permissions. However, phones are, by their nature, always connected to carrier networks, and those very networks have been selling user data to third parties.

If users can’t be identified by carriers, how can carriers still make money? We introduce PGPP tokens in Section 4.1 as a mechanism for a PGPP operator to charge customers while protecting user anonymity.

Can’t phone hardware be tracked as well? Phones have an International Mobile Equipment Identity (IMEI). The IMEI is assigned to the hardware by the manufacturer and identifies the manufacturer, model, and serial number of a given device. Some operators keep an IMEI database to check whether a device has been reported as stolen, known as an equipment identity register (EIR); IMEIs in the database are blacklisted. Such databases are optional.

For many devices, the IMEI can be changed through software, often without root access. We envision a PGPP MVNO would allow for subscribers to present their unchanged device IMEI, giving the PGPP operator the opportunity to check against an EIR to verify the phone has not been reported as stolen. At that point, the IMEI could be reprogrammed to a single value, similar to our changes to the SUPI. Additionally, PGPP users can readily switch SIMs to different handsets (IMEIs) at will. Note that different jurisdictions have different rules about whether, how, and by whom an IMEI can be changed, so only in some cases IMEI changes require cooperation with the MVNO.

Is PGPP legal? Legality varies by jurisdiction. For example, U.S. law (CALEA [1]) requires providers to offer lawful interception of voice and SMS traffic. A PGPP-based carrier is data-only, with voice and messaging provided by third parties. CALEA requires the provider to offer content of communication data at the UPF, e.g., raw (likely-encrypted) network traffic. This is supported by PGPP. To the best of our knowledge this would make PGPP legal.

4 Design

In this section we describe the mechanisms PGPP employs to increase user identity and location privacy. Ultimately, PGPP’s design choices appear obvious in retrospect. We believe its simplicity is an asset, as PGPP is compatible with existing networks and immediately deployable.

In order to provide identity privacy against bulk attacks, we nullify the value of the SUPI, as it is the most common target identifier for attackers. In our design, we choose to set all PGPP user SUPIs to an identical value to break the link between SUPI and individual users. This change requires a fundamental shift in the architecture, as SUPIs are currently used for connectivity as well as authentication, billing, and voice/SMS routing. This change also requires us to disable SUPI-based paging in lieu of paging using only temporary identifiers (e.g., GUTIs). We design a new cellular entity for billing and authentication that preserves identity privacy. Fortunately, the industry push for software-based NGCs makes our architecture feasible. We describe the architecture in Section 4.1.

To provide location privacy from targeted attacks, PGPP leverages an existing mechanism (TALs) in the cellular specification in order to grow the broadcast domain for control traffic (Section 4.2). By changing the broadcast domain for every user, the potential location of a victim is broadened from the attacker’s vantage point.
4.1 User identity privacy

As discussed in Section 2.2, SUPIs are globally unique, permanent identifiers. As such, they are routinely targeted by attackers, both legal and illegal. In this section we re-architect the network in order to thwart bulk attacks introduced in Section 3.1 that are based on identifying individuals via SUPI.

We decouple back-end connectivity from the authentication procedure that normally occurs at the AUSF when a UE attaches to the network. Instead, the PGPP operator issues SIM cards with identical SUPIs to all of its subscribers. In this model, the SUPI is used only to prove that a user has a valid SIM card to use the infrastructure and, in turn, the PGPP network can provide an IP address and connectivity and offer the client a GUTI, providing the user with a unique identity necessary for basic connectivity. Note that using identical SUPIs is only one technique for nullifying its value. We anticipate that a network could assign random SUPIs from a pool. Such a mechanism would require a sufficiently large pool, and would be enabled as more UEs employ eSIM capabilities, allowing their SIMs to be programmable from the network. We leave exploration into this as future work.

5G authentication is normally accomplished using SUPIs at the AUSF; however, all PGPP users share a single SUPI. Thus, to authenticate a user, we designed a post-attach oblivious authentication scheme to ensure that the PGPP operator is able to account for the user without knowing who they are.

PGPP Gateway. In order to perform this authentication we create a new logical entity called a PGPP Gateway (PGPP-GW), shown in Figure 1, which sits between the UPF and the public Internet. The UPF is configured to have a fixed tunnel to a PGPP-GW, which can be located outside of the PGPP operator’s network. Using this mechanism, the PGPP-GW only sees an IP address, which is typically NATed⁴, and whether that IP address is a valid user. Notably, it does not have any information about the user’s SUPI or GUTI. The PGPP-GW design also allows for many different architectures. For instance, multiple PGPP-GWs could be placed in multiple datacenters or even use a privacy service such as Tor.

Authentication properties. From the perspective of the PGPP-GW, there are multiple properties an authentication scheme must guarantee: (1) the gateway can authenticate that a user is indeed a valid customer⁵; (2) the gateway and/or any other entities cannot determine the user’s identity, and thus cannot link the user’s credentials/authentication data with a user identity; and (3) the gateway can determine whether a user is unique or if two users are sharing credentials.

⁴ Like GUTIs, we anticipate that IP addresses will be assigned using a solution that is robust to de-anonymization attacks.
⁵ Due to “Know Your Customer” rules in some jurisdictions, the provider may need to have a customer list, necessitating that the user authentication scheme be compatible with periodic explicit customer billing.

Scheme | Customer? | Anonymous? | Unique?
Standard auth | • | |
Group/ring sig | • | • |
Linkable ring sig | • | | •
Cryptocurrency | | • | •
PGPP tokens | • | • | •

Table 3: Three properties needed for user authentication in a privacy-preserving cell network and schemes to achieve them.

As we show in Table 3, the challenge is that standard approaches for authentication only provide one of the three required properties and widely-studied cryptographic mechanisms only provide two of the three properties. For example, an ordinary authentication protocol (of which there are many [7,38]) can provide property 1) but not 2) and 3). A cryptographic mechanism such as group signatures [8, 13] or ring signatures [20, 64] can protect the user’s identity upon authentication, providing properties 1) and 2), but not 3) as providing the last property would violate the security of the signature scheme. Similarly, traitor tracing schemes [14] (such as for broadcast encryption [25]) can provide all three properties but in practice cannot provide property 3) as the traitor tracing would require actual physical confiscation of the “traitor” phone by the MVNO, which is infeasible. A variation on ring signatures known as linkable ring signatures [50] provides the ability for a user’s identity to be revealed if the user signs multiple messages with the same key. While this is useful in establishing that the user is unique and hasn’t shared their credentials, it also partially violates the user’s anonymity, as that key cannot be used again.

Effective authentication. There are two approaches that we view as viable, depending on the circumstances. An anonymity-preserving cryptocurrency can provide properties 2) and 3), but not 1) as a cryptocurrency would combine billing and authentication at the PGPP-GW. For MVNOs that are not required to know their customers, an anonymity-preserving cryptocurrency may be the ideal solution for both user authentication and payment, though even the best coins provide imperfect anonymity guarantees [39].

To provide all three properties, we develop a simple scheme called PGPP tokens that helps us sidestep the issues with alternative approaches. The choice of authentication scheme is deployment-context specific. With PGPP tokens, when paying a monthly bill a user retrieves authentication tokens that are blind-signed using an FDH-based variant of Chaum’s classic scheme [6, 12] by the billing system. Later, when authenticating to the service, the user presents tokens and the service (the PGPP-GW) verifies their signature before allowing the user to use the network. The token scheme ensures that the service can check the validity of tokens without identifying the user requesting access. The user then presents the next token in
advance so as to ensure seamless service. Note that PGPP tokens disallow the post-pay model for cellular billing, as the network would be required to know the identity of users in order to accurately charge them for usage. Therefore, PGPP is pre-pay only, though this can be adjusted to emulate post-payment (e.g., users pre-pay for tokens on an ongoing basis rather than only monthly, and tokens are valid for a longer time period, such as a year, rather than for only one billing period).

Each token represents a unit of access, as is appropriate for the service provider. Some providers may choose to offer flat-rate unlimited-data service, in which case each token represents a fixed period of time; this is the default approach that we use to describe the scheme below. Other providers may choose to offer metered service, in which case each token represents a fixed unit of data, such as 100 MB or 1 GB, rather than a period of time. Still others may choose to provide two-tiered service priority by marking each token with a priority bit, in addition to either unlimited data or metered data service; such prioritization does come with slight privacy loss, as the MVNO and MNO alike would be able to differentiate which priority level was in use. The privacy loss of two-tiered data priority can be partially mitigated by offering all users some amount of time or GB of high-priority service after which they must fall back to low-priority service; such a service plan structure is fairly standard in the industry today. In such a setting, each user would have both high-priority and low-priority tokens and thus would not be clearly stratified into two identifiable groups of users.

At the beginning of a billing period, the billing system defines s time slices (e.g., corresponding to hours) or another unit of access (e.g., a unit of data) and generates s RSA keypairs for performing blind signatures using Chaum’s scheme. It then appends the public keys for this time period to a well-known public repository that is externally maintained (e.g., on GitHub), and these are fetched by users. The user generates s tokens where each token takes the form i‖r, where i is the time slice index as a 256-bit unsigned value zero indexed from the beginning of the billing period, and r is a 256-bit random value chosen by the user. The user then blinds the hash of the tokens. The user pays the bill using a conventional means of payment (e.g., credit card), and presents the blinded token hashes to the billing system to be signed; the system signs each token with the corresponding time slice key and returns these values to the user. The user unblinds the response values and verifies the signatures for each.

Upon later authentication to the service, the user presents its signed token for the current time slice to the PGPP-GW, which verifies the signature and if valid begins forwarding the user’s traffic onto the Internet. Since the token signature was generated using Chaum’s scheme, the service cannot determine which human user corresponds to which signed token. If the same token is used by two different users during the same time period then the service can conclude that a user has shared their credentials and is attempting to cheat.

The costs of this scheme to both the PGPP operator and the user are low. The operator stores the list of used tokens in a standard consistent and replicated cloud database, so the service can operate multiple PGPP-GWs, though it is likely that a small number of PGPP-GWs can serve a large number of users: we benchmarked the 2048-bit RSA signature verification used here at 31 µs per call using Crypto++ [21] on a single core of a 2.6 GHz Intel Xeon E5-2640 CPU, and thus with a single CPU core the PGPP-GW can handle token verification for tens of millions of users. The tokens themselves are small and the storage cost to the provider is about 1.5 MB / user per time period, which is a small amount for any user’s phone to store and for a provider even hundreds of millions of tokens amounts to mere GBs of data in cloud storage.

User device agent. To automate the process of authenticating with the PGPP-GW, we create a simple agent that runs as a background job on the user device. This agent leverages the Android JobScheduler API; in the event of cellular connectivity, the JobScheduler triggers PGPP-token-based authentication with the PGPP-GW. The agent establishes a TLS connection to the PGPP-GW and then sends the token for the current time slice. Once the user presents a valid token, the PGPP-GW begins forwarding traffic for that user, and thus this behavior is akin to a captive portal though the authentication is automatic and unseen by the user.

4.2 Location privacy

As described in Section 2.2, cellular operators track user location in the form of tracking areas for UEs in order to quickly find users when there is incoming content. PGPP leverages an existing mechanism in the cellular standard to reduce the effectiveness of local-targeted attacks described in Section 3.1.

Paging has been exploited in the past to discover user location by adversaries. However, the use of tracking areas is useful for the cellular provider in that it confines the signaling message load (i.e., paging messages) to a relatively small subset of the infrastructure. Tracking areas reduce mobility signaling from UEs as they move through the coverage zone of a single tracking area. Note that emergency calling represents a special case in cellular networks. When a device dials 911, the phone and network attempt to estimate accurate location information. In this work we do not alter this functionality as we anticipate that users dialing 911 are willing to reveal their location.

In PGPP, we exploit the tracking area list (TAL) concept, introduced in 3GPP Release 8 [2]. Using TALs, a UE no longer belongs to a single tracking area, but rather is given a list of up to 16 tracking areas that it can freely move through without triggering a tracking area update, essentially creating larger tracking areas. Whereas prior work has focused on using TALs to pre-compute optimal tracking area combinations
for users [60–62], in PGPP, we use TALs to provide improved location anonymity. Typically, TALs consist of groups of adjacent tracking areas that are pre-computed, essentially growing the tracking area for a UE to the union of all tracking areas in the TAL. We do not use TALs in this way. Instead, we generate TALs on-the-fly and generate them uniquely for each UE. When a UE attaches or issues a tracking area update message, the AMF learns the gNodeB and tracking area the UE is currently attached to. The AMF then generates a unique TAL by iteratively selecting at random some number (up to the TAL limit of 16) of additional, adjacent tracking areas. By generating unique TALs for each user, attackers are unable to know a priori which set of tracking areas (or gNodeBs) that victim is within. We explore tradeoffs in terms of TAL length, control traffic overhead, and location anonymity in the next section.

5 Analysis

To study the implications of a PGPP deployment, we create a simulation to model users, mobility, and cell infrastructure. We study the impact of PGPP’s design on various cellular attacks that occur today. We then analyze the inherent tradeoffs from the PGPP operator’s perspective, as improved privacy comes at the price of increased control traffic. Lastly, we examine PGPP in a lab testbed on real devices.

5.1 Simulation configuration

gNodeB dataset. We select Los Angeles County, California as the region for our simulation, which provides a mix of both highly urban areas as well as rural areas. For gNodeB location information, we use OpenCellID [45], an open database that includes tower locations and carrier information. To simplify the simulation, we select base stations from the database that are listed as providing LTE from AT&T, the provider with the most LTE eNodeBs (22,437) in the region. We use LTE eNodeBs as the number of gNodeBs deployed remains small.

Given their geographic coordinates, we estimate coverage areas for every gNodeB using a Voronoi diagram. During the simulation, a UE is assigned to the gNodeB that corresponds to the region the UE is located within. While such discretization is not likely in reality as UEs remain associated with a gNodeB based on received signal strength, this technique provides us with a tractable mobility simulation. A partial map of the simulation region is shown in Figure 2. gNodeB regions are shaded based on the tracking area value in the OpenCellID database.

[Figure 2: Partial simulation map. Cells are shaded by AT&T tracking area.]

Mobility traces. To simulate realistic mobility patterns (i.e., users must follow available paths), we generate mobility traces using the Google Places [31] and Directions [30] APIs. First, we use the Places API to find locations in the simulation region that are available when searching for “post office.”⁶ Each place is associated with latitudinal and longitudinal coordinates. We then generate mobility traces by randomly selecting start and end points, and use the Directions API to obtain a polyline with coordinates along with estimated times to reach points along the line. We generate 50,000 mobility traces: 25,000 cars and 25,000 pedestrians. We then use ns-3 to process the mobility traces and generate coordinates for each trace at 5-second intervals, in a method similar to [10]. We use this output, along with the gNodeB Voronoi diagram to assign each simulated UE to a gNodeB for every 5-second interval in the mobility trace. Figure 3 shows the distribution of the number of gNodeBs visited by UEs in the simulation. As expected, car trips result in a significantly higher number of gNodeBs for a UE compared with pedestrian trips.

⁶ Our use of post offices as endpoints is arbitrary. We chose them as they are societally-important and ubiquitous, allowing us to generate random trajectories over the entire simulation area.

[Figure 3: gNodeBs visited by simulated mobile users. Series: Cars, Pedestrians; axis: gNodeBs Visited.]

Synthetic traffic. We simulate one hour. To create control traffic, at every 5-second interval we randomly select 5% of the user population to receive a “call.” We select such a high traffic load in order to perform a conservative analysis. A call results in a paging message that is sent to all gNodeBs in the UE’s tracking area. Each paged user enters a 3-minute “call” if it is not already in one, at which point further paging messages are suppressed for that user until the call is complete. We run the simulation with PGPP enabled as well as with the conventional infrastructure setup.

Custom TAs. As we detail further in Section 5.3, large TALs increase control traffic loads, which lowers the network’s user capacity. Therefore, we generate new tracking areas in the underlying network in order to mitigate the control traffic burden. As tracking areas normally consist of groups of adjacent gNodeBs, we need a method by which we can cluster nearby gNodeBs into logical groupings. To do so, we use k-means clustering with the gNodeB geographic coordinates allowing for Euclidean distance to be calculated between gNodeBs. We generate several underlying tracking area maps, with the number of TAs (i.e., k-means centers) ranging from 25 to 1,000. For comparison, the AT&T LTE network in the simulation is composed of 113 TAs.

5.2 Cellular privacy attack analysis

Given the taxonomy we presented in Section 3.1, we analyze the identity and location privacy benefits of PGPP in the simulated environment.

Global-bulk attacks. By nullifying the value of SUPIs, separating authentication from connectivity, and increasing the broadcast domain for users, we increase user identity privacy even with an adversary that is capable of bulk surveillance over an entire network (e.g., operators, governments).

Anonymity analysis. We measure the anonymity of a user when under bulk attacks using degree of anonymity [22]. The degree of anonymity value ranges from zero to one, with ideal anonymity being one, meaning the user could be any member of the population with equal probability. In this case, we consider the SUPI value to be the target identity. The size of the anonymity set for a population of N users will result in a maximum entropy of:

$$H_M = \log_2(N) \tag{1}$$

The degree of anonymity is determined based on the size of the subset of user identities S that an attacker could possibly believe the victim to be:

$$d = \frac{H(X)}{H_M} = \frac{\log_2(S)}{\log_2(N)} \tag{2}$$

Given global visibility into the network, we can reason about the anonymity set using the number of gNodeBs that a victim could possibly be connected to. This is because a cellular carrier can know the exact base station that a user is connected to once the UE enters an active state. As a baseline, the anonymity set for traditional cellular is $\frac{\log_2(1)}{\log_2(22{,}437)} = 0$, as each SUPI is a unique value. With PGPP, SUPIs are identical, so from the perspective of the carrier, the victim could be connected to any gNodeB that has at least one PGPP client connected to it. Using our simulated environment we collect, for each paging message, the number of gNodeBs that had users within their range and use the median value to calculate the degree of anonymity. Figures 4a and 4b show the degree of anonymity using different configurations of TALs and custom TAs, respectively. We see that high degrees of anonymity are attainable despite an attacker’s global visibility. For instance, with TALs of length 8, the degree of anonymity is 0.748.

[Figure 4: Degree of anonymity using TALs and custom TAs. (a) TALs. (b) Custom TAs. Axes: Degree of Anonymity versus S, with N=22,437.]

Local-bulk attacks. PGPP’s use of identical SUPIs reduces the importance of SUPIs, and by extension the usefulness of local bulk attacks on user identity. An attacker that can view traffic at the gNodeB(s) can gain insight into nearby SUPIs. In traditional cell networks, each user has a globally unique SUPI (S = 1), resulting in a degree of anonymity of zero as the victim could only be one user. The subset S in PGPP, on the other hand, is the size of the population of PGPP users in a given location, as all SUPI values are identical and a local bulk attacker cannot know the true identity of a single user. To get an idea of S, we can calculate the number of PGPP users connected to each gNodeB in the simulation. Over the course of the simulation, we find a mean value of 223.09 users connected to each gNodeB that has users, which results in a degree of anonymity $\frac{\log_2(223.09)}{\log_2(50{,}000)} = 0.50$. While this value is somewhat low compared to the ideal value of 1, it is a drastic improvement over conventional cellular architecture, and is dependent on the overall user population in the network. As more PGPP users exist, the degree of anonymity increases.

Local-targeted attacks. In PGPP, local-targeted attacks to discover a user’s location are diminished in two ways: first, SUPIs are no longer a useful identifier, so identifying an individual among all users is challenging; and second, we use TALs to increase the paging broadcast domain for a given UE. From an attacker’s point of view, this broadens the scope of where the target UE may be located.

In Figure 5a, we plot the CDF of geographic areas in which pages are broadcast as we increase TAL lengths using the base map consisting of 113 tracking areas. We calculate the area by generating a bounding box around all gNodeBs that are included in the broadcast domain. As shown, large TALs result in drastically higher area anonymity compared with TALs disabled, particularly considering the number of UEs that could potentially be located in the larger geographic areas. For instance, the median area for the conventional simulation is 378.09 km² whereas TAL lengths of 8 and 16 result in
median areas of 5,876.96 and 9,585.17 km², respectively.

We analyze anonymity with TALs of length 16 while the underlying map is varied using custom TAs. Figure 5b shows our results. We observe that as the number of tracking areas increases, resulting in smaller tracking areas, the area anonymity decreases. However, despite the decrease, the area anonymity remains considerably larger than anonymity with TALs disabled as TALs include additional tracking areas. For instance, the median area for the conventional case is 378.09 km² whereas the median area for a base map of 500 tracking areas with TAL 16 is 4891.08 km², a nearly 13-fold increase from the perspective of a local targeted attacker.

[Figure 5: Area anonymity using TALs and custom TAs. (a) TALs. (b) Custom TAs. Axes: CDF versus Page Area Anonymity (km²).]

5.3 Impact of PGPP on network capacity

From an operational perspective, the privacy benefits delivered by PGPP must coincide with feasibility in terms of control overhead in order for it to be deployable. Control traffic determines network capacity in terms of the number of users that are serviceable in a given area. In this section, we explore control traffic load when using TALs.

5.3.1 Control overhead with PGPP TALs

We first seek to quantify control message overhead while we leverage tracking area lists to provide location anonymity against local-targeted attacks. Recall from Section 4.2 that we randomly select additional tracking areas from the simulated coverage area to create TALs, which increases the broadcast domain for a page. Increased control traffic impacts both gNodeBs and AMFs; however, from our experience with real cellular networks, the control traffic capacity at gNodeBs is the bottleneck as AMFs have much higher capacity. Thus, we focus on gNodeB control load.

Figure 6a shows a cumulative distribution function (CDF) for the number of pages broadcast by the simulated gNodeBs. In the figure, “Conventional” corresponds to disabling TAL functionality. As expected, larger TAL lengths result in increased control traffic for gNodeBs as they are more likely to be included in the paging broadcast domain for a given UE.

To gain insight into the control limitations of real gNodeBs, we consider the capabilities of a Huawei BTS3202E eNodeB [34], which is limited to 750 pages per second. When capacity planning, it is commonplace to budget paging traffic headroom; accordingly, we estimate the maximum paging capacity for a gNodeB to be 525 pages per second (70% of the BTS3202E capacity). This value is depicted in the vertical red line in the figure (525 pages × 3600 seconds = 1,890,000 pages/hour). The simulation allows us to illustrate the user population that could be supported by the network, provided a population with similar mobility and traffic profiles as defined in Section 5.1. Recall that we simulate 50,000 users, both pedestrians and cars. We consider the paging load for the network and select the gNodeBs with the maximum paging load, the 95th percentile, and the median to estimate the number of users each could theoretically support by taking into account the max page limitation of the BTS3202E. Figure 6b shows the user capacity as TAL lengths are increased. A TAL length of one shows the conventional network, as the TAL is composed of a single tracking area. As expected, larger TALs result in a reduction in the number of users the gNodeBs can handle compared with performance when TALs are disabled, due to increased paging load.

[Figure 6: Control traffic and system capacities leveraging PGPP TALs in the simulated environment. (a) Control traffic with TALs: CDF versus Control Traffic (pages), with the max pages/s limit marked. (b) Capacity with TALs: User Capacity (millions) versus TAL Length, for the median, 95th percentile, and max gNodeBs.]

[Figure 7: Control traffic and system capacities with custom tracking areas in the simulated environment. (a) Custom TAs: Control traffic. (b) Custom TAs: Capacity, versus # of TAs in Underlying Map.]

5.3.2 Control overhead with custom tracking areas

As we’ve demonstrated, large TALs result in gNodeBs with higher control traffic load, effectively reducing the user
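The PGPP token flow from Section 4.1 (per-slice RSA keys, tokens of the form i‖r, blind signing by the billing system, signature and reuse checks at the PGPP-GW), as an added Python example; it is not the authors' code. The function and class names, the exponent 65537, the counter-mode SHA-256 full-domain hash, and the 1024-bit demo modulus (the paper benchmarks 2048-bit RSA) are assumptions made for illustration.

```python
import hashlib
import secrets

E = 65537  # public exponent (assumption; the paper does not state one)


def _is_probable_prime(n, rounds=40):  # Miller-Rabin, for odd n > 3
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(secrets.randbelow(n - 3) + 2, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True


def keygen(bits=1024):
    """Billing system: RSA key pair for one time slice. Returns ((n, e), d)."""
    primes = []
    while len(primes) < 2:
        p = secrets.randbits(bits // 2) | (1 << (bits // 2 - 1)) | 1  # odd, top bit set
        if p % E != 1 and p not in primes and _is_probable_prime(p):
            primes.append(p)
    p, q = primes
    return (p * q, E), pow(E, -1, (p - 1) * (q - 1))


def fdh(msg, n):
    """Full-domain hash onto Z_n: SHA-256 in counter mode, reduced mod n."""
    size = (n.bit_length() + 7) // 8 + 16  # 128 spare bits keep the mod-n bias negligible
    stream = b"".join(hashlib.sha256(c.to_bytes(4, "big") + msg).digest()
                      for c in range(size // 32 + 1))
    return int.from_bytes(stream[:size], "big") % n


def new_token(i):
    """User: token i || r, a 256-bit slice index followed by 256 random bits."""
    return i.to_bytes(32, "big") + secrets.token_bytes(32)


def blind(token, pub):
    """User: returns (blinded hash sent to billing, blinding factor kept secret)."""
    n, e = pub
    b = secrets.randbelow(n - 2) + 2  # invertible mod n except with negligible probability
    return fdh(token, n) * pow(b, e, n) % n, b


def sign_blinded(blinded, pub, d):
    return pow(blinded, d, pub[0])  # billing system signs without seeing the token


def unblind(blind_sig, b, pub):
    return blind_sig * pow(b, -1, pub[0]) % pub[0]  # user: plain FDH-RSA signature


def verify(token, sig, pub):
    return 0 < sig < pub[0] and pow(sig, pub[1], pub[0]) == fdh(token, pub[0])


class Gateway:
    """Hypothetical PGPP-GW check; simplification: any second presentation is reuse."""
    def __init__(self, pubkeys):
        self.pubkeys = pubkeys  # slice index -> (n, e), from the public key repository
        self.used = set()       # stands in for the replicated used-token database

    def admit(self, token, sig, now):
        i = int.from_bytes(token[:32], "big")
        in_slice = len(token) == 64 and i == now and i in self.pubkeys
        if not (in_slice and verify(token, sig, self.pubkeys[i])):
            return "rejected"
        if token in self.used:
            return "reused"
        self.used.add(token)
        return "admitted"


def demo(bits=1024):  # issue tokens for slices 0 and 1, redeem while in slice 0
    keys = [keygen(bits) for _ in range(2)]
    pubs = {i: pub for i, (pub, _) in enumerate(keys)}
    tokens = [new_token(i) for i in pubs]
    blinded = [blind(tok, pubs[i]) for i, tok in enumerate(tokens)]
    sigs = [unblind(sign_blinded(m, pubs[i], keys[i][1]), b, pubs[i])
            for i, (m, b) in enumerate(blinded)]
    gw = Gateway(pubs)
    return [gw.admit(tokens[0], sigs[0], 0), gw.admit(tokens[0], sigs[0], 0),
            gw.admit(tokens[1], sigs[1], 0), gw.admit(new_token(0), sigs[0], 0)]
```

The four results from `demo()` are a first use, a replay of the same token, a token for a different slice, and a fresh token paired with someone else's signature. The billing system only ever handles the blinded value and its signature, neither of which appears at the gateway.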
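Per-UE TAL generation as described in Section 4.2, together with the degree of anonymity from equation (2), as an added Python example that is not the authors' simulator. The grid map, the function names, and the reading of "adjacent" as adjacent to any tracking area already on the list are assumptions.

```python
import math
import random

TAL_LIMIT = 16  # maximum tracking areas per TAL, as stated in the text


def grid_adjacency(rows, cols):
    """Constructed toy map: tracking areas on a grid, adjacent if they share an edge."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            adj[(r, c)] = [(y, x) for y, x in cand if 0 <= y < rows and 0 <= x < cols]
    return adj


def generate_tal(current_ta, adjacency, length, rng):
    """Per-UE TAL: the UE's current tracking area plus randomly chosen areas,
    each adjacent to one already on the list (one reading of the text)."""
    tal = [current_ta]
    while len(tal) < min(length, TAL_LIMIT):
        frontier = sorted({n for ta in tal for n in adjacency[ta]} - set(tal))
        if not frontier:
            break  # map exhausted before the requested length was reached
        tal.append(rng.choice(frontier))
    return tal


def degree_of_anonymity(s, n):
    """d = log2(S) / log2(N), equation (2)."""
    return math.log2(s) / math.log2(n)


def distinct_paging_domains(ues, current_ta, adjacency, length, seed=1):
    """Number of different TALs handed to `ues` UEs attached in the same tracking area."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    tals = {frozenset(generate_tal(current_ta, adjacency, length, rng)) for _ in range(ues)}
    return len(tals)
```

With a TAL length of 1 every UE in a tracking area shares one paging domain; with longer lists the domains diverge per UE, which is what keeps an observer from knowing a victim's paging domain in advance.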