Contact Tracing Mobile Apps for COVID-19: Privacy Considerations and Related Trade-offs

Hyunghoon Cho∗, Broad Institute of MIT and Harvard, hhcho@broadinstitute.org
Daphne Ippolito∗, University of Pennsylvania, daphnei@seas.upenn.edu
Yun William Yu∗, University of Toronto, ywyu@math.toronto.edu

∗ Authors listed alphabetically.

arXiv:2003.11511v2 [cs.CR] 30 Mar 2020

Abstract

Contact tracing is an essential tool for public health officials and local communities to fight the spread of novel diseases such as COVID-19. The Singaporean government just released a mobile phone app, TraceTogether, that is designed to assist health officials in tracking down exposures after an infected individual is identified. However, the existence of such tracking apps has important privacy implications. Here, we analyze some of those implications and discuss ways of ameliorating the privacy concerns without decreasing usefulness to public health. In writing this document, we hope to ensure that privacy is a central feature of conversations surrounding mobile contact tracing apps and to encourage community efforts to develop effective alternative solutions with stronger privacy protection for the users. Importantly, though we discuss potential modifications, this document is not meant as a formal research paper; instead, it is a response to some of the privacy characteristics of direct contact tracing apps like TraceTogether and an early-stage Request for Comments to the community.

Date written: 2020-03-24
Minor correction: 2020-03-30

1 Introduction

The COVID-19 pandemic has spread like wildfire across the globe [1]. Very few countries have managed to keep it well-controlled, but one of the key tools that several such countries use is contact tracing [2]. More specifically, whenever an individual is diagnosed with the coronavirus, every person who had possibly been near that infected individual during the period in which they were contagious is contacted and told to self-quarantine for two weeks [3]. In the early days of the virus, when there were only a few cases, contact tracing could be done manually. With hundreds to thousands of cases surfacing in some cities, contact tracing has become much more difficult [4].

Countries have been employing a variety of means to enable contact tracing. In Israel, legislation was passed to allow the government to track the mobile-phone data of people with suspected infection [5]. In South Korea, the government has maintained a public database of known patients, including information about their age, gender, occupation, and travel routes [6]. In Taiwan, medical institutions were given access to patients’ travel histories [7], and authorities track phone location data for anyone under quarantine [8]. And on March 20, 2020, Singapore released an app that uses Bluetooth to record when two app users have been in close proximity: when a person reports that they have been diagnosed with COVID-19, the app allows the Ministry of Health to identify anyone logged as having been near them, and a human contact tracer can then call those contacts and determine appropriate follow-up actions.

Solutions that have worked for some countries may not work well in other countries with different societal norms. We believe that in the United States, in particular, the aforementioned measures are unlikely to be widely adopted. On the legal side, publicly revealing patients’ protected health information (PHI) is a violation of the federal HIPAA Privacy Rule [9], and the Fourth Amendment bars the government from requesting phone data without cause [10]. Some of these norms may be suspended during times of crisis: HIPAA has recently been relaxed via enforcement discretion during the crisis to allow for telemedicine [11], and a public health emergency could well be argued to be a valid cause [12]. However, many Americans are wary of sharing location and/or contact data with tech companies or the government, and any privacy
concerns could slow adoption of the system [13].

Singapore’s app-based approach, which gives individuals more control over the process, is perhaps the most promising solution for the United States. However, while Singapore’s TraceTogether app protects the privacy of users from each other, it raises serious privacy concerns with respect to the government’s access to the data. In this document, we discuss these privacy issues in more detail and introduce approaches for building a contact tracing application with enhanced privacy guarantees, as well as strategies for encouraging rapid and widespread adoption of such a system. We do not make explicit recommendations about how one should build a privacy-preserving contact tracing app, as any design implementation should first be carefully vetted by security, privacy, legal, ethics, and public health experts. However, we hope to show that there exist options for preserving several different notions of user privacy while still fully serving public health aims through contact tracing apps.

2 Singapore’s TraceTogether App

On March 20, 2020, the Singaporean Ministry of Health released the TraceTogether app for Android and iOS [14]. It operates by exchanging tokens between nearby phones via a Bluetooth connection; the tokens are also sent to a central server. These tokens are time-varying random strings, each associated with an individual for some amount of time before being refreshed. Should an individual be diagnosed with COVID-19, the health officials will ask* them to release their data on the app, which includes a list of all the tokens the app has received from nearby phones. Because the government keeps a database linking tokens to phone numbers and identities, it can resolve this list of tokens to the users who may have been exposed.

By using time-varying tokens, the app does keep users private from each other: a user has no way of knowing to whom the tokens stored in their app belong, except by linking them to the time each token was received. However, the app provides little to no privacy for infected individuals; after an infected individual is compelled to release their data, the Singaporean government can build a list of all the other people they have been in contact with. We formalize these notions of privacy in Section 3.

* While the health officials ask, it is a crime in Singapore not to assist the Ministry of Health in mapping one’s movements, so ‘ask’ is a bit of a misnomer [15].

3 Desirable Notions of Privacy

Here, we discuss three notions of privacy that are relevant to our analysis of contact-tracing systems: (1) privacy from snoopers, (2) privacy from contacts, and (3) privacy from the authorities. Note that in this document, we do not rigorously define what it means for information to be private, as this topic is better left for future work; some popular definitions include information-theoretic privacy [16], k-anonymity [17], and differential privacy [18]. Furthermore, we discuss only these three notions of privacy, to illustrate some of the shortcomings of direct contact-tracing systems. Other recent work has presented a useful taxonomy of the risks and challenges of contact tracing apps [19].

For any contact tracing app that achieves the aim of telling individuals that they might have been exposed to the virus, some amount of information clearly has to be revealed. Even if the only information provided is a binary yes/no answer about exposure, a simple linkage attack [20] can be performed: if the individual was near only one person in the last two weeks, then there is an obvious inference about the infection status of that person. The goal is, of course, to reduce the amount of information that can be inferred by each of the three parties (snoopers, contacts, and the authorities) while still achieving the public health goal of informing people of potential exposures to help slow the spread of the disease.

Of note, here we use a semi-honest model for privacy [21]: we do not consider the possibility of malicious actors polluting the database or sending malformed queries, but instead analyze only the privacy loss from the information revealed to each party. A nefarious actor could, for example, falsely claim to be infected to spread panic; this is not a privacy violation, though we do consider it further in the Discussion. Alternatively, when a server exposes a public API, queries can be crafted to reveal more information than intended by the system design, which is indeed a privacy violation. We leave a more thorough analysis of safeguards for the malicious model to future work.

3.1 Privacy from Snoopers

Consider the most naïve system for contact tracing, one that no reasonable privacy-conscious society would ever use, in which the app simply broadcasts the name and phone number of the phone’s owner and nearby phones log this information. Then,
upon diagnosis of COVID-19, the government publishes a public list of those infected, which the app then checks against its list of known recent contacts. This is clearly problematic, as a nefarious passive actor (a ‘snooper’) could track the identities of people walking past them on the street.

A slightly more reasonable system would assign a unique user ID to each individual and broadcast that instead. This does not have quite as many immediate security implications, but all it would take is for a nefarious actor to link each ID to a user, and one runs into the same problem; this is known as a ‘linkage attack.’ Given how easy and common linkage attacks are, this approach also provides insufficient privacy for users [22; 23].

The Singaporean app TraceTogether does better, in that it broadcasts random time-varying tokens as temporary IDs instead. Because these tokens are random and change over time, someone scanning the tokens while walking down the street will not be able to track specific users across different time points, as their tokens are constantly refreshed. Note that the length of time before a token is refreshed is an important parameter of the system: if it is too long, users can still be tracked; if it is too short, the number of tokens that need to be stored by the server could be huge. With a reasonable refresh rate, however, users are largely protected against attacks by snoopers in public spaces.

3.2 Privacy from Contacts

Here, the term contact is defined as any individual with whom a user has exchanged tokens in the contact tracing app based on some notion of physical proximity. Privacy from contacts is harder to achieve, because the information that needs to be passed along is whether one of the individual’s contacts has been diagnosed with COVID-19, so some information has to be revealed.

The TraceTogether app gives privacy from contacts by putting trust in government authorities instead. When TraceTogether alerts a contact that they have been exposed to COVID-19, the information comes directly from the Singaporean Ministry of Health, and (to our knowledge) no additional information is shared that could identify the individual who was diagnosed. Thus, TraceTogether does protect users’ privacy from each other, except for what can be inferred from the user’s full list of contacts, as the only information that is revealed to the user is a binary exposure indicator, which is arguably the minimum possible information release for the system to be useful.

3.3 Privacy from the Authorities

Protecting the privacy of users from the authorities, i.e., whoever is administering the app, whether that is a government agency or a large tech company, is also a challenging task. Clearly, in the absence of a fully decentralized peer-to-peer system, any information sharing among phones with the app installed will have to be mediated by some coordinating servers. Without any protective measures (e.g., based on cryptography), the coordinating servers are given an inordinate amount of knowledge.

TraceTogether does not prioritize this type of privacy, instead relying in its design on relatively high trust in the government. While it does not deliberately gather more information than necessary to build a contact map (for example, it does not use GPS location information, as Bluetooth is sufficient for finding contacts), it also does not try to hide anything from the Singaporean government. When a user is diagnosed with COVID-19 and gives their list of tokens to the Ministry of Health, the government can retrieve the mobile numbers of all individuals that user has been in contact with. Thus, neither the diagnosed user nor the exposed contacts have any privacy from the government.

Furthermore, because the government maintains a database linking time-varying tokens to mobile numbers, it could also, in theory, track people’s activities without GPS simply by placing Bluetooth receivers in public places. There is no reason to disbelieve the TraceTogether team when they state that they do not attempt to track people’s movements directly; however, the data they hold could be used to do so. Citizens of countries such as the U.S. trust authorities much less than Singaporeans do [24], so the privacy trade-offs that Singaporeans are willing to make may not be the same ones that Americans will accept.

4 Privacy-Enhancing Augmentations to the TraceTogether System

Here, we discuss potential approaches that build upon the TraceTogether model to obtain contact tracing systems with differing privacy characteristics for the users. Though important and
highly nontrivial, the various technical and engineering challenges behind the exchange of Bluetooth tokens [25] are outside the scope of this document. Our abstraction is that there exists some mechanism for nearby phones to exchange short tokens if the devices come within 6 feet of each other, the estimated radius within which viral transmission is a considerable risk [26]. We are primarily concerned with the construction of those tokens and how they can be used to perform contact tracing in a privacy-preserving manner.

Table 1: Comparison of contact tracing systems discussed in this document with respect to privacy of the users in the semi-honest model and required computational infrastructure.

| System | Privacy from snoopers | Privacy from contacts: exposed user | Privacy from contacts: diagnosed user | Privacy from authorities: exposed user | Privacy from authorities: diagnosed user | Infrastructure requirements |
|---|---|---|---|---|---|---|
| TraceTogether [14] | Yes | Yes | Yes | No. Exposure status and all tokens revealed. | No. Infection status, all tokens, and all contact tokens revealed. | Minimal. |
| Polling-based* (§4.1) | Yes | Yes | Yes† | Partial. Susceptible to linkage attacks. | Partial. Susceptible to linkage attacks. | Low. Single server. |
| Polling-based with mixing (§4.3) | Yes | Yes | Yes† | Almost private. Protects against linkage attacks by mixing tokens from different users. | Almost private. Protects against linkage attacks by mixing tokens from different users. | Medium. Multiple servers for mixing. |
| Public database (§4.4) | Yes | Yes | Partial. Info leaked at time of token exchange. | Yes | Partial. Susceptible to linkage attacks. | Communication cost to phones is high. |
| Private messaging system (§5) | Yes | Yes | Partial. Info leaked at time of token exchange.‡ | Yes | Yes | High. Multiple servers performing crypto. |

* Augmenting with random tokens does not improve privacy.
† However, if contacts are malicious and send malformed queries (e.g., a query that includes only a single token), the diagnosed individual has only the same privacy level as in the public database solution: there is only partial privacy, because information is leaked through knowledge of the time of token exchange.
‡ This information leakage might be fixable using data aggregation based on multi-key homomorphic encryption, but we do not pursue this here.

First, we formally describe the TraceTogether system. Let Alice and Bob be users of the app, and let Grace be the government server (or other central authority). Alice generates a series of random tokens $A = \{a_0, a_1, \ldots\}$, one for each time interval, and Bob generates a similar series of tokens $B = \{b_0, b_1, \ldots\}$, all drawn randomly from some space $\{0,1\}^N$. They both also report their token lists $A$ and $B$, as well as their phone numbers, to Grace. At a time $t$, Alice and Bob encounter each other, exchanging $a_t$ and $b_t$. Alice and Bob keep lists of contact tokens $\hat{A} = \{\hat{a}_0, \hat{a}_1, \ldots\}$ and $\hat{B} = \{\hat{b}_0, \hat{b}_1, \ldots\}$, respectively. These consist of tokens from every person they were exposed to; i.e., $b_t \in \hat{A}$ and $a_t \in \hat{B}$ because Alice and Bob exchanged tokens at time $t$. Five days later, Bob is diagnosed with COVID-19 and sends his list of contact tokens $\hat{B}$, which includes $a_t$, to Grace. Grace then matches each $\hat{b}_i$ to a phone number, reaches out to those individuals, including Alice, and advises them to quarantine themselves because they may have been exposed to the virus.

4.1 Partially Anonymizing via Polling

Instead of having Grace reach out to Alice when Bob reports that he has been diagnosed, a more privacy-conscious alternative is for Alice to “poll” Grace on a regular basis. In this setting, Grace maintains the full database, and Alice asks Grace whether she has been exposed. This alternative does not require Alice and Bob to send their phone numbers to Grace. There are two reporting choices for when Bob wishes to declare his diagnosis of COVID-19: Bob can send his own tokens $B$ to Grace, or he can send his contact tokens $\hat{B}$ to Grace. In the former case, Alice needs to send Grace her contact tokens $\hat{A}$ to check whether any of them belong to someone diagnosed with COVID-19. In the latter case, Alice needs to send Grace her own tokens $A$ to ask if any
of them have been published. Either way, Grace is able to inform Alice that she has been exposed without revealing Bob’s identity. This presupposes that Alice is honest but curious (semi-honest); if Alice is malicious and crafts a malformed query containing only the token she exchanged with Bob, she may be able to reveal Bob’s identity.

Note that in either version of this system, individuals still have privacy from snoopers and from contacts. In addition, they gain some amount of privacy from the authorities, as Grace does not have their mobile numbers. Of course, Grace does have some ability to perform linkage attacks. If Bob publishes his own tokens $B$ to Grace upon being diagnosed, and Alice queries Grace with all her contact tokens $\hat{A}$, then Grace can attempt to link those sets of tokens to individuals or geographic areas; further, Grace can also monitor the source of Alice’s and Bob’s queries (i.e., the IP addresses of their phones). For example, if Grace has Bluetooth sensors set up in public places, she can then trace Alice’s and Bob’s geographic movements, and that kind of location trace is often sufficient to deanonymize personal identities [23]. The same is true if, alternatively, Bob publishes his contact tokens to Grace and Alice queries Grace with her own tokens. Thus, there is not perfect privacy from the authorities, but privacy is still better than in the original TraceTogether system, at the cost of potentially lower privacy for Bob in the malicious model.

4.2 Ineffectiveness of Adding Spurious Tokens for Further Anonymization

To further anonymize the polling-based system and increase privacy from the authorities, there are a number of techniques that can be used to hide Alice’s and Bob’s identities. Let’s begin with a simple approach that does not actually work, to give some intuition before moving on to more effective approaches. Consider injecting random noise by augmenting the data with artificial tokens. Whenever Alice and Bob send information to Grace (either in the form of a diagnosis report or a query), they can augment their tokens with random ones. Note that some care has to be taken in deciding which distribution to draw the random tokens from: not only should the system keep the probability of spurious matches low, but the distribution should also be designed to make inferences by Grace difficult.

For example, assume that Alice and Bob sample their tokens uniformly at random from $\{0,1\}^N$, where $N$ is chosen to be sufficiently large that accidental collisions between individuals’ tokens are unlikely. Suppose Bob sends Grace his own tokens $B$ upon being diagnosed, and Alice queries Grace with all her contact tokens $\hat{A}$. In theory, Bob could augment his own tokens with a set of $n$ random tokens $\{r_i\}_{i=1}^{n}$ drawn uniformly from $\{0,1\}^N$ and send those to Grace as well. Unfortunately, $N$ was chosen to prevent accidental collisions, which means that the probability that the additional random tokens correspond to tokens broadcast by any individual is vanishingly small. But then there is little to no privacy gained: Grace can simply assume that the augmented set of tokens corresponds to Bob and perform the same linkage analysis that she would with only the correct set of tokens. This does nothing but pollute Grace’s database with extra data, without affording Bob any real privacy gains. Similarly, Alice cannot obfuscate her exposure through Bob from Grace, because any extra tokens she sends to Grace will not change the fact that she has Bob’s token as one of her contacts.

The root of the problem is that Grace has access to the universe of all tokens through user queries, and so can simply filter out all of the random tokens generated. Thus, random noise is ineffective for hiding information from Grace.

4.3 Enhancing Anonymity by Mixing Different Users’ Tokens

Although introducing spurious random tokens into the system achieves little in terms of privacy, as discussed in the previous subsection, a slight modification of this idea leads to meaningful privacy guarantees. The issue is that Grace has access to the entire universe of tokens, as well as to both sets of tokens corresponding to Alice and Bob, possibly augmented with random noise. Instead of hiding true tokens with random noise, suppose the system includes a set of $M$ honest-but-curious, non-colluding “mixing” servers not controlled by Grace, which aggregate data before forwarding it on to Grace.

When Bob is diagnosed with COVID-19, he partitions the tokens he wishes to send (depending on the setup of the system, either his own tokens or those of his contacts) into $M$ groups and sends each group to one of the mixing servers. The mixing servers then combine Bob’s data with that of
other users diagnosed with COVID-19 before forwarding it on to Grace. Similarly, Alice does the same thing for querying, except she also needs to wait for a response from the mixing server for each of the tokens she sends. The linkage problem then becomes much more difficult for Grace, because each individual’s valid tokens have been split up. Similarly, each mixing server only has access to a subset of the tokens corresponding to each individual, making the linkage analysis more difficult for the servers as well. Of course, if the mixing servers collude, then the privacy reduces to that of the standard polling-based approach.
   Note that this approach can also be simulated without the mixing servers by either Alice or Bob if they have access to a large number of distinct IP addresses. They can simply send their queries and tokens with some time delay from the different IP addresses, preventing Grace from linking all of them together. However, this approach may not be feasible for most users.

4.4   Public Database of Infected Users’ Tokens is Efficient but Less Private

Alternatively, Grace can simply publish the entire database of tokens she receives from infected individuals, including the ones from Bob. If Alice downloads the entire database and queries against it locally, then no information about Alice’s identity is leaked to Grace.
   This approach may seem less computationally feasible, especially on mobile devices. In circumstances where the total number of infected people is not very high, this approach works, as evidenced by the South Korean model [6], though it may fail as the epidemic reaches its peak. However, the computational and transmission cost can be partially ameliorated by batching Grace’s database, so that Alice does not download the entire thing. For example, in the version where Bob sends his own tokens B to Grace, Alice can download only the batches corresponding to her contact tokens Â. If each batch has, e.g., 50 tokens, then Grace does not know which of those 50 tokens Alice came into contact with.
   Unfortunately, this approach decreases Bob’s privacy from Alice, because Alice knows when she encountered the token Bob sent; she can then limit the number of possible individuals who could have sent the token based on who she was in contact with during the time she encountered Bob’s token. If the token she exchanged with Bob is present in the database, she gets a hint as to the disease status of one of the individuals she was in contact with during the token exchange.

5   Privacy from Authorities based on Private Messaging Systems

None of the easy-to-implement augmentation ideas given in Section 4 guarantees full privacy from the authorities. At the cost of more computation, however, we believe that a solution for secure contact tracing can be built using modern cryptographic protocols. In particular, private messaging systems [27; 28; 29] and private set intersection (cardinality) [30; 31; 32; 33] protocols seem especially relevant. The sketch we provide below is based on private messaging systems, though we do not claim this to be an optimal implementation.
   We will give the intuition here before going into the technical details necessary for an effective implementation. First, we replace the random tokens $(a_t, b_t)$ exchanged by Alice and Bob with random public keys $(pk^A_t, pk^B_t)$ from asymmetric encryption schemes [34]. The matching secret keys are stored locally on Alice’s and Bob’s phones, respectively. Then, imagine that Grace has established a collection of mailboxes, one for each public key that Alice and Bob exchange. Additionally, we introduce Frank and Fred. Frank forwards messages to/from Fred. Fred forwards messages to/from Grace. They do not tell each other the source of the messages. At fixed time points after Bob’s contact with Alice (up to some number of days), Bob addresses a message to Alice encrypted using the public key Alice gave Bob. Bob gives the message to Frank, who forwards it to Grace (through Fred), who puts it in Alice’s mailbox. The content of the message is Bob’s current infection status, and the reason he sends messages at fixed time points is to prevent Frank from figuring out Bob’s infection status from the fact that he is sending messages. Alice checks all of the mailboxes corresponding to her last several days’ worth of broadcast public keys. In one of the mailboxes, she receives and decrypts Bob’s message, and learns whether she has been exposed to the virus. Grace cannot decrypt the message Bob sends to Alice because it is protected by asymmetric encryption. Furthermore, to protect Alice’s privacy, she can also access her mailboxes through Frank and Fred, who deliver the messages in Alice’s mailboxes to her without
revealing which mailboxes she owns.

[Figure 1 diagram. “At Contact”: Alice and Bob exchange public keys over Bluetooth. “Periodically After Contact”: Bob sends his encrypted infection status (“I am (not) infected.”) to Alice’s mailbox through the proxy servers (Frank and Fred), which obfuscate mailbox access patterns; Grace maintains the mailboxes but cannot tell that Bob sent a message to Alice; Alice retrieves and decrypts the messages in her mailbox.]
Figure 1: Overview of contact tracing based on private messaging systems. When Alice and Bob are near each other, they exchange public keys as tokens. They then periodically encrypt (using each other’s public key, followed by the public keys of the proxy servers) a message indicating their infection status, and send it to the proxy server. They also periodically query the proxy server for messages posted to the mailboxes corresponding to their public keys to find out whether they have been exposed to the virus.

   Contact tracing can be viewed as a problem of secure communication between pairs of users who came into contact in the physical world. The communication patterns of who is sending messages to whom can reveal each individual’s contact history to the service provider (Grace). This notion is known as metadata privacy leakage in computer security [35], where the metadata associated with a message (e.g. sender/recipient and time) is considered sensitive, in addition to the actual message contents. In the contact tracing case, such metadata could reveal who has been in contact with whom, potentially revealing the users’ sensitive activities. We believe that recent technical advances [36; 27; 29] in designing scalable private messaging systems with metadata privacy present a promising path for developing a similar platform for secure contact tracing.
   Following recent works, our idea is to leverage a ‘mix network’ [37], which is a routing protocol that uses a chain of proxy servers (Frank/Fred) that individually shuffle the incoming messages before passing them on to the next server, thereby decoupling the sender of each message from its destination. These types of mix networks are perhaps best known as the basis of The Onion Router (Tor) anonymity network [38]. This is a more sophisticated use of mixing servers than described in Section 4.3 for the polling-based solution.
   When Bob wishes to send his encrypted message to Alice, he first encrypts it multiple times with public keys corresponding to each of the servers in the mix network. Because the messages are encrypted in multiple layers, and each server peels only the outermost layer, the final destination (Alice’s mailbox) is revealed only to the last server, and only Alice can read the content of the message (i.e. infection status). To prevent Grace from learning the identity associated with each mailbox, Alice can also access her mailboxes through the mix network, which shuffles the traffic to decouple the mailboxes from their owners. As long as one of the servers is neither breached nor controlled by the adversary, the final message cannot be linked to a specific sender even if the adversary has full control of the rest of the network. Such a system for private communication could allow the users (Bob) to share their infection status with their recent contacts (Alice) while hiding the metadata of their contact patterns from the service providers. The involvement of non-government entities, such as an academic institution or a hospital, in the mix network may help increase users’ trust in the system and lower the bar for adoption.
   There are several remaining issues that will
need to be addressed for this system to be widely adopted. First, if time-varying IDs are used, then the user receiving a token from a nearby person could infer the identity of the sender based on their own travel history; i.e., Alice might be able to infer who Bob is based on the time they exchanged the tokens, as described in Section 4.4 for the case where the database is made public. This loss of privacy from contacts can be partially alleviated by choosing a less frequent token refresh, so that, with high likelihood, Alice cannot completely identify Bob by the time interval. Actual implementations must decide on the right trade-offs between Alice and Bob’s privacy from each other and from the authorities, as well as contact tracing effectiveness. Another possible way to mitigate this problem would be to aggregate the messages for Alice on the server before making the results available to her. The messages are encrypted under different public keys, but it may be possible to use multi-key homomorphic encryption schemes [39; 40], which allow computation over ciphertexts encrypted with different public keys, to sum up the count of ‘infected’ messages. We defer the details of this approach to future work.
   One other issue is that the volume of messages delivered to each user may reveal how socially active each user has been, which some users could consider sensitive. Approaches that flatten the distribution with dummy messages could alleviate this concern. However, flattening the distribution with dummy messages may lead to scalability challenges for existing private messaging systems. Though many techniques [36; 27; 29] have been proposed to address this challenge, further discussion among the stakeholders is needed to determine the suitable trade-off between the level of latency that can be tolerated and the level of privacy guarantees desired by the users. Ultimately, though, private messaging systems enable provable privacy from the authorities while still maintaining the usefulness of contact tracing.

6   Strategies for Encouraging Widespread Adoption

Contact tracing apps depend on the network effect and critical mass to work. Having the app go ‘viral’ requires that people trust the app enough to install it and are enthusiastic enough to convince their friends to do the same. After all, app adoption must have a higher ‘transmission rate’ than the virus itself in order for it to be effective. Providing strong privacy guarantees would likely encourage voluntary adoption. Any app needs to clearly explain its privacy guarantees in ways understandable by the average user, which was our motivation in describing here the different types of privacy (from snoopers, contacts, and the authorities) that the app should be able to provide to users in order to earn their trust.
   On that note, we believe it is imperative for any app to be open source and audited by both security professionals and privacy advocates. This is not yet true for TraceTogether, but the app’s creators do claim that they will release the source code soon [41]. Furthermore, open sourcing allows different countries to customize such apps for their particular use cases and cultural preferences.
   Also, while in some countries it may be difficult to enforce a government mandate that all residents install an app, it is possible to make installation a requirement for entering certain public places. Such a practice has precedent in so-called implied consent laws, such as agreeing to field sobriety tests when getting a driver’s license [42]. One could imagine grocery stores, schools, and universities requiring installation of a contact tracing app as a precondition for entrance. This does not stop users from uninstalling or disabling the app once they leave the premises, but it would at least be useful in getting people over the initial activation barrier of installation.
   Finally, some amount of social pressure may also assist in reaching widespread adoption. Contact tracing apps, by design, know how many other people close by have the app installed. An app could display that number. Given this knowledge, a user may be incentivized to attempt to persuade others nearby to install the app, in the interest of public health.

7   Discussion

In this document, we discuss ways to build an app for contact tracing, based upon the premise that phones can broadcast tokens to all nearby phones. Notably, we do not address the engineering behind applying Bluetooth to enable such a feature. Nor do we address the possibility of location data collection for assisting epidemiologists in forecasting disease spread [43]. We also do not discuss the appropriate selection of the token refresh interval and of the frequency at which phones should poll for nearby ones, which are important factors for balancing privacy and efficiency; stale IDs have been seen
to permit linkage attacks in other similar contexts [44]. Lastly, we also do not build a full model for privacy of contact tracing, which is a delicate and easy-to-get-wrong task that requires much more careful research. Instead, we focus only on the privacy implications of a dedicated contact tracing app, in the hope that providing sufficiently strong privacy guarantees would help an app gain the critical mass needed to be effective.
   Note that here we only discuss direct contact tracing using Bluetooth proximity networks, without using any location data. Some indirect proposals for contact tracing instead simply securely log the user’s location history, which is then given to the authorities if a user is diagnosed with COVID-19 [45]. This approach has the benefit of not requiring network effects, because single individuals can track their locations without needing their contacts to have the app. The approach of logging location history is inherently less private than direct contact tracing, but that may possibly be resolved with appropriate safeguards and redactions [45]. Furthermore, hybrid approaches involving both GPS data and Bluetooth proximity networks may prove to be useful to public health officials in modelling disease spread beyond just contact tracing [46].
   We first discussed how, with just minor modifications, a polling-based direct contact tracing solution allows for some anonymity from the authorities, which is lacking in the Singaporean Ministry of Health app TraceTogether. We believe that this may help an app succeed in countries such as the U.S., where many citizens are loath to give too much data to the government.
   Even the polling-based solution still reveals quite a bit of information to the authorities, who could make use of linkage analysis to track individual users. However, utilizing additional mixing servers is relatively practical and does provide additional protection. Alternatively, a system can follow the South Korean model of openly publishing data about patients diagnosed with COVID-19, trading off some of their privacy to enhance the privacy of individuals who are trying to determine if they have been exposed.
   However, if we are willing to invest in additional computational resources, it is possible to achieve increased privacy from snoopers, contacts, and the authorities, and we propose the beginnings of one approach using private messaging systems, which we hope will be further expanded upon in future works. This is more computationally expensive, but would assure users that they do not have to give up their privacy in order to take part in public contact tracing efforts. Indeed, the chief selling point would be that they would get additional information on their exposure without needing to trust any individual third party with their private location or medical information. We believe that such a guarantee would go a long way towards mass adoption of a contact tracing app in the United States.
   Future work remains to actually build such an app, of course, and additional engineering, security, and policy considerations are sure to arise. For example, scalability of the data structures used in the servers may become a major issue when the number of infected individuals rises. One additional concern which we have not addressed is that of nefarious actors seeking to spread panic by falsely claiming to be infected. This could be prevented by allowing only hospital workers to trigger the broadcast of infection status, as in Singapore’s system, where the Ministry of Health directly contacts those exposed, though that of course trades away some of the privacy of diagnosed patients. Alternatively, others have proposed cryptographic verification of contact events, which could perhaps be extended to infection event broadcast without giving the authorities direct access to tokens [47]. However, given that some cities are already rationing testing kits and doctors’ visits to only the most serious cases [48; 49], restricting self-reporting might cause many instances of virus spread to be missed. Alternatively, the system can also be designed to separate self-reports from confirmed reports by simply keeping two databases.
   Our goal in writing this document is to start a conversation on (1) what kinds of privacy trade-offs people are willing to endure for the sake of public health, and (2) the fact that with sufficient computational resources and use of cryptographic protocols, app-based contact tracing can be accomplished without completely sacrificing privacy. Because bad early design choices can persist long after roll-out, we hope that developers and policy-makers will give privacy considerations careful thought when designing new contact tracing apps.

Acknowledgment

We would like to thank David Rolnick, Adam Sealfon, Noah Daniels, and Michael Wirth for helpful comments.

References

[1] “Novel Coronavirus Map from HealthMap,” March 2020. [Online]. Available: https://www.healthmap.org/covid-19/
[2] K. T. Eames and M. J. Keeling, “Contact tracing and disease control,” Proceedings of the Royal Society of London. Series B: Biological Sciences, vol. 270, no. 1533, pp. 2565–2571, 2003.
[3] D. Normile, “Coronavirus cases have dropped sharply in South Korea. What’s the secret to its success?” https://www.sciencemag.org/news/2020/03/coronavirus-cases-have-dropped-sharply-south-korea-whats-secret-its-success, 2020, accessed: 2020-03-23.
[4] B. Chappell, “Coronavirus: Sacramento County Gives Up On Automatic 14-Day Quarantines,” https://www.npr.org/sections/health-shots/2020/03/10/813990993/coronavirus-sacramento-county-gives-up-on-automatic-14-day-quarantines, 2020, accessed: 2020-03-23.
[5] J. Tidy, “Coronavirus: Israel enables emergency spy powers,” BBC News, March 2020. [Online]. Available: https://www.bbc.com/news/technology-51930681
[6] M. J. Kim and S. Denyer, “A travel log of the times in South Korea: Mapping the movements of coronavirus carriers,” The Washington Post, March 2020. [Online]. Available: https://www.washingtonpost.com/world/asia_pacific/coronavirus-south-korea-tracking-apps/2020/03/13/2bed568e-5fac-11ea-ac50-18701e14e06d_story.html
[7] C. J. Wang, C. Y. Ng, and R. H. Brook, “Response to COVID-19 in Taiwan: Big Data Analytics, New Technology, and Proactive Testing,” JAMA, 2020.
[8] Y. Lee, “Taiwan’s new ‘electronic fence’ for quarantines leads wave of virus monitoring,” March 2020. [Online]. Available: https://www.reuters.com/article/us-health-coronavirus-taiwan-surveillanc-idUSKBN2170SK
[9] “HIPAA Privacy Rule,” December 2000. [Online]. Available: https://www.hhs.gov/hipaa/for-professionals/privacy/index.html
[10] C. J. Roberts, “Carpenter v. United States,” Supreme Court of the United States, no. 16-402, 2018. [Online]. Available: https://www.supremecourt.gov/opinions/17pdf/16-402_h315.pdf
[11] “Notification of Enforcement Discretion for telehealth remote communications during the COVID-19 nationwide public health emergency,” March 2020. [Online]. Available: https://www.hhs.gov/hipaa/for-professionals/special-topics/emergency-preparedness/notification-enforcement-discretion-telehealth/index.html
[12] A. J. Jacobs, “Is state power to protect health compatible with substantive due process rights,” Annals Health L., vol. 20, p. 113, 2011.
[13] R. Pérez-Peña, “Virus Hits Europe Harder Than China. Is That the Price of an Open Society?” New York Times, March 2020. [Online]. Available: https://www.nytimes.com/2020/03/19/world/europe/europe-china-coronavirus.html
[14] “Help speed up contact tracing with TraceTogether,” Singapore Government Blog, March 2020. [Online]. Available: https://www.gov.sg/article/help-speed-up-contact-tracing-with-tracetogether
[15] T. TraceTogether, “Can I say no to uploading my TraceTogether data when contacted by the Ministry of Health?” https://tracetogether.zendesk.com/hc/en-sg/articles/360044860414-Can-I-say-no-to-uploading-my-TraceTogether-data-when-contacted-by-the-Ministry-of-Health-, 2020, accessed: 2020-03-23.
[16] C. E. Shannon, “Communication theory of secrecy systems,” Bell system technical journal, vol. 28, no. 4, pp. 656–715, 1949.
[17] L. Sweeney, “k-anonymity: A model for protecting privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 05, pp. 557–570, 2002.
[18] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Theory of cryptography conference. Springer, 2006, pp. 265–284.
[19] R. Raskar, I. Schunemann, R. Barbar, K. Vilcans, J. Gray, P. Vepakomma, S. Kapa, A. Nuzzo, R. Gupta, A. Berke et al., “Apps gone rogue: Maintaining personal privacy in an epidemic,” arXiv preprint arXiv:2003.08567, 2020.
[20] C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
[21] O. Goldreich, S. Micali, and A. Wigderson, “How to solve any protocol problem,” in Proc. of STOC, 1987.
[22] M. M. Merener, “Theoretical results on de-anonymization via linkage attacks,” Transactions on Data Privacy, vol. 5, no. 2, pp. 377–402, 2012.
[23] M. Srivatsa and M. Hicks, “Deanonymizing mobility traces: Using social network as a side-channel,” in Proceedings of the 2012 ACM conference on Computer and communications security, 2012, pp. 628–637.
[24] E. T. Barometer, “January 20, 2019,” 2019. [Online]. Available: https://www.edelman.com/sites/g/files/aatuss191/files/2019-02/2019_Edelman_Trust_Barometer_Global_Report_2.pdf
[25] TraceTogether, “How does TraceTogether work?” https://tracetogether.zendesk.com/hc/en-sg/articles/360043543473-How-does-TraceTogether-work-, 2020, accessed: 2020-03-23.

[26] “How COVID-19 spreads,” Centers for Disease Control and Prevention, March 2020. [Online]. Available: https://www.cdc.gov/coronavirus/2019-ncov/prepare/transmission.html

[27] J. Van Den Hooff, D. Lazar, M. Zaharia, and N. Zeldovich, “Vuvuzela: Scalable private messaging resistant to traffic analysis,” in Proceedings of the 25th Symposium on Operating Systems Principles, 2015, pp. 137–152.

[28] N. Tyagi, Y. Gilad, D. Leung, M. Zaharia, and N. Zeldovich, “Stadium: A distributed metadata-private messaging system,” in Proceedings of the 26th Symposium on Operating Systems Principles, 2017, pp. 423–440.

[29] H. Corrigan-Gibbs, D. Boneh, and D. Mazières, “Riposte: An anonymous messaging system handling millions of users,” in 2015 IEEE Symposium on Security and Privacy. IEEE, 2015, pp. 321–338.

[30] M. J. Freedman, K. Nissim, and B. Pinkas, “Efficient private matching and set intersection,” in International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 2004, pp. 1–19.

[31] L. Kissner and D. Song, “Privacy-preserving set operations,” in Annual International Cryptology Conference. Springer, 2005, pp. 241–257.

[32] E. De Cristofaro and G. Tsudik, “Practical private set intersection protocols with linear complexity,” in International Conference on Financial Cryptography and Data Security. Springer, 2010, pp. 143–159.

[33] E. De Cristofaro, P. Gasti, and G. Tsudik, “Fast and private computation of cardinality of set intersection and union,” in International Conference on Cryptology and Network Security. Springer, 2012, pp. 218–231.

[34] G. J. Simmons, “Symmetric and asymmetric encryption,” ACM Computing Surveys (CSUR), vol. 11, no. 4, pp. 305–330, 1979.

[35] B. Greschbach, G. Kreitz, and S. Buchegger, “The devil is in the metadata: New privacy challenges in decentralised online social networks,” in 2012 IEEE International Conference on Pervasive Computing and Communications Workshops. IEEE, 2012, pp. 333–339.

[36] A. Kwon, D. Lu, and S. Devadas, “XRD: Scalable messaging system with cryptographic privacy,” in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), 2020, pp. 759–776.

[37] D. L. Chaum, “Untraceable electronic mail, return addresses, and digital pseudonyms,” Communications of the ACM, vol. 24, no. 2, pp. 84–90, 1981.

[38] M. G. Reed, P. F. Syverson, and D. M. Goldschlag, “Anonymous connections and onion routing,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 4, pp. 482–494, 1998.

[39] A. López-Alt, E. Tromer, and V. Vaikuntanathan, “On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption,” in Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, 2012, pp. 1219–1234.

[40] H. Chen, W. Dai, M. Kim, and Y. Song, “Efficient multi-key homomorphic encryption with packed ciphertexts with application to oblivious neural network inference,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 395–412.

[41] J. Zhang, “620,000 people installed TraceTogether in 3 days, Spores open source contact tracing app,” Mothership, March 2020. [Online]. Available: https://mothership.sg/2020/03/tracetogether-installed-open-source/

[42] A. C. Wagenaar, T. S. Zobeck, G. D. Williams, and R. Hingson, “Methods used in studies of drink-drive control efforts: a meta-analysis of the literature from 1960 to 1991,” Accident Analysis & Prevention, vol. 27, no. 3, pp. 307–316, 1995.

[43] S. Pei, S. Kandula, W. Yang, and J. Shaman, “Forecasting the spatial transmission of influenza in the United States,” Proceedings of the National Academy of Sciences, vol. 115, no. 11, pp. 2752–2757, 2018.

[44] S. E. Sarma, S. A. Weis, and D. W. Engels, “RFID systems and security and privacy implications,” in International Workshop on Cryptographic Hardware and Embedded Systems. Springer, 2002, pp. 454–469.

[45] “Private Kit: Safe Paths - Can we slow the spread without giving up individual privacy?” http://safepaths.mit.edu/, 2020, accessed: 2020-03-23.

[46] “COVID Watch,” https://covid-watch.org/, 2020.

[47] J. Petrie, “Cryptographically Secure Contact Tracing,” March 2020.

[48] J. Dolan and B. Mejia, “L.A. County gives up on containing coronavirus, tells doctors to skip testing of some patients,” Los Angeles Times, March 2020. [Online]. Available: https://www.latimes.com/california/story/2020-03-20/coronavirus-county-doctors-containment-testing

[49] C. Y. Johnson and L. H. Sun, “Health officials in New York, California restrict coronavirus testing to health care workers and people who are hospitalized,” The Philadelphia Inquirer, March 2020. [Online]. Available: https://www.inquirer.com/health/coronavirus/coronavirus-testing-20200321.html
