Leveraging anonymized telecom data to fight the Ebola outbreak - White Paper - Proposed Approach November 2014

 
Leveraging anonymized telecom data to fight the
Ebola outbreak
White Paper – Proposed Approach
November 2014

Executive Summary

Over the past weeks, Real Impact Analytics (RIA, a big data analytics provider in the telecom industry) and
Airtel Sierra Leone (Airtel SL, a mobile network operator) have been discussing the opportunity to use
telecom data to fight Ebola in Sierra Leone. Building on numerous academic publications and use cases
promoted by United Nations’ Global Pulse (the data for good arm of the UN) that demonstrate the value
at hand, RIA and Airtel SL would like to invite several partners to join forces and support this initiative.

The goal of this document is to structure the approach by which the data would be made available and
leveraged to allow health actors in the field (e.g. WHO, UNMEER, Red Cross) to improve the outcomes
and impact of their activities. The aim is to demonstrate impact with one mobile network operator in
one country before potentially expanding the approach to other operators and countries.

Why should telecom data be used to fight Ebola?

Our core observation is that collecting dynamic and updated data is one of the biggest challenges for
health actors, particularly in the case of an epidemic such as Ebola. There are very limited existing sources
(e.g. census, household surveys), they are often outdated and they are seldom dynamic. Moreover, when
a new cluster of Ebola cases appears outside of existing focus areas, health actors cannot trace the
potential spread patterns because they do not have any data that show population mobility or social
interactions. However, such data exist and are collected systematically by mobile network operators. It is
possible to aggregate raw data (e.g. signaling, call detail records (CDRs), recharge records) into mobility
maps that can drive field actions by health actors, while staying fully anonymized.

Who are the key stakeholders that need to collaborate?

There are four essential components to this collaboration:

    • Mobile Network Operators (MNOs) that provide data,
    • Health actors that use the analyses made on the data,
    • Analytics experts who make the data valuable for end users,
    • Regulators that approve the use of the data for a specific scope
      and under a clearly defined approach.

So far, the situation is as follows:

    • Airtel SL has agreed to lead the initiative from the MNO perspective,
    • UNMEER has expressed interest in using the data and further enriching
      them with other collected data,
    • Airtel SL has highlighted that two regulatory approvals are required: one from the
      National Ebola Response Center (NERC) and one from the national telecom regulator (NATCOM),
    • Several other parties will be involved to run and facilitate the analytics applied to the data.
      Specifically, UN Global Pulse and RIA will lead this part of the effort under the global framework
      of UNMEER. Additional partners may be involved when deemed relevant by the current
      stakeholders.

What approach are we recommending?

Once all the parties and approvals are gathered for this exercise, we will initiate a 6-week project that will
cover the entire value chain from data to action:

    • Step 1: Collect the data: Airtel SL will provide access to CDRs and recharge data with as much
      history as possible. UNMEER will provide its existing data records,
    • Step 2: Anonymize and transform the data: RIA and Global Pulse will apply a set of algorithms
      to ensure subscriber privacy while extracting valuable insights from the data,
    • Step 3: Make the output analyses accessible and actionable for health actors: RIA & Global Pulse
      will work closely with UNMEER to link the analyses to the health actors’ day-to-day needs. We
      will combine the use of dashboards with email notifications.

The approach documented below follows the strictest confidentiality, security and privacy standards.
It is impossible to target individuals based on the anonymized and aggregated data that will be
published. The only aim is to aggregate individual data to support a public good initiative, the fight
against Ebola.

What are the next steps?

In the short term, we need to:

    • Receive written expressions of interest and involvement in the project from the following
      actors: UNMEER, Global Pulse and any other relevant stakeholder (e.g., GSMA, WHO),
    • Request regulatory approval from NERC and NATCOM.

Once these requirements are met, we will agree on a project plan with a division of roles and
responsibilities. As a result, this document will be edited to reflect the final scope and approach across
participants.

In the longer term, we will expand the initiative to other countries and mobile network operators if it
proves valuable to UNMEER in Sierra Leone.

Context and Objectives

In March 2014, the Regional office of the World Health Organization (WHO) reported an outbreak of the
Ebola disease in Guinea. The United Nations Mission for Ebola Emergency Response (UNMEER), the
Economic Community of West African States (ECOWAS) and many other stakeholders have been setting
up various task forces and coordination plans to assess the situation and identify possible avenues to
contain and reduce the Ebola outbreak. Immense efforts have been made to fight the epidemic amid
geographies lacking the most basic infrastructure in health, information management and disaster
response.

Real Impact Analytics (RIA, a telecom big data vendor) and Airtel Sierra Leone (Airtel SL, a mobile network
operator) have been discussing how they could support aid and health actors, in particular by providing
data and insights that would fundamentally improve the efficiency of field teams.

Building on numerous academic publications and use cases promoted by UN’s Global Pulse (the data for
good arm of the United Nations) that demonstrate the value at hand, RIA and Airtel SL wish to use telecom
data to support the fight to contain and eradicate Ebola cases.

To succeed in this effort, several partners and regulatory approvals need to be gathered. The goal of this
document is to structure the approach and framework by which the data would be used and made
available / actionable for health actors in the field (e.g. WHO, UNMEER, Red Cross). It is intended for all
the stakeholders and divided into 4 sections:

    -   Section 1: Why should telecom data be used to fight Ebola?
    -   Section 2: Who are the key stakeholders that need to collaborate?
    -   Section 3: What approach are we recommending?
    -   Section 4: What are the next steps?

Ultimately, the aim is to demonstrate impact with one mobile network operator in one country before
potentially expanding the approach to other operators and countries.

Section 1: Why should telecom data be used to fight Ebola?

Our core assumption is that collecting useful and actionable data is one of the biggest challenges for health
actors, particularly in the case of an epidemic such as Ebola. Indeed, there are very limited existing sources
(e.g. census, household surveys), they are often outdated and they are seldom dynamic.

When a new cluster of Ebola cases appears outside of existing focus areas, health actors have no
visibility on the potential spread because they do not have any data that show population mobility.
However, such data do exist and are collected systematically by mobile network operators. Indeed,
through signaling, call detail records (CDRs) or recharge records, it is possible to aggregate raw data into
mobility maps that can directly drive field action by health actors, while staying fully anonymized.

In the case of the 2010 Haiti earthquake, Flowminder.org has shown how mobility maps allow better
planning of the humanitarian response to a natural disaster.

With regard to Ebola, understanding the trends in mobility in affected countries is key, as individuals’ and
groups’ movements within and across countries significantly contribute to the spread of the disease. Even
if more recent census data on human movements in the affected countries were available, such an approach
would not be accurate enough, as it relies on historical static data sets, and data outside urban areas are
often of poor quality. A much more precise and dynamic approach could be developed based on the Call
Detail Records (CDRs) collected and stored by telephone operators on a daily basis. A vast body of
previous research has shown that variables based on CDR data are reliable proxies for population mobility
modeling (e.g., Lenormand et al, 2014; Tizzoni et al, 2013; Wesolowski et al, 2014). This would lead to
forward-looking approaches and tools rather than assessments of the past or current situation.

Many people have repeatedly called for making the CDR data from Ebola-affected countries publicly
accessible to cutting-edge scientific teams (e.g., Halloran et al, 2014; Wesolowski et al, 2014). However,
privacy concerns echoed by regulators have prevented such data from being made accessible to the scientific
community. It is hence important to realize that strict rules ensuring privacy are compatible with the
valuable insights brought by the analyses of the CDRs. Anonymized and aggregated data that do not
threaten anyone’s privacy are sufficient for providing powerful insights for prediction of further spreading
of the disease. Similarly, the Ebola Response Roadmap from the WHO calls for specialized data analysis
and also puts strong emphasis on the importance of contact tracing. In situations where precise contact
tracing is impossible, mobility models can be a good approximation for observing the spread of the
epidemic. The Ebola Mobile Response blueprint of the GSMA puts forward the potential value in mobile
phone usage within the Ebola-affected countries. The GSMA has even anticipated possible usage of CDRs by
publishing the Guidelines for privacy protection when using the mobile phone data in response to the
Ebola outbreak.

CDRs from the affected countries are essential to more accurately estimate the current mobility patterns
taking place in these countries. Such an approach can be fully anonymized and aggregated at the level of
the site/city (hence protecting the data privacy of individuals), without losing or weakening any insights
in terms of public health and policy. It therefore offers no possibility of tracing back any specific
movement of an individual within the covered geographical area. Rather, a network of places should be
built, with links between them weighted by the frequency of movement between the two places. This
network would make it possible to build maps of mobility hubs and popular links within countries, allowing
estimation of the most probable routes for further spreading of the disease and thus the most probable
places where new infections will occur.

Gaining insights would also require overlaying the CDR-based mobility model with an epidemic spreading
model (see the Epidemiological model section below) to identify and prioritize locations according to their
probability of being affected by Ebola. This can help direct attention to areas where
supplementary resources should be allocated and where careful preventive measures should be taken in
order to limit further spreading of the virus.

Our objectives are to leverage telecom data to support health actors (led by UNMEER):

    • Set up and secure a system (hardware and software) for collecting telecom and public health
      data,
    • Develop mobility matrices and epidemiological models to simulate the potential
      spread paths of the disease,
    • Build a platform/tools allowing identification of a list of priority locations based on mobility
      patterns and epidemiological models.

In a first phase, we will focus on using Airtel SL data and applying the techniques within Sierra Leone. If
the analyses prove useful, we will then launch a coordinated effort with UNMEER to expand the
application to the remaining focus countries, Liberia & Guinea, across all local mobile network operators.

Section 2: Who are the key stakeholders that need to collaborate?
The Ebola outbreak is spreading across more than 5 countries, impacting public and private
organizations alike. Fighting Ebola therefore requires coordinating many partners, including local
regulators, public health authorities, political decision-makers and commercial private corporations.

To fight Ebola in Sierra Leone, we will need to coordinate a subset of such players to ensure and secure
impact. In concrete terms, this means that we need to align at least the following stakeholders on a plan
of action:

    • Coordination and public authorities
        o UNMEER
        o Global Pulse
        o Local regulator – NATCOM
        o Government – NERC
    • Operational players
        o Airtel Sierra Leone
        o Real Impact Analytics
More information on each actor and the proposed split of roles and responsibilities across the ecosystem
is provided in this section. Additional players might be added to the picture whenever deemed relevant
by the current stakeholders.

UNMEER
The UN Mission for Ebola Emergency Response (UNMEER) is being set up in response to this
unprecedented outbreak. The Mission will be temporary and will respond to immediate needs related
to the fight against Ebola.

The objective of UNMEER is to harness the capabilities and competencies of all the relevant United
Nations actors under a unified operational structure to reinforce unity of purpose, effective ground-
level leadership and operational direction, in order to ensure a rapid, effective, efficient and coherent
response to the crisis. The singular strategic objective and purpose of the Mission will be to work with
others to stop the Ebola outbreak. To achieve this, the strategic priorities of the Mission will be to stop
the spread of the disease, treat the infected, ensure essential services, preserve stability and prevent
the spread to countries currently unaffected. It works closely with governments and national
structures in the affected countries, regional and international actors, such as the African Union (AU)
and the Economic Community of West African States (ECOWAS), and with Member States, the private
sector and civil society. It also coordinates with the WHO, which is responsible for overall health
strategy and advice within the Mission, while other UN agencies will act in their area of expertise under
the overall leadership and direction of a single Head of Mission. The Mission will leverage the existing
presence and expertise of UN country teams and international partners including NGOs on the ground
to minimize gaps and ensure leadership. UNMEER has also set up a platform to share documents
among all responders to the Ebola crisis. The site is a one-stop-shop for information on the response
of each actor, following their area of action. It brings contact lists, infographics, maps and other tools
to a centralized location, and is designed to allow a variety of actors to participate in the management
of documents.

At this stage, we need to make sure that our actions, tools and approach are part of and coordinated
with the decisions of UNMEER. This could translate into the following roles for UNMEER in our
project:

    • Supporting Airtel SL & RIA to secure an approval from NERC,
    • Building awareness of the availability of the output analyses,
    • Guiding RIA and Global Pulse on the needs of the end users (e.g. WHO, UNICEF, MSF, Red
      Cross) and how we can make the analyses more relevant for field teams.

Global Pulse
One of the missions of Global Pulse is to accelerate discovery, development and scaled adoption of big
data innovation for sustainable development and humanitarian action. Global Pulse was
established based on a recognition that digital data offers the opportunity to gain a better understanding
of changes in human well-being, and to get real-time feedback on how well policy responses are working.

To this end, Global Pulse is working to promote awareness of the opportunities Big Data presents for relief
and development, forge public-private data sharing partnerships, generate high-impact analytical tools
and approaches through its network of Pulse Labs, and drive broad adoption of useful innovations across
the UN System. Global Pulse has been playing an instrumental role in the response to the outbreak of
Ebola in West Africa.

In the current set-up, the roles of Global Pulse could be along the following lines:

    • Coordinate players to exchange best practices and insights,
    • Run their analytical models on the collected telecom data,
    • Involve additional parties that would bring targeted expertise,
    • Promote a potential success case and play the role of setting up and promoting a shared Big
      Data analytics platform across different countries based on the telecom data.

Real Impact Analytics
Real Impact Analytics is a company active in Big Data focusing on telecom in emerging markets. It has
been investing significantly in developing end-to-end tools for telecom operators in terms of improving
and optimizing their sales and marketing approach in Africa, LATAM and Asia. It has offices in 5 countries
(Belgium, South Africa, United Arab Emirates, Luxembourg and Brazil) serving 5 of the top 10 global
telecom operators, counting more than 80 people and doubling its size every year. It has been complementing
its commercial approach and products with a donor-based activity focusing on financial inclusion through
mobile finance and Micro Finance Institutions. Concretely, it has been cooperating closely with the Bill &
Melinda Gates Foundation, the MasterCard Foundation, the World Bank and USAID on various
development projects. Finally, it has been focusing on innovation and investing in R&D through the
set-up of a laboratory gathering software developers and pure academic research teams. This has led to
multiple research projects such as a “food alert project” with Global Pulse leveraging telecom data (CDRs)
to prioritize geographical areas in terms of risk of food shortage.

RIA is involved in this project because it has existing legal and technical relations with the major telecom
operators active in the countries showing cases of Ebola. Given its expertise in managing telecom data
and its growing catalogue of “data for good” initiatives, RIA will apply its methodologies to ensure
relevance to the field actors.

The role of Real Impact Analytics would be to secure the technical aspects of the approach, covering:

    • Infrastructure and technical processes (compliant with ethical and privacy constraints)
      ensuring the collection of the required data in one place,
    • Platform with tools to allow the identification of mobility patterns and prioritize actions
      based on social interactions and epidemiologic models,
    • Processes to secure the usage and the impact of the proposed approach.

Local Regulator – NATCOM
Telecom regulators in the relevant countries should be involved, as the telecom operators would allow
third parties (e.g. Real Impact Analytics) to access the data, share insights and drive public / field
actions. The National Telecommunications Commission in Sierra Leone (NATCOM) has
already taken actions to accelerate the fight against Ebola. For instance, it facilitates SMS text messaging
of vital communications on Ebola. However, NATCOM is also the guardian of consumer rights. This
has been translated into the right to privacy, which says that “Consumers must be protected from the
inappropriate use of information gathered by service providers in the course of providing
telecommunications service. It is required of service providers to protect the privacy of the financial,
personal and other confidential information on consumers, and the Commission shall impose sanctions
and penalties to ensure that service providers respect this right.”

In our case, the role of NATCOM is to allow RIA and Airtel SL to leverage the consumer telecom data to
develop relevant tools and insights. Real Impact Analytics is willing to provide all technical guarantees
to ensure that the rights of the consumers will never be compromised or infringed. Moreover, the goal
of RIA and Airtel SL is not to make or derive any commercial profit from their actions but to contribute
to the public good and welfare of Sierra Leone by leveraging their key assets and capabilities.

Airtel SL
Mobile Network Operators such as Airtel, MTN or Orange have significant market shares in the affected
countries. Moreover, they have been collecting data on customers for their commercial purposes.
For instance, billing requires collecting data on the calls made by any customer, such as the duration of
the call, the number called and the number calling. These data are stored in a secured database within the
premises of the local operators.

In our case, the role of Airtel SL would be to share their data (CDRs, recharge records) with Real Impact
Analytics in a way that prevents any infringement of or risk to the right to privacy.

Section 3: What approach are we recommending?

Which data is required?
Call Detail Records (CDRs)
We would need Airtel SL to provide CDRs for the entire subscriber base with as much history as possible
(ideally more than 9 months of history). The following fields will be required at a minimum:

  Field                              Description
  A                                  MSISDN - format should be consistent, always Airtel user
  B                                  MSISDN - format should be consistent (e.g. 32493191914)
  CELL_ID                            String - "Cell code" of A-number
  DURATION                           Number - (in seconds for VOICE and 1 per event for SMS)
  TAC_A                              String - IMEI first 8 digits (set to Null if unavailable)
  DATE_TIME                          String - yyyyMMdd hh:mm:ss (Ex : 20140918 15:21:18)
  TRANSIT_TYPE                       String - "ONNET", "OFFNET"
  TYPE                               String - "SMS" or "VOICE"
  VALUE_A                            Number - amount charged to A (revenue); if A is the receiver then
                                     VALUE_A = 0, except if A pays for receiving the call
  WAY                                String - "INCOMING", "OUTGOING" (indicates whether A is the receiver
                                     (incoming) or the caller (outgoing))

A daily dump of the data will be required to make the analyses dynamic.
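
As an illustration of how one CDR row with the fields above could be validated and typed on ingestion, here is a minimal Python sketch. The delivery format is not specified in this document, so the sketch assumes each row arrives as a dictionary keyed by field name (for instance from csv.DictReader); the class name CdrEvent, the function parse_cdr_row and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class CdrEvent:
    """One call or SMS event, seen from subscriber A (hypothetical container)."""
    a: str
    b: str
    cell_id: str
    duration: int             # seconds for VOICE, 1 per event for SMS
    tac_a: Optional[str]      # first 8 IMEI digits, None if unavailable
    date_time: datetime
    transit_type: str         # "ONNET" or "OFFNET"
    event_type: str           # "SMS" or "VOICE"
    value_a: float            # amount charged to A
    way: str                  # "INCOMING" or "OUTGOING"


def parse_cdr_row(row: dict) -> CdrEvent:
    """Convert a raw row (assumed: dict of strings) into a typed CdrEvent."""
    event_type = row["TYPE"].strip().upper()
    if event_type not in {"SMS", "VOICE"}:
        raise ValueError(f"unexpected TYPE: {row['TYPE']!r}")
    way = row["WAY"].strip().upper()
    if way not in {"INCOMING", "OUTGOING"}:
        raise ValueError(f"unexpected WAY: {row['WAY']!r}")
    tac = (row.get("TAC_A") or "").strip()
    return CdrEvent(
        a=row["A"].strip(),
        b=row["B"].strip(),
        cell_id=row["CELL_ID"].strip(),
        duration=int(row["DURATION"]),
        tac_a=tac if tac and tac.lower() != "null" else None,
        # DATE_TIME format from the table: yyyyMMdd hh:mm:ss
        date_time=datetime.strptime(row["DATE_TIME"].strip(), "%Y%m%d %H:%M:%S"),
        transit_type=row["TRANSIT_TYPE"].strip().upper(),
        event_type=event_type,
        value_a=float(row["VALUE_A"]),
        way=way,
    )
```

Rejecting rows with unexpected TYPE or WAY values early keeps format drift in the daily dumps from silently distorting the downstream aggregates.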

Recharge records
Subscriber recharges will enrich the mobility maps with socio-economic indicators.

Field                             Description
SUBS_MSISDN                       Number - consistent format with CDRs
AMOUNT                            Number - Amount charged to the subscriber for their top-up
DATE                              String - yyyyMMdd (Ex : 20140917)
CELL_ID                           String - CELL_ID of the recharge event
DATE_TIME                         Date - yyyyMMdd hh:mm:ss (Ex : 20140917 16:33:18)
TYPE                              String - EVD or Voucher
AGENT_MSISDN                      Number - MSISDN of the Agent, if any
VOUCHER_ID                        String - (set to Null if unavailable)

Site GPS locations
We will also need cell and site locations to link aggregated subscriber events to geographies.

  Field                      Description/ comments
  CELL_ID                    String - cell ID (e.g. 659020200220262)
  CELL_TYPE                  String - 2G or 3G
  SITE_ID                    String - site ID (e.g. 65902)
  CITY_ID                    String - Unique identifier for each city
  CITY_POPULATION            String - city population
  CITY_URBAN_RURAL           String - "RURAL" or "URBAN" at a city level
  PROVINCE_ID                String - Unique identifier for each province
  REGION_ID                  String - Unique identifier for each region
  SITE_LATITUDE              String - cell latitude (e.g. 3.851443332)
  SITE_LONGITUDE             String - cell longitude (e.g. 32.27342849)
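
To illustrate how the location table links subscriber events to geographies, the following Python sketch counts events per city through a CELL_ID to CITY_ID lookup. It is only a sketch under stated assumptions: the function name events_per_city, the in-memory dictionary and the identifiers used in the usage example are hypothetical, and a real pipeline would probably run this as a database join.

```python
from collections import Counter


def events_per_city(event_cell_ids, cell_to_city):
    """Aggregate events to city level.

    event_cell_ids: iterable of CELL_ID values, one per CDR or recharge event.
    cell_to_city:   dict mapping CELL_ID -> CITY_ID, built from the site table.
    Returns (counts per CITY_ID, number of events whose cell is not in the table).
    """
    counts = Counter()
    unmapped = 0
    for cell_id in event_cell_ids:
        city_id = cell_to_city.get(cell_id)
        if city_id is None:
            unmapped += 1      # keep track of cells missing from the site table
        else:
            counts[city_id] += 1
    return dict(counts), unmapped
```

For example, events_per_city(["c1", "c2", "c9"], {"c1": "CITY_A", "c2": "CITY_A"}) returns ({"CITY_A": 2}, 1). Reporting the unmapped count makes gaps in the site table visible instead of silently dropping events.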

Public health data
We will need the number of Ebola cases at the most granular level of geography. Such data should be
readily available from UNMEER or from the WHO. Real Impact Analytics will then consolidate these data on
the geographies used for the telecom data. For instance, if telecom data and insights are required at the
town level, we should be able to easily consolidate the public health data at the same level of aggregation.

Which analyses will we run?
Real Impact Analytics and Global Pulse will perform a series of analyses featuring different goals and
concerns:

    • Anonymize the data to secure privacy, while allowing for public health and policy insights,
    • Run mobility algorithms on the data to isolate and identify mobility matrices and patterns,
    • Operate epidemiological models to assess probabilities of Ebola per geography and prioritize
      locations for action,
    • Develop interactive tools (including processes) to allow for quick insights and decision-making.

All analyses will comply with the guidelines set forth by the GSMA to ensure maximum confidentiality,
security and privacy standards.

Anonymization
To avoid putting privacy at risk, all the data will be anonymized to remove any personal information
regarding the subscribers (especially, their mobile number). A new unique identifier will be created for
each subscriber based on the anonymization algorithms below.

MSISDNs (i.e. subscriber mobile phone numbers) will be anonymized using the SHA-224 hashing algorithm.
SHA-224 is one of the seven Secure Hash Algorithms defined by the National Institute of Standards and
Technology (NIST, see http://csrc.nist.gov/publications/fips/fips180-4/fips-180-4.pdf for the latest official
report on secure hash algorithms). This algorithm is a one-way hash function rather than an encryption
method: there is no key or table that allows the initial information to be retrieved, so the original phone
number cannot be recovered by inverting the hashing function. The choice of SHA-224 over
other SHA-2 algorithms is based on two factors. First, the message to hash (the MSISDN)
is short and does not require a digest as long as that of SHA-512. Second, SHA-224 performs the same
computation as SHA-256 with different initial values and an output truncated to 224 bits. This allows us
to store the output in a smaller size than a full 256-bit digest.

The SHA-224 algorithm provides 112 bits of collision resistance, which is currently considered safe by
international standards. A public implementation of this hashing tool is available in hashlib, which is
part of the Python standard library. An example of the output of SHA-224 is given hereunder:

 MSISDN                        Hash (hexadecimal)
 0032499123456                 c7ee813753c5e72657db9c6aa82f0d46b785e5746df85d27dd4a7079
 0032499123457                 f8c9e797fab4e1b65b5155ac1ab26ae55dff2c4da4dd69a19e19af64
As shown, even though the MSISDNs are very similar, their corresponding output of the hash function
differs substantially.
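
The hashing step described above can be sketched with hashlib as follows. The function name anonymize_msisdn is hypothetical, and the sketch assumes the MSISDN is hashed as an ASCII string exactly as it appears in the data; the input encoding used to produce the table above is not stated in this document.

```python
import hashlib


def anonymize_msisdn(msisdn: str) -> str:
    """Return the SHA-224 digest of an MSISDN as 56 hexadecimal characters.

    Assumption: the MSISDN is encoded as ASCII text before hashing.
    """
    return hashlib.sha224(msisdn.encode("ascii")).hexdigest()


if __name__ == "__main__":
    for number in ("0032499123456", "0032499123457"):
        print(number, anonymize_msisdn(number))
```

The same input always yields the same identifier, which is what allows events of one subscriber to be linked after anonymization. Because MSISDNs are short numeric strings, an implementation might additionally mix in a secret value before hashing; that variant is an assumption of this note and is not part of the approach described here.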

Mobility
In the first stage of building a mobility model from CDRs, Real Impact Analytics will focus on commuting.
According to previous studies on human mobility, people tend to move recurrently between a few
locations (González et al, 2008), and home-work commuting accounts for more than 87% of all mobility
(Tizzoni et al, 2013). Thus, observing patterns present in daily commuting should provide a solid basis for more
detailed further analyses.

From the anonymized CDRs spanning at least a few months, home and work locations will be
extracted for each user. The home location will be the site most used during non-working hours (weekends,
evenings and night hours of working days) over the analyzed time frame. Similarly, the
site used most frequently during working hours by a given user will be marked as the work location. The
home and work locations can be identical for persons working from home or close to their home. Only data
from users showing activity on at least 75% of the days over the observation period will be used, in order
to have a sufficient amount of data for establishing reliable home and work locations.
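
The home and work extraction described above could look like the following Python sketch. The exact working hours are not defined in this document, so Monday to Friday from 08:00 to 18:00 is an assumption made for illustration; the function names and the (user, site, timestamp) event layout are also hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumed working hours (not specified in the source): Mon-Fri, 08:00-18:00.
WORK_START_HOUR, WORK_END_HOUR = 8, 18


def is_working_hours(ts: datetime) -> bool:
    return ts.weekday() < 5 and WORK_START_HOUR <= ts.hour < WORK_END_HOUR


def home_work_locations(events, n_days, min_active_share=0.75):
    """Return {user: (home_site, work_site)} for sufficiently active users.

    events: iterable of (anonymized_user_id, site_id, datetime) tuples.
    n_days: number of days in the observation period.
    """
    home_counts = defaultdict(Counter)
    work_counts = defaultdict(Counter)
    active_days = defaultdict(set)
    for user, site, ts in events:
        active_days[user].add(ts.date())
        target = work_counts if is_working_hours(ts) else home_counts
        target[user][site] += 1

    result = {}
    for user, days in active_days.items():
        if len(days) < min_active_share * n_days:
            continue  # active on fewer than 75% of the days: discard
        if not home_counts[user] or not work_counts[user]:
            continue  # no events in one of the two time windows
        home = home_counts[user].most_common(1)[0][0]
        work = work_counts[user].most_common(1)[0][0]
        result[user] = (home, work)
    return result
```

Users with events in only one of the two time windows are dropped in this sketch; whether to keep them with identical home and work sites is a design choice left open here.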

Across the dataset, links between all pairs of home and work locations will be established, with the weight
of each link corresponding to the proportion of users commuting between the two locations. This will
create a network of nodes corresponding to individual cities and their weighted connections.
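
A minimal Python sketch of this link weighting is given below. It assumes that the proportion is taken over all retained users and that users whose home and work locations coincide are kept as self-links; both points are interpretations, and the function name commuting_links is hypothetical.

```python
from collections import Counter


def commuting_links(home_work):
    """Build weighted (home, work) links from per-user locations.

    home_work: dict mapping anonymized user id -> (home_location, work_location).
    Returns {(home, work): share of users commuting along that link}.
    """
    pair_counts = Counter(home_work.values())
    total_users = sum(pair_counts.values())
    if total_users == 0:
        return {}
    return {pair: n / total_users for pair, n in pair_counts.items()}
```

Only the aggregated shares leave this step; the per-user dictionary used as input does not need to be stored once the links are computed.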

The output graph will be fully aggregated to the level of a town and will not provide any information
regarding movement patterns of actual people, only probabilities of fluxes across the country.

In order to allow for correct implementation of the epidemic spreading model, the population of the
interconnected towns will be re-scaled from the number of recorded phone users to the actual
demographic figures.
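
One simple way to perform this re-scaling is sketched below in Python. It assumes a single expansion factor per home town (census population divided by observed phone users), applied to every flow leaving that town; the document does not specify the re-scaling method, so this is one possible reading, and all names are hypothetical.

```python
def rescale_flows(flows, observed_users, census_population):
    """Scale commuter counts from phone users to actual population.

    flows:             {(home_town, work_town): number of observed users}
    observed_users:    {town: number of users whose home is in that town}
    census_population: {town: population from demographic sources}
    """
    scaled = {}
    for (home, work), n_users in flows.items():
        # Assumed: phone users are representative of their home town.
        factor = census_population[home] / observed_users[home]
        scaled[(home, work)] = n_users * factor
    return scaled
```

For example, with 100 observed users in a town of 1,000 inhabitants, a flow of 20 users is scaled to 200 people.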

We provide 3 illustrations of analyses on mobility patterns. These can be reformatted into a formal
matrix.

       Illustration 1 – Invasion tree based on mobility patterns from CDR data complemented by a SIR
        model (Tizzoni, 2013)

       Illustration 2 – Mobility models for West African countries (based on CDR data from a few
        selected countries and demographic information of the remaining ones, Wesolowski, 2014)

                                                                                                      14
       Illustration 3 – Sierra Leone: weighted links between regions (based on roads and demographic
        information) and the number of cases in the individual regions (shading).

http://challengepost.com/software/how-mobility-informs-epidemic-dynamics-districts-sierra-leone

Epidemiological model
Classical epidemiological models are mathematical models that predict the number of new infections in a
population under the hypothesis that the population is fully mixed, i.e. that any two individuals are
equally likely to come into contact. Recently, several studies have added mobility models on top of
epidemiological ones, to add a geographical component to the prediction and to improve the accuracy of
the models themselves.

The guiding principle of these new epidemiological models is to simulate each geographical entity (village,
city sub-area) as a homogeneous space where all inhabitants can potentially come into contact with each
other. Inside these entities, the spread of the disease is modeled using a classical compartmental model.
In the case of Ebola, the model is a SEIR model (Susceptible-Exposed-Infectious-Removed), where the
Removed compartment covers both deaths and recovered cases that have gained immunity against the disease.

The SEIR model has several parameters:

       Some parameters will be simplified and set to 0, such as mu, the baseline mortality rate
        (deaths unrelated to the disease).
       The values of other parameters can be inferred from existing literature, such as sigma, the
        rate at which exposed subjects become infectious, and gamma, the rate at which infectious
        subjects recover or die from the disease. These parameters will be estimated from recent
        work of the CDC (http://www.cdc.gov/mmwr/preview/mmwrhtml/su6303a1.htm) on the same
        topic.
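
To make the role of these parameters concrete, here is a minimal single-population SEIR sketch with mu
set to 0, integrated with a simple Euler scheme. The transmission rate beta is the standard SEIR symbol
but is not named in the text, and the function name and step size are assumptions. Any values passed in
are placeholders for illustration, not the CDC estimates.

```python
def simulate_seir(s0, e0, i0, r0, beta, sigma, gamma, days, dt=0.1):
    """Fully mixed SEIR model with mu = 0 (no background births or deaths).
    beta: transmission rate (assumed symbol), sigma: rate E -> I, gamma: rate I -> R.
    Returns one (S, E, I, R) tuple per day, including day 0."""
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r  # total population stays constant because mu = 0
    history = [(s, e, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt     # S -> E
            new_infectious = sigma * e * dt         # E -> I
            new_removed = gamma * i * dt            # I -> R (recovered or dead)
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history
```

In this formulation 1/sigma is the mean incubation period and 1/gamma the mean infectious period, which
is how published estimates would typically be converted into model parameters.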

Which tools will we develop?
Real Impact Analytics proposes to develop a series of tools based on the insights from the analyses. The
tools should be highly visual and intuitive using maps and prioritization algorithms.

Mobility matrix and mapping of the patterns
We propose to develop dashboards of maps showing the mobility patterns (see previous illustrations) and
key visuals per geography.

The size or color of the shapes on the map will depend on the level of urgency/risk of infection and possibly
the size of the location. This could hence provide an immediate insight on:

       What are the locations/cities at risk?
       Where are the largest cities at risk?

Ranking of action priorities per geography
Real Impact Analytics focuses on linking insights to actions. In the current case, this translates into
producing a list of prioritized geographies and locations.

The output of our tools will provide an estimate of the future development of the epidemic, based on
recently observed cases. By adding mobility data to the existing models, we will be able to predict
where the next cases are likely to occur, helping organizations on the ground to better plan their
response to Ebola. The model can also be used to test response strategies, such as area confinement.
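
The sketch below illustrates how such a ranking and a confinement test could work, under strong
simplifying assumptions: a daily time step, residents exposed to the infectious prevalence of the towns
where they spend their day (visitors' contribution to prevalence is ignored), and weights giving the
share of each town's residents working in each destination. All names are hypothetical and this is not
the coupling used in the cited studies.

```python
def step(state, weights, beta, sigma, gamma):
    """Advance every town by one day.
    state: {town: [S, E, I, R]} with non-empty towns;
    weights: {(home, dest): share of home's residents spending the day in dest}."""
    prevalence = {t: c[2] / sum(c) for t, c in state.items()}
    new_state = {}
    for town, (s, e, i, r) in state.items():
        # Residents are exposed to the prevalence of each place they visit, weighted by share.
        exposure = sum(w * prevalence[dest]
                       for (home, dest), w in weights.items() if home == town)
        new_e, new_i, new_r = beta * s * exposure, sigma * e, gamma * i
        new_state[town] = [s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r]
    return new_state


def confine(weights, town):
    """Cut all commuting into and out of `town`; affected commuters stay home."""
    confined = {}
    for (home, dest), w in weights.items():
        key = (home, home) if home != dest and town in (home, dest) else (home, dest)
        confined[key] = confined.get(key, 0.0) + w
    return confined


def rank_towns(state, weights, beta, sigma, gamma, days):
    """Rank towns by the cumulative number of new exposures expected over `days` days."""
    start_s = {t: c[0] for t, c in state.items()}
    for _ in range(days):
        state = step(state, weights, beta, sigma, gamma)
    new_cases = {t: start_s[t] - c[0] for t, c in state.items()}
    return sorted(new_cases, key=new_cases.get, reverse=True), new_cases
```

Running rank_towns once with the observed weights and once with the output of confine gives a simple
comparison of expected new cases with and without confinement of a given area.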

Section 4: What are the next steps?

Key principles
This project will only succeed if the highest standards of confidentiality, security and privacy are
applied. RIA will therefore apply standards that match or exceed the guidelines set by the GSMA, to
address any concern around confidentiality and privacy. For security, RIA will apply best practices for
data access, storage, transformation and sharing.

All the standards will be shared in advance and will comply with any internal standard of Airtel SL and
with local legislation.

Financial responsibility
All the parties involved in this project will bear the financial costs of their investments and activities
related to this project. If and when donors are involved, the financial contribution will only be used to
recover any documented expense.

At no point shall RIA or Airtel SL seek financial profit from this project.

Transparency, audit and regulation
RIA is willing to open its processes for audit. To illustrate RIA’s desire to set the highest standards, we will
follow the guidelines set by the Director of Privacy at the GSMA (Pat Walshe):

       The mobile phone numbers of subscribers making or receiving calls or SMS will be anonymized
        within the premises of the mobile operators,
       Anonymized CDR data will not be transferred outside of the operator’s premises,
       All analyses will take place on the mobile operator’s systems,
       No analysis will be undertaken that singles out identifiable individuals.

Only the output of the analyses will be made available to relevant and approved aid agencies, government
or research agencies that can use these inputs in their modeling and planning efforts.
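
The text does not specify the anonymization algorithm. As one possible illustration of the first
guideline above, the sketch below replaces phone numbers with a keyed hash (HMAC-SHA256) whose secret
key would stay on the operator's premises. The field names caller and callee and the function names are
hypothetical, and the phone number in any example is fictitious.

```python
import hashlib
import hmac


def anonymize_msisdn(msisdn: str, secret_key: bytes) -> str:
    """Replace a phone number with a keyed hash; the same number always maps to the same token."""
    normalized = "".join(ch for ch in msisdn if ch.isdigit())  # drop '+', spaces, dashes
    return hmac.new(secret_key, normalized.encode("ascii"), hashlib.sha256).hexdigest()


def anonymize_cdr(record: dict, secret_key: bytes) -> dict:
    """Return a copy of a CDR with both parties' numbers replaced (hypothetical field names)."""
    out = dict(record)
    for field in ("caller", "callee"):
        out[field] = anonymize_msisdn(record[field], secret_key)
    return out
```

Because the mapping is consistent, a user's records can still be grouped for the home and work
analysis, while recovering the original number would require the key held by the operator.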

Short term priorities
In the short term, we need to:

    -   Receive written confirmation of interest and participation in the project from the following
        actors: UNMEER, Global Pulse and any other relevant stakeholder (e.g., GSMA, WHO),
    -   Request regulatory approval from NERC and NATCOM.

Concretely, Airtel SL can support us in getting NATCOM approval. Global Pulse and UNMEER can support
us in getting NERC approval.

Once these requirements are met, we will agree on a project plan with a split of roles and responsibilities.
As a result, the current document will be modified based on the input of each actor.

Security and privacy – technical environment set-up
Real Impact Analytics will set up the following technical environment to ensure security and privacy of the
individual data:

       Local machine – the data will be stored on a local machine within the telecom operator premises
        and remotely accessed by Real Impact Analytics. At all times, the raw data will remain on the local
        machine. Real Impact Analytics will extract the CDRs from the operator’s data warehouse or
        operational databases, run the anonymization algorithm and compute the required aggregates.
        This set-up will be adapted when data from multiple operators are taken into account. Data
        will remain within each operator’s premises, but some aggregated insights will be exported
        to build the dashboards,
       Hardware – the required server will be provided by Real Impact Analytics. This will avoid any
        shortage in storage capacity from the telecom operator, while securing the required
        parameterization for our analyses. The server will be shipped to the local operator and installed
        by its staff members,
       Secured connection – all access will go through a secured VPN.

Reaching scale
As the bigger picture is to fight Ebola, we cannot limit the effort to one country. With the very generous
leadership of Airtel SL, we will be in a position to demonstrate the value at hand and have actual use cases
to justify an extension of the project to more Mobile Network Operators in the remaining affected
countries.

To reach scale, we propose to have a phased approach to ensure stability of the end-product. We
essentially foresee 3 phases around the following end-products:

       Phase 1 - Prototype based on the data of Airtel SL (CURRENT FOCUS)
            o Product based on the data of Airtel SL. This would allow us to fine-tune the
                requirements, stabilize the output and the algorithms, and start without needing all
                operators to participate,
            o Document/publication synthesizing the approach, best practices and lessons learned.
       Phase 2 – Tools leveraging all the data available in Sierra Leone
            o Platform integrated with UNMEER and other key stakeholders coordinating the actions,
                e.g. MSF, WHO, CDC,
            o Extend the sources to non-telecom data (e.g. data collection by field actors),
            o Product scaled up at the country level. This would require multiple operators to
                participate within a country.
       Phase 3 – Tools rolled out on multiple countries and offered as a shared platform, possibly
        including international mobility flows

References:
Gonzalez, M. C., Hidalgo, C. A., & Barabasi, A. L. (2008). Understanding individual human mobility
patterns. Nature, 453(7196), 779-782.

Halloran, M. E., Vespignani, A., Bharti, N., Feldstein, L. R., Alexander, K. A., Ferrari, M., ... & Longini Jr, I.
M. (2014). Ebola: mobility data. Science (New York, NY), 346(6208), 433.

Lenormand, M., Picornell, M., Cantu-Ros, O. G., Tugores, A., Louail, T., Herranz, R., ... & Ramasco, J. J.
(2014). Cross-checking different sources of mobility information. arXiv preprint arXiv:1404.0333.

Tizzoni, M., Bajardi, P., Decuyper, A., King, G. K. K., Schneider, C. M., Blondel, V., ... & Colizza, V.
(2013). On the use of human mobility proxy for the modeling of epidemics. arXiv preprint
arXiv:1309.7272.

Wesolowski, A., Buckee, C. O., Bengtsson, L., Wetter, E., Lu, X., & Tatem, A. J. (2014). Commentary:
Containing the Ebola outbreak–the potential and challenge of mobile network data. PLOS Currents
Outbreaks.
