Leveraging anonymized telecom data to fight the Ebola outbreak - White Paper - Proposed Approach November 2014
Executive Summary

Over the past weeks, Real Impact Analytics (RIA, a big data analytics provider in the telecom industry) and Airtel Sierra Leone (Airtel SL, a mobile network operator) have been discussing the opportunity to use telecom data to fight Ebola in Sierra Leone. Building on numerous academic publications and use cases promoted by the United Nations’ Global Pulse (the data-for-good arm of the UN) that demonstrate the value at hand, RIA and Airtel SL would like to invite several partners to join forces and support this initiative.

The goal of this document is to structure the approach by which the data would be made available and leveraged to allow health actors in the field (e.g. WHO, UNMEER, Red Cross) to improve the outcomes and impact of their activities. The aim is to demonstrate impact with one mobile network operator in one country before potentially expanding the approach to other operators and countries.

Why should telecom data be used to fight Ebola?

Our core observation is that collecting dynamic and up-to-date data is one of the biggest challenges for health actors, particularly in the case of an epidemic such as Ebola. Existing sources (e.g. census, household surveys) are very limited, often outdated and seldom dynamic. Moreover, when a new cluster of Ebola cases appears outside of existing focus areas, health actors cannot trace the potential spread patterns because they do not have any data that show population mobility or social interactions.

However, such data exist and are collected systematically by mobile network operators. It is possible to aggregate raw data (e.g. signaling, call detail records (CDRs), recharge records) into mobility maps that can drive field actions by health actors, while staying fully anonymized.

Who are the key stakeholders that need to collaborate?

There are four essential components to this collaboration:
- Mobile Network Operators (MNOs) that provide data,
- Health actors that use the analyses made on the data,
- Analytics experts who make the data valuable for end users,
- Regulators that approve the use of the data for a specific scope and under a clearly defined approach.

So far, the situation is as follows:
- Airtel SL has agreed to lead the initiative from the MNO perspective,
- UNMEER has expressed interest in using the data and further enriching them with other collected data,
- Airtel SL has highlighted that two regulatory approvals are required: one from the National Ebola Response Center (NERC) and one from the national telecom regulator (NATCOM),
- Several other parties will be involved to run and facilitate the analytics applied to the data. Specifically, UN Global Pulse and RIA will lead this part of the effort under the global framework of UNMEER. Additional partners may be involved when deemed relevant by the current stakeholders.

What approach are we recommending?

Once all the parties and approvals are gathered for this exercise, we will initiate a 6-week project that will cover the entire value chain from data to action:
- Step 1: Collect the data: Airtel SL will provide access to CDRs and recharge data with as much history as possible. UNMEER will provide its existing data records,
- Step 2: Anonymize and transform the data: RIA and Global Pulse will apply a set of algorithms to ensure subscriber privacy while extracting valuable insights from the data,
- Step 3: Make the output analyses accessible and actionable for health actors: RIA and Global Pulse will work closely with UNMEER to link the analyses to the health actors’ day-to-day needs. We will combine the use of dashboards with email notifications.

The approach documented below follows the strictest confidentiality, security and privacy standards. It is impossible to target individuals based on the anonymized and aggregated data that will be published. The only aim is to aggregate individual data to support a public good initiative, the fight against Ebola.

What are the next steps?

In the short term, we need to:
- Receive written expressions of interest and involvement in the project from the following actors: UNMEER, Global Pulse and any other relevant stakeholder (e.g., GSMA, WHO),
- Request regulatory approval from NERC and NATCOM.

Once these requirements are met, we will agree on a project plan with a division of roles and responsibilities. This document will then be edited to reflect the final scope and approach across participants. In the longer term, we will expand the initiative to other countries and mobile network operators if it proves valuable to UNMEER in Sierra Leone.
Context and Objectives In March 2014, the Regional office of the World Health Organization (WHO) reported an outbreak of the Ebola disease in Guinea. The United Nations Mission for Ebola Emergency Response (UNMEER), the Economic Community of West African States (ECOWAS) and many other stakeholders have been setting up various task forces and coordination plans to assess the situation and identify possible avenues to contain and reduce the Ebola outbreak. Immense efforts have been made to fight the epidemic amid geographies lacking the most basic infrastructure in health, information management and disaster response. Real Impact Analytics (RIA, a telecom big data vendor) and Airtel Sierra Leone (Airtel SL, a mobile network operator) have been discussing how they could support aid and health actors. Especially, around providing data and insights that would fundamentally improve the efficiency of field teams. Building on numerous academic publications and use cases promoted by UN’s Global Pulse (the data for good arm of the United Nations) that demonstrate the value at hand, RIA and Airtel SL wish to use telecom data to support the fight to contain and eradicate Ebola cases. To succeed in this effort, several partners and regulatory approvals need to be gathered. The goal of this document is to structure the approach and framework by which the data would be used and made available / actionable for health actors on the field (e.g. WHO, UNMEER, Red Cross). It is intended for all the stakeholders and divided into 4 sections: - Section 1: Why should telecom data be used to fight Ebola? - Section 2: Who are the key stakeholders that need to collaborate? - Section 3: What approach are we recommending? - Section 4: What are the next steps? Ultimately, the aim is to demonstrate impact with one mobile network operator in one country before potentially expanding the approach to other operators and countries. 4
Section 1: Why should telecom data be used to fight Ebola? Our core assumption is that collecting useful and actionable data is one of the biggest challenges for health actors, particularly in the case of an epidemic such as Ebola. Indeed, there are very limited existing sources (e.g. census, household surveys), they are often outdated and they are seldom dynamic. When a new cluster of Ebola cases appears outside of existing focus areas, health actors do not have the visibility on the potential spread because they do not have any data that show population mobility. However, such data do exist and are collected systematically by mobile network operators. Indeed, through signaling, call detail records (CDRs) or recharge records, it is possible to aggregate raw data into mobility maps that can directly drive field action by health actors, while staying fully anonymized. In the case of the 2010 Haiti earthquake, Flowminder.org has shown how mobility maps allow better planning of the humanitarian response to a natural disaster. With regard to Ebola, understanding the trends in mobility in affected countries is key, as individuals’ and groups’ movements within and across countries significantly contribute to the spread of the disease. Even if more recent census data on human movements in the affected countries were available, such approach would not be accurate enough as it relies on historical static data sets and data outside urban areas are often of poor quality. A much more precise and dynamic approach could be developed based on the Call Detailed Records (CDRs) collected and stored by telephone operators on a daily basis. A vast body of previous research showed that variables based on CDR data are reliable proxies for population mobility modeling (e.g., Lenormand et al, 2014; Tizzoni et al, 2013; Wesolowski et al, 2014). This would lead to forward-looking approaches and tools rather than assessing the past or current situation. 
There have been repeated calls to make the CDR data from Ebola-affected countries publicly accessible to cutting-edge scientific teams (e.g., Halloran et al, 2014; Wesolowski et al, 2014). However, privacy concerns echoed by regulators have prevented such data from being made accessible to the scientific community. It is hence important to realize that strict rules ensuring privacy are compatible with the valuable insights brought by the analyses of the CDRs. Anonymized and aggregated data that do not threaten anyone’s privacy are sufficient to provide powerful insights for predicting the further spread of the disease.

Similarly, the Ebola Response Roadmap from the WHO calls for specialized data analysis and also puts strong emphasis on the importance of contact tracing. In situations where precise contact tracing is impossible, mobility models can be a good approximation for observing the spread of the epidemic. The Ebola Mobile Response blueprint of the GSMA puts forward the potential value of mobile phone usage within the Ebola-affected countries. The GSMA has even anticipated the possible use of CDRs by publishing guidelines for privacy protection when using mobile phone data in response to the Ebola outbreak.

CDRs from the affected countries are essential to estimate more accurately the mobility patterns currently taking place in these countries. Such an approach can be fully anonymized and aggregated at the level of the site/city (hence protecting the data privacy of individuals), without losing or weakening any insights in terms of public health and policy. It therefore offers no possibility of tracing back any specific movement of an individual within the covered geographical area. Rather, a network of places should be built, with links between them weighted by the frequency of movement between the two places. This network would make it possible to build maps of mobility hubs and popular links within countries, allowing estimation of the most probable routes for further spreading of the disease and thus the most probable places where new infections will occur.

Gaining insights would also require overlaying the CDR-based mobility model with an epidemic spreading model (see the Epidemiological model section below) to identify and prioritize locations according to their probability of being affected by Ebola. This can help direct attention to areas to which supplementary resources should be allocated and where careful preventive measures should be taken in order to limit further spreading of the virus.

Our objectives are to leverage telecom data to support health actors (led by UNMEER):
- Set up and secure a system (hardware and software) for collecting telecom and public health data,
- Develop mobility matrices and epidemiological models to simulate the potential spread paths of the disease,
- Build a platform/tools allowing identification of a list of priority locations based on mobility patterns and epidemiological models.

In a first phase, we will focus on using Airtel SL data and applying the techniques within Sierra Leone. If the analyses prove useful, we will then launch a coordinated effort with UNMEER to expand the application to the remaining focus countries, Liberia and Guinea, across all local mobile network operators.
Section 2: Who are the key stakeholders that need to collaborate?

The Ebola outbreak is spreading over more than 5 countries, impacting public and private organizations alike. Fighting Ebola therefore requires coordinating many partners, including local regulators, public health authorities, political decision-makers and commercial private corporations. To fight Ebola in Sierra Leone, we will need to coordinate a subset of such players to ensure and secure impact.

In concrete terms, this means that we need to align at least the following stakeholders on a plan of action:
- Coordination and public authorities
  o UNMEER
  o Global Pulse
  o Local regulator – NATCOM
  o Government – NERC
- Operational players
  o Airtel Sierra Leone
  o Real Impact Analytics

More information on each actor and the proposed split of roles and responsibilities across the ecosystem is provided in this section. Additional players might be added to the picture whenever deemed relevant by the current stakeholders.
UNMEER The UN Mission for Ebola Emergency Response (UNMEER) is being set up in response to this unprecedented outbreak. The Mission will be temporary and will respond to immediate needs related to the fight against Ebola. The objective of UNMEER is to harness the capabilities and competencies of all the relevant United Nations actors under a unified operational structure to reinforce unity of purpose, effective ground- level leadership and operational direction, in order to ensure a rapid, effective, efficient and coherent response to the crisis. The singular strategic objective and purpose of the Mission will be to work with others to stop the Ebola outbreak. To achieve this, the strategic priorities of the Mission will be to stop the spread of the disease, treat the infected, ensure essential services, preserve stability and prevent the spread to countries currently unaffected. It works closely with governments and national structures in the affected countries, regional and international actors, such as the African Union (AU) and the Economic Community of West African States (ECOWAS), and with Member States, the private sector and civil society. It also coordinates with the WHO, which is responsible for overall health strategy and advice within the Mission, while other UN agencies will act in their area of expertise under the overall leadership and direction of a single Head of Mission. The Mission will leverage the existing presence and expertise of UN country teams and international partners including NGOs on the ground to minimize gaps and ensure leadership. UNMEER has also set up a platform to share documents among all responders to the Ebola crisis. The site is a one-stop-shop for information on the response of each actor, following their area of action. It brings contact lists, infographics, maps and other tools to a centralized location, and is designed to allow a variety of actors to participate in the management of documents. 
At this stage, we need to make sure that our actions, tools and approach are part of and coordinated with the decisions of UNMEER. This could translate into various roles of UNMEER regarding our project, which are: Supporting Airtel SL & RIA to secure an approval from NERC, Building awareness of the availability of the output analyses, Guiding RIA and Global Pulse on the needs of the end users (e.g. WHO, UNICEF, MSF, Red Cross) and how we can make the analyses more relevant for field teams. Global Pulse One of the missions of Global Pulse is to accelerate discovery, development and scaled adoption of big data innovation for sustainable development and humanitarian action. The set-up of Global Pulse was established based on a recognition that digital data offers the opportunity to gain a better understanding of changes in human well-being, and to get real-time feedback on how well policy responses are working. 8
To this end, Global Pulse is working to promote awareness of the opportunities Big Data presents for relief and development, forge public-private data sharing partnerships, generate high-impact analytical tools and approaches through its network of Pulse Labs, and drive broad adoption of useful innovations across the UN System. Global Pulse has been playing an instrumental role in the response to the outbreak of Ebola in West Africa. In the current set-up, the roles of Global Pulse could be along the following lines: Coordinate players to exchange best practices and insights, Running their analytical models on the collected telecom data, Involve additional parties that would bring targeted expertise, Promote a potential success case and play the role of setting up and promoting a shared Big Data analytics platform across different countries based on the telecom data. Real Impact Analytics Real Impact Analytics is a company active in Big Data focusing on telecom in emerging markets. It has been investing significantly in developing end-to-end tools for telecom operators in terms of improving and optimizing their sales and marketing approach in Africa, LATAM and Asia. It has offices in 5 countries (Belgium, South Africa, United Arab Emirates, Luxembourg and Brazil) serving 5 of the top 10 global telecom operators, counting more than 80 people and doubling its size every year. It has been completing its commercial approach and products with a donor-based activity focusing on financial inclusion through mobile finance and Micro Finance Institutions. Concretely, it has been heavily cooperating with the Bill & Melinda Gates Foundation, the MasterCard Foundation, the World Bank and USAID on various development projects. Finally, it has been focusing on innovation and investing in R&D through the set- up of a laboratory gathering software developers and pure academic research teams. 
This has led to multiple research projects such as a “food alert project” with Global Pulse leveraging telecom data (CDRs) to prioritize geographical areas in terms of risk of food shortage. RIA is involved in this project because it has existing legal and technical relations with the major telecom operators active in the countries showing cases of Ebola. Given its expertise in managing telecom data and its growing catalogue of “data for good” initiatives, RIA will apply its methodologies to ensure relevance to the field actors. The role of Real Impact Analytics would be to secure the technical aspects of the approach, covering: Infrastructure and technical processes (compliant with ethical and privacy constraints) ensuring the collection of the required data in one place, Platform with tools to allow the identification of mobility patterns and prioritize actions based on social interactions and epidemiologic models, Processes to secure the usage and the impact of the proposed approach. 9
Local Regulator – NATCOM Telecom regulators in the relevant countries should be involved as the telecom operators would allow third parties (e.g. Real Impact Analytics) to access the data and share insights and drive public / field actions. For instance, the National Telecommunications Commission in Sierra Leone (NATCOM) has already taken actions to accelerate the fight against Ebola. For instance, it facilitates SMS text messaging of vital communications on Ebola. However, NATCOM is also the guardian of the consumer rights. This has been translated into the right to privacy, which says that “Consumers must be protected from the inappropriate use of information gathered by service providers in the course of providing telecommunications service. It is required of service providers to protect the privacy of the financial, personal and other confidential information on consumers, and the Commission shall impose sanctions and penalties to ensure that service providers respect this right.” In our case, the role of NATCOM is to allow RIA and Airtel SL to leverage the consumer telecom data to develop relevant tools and insights. Real Impact Analytics is willing to provide all technical guarantees to ensure that the rights of the consumers will never be compromised or infringed. Moreover, the goal of RIA and Airtel SL is not to make or derive any commercial profit from their actions but to contribute to the public good and welfare of Sierra Leone by leveraging their key assets and capabilities. Airtel SL Mobile Network Operators such as Airtel, MTN or Orange have significant market shares in the infected countries. Moreover, they have been collecting data on customers to secure their commercial purposes. For instance, billing requires to collect data on the calls made by any customer, such as the duration of the call, the number called, the number calling. These data are stored in a secured database within the premises of the local operators. 
In our case, the role of Airtel SL would be to share its data (CDRs, recharge records) with Real Impact Analytics in a way that prevents any infringement of, or risk to, the right to privacy.
Section 3: What approach are we recommending?

Which data is required?

Call Detail Records (CDRs)

We would need Airtel SL to provide CDRs for the entire subscriber base with as much history as possible (ideally more than 9 months of history). At a minimum, the following fields are required:

FIELD         DESCRIPTION
A             MSISDN - format should be consistent, always an Airtel user
B             MSISDN - format should be consistent (e.g. 32493191914)
CELL_ID       String - "Cell code" of A-number
DURATION      Number - in seconds for VOICE and 1 per event for SMS
TAC_A         String - IMEI first 8 digits (set to Null if unavailable)
DATE_TIME     String - yyyyMMdd hh:mm:ss (e.g. 20140918 15:21:18)
TRANSIT_TYPE  String - "ONNET" or "OFFNET"
TYPE          String - "SMS" or "VOICE"
VALUE_A       Number - amount charged to A (revenue); if A is the receiver then
              VALUE_A = 0, except if A pays for receiving the call
WAY           String - "INCOMING" or "OUTGOING" (indicates whether A is the
              receiver (incoming) or the caller (outgoing))

A daily dump of the data will be required to make the analyses dynamic.

Recharge records

Subscriber recharges will enrich the mobility maps with socio-economic indicators.

FIELD         DESCRIPTION
SUBS_MSISDN   Number - consistent format with CDRs
AMOUNT        Number - amount charged to A for their top-up
DATE          String - yyyyMMdd (e.g. 20140917)
CELL_ID       String - CELL_ID of the recharge event
DATE_TIME     Date - yyyyMMdd hh:mm:ss (e.g. 20140917 16:33:18)
TYPE          String - EVD or Voucher
AGENT_MSISDN  Number - MSISDN of the agent, if any
VOUCHER_ID    String - (set to Null if unavailable)
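As an illustration of how the CDR field list above could be checked when a daily dump is loaded, here is a minimal Python sketch. It assumes each row arrives as a dictionary keyed by the field names in the table (for example from `csv.DictReader`); the `CdrRecord` class and the `parse_cdr` helper are hypothetical names introduced for this example and are not part of the proposed platform.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class CdrRecord:
    a: str                 # A MSISDN (always the Airtel subscriber)
    b: str                 # B MSISDN (the other party)
    cell_id: str           # cell code of the A-number
    duration: int          # seconds for VOICE, 1 per event for SMS
    tac_a: Optional[str]   # first 8 IMEI digits, None if unavailable
    timestamp: datetime    # parsed from "yyyyMMdd hh:mm:ss"
    transit_type: str      # "ONNET" or "OFFNET"
    event_type: str        # "SMS" or "VOICE"
    value_a: float         # amount charged to A
    way: str               # "INCOMING" or "OUTGOING"


def _one_of(name: str, value: str, allowed: set) -> str:
    """Normalize an enumerated field and reject unexpected values."""
    cleaned = value.strip().upper()
    if cleaned not in allowed:
        raise ValueError(f"{name}: unexpected value {value!r}")
    return cleaned


def parse_cdr(row: dict) -> CdrRecord:
    """Turn one raw row (field name -> string) into a typed record."""
    tac = row["TAC_A"].strip()
    return CdrRecord(
        a=row["A"].strip(),
        b=row["B"].strip(),
        cell_id=row["CELL_ID"].strip(),
        duration=int(row["DURATION"]),
        # ASSUMPTION: an empty string or the literal "Null" marks a missing TAC.
        tac_a=None if tac == "" or tac.upper() == "NULL" else tac,
        # The example 15:21:18 implies a 24-hour clock, hence %H.
        timestamp=datetime.strptime(row["DATE_TIME"].strip(), "%Y%m%d %H:%M:%S"),
        transit_type=_one_of("TRANSIT_TYPE", row["TRANSIT_TYPE"], {"ONNET", "OFFNET"}),
        event_type=_one_of("TYPE", row["TYPE"], {"SMS", "VOICE"}),
        value_a=float(row["VALUE_A"]),
        way=_one_of("WAY", row["WAY"], {"INCOMING", "OUTGOING"}),
    )
```

Rejecting rows with unexpected enumerated values early keeps format drift in the daily dumps from silently distorting the downstream aggregates.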
Site GPS locations

We will also need cell and site locations to link aggregated subscriber events to geographies.

FIELD             DESCRIPTION / COMMENTS
CELL_ID           String - cell ID (e.g. 659020200220262)
CELL_TYPE         String - 2G or 3G
SITE_ID           String - site ID (e.g. 65902)
CITY_ID           String - unique identifier for each city
CITY_POPULATION   String - city population
CITY_URBAN_RURAL  String - "RURAL" or "URBAN" at a city level
PROVINCE_ID       String - unique identifier for each province
REGION_ID         String - unique identifier for each region
SITE_LATITUDE     String - cell latitude (e.g. 3.851443332)
SITE_LONGITUDE    String - cell longitude (e.g. 32.27342849)

Public health data

We will need the cases of Ebola at the most granular level of geography. Such data should be readily available from UNMEER or the WHO. Real Impact Analytics will then consolidate these data on the geographies used for the telecom data. For instance, if telecom data and insights are required at the town level, we should be able to easily consolidate the public health data at the same level of aggregation.

Which analyses will we run?

Real Impact Analytics and Global Pulse will perform a series of analyses with different goals and concerns:
- Anonymize the data to secure privacy, while allowing for public health and policy insights,
- Run mobility algorithms on the data to isolate and identify mobility matrices and patterns,
- Operate epidemiological models to assess probabilities of Ebola per geography and prioritize locations for action,
- Develop interactive tools (including processes) to allow for quick insights and decision-making.

All analyses will comply with the guidelines set forth by the GSMA to ensure maximum confidentiality, security and privacy standards.

Anonymization

To avoid putting privacy at risk, all the data will be anonymized to remove any personal information regarding the subscribers (especially their mobile number).
A new unique identifier will be created for each subscriber based on the anonymization algorithm below. MSISDNs (i.e. subscriber mobile phone numbers) will be anonymized using the SHA-224 hashing algorithm. SHA-224 is one of the seven Secure Hash Algorithms defined by the National Institute of Standards and Technology (NIST, see http://csrc.nist.gov/publications/fips/fips180-4/fips-180-4.pdf for the latest official report on secure hash algorithms). SHA-224 is a one-way hash function rather than an encryption method: there is no key or table that would allow the digest to be reversed, and the function cannot be inverted to recover the original phone number.

The choice of SHA-224 over the other SHA-2 algorithms has been made based on two factors. First, the message to hash (the MSISDN) is short and does not call for a digest as long as that of SHA-512. Second, SHA-224 follows the same computation as SHA-256 (with different initial values) and truncates the result to 224 bits, so its digest takes less storage than a full 256-bit digest. SHA-224 provides 112 bits of collision resistance, which is currently considered safe by international standards. A public implementation of this hashing tool is available in hashlib, which is part of the Python standard library.

An example of the output of SHA-224 is given hereunder:

MSISDN         HASH (HEXADECIMAL)
0032499123456  c7ee813753c5e72657db9c6aa82f0d46b785e5746df85d27dd4a7079
0032499123457  f8c9e797fab4e1b65b5155ac1ab26ae55dff2c4da4dd69a19e19af64

As shown, even though the MSISDNs are very similar, the corresponding outputs of the hash function differ substantially.

Mobility

In the first stage of building a mobility model from CDRs, Real Impact Analytics will focus on commuting. According to previous studies on human mobility, people tend to move recurrently between a few locations (González et al, 2008), and home-work commuting accounts for more than 87% of all mobility (Tizzoni, 2013). Thus, observing patterns present in daily commuting should provide a solid basis for more detailed further analyses.

From the anonymized CDRs spanning at least a few months, home and work locations will be extracted for each user. The home location will be the site most used during non-working hours (weekends, evenings and night hours of working days) over the analyzed time frame. Similarly, the site used most frequently during working hours by a given user will be marked as the work location. The home and work locations can be identical for persons working from home or close to their home. Only data from users showing activity on at least 75% of the days over the observation period will be used, in order to have a sufficient amount of data for establishing reliable home and work locations.

Across the dataset, links between all pairs of home and work locations will be established, with the weight of each link corresponding to the proportion of users commuting between the two locations. This will create a network of nodes corresponding to individual cities and their weighted connections. The output graph will be fully aggregated to the level of a town and will not provide any information regarding movement patterns of actual people, only probabilities of fluxes across the country. In order to allow for correct implementation of the epidemic spreading model, the population of the interconnected towns will be re-scaled from the number of recorded phone users to the actual demographic figures.
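The anonymization and commuting steps described above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not the implementation that will be deployed: the function names are hypothetical, the working-hours window (Monday to Friday, 08:00 to 18:00) is an assumed definition that the text does not specify, events are assumed to be already mapped from cells to town-level location identifiers, and users with no working-hours activity are simply skipped. The hash is plain SHA-224 from `hashlib`, as named in the Anonymization section.

```python
import hashlib
from collections import Counter, defaultdict
from datetime import datetime


def anonymize_msisdn(msisdn: str) -> str:
    """Return the SHA-224 hex digest (56 characters) of a phone number."""
    return hashlib.sha224(msisdn.encode("ascii")).hexdigest()


def is_working_hours(ts: datetime) -> bool:
    # ASSUMPTION: working hours are Monday-Friday, 08:00-18:00.
    return ts.weekday() < 5 and 8 <= ts.hour < 18


def home_and_work(events, total_days, min_active_share=0.75):
    """events: iterable of (user_id, location_id, datetime).

    Returns {user_id: (home, work)} for users active on at least
    min_active_share of the total_days in the observation period.
    """
    home_counts = defaultdict(Counter)
    work_counts = defaultdict(Counter)
    active_days = defaultdict(set)
    for user, location, ts in events:
        active_days[user].add(ts.date())
        bucket = work_counts if is_working_hours(ts) else home_counts
        bucket[user][location] += 1

    result = {}
    for user, days in active_days.items():
        if len(days) < min_active_share * total_days:
            continue  # not enough activity for a reliable estimate
        if not home_counts[user] or not work_counts[user]:
            continue  # ASSUMPTION: skip users missing either time window
        home = home_counts[user].most_common(1)[0][0]
        work = work_counts[user].most_common(1)[0][0]
        result[user] = (home, work)
    return result


def commuting_links(locations):
    """Weight each (home, work) pair by the share of retained users."""
    pair_counts = Counter(locations.values())
    total = sum(pair_counts.values())
    return {pair: n / total for pair, n in pair_counts.items()}
```

Only the output of `commuting_links` carries no per-user key, which matches the intent that the published graph is aggregated and contains no individual movement patterns.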
We provide 3 illustrations of analyses on mobility patterns. These can be reformatted into a formal matrix.

Illustration 1 – Invasion tree based on mobility patterns from CDR data complemented by a SIR model (Tizzoni, 2013)

Illustration 2 – Mobility models for West African countries (based on CDR data from a few selected countries and demographic information of the remaining ones, Wesolowski, 2014)
Illustration 3 – Sierra Leone: weighted links between regions (based on roads and demographic information) and the number of cases in the individual regions (shading). http://challengepost.com/software/how-mobility-informs-epidemic-dynamics-districts-sierra-leone

Epidemiological model

Epidemiological models are mathematical models predicting the number of new infections in a population under the hypothesis that the population is fully mixed. Recently, several studies have added mobility models on top of epidemiological ones, to add a geographical component to the prediction and to gain accuracy in the models themselves.

The guiding principle of these new epidemiological models is to simulate each geographical entity (village, city sub-area) as a homogeneous space where all inhabitants can potentially enter into contact with each other. Inside these entities, the spread of the disease is modeled using a classical compartmental model. In the case of Ebola, the model is a SEIR model (Susceptible-Exposed-Infectious-Removed), where the Removed compartment covers both casualties and recovered cases that have gained immunity against the disease.
The SEIR model has several parameters. Some parameters will be simplified and set to 0, such as mu, the default mortality rate. The values of other parameters can be inferred from existing literature, such as sigma, the rate at which exposed subjects become infectious, or gamma, the rate at which infectious subjects recover or die from the disease. These parameters will be estimated from recent work of the CDC (http://www.cdc.gov/mmwr/preview/mmwrhtml/su6303a1.htm) on the same topic.

Which tools will we develop?

Real Impact Analytics proposes to develop a series of tools based on the insights from the analyses. The tools should be highly visual and intuitive, using maps and prioritization algorithms.

Mobility matrix and mapping of the patterns

We propose to develop dashboards of maps showing the mobility patterns (see previous illustrations) and key visuals per geography. The size or color of the shapes on the map will depend on the level of urgency/risk of infection and possibly the size of the location. This could provide immediate insight on:
- What are the locations/cities at risk?
- Where are the largest cities at risk?

Ranking of action priorities per geography

Real Impact Analytics has been focusing on linking the insights to actions. In the current case, this translates into securing a list of prioritized geographies and locations. The output of our tools will provide an estimate of the future development of the epidemic, based on the observed recent cases. By adding mobility data to the existing models, we will be able to predict where the next cases are likely to occur, helping the organizations on the ground to better plan their response to Ebola. The model can also be used to test some response strategies, such as area confinement.
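To make the combination of the SEIR model and the mobility network described in this section more concrete, the following Python sketch steps a deterministic SEIR model per location and ranks locations by risk. It is a simplified illustration under several assumptions: mu is set to 0 as stated above; the transmission rate is called `beta` here (a name the text does not give); residents of a location are exposed to the infectious prevalence of the places they commute to, weighted by a row-normalized commuting matrix; and any numeric values passed in are placeholders, not the CDC-derived estimates. The function names are hypothetical.

```python
def seir_step(state, mobility, beta, sigma, gamma, dt=1.0):
    """Advance every location by one time step.

    state:    {location: (S, E, I, R)}
    mobility: {location: {destination: share}}, each row summing to 1
    """
    prevalence = {}
    for loc, (s, e, i, r) in state.items():
        n = s + e + i + r
        prevalence[loc] = i / n if n > 0 else 0.0

    new_state = {}
    for loc, (s, e, i, r) in state.items():
        # ASSUMPTION: exposure is the commuting-weighted infectious prevalence.
        exposure = sum(share * prevalence[dest]
                       for dest, share in mobility[loc].items())
        new_exposed = beta * s * exposure * dt      # S -> E
        new_infectious = sigma * e * dt             # E -> I
        new_removed = gamma * i * dt                # I -> R (recovered or dead)
        new_state[loc] = (
            s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_removed,
            r + new_removed,
        )
    return new_state


def simulate(state, mobility, beta, sigma, gamma, days, dt=1.0):
    """Run seir_step repeatedly and return the final state."""
    for _ in range(int(days / dt)):
        state = seir_step(state, mobility, beta, sigma, gamma, dt)
    return state


def rank_by_risk(state):
    """Order locations by the share of residents currently exposed or infectious."""
    def risk(loc):
        s, e, i, r = state[loc]
        n = s + e + i + r
        return (e + i) / n if n > 0 else 0.0
    return sorted(state, key=risk, reverse=True)
```

A location with no commuting link to an affected area keeps a risk of zero in this sketch, which is the behavior a prioritized list of geographies would build on; testing a confinement strategy amounts to editing the rows of `mobility` before running `simulate`.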
Section 4: What are the next steps? Key principles This project will only succeed if the highest standards are applied in terms of confidentiality, security and privacy. RIA will therefore apply standards that match or are stricter than the guidelines set by the GSMA to eliminate any concern around confidentiality and privacy. In terms of security, RIA will apply best practices in terms of data access, storage, transformation and sharing. All the standards will be shared in advance and comply with any internal standard of Airtel SL and with local legislation. Financial responsibility All the parties involved in this project will bear the financial costs of their investments and activities related to this project. If and when donors are involved, the financial contribution will only be used to recover any documented expense. At no point shall RIA or Airtel SL seek financial profit from this project. Transparency, audit and regulation RIA is willing to open its processes for audit. To illustrate RIA’s desire to set the highest standards, we will follow the guidelines set by the Director of Privacy at the GSMA (Pat Walshe): The mobile phone numbers of subscribers making or receiving calls or SMS will be anonymized within the premises of the mobile operators, Anonymized CDR data will not be transferred outside of the operator’s premises, All analyses will take place on mobile operator’s systems, No analysis will be undertaken that singles out identifiable individuals, Only the output of the analyses will be made available to relevant and approved aid agencies, government or research agencies that can use these inputs in their modeling and planning efforts. Short term priorities In the short term, we need to: - Receive written confirmations of the interest and participation to the project from the following actors: UNMEER, Global Pulse and any other relevant stakeholder (e.g., GSMA, WHO), - Request regulatory approval from NERC and NATCOM. 
Concretely, Airtel SL can support us in obtaining NATCOM approval, while Global Pulse and UNMEER can support us in obtaining NERC approval. Once these requirements are met, we will agree on a project plan with a split of roles and responsibilities. The current document will then be updated based on the input of each actor.
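As an illustration of the privacy guidelines listed under "Transparency, audit and regulation" (phone numbers anonymized on the operator's premises, only aggregated outputs shared), the sketch below shows one possible processing chain. It is a set of assumptions for discussion and not RIA's actual anonymization algorithm: the record layout (subscriber number, timestamp, location), the keyed-hash approach, the function names and the minimum cell count are all hypothetical. Note that keyed hashing is, strictly speaking, pseudonymization. It is the aggregation and small-count suppression steps that keep individuals from being singled out in the exported output.

```python
import hashlib
import hmac
from collections import Counter, defaultdict


def pseudonymize(msisdn, secret_key):
    """Replace a phone number with a keyed hash (the key never leaves the operator)."""
    return hmac.new(secret_key, msisdn.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_records(records, secret_key):
    """Map (msisdn, timestamp, location) to (pseudonym, timestamp, location)."""
    return [(pseudonymize(m, secret_key), ts, loc) for m, ts, loc in records]


def mobility_counts(anon_records):
    """Count moves between consecutive distinct locations of each pseudonym."""
    by_user = defaultdict(list)
    for pid, ts, loc in anon_records:
        by_user[pid].append((ts, loc))
    counts = Counter()
    for events in by_user.values():
        events.sort()  # chronological order
        for (_, origin), (_, dest) in zip(events, events[1:]):
            if origin != dest:
                counts[(origin, dest)] += 1
    return counts


def suppress_small_cells(counts, min_count):
    """Drop origin-destination pairs observed fewer than min_count times."""
    return {od: c for od, c in counts.items() if c >= min_count}


# Hypothetical records: (subscriber number, timestamp, location).
records = [
    ("0000001", 1, "Freetown"), ("0000001", 2, "Bo"),
    ("0000002", 1, "Freetown"), ("0000002", 3, "Bo"),
    ("0000003", 1, "Freetown"), ("0000003", 2, "Kenema"),
]
key = b"hypothetical-secret-held-by-the-operator"

anon = anonymize_records(records, key)
flows = mobility_counts(anon)
exported = suppress_small_cells(flows, min_count=2)  # only this leaves the premises
```

Only the `exported` aggregate would be shared for dashboards. The raw and pseudonymized records would stay on the machine located within the operator's premises.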
Security and privacy – technical environment set-up
Real Impact Analytics will set up the following technical environment to ensure the security and privacy of individual data:
- Local machine – the data will be stored on a local machine within the telecom operator’s premises and remotely accessed by Real Impact Analytics. At all times, the raw data will remain on the local machine. Real Impact Analytics will extract the CDRs from the operator’s data warehouse or operational databases, run the anonymization algorithm and compute the required aggregates. This set-up will be adapted when data from multiple operators is taken into account: data will remain within each operator’s premises, but some aggregated insights will be exported to build the dashboards,
- Hardware – the required server will be provided by Real Impact Analytics. This will avoid any shortage of storage capacity at the telecom operator, while securing the configuration required for our analyses. The server will be shipped to the local operator and installed by its staff members,
- Secured connection – all access will go through a secured VPN.

Reaching scale
As the bigger picture is to fight Ebola, we cannot limit the effort to one country. With the very generous leadership of Airtel SL, we will be in a position to demonstrate the value at hand and have actual use cases to justify an extension of the project to more mobile network operators in the remaining affected countries. To reach scale, we propose a phased approach to ensure the stability of the end-product. We essentially foresee 3 phases around the following end-products:
Phase 1 – Prototype based on the data of Airtel SL (CURRENT FOCUS)
o Product based on the data of Airtel SL. This would allow us to fine-tune the requirements, stabilize the output and the algorithms, and start without needing all operators to participate,
o Document/publication synthesizing the approach, best practices and lessons learned.
Phase 2 – Tools leveraging all the data available in Sierra Leone
o Platform integrated with UNMEER and other key stakeholders coordinating the actions, e.g. MSF, WHO, CDC,
o Extension of the sources to non-telecom data (e.g. data collection by field actors),
o Product scaled up at the country level. This would require multiple operators to participate within a country.
Phase 3 – Tools rolled out in multiple countries and offered as a shared platform, possibly including international mobility flows
References:
Gonzalez, M. C., Hidalgo, C. A., & Barabasi, A. L. (2008). Understanding individual human mobility patterns. Nature, 453(7196), 779-782.
Halloran, M. E., Vespignani, A., Bharti, N., Feldstein, L. R., Alexander, K. A., Ferrari, M., ... & Longini Jr, I. M. (2014). Ebola: mobility data. Science (New York, NY), 346(6208), 433.
Lenormand, M., Picornell, M., Cantu-Ros, O. G., Tugores, A., Louail, T., Herranz, R., ... & Ramasco, J. J. (2014). Cross-checking different sources of mobility information. arXiv preprint arXiv:1404.0333.
Tizzoni, M., Bajardi, P., Decuyper, A., King, G. K. K., Schneider, C. M., Blondel, V., ... & Colizza, V. (2013). On the use of human mobility proxy for the modeling of epidemics. arXiv preprint arXiv:1309.7272.
Wesolowski, A., Buckee, C. O., Bengtsson, L., Wetter, E., Lu, X., & Tatem, A. J. (2014). Commentary: Containing the Ebola outbreak–the potential and challenge of mobile network data. PLOS Currents Outbreaks.