L8: Introduction to privacy-preserving computations - Privacy-preserving Technologies / LTAT.04.007
Dan Bogdanov (dan.bogdanov@cyber.ee)
Using privacy technologies to solve it
    Source data:
      10 million tax records,
      600 000 education records.
    Each record was uploaded using secret sharing (think: “encryption”).
    Records were linked and processed using secure multi-party computation
    (think: “data not decrypted for processing”).
    Data never existed outside the source in an unencrypted state.
    Solution based on Sharemind MPC.
    [Diagram: education records from the Ministry of Education and Research and
    employment tax records from the Estonian Tax and Customs Board are uploaded to
    servers hosted by the Estonian Information System's Authority, the Ministry of
    Finance IT Center and Cybernetica.]
[Diagram: data flow of the study]
    Tax and Customs Board: extract data (employment tax payments), secret share and upload.
    Ministry of Education and Research: extract data (higher study events), secret share and upload.
    Employment tax payments: aggregate by month (monthly income), aggregate by year
    (average yearly income), expand by years and aggregate by person (employment
    record of a person).
    Higher study events: aggregate by person (university career of a person).
    Merge by person's ID (complete record of a person), then compute additional
    attributes and align tax payments (analysis table).
    Analysis results: recovered from shares and delivered to the statistical analyst.
    Data stored with secret sharing and processed with secure multi-party computation.
Sharemind-powered Analytics
    Data scientists used analytics tools based on secure multi-party computation.
    The MPC system also prevented queries outside the study plan.
    Reports were given to industry, universities and the government.
    Result: no clear relation between working during studies and not graduating.
    [Diagram: a data analyst queries the servers hosted by the Estonian Information
    System's Authority, the Ministry of Finance IT Center and Cybernetica; results
    go to universities, companies and policymakers.]
A privacy-preserving statistics tool inspired by R
[Chart] Non-IT graduation rate is around 40%; IT graduation rate is around 20%.
     Figure 1. Share of students graduating within the nominal study period, by year of
     matriculation, ICT and non-ICT curricula, bachelor's studies.
[Chart] Non-IT and IT students have similar employment ratios, but IT students lost
     more in the financial crisis.
     Figure 4. Share of students who worked during the nominal study period among all
     students, by year, ICT and non-ICT curricula, bachelor's studies.
DATA-DRIVEN SERVICES ON CONFIDENTIAL DATA
Regulatory status of the project
     In an official response, after a study of the system, the Estonian DPA
     suggested that
        neither the hosts of the servers running the statistics
        nor the analysts making the queries
        could feasibly re-identify individuals in the source database (this was pre-GDPR).
     The Internal Supervision Department of the Tax and Customs Board agreed
     to provide unmodified tax records after a code and process review.
     A follow-up legal review in the FP7 PRACTICE project by a researcher from the
     University of Göttingen suggested that the same precedent could hold under
     GDPR as well.
A general model for privacy-preserving computing
Concept of secure computing
       When a standard computer encrypts data, it must be decrypted before analysis.
       Secure computing systems can analyze data without removing the encryption.
       [Diagram: an encrypted database analyzed with standard tools versus with
       secure computing.]
Extended definition of secure multi-party computation
     Input parties IP_1, ..., IP_k hold the inputs x_1, ..., x_k.
     Computing parties CP_1, ..., CP_l: computing party CP_j receives the pieces
     x_1j, ..., x_kj of the inputs and holds an output labelled y_j in the diagram.
     Result parties RP_1, ..., RP_m receive the results y_1, ..., y_m.
     Step 1: upload and storage of inputs.
     Step 2: secure computing.
     Step 3: publishing of results.
Technique: property-preserving cryptography
                       Analogy: symmetric crypto that preserves a
                       relation on inputs (e.g., order, equality).
                       Pros:
                         Low performance overhead.
                         Fits well into existing systems.
                       Cons:
                         Only allows a few operations (e.g., only
                         equality comparison or ordering).
                         Multi-user systems are a challenge, but can
                         be done with proxy re-encryption.
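
A minimal sketch of the equality-preserving idea, under loud assumptions: it uses a keyed HMAC tag as a stand-in, which preserves equality but, unlike a real deterministic encryption scheme, cannot be decrypted. The key, the `equality_token` helper and the `person-00x` identifiers are hypothetical.

```python
import hashlib
import hmac


def equality_token(key: bytes, value: str) -> str:
    """Deterministic keyed tag: equal inputs give equal tokens under the same key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key known to the data owners but not to the party doing the linking.
KEY = b"demo key shared by the data owners"

# Two data sets are tokenized separately and then linked on the tokens alone.
tax_tokens = {equality_token(KEY, pid) for pid in ["person-001", "person-002"]}
edu_tokens = {equality_token(KEY, pid) for pid in ["person-002", "person-003"]}
matches = tax_tokens & edu_tokens  # equality comparison works on protected values
```

The linking party can test equality and nothing else, which is the "only allows a few operations" limitation in practice. It also sees which tokens repeat, so frequency information is exposed.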
Technique: homomorphic encryption
                     Analogy: asymmetric crypto that allows
                     addition and multiplication of ciphertexts.
                     Pros:
                        Fits well into existing systems.
                     Cons:
                        High performance overhead.
                        Multi-user systems are a challenge, but can
                        be done with proxy re-encryption.
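
As a hedged illustration, the sketch below uses a toy version of the Paillier cryptosystem, which supports only addition of ciphertexts; multiplying two encrypted values needs a fully homomorphic scheme and is not shown. The primes are far too small for real use, and all function names are assumptions made for this example.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier keys from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid for the generator g = n + 1
    return n, (lam, mu)


def encrypt(n, message):
    """Encrypt 0 <= message < n under public key n with fresh randomness."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, ciphertext):
    """Recover the plaintext with the private key (lam, mu)."""
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
encrypted_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
result = decrypt(n, private_key, encrypted_sum)  # computed without decrypting inputs
```

Each operation is a modular exponentiation or multiplication on numbers twice the key size, which hints at where the performance overhead comes from.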
Technique: garbled circuits
                        Analogy: cryptographic versions of electrical
                        circuits.
                        Pros:
                           Flexible programming model.
                        Cons:
                           Medium performance overhead.
                           Fixed number of parties (can be solved by
                           combining with other techniques).
Example: millionaire’s problem
Technique: secret sharing
                       Analogy: give a number of people a random
                       piece of each secret value and let them
                       collaborate to compute results.
                       Pros:
                            Low-to-medium performance overhead.
                            Flexible programming model.
                       Cons:
                            Distributed deployments do not fit into all
                            existing systems.
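
The analogy can be sketched with additive secret sharing, one common form of the technique. This is a minimal sketch, assuming three computing parties and arithmetic modulo 2^32; the names `share`, `add_shared` and `reconstruct` are illustrative and are not taken from Sharemind.

```python
import secrets

MODULUS = 2**32  # assumption: values live in a 32-bit ring


def share(secret, parties=3):
    """Input party: split a value into random pieces that sum to it mod MODULUS."""
    pieces = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    pieces.append((secret - sum(pieces)) % MODULUS)
    return pieces


def add_shared(a, b):
    """Computing parties: each adds its own two pieces locally, no communication."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]


def reconstruct(pieces):
    """Result party: combine all pieces to recover the value."""
    return sum(pieces) % MODULUS


# Two input parties upload shared incomes; the sum is computed on shares only.
income_a = share(1200)
income_b = share(800)
total = reconstruct(add_shared(income_a, income_b))
```

Any single piece is a uniformly random number, so one computing party alone learns nothing about the input. Multiplication of shared values needs an interactive protocol between the parties and is left out of this sketch.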
Technique: trusted execution environments
                       Analogy: think of a computer process that
                       hides the data from the computer's owner.
                       Pros:
                         Minimal performance overhead.
                         Relatively easy to convert applications to work
                         on trusted execution environments.
                       Cons:
                         Side-channel attack mitigations are
                         complicated to implement.
Lecture exercise: modelling parties for a COVID-19 social distancing tracking application
Lecture task
     Think of an application that would support social distancing and limit infection
     rates. Write down very clearly what the expected benefit of the system is.
        Write down the list of input parties and the data they would provide.
        Write down the list of computing parties and describe the kind of processing they
        would perform.
        Write down the list of result parties and describe the outputs they would receive.
     Bonus tasks, time permitting:
        Think of minimizing personal data processing using process redesign.
        See if any of the secure computing paradigms described above could support
        your application.
     Prepare in 12 minutes and then we’ll have 1-2 students present their ideas.
Programmable privacy-preserving computations
PDK as an abstraction of a secure computing paradigm
     A protection domain kind (PDK) is a set of data representations, algorithms
     and protocols for storing and computing on protected data.
     Examples:
       SMC based on secret sharing,
       SMC based on garbled circuits,
       (fully) homomorphic encryption,
       trusted hardware (e.g., Intel SGX).
Protection domain as an instance of a PDK
     A protection domain (PD) is a set of data that is protected with the same
     resources and for which there is a well-defined set of algorithms and
     protocols for computing on that data while keeping the protection.
     Examples:
        data held by a fixed group of servers performing secure multi-party computation,
        data encrypted under a fixed key of a homomorphic encryption scheme.
Application model for privacy-preserving computing
       [Diagram labels: Application, Secure primitive operations, Privacy-preserving
       algorithms, Application logic]
       Secure primitive operations:
       • private outputs from private inputs,
       • have privacy proofs,
       • remain private under sequential or parallel composition,
       • optimized to have a low resource footprint.
       Privacy-preserving algorithms:
       • publish selected results to make system useful,
       • do not leak private inputs or show leakage as acceptable,
       • compositions of secure primitive operations,
       • optimize for running time.
Converting an algorithm to a privacy-preserving one
     We pick frequent itemset mining as a problem of choice.
     Frequent itemset mining is a data mining problem that helps with shopping
     basket analysis and the simplest kinds of recommender systems.
        What kind of things do people buy from stores together most often?
        If the service provider knows this, they can recommend one to a customer who is
        planning to buy the other.
     The simpler algorithms include Apriori (breadth-first search) and Eclat (depth-
     first search).
     We will now look at the basic primitive of frequent itemset mining and then
     build a privacy-preserving approach.
Privacy-preserving data representations
       Private data representations are the key toward
       designing privacy-preserving algorithms.

       Transactions:
       t1   rendang, nasi lemak, chicken satay
       t2   nasi lemak, lontong
       t3   chicken satay
       t4   rendang, nasi lemak
       t5   nasi lemak, chicken satay
       t6   nasi lemak, chicken satay
       t7   lontong

       Bit matrix:
            rendang   nasi lemak   lontong   chicken satay
       t1      1          1           0            1
       t2      0          1           1            0
       t3      0          0           0            1
       t4      1          1           0            0
       t5      0          1           0            1
       t6      0          1           0            1
       t7      0          0           1            0
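
The conversion from transactions to the bit matrix can be sketched as follows. The function name `to_bit_matrix` is an assumption, and the sketch works on plain values; in a privacy-preserving deployment each bit would be protected, for example secret-shared, before upload.

```python
ITEMS = ["rendang", "nasi lemak", "lontong", "chicken satay"]

TRANSACTIONS = {
    "t1": {"rendang", "nasi lemak", "chicken satay"},
    "t2": {"nasi lemak", "lontong"},
    "t3": {"chicken satay"},
    "t4": {"rendang", "nasi lemak"},
    "t5": {"nasi lemak", "chicken satay"},
    "t6": {"nasi lemak", "chicken satay"},
    "t7": {"lontong"},
}


def to_bit_matrix(transactions, items):
    """One row per transaction, one 0/1 entry per item in a fixed column order."""
    return {
        tid: [1 if item in basket else 0 for item in items]
        for tid, basket in transactions.items()
    }


matrix = to_bit_matrix(TRANSACTIONS, ITEMS)
```

Every row has the same length regardless of basket size, so the shape of the stored data does not reveal how many items a transaction contains.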
Calculating the support of an item
       The data representation allows for very efficient
       calculation of item supports.

            rendang   nasi lemak   lontong   chicken satay  |  nasi lemak
       t1      1          1           0            1        |      1
       t2      0          1           1            0        |      1
       t3      0          0           0            1        |      0
       t4      1          1           0            0        |      1
       t5      0          1           0            1        |      1
       t6      0          1           0            1        |      1
       t7      0          0           1            0        |      0
       Sum of the nasi lemak column: ∑ = 5.
Calculating support for a set of items
       Checking the joint support of a pair of items
       simply requires a multiplication.

            rendang   nasi lemak   lontong   chicken satay  |  nasi lemak  x  chicken satay  =  nasi lemak & chicken satay
       t1      1          1           0            1        |      1       x        1        =   1
       t2      0          1           1            0        |      1       x        0        =   0
       t3      0          0           0            1        |      0       x        1        =   0
       t4      1          1           0            0        |      1       x        0        =   0
       t5      0          1           0            1        |      1       x        1        =   1
       t6      0          1           0            1        |      1       x        1        =   1
       t7      0          0           1            0        |      0       x        0        =   0
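
Both support calculations reduce to row-wise products followed by a sum, which the following sketch shows on plain values. The `support` helper and the column-wise `COLUMNS` layout are assumptions for illustration; on protected data the same additions and multiplications would be carried out by secure primitive operations.

```python
from math import prod

# Columns of the bit matrix, one list per item, rows ordered t1..t7.
COLUMNS = {
    "rendang":       [1, 0, 0, 1, 0, 0, 0],
    "nasi lemak":    [1, 1, 0, 1, 1, 1, 0],
    "lontong":       [0, 1, 0, 0, 0, 0, 1],
    "chicken satay": [1, 0, 1, 0, 1, 1, 0],
}


def support(columns, itemset):
    """Multiply the bits of the chosen items in each row, then sum over rows."""
    rows = zip(*(columns[item] for item in itemset))
    return sum(prod(bits) for bits in rows)


single = support(COLUMNS, ["nasi lemak"])
pair = support(COLUMNS, ["nasi lemak", "chicken satay"])
```

The computation touches every row in the same way, so its control flow does not depend on the data.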
       Sum of the product column: ∑ = 3.
Evaluating itemsets with a depth-first strategy
          Depth-first search would be intuitive for pruning.
          Itemset lattice:
          Size 1: { rendang }  { nasi lemak }  { lontong }  { chicken satay }
          Size 2: { rendang, nasi lemak }  { rendang, lontong }  { rendang, chicken satay }
                  { nasi lemak, lontong }  { nasi lemak, chicken satay }  { lontong, chicken satay }
          Size 3: { rendang, nasi lemak, lontong }  { rendang, nasi lemak, chicken satay }
                  { rendang, lontong, chicken satay }  { nasi lemak, lontong, chicken satay }
          Size 4: { rendang, nasi lemak, lontong, chicken satay }
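
A depth-first traversal of this lattice with pruning can be sketched as below, in the spirit of Eclat but not a faithful reproduction of it. The sketch runs on plain bits, and the threshold of 3 is an arbitrary choice for the example.

```python
COLUMNS = {
    "rendang":       [1, 0, 0, 1, 0, 0, 0],
    "nasi lemak":    [1, 1, 0, 1, 1, 1, 0],
    "lontong":       [0, 1, 0, 0, 0, 0, 1],
    "chicken satay": [1, 0, 1, 0, 1, 1, 0],
}


def mine_depth_first(columns, threshold):
    """Extend one itemset at a time and abandon a branch once it is infrequent."""
    items = list(columns)
    n_rows = len(next(iter(columns.values())))
    frequent = {}

    def extend(prefix, prefix_bits, start):
        for i in range(start, len(items)):
            # Row-wise product of the prefix bits with the next item's column.
            bits = [a * b for a, b in zip(prefix_bits, columns[items[i]])]
            count = sum(bits)
            if count >= threshold:
                itemset = prefix + (items[i],)
                frequent[itemset] = count
                extend(itemset, bits, i + 1)  # go deeper before trying siblings

    extend((), [1] * n_rows, 0)
    return frequent


frequent = mine_depth_first(COLUMNS, threshold=3)
```

Pruning is sound because adding an item can never increase support: once `{ rendang }` falls below the threshold, none of its supersets needs to be evaluated.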
Evaluating itemsets with a breadth-first strategy
          However, breadth-first search can be done in parallel.
          Itemset lattice:
          Size 1: { rendang }  { nasi lemak }  { lontong }  { chicken satay }
          Size 2: { rendang, nasi lemak }  { rendang, lontong }  { rendang, chicken satay }
                  { nasi lemak, lontong }  { nasi lemak, chicken satay }  { lontong, chicken satay }
          Size 3: { rendang, nasi lemak, lontong }  { rendang, nasi lemak, chicken satay }
                  { rendang, lontong, chicken satay }  { nasi lemak, lontong, chicken satay }
          Size 4: { rendang, nasi lemak, lontong, chicken satay }
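
A level-wise traversal can be sketched as follows. This is a simplified Apriori-style loop on plain bits, assuming a threshold of 3; the supports within one level are independent of each other, which is what makes parallel (or batched) evaluation possible, although the sketch itself evaluates them sequentially.

```python
from itertools import combinations
from math import prod

COLUMNS = {
    "rendang":       [1, 0, 0, 1, 0, 0, 0],
    "nasi lemak":    [1, 1, 0, 1, 1, 1, 0],
    "lontong":       [0, 1, 0, 0, 0, 0, 1],
    "chicken satay": [1, 0, 1, 0, 1, 1, 0],
}


def support(columns, itemset):
    """Row-wise product of the chosen columns, summed over all rows."""
    return sum(prod(bits) for bits in zip(*(columns[i] for i in itemset)))


def mine_breadth_first(columns, threshold):
    """Evaluate all candidates of one size before moving to the next size."""
    frequent = {}
    level = [(item,) for item in columns]
    while level:
        # No candidate here depends on another candidate of the same level.
        supports = {itemset: support(columns, itemset) for itemset in level}
        survivors = [s for s in level if supports[s] >= threshold]
        frequent.update({s: supports[s] for s in survivors})
        # Join survivors that agree on everything except their last item.
        level = [
            a + (b[-1],)
            for a, b in combinations(survivors, 2)
            if a[:-1] == b[:-1]
        ]
    return frequent


frequent = mine_breadth_first(COLUMNS, threshold=3)
```

In a secure multi-party setting, evaluating a whole level at once lets many multiplications share the same communication rounds, which is a common reason to prefer this order despite the later pruning.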
Balancing optimizations with privacy preservation
     Challenge: exploring all possible itemsets is slow due to combinatorial
     explosion.
     Pruning the search tree requires us to declassify itemset supports during
     computation (leak?).
     Solution: consider that the algorithm will publish all frequent itemsets, as that
     is its intended goal.
     We will compare support to the threshold privately, only declassifying the
     result bit.
     We will prune the search tree based on that bit.
     Not a leak - if the itemset is frequent, we would have learned it from the
     outputs anyway.
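
The declassification pattern can be illustrated with a plain-Python simulation. It is not a secure implementation: the hypothetical `Private` class only marks values that a real system would keep secret-shared, and `declassify` records everything that would become public so that the leakage can be inspected.

```python
class Private:
    """Stand-in for a protected value; real MPC would never hold it in the clear."""

    def __init__(self, value):
        self._value = value

    def __add__(self, other):
        return Private(self._value + other._value)

    def __mul__(self, other):
        return Private(self._value * other._value)

    def __ge__(self, threshold):
        # Private comparison: the result bit is still protected.
        return Private(int(self._value >= threshold))


DECLASSIFIED = []  # log of every value made public


def declassify(label, private):
    DECLASSIFIED.append((label, private._value))
    return private._value


def private_support(columns, itemset):
    total = Private(0)
    for row in zip(*(columns[item] for item in itemset)):
        product = row[0]
        for bit in row[1:]:
            product = product * bit
        total = total + product
    return total


def mine_private(columns, threshold):
    items, published = list(columns), []

    def extend(prefix, start):
        for i in range(start, len(items)):
            itemset = prefix + (items[i],)
            bit = declassify(itemset, private_support(columns, itemset) >= threshold)
            if bit:  # prune on the public bit only
                published.append(itemset)
                extend(itemset, i + 1)

    extend((), 0)
    return published


plain = {
    "rendang":       [1, 0, 0, 1, 0, 0, 0],
    "nasi lemak":    [1, 1, 0, 1, 1, 1, 0],
    "lontong":       [0, 1, 0, 0, 0, 0, 1],
    "chicken satay": [1, 0, 1, 0, 1, 1, 0],
}
secret_columns = {k: [Private(b) for b in v] for k, v in plain.items()}
published = mine_private(secret_columns, threshold=3)
```

After the run, `DECLASSIFIED` contains only 0/1 comparison results, never a support count, which matches the argument that the revealed bits carry no more than the published list of frequent itemsets.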