SAS Government Analytics Leadership Forum - Anil Arora, Chief Statistician of Canada April 2018

 
SAS Government
Analytics Leadership
Forum

 Anil Arora, Chief Statistician of Canada
 April 2018
Statistics Canada

• Translating data into evidence for 100
  years
• Using statistical science and sophisticated
  methods to produce reliable information
  about Canadians
• A lot goes on behind the scenes to
  produce the census…
Census: Behind the Scenes

The data revolution is changing
Canada’s society and the
expectations of Canadians
•   New data sources and the
    sophistication of our users
    and their capacity underpin
    the need to modernize our
    methods and outputs
•   Leading-edge methods and
    data integration are a key
    pillar of our modernization
    agenda

Statistics Canada is undertaking a significant
transformation and leading efforts to be more
responsive to the data needs of policy leaders by:

• Moving beyond a survey-first approach with new methods and integrating data from a
  variety of existing sources
• Making data easier to access and use by adopting new tools to analyze and visualize
  data
• Enabling Canadians to use data to make evidence-based decisions
Statistical analysis is at the center of every
step in the cycle of translating data to
evidence

Statistical analysis is critical to producing high quality information in the most
cost efficient manner

• Design and collection: optimize designs and processes (samples, collection, coding,
  record linkage). Systems: G-SAM, G-CODE, G-LINK
• Processing and inference: statistical error detection and correction, weighting,
  weight adjustments, use of statistical models. Systems: BANFF, CANCEIS
• Analysis: time series analysis, statistical data validation and confrontation, data
  interpretation. System: G-SERIES
• Dissemination: measurement of accuracy, statistical disclosure control (privacy).
  System: G-CONFID
• Consumption: supporting quality decisions by citizens, their governments and
  businesses based on evidence

 All processing systems (G-SAM, etc.) are coded in SAS
Leading-edge methods to
integrate new data types:

       Model-based crop yield estimates

Responding to rapidly evolving
policy needs:

January 11, 2018 print edition

Integrating data to enable the
Horizontal Review of Innovation
and Clean Tech

Inputs:
• Administrative data files from departments, agencies and crown corporations
• Existing survey and administrative data files at Statistics Canada

Both feed into Statistics Canada’s linkable file environment.

Outputs:
• Basic descriptive statistics
• Before-after analysis
• Cohort analysis
• Linked file for ongoing research

✓ Gathering data efficiently and strategically
✓ Leveraging existing data holdings across government
✓ Creating a new research dataset to allow further analysis
Evolving with the times

SAS first introduced at Statistics Canada in the late 1980s

• From: character-based green screens on the mainframe
• Primitive Windows user interfaces
• Enterprise Guide
• Moving to: Visual Analytics, Enterprise Miner and Viya

Canadian Housing Statistics Program
• TransUnion data (43 million records) linked to tax information (165 million)
• 233 million possible pairs created
• Runs in about 40 hours on the SAS Grid
• Would not be possible on a dedicated Windows Server

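A full cross product of 43 million and 165 million records would be roughly 7.1 × 10^15 comparisons, so the 233 million possible pairs suggest that candidate pairs were restricted in some way before comparison. The slides do not say how. The sketch below illustrates one common restriction, blocking on a shared key, using toy records. The field names, the blocking key, and the exact-agreement rule are all assumptions for illustration, not the method used by G-LINK or the Canadian Housing Statistics Program.

```python
from collections import defaultdict


def candidate_pairs(file_a, file_b, block_key):
    """Return pairs (a, b) whose records share the same value of block_key."""
    blocks = defaultdict(list)
    for rec in file_b:
        blocks[rec[block_key]].append(rec)
    pairs = []
    for rec_a in file_a:
        for rec_b in blocks.get(rec_a[block_key], []):
            pairs.append((rec_a, rec_b))
    return pairs


def agreement_score(rec_a, rec_b, fields):
    """Count how many comparison fields agree exactly (a crude score)."""
    return sum(1 for f in fields if rec_a[f] == rec_b[f])


# Hypothetical toy records; the real files hold tens of millions of rows.
credit_file = [
    {"id": "A1", "postal": "K1A", "surname": "roy", "birth_year": 1970},
    {"id": "A2", "postal": "K1A", "surname": "singh", "birth_year": 1985},
    {"id": "A3", "postal": "M5V", "surname": "chen", "birth_year": 1992},
]
tax_file = [
    {"id": "T1", "postal": "K1A", "surname": "roy", "birth_year": 1970},
    {"id": "T2", "postal": "K1A", "surname": "singh", "birth_year": 1958},
    {"id": "T3", "postal": "M5V", "surname": "chen", "birth_year": 1992},
    {"id": "T4", "postal": "V6B", "surname": "roy", "birth_year": 1970},
]

# Only records in the same (assumed) postal block are compared.
pairs = candidate_pairs(credit_file, tax_file, "postal")
full_cross_product = len(credit_file) * len(tax_file)

# Keep pairs that agree on both comparison fields.
links = [
    (a["id"], b["id"])
    for a, b in pairs
    if agreement_score(a, b, ["surname", "birth_year"]) == 2
]

print(len(pairs), "candidate pairs out of", full_cross_product)
print(links)
```

Production record linkage typically replaces the exact-agreement rule with probabilistic weights and string comparators, but the effect of blocking on the number of pairs is the same idea.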
StatCan SAS Grid

- Started as a research project made up of 4 workstations
- Evolved to be the largest SAS Grid implementation in Canada:
  - 16 Grid nodes each having 16 cores
  - 256 compute cores and 60 Terabytes (TBs) of Shared File System

Allows many processes to run concurrently:
- large record linkages
- complex estimation processes
- multi-dimensional tabulations

Continued improvement: using the StatCan SAS Grid and the new SAS application G-Tab
Census, one can see a reduction in time of 95% when compared to creating the same
table using the 2016 Tabulation system
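The Grid figures on this slide can be checked with a few lines of arithmetic. The sketch below confirms the core count and converts the 95% time reduction into a speed-up factor. The 60-minute baseline is a hypothetical value chosen only to make the example concrete; the slides do not give actual run times.

```python
# Figures taken from the slide.
nodes = 16
cores_per_node = 16
total_cores = nodes * cores_per_node

reduction_percent = 95
remaining_percent = 100 - reduction_percent

# A 95% reduction leaves 5% of the original run time, i.e. a 20x speed-up.
speedup = 100 / remaining_percent

# Hypothetical baseline: a table that took 60 minutes on the old system.
hypothetical_old_minutes = 60
new_minutes = hypothetical_old_minutes * remaining_percent / 100

print(total_cores)   # 256
print(speedup)       # 20.0
print(new_minutes)   # 3.0
```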
Pure Data Analytic (Netezza)

                 • Capacity to store, process and
                   analyze Big Data
                 • Planned use-cases:
                    • CPI alternate data source
                    • Canadian Housing Statistics
                       Program linkage
                    • Admin Data Lake

Old and new: combining
traditional and AI methods
in the 2016 Census
Immigration-related variables:

Traditional: data was added through record linkage instead of collection
- Result: 24,000-hour reduction in respondent burden

AI: to fill in missing values, machine learning identified the best combination of
respondent characteristics to make corrections
- Result: complete data; up to 10% more accurate

OUTCOMES
- Now: more accurate data for IRCC policymakers
- Later: proof of concept for Census 2021
SAS and Statistics Canada

THANK YOU!
For more information,
please visit
www.statcan.gc.ca

#StatCan100