THE NATIONAL COVID COHORT COLLABORATIVE (N3C): LET'S GET INVOLVED ! - WARREN A. KIBBE, PHD, FACMI - PURDUE UNIVERSITY

The National COVID Cohort Collaborative (N3C):
               Let’s Get Involved !
                Warren A. Kibbe, PhD, FACMI
                        June 15, 2021
             Purdue Big Data in Cancer Workshop

                         @data2health             covid.cd2h.org
           @wakibbe      @ncats_nih_gov           ncats.nih.gov/n3c
A program of NIH’s National Center for Advancing Translational Sciences

Warren Kibbe
Duke Biostatistics & Bioinformatics
CTSA Informatics
Duke Cancer Institute
Member N3C

Speaker Objectives
●   Real World Data
●   Open Science
●   Overview of N3C
●   N3C Data Enclave statistics
●   How common data models and variables are harmonized
●   The scope of answerable questions
●   Data access and security
●   Oncology research in N3C
Special thanks to:
● Chris Chute, N3C, Johns Hopkins

● Melissa Haendel, N3C, Colorado University

● Umit Topaloglu, N3C, Wake Forest

● Frank Rockhold, Duke

● Noha Sharafeldin, N3C, UAB
Take homes
 • N3C represents a unique resource to examine effects of COVID-19 on cancer
   outcomes
 • Largest COVID-19 and cancer cohort within the US
 • Consistent with previous literature, older age, male gender, increasing comorbidities,
   and hematological malignancies were associated with higher mortality in patients with
   cancer and COVID-19
 • The N3C dataset confirmed that cancer patients with COVID-19 who received recent
   immunotherapy or targeted therapies were not at higher risk of overall mortality

What is Real World Data?
    Collected in the context of patient care. Real World Data was called out
    as part of the 21st Century Cures Act.

21st Century Cures Act: https://www.fda.gov/regulatory-information/selected-amendments-fdc-act/21st-century-cures-act
Graphic from HealthCatalyst: https://www.healthcatalyst.com/insights/real-world-data-chief-driver-drug-development
Current sources of data
   molecular           genome   pathology   imaging   labs   notes   sensors

         Our ability to generate biomedical
         data continues to grow in terms of
                variety and volume

icons by the Noun Project
AI is changing our ability to go both
deep and broad

  Trustworthy AI     Reusable
  Provenance         Reproducible
Having a health equity lens
       ●   Digital Health, precision medicine, and real world data
           all have the power to transform healthcare. However,
           we must pay attention to structural racism and implicit
           bias if we want to achieve equity.
21st Century Cures Act

    Last year I discussed the NCI Cancer Moonshot and Precision Medicine
    activities funded under the 21st Century Cures Act.

    FDA was directed by Congress to focus on the use of RWD and RWE in drug
    design, development, and outcomes assessment.

https://www.fda.gov/regulatory-information/selected-amendments-fdc-act/21st-century-cures-act
Is it just about Real World Data?
What about Open Science? Data transparency? Data Access?
The importance of Open Science
Calls for greater transparency and ‘open data access’ in clinical research
continue.
● “Open science is the movement to make scientific research, data and
   dissemination accessible to all levels of an inquiring society”*
● Open Science Project**: “If we want open science to flourish, we should
   raise our expectations to: Work. Finish. Publish. Release.”
● FAIR Principles: Findability, Accessibility, Interoperability, and Reusability***
● TRUST Principles: Transparency, Responsibility, User focus, Sustainability
   and Technology****
* https://www.fosteropenscience.eu/resources
** http://openscience.org/
*** https://www.nature.com/articles/sdata201618
**** https://www.nature.com/articles/s41597-020-0486-7
Open Science and Patient Data Access
Some of the challenges are:
● Patient privacy
● Academic credit
● Commercial sensitivity and intellectual property
● Data standards
● Resources (money and people)

There should be room for researchers and patients alike to gain from this effort.

Informatics experts and data scientists are essential elements of this discussion.
One problem with Clinical Trials Data Sharing
   ● “The tendency for researchers to ‘sit’ on their data for an unduly long period
     of time is neither desirable from a scientific point of view nor acceptable from
     an ethical perspective.”

   ● “After all, the data belong to the patients who agreed to participate in the
     research, not to the investigators who coordinated it, as the new European
     General Data Protection Regulation emphasizes.”*

*Rockhold, F, et al. Open science: The open clinical trials data journey, Clinical Trials, Vol 16 (5) 1-8, 2019
Access to patient-level data is important for research
There are certainly challenges, but the question is not whether data should be
shared, but rather how and when access should be granted.
Responsible open access enables secondary analyses that:

●   Enhance reproducibility of clinical research
●   Honor the contributions of trial participants
●   Improve the design of future trials
●   Generate new research findings

This journey of making patient data available is part of an evolution in
transparency and not a sudden awakening.
What about N3C?

It is an open science, controlled access environment
Clinical and Translational Science
Awards (CTSA) Program
The pandemic highlights urgent needs

                 ●       Algorithms (diagnosis, triage, predictive, etc.)
                 ●       Drug discovery & pharmacogenetics
                 ●       Multimodal analytics (EHR, imaging, genomics)
                 ●       Interventions that reduce disease severity
                 ●       Best practices for resource allocation
                 ●       Coordinated research efforts to maximize efficiency and
                         reproducibility

                                           These all require the creation
                                       of a comprehensive clinical data set
What Kinds of Questions Can N3C Address?

                        The scope and scale of the information in the platform
                               will support probing questions such as:
    ●        What social determinants of health are risk factors for mortality?
    ●        Do some therapies work better than others? By region? By demographics?
    ●        Can we compare local rare clinical observations with national occurrences?
    ●        Can we predict who might have severe outcomes if they have COVID-19?
    ● What factors will predict the effectiveness of vaccines?
    ● Can we predict acute kidney injury in COVID-19 patients?
    ● Who might need a ventilator because of lung failure?
Cohort characterization objectives

To clinically characterize the N3C cohort
 ● Largest U.S. COVID-19 cohort to date (+ representative controls)
 ● Racially, ethnically, and geographically diverse

To develop and share validated, versioned OMOP representations of
common variables (labs, vital signs, medications, treatments)

To generate hypotheses to be tested within N3C and elsewhere
 ● Clinical phenotypes and trajectories
 ● Treatment patterns and response
 ● … and many others
Benefits for Participation

               ● Access to large scale COVID-19 data from across the nation

               ● Pilot data for grant proposals

               ● Opportunities for KL2 and TL1 and other scholars

               ● Team science opportunities for new questions and access to
                 Teams, statistics, machine learning (ML), informatics
                 expertise

               ● Learn ML analytics, NLP methods & access to tools, software,
                 additional datasets
Who is in the N3C?
The N3C Computable Phenotype
      ● At a high level, our phenotype looks for patients:
        ○ With a positive COVID-19 test (PCR or antibody) OR
        ○ With an ICD-10-CM code of U07.1 OR
        ○ Two or more COVID-like diagnosis codes (ARDS, pneumonia, etc.) during the
            same encounter, but only on or prior to 5/1/2020
      ● Each one of these patients is then demographically matched to two patients with
        negative or equivocal COVID-19 tests.
      Matching algorithm: one COVID-positive patient and the two patients matched to it

                   COVID-positive patient   Matched patient 1   Matched patient 2
      Age          47                       49                  46
      Gender       M                        M                   M
      Race         Black                    Black               Black
      Ethnicity    Unknown                  Hispanic/Latino     Not Hispanic
      COVID        Positive                 Negative            Negative


      ● Each site securely sends this set of patients, along with their longitudinal EHR
        data from 1/1/2018 to the present, to the N3C on a regular basis.
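
The inclusion logic above can be sketched in a few lines of Python. This is a minimal illustration, not N3C's actual phenotype code: the `Encounter` type, the function name, and the short `COVID_LIKE_CODES` set are assumptions made for the example (the real phenotype relies on a curated code list), and the demographic matching step is left out.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# From the slide: COVID-like diagnosis codes count only on or before 5/1/2020.
COVID_LIKE_CUTOFF = date(2020, 5, 1)

# Illustrative placeholders only (ARDS and pneumonia codes); NOT the N3C code list.
COVID_LIKE_CODES = {"J80", "J12.89", "J18.9"}


@dataclass
class Encounter:
    """Hypothetical minimal encounter record for this sketch."""
    encounter_date: date
    diagnosis_codes: List[str] = field(default_factory=list)  # ICD-10-CM codes


def meets_phenotype(has_positive_test: bool, encounters: List[Encounter]) -> bool:
    """Return True if any of the three slide criteria is satisfied."""
    # Criterion 1: positive COVID-19 test (PCR or antibody).
    if has_positive_test:
        return True
    for enc in encounters:
        codes = set(enc.diagnosis_codes)
        # Criterion 2: ICD-10-CM code U07.1.
        if "U07.1" in codes:
            return True
        # Criterion 3: two or more COVID-like codes in the same encounter,
        # but only on or prior to the cutoff date.
        if enc.encounter_date <= COVID_LIKE_CUTOFF and len(codes & COVID_LIKE_CODES) >= 2:
            return True
    return False
```

For example, an encounter on 4/15/2020 carrying both `J80` and `J12.89` qualifies, while the same pair of codes on 6/1/2020 does not.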
N3C Timeline
N3C Dashboard
                                            covid.cd2h.org/dashboard

                                       55 sites with data released (purple) and 37 sites with
                                       data pending (open circle). OCHIN is a national network
                                       of 131 sites (diamond).

                                                     covid.cd2h.org/teams

                                                      31 Domain teams!
         As of June 14, 2021
Data Transfer Agreement Signatories

                                                                               6/14/2021
                                                                       88 DTA Signatories

     Northwestern University at Chicago ᛫ Tufts Medical Center ᛫ Advocate Health Care Network ᛫ University of Alabama at Birmingham ᛫ Oregon Health & Science University ᛫
   University of Washington ᛫ Stanford University ᛫ The University of Michigan at Ann Arbor ᛫ Children's Hospital Colorado ᛫ Duke University ᛫ Medical College of Wisconsin ᛫ The
  Ohio State University ᛫ University of Nebraska Medical Center ᛫ University of Arkansas for Medical Sciences ᛫ George Washington University ᛫ Johns Hopkins University ᛫ West
Virginia University ᛫ Medical University of South Carolina ᛫ University of North Carolina at Chapel Hill ᛫ University of Virginia ᛫ The University of Texas Medical Branch at Galveston
 ᛫ University of Minnesota ᛫ University of Cincinnati ᛫ Columbia University Irving Medical Center ᛫ Cincinnati Children's Hospital Medical Center ᛫ Rush University Medical Center ᛫
     Nemours ᛫ University of Wisconsin-Madison ᛫ The State University of New York at Buffalo ᛫ Washington University in St. Louis ᛫ University of Rochester ᛫ The University of
     Chicago ᛫ University of Miami ᛫ The Scripps Research Institute ᛫ University of Texas Health Science Center at San Antonio ᛫ University of Kentucky ᛫ University of Illinois at
    Chicago ᛫ Virginia Commonwealth University ᛫ Weill Medical College of Cornell University ᛫ Carilion Clinic ᛫ University Medical Center New Orleans ᛫ The University of Iowa ᛫
Emory University ᛫ Maine Medical Center ᛫ The University of Texas Health Science Center at Houston ᛫ Boston University Medical Campus ᛫ The University of Utah ᛫ University of
  Southern California ᛫ George Washington Children's Research Institute ᛫ University of Colorado Denver | Anschutz Medical Campus ᛫ Mayo Clinic Rochester ᛫ The Rockefeller
        University ᛫ Montefiore Medical Center ᛫ University of Mississippi Medical Center ᛫ University of Oklahoma Health Sciences Center, Board of Regents ᛫ University of
  Massachusetts Medical School Worcester ᛫ Aurora Health Care ᛫ Penn State ᛫ University of New Mexico Health Sciences Center ᛫ NorthShore University HealthSystem ᛫ Wake
Forest University Health Sciences ᛫ Vanderbilt University Medical Center ᛫ Regenstrief Institute ᛫ Brown University ᛫ Stony Brook University ᛫ University of California, Davis ᛫ Yale
      New Haven Hospital ᛫ Rutgers, The State University of New Jersey ᛫ MedStar Health Research Institute ᛫ Loyola University Chicago ᛫ Loyola University Medical Center ᛫
                                                              University of Delaware ᛫ Children's Hospital of Philadelphia

    https://ncats.nih.gov/n3c/resources/data-contribution/data-transfer-agreement-signatories
N3C Enclave Data Stats

(Figures: Enclave data statistics; callout: Pediatric cases)
The National COVID Cohort Collaborative: Clinical Characterization and Early Severity Prediction

Predicting Clinical Severity using machine learning (64 input variables)

The most powerful predictors are patient age and widely available vital sign and laboratory values.
https://pubmed.ncbi.nlm.nih.gov/33469592/
How does data get into N3C?

      ● We have gone through the high-level purpose – EHR data about COVID-19
        patients
      ● Identified the contributing sites
      ● Know what the inclusion criteria for N3C are – documented COVID-19 testing
      ● Seen the dashboard overview of N3C and the overall cohort characteristics

      ● What are the data ingestion, harmonization, query, and publication processes?

      ● Data governance and security?

      ● And finally, what about cancer and COVID-19?
Leveraging Common Data Models

  ● These four data models are commonly used by
    academic medical centers throughout the US.
  ● CDMs are used to store EHR data in a
    consistent way.
  ● Sites participating in N3C may send data in one
    of these four formats—the idea is to make it
    as convenient as possible for sites to submit.
  ● Common data models also allow us to write a
    consistent computable phenotype that can be
    run with few local changes at sites with one or
    more of these data models.
Harmonization of N3C Data
Data Availability vs Utility

  ● Collections of data are not always useful, even if they are available
  ● Consistently classified data is always more useful
FAIR: Findable, Accessible, Interoperable, Reusable
     What does Interoperable mean with respect to data? Harmonized!
     Syntactic Interoperability (harmonization)
       ● One can make sense of the structure
       ● Metaphor: sentence has good grammar
       ● Domain of the data standards and data model communities
     Semantic interoperability (harmonization)
       ● One can make sense of the meaning
       ● Metaphor: the words are understandable
       ● Domain of the vocabulary, ontology, classification communities
N3C Data Ingestion & Harmonization Pipeline

Span manual curation of mapping resources to industrial scale production transformation

(Pipeline diagram; one component is marked "(future)")
Harmonized, not Homogenous

CDMs are built for purpose. Different CDMs emphasize and prioritize different things.

Collaborative Analytics - N3C Secure Data Enclave

Secure, reproducible, transparent, versioned, provenanced, attributed,
and shareable analytics on patient-level EHR data
Federated versus Centralized DQ

Many clinical data research networks are federated; N3C is centralized. Centralized datasets
have some advantages where data quality assessment is concerned.

(Figure: Federated Network vs. Centralized Data. With centralized data, questions are asked
directly against all sites’ data combined.)
Federated versus Centralized DQ

With federated data, sites are benchmarked against themselves.
With centralized data, sites can be benchmarked against each other.

 Site 1: "We have 43 qualifying inpatient visits."
 Site 2: "We have 27 qualifying inpatient visits."
 Site 3: "We have 806 qualifying inpatient visits."

 Site   Patient   Visit Type   Adm. Date   Disc. Date
 1      123       IP           7/4/2020    7/8/2020
 1      456       IP           5/6/2020    5/20/2020
 2      987       IP           8/2/2019    8/7/2019
 2      654       IP           9/3/2019    9/14/2019
 3      234       IP           1/26/2021   1/26/2021
 3      234       IP           1/26/2021   1/29/2021
 3      234       IP           1/26/2021   1/30/2021
 3      234       IP           1/26/2021   1/27/2021

Clearly, sites differ in how they define “a visit.”
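
Once all sites' rows sit in one table, a cross-site check like this is a simple aggregation. The sketch below is only an illustration of the idea, not an N3C data quality tool: the function name is hypothetical, and the rows are the example rows from the table above. It reports, per site, the largest number of visits recorded for one patient on one admission date.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (site, patient, visit_type, admission_date, discharge_date), copied from the slide.
VISITS: List[Tuple[int, int, str, str, str]] = [
    (1, 123, "IP", "7/4/2020", "7/8/2020"),
    (1, 456, "IP", "5/6/2020", "5/20/2020"),
    (2, 987, "IP", "8/2/2019", "8/7/2019"),
    (2, 654, "IP", "9/3/2019", "9/14/2019"),
    (3, 234, "IP", "1/26/2021", "1/26/2021"),
    (3, 234, "IP", "1/26/2021", "1/29/2021"),
    (3, 234, "IP", "1/26/2021", "1/30/2021"),
    (3, 234, "IP", "1/26/2021", "1/27/2021"),
]


def max_same_day_admissions(visits) -> Dict[int, int]:
    """Per site: the most visits any one patient has on a single admission date."""
    per_patient_day = Counter(
        (site, patient, adm) for site, patient, _vtype, adm, _disc in visits
    )
    worst: Dict[int, int] = {}
    for (site, _patient, _adm), n in per_patient_day.items():
        worst[site] = max(worst.get(site, 0), n)
    return worst


print(max_same_day_admissions(VISITS))
```

On the example rows, sites 1 and 2 never exceed one visit per patient per admission date, while site 3 reaches four, which is the kind of outlier a centralized comparison surfaces.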
N3C’s DQ Process

  How Would N3C Deal with This Finding?
   ● Discover and discuss at weekly DQ meetings.
   ● Determine: Is this an issue…
       ○ For the site to fix?
       ○ For us to handle on our end?
   ● Reach out to the site to get more information.
       ○ What if they can’t fix it?

     We can write an algorithm to make this
     site’s visits look more like the other sites:

     if:
      ● the visit type is inpatient
      ● and there are > 1 per patient per day
     then:
      ● merge into a single “macro” visit

  Site   Patient   Visit Type   Adm. Date   Disc. Date
  1      123       IP           7/4/2020    7/8/2020
  1      456       IP           5/6/2020    5/20/2020
  2      987       IP           8/2/2019    8/7/2019
  2      654       IP           9/3/2019    9/14/2019
  3      234       IP           1/26/2021   1/26/2021
  3      234       IP           1/26/2021   1/29/2021
  3      234       IP           1/26/2021   1/30/2021
  3      234       IP           1/26/2021   1/27/2021
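
The merge rule above can be sketched as follows. This is a minimal sketch under stated assumptions, not N3C's production code: "per day" is read as "same admission date", the merged visit keeps the latest discharge date, and the `Visit` type and `merge_macro_visits` name are hypothetical.

```python
from datetime import date
from itertools import groupby
from typing import List, NamedTuple


class Visit(NamedTuple):
    site: int
    patient: int
    visit_type: str
    adm: date
    disc: date


def merge_macro_visits(visits: List[Visit]) -> List[Visit]:
    """Collapse multiple inpatient visits per patient per admission date into one."""
    def key(v: Visit):
        return (v.site, v.patient, v.visit_type, v.adm)

    merged: List[Visit] = []
    # groupby needs sorted input so that equal keys are adjacent.
    for (site, patient, vtype, adm), group in groupby(sorted(visits, key=key), key=key):
        rows = list(group)
        if vtype == "IP" and len(rows) > 1:
            # Assumption: the "macro" visit spans to the latest discharge date.
            merged.append(Visit(site, patient, vtype, adm, max(r.disc for r in rows)))
        else:
            merged.extend(rows)
    return merged


site3 = [
    Visit(3, 234, "IP", date(2021, 1, 26), date(2021, 1, 26)),
    Visit(3, 234, "IP", date(2021, 1, 26), date(2021, 1, 29)),
    Visit(3, 234, "IP", date(2021, 1, 26), date(2021, 1, 30)),
    Visit(3, 234, "IP", date(2021, 1, 26), date(2021, 1, 27)),
]
print(merge_macro_visits(site3))
```

Applied to the four site 3 rows, this yields a single visit admitted 1/26/2021 and discharged 1/30/2021. The function returns a new list and leaves its input untouched, so the original rows remain available.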
N3C’s DQ Process

Original Table

     Site   Patient   Visit Type   Adm. Date   Disc. Date
     1      123       IP           7/4/2020    7/8/2020
     1      456       IP           5/6/2020    5/20/2020
     2      987       IP           8/2/2019    8/7/2019
     2      654       IP           9/3/2019    9/14/2019
     3      234       IP           1/26/2021   1/26/2021
     3      234       IP           1/26/2021   1/29/2021
     3      234       IP           1/26/2021   1/30/2021
     3      234       IP           1/26/2021   1/27/2021

Ready for Analysis (after DQ fix)

     Site   Patient   Visit Type   Adm. Date   Disc. Date
     1      123       IP           7/4/2020    7/8/2020
     1      456       IP           5/6/2020    5/20/2020
     2      987       IP           8/2/2019    8/7/2019
     2      654       IP           9/3/2019    9/14/2019
     3      234       IP           1/26/2021   1/30/2021

Takeaways
 ● Centralized DQ processes allow us to fully realize the potential of N3C’s large sample size.
 ● All transformations are fully logged and always completely reversible if needed.
N3C Data Ingestion & Harmonization Pipeline
Harmonizing numeric data

       ● Problem: Different sites provide their data in different units

       ● Solution: Harmonize each to a standard unit
                Kilograms = Pounds / 2.20462
                Kilograms = Ounces / 35.274
                Kilograms = Grams / 1000
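
The three conversions above fit naturally into a lookup table. The sketch below is illustrative only: the unit labels (`"lb"`, `"oz"`, `"g"`, `"kg"`) and the function name are assumptions for this example, while the divisors are the ones shown on the slide.

```python
# Divisors from the slide: kilograms = value / divisor.
DIVISOR_TO_KG = {
    "kg": 1.0,
    "lb": 2.20462,
    "oz": 35.274,
    "g": 1000.0,
}


def to_kilograms(value: float, unit: str) -> float:
    """Convert a weight in a known unit to the standard unit (kilograms)."""
    try:
        return value / DIVISOR_TO_KG[unit]
    except KeyError:
        # Unknown units are surfaced rather than guessed at.
        raise ValueError(f"no known conversion for unit {unit!r}") from None


print(to_kilograms(2500, "g"))
```

Raising on an unrecognized unit keeps values with missing or unexpected units out of the harmonized data until they are handled explicitly.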
Harmonizing numeric data

       ● Problem: Some units are missing

       ● Solution 1: Contact the source

       ● Solution 2: N3C inference engine
                Kilograms = x / 2.20462 ?
                Kilograms = x / 35.274 ?
                Kilograms = x / 1000 ?
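
The slides do not describe how the N3C inference engine works, so the following is only one plausible way to try each candidate conversion in turn, not the actual method. It assumes a batch of unitless adult body weights from one source, converts the batch median under each candidate unit, and keeps the unit that lands closest to a reference median. `REFERENCE_MEDIAN_KG`, `MAX_LOG_DISTANCE`, the unit labels, and the function name are all hypothetical choices made for this sketch.

```python
import math
from statistics import median
from typing import Optional, Sequence

# Divisors from the slide: kilograms = value / divisor.
DIVISOR_TO_KG = {"kg": 1.0, "lb": 2.20462, "oz": 35.274, "g": 1000.0}

REFERENCE_MEDIAN_KG = 80.0       # hypothetical reference for adult body weight
MAX_LOG_DISTANCE = math.log(2)   # hypothetical tolerance: within a factor of 2


def infer_unit(values: Sequence[float]) -> Optional[str]:
    """Guess the unit of a batch of weights; return None if nothing is plausible."""
    if not values:
        return None
    batch_median = median(values)
    if batch_median <= 0:
        return None
    best_unit, best_distance = None, math.inf
    for unit, divisor in DIVISOR_TO_KG.items():
        # Distance on a log scale, so "twice too big" and "half too small" weigh the same.
        distance = abs(math.log(batch_median / divisor / REFERENCE_MEDIAN_KG))
        if distance < best_distance:
            best_unit, best_distance = unit, distance
    return best_unit if best_distance <= MAX_LOG_DISTANCE else None


print(infer_unit([150, 180, 200, 165]))
```

Working on a batch rather than a single value matters here: one number such as 150 is believable as either pounds or kilograms, whereas the distribution of many values from the same source is far less ambiguous.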
Harmonization progress

    ● Harmonized measurements
       ○ By original unit
       ○ Across many sites

Humans measured in grams do not look the same as humans measured in kilograms!

Homogeneity after harmonization
Unit harmonization progress

       ● ~2x increase in usable data from our harmonization procedures

       (Figure legend)
         Canonical unit
         Uses a known conversion
         Unit not plausible
         Missing unit inferred
         Unit still missing

       We can rescue a lot of data!
N3C Data Ingestion & Harmonization Pipeline
Long-COVID phenotypes are myriad
Patient-reported and researcher-measured phenotypes are starkly different

(Venn diagram counts: 40, 141, 7)

Map literature and patient-reported terms to HPO

Pharyngalgia = Sore throat
Plain-language medical vocabulary for precision diagnosis. Nat Genet. 2018 50:474-476.
N3C Harmonization Takeaways

      What N3C has revealed most in terms of needs:

         ● Interoperability - we need syntactic and semantic!
             ○ FHIR ⇒ OMOP (syntactic)
             ○ Common vocabulary/codeset mapping provenance and management (semantic)
         ● Approach data harmonization from an end-to-end data life cycle perspective
         ● Leverage USCDI, but build for interoperable semantic modeling and extensions
Governing N3C Data
N3C: Unique Data Use and Privacy

Goal of the Data Use Agreement is Privacy Protection to Promote broad access:

 ●   COVID-Related research only
 ●   NIH housed secure repository
 ●   No re-identification of individuals or data source
 ●   No download or capture of raw data
 ●   Open platform to all researchers
 ●   Investigator activities are recorded and can be audited for security and reproducibility
N3C: Governance and Access
Data Levels to Access
Data Use and Privacy

The goal of the Data Use Agreement is privacy protection to promote broad access:
 ● COVID-Related research only
 ● No re-identification of individuals or data source
 ● No download or capture of raw data
 ● Open platform to all researchers
 ● Security: Activities in the N3C Data Enclave are recorded and can be audited
 ● Disclosure of research results to the N3C Data Enclave for the public good
 ● Analytics provenance
 ● Contributor Attribution tracking
N3C Provenance, Transparency, Attribution & Rapid Sharing
                     N3C Attribution and Publication Principles
    ●     Transparent and collaborative environment where all contributions are acknowledged
    ●     Provenance and reproducibility
    ●     Promptly sharing research results with N3C users
    ●     Publish in high-impact journals
    ●     Attribution for all N3C artifacts

          Researchers, projects, and
          artifacts are all linked
          together in the enclave
          using the Contributor
          Attribution Model (CAM).
                                       N3C Data Access: Process

                                       ● Data Use Agreement
                                       ● Data Use Request
                                       ● HSP / Security Training

                                       https://ncats.nih.gov/n3c/about/applying-for-access
                          Realizing Team Science
N3C Team Science within & across institutions

CTSAs
Key functions can nucleate projects:
● Education & training
● Biostatistics
● Study design
● Evaluation
● Informatics
● Clinical expertise
● Innovation & commercialization
● Community & partnerships

N3C Domain Team Expertise:
● Enclave technology
● Data model (OMOP)
● Terminologies
● Data quality
● Codesets, variables, phenotype
● Using/parsing N3C data
● Workflows, methods, algorithms

Roles
Ingredients (Methods, datasets, instruments)
Scientific questions

https://covid.cd2h.org/domain-teams
OUTCOMES OF COVID-19 IN
CANCER PATIENTS: REPORT
FROM THE NATIONAL COVID
COHORT COLLABORATIVE
(N3C)
Noha Sharafeldin, Benjamin Bates, Qianqian Song, Vithal Madhira, Yao
Yan, Sharlene Dong, Eileen Lee, Nathaniel Kuhrt, Yu Raymond Shao,
Feifan Liu, Timothy Bergquist, Justin Guinney, Jing Su, Umit Topaloglu
on behalf of the N3C Consortium

Given on June 4, 2021
https://covid.cd2h.org/   cd2h.slack.com                @data2health

N3C Oncology Domain Team (ODT)
                                                             Leadership

                                  Umit Topaloglu, PhD, Wake Forest University
                                  Noha Sharafeldin, MD, PhD, The University of Alabama at Birmingham
                                  Benjamin Bates, MD, Rutgers University

                                                              https://covid.cd2h.org/oncology
                                                               Slack channel: #n3c-tt-oncology

   Noha Sharafeldin, MBBCh, PhD

N3C ODT Expertise
 Informatics          Biostatistics           Clinical           Epidemiology                        N3C data and Logic

  Umit Topaloglu         Jing Su              Noha Sharafeldin       Benjamin Bates         Justin Guinney      Vithal Madhira      Tim Bergquist

   Feifan Liu          Qianqian Song   Yu Raymond Shao       Nate Kuhrt     Sharlene Dong             Yao Yan                    Eileen Lee

                                        N3C Oncology

                http://ascopubs.org/doi/full/10.1200/JCO.21.01074

N3C Cancer Cohort
      Primary Diagnosis


N3C Cancer Cohort

 Primary Outcome
 • All-cause mortality

 Secondary Outcomes
 (Clinical severity indicators
 requiring hospitalization)
 • Mechanical Ventilation


Demographic, clinical, and tumor characteristics
   COVID-19 Positive

[Figure: four pie charts; slice values are listed without category assignments]
 ● Age: 18-29, 30-49, 50-64, 65+ (values shown: 2%, 13%, 54%, 31%)
 ● Sex: Female, Male (values shown: 49%, 51%)
 ● Race: Hispanic, Non-Hispanic Black, Non-Hispanic White, Other or Unknown (values shown: 4%, 22%, 13%, 61%)
 ● Geographical Location: US-Northeast, US-Midwest, US-South, US-West, Unknown (values shown: 11%, 22%, 5%, 34%, 28%)

Demographic, clinical, and tumor characteristics
   COVID-19 Positive

[Figure: one pie chart and one bar chart]
 ● Smoking status: Non-smoker, Current or Former smoker (values shown: 14%, 86%)
 ● Adjusted CCI (bar chart, patient counts on a 0-18000 axis): 0: 41%, 1: 16%, 2: 9%, 3: 6%, ≥4: 28%

Demographic, clinical, and tumor characteristics
   COVID-19 Positive

[Figure: one pie chart and one bar chart of type of primary malignancy]
 ● Pie chart categories: Solid, Liquid, Multi-Site, Unknown, Undefined Primary (values shown: 3%, 3%, 11%, 12%, 71%)
 ● Bar chart (patient counts on a 0-7000 axis):
     Skin cancers               15%
     Breast cancer              14%
     Prostate cancer            12%
     Hematological cancers      12%
     Gastrointestinal cancers    9%
     Multi-site                 11%

COVID-19 Treatment

        COVID-19 Treatment (Yes)      COVID positive (n=38,614)
        Systemic antibiotics          4032 (15.75%)
        Systemic steroids             3514 (13.73%)
        Azithromycin                  1197 (4.68%)
        Remdesivir                    1047 (4.09%)
        Dexamethasone                 1029 (4.02%)
        Hydroxychloroquine (HCQ)       364 (1.42%)


Death and invasive ventilation in hospitalized patients

          Outcome                   COVID positive   COVID negative
                                      (n=19,515)      (n=184,988)
             Death                  2,894 (14.8%)    23,207 (12.5%)
             Invasive Ventilation    1,606 (8.2%)     9,576 (5.2%)
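
The percentages in the table above can be reproduced from the raw counts, and the same counts give a crude (unadjusted) risk ratio. The sketch below is only a worked arithmetic check; the function names are invented for this example, and the crude ratio is not the adjusted hazard ratio reported on the survival slides.

```python
def proportion(events, total):
    """Fraction of patients in a group who had the event."""
    return events / total


def crude_risk_ratio(events_a, total_a, events_b, total_b):
    """Unadjusted ratio of the event proportion in group A to that in group B."""
    return proportion(events_a, total_a) / proportion(events_b, total_b)


# Counts copied from the table above (hospitalized patients).
N_POS, N_NEG = 19_515, 184_988
OUTCOMES = {
    "Death": (2_894, 23_207),
    "Invasive ventilation": (1_606, 9_576),
}

for name, (pos, neg) in OUTCOMES.items():
    rr = crude_risk_ratio(pos, N_POS, neg, N_NEG)
    print(
        f"{name}: {proportion(pos, N_POS):.1%} vs "
        f"{proportion(neg, N_NEG):.1%}, crude risk ratio {rr:.2f}"
    )
```

Because this ratio ignores follow-up time and covariates such as age and comorbidity, it is a sanity check on the table rather than an effect estimate.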


Survival Probability –
by COVID status

HR = 1.20 (95%CI: 1.15 – 1.24, p

Survival Probability by
cancer type among
COVID positive patients


Hazard ratios associated with 1-year all-cause
mortality among COVID-positive patients


Limitations

• RWD Challenges (e.g. data missingness)
• Limited capture of recent cancer therapy
• Potential misclassification of cancer patients
• Challenges in primary cancer diagnosis mapping and limited
  historical data
• Method for construction of COVID-19 negative control


Conclusions

• N3C represents a unique resource to examine effects of COVID-19 on cancer outcomes
• Largest COVID-19 and cancer cohort within the US
• Consistent with previous literature, older age, male gender, increasing comorbidities,
  and hematological malignancies were associated with higher mortality in patients with
  cancer and COVID-19
• The N3C dataset confirmed that cancer patients with COVID-19 who received recent
  immuno- or targeted therapies were not at higher risk of overall mortality


Acknowledgements

   The Patients
   US Data Partners
   N3C Consortial Authors
   Christopher Chute
   Melissa Haendel
   Amit Mitra
   Ramakanth Kavuluru
   N3C Core Teams

   NCATS U24 TR002306
   NIGMS 5U54GM104942-04
   NCI P30CA012197 [UT, QS]
   LLS 3386-19 [NS]
   Indiana University Precision Health Initiative [JS]

Acknowledgements
We gratefully acknowledge contributions from the following N3C core teams:
• Principal Investigators: Melissa A. Haendel*, Christopher G. Chute*, Kenneth R. Gersing, Anita Walden
• Workstream, subgroup and administrative leaders: Melissa A. Haendel*, Tellen D. Bennett, Christopher G. Chute, David A. Eichmann, Justin Guinney, Warren A. Kibbe, Hongfang Liu, Philip R.O. Payne, Emily R. Pfaff, Peter N. Robinson, Joel H. Saltz, Heidi Spratt, Justin Starren, Christine Suver, Adam B. Wilcox, Andrew E. Williams, Chunlei Wu
• Key liaisons at data partner sites
• Regulatory staff at data partner sites
• Individuals at the sites who are responsible for creating the datasets and submitting data to N3C
• Data Ingest and Harmonization Team: Christopher G. Chute*, Emily R. Pfaff*, Davera Gabriel, Stephanie S. Hong, Kristin Kostka, Harold P. Lehmann, Richard A. Moffitt, Michele Morris, Matvey B. Palchuk, Xiaohan Tanner Zhang, Richard L. Zhu
• Phenotype Team (Individuals who create the scripts that the sites use to submit their data, based on the COVID and Long COVID definitions): Emily R. Pfaff*, Benjamin Amor, Mark M. Bissell, Marshall Clark, Andrew T. Girvin, Stephanie S. Hong, Kristin Kostka, Adam M. Lee, Robert T. Miller, Michele Morris, Matvey B. Palchuk, Kellie M. Walters
• Project Management and Operations Team: Anita Walden*, Yooree Chae, Connor Cook, Alexandra Dest, Racquel R. Dietz, Thomas Dillon, Patricia A. Francis, Rafael Fuentes, Alexis Graves, Julie A. McMurry, Andrew J. Neumann, Shawn T. O'Neil, Andréa M. Volz, Elizabeth Zampino
• Partners from NIH and other federal agencies: Christopher P. Austin*, Kenneth R. Gersing*, Samuel Bozzette, Mariam Deacy, Nicole Garbarini, Michael G. Kurilla, Sam G. Michael, Joni L. Rutter, Meredith Temple-O'Connor
• Analytics Team (Individuals who build the Enclave infrastructure, help create codesets, variables, and help Domain Teams and project teams with their datasets): Benjamin Amor*, Mark M. Bissell, Katie Rebecca Bradwell, Andrew T. Girvin, Amin Manna, Nabeel Qureshi
• Publication Committee Management Team: Mary Morrison Saltz*, Christine Suver*, Christopher G. Chute, Melissa A. Haendel, Julie A. McMurry, Andréa M. Volz, Anita Walden
• Publication Committee Review Team: Carolyn Bramante, Jeremy Richard Harper, Wenndy Hernandez, Farrukh M Koraishy, Federico Mariona, Saidulu Mattapally, Amit Saha, Satyanarayana Vedula

N3C Registration/Training
https://covid.cd2h.org/tutorials

                                                             Registration for Documents,
                                                             Meetings & the N3C Data Enclave

                                                             Requires Authentication

                                                             Enclave Checklist

                                                            Training Office Hours:
                                                            Tuesdays & Thursdays at 10-11 am PT/1-2 pm ET
                                                            Registration Required at this link

                                                            Orientation Video Coming Soon

                                                            Additional Training Tutorials available in the Enclave
Step 4. Federated Analytics with HPC
                                                   Takeaways

               ● N3C comprises the largest, most representative patient-level COVID-19
                 cohort in the US and continues to grow

               ● We CAN do transparent, reproducible, innovative science (including ML)
                 on sensitive observational data at scale, together!

               ● N3C is an innovative partnership between clinical sites, CDM
                 communities, NIH ICs, CD2H, and commercial partners

               ● Automation of data extraction and minimum requirements reduces
                 burden and increases site participation

               ● Robust attribution of all contributors; also provides great venue for
                 trainees

               ● N3C data is complicated, but there are many people and resources to
                 help users do good science
                                        How to Get Involved with N3C

Register with N3C: https://labs.cd2h.org/registration/

Joining Workstreams:
       N3C Data Ingestion & Harmonization Workstream
       Slack Channel Harmonization
       Google Group Harmonization

       N3C Phenotype & Data Acquisition Workstream
       Slack Channel Phenotype
       Google Group Phenotype

       N3C Collaborative Analytics Workstream
       Slack Channel Analytics
       Google Group Analytics

       N3C Data Partnership & Governance Workstream
       Slack Channel Governance
       Google Group Governance

       N3C Synthetic Clinical Data Workstream
       Slack Channel Synthetic
       Google Group Synthetic

       N3C Implementation Workstream - Coming soon

NCATS N3C Webpage | N3C Website

Additional Information:
Onboarding N3C, Slack, Google | Finding and Joining a Google Group
https://academic.oup.com/jamia/advance-article/doi/10.1093/jamia/ocaa196/5893482

Melissa A. Haendel,1,4,7,8,10,13,14,52,78,101 Christopher G. Chute,1,4,8,10,13,14,52,78,100,101 Tellen D. Bennett,9,10,13,14,52,100,101 David A. Eichmann,4,9,10,13,78,101 Justin
Guinney,4,9,10,14,78,101 Warren A. Kibbe,9,10,52,78,101 Philip R.O. Payne,4,9,10,78,101 Emily R. Pfaff,9,10,13,15,52,78 Peter N. Robinson,4,9,10,15,52,78,100 Joel H.
Saltz,10,13,14,15,52,78,101 Heidi Spratt,9,10,100 Christine Suver,10,78,101 John Wilbanks,10,78,101 Adam B. Wilcox,10,101 Andrew E. Williams,10,13,78 Chunlei Wu,9,13,14,78
Clair Blacketer,15,52 Robert L. Bradford,9,52 James J. Cimino,10,14,101 Marshall Clark,9,15,52 Evan W. Colmenares,9,15,52 Patricia A. Francis,78 Davera
Gabriel,9,10,13,14,15,52 Alexis Graves,7,9,78 Raju Hemadri,9,15,52 Stephanie S. Hong,9,15,52 George Hripscak,10,52 Dazhi Jiao,9,15,52 Jeffrey G. Klann,14,52,101 Kristin
Kostka,9,15,52 Adam M. Lee,9,15,52 Harold P. Lehmann,9,15,52 Lora Lingrey,9,15,52 Robert T. Miller,9,15,52 Michele Morris,9,15,52 Shawn N. Murphy,9,15,52 Karthik
Natarajan,9,15,52 Matvey B. Palchuk,9,15,52 Usman Sheikh,9,78 Harold Solbrig,9,15,52 Shyam Visweswaran,10,15,52,101 Anita Walden,7,10,13,14,52,101 Kellie M.
Walters,10,14,101 Griffin M. Weber,10,101 Xiaohan Tanner Zhang,9,15,52 Richard L. Zhu,9,15,52 Benjamin Amor,78 Andrew T. Girvin,15,78 Amin Manna,78 Nabeel
Qureshi,15,78 Michael G. Kurilla,10,78 Sam G. Michael,10,78 Lili M. Portilla,101 Joni L. Rutter,1,101 Christopher P. Austin,101 Ken R. Gersing,78,101
Shaymaa Al-Shukri,4,15 Adil Alaoui,101 Ahmad Baghal,15 Pamela D. Banning,15,100 Edward M. Barbour,8,15 Michael J. Becich,15,52,101 Afshin Beheshti,14 Gordon R. Bernard,8,15 Sharmodeep Bhattacharyya,100 Mark
M. Bissell,9,15 L. Ebony Boulware,14,100 Samuel Bozzette,100,101 Donald E. Brown,101 John B. Buse,14 Brian J. Bush,8,101 Tiffany J. Callahan,14,52 Thomas R. Campion,8,15 Elena Casiraghi,9,15 Ammar A.
Chaudhry,13,14 Guanhua Chen,9 Anjun Chen,13 Gari D. Clifford,8,15 Megan P. Coffee,14,100 Tom Conlin,14 Connor Cook,7,78 Keith A. Crandall,9,14,101 Mariam Deacy,78 Racquel R. Dietz,78 Nicholas J. Dobbins,8,9
Peter L. Elkin,15,52,100 Peter J. Embi,52,101 Julio C. Facelli,8,15 Karamarie Fecho,13 Xue Feng,9 Randi E. Foraker,8,13,15 Tamas S. Gal,8,15 Linqiang Ge,14 George Golovko,15,101 Ramkiran Gouripeddi,14,15 Casey S.
Greene,13,14 Sangeeta Gupta,52,101 Ashish Gupta,13,101 Janos G. Hajagos,9,15 David A. Hanauer,15,52 Jeremy Richard Harper,9,14,52 Nomi L. Harris,14 Paul A. Harris,101 Mehadi R. Hassan,9 Yongqun He,15,52,100
Elaine L. Hill,9,14 Maureen E. Hoatlin,14 Kristi L. Holmes,4,101 LaRon Hughes,14 Randeep S. Jawa,14 Guoqian Jiang,14 Xia Jing,7,14 Marcin P. Joachimiak,8,15 Steven G. Johnson,9,14,101 Rishikesan
Kamaleswaran,9,15,78 Thomas George Kannampallil,15,101 Andrew S. Kanter,15,52 Ramakanth Kavuluru,9,13,14 Kamil Khanipov,8,14 Hadi Kharrazi,9,14 Dongkyu Kim,15,52 Boyd M. Knosp,8,15 Arunkumar Krishnan,9
Tahsin Kurc,9,15 Albert M. Lai,101 Christophe G. Lambert,52,101 Michael Larionov,14 Stephen B. Lee,1,14 Michael D. Lesh,9 Olivier Lichtarge,14 John Liu,9 Sijia Liu,8,9,101 Hongfang Liu,9,15 Johanna J. Loomba,1,15,78,101
Sandeep K. Mallipattu,9,14,15 Chaitanya K. Mamillapalli,14 Christopher E. Mason,15 Jomol P. Mathew,8,15,52 James C. McClay,101 Julie A. McMurry,1,4,7,9,13,14,78 Paras P. Mehta,14 Ofer Mendelevitch,9 Stephane
Meystre,8,14,15 Richard A. Moffitt,9,13,15 Jason H. Moore,8,9 Hiroki Morizono,13,14,15,52 Christopher J. Mungall,15,52 Monica C. Munoz-Torres,7,10,78 Andrew J. Neumann,78 Xia Ning,14 Jennifer E. Nyland,13,14 Lisa
O'Keefe,78 Anna O'Malley,78 Shawn T. O'Neil,78 Jihad S. Obeid,10,14,15 Elizabeth L. Ogburn,13 Jimmy Phuong,9,15,52,100,101 Jose D Posada,8,15 Prateek Prasanna,14,52 Fred Prior,9,14,15 Justin Prosser,9,78 Amanda
Lienau Purnell,101 Ali Rahnavard,9,52 Harish Ramadas,9,52,78 Justin T. Reese,9,10 Jennifer L. Robinson,14,100 Daniel L. Rubin,101 Cody D. Rutherford,9,101 Eugene M. Sadhu,8,15 Amit Saha,9 Mary Morrison
Saltz,15,52,101 Thomas Schaffter,78 Titus KL Schleyer,14 Soko Setoguchi,8,14,15 Nigam H. Shah,8,14 Noha Sharafeldin,14 Evan Sholle,15,52 Jonathan C. Silverstein,15,52,101 Anthony Solomonides,101 Julian Solway,14,101
Jing Su,101 Vignesh Subbian,9,52,101 Hyo Jung Tak,15 Bradley W. Taylor,9,14 Anne E. Thessen,14,101 Jason A. Thomas,15 Umit Topaloglu,15,52 Deepak R. Unni,8,9,15,52 Joshua T. Vogelstein,14 Andréa M. Volz,7 David
A. Williams,14,15 Kelli M. Wilson,9,78 Clark B. Xu,8,9,15 Hua Xu,9,10,14 Yao Yan,9,15,52 Elizabeth Zak,8,15 Lanjing Zhang,101 Chengda Zhang,14 Jingyi Zheng,14
1: CREDIT_00000001 (Conceptualization); 4: CREDIT_00000004 (Funding acquisition); 7: CRO_0000007 (Marketing and Communications); 8: CREDIT_00000008 (Resources); 9: CREDIT_00000009 (Software role); 10: CREDIT_00000010 (Supervision role); 13: CREDIT_00000013 (Original draft); 14: CREDIT_00000014 (Review and editing); 15: CRO_0000015 (Data role); 52: CRO_0000052 (Standards role); 78: CRO_0000078 (Infrastructure role); 100: Clinical Use Cases; 101: Governance
Questions or Comments?

           Thank you!