SMRT Analysis in Q4 2018 SMRT Informatics Developers Conference, Leiden - James Drake - Pacific Biosciences

SMRT Analysis in Q4

2018 SMRT Informatics Developers Conference, Leiden – James Drake
For Research Use Only. Not for use in diagnostics procedures. © Copyright 2018 by Pacific Biosciences of California, Inc. All rights reserved.
SMRT Link
SMRT LINK – KEY APPLICATIONS

Iso-Seq Analysis
          https://github.com/PacificBiosciences/IsoSeq3
SMRT LINK – KEY APPLICATION UPDATES

Structural Variation
- 85% sensitivity at only 10-fold coverage
- Multiple-sample support (joint calling)
- Insertions/deletions (down to 50 bp)
- Base-level edge resolution
- Tandem-repeat aware
- New SV types
   - Translocations
   - Inversions
- Indels (20-49 bp)
(Figure: schematic of deletion, insertion, inversion, and translocation events.)
SMRT LINK USABILITY – DATASETS
SMRT LINK USABILITY – BARCODED DATA
SMRT LINK – SECURITY

- Why? PacBio is maturing beyond research towards clinical use
- Communication with SMRT Link servers is now encrypted via HTTPS
- Impact: API hardening; access now requires authentication, even with the
  instrument
- The shipped command-line interface will support this out of the box
SMRT Tools
SMRT TOOLS – DOCUMENTATION
  https://www.pacb.com/support/software-downloads/
SMRT TOOLS – CONSENSUS

Concept
  Approaches to fixing noisy reads by piling them up

Algorithm (model)
  - Quiver (machine learning, CRF): version 1, built during PacBio RS II
  - Arrow (HMM): version 2, built during Sequel
  - Plurality
  - Partial Order
  - Many more …

Implementation
  - Single-molecule
      - CCS v1 – Quiver-based
      - CCS v2 – Arrow-based
  - Multi-molecule
      - Genomic Consensus, aka variantCaller (employs Quiver or Arrow)
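
To make the "piling up" idea concrete, here is a minimal sketch of the simplest approach named on this slide, plurality voting. It assumes the reads have already been aligned to a common coordinate system and padded with "-" for gaps; the function name and the toy reads are illustrative only. It is not how Quiver or Arrow work, since those use statistical models rather than a per-column vote.

```python
from collections import Counter


def plurality_consensus(aligned_reads, gap="-"):
    """Return the most common symbol in each column of a pile-up.

    Columns whose winning symbol is the gap character are dropped,
    i.e. the consensus calls a deletion there.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to equal length")

    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


# Toy pile-up: each read carries at most one error.
reads = [
    "ACGTTAGC",
    "ACGT-AGC",  # deletion
    "ACCTTAGC",  # substitution
    "ACGTTAGC",
    "ACGTTGGC",  # substitution
]
print(plurality_consensus(reads))
```

Because the errors fall in different columns, every column is won by the correct base and the vote recovers the error-free sequence.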
SMRT TOOLS – SINGLE-MOLECULE CONSENSUS (CCS)

- Linear increase in QV with each additional pass
- Works on a range of insert sizes
- Longer movies enable longer inserts
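
As a reminder of what a linear gain in QV means, the sketch below converts between QV and error probability. It assumes QV here is the usual Phred-scaled value, QV = -10 * log10(p_error); the function names are illustrative.

```python
import math


def qv_to_error_prob(qv):
    """Phred-scaled quality value -> probability that a base is wrong."""
    return 10 ** (-qv / 10)


def error_prob_to_qv(p_error):
    """Probability that a base is wrong -> Phred-scaled quality value."""
    return -10 * math.log10(p_error)


for qv in (10, 20, 30, 40):
    p = qv_to_error_prob(qv)
    print(f"QV {qv}: error probability {p:.4g}, accuracy {100 * (1 - p):.4g}%")
```

Each additional 10 QV points divides the error probability by ten, so a linear increase in QV per pass corresponds to an exponential drop in the per-base error rate.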
SMRT TOOLS – CONSENSUS

Residual Errors – Haploid Homopolymer Regions
- Homopolymer stretches of 4 bp or more can be challenging
- Indels are degenerate in these stretches
      AAAA → AAA (deletion)           AAAA → AAAAA (insertion)
        AAAA → - AAA (?)                 AAAA → AAAAA (?)
        AAAA → A - AA (?)                AAAA → AAAAA (?)
        AAAA → AA - A (?)                AAAA → AAAAA (?)
        AAAA → AAA - (?)                 AAAA → AAAAA (?)
- Higher coverage can (sometimes) resolve these homopolymers in Arrow
- Correct answer is in the data
- Solution space lies outside of the current analysis approach
- New analysis (prototype) is showing excellent results!
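
The degeneracy of indels in a homopolymer can be checked directly. The sketch below enumerates every single-base deletion and insertion position; the helper names are illustrative and not part of any PacBio tool.

```python
def single_deletions(seq):
    """All distinct sequences obtained by deleting exactly one base."""
    return {seq[:i] + seq[i + 1:] for i in range(len(seq))}


def single_insertions(seq, base):
    """All distinct sequences obtained by inserting `base` at any position."""
    return {seq[:i] + base + seq[i:] for i in range(len(seq) + 1)}


# In a homopolymer, every placement of the indel gives the same sequence,
# so the position of the event cannot be determined from the sequence alone.
print(single_deletions("AAAA"))
print(single_insertions("AAAA", "A"))

# In a non-repetitive sequence, each deletion position is distinguishable.
print(sorted(single_deletions("ACGT")))
```

For AAAA, both sets contain a single element, while ACGT gives four distinct deletion products. This is why the alternative placements above are marked with (?).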
SMRT TOOLS – CONSENSUS

Residual Errors – Diploid Regions
- Spurious consensus deletions at heterozygous locations
- Arrow algorithm designed initially for haploid calling
- Substitution is a foreign concept
- Experimental diploid mode now available
PBBAMIFY – CONSENSUS WITHOUT BLASR

- Sometimes, you can’t use BLASR
    - Reference too large (>4Gb)
    - Impatient
    - Can’t build/install

  Workflow:
    third-party aligner → standard aligned BAM
    standard aligned BAM + PacBio BAM → pbbamify → PacBio aligned BAM
    PacBio aligned BAM → multi-molecule consensus
BINARY RELEASE – FALCON/UNZIP

         Search for “Falcon Assembler Documentation”
Community Releases
A BRIEF HISTORY

- Challenge: making software for N choose M systems … portably
- Outdated OS versions, even older toolchains
- Modular Build System (MOBS)
PITCHFORK

- Driven by internal developer needs
- What is it, exactly?
  - Abstract meta-builder
  - 500 lines of makefile syntax … actually closer to 5000
  - Poor man's package manager
- Became the de facto way of getting the latest tools
- Sizeable technical debt, most issues stemming from … more custom
  environments
- Make syntax is neither the friendliest nor the most portable
A BETTER ALTERNATIVE
COMPARING THE EXPERIENCE
       Pitchfork           Bioconda
ACKNOWLEDGEMENTS

    TAG Team         SMRT Link Team     Marketing
    Armin Töpfer     Aaron Klammer      Tzvetana Kerelska
    Michael Brown    Nathaniel Echols   Mary Budagyan
    Yuan Li          Michael Kocher     Aaron Wenger
    Chris Dunn       Paul Fernhaut      Liz Tseng
    Derek Barnett    Michael Cantor     Sarah Kinghan
    Brett Bowman     Ben Lerch          Greg Concepcion
    David Seifert
    Ivan Sovic
    Zeljko Dzakula
    Rob Grothe*
www.pacb.com
Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. FEMTO Pulse and Fragment Analyzer are trademarks of Advanced Analytical Technologies. All other trademarks are the sole property of their respective owners.