Managing data in your own lab - Ian Berry Software Developer Oxford Protein Production Facility BIOXHIT Working Group 1 Coordinator

Page created by Matthew Fischer
 
Managing data in your own lab

Ian Berry
Software Developer
Oxford Protein Production Facility
BIOXHIT Working Group 1 Coordinator
Overview
•   What is information management?
•   Types of laboratory processes
•   Benefits of a LIMS
•   What are the potential pitfalls?
•   The ideal world… and reality
•   Tools:
    –   PiMS
    –   xtalPiMS
    –   e-HTPX
    –   Data Processing and Structure Solution
• Data mining
What is information management?
• A process for storing information where it
  can be retrieved later
  – Processes include human memory and paper
    systems as well as sophisticated relational
    database systems
  – Purpose of retrieval can vary from supporting
    the next experiment to depositing data
  – Automated systems may require electronic
    information management
  – In a laboratory setting it can be considered a
    branch of bioinformatics
Types of Information Management
• Paper-based records
  – Well suited to independent research
  – Long-term archive
• Electronic Laboratory Notebooks (ELN)
  – Central repository of information
  – Electronic version of paper systems
• Laboratory Information Management Systems
  (LIMS)
  – Relational database
  – Model for laboratory processes
  – Snapshot of current state of laboratory
Types of laboratory process
• Full projects (studying one protein?)
  – Long workflow with many decision points
  – No two projects are identical
• One-off experiments (specific assays?)
  – Planned for each experiment
  – Number of inputs and outputs undetermined
• Routine experiments (purification?)
  – Performed the same way for several targets
• High-throughput experiments (cloning?)
  – Using specialised equipment including robots
  – Tracking samples becomes paramount
Benefits of LIMS
• Distributed projects
  – Information can be accessed anywhere
• Collaborative projects
  – Different people record into same store
• Miniaturized projects
  – Labelling of samples becomes impossible
• Automated projects
  – Handling layouts in plates etc.
• High-throughput processes
  – System managed by computer with automated
    sample tracking
In an ideal world…
• When depositing your new structure you
  would be able to include:
  – Exactly how you carried out the bioinformatics analysis
  – Exactly how you created the protein
     • What chemicals, sources, batch numbers, etc.
  – Exactly how this protein crystallized
     • All conditions with hits, which were used for
       collection (and why)
  – What Ligands / Soaks were used
In an ideal world…
• Exactly how the data were collected
  – Which synchrotron / home source
  – Which beamline (beamline parameters)
• Exactly how the data were processed
  – Which programs / arguments
• Plus…
  – A summary of the failed experiments to get the final
    result!
i.e. the methods section of your
  structure paper!
Back to reality…
• Good information management is
  important:
  – It means we can do better science
  – It will mean that you have more time to do the
    science as more things are recorded
    automatically*

                                  * eventually!
Potential pitfalls of LIMS
• Data loss
  – Hardware failure – manageable
  – Data corruption – potentially catastrophic
• Data integrity
  – Data need to be described properly
  – LIMS can default to being ELN
• Extra burden of recording data
  – Takes time for no immediate benefit
  – Need easy and intuitive input – risk of sloppiness
• Compliance
  – Unrecorded data are lost
  – Incomplete data may break data “chain”
Potential pitfalls of LIMS
• Different Lab practices:
  – Through the development of these tools it has
    become obvious….
     • Everyone works differently!
     • Every lab has different processes!
     • Every lab has a different focus!
  – Do we create the LIMS according to the
    processes of one lab and force another to fit
    with that or make it so generic that it does not
    model any system perfectly?
What tools are we using?
• PiMS for Protein Production
• xtalPiMS for Crystallization
• e-HTPX for Managing Synchrotron trips
• ISPyB for managing data collection at the
  Synchrotron
• CCP4, Xtrack (and others) for managing
  data processing
How do they fit together?
[Figure: diagram linking PiMS (Protein Production), Crystallization (xtalPiMS), e-HTPX, Synchrotron (ISPyB), Data Management (Xtrack) & Data Processing (CCP4), and Deposition (PDB), with Machine Integration attached at several points.]
What is PiMS?
• A software development project aiming to
  develop an easy-to-use Laboratory Information
  Management System (LIMS) suitable for
  tracking the complex and rapidly evolving
  laboratory practices associated with protein
  production in the context of structural biology.
• PiMS is being developed to commercial software
  standards of reliability and usability and will be
  freely available to academic laboratories.
Funding and Usage
• UK Funded development with input and
  support from European labs.
• It is being used or evaluated in several
  labs around the UK and Europe, e.g.
  Oxford, St Andrews, NKI Amsterdam.
• Interest has been shown as far afield as
  China as well as by several major
  pharmaceutical companies.
Basic concepts of PiMS
PiMS uses a few simple key concepts which can be linked
  together to model complex workflows
• Targets
   – Description of sequences, store annotations
• Constructs
   – Starting points for real experiments, link to targets
• Samples
   – Tracked samples made & used by experiments
   – Samples have types, owners, locations etc.
• Experiments
   – Take one (or more) samples, produce new sample(s) as outputs
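
The four concepts above can be pictured as a handful of linked records. The following is only an illustrative sketch: the class and field names are assumptions made for this example and are not the actual PiMS data model.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    name: str
    sequence: str
    annotations: list[str] = field(default_factory=list)


@dataclass
class Construct:
    name: str
    target: Target  # every construct links back to its target


@dataclass
class Sample:
    name: str
    sample_type: str
    owner: str
    location: str


@dataclass
class Experiment:
    name: str
    inputs: list[Sample]  # one or more samples consumed
    outputs: list[Sample] = field(default_factory=list)  # new samples produced


# Invented example records.
target = Target("T001", "MKTAYIAKQR", annotations=["example annotation"])
construct = Construct("T001-c1", target)
plasmid = Sample("T001-c1 plasmid", "plasmid", owner="ib", location="Freezer 2")
lysate = Sample("T001-c1 lysate", "cell lysate", owner="ib", location="Fridge 1")
expression = Experiment("Expression trial 1", inputs=[plasmid], outputs=[lysate])
```

Because experiments only reference samples, the same output sample can become the input of a later experiment, which is what lets simple records chain into complex workflows.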
Experiments and protocols
• A protocol is a reusable user-defined template
  describing what you record for your experiments.
• Parameters
  – Numerical values, free text values, T/F. E.g.
    incubation temperature or the number of PCR cycles;
    details of incubation conditions; was reagent added?
• Input Samples
  – Samples or reagents used when performing an
    experiment that you wish to track
• Output Samples
  – Samples or reagents produced when performing an
    experiment that you wish to track
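
As a rough sketch of what "a reusable template" could look like, the example below defines a protocol as a list of expected parameters plus input and output sample types, and checks a recorded experiment against it. All names here (`ParameterDef`, `Protocol`, `check`) are hypothetical and are not taken from PiMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterDef:
    name: str
    kind: type  # e.g. int, float, str or bool


@dataclass(frozen=True)
class Protocol:
    name: str
    parameters: tuple[ParameterDef, ...]
    input_types: tuple[str, ...]
    output_types: tuple[str, ...]

    def check(self, values: dict) -> list[str]:
        """Return a list of problems; an empty list means nothing is missing."""
        problems = []
        for p in self.parameters:
            if p.name not in values:
                problems.append(f"missing: {p.name}")
            elif not isinstance(values[p.name], p.kind):
                problems.append(f"wrong type: {p.name}")
        return problems


# A hypothetical PCR protocol using the parameter examples from the slide.
pcr = Protocol(
    name="PCR",
    parameters=(
        ParameterDef("cycles", int),
        ParameterDef("incubation temperature (C)", float),
        ParameterDef("reagent added", bool),
    ),
    input_types=("template DNA", "primer"),
    output_types=("PCR product",),
)

complete = {"cycles": 30, "incubation temperature (C)": 55.0, "reagent added": True}
incomplete = {"cycles": "thirty", "reagent added": False}
```

Here `pcr.check(complete)` returns an empty list, while `pcr.check(incomplete)` reports the badly typed and the missing parameter.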
More about protocols
Typing of PiMS items
Typing helps PiMS offer sensible choices: only a
  plasmid can be used for transfection
  experiments…
• Samples
  – Typed to show what they are

• Input/Output samples for protocols
  – State what type of sample can be used and what is
    produced

• Experiments and protocols
  – An experiment type is defined by its protocol. A
    protocol type links similar protocols together
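
One way to picture how typing lets a system "offer sensible choices" is a simple filter over the available samples. This is a minimal sketch with invented sample names and types, not PiMS code.

```python
def candidate_inputs(samples, accepted_type):
    """Return the names of the samples whose type the protocol accepts."""
    return [name for name, sample_type in samples if sample_type == accepted_type]


# (name, type) pairs; names and types are made up for illustration.
available = [
    ("plasmid-T001", "plasmid"),
    ("T001 primers", "primer"),
    ("T002 lysate", "cell lysate"),
    ("plasmid-T002", "plasmid"),
]

# A transfection protocol that accepts only plasmids would be offered
# just the two plasmid samples.
choices = candidate_inputs(available, "plasmid")
```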
Experiments & samples → Workflows
[Figure: workflow diagram in which experiments (Expt 1 to Expt 4) consume and produce samples (Sample A, B, C, D, E1 and E2), chaining into a workflow.]
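
Once every experiment records its input and output samples, the history of any sample can be recovered by walking the chain backwards. The sketch below assumes a particular wiring between the experiments and samples purely for illustration, and assumes each sample is produced by at most one experiment.

```python
# Hypothetical wiring: experiment -> (input samples, output samples).
experiments = {
    "Expt 1": (["Sample A"], ["Sample B"]),
    "Expt 2": (["Sample B"], ["Sample C"]),
    "Expt 3": (["Sample B"], ["Sample D"]),
    "Expt 4": (["Sample D"], ["Sample E1", "Sample E2"]),
}


def history(sample):
    """Walk backwards from a sample to the experiments that led to it."""
    for name, (inputs, outputs) in experiments.items():
        if sample in outputs:
            steps = []
            for parent in inputs:
                steps.extend(history(parent))
            return steps + [name]
    return []  # a starting sample: no recorded experiment produced it


trail = history("Sample E2")
```

With this wiring, `trail` lists Expt 1, Expt 3 and Expt 4 in order. If one experiment in the middle had not been recorded, the walk would stop early, which is the broken data "chain" problem in miniature.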
The PiMS holder (plate experiments)
A holder groups samples. This allows PiMS to
  perform plate experiments in groups
• Samples
  – For plate experiments, the output samples of the previous
    experiment are mapped to the input samples of the next
    (provided the sample type matches!)

• User interface for plate experiments
  – Gives graphical and spreadsheet views. Allows
    editing, reformatting and spreadsheet upload
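
The well-to-well mapping described above can be sketched as follows. This is an assumption-laden illustration (the function name, the dictionary layout and the sample types are invented), not how PiMS implements holders.

```python
def map_plate(previous_outputs, accepted_type):
    """Map each well's output sample to the same well of the next plate experiment.

    previous_outputs: {well: (sample_name, sample_type)}
    Wells whose sample type does not match are reported rather than mapped.
    """
    mapped, rejected = {}, []
    for well, (name, sample_type) in previous_outputs.items():
        if sample_type == accepted_type:
            mapped[well] = name
        else:
            rejected.append(well)
    return mapped, rejected


# Invented outputs of a previous plate experiment.
pcr_plate = {
    "A1": ("T001 PCR product", "PCR product"),
    "A2": ("T002 PCR product", "PCR product"),
    "A3": ("water blank", "control"),
}
mapped, rejected = map_plate(pcr_plate, "PCR product")
```

Reporting the rejected wells, rather than silently dropping them, keeps the user aware of which positions will be empty in the next experiment.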
Can attach files to samples & experiments
What is xtalPiMS?
• An extension to PiMS to cover
  crystallization, crystal handling and data
  collection
• Will integrate with automatic and manual
  imaging systems
• Will integrate with liquid handling robots
• Provides a single interface for viewing
  images from multiple imagers
Funding and usage
• Funded for two years by BIOXHIT until
  June 2008.
• Three developers:
  – Ian Berry (OPPF, UK)
  – Gael Seroul (EMBL Grenoble, France)
  – Diederick de Vries (NKI, The Netherlands)
• Current version in full time use at the
  OPPF (20,000 plates, 50,000,000 images)
Basic concepts of xtalPiMS
•   Liquid Handling robot integration
•   Imager integration
•   Image processing / analysis
•   Web-based interface
    –   Create experiments
    –   Monitor experiments
    –   Screen Management
    –   Optimisation
    –   Trip Management (merging with e-HTPX)
    –   Data Collection results
OPPF Crystallization Facility Robots
Concepts
• Plate Experiment
  – Each plate can contain 1 or more plate experiments
    (either separated by sub-position or location)
• Plate Inspections
  – Whenever a plate is inspected by an imager or human
    on a microscope a new plate inspection is created
• Annotations
  – An annotation is a “score” for an image, e.g. crystal
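
To illustrate how inspections and annotations might fit together, the sketch below stores one score per image and keeps the most recent score for each well. The record layout, barcodes and score labels are invented for this example and do not describe the xtalPiMS schema.

```python
from datetime import datetime

# Each record: (plate barcode, well, inspection time, score). All values are invented.
annotations = [
    ("P0001", "A1", datetime(2008, 3, 1, 9, 0), "clear"),
    ("P0001", "A1", datetime(2008, 3, 8, 9, 0), "crystal"),
    ("P0001", "A2", datetime(2008, 3, 1, 9, 0), "precipitate"),
]


def latest_scores(records):
    """Keep the most recent score for each (plate, well)."""
    latest = {}
    # Sorting by inspection time means later scores overwrite earlier ones.
    for plate, well, when, score in sorted(records, key=lambda r: r[2]):
        latest[(plate, well)] = score
    return latest


current = latest_scores(annotations)
```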
What is e-HTPX?
• Simple answer:
  – A great many things!
• Longer answer…
  – A client at the home lab for managing:
     •   Crystal Handling.
     •   Synchrotron trips.
     •   Shipments between sites.
     •   Data collection information.
  – A server at a remote site
     • For receiving information from the home sites.
     • Providing the “service” (e.g. data collection).
     • Providing access to the results / data about what happened.
What is e-HTPX?
• For a scientist it provides a management
  tool for:
  – Lab hardware (pins, pucks, etc)
  – Crystal handling
  – Mounting crystals
  – Shipments to synchrotrons
  – Return shipments
  – Beamline meta-data retrieval
  – Upload of data via Excel Spreadsheet
Funding and usage
• Funding came from UK BBSRC
• e-HTPX clients have been used by:
  – York Structural Biology Lab, University of York
  – OPPF, Oxford
  – University of Oulu, Finland
  – Adam Mickiewicz University, Poznan, Poland
  – University of Crete, Crete
• Servers (ISPyB) at ESRF and Diamond
Basic Concepts
•   Crystal Drops
•   Mounted Drops
•   Pins
•   Pucks / Canes
•   Dewars
•   Plate Storage
•   Shipping Agents
•   Locations
•   Diffraction Metadata
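
Several of these concepts nest inside one another (a crystal on a pin, a pin in a puck, a puck in a dewar). A minimal sketch of tracking that containment, assuming a simple child-to-parent lookup with invented identifiers rather than the real e-HTPX model:

```python
# child -> parent container or location; identifiers are invented.
contained_in = {
    "crystal-17": "pin-042",
    "pin-042": "puck-B",
    "puck-B": "dewar-3",
    "dewar-3": "in transit to synchrotron",
}


def where_is(item):
    """Follow the containment chain from an item up to its outermost location."""
    path = [item]
    while path[-1] in contained_in:
        path.append(contained_in[path[-1]])
    return path


path = where_is("crystal-17")
```

Moving a dewar then only requires updating one entry, and every pin and crystal inside it is located correctly without further bookkeeping.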
“Speaks e-HTPX”
• Every Synchrotron has a different data collection
  database.
• Every home source has a different data
  collection database.
• But… as long as they “Speak e-HTPX”, you will
  be able to submit your crystal data and get your
  data collection meta-data home again to store in
  a local database.
• We are working with Synchrotrons to integrate
  e-HTPX messaging into their systems.
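
The idea of a shared "language" between different databases can be sketched as one small adapter per site, each translating its own records into a common form. The real e-HTPX message format is not shown in these slides, so every field and column name below is an assumption made for illustration.

```python
def from_site_a(row):
    # Site A's hypothetical database uses these column names.
    return {
        "crystal_id": row["xtal"],
        "wavelength_A": row["lambda"],
        "images": row["n_img"],
    }


def from_site_b(row):
    # Site B's hypothetical database stores photon energy in keV instead of
    # wavelength; wavelength in angstroms is approximately 12.398 / energy.
    return {
        "crystal_id": row["sample_name"],
        "wavelength_A": round(12.398 / row["energy_keV"], 4),
        "images": row["frames"],
    }


a = from_site_a({"xtal": "x1", "lambda": 0.9795, "n_img": 180})
b = from_site_b({"sample_name": "x2", "energy_keV": 12.398, "frames": 360})
```

Both results have the same keys, so the home lab only needs to understand the common form, however many sites it collects data at.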
The future of e-HTPX
• The current version is available for use
  – Not straightforward to install at this stage
• e-HTPX will be integrated into xtalPiMS to
  provide the seamless integration of
  crystallization and data collection
  information for the home lab.
Data Processing and Structure Solution
Data Processing with XIA2

• xia2 is an automated data reduction
  system designed to work from raw
  diffraction data and a little metadata, and
  produce usefully reduced data in a form
  suitable for immediately starting phasing
  and structure solution, e.g. through
  MrBUMP or your favourite experimental
  phasing suite.
Structure Solution
• Many data processing pipelines available
• Several systems for storing data collection
  information…
• Existing solutions:
  –   Harvesting tools exist within CCP4.
  –   pdb_extract suite
  –   HKL 3000
  –   The XTRACK database
• A system based on these will be included in
  xtalPiMS
Loading from ISPyB to XTrack
XTrack

http://xray.bmc.uu.se/xtrack/
Data Mining
• The bonus of having everything recorded
  is the ability to feed back information,
  improve techniques and get better
  science!
• Example: OPPF Glycosylated Protein
  Screen – conditions taken from standard
  crystallization sparse-matrix screens and
  reformatted to provide a good first pass at
  getting crystals based on prior knowledge
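
As a toy illustration of this kind of feedback, the sketch below ranks crystallization conditions by how often they produced hits in past trials. The trial records are invented, and this is not a description of how the OPPF screen was actually derived.

```python
from collections import Counter

# (condition, gave_crystals) for past trials; the records are invented.
trials = [
    ("PEG 3350 / pH 7", True),
    ("PEG 3350 / pH 7", True),
    ("PEG 3350 / pH 7", False),
    ("Ammonium sulfate / pH 5", False),
    ("Ammonium sulfate / pH 5", True),
    ("MPD / pH 8", False),
]


def rank_conditions(records):
    """Return (condition, hit rate) pairs, best first."""
    tried, hits = Counter(), Counter()
    for condition, hit in records:
        tried[condition] += 1
        hits[condition] += hit  # True counts as 1, False as 0
    rates = {c: hits[c] / tried[c] for c in tried}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


ranking = rank_conditions(trials)
```

The conditions at the top of `ranking` would be the first candidates for a reformatted first-pass screen. A ranking like this is only possible if failed trials were recorded as well as successful ones.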
Acknowledgements
•   BIOXHIT
•   OPPF
    – Robert Esnouf
    – Jon Diprose
    – Dave Stuart
•   NKI
    – Tassos Perrakis
    – Diederick de Vries (xtalPiMS developer)
•   EMBL Grenoble
    – Josan Marquez
    – Gael Seroul (xtalPiMS developer)
•   All the PiMS Developers
•   All the e-HTPX Developers
•   All the ESRF ISPyB Developers
More information
• PiMS
  – http://www.pims-lims.org
• xtalPiMS
  – http://www.oppf.ox.ac.uk/xtalpims
• e-HTPX
  – http://www.oppf.ox.ac.uk/ehtpx
• XIA2
  – http://www.ccp4.ac.uk/xia/
• XTrack
  – http://xray.bmc.uu.se/xtrack/