BioinfoGRID Project
                   Milanesi Luciano
                   National Research Council
                   Institute of Biomedical Technologies,
                   Milan, Italy
                   luciano.milanesi@itb.cnr.it

Milanesi Luciano           EGEE User Forum, Clermont-Ferrand, France, 11-14 February 2008
Networks of resources
• The potential of new biological and biomedical
  technological platforms in connection with HPC and
  GRID technology will be particularly useful to deal with the
  increasing amount, complexity, and heterogeneity of
  biological and biomedical data.
• Bioinformatics applications for eHealth have become an
  ideal research area where computer scientists can apply
  and further develop new intelligent computation methods,
  in both experimental and theoretical cases.

 Milanesi Luciano         BioinfoGRID Symposium, Milan 10-13 December 2007   2
BioinfoGRID Project
         BioinfoGRID Project web site: www.bioinfogrid.eu

Consortium

BioinfoGRID Objectives
• Objective of the BioinfoGRID project

Interaction with related projects
At present the BioinfoGRID project has established
co-operation with the following project initiatives:
•    EGEE
•    BELIEF
•    EMBRACE
•    EUCHINAGRID
•    EUMEDGRID
•    EELA
•    DILIGENT
•    ICEAGE
•    LITBIO
•    LIBI
•    HEALTHGRID
•    WISDOM

BioinfoGRID Work Packages
Work-package No                         Work Package title
        WP1        Genomics Applications in GRID

        WP2        Proteomics Applications in GRID

        WP3        Transcriptomics Applications in GRID

        WP4        Database and Functional Genomics Applications

        WP5        Molecular Dynamics Applications

        WP6        Coordination of technical aspects and relation with Grid
                   infrastructure Projects, user training, application support
                   and resources integration.

        WP7        Dissemination and Outreach.

        WP8        Project Management Office

WP1 – Genomics Applications

Diagram (HUSAR Program Package) integrating:
• GCG (~130 programs)
• EMBOSS (~150 programs)
• In-house developments
    – own programs
    – automated tasks
• Third-party programs
• Databases
    – >300
    – prompt updates (daily, weekly)
• SRS (Sequence Retrieval System)

WP1 – Genomics Applications
• Integrating W3H, SoapLab and the GRID
  Diagram (preliminary setup vs. target setup):
    – Preliminary setup: W2H HTML pages and W3H analysis tasks on Solaris (OS)
      @dkfz-heidelberg.de, connected via ssh to a ScLinux (OS) host (@dkfz or
      anywhere else) running the Grid Client toolkit (any more software??) and the
      wrappers % submit_formatdb … and % submit_blastall …, which run
      % formatdb … and % blastall … on a Grid CE
    – Target setup: a SoapLab WebService on ScLinux (OS) with the Grid Client
      toolkit (any more software??), reaching a Grid CE through a Grid API
      interface to run % formatdb … and % blastall …
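In the preliminary setup, the legacy NCBI `formatdb`/`blastall` command lines are wrapped into grid jobs by `submit_formatdb` and `submit_blastall`. The sketch below is a minimal illustration of what such a wrapper could emit, assuming a gLite-style JDL job description. The attributes `Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox` and `OutputSandbox` are standard JDL. The helper `make_blastall_jdl`, the executable path and the file names are hypothetical. The database is assumed to be a nucleotide one, formatted with `formatdb -p F`, which writes `.nhr`, `.nin` and `.nsq` files.

```python
def jdl_list(items):
    """Render a Python list as a JDL list literal, e.g. {"a", "b"}."""
    return "{" + ", ".join(f'"{x}"' for x in items) + "}"


def make_blastall_jdl(program, database, query, output):
    """Build a gLite-style JDL description for one blastall run.

    Hypothetical helper: assumes the nucleotide database files produced
    earlier by `formatdb -i <fasta> -p F` are shipped in the input
    sandbox together with the query file.
    """
    # Legacy NCBI blastall flags: -p program, -d database, -i query, -o output
    args = f"-p {program} -d {database} -i {query} -o {output}"
    # formatdb output for a nucleotide database: .nhr, .nin, .nsq
    inputs = [query] + [f"{database}.{ext}" for ext in ("nhr", "nin", "nsq")]
    lines = [
        'Executable = "/usr/bin/blastall";',  # assumed install path on the WN
        f'Arguments = "{args}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f"InputSandbox = {jdl_list(inputs)};",
        f"OutputSandbox = {jdl_list(['std.out', 'std.err', output])};",
    ]
    return "[\n  " + "\n  ".join(lines) + "\n]"


if __name__ == "__main__":
    print(make_blastall_jdl("blastn", "est_human", "query.fa", "hits.txt"))
```

A `submit_formatdb` counterpart would look the same, with `formatdb` arguments and the raw FASTA file in the input sandbox instead of the formatted database.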

WP2 – Proteomics Applications
•      Perform functional protein analysis in GRID, using the
       functional protein domain annotations of large protein
       families and the related databases.
•      All 518 human protein kinases and 5129 proteins from the
       non-redundant chain set of the Protein Data Bank were
       analyzed with InterProScan.
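Analyses of this size are usually spread over many grid jobs by splitting the input sequences into batches. Here is a minimal sketch, assuming the sequences arrive as one FASTA file and using a fixed, hypothetical batch size. The slide does not say how the 518 kinases and 5129 PDB chains were actually partitioned.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def batches(records, batch_size):
    """Split records into consecutive batches, one batch per grid job."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def to_fasta(records):
    """Serialize (header, sequence) tuples back to FASTA text for one job."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```

With the hypothetical batch size of 100, the 5129 chains would become 52 jobs. Smaller batches shorten each job but add more queue-time overhead.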

WP2 – Proteomics Applications
•      Protein surface calculation in GRID: the grid was used to
       compute a volumetric description of the proteins, obtaining a
       precise representation of the corresponding surface. Protein
       interactions could then be quickly screened by means of
       surface analysis.
       – The ProSite domains were analyzed all-against-all
       – ATP-E against its inhibitor
       – Collagen against integrin
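One common way to build a volumetric description is to sample the protein on a regular 3D grid and then pick out the surface voxels. The sketch below is a generic illustration and not the method used in the project. It makes three assumptions:
- atoms are given as `(x, y, z, radius)` hard spheres;
- a voxel is inside if its centre lies within some atom;
- a surface voxel is an inside voxel with at least one empty face neighbour.

```python
import itertools
import math

FACE_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                   (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def occupied_voxels(atoms, spacing):
    """Return the set of voxel indices whose centres lie inside any atom.

    Voxel (i, j, k) is centred at (i*spacing, j*spacing, k*spacing).
    atoms: iterable of (x, y, z, radius) tuples (hard-sphere model).
    """
    inside = set()
    for x, y, z, r in atoms:
        # Only scan the block of voxels that bounds this sphere.
        lo = [math.floor((c - r) / spacing) for c in (x, y, z)]
        hi = [math.ceil((c + r) / spacing) for c in (x, y, z)]
        ranges = (range(a, b + 1) for a, b in zip(lo, hi))
        for i, j, k in itertools.product(*ranges):
            cx, cy, cz = i * spacing, j * spacing, k * spacing
            if (cx - x) ** 2 + (cy - y) ** 2 + (cz - z) ** 2 <= r * r:
                inside.add((i, j, k))
    return inside


def surface_voxels(inside):
    """Inside voxels with at least one empty face neighbour."""
    return {
        (i, j, k) for (i, j, k) in inside
        if any((i + di, j + dj, k + dk) not in inside
               for di, dj, dk in FACE_NEIGHBOURS)
    }
```

Each protein's voxel map can be computed in its own grid job. All-against-all screening of n domains then needs n(n-1)/2 pairwise surface comparisons, which also parallelise naturally.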

WP3 – Transcriptomics applications
• Phylogenetics: reconstructing the evolutionary history of
  a group of taxa is a major research thrust in computational
  biology and a standard part of exploratory sequence
  analysis.
• An evolutionary history not only gives the relationships among
  taxa, but is also an important tool for inferring structural,
  physiological, and biochemical properties of sequences
  from other similar sequences, and for reconstructing tissue
  evolution.
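To make "reconstructing the evolutionary history" concrete, the sketch below builds a tree from a pairwise distance matrix with UPGMA. The slide does not name the method used in WP3. UPGMA is chosen here only because it is short. It assumes a constant rate of evolution, so treat it as an illustration rather than a recommended phylogenetic method. The input is assumed to be a symmetric matrix of distances between taxa.

```python
def upgma(names, dist):
    """Build a rooted UPGMA tree and return it as a Newick string.

    names: taxon labels; dist: symmetric matrix (list of lists) of
    pairwise distances, e.g. derived from a sequence alignment.
    """
    # cluster id -> (Newick subtree, number of taxa, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # distances between clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        h = dab / 2  # height of the new node in an ultrametric tree
        merged = f"({tree_a}:{h - h_a:g},{tree_b}:{h - h_b:g})"
        # Keep distances not involving a or b ...
        new_d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        # ... and add size-weighted average distances to the merged cluster.
        for k in clusters:
            d_ak = d[min(a, k), max(a, k)]
            d_bk = d[min(b, k), max(b, k)]
            new_d[k, next_id] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        clusters[next_id] = (merged, size_a + size_b, h)
        d = new_d
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

For four taxa where A and B are closest, C joins them next and D is the outgroup. The function returns `(D:5,(C:3,(A:1,B:1):2):2);` for the distance matrix used in the test.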

WP4 – Databases & Genomics Applications
• Work Package 4: Databases and Functional Genomics
  Applications
   – Testing the main biological databases in the Grid
     environment
      ▪ Optimization of storage space, bandwidth, download
        time
   – Testing performance and scalability of database-based
     applications
      ▪ Performance/scalability testing according to various
        use cases and submission algorithms
   – First challenge: Gene Analogous Finder
      ▪ 55+ years of computation on a single CPU, not
        feasible in a local environment.
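The arithmetic behind the Gene Analogous Finder challenge can be sketched as follows. The sketch assumes the workload splits into independent jobs. Wall-clock time is then roughly the CPU time divided by the number of CPUs running concurrently, inflated by an efficiency factor for queueing and failures. The 1,000-CPU figure and the efficiency values below are hypothetical, not measurements from the project.

```python
HOURS_PER_YEAR = 365 * 24  # ignoring leap years


def wall_clock_days(cpu_years, n_cpus, efficiency=1.0):
    """Estimate elapsed days for an embarrassingly parallel workload.

    efficiency in (0, 1] discounts for queue time, failed jobs and
    resubmissions (a hypothetical parameter, not a measured value).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    cpu_hours = cpu_years * HOURS_PER_YEAR
    return cpu_hours / (n_cpus * efficiency) / 24
```

55 CPU-years is 481,800 CPU-hours. With 1,000 concurrent CPUs and perfect efficiency, that comes to about 20 days. At 50% efficiency it is about 40 days, which is still feasible on the Grid where a local run is not.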

WP4 – Databases handling

• GridDBManager
   – Automatic Updater
      ▪ Timer-based monitoring and updating of Grid-ported
        databases
   – Adaptive replica manager
      ▪ Constantly adapts the number of replicas to the usage
        of each database over the last 10 days
   – Version Regression
      ▪ Keeps patches on the Grid so that each database can be
        regressed to an earlier version
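The adaptive replica policy can be sketched as a function of recent usage. This is a minimal illustration, not the GridDBManager rule, which the slide describes only through its 10-day window. It assumes usage is recorded as access timestamps in days. The hypothetical parameters `accesses_per_replica`, `min_replicas` and `max_replicas` shape and bound the result.

```python
import math


def target_replicas(access_days, today, window=10,
                    accesses_per_replica=50, min_replicas=1, max_replicas=10):
    """Number of replicas to keep for one database.

    access_days: timestamps (in days) of accesses to the database.
    Only accesses within the last `window` days count, mirroring the
    10-day usage window; the scaling rule itself is hypothetical.
    """
    recent = sum(1 for d in access_days if today - window < d <= today)
    wanted = math.ceil(recent / accesses_per_replica)
    # Always keep at least one copy and never exceed the storage budget.
    return max(min_replicas, min(max_replicas, wanted))
```

Adding replicas of heavily used databases cuts download time for jobs, at the cost of storage space and bandwidth. That is the same trade-off WP4 tests.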

WP4 – Methods - GridDBManager

WP4 – Methods - DBApp Perf. Testing

• Testing performance and scalability of Database-Oriented
  Bioinformatics Applications (DBApp) in the EGEE GRID
    – Testing Performance and Scalability
          ▪ Grid: too many variables (queue time, database
            download time, queue failures, execution failures)
          ▪ Submission mode: too many variables (number of jobs,
            rate-limiting settings, resubmission algorithm)
          ▪ Application: too many variables (performance of the
            specific application, location of the database)
          ▪ Probing of Grid performance
          ▪ Numeric simulation for all algorithms
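A numeric simulation of a resubmission algorithm can start from a very simple model. The sketch assumes each attempt fails independently with probability `p_fail` and that a job is resubmitted at most `max_resubmits` times. Both are hypothetical parameters. Under this model the fraction of completed jobs has a closed form, and a Monte Carlo run should reproduce it.

```python
import random


def completion_probability(p_fail, max_resubmits):
    """Closed form: a job completes unless all 1 + max_resubmits attempts fail."""
    return 1 - p_fail ** (max_resubmits + 1)


def simulate(n_jobs, p_fail, max_resubmits, seed=0):
    """Monte Carlo estimate of the fraction of jobs that eventually complete."""
    rng = random.Random(seed)  # seeded for reproducible runs
    done = 0
    for _ in range(n_jobs):
        for _attempt in range(max_resubmits + 1):
            if rng.random() >= p_fail:  # this attempt succeeded
                done += 1
                break
    return done / n_jobs
```

A richer model would also draw queue and download times for each attempt, and those distributions come from probing Grid performance.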
WP4 – Methods - DBApp Perf. Testing
• Probing Grid performances (Example)
    – Grid queue times and reliability
                 ▪ Sent 150 jobs in 3 groups of 50 at different times
                 [Chart: Grid queue times (normal load); y-axis: % of jobs (0-30);
                  x-axis: queue times, labelled bins 1-2min, 4-10min, 30min-1h,
                  4h-8h, Time-out (8h)]
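Turning raw queue-time measurements into a chart like this is a simple binning step. The sketch assumes queue times in minutes. Its bin edges are hypothetical: they contain the labelled ranges, but the chart's unlabelled intermediate bins are a guess. Jobs that never started within the 8 h limit are counted as time-outs.

```python
from bisect import bisect_right

# Hypothetical bin edges in minutes; the last bin catches time-outs (>= 8 h).
EDGES = [2, 4, 10, 30, 60, 240, 480]
LABELS = ["<2min", "2-4min", "4-10min", "10-30min",
          "30min-1h", "1h-4h", "4h-8h", "Time-out"]


def queue_time_histogram(minutes):
    """Percentage of jobs per queue-time bin.

    minutes: queue time of each job in minutes, or None if the job
    never started (treated as a time-out).
    """
    if not minutes:
        raise ValueError("no jobs to bin")
    counts = [0] * len(LABELS)
    for m in minutes:
        # bisect_right maps m to the bin i with EDGES[i-1] <= m < EDGES[i]
        idx = len(LABELS) - 1 if m is None else bisect_right(EDGES, m)
        counts[idx] += 1
    return {label: 100.0 * c / len(minutes) for label, c in zip(LABELS, counts)}
```

With 150 jobs, as in the probe above, each job accounts for about 0.67% of the total.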

WP5 – Molecular docking

Viral neuraminidase is considered a valid target for antiviral drugs.

WP5 – Molecular docking

Diagram (docking workflow): Starting compound database + Starting target
structure model → DOCKING → Predicted binding models → Post-analysis →
Compounds for assay
• Docking: predict how small molecules bind to a receptor of known 3D structure
• There are successful examples
    – rapid,
    – cost effective…
• But there are limitations
    – CPU and storage needed
More specific talk by Ana Lucia Da Costa
Wednesday 13th 11:15 – Room: Bordeaux
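The post-analysis step reduces the docking output of many grid jobs to a short list of compounds for assay. Here is a minimal sketch, assuming each job returns `(compound_id, score)` pairs where a lower score means a better predicted binding energy. The score convention and the cut-off `k` are assumptions, not taken from the slides.

```python
import heapq


def select_for_assay(job_results, k):
    """Merge per-job docking results and keep the k best-scoring compounds.

    job_results: iterable of lists of (compound_id, score) tuples, one
    list per grid job. If a compound was docked in several jobs (e.g.
    against several target conformations), its best (lowest) score is kept.
    """
    best = {}
    for results in job_results:
        for compound, score in results:
            if compound not in best or score < best[compound]:
                best[compound] = score
    # nsmallest avoids sorting the whole library when k is small
    return heapq.nsmallest(k, best.items(), key=lambda item: item[1])
```

Keeping only each compound's best score makes the merge independent of how the compound database was split across jobs.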

WP7 – Dissemination
• The following events were specifically associated with
  or organized by the BioinfoGRID project:
    – BioinfoGRID Symposium 2007: December 10th-13th 2007, Milan
    – BioinfoGRID Session at EGEE '07: October 4th 2007, Budapest
    – Biomed Grid School, Varenna, Italy, May 14th-19th 2007
    – BioinfoGRID Workshop at Healthgrid 2007 Conference - Geneva,
      Switzerland, 24th April 2007
    – NETTAB 2006 Workshop: Distributed Applications, Web Services,
      Tools and GRID Infrastructures for Bioinformatics - Santa
      Margherita di Pula, Sardinia, Italy - July 10-13th, 2006
    – BioinfoGRID Initial Training Course, Bari, Italy, March 8th-10th 2006

• In addition, the BioinfoGRID project has been represented at
  58 national and international conferences and workshops.

WP7 – Dissemination
• 24 journal articles written within the framework of the
  BioinfoGRID project:
    –   9 - BMC Bioinformatics
    –   4 - IEEE Transactions on Nanobioscience
    –   3 - Studies in Health Technology and Informatics
    –   1 - Journal of Parallel and Distributed Computing
    –   1 - Journal of Chemical Information and Modeling
    –   1 - Parallel Computing
    –   1 - Int. J. of Bioinformatics Research and Applications
    –   1 - IEEE Transactions on Systems Science and Applications
    –   1 - Nucleic Acids Research
    –   1 - BMC Genetics
    –   1 - Bioinformatics

WP7 – Dissemination
• 19 conference proceedings papers produced within BioinfoGRID
    –   6 – NETTAB '06
    –   2 – EGEE User Forum 06/07
    –   2 – BITS '06
    –   2 – HPDC '07
    –   1 – EGEE 06/07
    –   1 – CAPI 2006
    –   1 – Bioinformatics of African Pathogens and Disease Vectors, Nairobi 2007
    –   1 – MAS-BIOMED '06 Workshop
    –   1 – CCGrid '07 Symposium
    –   1 – EvoBIO '08
    –   1 – CHEP '07

People Acknowledgments
•    Cristina Aiftimiei     •   David Fergusson                •   Alessandro Orro
•    Roberta Alfieri        •   Geraldine Fettahi              •   Giovanni Paolella
•    Claudio Arlandini      •   Sandro Fiore                   •   Silvano Paoli
•    Roberto Barbera        •   Riccardo Gervasoni             •   Antonio Pierro
•    Endre Barta            •   Karl-Heinz Glatting            •   Giorgio Pietro Maggi
•    Francesco Beltrame     •   John Hatton                    •   Marco Pirola
•    Attila Bende           •   Ally Hume                      •   Raffaele Ponzini
•    Chiara Bishop          •   Nicolas Jacq                   •   Ivan Porro
•    Christophe Blanchet    •   Atul Jain                      •   Paolo Ramieri
•    Ignacio Blanquer       •   Miklos Kozlovszky              •   Paolo Romano
•    Vincent Bloch          •   Giuseppe La Rocca              •   Ermanna Rovida
•    Gianpaolo Bottoni      •   Yannick Legré                  •   Erika Salvi
•    Vincent Breton         •   Pietro Liò                     •   Jean Salzemann
•    Andrea Calabria        •   Carles Loomis                  •   Diego Sardaci
•    Andrea Caprera         •   Mario Marchisio                •   Salvatore Scifo
•    Tiziana Castrignanò    •   Hajnal Marton                  •   Martin Senger
•    Federica Chiappori     •   Rafael Mayo Garcia             •   Giuliano Taffoni
•    Dario Corrada          •   Mirco Mazzucato                •   Livia Torterolo
•    Paolo Cozzi            •   Giovanni Meloni                •   Gabriele Trombetti
•    Stefano Cozzini        •   Ivan Merelli                   •   Angelica Tulipano
•    Enza D’Alba            •   Emanuela Merelli               •   Vania Ugè
•    Pasqualina D’Ursi      •   Luciano Milanesi               •   Elizabeth van der Wath
•    Ana Da Costa           •   Elisa Molinari                 •   Richard van der Wath
•    Paride Dagna           •   Ettore Mosca                   •   Kasam Vinod
•    Giulia De Sario        •   Georgina Moulton               •   Federica Viti
•    Davide Di Pasquale     •   Loukas Moutsianas              •   Guy Warner
•    Giacinto Donvito       •   Tibor Nagy                     •   Ted Wen
•    Vihang Dudhalkar       •   Alessandro Negro               •   Pierfrancesco Zuccato
•    Peter Ernst            •   Laszlo Oroszi

Projects Acknowledgements

• ISSeG
• EU GRID
• Diligent: A DIgital Library Infrastructure on Grid ENabled Technology
