Computing Challenges in the Years to Come - Andreas Petzold, Achim Streit (Steinbuch Centre for Computing, GridKa) + many contributors - DESY Indico

Computing Challenges in the Years to Come
     Andreas Petzold, Achim Streit (Steinbuch Centre for Computing, GridKa)
     + many contributors

KIT – The Research University in the Helmholtz Association             www.kit.edu
HL-LHC Computing Challenges

                  Source: S. Campana, Status of WLCG, 38th WLCG RRB, 27.10.2020, https://indico.cern.ch/event/957310/timetable/

2    20.11.2020    Achim Streit – Computing Challenges of the Coming Years                                      Steinbuch Centre for Computing, GridKa
Addressing HL-LHC Computing Challenges
    with a Multi-Pronged Approach

     Challenge: higher data rates require more storage and CPU

     Experiments have to
         Improve SW design & algorithms
         Adapt to new methods (AI) & HW (GPUs)
         Improve computing models (fewer copies & special data formats)
     Innovation and R&D at GridKa
         Federated storage (exascale data lakes)
         Optimizations for inc. data accesses
         Integration of special resources (GPUs)
         Access to opportunistic resources (HPC, cloud)

     Still – additional hardware is needed
     Source: ATLAS Experiment – Public Results,
     https://twiki.cern.ch/twiki/bin/view/AtlasPublic/ComputingandSoftwarePublicResults

Performance Development
     [Chart: TOP500 performance development]
     Annotated values: 2.4 EFlop/s, 442 PFlop/s, 1.3 PFlop/s
     Systems marked: Fugaku @ Riken (7.6 M cores, 5.1 PB mem, 30 MW), Summit, Taihu Light,
     Tianhe-2, K Computer, Earth Simulator

     Source: https://top500.org/statistics/perfdevel/
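
The Fugaku figures quoted on this slide allow a quick back-of-the-envelope look at efficiency. The sketch below assumes that the 442 PFlop/s annotation refers to Fugaku (the slide does not pair the two explicitly) and uses the rounded slide values; all function and constant names are illustrative.

```python
# Rounded values as quoted on the slide. Pairing 442 PFlop/s with Fugaku
# is an assumption made for this example.
PERFORMANCE_FLOPS = 442e15  # 442 PFlop/s
CORES = 7.6e6               # 7.6 M cores
POWER_WATTS = 30e6          # 30 MW
MEMORY_BYTES = 5.1e15       # 5.1 PB


def gflops_per_watt(flops, watts):
    """Energy efficiency in GFlop/s per watt."""
    return flops / watts / 1e9


def gflops_per_core(flops, cores):
    """Average performance per core in GFlop/s."""
    return flops / cores / 1e9


def memory_gb_per_core(memory_bytes, cores):
    """Average memory per core in GB (decimal units)."""
    return memory_bytes / cores / 1e9


if __name__ == "__main__":
    print(f"{gflops_per_watt(PERFORMANCE_FLOPS, POWER_WATTS):.1f} GFlop/s per W")
    print(f"{gflops_per_core(PERFORMANCE_FLOPS, CORES):.1f} GFlop/s per core")
    print(f"{memory_gb_per_core(MEMORY_BYTES, CORES):.2f} GB per core")
```

With the rounded inputs this gives roughly 15 GFlop/s per watt and under 1 GB of memory per core, which illustrates why software has to be both energy-aware and memory-frugal to run well on such systems.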

Chip Statistics
     System Share

     Source: Erich Strohmaier, slide 30, https://www.top500.org/media/filer_public/54/77/5477d858-1f1e-410b-994b-b7122cfd1d57/top500_2020_06_v2_web.pdf

Chip Statistics
     Performance Share

     Fugaku

     Source: Erich Strohmaier, slide 31, https://www.top500.org/media/filer_public/54/77/5477d858-1f1e-410b-994b-b7122cfd1d57/top500_2020_06_v2_web.pdf

Personal View – a Paradigm Change is needed
     Adapt to present and future hardware architectures (SIMD instructions,
     multi-/many-core, accelerators/GPUs, distributed memory)

     Apply computer science principles and algorithms
     Apply continuous integration/development/testing (CI/CD/CT)

     Consider software as a research infrastructure in itself
     Do Research Software Engineering (RSE)
     at the intersection of algorithms/numerics,
     software engineering, and community codes

     More physics computing professorships
Personal View – How to achieve this?
     Many parts of software have to be rewritten to profit from modern HW
         Use abstraction layers like Alpaka, OpenMP, etc. to ensure portability
         (GPUs/CPUs/…) and easy access to future architectures
         Probably not 100% performance, but good investment in sustainable code
         “The optimization/transformation process of software for GPUs also results in
         much more efficient code on CPUs” (V. Lindenstruth, 9th CERN SCF, slide 11)

     Computing centralization vs. distributed resources?
         Likely model: Centralized base resources (storage heavy &
         network) and services + interfaces for
         a) permanent resources accessible through central site and
         b) integration of opportunistic/dynamic (computing) resources
         Result: more centralized and at the same time more distributed
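
The portability argument for abstraction layers can be illustrated with a toy example: the kernel is written once and the execution backend is swapped without touching it. This is only a sketch of the idea in Python. It does not use the Alpaka or OpenMP APIs, all names are hypothetical, and the thread-based backend is not expected to be faster in CPython; it merely stands in for "a different architecture".

```python
from concurrent.futures import ThreadPoolExecutor


def serial_backend(kernel, n_items):
    """Reference backend: evaluate the kernel for each index in one thread."""
    return [kernel(i) for i in range(n_items)]


def threaded_backend(kernel, n_items, workers=4):
    """Alternative backend: same kernel, indices spread over a thread pool.

    pool.map returns results in index order, so both backends agree.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, range(n_items)))


def make_scale_kernel(values, factor):
    """Build a per-index kernel that knows nothing about the backend."""
    def kernel(i):
        return values[i] * factor
    return kernel


def run(kernel, n_items, backend=serial_backend):
    """Single entry point: the caller selects the backend, not the kernel."""
    return backend(kernel, n_items)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    kernel = make_scale_kernel(data, 2.0)
    print(run(kernel, len(data)))
    print(run(kernel, len(data), backend=threaded_backend))
```

The design choice mirrors the slide's point: the per-element logic is the long-lived investment, while backends can be added or replaced as hardware changes.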

Personal View – Future Computing Resources
     Compute
         Heterogeneous architectures
             Be able to run jobs almost everywhere
             Some software benefits from adaptation to
             specific resources for peak performance

     Storage
         Permanent storage no longer at all sites
         Large fast storage at few sites with very good network connectivity
             Increasing tape usage with very well orchestrated data access
         Cache storage with low operations requirements
             Especially important at sites w/o direct internet access from WNs
        Source: WLCG Data Lake, Simone Campana, https://indico.cern.ch/event/738796/contributions/3174573/attachments/1755785/2846671/DataLake-ATCF.pdf
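
To make the role of cache storage concrete, the following is a minimal sketch of a site cache in front of remote permanent storage. The least-recently-used eviction policy, the class name and the `fetch` callable are assumptions for illustration; the slides do not prescribe a particular policy or software.

```python
from collections import OrderedDict


class LRUFileCache:
    """Size-bounded cache that evicts the least recently used file first."""

    def __init__(self, capacity_bytes, fetch):
        self.capacity_bytes = capacity_bytes
        self.fetch = fetch          # hypothetical callable: name -> bytes
        self._files = OrderedDict() # name -> content, oldest access first
        self._used = 0
        self.hits = 0
        self.misses = 0

    def read(self, name):
        if name in self._files:
            # Cache hit: mark the file as most recently used.
            self._files.move_to_end(name)
            self.hits += 1
            return self._files[name]

        # Cache miss: fetch from the remote (permanent) storage.
        self.misses += 1
        data = self.fetch(name)
        if len(data) <= self.capacity_bytes:
            # Evict least recently used files until the new one fits.
            while self._used + len(data) > self.capacity_bytes:
                _, evicted = self._files.popitem(last=False)
                self._used -= len(evicted)
            self._files[name] = data
            self._used += len(data)
        return data


if __name__ == "__main__":
    cache = LRUFileCache(8, lambda name: b"data")
    for name in ["a", "b", "a", "c", "a", "b"]:
        cache.read(name)
    print(cache.hits, cache.misses)
```

Such a cache holds no unique data, so losing it costs only re-fetching, which is one reason cache storage can be run with low operational effort.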

Innovation and R&D at KIT – Examples
      GridKa as an island in the data lake
      requires massive scalability of storage
      and network infrastructure
          Software defined online storage to
          address less predictable, more remote,
          more diverse data access
          Powerful networks (internal, external)
          Reliable offline storage
      Excellent performance and reliability
          https://s.kit.edu/gridka-monitoring
          https://s.kit.edu/gridka-numbers

      Scalable online storage technology: throughput, IOPs, capacity
                       2017        2018 upgrade    2019/20 upgrade    2021/22 upgrade
          Capacity     23 PB       35 PB           43 PB              60 PB
          Throughput   70 GB/s     100 GB/s        120 GB/s
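
The storage figures on this slide can be turned into growth factors with a few lines of arithmetic. The sketch below assumes that the PB values are online storage capacity and the GB/s values are throughput; the dictionary and function names are illustrative.

```python
# Values as shown on the slide; no throughput is given for 2021/22.
CAPACITY_PB = {"2017": 23, "2018": 35, "2019/20": 43, "2021/22": 60}
THROUGHPUT_GBS = {"2017": 70, "2018": 100, "2019/20": 120}


def growth_factors(series):
    """Growth factor between each pair of consecutive entries."""
    labels = list(series)
    return {
        f"{old} -> {new}": series[new] / series[old]
        for old, new in zip(labels, labels[1:])
    }


def overall_factor(series):
    """Growth factor from the first to the last entry."""
    values = list(series.values())
    return values[-1] / values[0]


if __name__ == "__main__":
    for step, factor in growth_factors(CAPACITY_PB).items():
        print(f"capacity {step}: x{factor:.2f}")
    print(f"capacity overall: x{overall_factor(CAPACITY_PB):.2f}")
    print(f"throughput overall: x{overall_factor(THROUGHPUT_GBS):.2f}")
```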
Recent GridKa Storage Extension

                                        + 36h

                                                             40 5U Seagate x5u85
                                                             • 84 16TB HDDs
                                                             • dual controller
                                                             12 protocol servers
                                                             200 GB IB Switches
                                                             400/100/40G Eth
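
As a worked example, the raw capacity of this extension follows directly from the listed hardware. The sketch assumes that each of the 40 enclosures holds 84 drives of 16 TB (decimal units); usable capacity will be lower once redundancy and file system overhead are accounted for, and those are not given on the slide.

```python
# Hardware counts as listed on the slide; reading "84 16TB HDDs" as
# "per enclosure" is an assumption.
ENCLOSURES = 40
DRIVES_PER_ENCLOSURE = 84
DRIVE_CAPACITY_TB = 16


def total_drives(enclosures, drives_per_enclosure):
    """Total number of hard drives in the extension."""
    return enclosures * drives_per_enclosure


def raw_capacity_tb(enclosures, drives_per_enclosure, drive_capacity_tb):
    """Raw capacity in TB, before any redundancy overhead."""
    return total_drives(enclosures, drives_per_enclosure) * drive_capacity_tb


if __name__ == "__main__":
    drives = total_drives(ENCLOSURES, DRIVES_PER_ENCLOSURE)
    raw_tb = raw_capacity_tb(ENCLOSURES, DRIVES_PER_ENCLOSURE, DRIVE_CAPACITY_TB)
    print(f"{drives} drives, {raw_tb / 1000:.2f} PB raw")
```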

Innovation and R&D at KIT – Examples
      Deep integration of GPU nodes in GridKa farm accessible via GridKa CEs
          3 nodes, each with dual AMD EPYC 7662 (64-core), 1 TB RAM, 8 Nvidia V100S 32 GB

      COBalD/TARDIS enabling a “regional resource pool”
          HPC systems ForHLR II & HoreKa (~17 PFlop/s) @ KIT
          and in Bonn, Freiburg, Munich, …
          Tier-2/3 WLCG systems in Aachen, Bonn, Karlsruhe, …
          [Diagram: three Grid CEs]
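
Dynamic integration of opportunistic resources is essentially a feedback loop: request more resources while the acquired ones are well used, and release them when they sit idle. The sketch below is a hypothetical, simplified rule of that kind. It is not the COBalD or TARDIS API, and the thresholds and names are assumptions chosen for illustration.

```python
def next_demand(demand, utilisation, allocation,
                low_utilisation=0.5, high_allocation=0.9,
                step=1.0, maximum=100.0):
    """Return the resource demand for the next control cycle.

    utilisation: fraction of the acquired resources doing useful work (0..1)
    allocation:  fraction of the acquired resources assigned to jobs (0..1)
    """
    if utilisation < low_utilisation:
        # Resources are mostly idle: hand some back to the provider.
        demand -= step
    elif allocation > high_allocation:
        # Nearly everything is assigned: ask for more.
        demand += step
    # Keep the demand within the allowed range.
    return min(max(demand, 0.0), maximum)


if __name__ == "__main__":
    demand = 10.0
    # Hypothetical (utilisation, allocation) readings over four cycles.
    for utilisation, allocation in [(0.8, 0.95), (0.8, 0.95), (0.8, 0.6), (0.3, 0.4)]:
        demand = next_demand(demand, utilisation, allocation)
        print(demand)
```

Checking utilisation before allocation means idle resources are released even when they are nominally assigned, which keeps opportunistic usage polite towards the hosting HPC or cloud site.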

The PUNCH4NFDI Consortium
Spokesperson: Thomas Schörner (thomas.schoerner@desy.de)
DESY, Notkestr. 85, D-22607 Hamburg
Contact:
Mail:      punch4nfdi@desy.de
Web:       www.punch4nfdi.de
Twitter:   #punch4nfdi
*Particles, Universe, NuClei & Hadrons for the NFDI

PUNCH4NFDI* in one Slide
A consortium for the NFDI

PUNCH4NFDI
     Represents (astro)particle, astro, hadron & nuclear physics in the NFDI.
     Specific strengths: big data and open data; ready to take leading role in NFDI.
     Broad community representation: > 40 partner institutes

Our offer:
     A layered model of data management with scalability that allows for easy FAIRification
     Numerous services to develop community-specific approaches in this direction
     The PUNCH science data platform evolving around advanced research products

Task areas:
     Data management
     Data transformations
     Data portal
     Data irreversibility
     Synergies & services, Teaching & outreach

Services:
     Evolving around research products and their dynamic life cycle
     Connecting to entire NFDI – a cornerstone of research data management in Germany

Timeline and general situation:
     9 consortia (out of 30 max in NFDI) funded in first NFDI round – none from physics.
     Now competing e.g. with FAIRMat, DAPHNE4NFDI
     Submission of proposal 30 Sep 2020; evaluation by review panel 10 Dec 2020;
     grants by July 2021; funding start 1 Oct 2021
Author of the slides: Gregor Kasieczka, U-Hamburg
Author of the slides: Alexander Schmidt, RWTH
New Joint Research Network for Computing
for Run-3 at ATLAS and CMS
     Goal: Reliable operation of the computing infrastructure, which is to be
           extended, in Run-3 for the ATLAS and CMS experiments in Germany
           Essential for the overall success of research
           by physicists at German institutes

     Run-3 of the LHC (2022 to 2024):
     pp data set grows by a factor of ~2.5
         Provision and operation of additional resources
             Requirements in 2024 ~ 1.5 * hardware in 2021
             (plus significant replacement of old hardware)
         Further development and optimization of operations
             (data management, job scheduling, monitoring, ...)
             for changing computing models
Author of the slides: Markus Schumacher, U-Freiburg
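
The Run-3 requirement of about 1.5 times the 2021 hardware by 2024 can be translated into an equivalent yearly growth rate. The sketch below assumes steady compound growth over the three years from 2021 to 2024, which the slide does not state; the function name is illustrative.

```python
def annual_growth_rate(total_factor, years):
    """Constant yearly growth rate that compounds to total_factor."""
    return total_factor ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Factor 1.5 from the slide; three yearly steps (2021 -> 2024) assumed.
    rate = annual_growth_rate(1.5, 3)
    print(f"{rate * 100:.1f} % per year")
```

Under this assumption the capacity has to grow by roughly 14 to 15 % per year, in addition to the replacement of old hardware mentioned on the slide.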
New Joint Research Network for Computing
for Run-3 at ATLAS and CMS
Run-3 (2022-2024): essentially the successful model from Run-1 and Run-2
HL-LHC (from 2028): the transition, however, must already be initiated in 2024
     Hardware procured in 2023 is the last that will not be used in Run-4
     New computing models in efficient continuous operation by 2028 at the latest
         Use of new resources, e.g. HPC clusters, GPUs, opportunistic resources, ...
         New concept for data storage: few sites, primarily HGF centres,
           within the data lake concept, and fast caches at analysis centres (universities)

     Lead time of 4 years necessary for testing operations, setting up hardware, financing, ...
     Stronger role of the HGF centres; additional funding beyond regular funding

New cross-experiment network:
     even stronger networking (beyond GridKa-OB, GridKa-TAB, Terascale-Computing-Board)
     joint discussion and strategy for a homogeneous operating model for HL-LHC
Conclusion
      Challenges ahead cannot be solved with hardware investments alone
      (but neither can they be solved without hardware investments)

      HPC, cloud, data lake… is not enough…

      Aggressive R&D = better exp. software
          Created with computer science expertise
          Using modern programming paradigms
          Being efficient and scalable
          Chance for university groups?
      Source: ATLAS Experiment – Public Results,
      https://twiki.cern.ch/twiki/bin/view/AtlasPublic/ComputingandSoftwarePublicResults
