HEPIX FALL 2018 WORKSHOP REPORT - IT SEMINAR: THE PARTICIPANTS REPORT THE HIGHLIGHTS - DESY PUBDB

Page created by Jeanne Nguyen
 
HEPiX Fall 2018 Workshop Report

IT Seminar
The participants report the highlights.

DESY-IT:
Thomas Finnern
Dirk Jahnke-Zumbusch
Peter van der Reest
Martin Gasthuber
(Helge Meinhard)
HEPiX Mission
https://www.hepix.org

• From our Web site:
  “The HEPiX forum brings together worldwide Information Technology staff, including system administrators,
  system engineers, and managers from the High Energy Physics and Nuclear Physics laboratories and
  institutes, to foster a learning and sharing experience between sites facing scientific computing and data
  challenges.”
• Emphasis is on site services as opposed to experiment software and middleware
• Originating in HEP (particle physics), but open to other sciences
  • Recent participation from life sciences, photon/material sciences, ...
  • See “Report on the Workshop on Central Computing Support for Photon Sciences”
• Twice per year, one week each, in Europe, North America and Asia
  • Typically 100...140 participants
  • Autumn 2018 in Barcelona was the 58th workshop since 1991

   Report HEPiX Fall 2018                                                                                  Page 2
Barcelona Workshop
https://indico.cern.ch/e/hepix-autumn2018

• Held 08 – 12 October in downtown Barcelona, hosted by Port d’Informació Científica (PIC)
• PIC: Tier-1 for WLCG (ATLAS, CMS, LHCb), support for particle physics, astrophysics, cosmology, earth
  sciences
• 137 registered attendees – record for HEPiX!
  • Many first-timers, many old friends
  • 105 from Europe (including 12 from PIC), 14 from North America, 9 from Asia, 9 from companies
• 54 different affiliations
  • 37 from Europe, 7 from North America, 5 from Asia, 5 companies
• 69 contributions plus some extra sessions

Attendees
Casa Convalescència de l’Hospital de Sant Pau de Barcelona

Tracks and Trends (1)
Miscellaneous, Basic IT, Grids, Clouds and Virtualisation

• Miscellaneous: 4 Contributions, 1h10’
  • CSBS is OUR journal. Let’s use it and publish our work! Many interesting things this week – consider
    writing an article
• Basic IT Services: 4 Contributions, 1h40’
  • Using messaging services to improve scalability and reliability of systems management
  • New approach in the CERN Authentication and Authorisation management
• Grids, Clouds, Virtualisation: 3 Contributions, 1h10’
  • Data lake proposal: To optimise costs, will need to switch to QoS (performance, reliability etc.) for storage –
    major change (technical, cultural, social)

Tracks and Trends (1)
Owen Synge: Why I like Python

Tracks and Trends (1)
AAI: The new CERN Authentication and Authorisation Infrastructure

Further Reading:
• The Road to the new CERN Authentication
• CERN Authentication and Authorization Infrastructure Design

Tracks and Trends (2)
Computing and Batch Services

• Computing and Batch Services: 9 Contributions, 3h35’
  • Benchmarks: 2 main activities
    • Fast benchmark for estimating the job slot CPU power: LHCb DB12 adopted by LHCb and ALICE, but not
      appropriate for procurements
    • Next-generation benchmark for estimating the installed capacity and for procurements: work still in
      progress: SPECcpu 2017, set of HEP apps?
  • AMD EPYC Architecture:
    • Promising evaluation by BNL, will restore some competition on the server CPU market
    • Performance and price competitive with Intel; need to see adoption by server vendors
    • Next generation next year will bring many more cores
  • Improving OpenMP scaling using openssl: using a set of "openssl speed" commands to optimize OpenMP
    performance
  • Photon science support: several HEP sites involved with big projects for the coming decade
    • A 1st workshop at BNL discussing the specific issues, in particular those caused by the loose links
      between users and the photon facility
    • Idea of organising a workshop focused on computing for photon science co-located with HEPiX, like we did
      for LHCOPN/LHCONE
  • Talk on Remote Analysis Efforts at ALBA
  • Commissioning CERN Tier-0 reconstruction workloads on Piz Daint at CSCS: configuring and optimizing Piz
    Daint at CSCS for running ATLAS and CMS Tier-0 workloads
  • PDSF: Current Status and Migration to Cori
  • Spare compute resources of EOS storage nodes at CERN have been enabled to run user jobs in containers
Tracks and Trends (3)
End-User Services and Operating Systems

• End-User Services and Operating Systems: 7 Contributions, 2h55’
  • CERN service management: Focus on Service Catalogue, bringing services on board, the User experience, and
    Tool configuration
  • CERN Linux services: Update on CC7, SLC6, RHEL support distributions and services; software collections,
    virtualization, openstack SIGs; anaconda plugin, lockup; community work on alternative architectures; Koji
    and Gitlab; Future Support for Lightweight Containers, CC8, Freeipa, s3 for static content
  • Indico: Upgrade to 2.1! Roadmap for next releases: new room booking, internationalisation, paper
    reviewing, CalDAV support, ...
  • Reducing Dependencies on commercial Software (providers): investigating open-source products to provide
    services and avoid vendor lock-in
  • Container Orchestration: Enable hosting large range of Web applications within CERN; opportunity to
    consolidate all web hosting on common infrastructure
  • Jupyter-based analysis portal at BNL: Supports containerized applications started via a seamless
    integration into local batch with authentication token delegation (see next slide)
  • Rust: Worth a serious look by Python developers, good compromise between C++ and Python

Tracks and Trends (3)
End-User Services and Operating Systems

User Consulting & tickets (3)
What CERN learned

• Lightweight, easy-to-use forms to help with input
  •      Qualify input during ticket creation
  •      Offer KB articles (and ease KB article publication in parallel)
• User feedback is valuable
  •      Easy to access feedback forms generate more feedback
• Transparency: all services

Tracks and Trends (4)
IT Facilities

• IT Facilities: 4 Contributions, 1h40’
   • Technology watch working group kicked off with 58 people subscribed. Number of subgroups covering
     individual domains, e.g. processors, memory, etc. Further volunteers to do the actual work are needed
   • Cost model WG report: better understanding of the workloads via defined metrics; define a common
     framework for estimating resources, to then look at scenarios to make improvements
   • Latest CERN procurements, changes in the team, impact of recent issues and technology changes
   • Superfacility at NERSC: introduce common workflow and API for access from multiple sciences and users to
     the facility (rather than dedicated access per science/user)

What could possibly go wrong? (4)
Miscellaneous – NDGF – 1

• Construction works interrupted network access for the University of Linköping
  •      There should have been two separate tracks
  •      Those were clearly shown in the provider's papers, but …

What could possibly go wrong? (4)
Miscellaneous – NDGF – 2

• Battery driven UPS
  •      Electric current had been monitored, but …
  •      … one day a colleague smelled acid and the rack was hot (76 °C)
  •      Sustained electric current had risen from 1 A to 5 A
  •      Unnoticed, as 5 A peaks are common
  •      Should have been monitored:
         • Voltage
         • Resistance
         • Temperature
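
The failure mode on this slide, a sustained rise hidden among normal short peaks, can be illustrated with a rolling median, which ignores isolated spikes but follows a lasting shift. This is a minimal sketch and not what NDGF runs: the function name `sustained_rise`, the window length, the 1 A baseline, the factor of 3 and the sample values are all assumptions made up for illustration.

```python
from statistics import median


def sustained_rise(samples, window=5, baseline=1.0, factor=3.0):
    """Return the index of the first sample at which the rolling median
    over the last `window` samples exceeds `factor * baseline`, or None.

    All parameters are illustrative assumptions, not values from the talk.
    """
    threshold = baseline * factor
    for end in range(window, len(samples) + 1):
        if median(samples[end - window:end]) > threshold:
            return end - 1
    return None


# Hypothetical current readings in amperes.
# Short 5 A peaks on a 1 A baseline: the median stays at 1 A, so no alarm.
peaky = [1.0, 1.0, 5.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0, 1.0]
# Slow drift from 1 A to a sustained 5 A: the median follows it.
drift = [1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 5.0, 5.0]

print(sustained_rise(peaky))  # None
print(sustained_rise(drift))  # 7
```

A simple "current above 5 A" threshold would treat both series the same way, which is roughly why the rise went unnoticed; the same rolling check could be applied to voltage, resistance or temperature readings.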

What could possibly go wrong? (4)
Miscellaneous – NDGF – 3

• Control cabinet got hot
  •      Electric arc
  •      Loose bolt on the neutral conductor kept the phase-to-neutral (P-N) voltage fluctuating between 210 and
         250 V (230 V P-N nominal)

Tracks and Trends (5)
Networking and Security

• Networking and Security: 11 Contributions, 4h30’

Computer Security
Update
Liviu Vâlsan
For The CERN Computer Security Team
HEPiX Autumn 2018, Barcelona
Google NOT disclosing user data breach

• In March 2018 Google finds a bug that allowed third-party app developers to access user data for which they
  didn’t have permission
• Google officials in leaked memo:
  • Disclosure will likely result “in us coming into the spotlight alongside or even instead of Facebook
    despite having stayed under the radar throughout the Cambridge Analytica scandal”
  • The disclosure would also invite “immediate regulatory interest”
• No way to know who was affected, logs kept for two weeks only

Sources: The Guardian and The Wall Street Journal
Tracks and Trends (6)
Site Reports

• Site Reports: 16 Contributions, 4h00’
  •      HPC sites / clusters more and more used by HEP, and discussed in HEPiX
  •      Computing farms extended to clouds or HPC resources (unified pool)
         • HTCondor + ARC CE a common choice
  •      More improvements / upgrades on network, enabling IPv6 everywhere now
  •      Sites need to cope with new and significant resource needs of their users
  •      SURFsara, the Dutch supercomputing centre, at HEPiX for the first time. Very active in Grid activities for
         non-WLCG communities. Interesting among its several activities: WebDAV security, and RCauth, proxies
         without certificates
  •      KISTI: notable data centre relocation; KISTI Grid CA system based on a Hardware Security Module
  •      LAL+GRIF: expansion to new data centre slower than expected, mainly because of administrative problems
  •      NERSC: move from PDSF to SLURM, move to Cori, CVMFS on the Cray

Tracks and Trends (7)
Storage and File Systems

• Storage and File Systems: 11 Contributions, 4h25’, plus two BoF Sessions
  • Tape Performance and accompanying infrastructure: centres have been busy w/ ATLAS Data Carousel; CERN
    field-testing CTA
  • Seven specialist presentations in Birds-of-a-Feather session leaving little time for discussion, but
    (hopefully) continued theme at HEPiX
  • Impact of vendor release changes on storage systems and experiments timelines: example of Oracle
    Databases at CERN
  • Large deployment of Ceph storage cluster (OSiRIS) around the Great Lakes: multi-site data placement
    challenging, caching very rewarding
  • Three industry talks focussing on tape media and tape drive tech developments, and on optimising flash
    storage
  • Tools for data management being renewed and adapted (also touched upon in BoF)
    • Interesting R&D on tape drive characterisation (as essential information needed is not provided by
      vendors) to improve tape operations efficiency
  • Optimizing EOS based storage: splitting up storage metadata servers in multiple instances to remedy
    performance and operational issues
  • Introducing CephFS + Manila-based file storage as replacement of classic NFS file services
  • Overview of large backup infrastructures: 'new' themes both in infrastructure and in user requirements
Tape technology (7)
Storage

  •      Fujifilm (sponsored talk)
  •      1 EB/month of tapes sold in Europe (~30% of market, US ~40%)
  •      Metal particles are only ~20%, barium ferrite (BaFe) already ~80%
         • IBM 3592 drives 30% faster than LTO8
         • IBM 3592 drives 3x faster than the write speed of newer HDDs
         • LTO8 = 10 TB, 3592JE = 18 TB per tape in 2019
         • 4 m² = 10 PB
         • Bit error rate (BER) 1000x better in comparison to LTO6 and SSDs
  •      Strontium ferrite in the future
         • Up to 400 TB per tape capacity possible
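
As a back-of-the-envelope check of the capacities quoted above, the following sketch works out how many cartridges 10 PB would take. It uses only the per-tape figures from the slide and assumes decimal units (1 PB = 1000 TB); the helper name `cartridges_needed` is hypothetical, and compression or library overhead is ignored.

```python
import math


def cartridges_needed(total_tb, tb_per_cartridge):
    """Number of whole cartridges needed to hold total_tb terabytes."""
    return math.ceil(total_tb / tb_per_cartridge)


TARGET_TB = 10 * 1000  # 10 PB, assuming 1 PB = 1000 TB

# Per-tape capacities as quoted on the slide (TB).
capacities = {"LTO8": 10, "3592JE": 18, "strontium ferrite (future)": 400}

for name, tb in capacities.items():
    print(f"{name}: {cartridges_needed(TARGET_TB, tb)} cartridges")
# LTO8: 1000 cartridges
# 3592JE: 556 cartridges
# strontium ferrite (future): 25 cartridges
```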

Board Meeting

• Current and Next Meetings
  • Confirmed 2019:
    • 25 – 29 March: UCSD / SDSC, San Diego, CA, USA
    • 14 – 18 October: Nikhef, Amsterdam, The Netherlands
  • Quite firm ideas about 2020 and 2021
  • Some expressions of interest even beyond 2021
  • Expressions of interest and proposals still very welcome
• Working Groups
  • Most WGs reported during the week
  • Batch monitoring now started well
• Infrastructure
  • Little hiccup with the Web site’s certificate, solved
  • Logos to be made available on Web site
• Discussion about Protecting HEPiX Name / Logo
• Pepe Flix invited to become board member

Ad-hoc BoF on AAI at Sites

• Wednesday 13:30 h
• Well attended
• Multiple sites planning to review (and partially redo) their stack
• Interest in digging into the issue further
• Paolo Tedesco and Dave Kelsey volunteered to take things in hand
  • Series of remote meetings has started; volunteers still welcome 
  • When enough interest is present and long-term goals are agreed upon, could turn into a Working Group

Special Thanks to Local Organisers

Miscellaneous
Not to forget

• IoT
  •      any concepts available?
  •      security: access
  •      security: patching
  •      Which kind of devices
  •      Protocols
  •      …

• Computing and Software for Big Science
  •      Online journal
  •      One paper edition per year
  •      Springer is involved for publishing
  •      Input/articles are welcome.

Conclusions
More PICs here

●   Thomas's 2 cents:
    ●      Update Indico
    ●      Observe AAI Group
    ●      IoT coming
    ●      Be Secure

●   Peter's 2 cents:
    ●      Many CERN contributions focus on infrastructure measures that we have already carried out (or are
           currently working on)
           ●    Networks, WLAN, configuration and management tools, storage management, performance evaluations
    ●      In compute, storage integration (EOS, CERNBox, Jupyter & SWAN), virtualisation and self-service they
           are ahead of us in some places
    ●      More enterprise-scale projects: use of messaging infrastructure for technical workflows, monitoring,
           automated reaction (CERNMegaBus)
           ●    None of this is new; we heard about the first approaches under Quattor & Lemon more than 10 years
                ago, yet it is still relevant!

See you in San Diego / CA (USA)!

                              25 – 29 March 2019
