CERNVM PROGRAM OF WORK 2021 - JAKOB BLOMER FOR THE CERNVM TEAM SFT MEETING 22 FEBRUARY 2021 - CERN INDICO

CernVM Program of Work 2021

Jakob Blomer for the CernVM Team
SFT Meeting
22 February 2021
Infrastructure: WLCG

 [Diagram labels: Agent, WLCG squid, Stratum 0/1]

 • Software, containers, auxiliary data for HEP
 • LIGO, EUCLID, LSST, EESSI, and many others
 • CERN Stratum 0s fully on Ceph S3

 Available in the default configuration:
 • ∼1.4 B files
 • ∼125 repositories

Code Works

 • Among the top 1.5 % of active open source projects
 • Steady 50–100 commits per month
 • ∼30 000 LOC changed in 2020

Review of 2020

 Highlights
 • Scaling up CernVM-FS container hub: 800+ images on /cvmfs/unpacked.cern.ch
 • Container runtime support: containerd/k8s, podman [GSoC contribution]
 • Major improvements to container conversion service (DUCC)

 • New and improved publishing workflows
 • Template transactions: ultra-fast, meta-data only publishing
 • Ephemeral writable shell: technical foundation to publish from anywhere
 • Fine-grained publisher monitoring
 • Performance improvements: parallelized garbage collection and storage gateway services
 → now fast enough to publish all LHCb nightlies (1.5 k packages, >10 M files) before the start of the working day
 • Commissioning of the gateway services for LHCb nightlies
 • Experimental support for Microsoft Azure blob storage [Microsoft contribution]
 • CernVM 5 prototype, EL8 based
 • Infrastructure modernization: web presence, CI pipeline, VM & storage replacement
 • Dissemination: pre-GDB, EGI webinar, EGI Clinic, IPDPS’20 (with U Notre Dame),
 CernVM virtual workshop 1–2 February 2021 with 99 registered participants

Review of 2020

 Highlights
 • Scaling up CernVM-FS container hub: 800+ images on /cvmfs/unpacked.cern.ch
 • New and improved publishing workflows
 • Performance improvements
 • CernVM 5 prototype, EL8 based
 • Infrastructure modernization
 • ...

 Unfinished Tasks
 • Container conversion status REST API & dashboard
 • Shared, external cache manager for multi-container host
 • Client pre-caching (due to reduced summer student programme)
 • In progress:
   • Transition of publishing code to new libcvmfs_server
   • Connecting ephemeral writable shell to gateway services

Platform Support

 [new] = commissioned in 2020

 • A Platforms:
 • EL 7–8
 • Ubuntu 16.04, 18.04, 20.04 [new]

 • B Platforms:
 • macOS 10.15, 11 Big Sur (M1 + Intel) [new]
 • SLES 11–12
 • Fedora, latest two versions
 • Debian 8–10
 • EL7 AArch64
 • IA32 architecture
 • Linux on Windows via WSL-2 [new]
 • Client packaged as a container (for container-only Linux distros such as Atomic Host) [new]

Highlights: New Web Site

 https://cernvm.cern.ch
 • Moved from Drupal to Jekyll
 • Modern look, responsive design

Highlights: New Monitoring Site

 Repository monitor
 • Hosted on CERN OpenShift
 • Based on JavaScript and libcvmfs
 • Easy to add your repository: add repository metadata and submit pull request
 • JSON API

Highlights: CernVM Forum

 https://cernvm-forum.cern.ch
 • Discourse forum for CernVM-FS and the CernVM appliance
 • Many nice features: searchable, questions can be marked as resolved, ...
 • Supposed to eventually replace mailing lists

Highlights: JSROOT Powered Fine-Grained Publish Monitoring

 CVMFS_UPLOAD_STATS_DB=true
 Demo

 • Statistics are generated with ROOT
 • Uploaded as static files to Stratum 0 storage
 • Interactive plots (JavaScript / JSROOT)

Highlights: Template Transactions

 $ cvmfs_server transaction \
     -T /amd64-gcc9/4.2=/amd64-gcc9/4.2-patches \
     sw.cvmfs.io

 [Directory tree diagram]
 /cvmfs/sw.cvmfs.io
   amd64-gcc9
     4.2
       ChangeLog
       ...
     4.2-patches   (template clone of 4.2)
       ChangeLog
       ...

 • As part of opening the transaction, “4.2” is cloned to “4.2-patches”
 • Meta-data only copy, thus extremely fast: observed 50 kHz file publish rate
 • Only changes on top need to be published

 Used in fast container image ingestion
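
 Why a template clone can be meta-data only: a minimal sketch, assuming a toy catalog that maps paths to content hashes while file data lives elsewhere in content-addressed storage. The function name clone_subtree, the bin/tool entry and the hash values are hypothetical; this is not the CernVM-FS catalog format or API.

```python
# Toy model (assumption): the catalog is a dict of path -> content hash.
# File contents live in content-addressed storage and are never copied.

def clone_subtree(catalog, src, dst):
    """Return a new catalog in which every entry under src also appears under dst."""
    src = src.rstrip("/")
    dst = dst.rstrip("/")
    cloned = dict(catalog)
    for path, content_hash in catalog.items():
        if path == src or path.startswith(src + "/"):
            # Only a catalog entry is added; the hash points at existing data.
            cloned[dst + path[len(src):]] = content_hash
    return cloned


# Hypothetical catalog content for illustration.
catalog = {
    "/amd64-gcc9/4.2/ChangeLog": "hash-a",
    "/amd64-gcc9/4.2/bin/tool": "hash-b",
}
patched = clone_subtree(catalog, "/amd64-gcc9/4.2", "/amd64-gcc9/4.2-patches")

# Only the change on top of the template introduces new content.
patched["/amd64-gcc9/4.2-patches/ChangeLog"] = "hash-c"
```

 In this model the cost of the clone depends on the number of catalog entries, not on the size of the files, which is consistent with the high file publish rate quoted above.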

Highlights: Ephemeral Publish Shell

 • A new command, cvmfs_server enter, creates a sub-shell with a writable /cvmfs
 • Uses internally user namespaces and fuse-overlayfs
 • Works unprivileged on any modern Linux (e. g. EL8) that can mount the client
 • Could eventually be used to publish directly from any node to a gateway;
 however, the 2.8 release provides only an ephemeral writable shell as a first step

 $ cvmfs_server enter hsf.cvmfs.io
 ...Opens a shell with write access to /cvmfs/hsf.cvmfs.io
 $ cvmfs_server diff --worktree
 ...Close shell, back to read-only mode

 Solves the main technology challenge in moving away from a dedicated publisher node,
 i. e. publish from anywhere!

Highlights: Infrastructure Modernization

 • Migration of 26 OpenStack VMs (builders, web services, etc.);
 campaign triggered by hypervisor decommissioning – we’d prefer automatic migrations in the future
 • Migration of ∼1 TB project storage from NFS to Ceph-FS and Ceph S3
 • Migration of ∼15 build & test jobs to new Jenkins server
 • Commissioning of GitHub pull request builder: allows us to fully test changes before merging

 There was an exceptional amount of infrastructure work in 2020.
 We count on this work being amortized over the coming years.

CernVM / CernVM-FS Program of Work 2021

Developer Power

 Name                Role       2020     2021

 Jakob Blomer        Staff      50 %     50 %
 TBS                 Staff      —        50 %
 Simone Mosciatti    Fellow     100 %    25 %
 Jan Priessnitz      Tech       60 %     —
 Andrea Valenzuela   Tech       33 %     66 %
 TBS                 Tech       —        33 %
 FTE                            ∼2.4     ∼2.25

 Significant contributors: Mohit Tyagi (GSoC student), Enrico Bocchi (IT-ST), Dave Dykstra (FNAL)

CernVM Calendar

 [Timeline figure, 12/19 to Q2/22: CernVM-FS v2.7 with 2.7.x bugfix releases, v2.8, v2.9; CernVM 4.5; CernVM’21; CernVM’22 (NIKHEF)]

 Consolidation & Improvements

 Ongoing effort to consolidate CernVM-FS developments in a single repository,
 e. g. gateway services and containerd plugin scheduled for merging

CernVM Appliance Plan of Work for 2021

 1. Ready to use platform for LHC experiment production and development
 2. Reference platform for long-term data preservation

 • 10 000+ booted VMs / day
 • 45 % of all ATLAS simulation jobs in 2020 ran at Point 1 on CernVM!
 • CernVM bootloader + reference containers covering EL 4–7
 • Interactive support: cernvm-launch and cernvm-online.cern.ch

 2021 Plan of Work
 • Maintenance updates for CernVM 4 [est 1 FTW]
 • Migration of cernvm-online.cern.ch to new single sign-on system [est 1 FTW]
 • Stretch goal: CernVM 5 pre-production release [est 1 FTM]

CernVM-FS Plan of Work for 2021

 1. Maintenance and support
 2. Consolidation tasks
 3. Seamless container image ingestion
 4. Kubernetes-native publisher (in collaboration with SPI)
 5. Client performance improvements for very large applications (e. g. TensorFlow)

Maintenance and Support

 Significant maintenance and support load

 Key figures from 2020:
 • 450 mails on support mailing lists
 • 40 bug fixes merged

Consolidation Tasks

 • Addressing open issues: bugfix sprint [est 1 FTM]
 • Addressing known shortcomings of the gateway services
 • Trigger garbage collection from remote publishers [est 1 FTW]
 • Use template transaction from remote publishers [est 1 FTW]
 • Transaction wait queue to prevent starvation of concurrent publishers [est 1 FTW]
 • Full repository tagging support [est 1 FTW]
 • Rebase gateway receiver on new libcvmfs_server [est 1 FTW]
 • Source code repository consolidation [est 1 FTW]
 • New platforms: SLES15 (for HPC), Debian 11 [est 1 FTW]
 • macOS binary signatures [est 1 FTW]
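
 To illustrate what a transaction wait queue buys: a minimal sketch, assuming a single lease that is handed out strictly in arrival order, so that no waiting publisher can be overtaken indefinitely. The class TransactionQueue and the publisher names are hypothetical; the gateway services' actual lease handling is not described on this slide.

```python
from collections import deque


class TransactionQueue:
    """Grant one transaction lease at a time, strictly in arrival order."""

    def __init__(self):
        self._waiting = deque()
        self._holder = None

    def request(self, publisher):
        """Ask for the lease; return True if it was granted immediately."""
        if self._holder is None:
            self._holder = publisher
            return True
        self._waiting.append(publisher)
        return False

    def release(self, publisher):
        """Release the lease and pass it to the longest-waiting publisher."""
        if publisher != self._holder:
            raise ValueError(f"{publisher} does not hold the lease")
        self._holder = self._waiting.popleft() if self._waiting else None
        return self._holder


queue = TransactionQueue()
queue.request("publisher-a")  # granted immediately
queue.request("publisher-b")  # waits
queue.request("publisher-c")  # waits behind publisher-b

# Drain the queue and record the order in which leases were granted.
order = ["publisher-a"]
while (next_holder := queue.release(order[-1])) is not None:
    order.append(next_holder)
```

 Without such a queue, publishers that simply retry on a busy repository can lose the race repeatedly; first-in, first-out hand-over bounds the wait by the number of publishers ahead in the queue.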

Future-proofing: Next-Generation Server Code

 Legacy Code
 A set of tools targeted for a dedicated release manager machine, and the interactive
 workflow open transaction + copy + commit

 New Architecture
 [Layer diagram, top to bottom]
   CLI | GW receiver | REST API | ···
   libcvmfs_server: commit changeset, GC, tag management, ...
   PUT/GET storage abstraction
 A common base library providing repository transformation primitives, on top of which
 higher-level publish abstractions can be built

 Initial CLI commands ported to libcvmfs_server: info, diff, transaction, enter.
 Foundation for future maintainability and other consolidation tasks (e. g. gateway services)

 Plan for 2021: port complete publish workflow to libcvmfs_server, including
 transaction abort & commit, tagging, garbage collection [est 2 FTM]
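
 The layering can be pictured with a minimal sketch, assuming a two-call storage interface and a single commit primitive built on top of it. All names here (Storage, MemoryStorage, commit_changeset) are hypothetical and are not the libcvmfs_server API; the use of SHA-1 content hashes is likewise an assumption of the sketch.

```python
import hashlib
from abc import ABC, abstractmethod


class Storage(ABC):
    """Minimal PUT/GET storage abstraction (hypothetical interface)."""

    @abstractmethod
    def put(self, key, data):
        """Store data under key."""

    @abstractmethod
    def get(self, key):
        """Return the data stored under key."""


class MemoryStorage(Storage):
    """In-memory backend; an S3 or local-disk backend would offer the same two calls."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


def commit_changeset(storage, changeset):
    """Library-level primitive: store each file under its content hash.

    Returns a mapping of path -> content hash for the committed files.
    """
    catalog = {}
    for path, data in changeset.items():
        digest = hashlib.sha1(data).hexdigest()
        storage.put(digest, data)
        catalog[path] = digest
    return catalog


# A CLI, a gateway receiver or a REST API would all call the same primitive.
storage = MemoryStorage()
catalog = commit_changeset(storage, {"/lcg/README": b"hello"})
```

 The point of the design is that the front ends differ only in how they collect a changeset; the commit logic and the storage backend are shared.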
Seamless Container Image Ingestion

 Approach
 Users develop containers with the standard tools and services (GitLab, Docker Hub, etc.).
 For their large-scale deployment, we want to automatically ingest them into /cvmfs/unpacked.cern.ch

 Container Publishing
 • Based on working prototype, commission web-hook connection from standard registry to
   CernVM-FS [est 2 FTW]
 • Based on working prototype, integrate fast merging of image layers [est 2 FTW]
 • Dashboard and status API: display current activity, list of hosted images, etc. [est 1 FTM]
 • Develop standard benchmark for publish throughput to assess supported scale of user
   container ingestion [est 1 FTM, summer student]

 Container Engine Integration

 Engine        Type      CernVM-FS Support
 singularity   flat      native
 docker        layers    graph driver [1]
 containerd    layers    remote snapshotter
 podman        layers    extra image store

 [1] Expected to be replaced by containerd remote snapshotter

 Review and improve documentation, examples, integration tests for different deployment
 options [est 2 FTW]

Usability Milestones

 • Implement publishing to gateway services from ephemeral writable shell;
 relies on libcvmfs_server consolidation tasks [est 1 FTM]
 • Based on the ephemeral publish container (see before), demonstrate a kubernetes-native publish
 workflow in collaboration with SPI [est 1 FTM]
 • Implement a client pre-caching mechanism to improve cold-cache start-up performance of very large
 applications (e. g. TensorFlow); design ready, planned as GSoC project [est 2 FTM]

 • Stretch goal: shared, external cache manager for multi-container host [est 1 FTM]
 • Stretch goal: restart activity on CernVM-FS Conveyor (see backup slides)
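
 What pre-caching means in practice: a minimal sketch, assuming a known list of files that an application touches at start-up. Reading a file on a CernVM-FS client pulls it into the local cache, so reading the list ahead of time in parallel warms the cache. The planned design is not described on this slide; the helper names read_fully and prefetch are hypothetical, and the demo uses temporary files as stand-ins for paths under /cvmfs.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_fully(path, chunk_size=1 << 20):
    """Read a file to the end and return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total


def prefetch(paths, workers=8):
    """Read all paths in parallel; return the total number of bytes touched."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(read_fully, paths))


# Demo: temporary files stand in for libraries under /cvmfs.
demo_dir = tempfile.mkdtemp()
demo_paths = []
for i in range(3):
    demo_path = os.path.join(demo_dir, f"lib{i}.so")
    with open(demo_path, "wb") as f:
        f.write(b"x" * 1000)
    demo_paths.append(demo_path)

warmed_bytes = prefetch(demo_paths)
```

 Threads are sufficient here because the work is I/O bound; the benefit comes from overlapping many network fetches instead of issuing them one by one as the application opens files.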

Community Interaction

 • Developers and operators meet in a monthly coordination call (no changes for 2021)
 • Weekly operations coffee with IT-SM (no changes for 2021)

 • New CernVM forum is supposed to take over from mailing lists
 • Mattermost is becoming an important channel for information exchange between developers and power users

 • Two publications in preparation for vCHEP 2021
 • A CernVM-FS powered container hub (with IT-ST)
 • Performance engineering LHCb nightly builds publishing (with LHCb, IT-ST)
 • Frontiers in Big Data publication in preparation on containerised analysis workflows with
 kubernetes (with CMS)
 • Conferences and workshops on the radar: experiment computing weeks, GDB, HEPIX, ACAT

 • Stretch goal: repository content manager training course for software librarians [est 2 FTW]

Summary
Outlook and Goals for 2021

 Main Priorities for 2021
 1. Consolidation and exploitation of the CernVM-FS new services and features
 2. Improve usability and scale of CernVM-FS based container deployments
 3. Demonstration of a kubernetes-native publishing workflow (with SPI)

 The team successfully addressed a number of technology challenges in the last 12-18 months, in
 particular CernVM-FS integration with the container ecosystem, unprivileged client deployments
 (crucial for HPC access) and containerized publishing. In 2021, the new developments will undergo a
 phase of consolidation and hardening.

Backup Slides
Stretch Goal: CernVM-FS Conveyor

 A high-level abstraction of writing based on interdependent publication jobs.

 Current approach
 $ ssh cvmfs-sft.cern.ch
 $ cvmfs_server transaction sft.cern.ch /lcg/ROOT
 $ tar -xf ROOT-6.18.tar.gz
 $ post-install.sh
 $ cvmfs_server publish

 {
   "repository": "sft.cern.ch",
   "path": "/lcg/ROOT",
   "payload": "https://root.cern.ch/ROOT-6.18.tar.gz",
   "script": "https://spi.cern.ch/post-install.sh",
   "uuid": "e7b67a2...",
   "dependencies": ["f61d...", "a00e...", "..."]
 }

 • Send jobs to Conveyor API
 • Conveyor distributes work to multiple publisher nodes

 Goal: liberate CI pipeline from handling cvmfs_server intrinsics.
 Prototype available, est 1–2 months to develop into a first usable version in collaboration with SPI
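
 How interdependent jobs could be ordered: a minimal sketch, assuming each job lists the identifiers of the jobs it depends on, as in the "dependencies" field of the job description. A topological sort then yields an order in which every job runs after its dependencies. The job identifiers and the /lcg/gcc and /lcg/Geant4 paths are hypothetical, and this is not the Conveyor scheduler itself.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs, keyed by a shortened identifier.
jobs = {
    "job-root": {
        "repository": "sft.cern.ch",
        "path": "/lcg/ROOT",
        "dependencies": ["job-gcc"],
    },
    "job-gcc": {
        "repository": "sft.cern.ch",
        "path": "/lcg/gcc",
        "dependencies": [],
    },
    "job-geant4": {
        "repository": "sft.cern.ch",
        "path": "/lcg/Geant4",
        "dependencies": ["job-root", "job-gcc"],
    },
}


def publication_order(jobs):
    """Return job identifiers so that each job comes after its dependencies."""
    graph = {job_id: set(job["dependencies"]) for job_id, job in jobs.items()}
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


order = publication_order(jobs)
```

 Jobs that do not depend on each other could be handed to different publisher nodes at the same time; the sketch only fixes the ordering constraint.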