FriDAQ Infrastructure - CERN Indico

Page created by Frederick Garcia
 
FriDAQ Infrastructure

                       Benjamin Moritz Veit

                         8 February 2021

Benjamin Moritz Veit       FriDAQ Infrastructure   8 February 2021   1 / 24
Selection of Server Hardware
    The use of the AMD EPYC architecture in other DAQ systems (e.g. LHCb) led us to look into this architecture.

    The AMD EPYC multi-chip module (MCM) architecture differs considerably from Intel's single-die configuration → advantages in NUMA configuration.

    Higher I/O capability (PCIe 4.0) and more lanes per CPU.

    Better price/performance figure than Intel.

    AMD EPYC 7002 architecture:
    Chip          Cores/Threads   Max. freq.   TDP     Cache    Cost   Cost/Core
    EPYC 7282     16/32           2.8 GHz      120 W    64 MB    650   40
    EPYC 7402P    24/48           2.8 GHz      180 W   128 MB   1300   54
    EPYC 7542     32/64           2.9 GHz      225 W   128 MB   2660   83
    EPYC 7262      8/16           3.2 GHz      155 W   128 MB    650   81
    EPYC 7232P     8/16           3.1 GHz      120 W    32 MB    460   58

    → The EPYC 7282 has the best price/performance ratio!
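
The Cost/Core column can be re-derived from the Cores and Cost columns. The following is a minimal sketch with the values copied from the table; the dictionary layout and function name are illustrative only, and the cost unit is whatever the table uses. The results agree with the table to within rounding.

```python
# Core counts and costs copied from the table above (unit as given there).
chips = {
    "EPYC 7282": {"cores": 16, "cost": 650},
    "EPYC 7402P": {"cores": 24, "cost": 1300},
    "EPYC 7542": {"cores": 32, "cost": 2660},
    "EPYC 7262": {"cores": 8, "cost": 650},
    "EPYC 7232P": {"cores": 8, "cost": 460},
}


def cost_per_core(chip):
    """Cost divided by the number of physical cores."""
    return chip["cost"] / chip["cores"]


# Sort the chips from cheapest to most expensive per core.
ranking = sorted(chips, key=lambda name: cost_per_core(chips[name]))
best = ranking[0]

for name in ranking:
    print(f"{name:<11} {cost_per_core(chips[name]):6.1f} per core")
print("lowest cost per core:", best)
```

Cost per core is only a proxy for price/performance; clock frequency, cache size and TDP are not weighted in this sketch.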
Supermicro 1014S-WTRT

    Single AMD EPYC 7002 processor
    8x DIMMs (ECC DDR4-3200MHz)
    2x PCI-E 4.0 x16 (FHFL) slots
    1x PCI-E 4.0 x16 (LP) slot
    4x hot-swap 3.5" SATA3 drive bays
    2x PCIe/SATA3 NVMe M.2
    2x 10GBase-T (Broadcom BCM57416)
    Integrated IPMI 2.0 + KVM (dedicated port)

AMBER Server V1
    Single AMD EPYC 7282 processor (16C/32T)
    64GB (ECC DDR4-3200MHz)
    512GB NVMe SSD
    4x 3.5" HDD bays
    Expansion cards:
        Nvidia ConnectX-4 Lx 2x 25Gbit (default)
        LSI MegaRAID SAS 9380-8e (optional, +750 Euro)
        Spillbuffer card (optional)

    Universal system for all computation nodes in the AMBER DAQ. Allows up to 3x PCIe expansion cards and 4x 3.5" HDDs.

    Cost point ≈ 2200 Euro (base system)

Local Storage
    [Diagram: local storage data path, top to bottom]

    JBOD Disk Chassis (384TB each) ...
       | 4x MiniSAS3 (12Gbit/s)
    ReadOut Engines ...
       | 25Gbit/s SFP+ DAC
    QFX5120-48Y 25Gbit Switch
       | 25Gbit/s SFP+ DAC
    HLT Nodes ...

Supermicro SuperChassis CSE-846BE2C-R1K03JBOD

    4U storage JBOD chassis
    24x 3.5" hot-swappable HDD bays
    8x Mini-SAS HD ports
    1x IPMI port for remote system power on/off and system monitoring
    Dual expander backplane boards support SAS3/2 HDDs with 12Gb/s throughput
    1000W (1+1) 96% efficient Titanium-level power supplies

    24x 16TB = 384TB raw capacity
    System cost: 24x 320 Euro + 1500 Euro + 750 Euro ≈ 10000 Euro
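
The capacity and cost figures above can be checked with a few lines of arithmetic. This is only a sketch: the slide does not label the 1500 Euro and 750 Euro terms, so reading them as chassis and RAID controller is an assumption (the 750 Euro figure matches the optional controller price quoted for the AMBER Server V1), and the variable names are illustrative.

```python
# Values from the slide; the labels of the last two cost terms are assumptions.
disks = 24
disk_capacity_tb = 16
disk_cost_eur = 320
chassis_cost_eur = 1500      # assumed: JBOD chassis
controller_cost_eur = 750    # assumed: RAID controller

raw_capacity_tb = disks * disk_capacity_tb
total_cost_eur = disks * disk_cost_eur + chassis_cost_eur + controller_cost_eur
cost_per_tb_eur = total_cost_eur / raw_capacity_tb

print(f"raw capacity: {raw_capacity_tb} TB")
print(f"total cost:   {total_cost_eur} Euro")
print(f"cost per raw TB: {cost_per_tb_eur:.1f} Euro")
```

The exact sum is 9930 Euro, which the slide rounds to ≈ 10000 Euro. The per-TB figure refers to raw capacity; any RAID or file-system overhead would raise it.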

DAQ General Overview
    [Diagram: DAQ data flow from the front-ends to the HLT nodes. Stages from left to right, with the caption given below each stage in the figure:]

    FrontEnd (FE)
        FE sends continuous hit information in images.
    LV0 Multiplexer
        Multiplexing on image level and buffering.
    Crosspoint Switch (72x72)
        Load balancing of links between LV0 and LV1+ MUX.
    LV1+ Multiplexer
        Multiplexing on TimeSlice level and buffering.
    DAQ Switch (8x8), 51.2Gbit/s
        Multiplexing of data which belongs to one TimeSlice to one ReadOut Engine.
        Max rate: 8x 6.4Gb/s
    SpillBuffer PCIe / ReadOut Engine with JBOD Local Storage
        Receives data and saves it in local storage.
        Local storage for O(1-2 weeks).
    HLT Node
        Fetches data from local storage for filtering.

    Further labels in the figure: Max 120 input links; 24 interlinks; AMBER Network (25Gbit); CERN Tape Archive (20Gbit/s).

    Possible scheme for maximum configuration:
        96 input links on the cross-point switches
        8x8 DAQ switch
        → max 8 readout engines → 8x 650 Mbyte/s = 5.2 Gbyte/s (sustained)
        Can be doubled by adding a second DAQ switch!
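
The rates quoted for the maximum configuration can be cross-checked against each other and against the size of the local storage. The sketch below uses decimal units and makes two assumptions that are not stated on the slides: each readout engine writes to one 384TB JBOD chassis (raw capacity, no RAID overhead), and `duty_cycle` is a hypothetical parameter for the fraction of time in which data is actually written.

```python
engines = 8                 # max readout engines behind one 8x8 DAQ switch
link_gbit_s = 6.4           # per DAQ-switch output, from "8x6.4Gb/s"
sustained_mbyte_s = 650     # per readout engine, from the slide
jbod_capacity_tb = 384      # assumed: one JBOD chassis per readout engine

peak_gbit_s = engines * link_gbit_s
sustained_gbyte_s = engines * sustained_mbyte_s / 1000
sustained_gbit_s = sustained_gbyte_s * 8


def retention_days(capacity_tb, rate_mbyte_s, duty_cycle=1.0):
    """Days until the storage is full at the given average write rate."""
    seconds = capacity_tb * 1e12 / (rate_mbyte_s * 1e6 * duty_cycle)
    return seconds / 86400


print(f"peak switch rate:  {peak_gbit_s:.1f} Gbit/s")
print(f"sustained to disk: {sustained_gbyte_s:.1f} Gbyte/s = {sustained_gbit_s:.1f} Gbit/s")
print(f"retention at 100% duty cycle: {retention_days(jbod_capacity_tb, sustained_mbyte_s):.1f} days")
print(f"retention at 50% duty cycle:  {retention_days(jbod_capacity_tb, sustained_mbyte_s, 0.5):.1f} days")
```

Under these assumptions continuous writing at 650 Mbyte/s fills one chassis in about a week, so the O(1-2 weeks) retention quoted in the overview is plausible once beam duty cycle is taken into account.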
FriDAQ Rack Planning
    [Diagram: three server racks, contents listed from top to bottom]

    Server Rack 1:
        4x 24xMTP-Patch-Panel
        2x 4xMTP->LC
        Ethernet Switch
        aTCA 24xMUX1
        TCS Distribution
        Ethernet Switch
        TCS Distribution
        aTCA 24xMUX1
        aTCA 24xMUX1
        TCS Distribution
        Ethernet Switch
        TCS Distribution
        aTCA 24xMUX1
      Power:
        4x 100W Switches     = 0.4kW
        4x 6 Slot x 200W     = 4.8kW
        Total                  5.2kW

    Server Rack 2:
        2x X-Switch
        aTCA 8xMUX2
        TCS Distribution
        aTCA Switch TimeSlice Builder
        DB Master
        FileServer
        Web Server
        DAQ Master
        Switch
        RE01, Storage1
        RE02, Storage2
        RE03, Storage3
        RE04, Storage4
      Power:
        8x 350W Server       = 2.8kW
        4x 250W Disk Array   = 1.0kW
        2x 2 Slot x 200W     = 0.8kW
        2x 100W (X-Switch)   = 0.2kW
        1x 100W Switch       = 0.1kW
        Total                  4.9kW

    Server Rack 3:
        Router
        NetworkMaster
        GW01, GW02
        HLT01 ... HLT08
        Ethernet Switch
        RE05, Storage5
        RE06, Storage6
        RE07, Storage7
        RE08, Storage8
      Power:
        12x 300W HLT         = 3.6kW
        4x 250W Disk Array   = 1.0kW
        1x 100W Switch       = 0.1kW
        1x 300W Router       = 0.3kW
        3x 150W Small Server = 0.5kW
        Total                  5.5kW
https://schroff.nvent.com/en-gb/search#q=ATCA

                         → Power requirements: 2x 16 A per rack
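
The per-rack totals and the 2x 16 A requirement can be checked with a short calculation. This is a sketch under an assumption that is not on the slide: each 16 A feed is taken to be single-phase at 230 V. The item lists are the (count, watts) pairs from the power budget in the rack plan.

```python
# (count, watts per unit) pairs copied from the rack planning slide.
racks = {
    "Rack 1": [(4, 100), (4 * 6, 200)],
    "Rack 2": [(8, 350), (4, 250), (2 * 2, 200), (2, 100), (1, 100)],
    "Rack 3": [(12, 300), (4, 250), (1, 100), (1, 300), (3, 150)],
}

MAINS_VOLTAGE = 230   # assumption: single-phase 230 V feeds
FEEDS = 2
FEED_CURRENT_A = 16

single_feed_w = FEED_CURRENT_A * MAINS_VOLTAGE
available_w = FEEDS * single_feed_w


def rack_power_w(items):
    """Sum of count * watts over all items in one rack."""
    return sum(count * watts for count, watts in items)


totals = {name: rack_power_w(items) for name, items in racks.items()}

for name, total in totals.items():
    headroom = available_w - total
    print(f"{name}: {total / 1000:.2f} kW, headroom {headroom / 1000:.2f} kW")
```

With this assumption two feeds provide 7.36 kW per rack, which covers all three racks, while a single 16 A feed (3.68 kW) would not carry any of them alone.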
Network

CERN IT has routed the subnet 172.22.0.0/18 (172.22.0.1 - 172.22.63.254 [16,384 addresses]) via our new layer-3 switch since December 2020.

VLANs are used to separate the subnets. Communication between subnets is handled by a central layer-3 switch. DHCP/DNS is provided by a new, separate network gateway.
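
The address range quoted for the routed subnet can be verified with Python's standard `ipaddress` module. This is a small sketch; only the prefix comes from the slide.

```python
import ipaddress

# Routed subnet as stated on the slide.
subnet = ipaddress.ip_network("172.22.0.0/18")

# First and last usable host address (network and broadcast excluded).
first_host = subnet.network_address + 1
last_host = subnet.broadcast_address - 1

print("addresses in block:", subnet.num_addresses)
print("usable host range: ", first_host, "-", last_host)
```

The figure of 16,384 counts every address in the block; two of them (network and broadcast address) are not usable as host addresses.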

Network Scheme
    [Diagram: network scheme]

    CERN Network: CERN HP Router (172.22.24.1)
       | 20Gbit LAG (COMPASS DOMAIN)
    COMPASS CR Bld. 892: COMPASS Juniper Switch (172.22.24.23)
       | VLAN Trunk (all) to PCCOGW00 (172.22.24.241): COMPASS/GPN gateway and DHCP/DNS for private VLANs
       | VLAN Trunk (all) to the Experimental Area: Access Switch ... Access Switch

    VLANs on the COMPASS Juniper Switch:

    Network             Subnet              VlanID   l3-interface
    COMPASS DOMAIN      172.22.24.0/24      10       172.22.24.23
    COMPASS IPBus       10.152.0.0/16       50       10.152.0.5
    COMPASS SlowCTRL    192.168.104.0/24    70       192.168.104.5
    COMPASSPriv         192.168.101.0/24    60       192.168.101.5
    IPMI                192.168.100.0/24    30       192.168.100.5
    AMBER               172.22.28.0/22      128      172.22.28.1
    AMBER SlowCTRL      172.22.32.0/22      132      172.22.32.1
    AMBER IPBus         172.22.36.0/20      136      172.22.36.1

Routing between the COMPASS and AMBER networks might be required if we want to use the COMPASS control room?!
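
A VLAN plan like the one in the network scheme can be sanity-checked for alignment and overlaps with the standard `ipaddress` module. The sketch below uses the subnets exactly as they appear on the slide; the helper name and result lists are illustrative. It flags the AMBER IPBus entry: 172.22.36.0 is not a /20 boundary, and the enclosing /20 block overlaps the AMBER SlowCTRL subnet. Whether the slide meant a different prefix length (a /22 would continue the pattern of the neighbouring AMBER subnets) cannot be decided from the slide alone.

```python
import ipaddress
from itertools import combinations

# Subnets as printed in the network scheme.
vlans = {
    "COMPASS DOMAIN": "172.22.24.0/24",
    "COMPASS IPBus": "10.152.0.0/16",
    "COMPASS SlowCTRL": "192.168.104.0/24",
    "COMPASSPriv": "192.168.101.0/24",
    "IPMI": "192.168.100.0/24",
    "AMBER": "172.22.28.0/22",
    "AMBER SlowCTRL": "172.22.32.0/22",
    "AMBER IPBus": "172.22.36.0/20",
}
routed = ipaddress.ip_network("172.22.0.0/18")


def parse(cidr):
    """Return (network, has_host_bits); misaligned prefixes are normalised."""
    try:
        return ipaddress.ip_network(cidr), False
    except ValueError:
        return ipaddress.ip_network(cidr, strict=False), True


parsed = {name: parse(cidr) for name, cidr in vlans.items()}

# Entries whose address is not on the boundary implied by the prefix length.
misaligned = [name for name, (_, bad) in parsed.items() if bad]

# Pairs of VLANs whose (normalised) address ranges overlap.
overlaps = [
    (a, b)
    for (a, (net_a, _)), (b, (net_b, _)) in combinations(parsed.items(), 2)
    if net_a.overlaps(net_b)
]

# VLANs that lie inside the block routed by CERN IT.
inside_routed = [name for name, (net, _) in parsed.items() if net.subnet_of(routed)]

print("misaligned:", misaligned)
print("overlapping:", overlaps)
print("inside 172.22.0.0/18:", inside_routed)
```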
New central layer3 switch

    Juniper QFX-5120-48Y
    48x 25 GbE (SFP28)/10 GbE (SFP+)/1 GbE (SFP) downlink ports
    8x 100 GbE (QSFP28)/40 GbE (QSFP+) uplink ports
    Up to 4 Tbps L2 and L3 performance (bidirectional)
    Latency as low as 550 nanoseconds
    2.9 GHz quad-core Intel CPU with 16 GB memory and 100 GB SSD storage

Status of Network Installation

    Central layer-3 network switch installed and configured
    All switches in the area are updated and configured
    All switches are connected to the Juniper switch as star point
    New NETGW00 deployed with OPNsense as OS; it acts as DNS and DHCP server
    Network configuration of the servers in the COMPASS network is adapted
Network Map

        All network equipment is integrated in Zabbix monitoring!
Speed towards CTA

The connection towards the new Central Tape Archive was tested during
                          the 2020 Dry Run.

                          Over 7 Gbit/s archived!

The discussion about a bigger uplink towards the CERN data centers is still ongoing (min. 20 Gbit/s).

Fiber Distribution Status Quo

    Old-style SC connectors / low-density installation / aging
    → Multiple conversions between connector types
    LC-SC/SC-MTP/MTP-LC... Multiple points of failure!
    Lack of fibers, especially at BMS/CEDARs, Target, Gallery, Trigger
    → Use of multiplexers in the experimental area to concentrate fibers
    Sensitive to radiation! (Problems at Gallery and CEDARs in 2018)
    Change from the classical read-out scheme to trigger-less read-out
    → Additional fibers needed to avoid multiple MUXs in the area.
    → Higher speed of serial links is not compatible with the current fibers.
    Avoid radiation issues in future hadron runs!

MTP/MPO Technology

High density Fiber Connectors - used also in all new DAQ developments.

         MTP-24 with OM3/4 fiber as new standard for the DAQ.
              https://www.samm.com/en/page/109/mpo-mtp-frequently-asked-questions.html

Plans:

Remove all multiplexers from the area (at least from the high-radiation locations), run direct connections to the DAQ barracks, and place the multiplexers there...

    [Map of the experimental area with the fiber positions 0-9 marked]

Test Area (CEDAR)

    [Map of the test area with two TPC locations and position 1 marked]

To connect our equipment at the two test locations, ≈20-25 m of patch cables are needed.

Do we want to have a direct run from BMS to our DAQ or go over (1) as a patch-point?

Status

    Position   Name       FE fibers      Estimated     Newly        Comment
                          (old setup)    new fibers    installed
    0          BMS         14             72             0
    1          CEDAR       12             72             0          Covered by BE
    2          Target     341            432           144          To be installed
    3          SM2        401            432           144          Installed
    4          Gallery    305            360             0
    5          ECAL2      390            432             0
    6          VETO        12             72            24          Installed
    7          Trigger     24             72            48          Installed
    8          BeamDump    12             72             0
    9          RICH       180            216           144          Installed

We have to decide on the BeamDump position if we want to use it as a test location for PRM!
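
The fiber counts in the status table can be converted into MTP-24 trunk counts and an installation progress figure. This is a rough sketch: it assumes that every new fiber runs in a 24-fiber MTP-24 trunk, and it treats the 144 fibers at the Target position as pending because their comment reads "To be installed". The tuple layout and names are illustrative.

```python
import math

FIBERS_PER_TRUNK = 24  # assumption: all new fibers run in MTP-24 trunks

# (position, name, old fibers, estimated new fibers, newly installed, comment)
status = [
    (0, "BMS", 14, 72, 0, ""),
    (1, "CEDAR", 12, 72, 0, "Covered by BE"),
    (2, "Target", 341, 432, 144, "To be installed"),
    (3, "SM2", 401, 432, 144, "Installed"),
    (4, "Gallery", 305, 360, 0, ""),
    (5, "ECAL2", 390, 432, 0, ""),
    (6, "VETO", 12, 72, 24, "Installed"),
    (7, "Trigger", 24, 72, 48, "Installed"),
    (8, "BeamDump", 12, 72, 0, ""),
    (9, "RICH", 180, 216, 144, "Installed"),
]


def trunks(fibers):
    """Number of 24-fiber trunks needed to carry the given fiber count."""
    return math.ceil(fibers / FIBERS_PER_TRUNK)


total_old = sum(row[2] for row in status)
total_new = sum(row[3] for row in status)
installed = sum(row[4] for row in status if row[5] == "Installed")
pending = sum(row[4] for row in status if row[5] == "To be installed")
trunks_total = trunks(total_new)

print(f"old fibers: {total_old}, estimated new fibers: {total_new}")
print(f"MTP-24 trunks needed: {trunks_total}")
print(f"installed: {installed} fibers ({installed / total_new:.0%}), pending: {pending}")
```

All estimated fiber counts in the table are multiples of 24, so each position maps onto a whole number of trunks under this assumption.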
New Fibers

New Fiber patch panels
    DAQ-Barracks:
        48x MTP-24 in 2x 1U
        96x LC to 8x MTP-24 in 2x 1U

    SM2-Position:
        11x MTP-24
        24x LC

    RICH-Position:
        11x MTP-24
        24x LC

    Material for the Target position has arrived but still has to be installed.
General Status

Order/Delivery Status

    Goal for tests in 2021:
    General infrastructure + 1x complete read-out chain running ...

    4x AMBER Server V1 (1x File-Server, 1x DB-Server, 1x ReadOutEngine, 1x HLT)
        (2x already delivered, 2x ordered)

    2x low-cost server (1x Gateway, 1x Web Server)
        (2x ordered)

    2x 24-port 1 GbE network switch
        (not yet ordered, waiting for final offer)

    8x 16 TB HDD
        (8x ordered, arrived today at CERN)

    1x storage array + RAID controller
        (not yet ordered, waiting for budget (≈ 10 kEuro))
