www.beegfs.io
Marco Merkel, VP WW Sales and Consulting - HPC Advisory Council
April 2019
Quick Facts: BeeGFS
 ▪ Hardware-independent POSIX parallel file system (software-defined parallel storage)
 ▪ Designed for performance, scalability, robustness & ease of use
 ▪ Easy to install and maintain (runs in user space)
 ▪ Hardware agnostic: x86, ARM, OpenPOWER, AMD
 ▪ Runs on common Linux distros: RHEL, SLES, Ubuntu, …
 ▪ No Linux kernel patches; runs on top of ext4, XFS, ZFS, …
 ▪ Supports RDMA / RoCE & TCP, Omni-Path
 ▪ NFS, CIFS and Hadoop enabled
 ▪ Robust & flexible (any I/O pattern)

 [Diagram: a file under /mnt/beegfs/dir1 is striped in chunks across Storage Servers #1–#5, with its metadata on Metadata Server #1]

 Simply grow capacity & performance to the level that you need, non-disruptively (see the command sketch below)
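 As a minimal, read-only illustration of the striping and of the "simply grow" point above, a sketch of two commands run on a client node; this assumes the standard beegfs-utils package and the /mnt/beegfs/dir1 path from the diagram:

      # Show how a directory/file is striped across the storage targets:
      $ beegfs-ctl --getentryinfo /mnt/beegfs/dir1
      # Report capacity and usage per metadata and storage target; targets on
      # newly added servers simply appear here and start receiving new chunks:
      $ beegfs-df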

First point of contact for BeeGFS worldwide
         Founded in 2014 - Fraunhofer spin-off
                    Cooperative development together with Fraunhofer
                    (Fraunhofer continues to maintain a core BeeGFS HPC team)

         ThinkParQ (TPQ) delivers consulting, professional services & support for BeeGFS

         Development is self-funded

         Made in Germany

     L3-support go-to-market approach
          Partners deliver turnkey solutions and 1st & 2nd level support

Delivering solutions for

             HPC         AI / Deep Learning   Life Sciences   Oil and Gas

BeeGFS – The Leading Parallel Cluster File System

   [Diagram: the Client Service uses Direct Parallel File Access to the Metadata Service and the Storage Service]

   Performance: well balanced from small to large files
   Ease of Use: easy to deploy and integrate with existing infrastructure
   Scalability: increase file system performance and capacity, seamlessly and non-disruptively
   Robust: high-availability design enabling continuous operations
Enterprise Features
    BeeGFS Enterprise Features:
      High Availability                       link
      Quota Enforcement                       link   (see the sketch below)
      Access Control Lists (ACLs)             link
      Storage Pools                           link
      Burst buffer function with BeeOND       link

    Support Benefits:
              Professional Support
              Customer Portal (training, documentation, scripts)
              Special repositories with early updates and hotfixes
              Guaranteed next-business-day response
              Remote support via SSH
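
    As an illustration of the quota enforcement feature listed above, a minimal command sketch; the group ID and the limits are made-up examples, and quota tracking/enforcement is assumed to be enabled in the server configuration:

      # Set a space and inode limit for one group, then report its usage:
      $ beegfs-ctl --setquota --gid 1000 --sizelimit=10T --inodelimit=1000000
      $ beegfs-ctl --getquota --gid 1000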

Concept

BeeGFS Architecture

   [Diagram: the Client Service uses Direct Parallel File Access to the Metadata Service and the Storage Service]

        Client Service
            Native Linux module to mount the file system
        Management Service
            Service registry and watchdog
        Metadata Service
            Maintains striping information for files
            Not involved in data access between file open/close
        Storage Service
            Stores the (distributed) file contents
        Graphical Administration and Monitoring Service
            GUI to perform administrative tasks and monitor system information
            Can be used for a “Windows-style installation“
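
   A minimal sketch of how the services described above can be inspected from a client node, assuming the standard beegfs-utils package; output is omitted:

      # List the registered services per type:
      $ beegfs-ctl --listnodes --nodetype=management
      $ beegfs-ctl --listnodes --nodetype=meta
      $ beegfs-ctl --listnodes --nodetype=storage
      $ beegfs-ctl --listnodes --nodetype=client
      # Show which connections (RDMA/TCP) this client currently uses to reach them:
      $ beegfs-net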

High Availability

 ▪ Built-in replication for High Availability
 ▪ Flexible setting per file, directory, target (see the sketch below)
 ▪ Individual for metadata and/or storage
 ▪ Buddies can be in different racks or different fire zones

 [Diagram: Storage Servers #1–#4 with Targets #101, #201, #301, #401; Targets #101/#201 form Buddy Group #1 and Targets #301/#401 form Buddy Group #2]
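
 A minimal sketch of how the buddy groups in the figure might be created and how mirroring is enabled for one directory, assuming beegfs-ctl from a standard installation; the target and group IDs follow the figure and are illustrative:

      # Pair the storage targets into buddy groups, as in the figure:
      $ beegfs-ctl --addmirrorgroup --nodetype=storage --primary=101 --secondary=201 --groupid=1
      $ beegfs-ctl --addmirrorgroup --nodetype=storage --primary=301 --secondary=401 --groupid=2
      # Optionally enable metadata mirroring as well:
      $ beegfs-ctl --mirrormd
      # Enable buddy mirroring for new files below one directory:
      $ beegfs-ctl --setpattern --pattern=buddymirror /mnt/beegfs/dir1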

Storage Pool

 ▪ Support for different types of storage (see the sketch below)
 ▪ Modification Event Logging
 ▪ Statistics in a time series database

 [Diagram: the Storage Service split into a Performance Pool for current projects and a Capacity Pool for finished projects]
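
 A minimal sketch of how a capacity pool like the one in the figure might be created and assigned to a directory, assuming BeeGFS 7.x and beegfs-ctl; the pool description, target IDs and path are illustrative:

      # Create a second pool from selected targets and list the result:
      $ beegfs-ctl --addstoragepool --desc=capacity --targets=301,401
      $ beegfs-ctl --liststoragepools
      # New files below this directory land on the capacity pool (here pool ID 2):
      $ beegfs-ctl --setpattern --storagepoolid=2 /mnt/beegfs/finished_projects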

BeeOND – BeeGFS On Demand

 ▪ Create a parallel file system instance on-the-fly
 ▪ Start/stop with one simple command
 ▪ Use cases: cloud computing, test systems, cluster compute nodes, …
 ▪ Can be integrated into the cluster batch system
 ▪ Common use case: per-job parallel file system
      Aggregate the performance and capacity of local SSDs/disks in the compute nodes of a job
      Take load off the global storage
      Speed up "nasty" I/O patterns

 [Diagram: Compute Nodes #1…#n each contribute local storage to a per-job BeeOND instance, with user-controlled data staging to/from the global storage]
The easiest way to setup a parallel filesystem…
      # GENERAL USAGE…
      $ beeond start -n <nodefile> -d <local data directory> -c <client mount point>

      -------------------------------------------------

      # EXAMPLE…
      $ beeond start -n $NODEFILE -d /local_disk/beeond -c /my_scratch

      Starting BeeOND Services…
      Mounting BeeOND at /my_scratch…
      Done.
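
      Tearing the instance down is a single command as well; the options below are
      taken from the SLURM epilog shown later in this deck rather than invented here:

      # STOP…
      $ beeond stop -n $NODEFILE -L -d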

Live per-Client and per-User Statistics
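
 A rough command-line counterpart to the live statistics shown on this slide, assuming the standard beegfs-utils package; the node type and refresh interval are illustrative:

      # Live I/O statistics per client, refreshed every 5 seconds:
      $ beegfs-ctl --clientstats --nodetype=storage --interval=5
      # The same, broken down per user:
      $ beegfs-ctl --userstats --nodetype=storage --interval=5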

Cluster Manager Integration

BeeGFS and BeeOND

Scale from small

                         Converged Setup

Into Enterprise

                         [Diagram: dedicated Storage Service machines, with many clients using Direct Parallel File Access]
to BeeOND

                         [Diagram: NVMe-backed Storage Service instances scaled out as a BeeOND setup]
BeeGFS Use Cases
HPC & Cognitive Infrastructures….

What is NVMesh?
 ▪ Software-defined SAN for ultra-low latency over fabrics
 ▪ Patented technology to access remote NVMe, bypassing the target-side CPU
 ▪ Scale-out
      Multiple drives, multiple servers
 ▪ Logical volumes
      Data protection and high availability
 ▪ Extremely easy & flexible management
      Web GUI, command line and REST API

Project “Stingr”: NVMesh + BeeGFS in a Box
 ▪ Perfectly balanced BigTwin: elegant and dense
      4x servers in 2U, 24x NVMe, 8x NICs
      Runs all NVMesh components and BeeGFS servers (separate clients for this benchmark)
 ▪ Unleashed performance
      Random 4K write IOPS boosted to >1 million (x2.5)
      File creates boosted to >600,000/s (x3)
      Reduced CPU load

Typical AI System with NVMesh + BeeGFS in a Box

                         The DGX-2 systems here are very happy not to be starved of I/O :-)

Alfred Wegener Institute for Polar and Marine Research

      Institute was founded in 1980 and is named
      after meteorologist, climatologist and geologist Alfred Wegener.
      Government funded
      Conducts research in the Arctic, in the Antarctic and in the high
      and mid latitude oceans
      Additional research topics are:
           North Sea research
           Marine biological monitoring
           Technical marine developments
      Current mission: In September 2019 the icebreaker Polarstern will
      drift through the Arctic Ocean for 1 year with 600 team
      members from 17 countries & use the data gathered to take
      climate and ecosystem research to the next level.

Day-to-day HPC operations @AWI

 CS400
      11,548 cores
      316 compute nodes:
                2x Intel Xeon Broadwell 18-core CPUs
                64GB RAM (DDR4 2400MHz)
                400GB SSD
      4 fat compute nodes, as above but with 512GB RAM
      1 very fat node, 2x Intel Broadwell 14-core CPUs, 1.5TB RAM
      Intel Omni-Path network
      1024TB fast parallel file system (BeeGFS)
      128TB home and software file system

 Global BeeGFS storage on spinning disks
      1PB of scratch_fs providing 80GB/s
 BeeOND burst buffer on the 316 compute nodes
      Each node equipped with a 400GB SSD
      316 x 500MB/s per SSD ≈ 158GB/s aggregate BeeOND burst
AIST (National Institute of Advanced Industrial Science & Technology)

 ▪ Japanese research institute located in the Greater Tokyo Area
 ▪ Over 2,000 researchers
 ▪ Part of the Ministry of Economy, Trade and Industry

 ABCI (AI Bridging Cloud Infrastructure)
 ▪ Japanese supercomputer in production since July 2018
 ▪ Theoretical performance of 130 PFLOPS – one of the fastest in the world
 ▪ Will make its resources available through the cloud to various private and public entities in Japan
 ▪ #7 on the Top500 list

 System configuration:
      1,088 servers
      Two Intel Xeon Gold CPUs per server (a total of 2,176 CPUs)
      Four NVIDIA Tesla V100 GPU computing cards per server (a total of 4,352 GPUs)
      Intel SSD DC P4600 (NVMe) as local storage, 1.6TB per node (a total of about 1.6PB)
      InfiniBand EDR
      Simple integration with Univa Grid Engine

CSIRO

    CSIRO (Commonwealth Scientific and Industrial Research Organisation) has adopted the
    BeeGFS file system for their 2PB all-NVMe storage in Australia, making it one of the
    largest AI NVMe storage systems in the world.

    Overview:
      4x metadata servers
      32x storage servers
      2 PiB usable capacity, Dell all-NVMe (24x 3.2TB NVMe per storage server)
      Look forward to ISC to see what the beast can do!
      Further details: http://www.pacificteck.com/?p=437
AI Integration(s)
 Singularity

 Script insert for SLURM prolog, on the first node of a job:

      if [ "$SLURM_JOB_PARTITION" = "beeond" ]; then
          LOGDIR="/global/opt/slurm/default/var/log/slurm/AWI_logs"
          NODEFILE="$LOGDIR/slurm_nodelist.$SLURM_JOB_ID"
          echo >> $LOGDIR/slurm_beeond.$SLURM_JOB_ID 2>&1
          date >> $LOGDIR/slurm_beeond.$SLURM_JOB_ID 2>&1
          whoami >> $LOGDIR/slurm_beeond.$SLURM_JOB_ID 2>&1
          hostname >> $LOGDIR/slurm_beeond.$SLURM_JOB_ID 2>&1
          /global/opt/slurm/default/bin/scontrol show hostnames $SLURM_NODELIST > $NODEFILE 2>&1
          /usr/bin/beeond start -n $NODEFILE -d /mnt/lssd1/.beeond_data -c /mnt/beeond -P -F -L /tmp >> $LOGDIR/slurm_beeond.$SLURM_JOB_ID 2>&1
      fi

 Script insert for SLURM epilog:

      if [ "$SLURM_JOB_PARTITION" = "beeond" ]; then
          LOGDIR="/global/opt/slurm/default/var/log/slurm/AWI_logs"
          NODEFILE="$LOGDIR/slurm_nodelist.$SLURM_JOB_ID"
          echo "stop --" >> $LOGDIR/slurm_beeond.$SLURM_JOB_ID 2>&1
          date >> $LOGDIR/slurm_beeond.$SLURM_JOB_ID 2>&1
          /usr/bin/beeond stop -n $NODEFILE -L -d >> $LOGDIR/slurm_beeond.$SLURM_JOB_ID 2>&1
      fi

 The paths of SLURM and the name of the BeeOND data directory are installation-specific.
Budget Information

BeeGFS Pricing

  Component            Pricing basis                    Price
  MDS Server           per node / per year              1,230 €
  OSS Server           per node / per year              1,780 €
  BeeOND               up to 100 clients / per year     8,340 €
  BeeOND               up to 500 clients / per year     11,120 €
  BeeOND               > 500 clients / per year         16,670 €
  Training                                              1,200 €
  SSH Installation                                      1,200 €

BeeGFS Benchmark Tools
                   https://www.beegfs.io/wiki/Benchmark
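
                   As a sketch of the built-in benchmark described on the wiki page above, assuming a test system whose storage targets may be loaded; block size, file size and thread count are illustrative:

      # Streaming write benchmark run directly on all storage targets:
      $ beegfs-ctl --storagebench --alltargets --write --blocksize=512K --size=10G --threads=4
      # Poll results, then remove the benchmark files:
      $ beegfs-ctl --storagebench --alltargets --status
      $ beegfs-ctl --storagebench --alltargets --cleanup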

Evaluating the Metadata Performance
    https://www.beegfs.io/docs/whitepapers/Metadata_Performance_Evaluation_of_BeeGFS_by_ThinkParQ.pdf

BeeGFS – customer/partner portal
•   https://www.beegfs.io/login/wiki2/TableOfContents

Partners
 Platinum Partners       Gold Partners

Technology Partners

Web      www.beegfs.io

Mail     training@beegfs.io
         info@thinkparq.com
         support@beegfs.io

Newsletter
www.beegfs.io/news

                                          Follow BeeGFS:

             marco.merkel@thinkparq.com
