Hardware Based Compression in Ceph OSD with BTRFS - SNIA

 
Hardware Based Compression in Ceph
         OSD with BTRFS
          Weigang Li (weigang.li@intel.com)
            Brian Will (brian.will@intel.com)
       Praveen Mosur (praveen.mosur@intel.com)
        Edward Pullin (edward.j.pullin@intel.com)

                         Intel Corporation

        2016 Storage Developer Conference. © Intel Corp. All Rights Reserved.
Agenda
  Motivation
    Data compression: value vs. cost
  Hardware based compression in Ceph OSD
    Benefit of hardware acceleration
    Compression algorithms
    Compression in BTRFS & Ceph
    PoC implementation
  Benchmark configuration & result

Compression: Why

                                 More Data

               10x Data Growth from 2013 to 2020¹
           •    Digital universe doubling every two years
           •    Data from the Internet of Things will grow 5x
           •    % of data that is analyzed grows from 22% to 37%

               Compression can save storage capacity

 ¹ Source: April 2014, EMC* Digital Universe with Research & Analysis by IDC*

Compression: Cost
  Compression tool   lzo      gzip-1   gzip-6   bzip2
  real (s)           6.37     22.75    55.15    83.74
  user (s)           4.07     22.09    54.51    83.18
  sys (s)            0.79     0.64     0.59     0.52
  cRatio %           51%      38%      33%      28%

  [Chart: time in seconds (left axis, 0 to 90) and cRatio % (right axis, 0% to 60%) per compression tool]

  • Compress 1GB Calgary Corpus* file on one CPU core (HT).
  • Compression ratio: lower is better.
       cRatio = compressed size / original size
  • CPU intensive: a better compression ratio requires more CPU time.

  Source as of August 2016: Intel internal measurements with dual E5-2699 v3 (18C, 2.3GHz, 145W), HT & Turbo Enabled, Fedora 22 64 bit, DDR4-128GB
  Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Any difference in system hardware or software design or configuration may affect actual performance. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. For more information go to http://www.intel.com/performance

* The Calgary Corpus is a collection of text and binary data files, commonly used for comparing data compression algorithms.
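
The cRatio definition above (compressed size / original size) and the time-versus-ratio trade-off can be reproduced in miniature with Python's standard library. This is only a sketch under stated assumptions: the sample data is synthetic rather than the Calgary Corpus, LZO is left out because the standard library has no LZO binding, `zlib.compress` at levels 1 and 6 stands in for gzip-1 and gzip-6, and the helper name `measure` is hypothetical.

```python
import bz2
import time
import zlib


def measure(name, compress, data):
    """Return (name, elapsed seconds, cRatio) for one compressor."""
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    # cRatio = compressed size / original size (lower is better)
    return name, elapsed, len(compressed) / len(data)


# Synthetic, compressible sample (NOT the Calgary Corpus used in the slides).
sample = b"".join(
    b"record %d: the quick brown fox jumps over the lazy dog\n" % i
    for i in range(20000)
)

results = [
    measure("deflate-1", lambda d: zlib.compress(d, 1), sample),
    measure("deflate-6", lambda d: zlib.compress(d, 6), sample),
    measure("bzip2", lambda d: bz2.compress(d), sample),
]

for name, elapsed, c_ratio in results:
    print(f"{name:10s} {elapsed:8.4f} s   cRatio = {c_ratio:.0%}")
```

Absolute numbers will differ from the slide because the data set, hardware and tools differ; the point is only how the two columns are derived.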

Hardware Acceleration - Intel® QuickAssist Technology
  • Bulk Cryptography: security (symmetric encryption and authentication) for data in flight and at rest
  • Public Key Cryptography: secure key establishment (asymmetric encryption, digital signatures, key exchange)
  • Compression: lossless data compression for data in flight and at rest

  Form factors:
  • Chipset: connects to CPU via on-board PCI Express* lanes
  • PCI Express* Plugin Card: connects to CPU via off-board PCI Express* lanes (slot)
  • SoC: connects to CPU via on-chip interconnect

  Intel® QuickAssist Technology integrates hardware acceleration of compute intensive workloads (specifically, cryptography & compression) on Intel® Architecture Platforms

Benefit of Hardware Acceleration
  Compression tool   lzo      accel-1 *   accel-6 **   gzip-1   gzip-6   bzip2
  real (s)           6.37     4.01        8.01         22.75    55.15    83.74
  user (s)           4.07     0.49        0.45         22.09    54.51    83.18
  sys (s)            0.79     1.31        1.22         0.64     0.59     0.52
  cRatio %           51%      40%         38%          38%      33%      28%

  [Chart: time in seconds (left axis, 0 to 90) and cRatio % (right axis, 0% to 60%) per compression tool; compress 1GB Calgary Corpus file]

  Less CPU load, better compression ratio

  * Intel® QuickAssist Technology DH8955 level-1
  ** Intel® QuickAssist Technology DH8955 level-6

 Source as of August 2016: Intel internal measurements with dual E5-2699 v3 (18C, 2.3GHz, 145W), HT & Turbo Enabled, Fedora 22 64 bit, 1 x DH8955 adaptor, DDR4-128GB
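
To make the "less CPU load" claim concrete, the timings in the table above can be reduced to CPU seconds per tool. The sketch below assumes that CPU time can be approximated as user + sys; that reading of the table is an assumption, and the helper name `cpu_seconds` is hypothetical.

```python
# Timings in seconds, copied from the table above.
timings = {
    "lzo":     {"real": 6.37,  "user": 4.07,  "sys": 0.79},
    "accel-1": {"real": 4.01,  "user": 0.49,  "sys": 1.31},
    "accel-6": {"real": 8.01,  "user": 0.45,  "sys": 1.22},
    "gzip-1":  {"real": 22.75, "user": 22.09, "sys": 0.64},
    "gzip-6":  {"real": 55.15, "user": 54.51, "sys": 0.59},
    "bzip2":   {"real": 83.74, "user": 83.18, "sys": 0.52},
}


def cpu_seconds(tool):
    """CPU time approximated as user + sys (an assumption, see lead-in)."""
    entry = timings[tool]
    return entry["user"] + entry["sys"]


for tool in timings:
    print(f"{tool:8s} CPU time = {cpu_seconds(tool):6.2f} s")

# How many times more CPU time software needs than the accelerator per level.
ratio_level_1 = cpu_seconds("gzip-1") / cpu_seconds("accel-1")
ratio_level_6 = cpu_seconds("gzip-6") / cpu_seconds("accel-6")
print(f"gzip-1 / accel-1 = {ratio_level_1:.1f}x")
print(f"gzip-6 / accel-6 = {ratio_level_6:.1f}x")
```

With these figures, accel-1 spends 1.80 s of CPU time against 22.73 s for gzip-1, while reaching a similar cRatio (40% against 38%).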

BTRFS Introduction
  Copy on Write (CoW) filesystem for Linux.
  “Has the correct feature set and roadmap to serve
   Ceph in the long-term, and is recommended for
   testing, development, and any non-critical
   deployments… This compelling list of features
   makes btrfs the ideal choice for Ceph clusters”*
  Native compression support.
     Mount with “compress” or “compress-force”.
     ZLIB / LZO supported.
     Compress up to 128KB each time.
* http://docs.ceph.com/docs/hammer/rados/configuration/filesystem-recommendations/
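
The "compress up to 128KB each time" behaviour can be pictured with a user-space analogue. This is a minimal sketch, not BTRFS code: it assumes each 128KB slice is compressed as an independent zlib stream, and it does not model the BTRFS on-disk format. The function names are hypothetical; level 3 mirrors the ZLIB level used later in the benchmark.

```python
import zlib

CHUNK_SIZE = 128 * 1024  # 128KB, the per-call limit stated in the slide


def compress_in_chunks(data, level=3):
    """Compress data as independent zlib streams of at most CHUNK_SIZE input bytes."""
    return [
        zlib.compress(data[offset:offset + CHUNK_SIZE], level)
        for offset in range(0, len(data), CHUNK_SIZE)
    ]


def decompress_chunks(chunks):
    """Reverse compress_in_chunks by decompressing and concatenating each stream."""
    return b"".join(zlib.decompress(chunk) for chunk in chunks)


payload = b"ceph object data " * 40000  # 680,000 bytes of compressible data
chunks = compress_in_chunks(payload)
restored = decompress_chunks(chunks)

compressed_size = sum(len(chunk) for chunk in chunks)
print(len(chunks), "chunks, cRatio =", compressed_size / len(payload))
```

Because every chunk is self-contained, a reader only has to decompress the chunk that holds the bytes it needs, at the cost of a slightly worse ratio than compressing the whole payload at once.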

                               2016 Storage Developer Conference. © Intel Corp. All Rights Reserved.
Compression Algorithm
   BTRFS currently supports LZO and ZLIB:
      The Lempel-Ziv-Oberhumer (LZO) compression is a
       portable and lossless compression library that focuses on
       compression speed rather than data compression ratio.
      ZLIB provides lossless data compression based on the
       DEFLATE compression algorithm.
           LZ77 + Huffman coding
           Good compression ratio, slow
   Intel® QuickAssist Technology supports:
      DEFLATE: LZ77 compression followed by Huffman coding
       with GZIP or ZLIB header.
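
The difference between the GZIP and ZLIB variants is only the wrapper around the same DEFLATE stream. The sketch below uses Python's `zlib` module to show this; the helper name `deflate` and the sample data are hypothetical, and nothing here talks to an accelerator.

```python
import zlib

data = b"hardware and software must agree on the stream format " * 100


def deflate(payload, wbits, level=6):
    """Compress with DEFLATE; wbits selects the wrapper around the stream."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return compressor.compress(payload) + compressor.flush()


zlib_stream = deflate(data, 15)   # zlib header and Adler-32 trailer
gzip_stream = deflate(data, 31)   # gzip header and CRC-32 trailer
raw_stream = deflate(data, -15)   # raw DEFLATE, no wrapper

# wbits=47 (32 + 15) auto-detects a zlib or gzip header on decompression.
from_zlib = zlib.decompress(zlib_stream, wbits=47)
from_gzip = zlib.decompress(gzip_stream, wbits=47)
from_raw = zlib.decompress(raw_stream, wbits=-15)

print(zlib_stream[:2].hex(), gzip_stream[:2].hex())
```

Since the wrapped payload follows a standard format, any conforming decompressor can read a stream regardless of which engine produced it.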
Hardware Compression in BTRFS

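  • BTRFS compresses the page buffers before writing to the storage media.
  • The Linux kernel crypto framework (LKCF) selects the hardware engine for compression.
  • Data compressed by hardware can be decompressed by the software library, and vice versa.

  [Diagram: in user space, the Application issues a sys call into the kernel. In the kernel, VFS and the Page Cache feed BTRFS (LZO, ZLIB). BTRFS sends an async compress request through the Linux Kernel Crypto API to the Intel® QuickAssist Technology Driver and the Intel® QuickAssist Technology hardware, which reports "Job DONE". The data is then flushed to the Storage Media.]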
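
One way to picture how a framework can pick a hardware engine over a software one is a priority-based registry, which is how the kernel crypto API generally resolves an algorithm name when several implementations are registered. The sketch below is a user-space illustration only: the engine names and priority values are hypothetical, and the "hardware" engine is a software stand-in.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Engine:
    name: str                           # hypothetical driver name
    algorithm: str                      # algorithm name callers ask for
    priority: int                       # higher wins (illustrative values)
    compress: Callable[[bytes], bytes]


registry: List[Engine] = []


def register(engine):
    """Add an implementation to the registry."""
    registry.append(engine)


def select(algorithm):
    """Return the highest-priority engine registered for the algorithm."""
    candidates = [e for e in registry if e.algorithm == algorithm]
    if not candidates:
        raise LookupError(f"no engine for {algorithm}")
    return max(candidates, key=lambda e: e.priority)


# Both engines emit zlib-wrapped DEFLATE; the second only stands in for hardware.
register(Engine("deflate-software", "deflate", 100, lambda d: zlib.compress(d, 3)))
register(Engine("deflate-accel-standin", "deflate", 300, lambda d: zlib.compress(d, 1)))

chosen = select("deflate")
blob = chosen.compress(b"page buffer contents " * 200)

# Output of the selected engine is readable by the plain software library.
restored = zlib.decompress(blob)
print(chosen.name, len(blob))
```

The caller asks only for "deflate"; which engine runs is decided by the registry, so removing the accelerator entry falls back to software without changing the caller.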

Hardware Compression in BTRFS (Cont.)
  Call path: btrfs_compress_pages -> zlib_compress_pages_async

  • BTRFS submits an “async” compression job with an sg-list (scatter-gather list) containing up to 32 x 4K pages.
  • The BTRFS compression thread is put to sleep when the “async” compression API is called.
  • The BTRFS compression thread is woken up when the hardware completes the compression job.
  • Hardware can be fully utilized when multiple BTRFS compression threads run in parallel.

  [Diagram: the uncompressed data (4K pages, input buffer of up to 128KB) and a pre-allocated output buffer for the compressed data (4K pages) are passed via "async compress" to the Linux Kernel Crypto API. Intel® QuickAssist Technology reads the input and writes the output by DMA, then raises an interrupt; the callback ends the caller's sleep and the call returns.]
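
The submit, sleep and wake-up sequence above can be sketched in user space. This is an analogy under stated assumptions, not the kernel implementation: a small thread pool stands in for the accelerator, a done-callback plays the role of the interrupt, and the names `compress_pages_async` and `worker` are hypothetical.

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096
MAX_PAGES = 32  # up to 32 x 4K pages = 128KB per job

# Stand-in for the accelerator: runs jobs off the submitting thread.
engine = ThreadPoolExecutor(max_workers=2)


def compress_pages_async(pages):
    """Submit one job, sleep until the completion callback fires, return the result."""
    if len(pages) > MAX_PAGES:
        raise ValueError("a job carries at most 32 pages")
    done = threading.Event()
    result = {}

    def on_complete(future):  # plays the role of the interrupt callback
        try:
            result["data"] = future.result()
        except Exception as exc:
            result["error"] = exc
        finally:
            done.set()  # wake the sleeping submitter

    future = engine.submit(zlib.compress, b"".join(pages), 1)
    future.add_done_callback(on_complete)
    done.wait()  # the submitting thread sleeps here
    if "error" in result:
        raise result["error"]
    return result["data"]


def worker(index, outputs):
    """One compression thread: build 32 pages and compress them as a single job."""
    pages = [bytes([index]) * PAGE_SIZE for _ in range(MAX_PAGES)]
    outputs[index] = compress_pages_async(pages)


outputs = {}
threads = [threading.Thread(target=worker, args=(i, outputs)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
engine.shutdown()
print({index: len(blob) for index, blob in sorted(outputs.items())})
```

With a single submitter the engine idles while that thread prepares its next job; several submitters keep jobs queued, which is the point of the last bullet.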

Compression in Ceph OSD
  • Ceph is a distributed object store and file system designed to provide excellent performance, reliability and scalability, but it does not currently support native compression.
  • Ceph OSD with BTRFS can support built-in compression:
      • Transparent, real-time compression at the filesystem level.
      • Reduces the amount of data written to local disk, and reduces disk I/O.
      • A hardware accelerator can be plugged in to free up the OSDs’ CPU.
      • No benefit to the network bandwidth.

  [Diagram: Compute Nodes connect over the Network to the OSDs; each OSD runs on BTRFS over a Disk, with an Intel® DH8955 accelerator attached.]

Benchmark - Hardware Setup
  [Diagram: client and server connected through a 40Gb NIC]

  Client (FIO, Linux)
    CPU 0, CPU 1     Xeon(R) CPU E5-2699 v3 (Haswell) @ 2.30GHz
    Memory           DDR4 64GB

  Server (Ceph Cluster, Linux)
    CPU 0, CPU 1     Xeon(R) CPU E5-2699 v3 (Haswell) @ 2.30GHz
    Memory           DDR4 128GB
    PCIe devices     HBA LSI00300, attached to a JBOD with SSD x 12
                     2 x NVMe
                     2 x Intel® DH8955 plug-in card

Benchmark - Ceph Configuration
  • Deploy Ceph OSD on top of BTRFS as backend filesystem.
  • Deploy 2 OSDs on 1 SSD
      • 24x OSDs in total.
  • 2x NVMe for journal.
  • Data written to Ceph OSD is compressed by Intel® QuickAssist Technology (Intel® DH8955 plug-in card).

  [Diagram: one MON; OSD-1 to OSD-24 on BTRFS, two per SSD across SSD-1 to SSD-12; NVMe-1 and NVMe-2 hold the journals; two Intel® DH8955 cards.]

Test Methodology

  • Start 64 FIO threads on the client; each writes / reads a 2GB file to / from the Ceph cluster over the network.
  • Drop caches before tests. For write tests, all files are synchronized to the OSDs’ disks before the tests complete.
  • The average CPU load and disk utilization in the Ceph OSDs, and the FIO throughput, are measured.

  [Diagram: on the Client, FIO threads access CephFS or RBD on top of LIBRADOS; in CEPH, RADOS spans the OSDs and the MON.]

Benchmark Configuration Details
 Client
 CPU              2 x Intel® Xeon CPU E5-2699 v3 (Haswell) @ 2.30GHz (36-core 72-threads)
 Memory           64GB
 Network          40GbE, jumbo frame: MTU=8000
 Test Tool        FIO 2.1.2, engine=libaio, bs=64KB, 64 threads

 Ceph Cluster
 CPU              2 x Intel® Xeon CPU E5-2699 v3 (Haswell) @ 2.30GHz (36-core 72-threads)
 Memory           128GB
 Network          40GbE, jumbo frame: MTU=8000
 HBA              LSI00300
 OS               Fedora 22 (Kernel 4.1.3)
 OSD              24 x OSD, 2 on one SSD (S3700), no-replica
                  2 x NVMe (P3700) for journal
                  2400 pgs
 Accelerator      Intel® QuickAssist Technology, 2 x Intel® QuickAssist Adapters 8955
                  Dynamic compression Level-1
 BTRFS ZLIB S/W   ZLIB Level-3
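
The client-side settings above could be expressed as an FIO job file along the following lines. The slides do not show the actual job file, so this is a reconstruction under assumptions: the mount path `/mnt/ceph-test` and the job name are hypothetical, and mapping "files are synchronized before tests complete" to `end_fsync=1` is a guess. The small Python helper only renders the text; it does not run FIO.

```python
def render_fio_job(name, options):
    """Render one FIO job section; a value of None emits a bare flag."""
    lines = [f"[{name}]"]
    for key, value in options.items():
        lines.append(key if value is None else f"{key}={value}")
    return "\n".join(lines) + "\n"


seq_write_options = {
    "ioengine": "libaio",           # engine=libaio, from the table above
    "bs": "64k",                    # bs=64KB
    "numjobs": 64,                  # 64 threads
    "size": "2g",                   # each thread handles a 2GB file
    "rw": "write",                  # sequential write; use "read" for the read test
    "directory": "/mnt/ceph-test",  # hypothetical mount point, not from the slides
    "end_fsync": 1,                 # assumption: sync file contents when the job ends
    "group_reporting": None,        # report the 64 jobs as one aggregate
}

job_text = render_fio_job("seq-write", seq_write_options)
print(job_text)
```

Saving the printed text to a file and passing that file to `fio` would start the workload; swapping `rw` to `read` gives the matching read job.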
Sequential Write
                      off       accel *    lzo       zlib-3
  Cpu Util (%)        13.62%    15.25%     28.30%    90.95%
  cRatio (%)          100%      40%        50%       36%
  Bandwidth (MB/s)    2910      2960       3003      1157

  60% disk saving, with minimal CPU overhead

  [Chart 1: CPU Util (%) and cRatio (%) on the left axis (0% to 120%), BW (MB/s) on the right axis (0 to 3500), for off, accel *, lzo and zlib-3]
  [Chart 2: cpu util % (0 to 120) over time (seconds, 1 to 105) for off, accel *, lzo and zlib-3]

* Intel® QuickAssist Technology DH8955 level-1
** Dataset is random data generated by FIO

  Source as of August 2016: Intel internal measurements with dual E5-2699 v3 (18C, 2.3GHz, 145W), HT & Turbo Enabled, Fedora 22 64 bit, kernel 4.1.3, 2 x DH8955 adaptor, DDR4-128GB

Sequential Read
  Minimal CPU overhead for decompression

                      off        accel *    lzo        zlib-3
  CPU Util (%)        7.33%      8.76%      11.81%     26.20%
  Bandwidth (MB/s)    2557       2915       3042       2913

[Figure: CPU utilization (%) over Time (seconds), 1 to 40, for off, accel *, lzo and zlib-3]

Source as of August 2016: Intel internal measurements with dual E5-2699 v3 (18C, 2.3GHz, 145W), HT & Turbo Enabled, Fedora 22 64 bit, kernel 4.1.3, 2 x DH8955 adaptor, DDR4-128GB. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Any difference in system hardware or software design or configuration may affect actual performance. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. For more information go to http://www.intel.com/performance

* Intel® QuickAssist Technology DH8955 level-1
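
The Sequential Read figures on this slide are easier to compare once each configuration is expressed relative to the uncompressed baseline ("off"). The following is a minimal sketch of that arithmetic, not part of the original presentation: the numbers are copied from the slide, while the dictionary layout, the function name `compare_to_baseline` and the derived metric names are illustrative assumptions.

```python
# Sequential Read results copied from the slide (CPU utilization in %, bandwidth in MB/s).
# The data layout and all names below are hypothetical, chosen for illustration only.
RESULTS = {
    "off":    {"cpu_util_pct": 7.33,  "bandwidth_mb_s": 2557},
    "accel":  {"cpu_util_pct": 8.76,  "bandwidth_mb_s": 2915},
    "lzo":    {"cpu_util_pct": 11.81, "bandwidth_mb_s": 3042},
    "zlib-3": {"cpu_util_pct": 26.20, "bandwidth_mb_s": 2913},
}


def compare_to_baseline(results, baseline="off"):
    """Return bandwidth gain (%) and CPU delta (percentage points) versus the baseline."""
    base = results[baseline]
    summary = {}
    for name, row in results.items():
        summary[name] = {
            # Relative read bandwidth change against the uncompressed run.
            "bw_gain_pct": 100.0 * (row["bandwidth_mb_s"] / base["bandwidth_mb_s"] - 1.0),
            # Extra CPU utilization spent on decompression, in percentage points.
            "cpu_delta_pts": row["cpu_util_pct"] - base["cpu_util_pct"],
            # Bandwidth delivered per percent of CPU utilization.
            "mb_s_per_cpu_pct": row["bandwidth_mb_s"] / row["cpu_util_pct"],
        }
    return summary


summary = compare_to_baseline(RESULTS)
for name, row in summary.items():
    print(f"{name:7s} bw {row['bw_gain_pct']:+6.1f}%  "
          f"cpu {row['cpu_delta_pts']:+6.2f} pts  "
          f"{row['mb_s_per_cpu_pct']:6.1f} MB/s per CPU%")
```

Read this way, the hardware-accelerated run adds about 1.4 percentage points of CPU over the baseline, against roughly 18.9 points for zlib-3, at a similar read bandwidth.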

Key Takeaways
  Data compression can reduce disk I/O and disk
   utilization.
  Data compression is CPU intensive; a better
   compression ratio costs more CPU.
  Hardware offloading can greatly reduce CPU cost
   and optimize disk utilization and I/O in the
   storage infrastructure.
  Filesystem-level compression in the OSD is
   transparent to the Ceph software stack.
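
The ratio-versus-CPU trade-off in the second takeaway can be observed with any DEFLATE implementation. The sketch below uses Python's standard `zlib` module in user space as a stand-in; it does not exercise the BTRFS kernel path or Intel QuickAssist hardware. The sample generator, the function names, and the choice of levels (1, 3, 9, assuming the slides' "zlib-3" label denotes compression level 3) are illustrative assumptions.

```python
import random
import time
import zlib


def make_sample(size_bytes=200_000, seed=0):
    """Build deterministic, moderately compressible text (hypothetical test data)."""
    rng = random.Random(seed)
    vocab = [b"ceph", b"osd", b"btrfs", b"extent", b"object",
             b"journal", b"compress", b"deflate", b"block", b"write"]
    words, total = [], 0
    while total < size_bytes:
        word = rng.choice(vocab)
        words.append(word)
        total += len(word) + 1  # +1 for the separating space
    return b" ".join(words)[:size_bytes]


def measure(data, levels=(1, 3, 9)):
    """Compress data at each zlib level; report compression ratio and CPU time."""
    rows = []
    for level in levels:
        start = time.process_time()          # CPU time, not wall-clock time
        packed = zlib.compress(data, level)
        cpu_seconds = time.process_time() - start
        if zlib.decompress(packed) != data:  # sanity check: lossless round trip
            raise ValueError(f"round trip failed at level {level}")
        rows.append({
            "level": level,
            "ratio": len(data) / len(packed),
            "cpu_seconds": cpu_seconds,
        })
    return rows


if __name__ == "__main__":
    for row in measure(make_sample()):
        print(f"level {row['level']}: ratio {row['ratio']:.2f}, "
              f"cpu {row['cpu_seconds'] * 1000:.1f} ms")
```

Absolute numbers depend on the data and the machine; the point of the sketch is only the direction of the trade-off that hardware offload is meant to remove from the CPU.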
References
   DEFLATE Compressed Data Format Specification version 1.3:
    http://tools.ietf.org/html/rfc1951
   BTRFS: https://btrfs.wiki.kernel.org
   Ceph: http://ceph.com/
   More information on Intel® QuickAssist Technology & Intel® QuickAssist Software
    Solutions can be found here:
       Software package and engine are available at 01.org: Intel QuickAssist Technology | 01.org
       For more details on Intel® QuickAssist Technology visit:
         http://www.intel.com/quickassist
       Intel Network Builders: https://networkbuilders.intel.com/ecosystem
   Intel® QuickAssist Technology Storage Testimonials
       IBM v7000Z w/QuickAssist:
         http://www-03.ibm.com/systems/storage/disk/storwize_v7000/overview.html
         https://builders.intel.com/docs/networkbuilders/Accelerating-data-economics-IBM-flashSystem-and-Intel-quick-assist-technology.pdf
   Intel’s QuickAssist Adapter for Servers:
    http://ark.intel.com/products/79483/Intel-QuickAssist-Adapter-8950
Legal Disclaimer
   INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS
    GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR
    IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT
    OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

    Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves
    these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this
    information.

    The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

    Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

    Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

    Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel Performance Benchmark Limitations.

    All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.

    Intel, Intel logo, Intel Core, Intel Inside, Intel Inside logo, Intel Ethernet, Intel QuickAssist Technology, Intel Flow Director, Intel Solid State Drives, Intel Intelligent Storage Acceleration Library, Itanium, Xeon, and Xeon Inside are
    trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

    64-bit computing on Intel architecture requires a computer system with a processor, chipset, BIOS, operating system, device drivers and applications enabled for Intel® 64 architecture. Performance will vary depending on your
    hardware and software configurations. Consult with your system vendor for more information.

    No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology is a security technology under development by Intel and requires for operation a computer system with Intel® Virtualization
    Technology, an Intel Trusted Execution Technology-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel or other compatible measured virtual machine monitor. In addition, Intel Trusted Execution Technology
    requires the system to contain a TPMv1.2 as defined by the Trusted Computing Group and specific software for some uses. See http://www.intel.com/technology/security/ for more information.

    Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor (VMM) and, for some uses, certain platform software enabled for it. Functionality, performance or other
    benefits will vary depending on hardware and software configurations and may require a BIOS update. Software applications may not be compatible with all operating systems. Please check with your application vendor.

    * Other names and brands may be claimed as the property of others.

    Other vendors are listed by Intel as a convenience to Intel's general customer base, but Intel does not make any representations or warranties whatsoever regarding quality, reliability, functionality, or compatibility of these
    devices. This list and/or these devices may be subject to change without notice.
    Copyright © 2016, Intel Corporation. All rights reserved.
