AIBench Training, Subsets and Its rankings - Fei Tang ICT, Chinese Academy of Sciences AIBench Tutorial at ISCA 2021 - BenchCouncil

AIBench Training, Subsets and Its Rankings
Fei Tang
ICT, Chinese Academy of Sciences
AIBench Tutorial at ISCA 2021
Other AIBench Contributors
Executive Summary

A lack of understanding of learning dynamics raises serious AI benchmarking challenges

AIBench Training: methodology, workload characterizations, two subsets for repeatable performance ranking and workload characterization, and rankings

https://www.benchcouncil.org/aibench

Learning dynamics are not understood
- High-dimensional non-convex optimization problem
  - A slight change leads to a different optimization path
  - Parameter tuning depends heavily on experience

                Picture from http://www.dashangu.com/postimg_13493485.html
Prohibitive Cost Challenge
- Running an entire training session is mandatory!
- It takes several weeks to run a complete training session on a small-scale system
  - Simulators with slowdowns of 10 to 1,000 times exacerbate the challenge
- A microbenchmark like HPL-AI [1] cannot model the learning dynamics of deep learning
[1] HPL-AI Mixed-Precision Benchmark — HPL-AI 0.0.2 documentation. https://icl.bitbucket.io/hpl-ai/
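
As a rough illustration of why simulation makes the cost prohibitive, the sketch below multiplies a native training time by the simulator slowdown. The figure of 3 weeks stands in for "several weeks" and is an assumption, as is the helper name simulated_weeks.

```python
def simulated_weeks(native_weeks, slowdown):
    """Wall-clock weeks needed when a simulator runs `slowdown` times slower."""
    return native_weeks * slowdown


# Assumption: "several weeks" is taken as 3 weeks purely for illustration.
native_weeks = 3

# The slide quotes slowdowns of 10 to 1,000 times.
for slowdown in (10, 1000):
    weeks = simulated_weeks(native_weeks, slowdown)
    print(f"{slowdown:>5}x slowdown: {weeks} weeks (about {weeks / 52:.1f} years)")
```

Even at the low end of the quoted range, a single complete training session would occupy the simulator for months.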
Conflicting-requirement Challenge

- Earlier-stage evaluations of a new architecture or system
  - Affordability
  - Portability (micro benchmarks)
  - Simplicity

- Later-stage evaluations or purchasing off-the-shelf systems
  - Comprehensiveness/representativeness
  - Realism and overall system performance (component or scenario benchmarks)
Short Shelf-life Challenge

- AI model evolutions and changes outpace the AI benchmarks
  - It takes one year to walk through benchmark design, implementation, community adoption, and large-scale testing

- Synthetic benchmarks like ParaDNN [1] can traverse many networks, but they cannot model learning dynamics

[1] Wang, Yu Emma, Gu-Yeon Wei, and David Brooks. (n.d.). "A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms."
Scalability Challenge
- An AI task's problem scale is often fixed
- HPL-AI [1] is scalable, but because it does not consider model quality, it cannot model the learning dynamics

Picture from HPC AI500 Ranking [2], Image Classification

[1] HPL-AI Mixed-Precision Benchmark — HPL-AI 0.0.2 documentation. https://icl.bitbucket.io/hpl-ai/
[2] HPC-AI500 Ranking. https://www.benchcouncil.org/ranking.html
Repeatability Challenge
- The benchmark mandates being repeatable, while training deep networks is stochastic
- Factors of randomness:
  - Model initialization
  - Data augmentation
  - Data shuffling
  - Dropout
  - Etc.

[Figure: Run-to-run Variation of AIBench Training. Vertical axis: variation, 0.00% to 40.00%. Horizontal axis: workloads (Neural Architecture Search, Face Embedding, Object Detection, Image-to-Text, Image Classification, Recommendation, Text Summarization, 3D Face Recognition, Text-to-Text, Speech Recognition, 3D Object Reconstruction, Video Prediction, Spatial Transformer, Learning-to-Rank, Image Compression, Advertising, NLP)]
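
One common way to quantify run-to-run variation is the coefficient of variation: the standard deviation of a per-run measurement divided by its mean. The sketch below applies it to the number of epochs each repeated run needs to reach the target quality. Both the choice of measurement and the use of the sample standard deviation are assumptions here, and the epoch counts are made up for illustration.

```python
import statistics


def run_to_run_variation(epochs_to_quality):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    if len(epochs_to_quality) < 2:
        raise ValueError("need at least two runs to measure variation")
    mean = statistics.mean(epochs_to_quality)
    return statistics.stdev(epochs_to_quality) / mean


# Hypothetical epoch counts from five repeated runs of one workload.
runs = [60, 62, 58, 61, 59]
print(f"run-to-run variation: {run_to_run_variation(runs):.2%}")
```

A workload whose repeated runs all converge in the same number of epochs has zero variation; the larger the ratio, the less repeatable a ranking based on that workload is.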
Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subset for repeatable performance ranking and workload characterization
  - Rankings
Related Work
Time-to-accuracy as the main metric

Modeling the critical paths of a real-world application scenario

A systematic AI benchmarking project

A synthetic AI benchmark

A micro benchmark that uses mixed-precision LU decomposition to achieve upper-bound FLOPS performance
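
To make the time-to-accuracy metric listed first above concrete, here is a minimal sketch that scans an evaluation log and reports the elapsed time at which the target accuracy is first reached. The log format, the function name time_to_accuracy, and all numbers are hypothetical.

```python
def time_to_accuracy(log, target):
    """Return elapsed seconds at the first evaluation reaching `target`, else None.

    `log` is a list of (elapsed_seconds, accuracy) pairs in chronological order.
    """
    for elapsed, accuracy in log:
        if accuracy >= target:
            return elapsed
    return None


# Hypothetical evaluation log: (elapsed seconds, validation accuracy).
log = [(600, 0.52), (1200, 0.68), (1800, 0.75), (2400, 0.76)]

print(time_to_accuracy(log, 0.75))  # first evaluation at or above the target
print(time_to_accuracy(log, 0.90))  # target never reached in this log
```

Because the metric depends on when the quality target is crossed, it only makes sense when an entire training session is run, which ties it to the cost challenge described earlier.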
Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subset for repeatable performance ranking and workload characterization
  - Rankings
Methodology
- Perform a detailed survey of the critical domain (Internet services), including search engines, social networks, and e-commerce
- Include as many representative benchmarks as possible
- Propose a repeatable performance ranking subset and a workload characterization subset, and keep the subsets to a minimum
- Consider the full benchmarks, their subsets, and microbenchmarks as indispensable
Typical Internet service applications (with 17 industry partners)
- Representative AI tasks among search engines, social networks, and e-commerce
AIBench Training Workloads
Image Classification
- Classify an image into one of multiple categories
  - Dataset: ImageNet2012, one of the world's largest image databases, containing more than 14 million images, with a data size of more than 100 GB
  - Model: ResNet50, a milestone model that demonstrates the ability of AI to classify images and exceeds human ability

Image Generation
- Learn the distribution of images to generate new images
  - Dataset: LSUN, about 1 million labeled images, divided into 10 scene categories and 20 object categories
  - Model: WGAN, one of the most famous GAN-based models, which uses generative adversarial networks to solve image generation problems

Text Translation
- Convert text from one language to another
  - Dataset: WMT English-German, which has 4.5 million sentence pairs
  - Model: Transformer, the classic model for text translation and the basis for the subsequent BERT model

Image-to-Text
- Generate descriptive text for given images
  - Combination of computer vision and natural language processing
  - Dataset: MSCOCO2014, 82,783 training samples, 40,504 validation samples, and 40,775 test samples (20 GB+)
  - Model: Neural Image Caption, a combination of CNN and RNN

Image-to-Image
- Convert an image from one representation to another
  - Change of seasons, change of object species, etc.
  - Dataset: Cityscapes, street view data for more than 50 cities (300 MB)
  - Model: CycleGAN, a widely used GAN-based model, which has two generators and two discriminators

Speech Recognition
- Recognize voice messages and transcribe them into text
  - Dataset: LibriSpeech, 1,000+ hours of voice data, the most representative audio dataset
  - Model: DeepSpeech2, a milestone model in speech recognition

Face Embedding
- Face embedding verifies a face by learning an embedding into Euclidean space, which can be used for face recognition
  - Dataset: VGGFace2, 36 GB of training data and 1.9 GB of test data
  - Model: FaceNet, a representative model based on the GoogLeNet-style Inception model

Object Detection
- Object detection aims to find objects of certain target classes with precise localization in a given image
  - Dataset: VOC2007, 9,963 images containing 24,640 labeled objects
  - Model: Faster R-CNN, a classic model for the object detection task and the cornerstone of many other models such as Mask R-CNN

Recommendation
- Personalized recommendations based on collaborative filtering
  - Dataset: MovieLens, a real-world movie ratings dataset collected by GroupLens Research from the MovieLens website
  - Model: Neural collaborative filtering, a fundamental algorithm for recommendation

Video Prediction
- Predict the next video frame by learning from the previous video frames
  - Dataset: Robot pushing dataset, about 59,000 robot pushing interactions, 100 GB+
  - Model: Motion-Focused Predictive model, which predicts how to transform the last image into the next image

Image Compression
- Reduce redundant information in image data to store and transfer the data in a more efficient format
  - Dataset: ImageNet2012 (100 GB+), one of the world's largest image databases, containing more than 14 million images
  - Model: an RNN-based model

3D Object Reconstruction
- Capture the shape and appearance of a real object, a core technology in a wide variety of fields such as computer graphics and virtual reality
  - Dataset: ShapeNet, containing about 51,300 different 3D models of 55 commonly used object categories
  - Model: Convolutional Encoder-decoder Network, a model combining an image encoder, a volume decoder, and a perspective transformer

Text Summarization
- Generate summaries for given text
  - Dataset: Gigaword, about 10 million documents, over 4 billion words
  - Model: Sequence-to-sequence model, consisting of an off-the-shelf attentional encoder-decoder RNN

Spatial Transformer
- Spatial transformation of images, such as rotation and stretching
  - Dataset: MNIST, containing 60,000 training images and 10,000 test images
  - Model: Spatial Transformer Network, a model that includes a localisation network, a grid generator, and a sampler

Neural Architecture Search
- Automatically design neural networks
- Dataset: PTB Dataset, containing 2,499 stories from a three-year Wall Street Journal collection of 98,732 stories for syntactic annotation
- Model: ENAS, a model that finds efficient neural networks by reinforcement learning

[Figure: neural architecture search loop. Boxes: Search Space, Search Strategy, Performance Evaluation Strategy. Arrows: Search Network Architecture, Evaluate Network Architecture]

Advertising

- Advertising displays the most relevant ads to customers
- Dataset: Kaggle Display Advertising Challenge Dataset
- Model: Deep Learning Recommendation Model (DLRM)

Natural Language Processing (NLP)

- NLP trains a language model, which is used for many tasks such as translation and question answering
- Dataset: Wikipedia
- Model: BERT

Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subset for repeatable performance ranking and workload characterization
  - Rankings
Representativeness and Comprehensiveness
- Diverse behaviors for workload characterization
  - Algorithm behavior
    - Model architectures, parameters, optimizers, and loss functions
  - System behavior
    - Evaluation time cost, variation, convergence rate, and number of hot functions
  - Micro-architecture behavior
    - Computation pattern, memory access pattern, and I/O pattern
Representativeness and Comprehensiveness
- Coverage of diverse network architectures (CNN, ResNet, LSTM, GRU, Attention, etc.)
  - Text processing (7)
    - Text-to-Text, Text Summarization, Learning-to-Rank, Recommendation, Neural Architecture Search, Advertising, and NLP
  - Image processing (8)
    - Image Classification, Image Generation, Image-to-Text, Image-to-Image, Face Embedding, Object Detection, Image Compression, and Spatial Transformer
  - Audio processing (1)
    - Speech Recognition
  - Video processing (1)
    - Video Prediction
  - 3D data processing (2)
    - 3D Face Recognition and 3D Object Reconstruction
AIBench Training vs. MLPerf Training

[Figure: comparison of AIBench against MLPerf from the perspectives of model complexity, computational cost, and convergence rate]
Micro-architectural Characteristics
- Distinct computation and memory access behaviors
  - AIBench has wider coverage than MLPerf

[Figure: MLPerf (blue) vs. AIBench (red), compared on five metrics]
  1: achieved occupancy (warp utilization rate)
  2: ipc efficiency (IPC efficiency)
  3: gld efficiency (global memory load efficiency)
  4: gst efficiency (global memory store efficiency)
  5: dram utilization (DRAM utilization)
AIBench Training (v1.1) vs. MLPerf Training (v0.7)
- Concurrent work
- AIBench Training has wider coverage
  - Tasks
  - Datasets
  - Diverse characteristics
    - Algorithm
    - System
    - Microarchitecture
Runtime Breakdown of the AIBench Benchmarks
Hotspot Functions

- AIBench Training covers more hotspot functions than MLPerf Training, which makes it more suitable for simulator research
Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subset for repeatable performance ranking and workload characterization
  - Rankings
Repeatable performance ranking subset (RPR subset)

- Reflects diverse model complexity, computational cost, and convergence rate
- Low run-to-run variation
- Widely accepted evaluation metrics
Workload characterization subset (WC subset)

- A minimum set of workloads with the most representative system or micro-architectural characteristics
Two Subsets
- RPR subset: Image Classification, Object Detection, and Learning-to-Rank
- WC subset: Spatial Transformer, Image-to-Text, and Speech-to-Text

[Figure: the result of K-means clustering using micro-architecture characteristics]
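
The clustering step behind the WC subset can be sketched as follows: each workload is described by a vector of micro-architectural metrics, K-means groups similar workloads, and one representative per cluster is kept. This is only an illustration under stated assumptions: the workload names and metric values are invented, the deterministic seeding (first k points) is a simplification, and picking the member nearest the centroid is one plausible selection rule, not necessarily the one used for AIBench.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def representatives(names, points, labels, centroids):
    """For each cluster, pick the workload closest to the centroid."""
    reps = {}
    for c, centroid in enumerate(centroids):
        members = [(math.dist(p, centroid), n)
                   for n, p, lab in zip(names, points, labels) if lab == c]
        if members:
            reps[c] = min(members)[1]
    return reps


# Hypothetical, normalized metric vectors:
# (occupancy, IPC efficiency, load efficiency, store efficiency, DRAM utilization)
names = ["workload_a", "workload_b", "workload_c",
         "workload_d", "workload_e", "workload_f"]
points = [
    [0.90, 0.80, 0.90, 0.80, 0.20],
    [0.20, 0.30, 0.40, 0.30, 0.90],
    [0.50, 0.50, 0.10, 0.20, 0.50],
    [0.85, 0.75, 0.85, 0.90, 0.25],
    [0.25, 0.20, 0.45, 0.35, 0.80],
    [0.55, 0.45, 0.15, 0.10, 0.55],
]

labels, centroids = kmeans(points, k=3)
reps = representatives(names, points, labels, centroids)
print(labels)
print(reps)
```

Keeping one workload per cluster is what lets the subset stay small while still covering distinct micro-architectural behaviors.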
Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subset for repeatable performance ranking and workload characterization
  - Rankings
Performance Ranking
- We use the AIBench RPR subset to rank the performance of GPUs and TPUs
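
A per-workload ranking of this kind can be sketched as a sort on the time each system needs to reach the target quality. The sketch assumes time-to-quality is the ranking metric; the system names and timings are placeholders, not measured results, and a value of None marks a system that cannot run the workload at all.

```python
def rank_by_time_to_quality(times):
    """Return system names sorted by time-to-quality, fastest first.

    `times` maps a system name to hours needed to reach the target quality,
    or to None when the system cannot run the workload (such systems are
    left out of the ranking).
    """
    supported = {name: t for name, t in times.items() if t is not None}
    return sorted(supported, key=supported.get)


# Placeholder timings in hours (hypothetical, for illustration only).
image_classification = {"system_a": 10.5, "system_b": 7.2, "system_c": 12.0}
object_detection = {"system_a": 4.1, "system_b": None, "system_c": 3.8}

print(rank_by_time_to_quality(image_classification))
print(rank_by_time_to_quality(object_detection))
```

Treating unsupported workloads explicitly matters because a system can lead one ranking while being absent from another.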
Insights
- TPUs have significant performance advantages in Image Classification, but lack generality and do not support many models (like Faster R-CNN and Learning-to-Rank) because they support limited TensorFlow operations [1]

[1] Available TensorFlow Ops | Cloud TPU. (n.d.). Google Cloud. From https://cloud.google.com/tpu/docs/tensorflow-ops
Insights
- PyTorch is poorly optimized for TPUs, because it cannot load data directly from Google Cloud Storage onto the TPU as TensorFlow does [1]
- Data loading is a bottleneck for the image classification task

[1] [Question] Loading from Google Cloud Storage · Issue #1544 · pytorch/xla. (n.d.). GitHub.
Summary

- Five AI benchmarking challenges: prohibitive cost, conflicting requirements, short shelf-life, scalability, and repeatability

- AIBench Training methodology, workload characterizations, two subsets for repeatable performance ranking and workload characterization, and rankings

Thank You!
