Accelerating Microsoft's AI Ambitions

Page created by Brent Sherman
 
[Figure: Azure Cognitive Services portfolio]
- Decision: Anomaly Detector, Content Moderator, Personalizer
- Language: Text Analytics, Translator Text, Language Understanding, QnA Maker, Bing Spell Check
- Vision: Computer Vision, Face, Custom Vision, Video Indexer, Form Recognizer, Ink Recognizer, Content Moderator
- Speech: Speech transcription, Conversation transcription capability, Custom Speech, Text-to-Speech, Neural Text-to-Speech
- Search: Bing Web Search, Bing Custom Search, Bing Entity Search, Bing Video Search, Bing News Search, Bing Image Search, Bing Visual Search, Bing Autosuggest, Local Business Search
[Figure: evolution of model architectures, from classic ML to deep CNNs, attention/Transformers, and graph convolutional networks; see figure sources below]
Figure sources:
1. Han et al., "Pre-Trained AlexNet Architecture with Pyramid Pooling and Supervision for High Spatial Resolution Remote Sensing Image Scene Classification"
2. Vaswani et al., "Attention Is All You Need"
3. https://tkipf.github.io/graph-convolutional-networks/
[Figure: DNN model growth, 2010–2020. Left: model size in millions of parameters (AlexNet, ResNet-50, GNMT, BERT-L, GPT-2, Megatron), with Megatron ~325x the size of ResNet-50. Right: compute in billions of ops (AlexNet, ResNet-50, BERT-L, GPT-2, Megatron), with Megatron needing ~2200x the ops of ResNet-50.]
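The headline ratio in this chart can be reproduced from commonly cited model sizes; the specific parameter counts below are approximate public figures assumed for illustration, not values from the slide:

```python
# Approximate, commonly cited parameter counts (assumptions, for illustration).
params_millions = {
    "AlexNet": 61,
    "ResNet-50": 25.6,
    "BERT-Large": 340,
    "GPT-2": 1500,
    "Megatron-LM": 8300,
}

# Ratio of the largest model on the chart to ResNet-50.
ratio = params_millions["Megatron-LM"] / params_millions["ResNet-50"]
print(f"Megatron-LM is ~{ratio:.0f}x the size of ResNet-50")
```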
[Figure: spectrum of compute hardware, from general-purpose to specialized: CPUs (registers, control unit, arithmetic logic unit), GPUs, FPGAs, NPUs, and ASICs]
Cloud DNN training and batched inferencing on NVIDIA GPUs (CUDA, PyTorch, TensorFlow)

Cloud and heavy edge inferencing performed on Intel CPUs (ONNX) and MS-NPUs (FPGA)

Light edge inferencing on commodity and custom silicon (e.g., HoloLens)
Inside Bing’s AI Inference Supercomputer:
Project Brainwave
Project Catapult timeline (Field-Programmable Gate Arrays):
2011: Project Catapult launched
2013: Bing pilot runs decision trees 40X faster
2015: Bing ranking throughput increased 2X
2016: Azure Accelerated Networking delivers industry-leading cloud performance
2017: Over 1M servers deployed with FPGAs at hyperscale
2017: Hardware Microservices harness FPGAs for distributed computing
2017: FPGAs enable real-time AI with ultra-low-latency inferencing without batching; Bing launches first FPGA-accelerated Deep Neural Network
2018: Project Brainwave launched in Azure Machine Learning
[Figure: Bing compute servers and Bing FPGA appliances. Each compute server pairs dual-socket CPUs with an FPGA on PCIe Gen3 x16 and a 50G NIC; FPGA appliances pack racks of network-attached FPGAs. Both connect through TOR, T1, and T2 switch tiers over 50G links, with 9x50G uplinks per TOR.]
[Figure: the hardware acceleration plane. Each FPGA sits between its server's NIC and the TOR; round-trip latency is ~3μs to the TOR, ~8μs within a T1 pod, and ~22μs within T2. Interconnected FPGAs host NLP (RNN) models, image detection (CNN), text-to-speech, and web ranking above the traditional software (CPU) server plane running the Bing serving stack.]

1. FPGAs are network-connected, used and managed independently from the CPU.
2. Interconnected FPGAs form a separate plane of computation built on Hardware as a Service (HaaS).
3. Direct FPGA-to-FPGA communication uses the Lightweight Transport Layer (LTL) at ultra-low latencies.
Brainwave v1 (2016): low-latency LSTM inference
Brainwave v2 (2017): narrow-precision breakthrough
Brainwave v3 (2018): convolution optimizations
Brainwave v4 (2019): generalized ISA, Transformers
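Brainwave v1's target, low-latency LSTM inference at batch 1, comes down to two matrix-vector products per time step. A minimal NumPy sketch (dimensions are illustrative, not Bing's production models):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; real models differ.
hidden, inputs = 1024, 1024

# One LSTM cell: fused weights for the input, forget, cell, and output gates.
W = rng.standard_normal((4 * hidden, inputs)) * 0.01   # input-to-hidden
U = rng.standard_normal((4 * hidden, hidden)) * 0.01   # hidden-to-hidden
b = np.zeros(4 * hidden)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    """One batch-1 LSTM time step: two matrix-vector products.

    At batch 1 each weight element is read once per step for a single
    multiply-accumulate, so latency is bound by weight bandwidth -- the
    property Brainwave addresses by pinning weights in on-chip memory.
    """
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

h = np.zeros(hidden)
c = np.zeros(hidden)
x = rng.standard_normal(inputs)
h, c = lstm_step(x, h, c)
```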
[Figure: relative multiplier area and energy for the msfp8, int8, float16, and float32 datatypes]
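msfp8 is Microsoft's narrow-precision floating-point format: a block of values shares a single exponent, so each multiplier only handles a small fixed-point mantissa. A rough sketch of that idea (the block size and bit widths here are assumptions, not the production format):

```python
import numpy as np

def quantize_block_fp(x, mantissa_bits=4):
    """Quantize a block of values with one shared exponent (msfp-style sketch).

    Every value in the block shares the exponent of the largest magnitude,
    so hardware multipliers only need narrow fixed-point mantissas -- the
    source of the area/energy savings in the figure above.
    """
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return x.copy()
    shared_exp = np.floor(np.log2(max_abs))                # one exponent per block
    scale = 2.0 ** (shared_exp + 1 - (mantissa_bits - 1))  # mantissa step size
    mantissas = np.clip(np.round(x / scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1)      # narrow signed mantissa
    return mantissas * scale

x = np.array([0.9, -0.31, 0.05, 0.002])
xq = quantize_block_fp(x)  # coarse, but cheap to multiply in hardware
```

Small values in a block dominated by a large one lose precision, which is why narrow-precision formats needed the accuracy "breakthrough" work noted for Brainwave v2.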
Sub-millisecond FPGA compute latencies at batch 1
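Batch-1 latency matters because batching improves throughput only by making every request wait for the whole batch. A toy CPU measurement of that tradeoff (purely illustrative; these are not FPGA numbers):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2048, 2048)).astype(np.float32)  # stand-in model layer

def run(batch):
    """Time one dense layer over `batch` requests; returns total ms."""
    x = rng.standard_normal((batch, 2048)).astype(np.float32)
    t0 = time.perf_counter()
    _ = x @ W.T
    return (time.perf_counter() - t0) * 1e3

for batch in (1, 64):
    ms = run(batch)
    # Larger batches improve ms-per-request (throughput) but every request
    # in the batch experiences the full batch latency.
    print(f"batch={batch:3d}: {ms:.2f} ms total, {ms / batch:.4f} ms per request")
```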
https://www.microsoft.com/en-us/research/uploads/prod/2018/03/mi0218_Chung-2018Mar25.pdf

https://blogs.bing.com/search/2017-12/search-2017-12-december-ai-update
Hardware for Future AI
Must solve real customer problems – solutions including non-AI pieces, not just AI components

Must be differentiated end-to-end (E2E), including system overheads

Want durable and “horizontally-capable” architectures with long shelf lives (3-5 years)

Compatible and friendly to deploy in diverse environments (SKUs, datacenters, etc.)

Must be easy to develop software/models for and integrate seamlessly with AI tools ecosystem

Improved cost of ownership at system-scale vs general-purpose commodity hardware

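A recurring answer to these requirements is the systolic array (H.T. Kung, "Why Systolic Arrays?", 1982, cited below): a grid of simple processing elements with purely local, regular communication. A toy output-stationary simulation of C = A @ B, where the skewed index models the wavefront timing:

```python
import numpy as np

def systolic_matmul(A, B):
    """Output-stationary systolic array simulation of C = A @ B.

    Each processing element (r, c) holds one accumulator; A streams in
    from the left (skewed by row) and B from the top (skewed by column),
    so data moves only between neighboring PEs each cycle -- the regular,
    local communication Kung's paper argues for.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    # Cycle t: PE (r, c) consumes A[r, t - r - c] and B[t - r - c, c].
    for t in range(n + m + k - 2):
        for r in range(n):
            for c in range(m):
                s = t - r - c
                if 0 <= s < k:
                    C[r, c] += A[r, s] * B[s, c]
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

Because every PE does the same thing every cycle with fixed neighbors, the design scales "horizontally" and ages well, which is one reason systolic structures keep reappearing in AI accelerators.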
Figure sources:
1. H.T. Kung, "Why Systolic Arrays?", 1982
2. https://datascience.stackexchange.com/questions/49522/what-is-gelu-activation
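The second figure source above concerns the GELU activation used throughout Transformer models. Its exact (non-approximated) form is x·Φ(x), with Φ the standard normal CDF:

```python
import math

def gelu(x: float) -> float:
    """Gaussian Error Linear Unit: x * Phi(x), where Phi is the
    standard normal CDF, expressed via the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# GELU is near 0 for large negative x and near the identity for large x.
print(gelu(-3.0), gelu(0.0), gelu(3.0))
```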
Closing thoughts and predictions
Q/A & Discussion

erchung@microsoft.com