OBDP 2021 PRESENTATION
DR PABLO GHIGLINO
A Low Power And High Performance Artificial Intelligence Approach To Increase Guidance Navigation And Control Robustness

                                          pablo.ghiglino@klepsydra.com
                                                    www.klepsydra.com
KLEPSYDRA AI OVERVIEW

[Diagram: Klepsydra AI at the centre, taking a trained model and input data (images, timeseries, sensor data), surrounded by language bindings, performance analysis tooling, and basic and advanced feature sets.]
KLEPSYDRA AI DISTRIBUTION

     1. Developer station distribution:
             •   Libraries + header files
             •   Development tools (performance analysis, etc.)
             •   Unlimited seats.

     2. Target computer distribution:
             •   Libraries + header files
             •   License file per seat.
SUPPORT MATRIX

Architecture    Supported Processors

ARM             Quad-core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5GHz
                Samsung Exynos5 Octa: ARM Cortex™-A15 quad @ 2GHz + Cortex™-A7 quad @ 1.3GHz
                64-bit Arm Neoverse core
                6-core Carmel ARM v8.2 64-bit CPU, 8MB L2 + 4MB L3

x86             Intel Xeon E5-2686 v4
                2.4GHz 8-core Intel Core i9
                2.9GHz quad-core Intel Core i7

OS Type         Supported OS

Linux           Ubuntu 18.04, 20.04, Docker, Embedded Linux, FreeRTOS, PykeOS

Core API

class DeepNeuralNetwork {
public:

     /**
      * @brief setCallback
      * @param callback. Callback function for the prediction result.
      */
     // Assumed callback arguments: the id returned by predict() and the prediction result.
     virtual void setCallback(
          std::function<void(unsigned long, const kpsr::ai::F32AlignedVector &)> callback) = 0;

     /**
      * @brief predict. Load input matrix as input to the network.
      * @param inputVector. An F32AlignedVector of floats containing the network input.
      *
      * @return Unique id corresponding to the input vector
      */
     virtual unsigned long predict(const kpsr::ai::F32AlignedVector & inputVector) = 0;

     /**
      * @brief predict. Copy-less version of predict.
      * @param inputVector. An F32AlignedVector of floats containing the network input.
      *
      * @return Unique id corresponding to the input vector
      */
     virtual unsigned long predict(const std::shared_ptr<kpsr::ai::F32AlignedVector> & inputVector) = 0;

};
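
A minimal usage sketch of this interface (not from the slides; the callback parameter types and the omitted Klepsydra AI headers are assumptions):

#include <iostream>
#include <memory>
// Klepsydra AI headers omitted; the include paths depend on the distribution.

void runInference(std::shared_ptr<DeepNeuralNetwork> dnn,
                  const kpsr::ai::F32AlignedVector & input)
{
     // Register the result handler before submitting any input.
     // Assumed callback arguments: the id returned by predict() and the prediction result.
     dnn->setCallback([](unsigned long id, const kpsr::ai::F32AlignedVector & /* output */) {
          std::cout << "prediction ready for input " << id << std::endl;
     });

     // Non-blocking: the returned id matches the id later passed to the callback.
     unsigned long id = dnn->predict(input);
     std::cout << "submitted input " << id << std::endl;
}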

ONNX API
class KPSR_API OnnxDNNImporter
{
public:

     /**
      * @brief imports an ONNX file and uses a default event loop factory for all processor cores
      * @param onnxFileName
      * @param testDNN
      * @return a shared pointer to a DeepNeuralNetwork object
      *
      * When the log level is debug, dumps the YAML configuration of the default factory.
      * It makes use of all processor cores.
      */
     static std::shared_ptr<DeepNeuralNetwork> createDNNFactory(const std::string & onnxFileName,
                                                                bool testDNN = false);

     /**
      * @brief imports an ONNX file and uses a default synchronous factory
      * @param onnxFileName
      * @param envFileName. Klepsydra AI configuration environment file.
      * @return a shared pointer to a DeepNeuralNetwork object
      *
      * This method is intended to be used for testing purposes only.
      */
     static std::shared_ptr<DeepNeuralNetwork> createDNNFactory(const std::string & onnxFileName,
                                                                const std::string & envFileName);
};
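
A minimal creation sketch (the file names are placeholders for illustration only):

#include <memory>
// Klepsydra AI headers omitted; the include paths depend on the distribution.

std::shared_ptr<DeepNeuralNetwork> loadModel()
{
     // Default event loop factory, using all processor cores.
     std::shared_ptr<DeepNeuralNetwork> dnn =
          OnnxDNNImporter::createDNNFactory("model.onnx");

     // Alternative for testing, configured via a Klepsydra AI environment file:
     // dnn = OnnxDNNImporter::createDNNFactory("model.onnx", "kpsr_ai_env.yaml");

     return dnn;
}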
APPROACHES TO CONCURRENT ALGORITHMIC EXECUTION

[Diagram: the two approaches, Parallelisation and Pipeline, shown side by side.]
BENCHMARK DESCRIPTION

Description
•   Given an input matrix A, a number of sequential multiplications is performed
    (a simplified sketch of this loop follows the technical spec below):
       •   Step 1: B = A x A, Step 2: C = B x B, …
       •   Matrix A is randomly generated for each new sequence
Parameters:
•   Matrix dimensions: 100x100
•   Data types: float, integer
•   Number of multiplications per matrix: [10, 60]
•   Processing frequency: [2Hz - 100Hz]
Technical Spec
•   Computer: Odroid XU4
•   OS: Ubuntu 18.04
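
A minimal single-threaded sketch of the benchmark loop described above (plain C++; the actual benchmark parallelises the multiplications with OpenMP or Klepsydra, and the rate control here is simplified):

#include <chrono>
#include <random>
#include <thread>
#include <vector>

constexpr int N = 100;                 // matrix dimension 100x100
using Matrix = std::vector<float>;     // row-major storage

// Naive dense matrix multiplication, C = A x B.
Matrix multiply(const Matrix & a, const Matrix & b)
{
    Matrix c(N * N, 0.0f);
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < N; ++k)
            for (int j = 0; j < N; ++j)
                c[i * N + j] += a[i * N + k] * b[k * N + j];
    return c;
}

int main()
{
    const int steps = 30;              // multiplications per matrix, in [10, 60]
    const double rateHz = 10.0;        // processing frequency, in [2Hz, 100Hz]
    std::mt19937 gen(42);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);

    for (int sequence = 0; sequence < 100; ++sequence) {
        Matrix m(N * N);
        for (auto & x : m) x = dist(gen);                     // matrix A randomly generated per sequence

        for (int s = 0; s < steps; ++s) m = multiply(m, m);   // B = A x A, C = B x B, ...

        std::this_thread::sleep_for(std::chrono::duration<double>(1.0 / rateHz));
    }
    return 0;
}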
FLOAT PERFORMANCE RESULTS

[Charts: CPU usage, throughput and latency versus publishing rate (Hz), comparing OpenMP and Klepsydra, for runs of 30 steps (2-20 Hz) and 40 steps (2-14 Hz).]
KLEPSYDRA AI DATA PROCESSING APPROACH

[Diagram: Klepsydra AI threading model. Input data flows through the network layers; consecutive layers run in event loops on separate threads (Thread 1, Thread 2, ...) and the output data is produced at the end of the pipeline.]

Event loops are assigned to cores. Most layers can be parallelised and are vectorised.

The threading model consists of the following (a generic pipeline sketch follows this list):

-   Number of cores assigned to event loops

-   Number of event loops per core

-   Number of parallelisation threads for each layer
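
A generic sketch of the pipeline idea shown in the diagram above (plain C++ with std::thread and a small blocking queue; an illustration, not Klepsydra code): two stand-in layers run on two threads connected by queues.

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal blocking queue used to connect two pipeline stages.
template <typename T>
class BlockingQueue
{
public:
    void push(T value)
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _queue.push(std::move(value));
        }
        _cv.notify_one();
    }

    T pop()
    {
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [this] { return !_queue.empty(); });
        T value = std::move(_queue.front());
        _queue.pop();
        return value;
    }

private:
    std::queue<T> _queue;
    std::mutex _mutex;
    std::condition_variable _cv;
};

using Vec = std::vector<float>;

// Stand-ins for two network layers; each runs on its own thread.
Vec layer1(Vec v) { for (auto & x : v) x *= 2.0f; return v; }
Vec layer2(Vec v) { for (auto & x : v) x += 1.0f; return v; }

int main()
{
    const int samples = 5;
    BlockingQueue<Vec> q1, q2;         // input -> layer1 -> layer2 -> output

    std::thread thread1([&] {          // Thread 1 runs layer 1
        for (int i = 0; i < samples; ++i) q2.push(layer1(q1.pop()));
    });
    std::thread thread2([&] {          // Thread 2 runs layer 2
        for (int i = 0; i < samples; ++i) {
            Vec out = layer2(q2.pop());
            std::cout << "output[0] = " << out[0] << std::endl;
        }
    });

    for (int i = 0; i < samples; ++i) q1.push(Vec(4, float(i)));   // input data
    thread1.join();
    thread2.join();
    return 0;
}

In Klepsydra AI, the stages are connected by the event loops' publish/subscribe queues, whose depth is the pool_size parameter described under performance tuning below.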
Performance tuning

Performance Criteria
•    CPU usage
•    RAM usage
•    Throughput (output data rate)
•    Latency

Performance parameters

• pool_size
Size of the internal queues of the event loop publish/subscribe pairs. High throughput requires large values, i.e., more RAM usage; low throughput allows smaller values, and therefore less RAM.

• number_of_cores
Number of cores across which the event loops are distributed (by default one event loop per core). High throughput requires more cores, i.e., more CPU usage; low throughput requires fewer cores, and therefore a substantial reduction in CPU usage.

• number_of_parallel_threads
Number of threads assigned to parallelise layers. For low latency requirements, assign large values (maximum = number of cores), i.e., more CPU usage; when latency is not critical, use small values (minimum = 1), and therefore a substantial reduction in CPU usage.
Example of performance benchmarks

[Chart: latency comparison for the same model: TensorFlow 56ms vs Klepsydra AI 35ms.]
ROADMAP

Q2 2021
•   No third party dependencies
       •   Binaries are C/C++ only
       •   Custom format for models

Q3 2021
•   FreeRTOS support (alpha version)
       •   Xilinx Ultrascale+ board
       •   Microchip SAM V71

Q4 2021
•   PykeOS support (alpha version)
       •   Xilinx Zedboard

Q1 2022
•   NVIDIA Jetson TX2 support (alpha release)
•   Quantisation support

Q2 2022
•   Graphs support
•   New memory allocation model
•   C support

Legend:
Hard deadlines
Flexible dates
CONTACT INFORMATION

Dr Pablo Ghiglino
pablo.ghiglino@klepsydra.com
+41786931544
www.klepsydra.com
linkedin.com/company/klepsydra-technologies