Intel Acceleration for Classical Machine Learning - Laurent Duhem - HPC/AI Solutions Architect () Shailen Sobhee - AI ...

Intel® AI Workshop 2021

Intel® Acceleration for
Classical Machine Learning
Laurent Duhem – HPC/AI Solutions Architect (Laurent.duhem@intel.com)
Shailen Sobhee - AI Software Technical Consultant (shailen.sobhee@intel.com)
Notices and Disclaimers
▪   Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system
    configuration.

▪   No product or component can be absolutely secure.

▪   Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. For more complete
    information about performance and benchmark results, visit http://www.intel.com/benchmarks .

▪   Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
    measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
    information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more
    complete information visit http://www.intel.com/benchmarks .

▪   Intel® Advanced Vector Extensions (Intel® AVX) provides higher throughput to certain processor operations. Due to varying processor power characteristics, utilizing AVX instructions may cause
    a) some parts to operate at less than the rated frequency and b) some parts with Intel® Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies
    depending on hardware, software, and system configuration and you can learn more at http://www.intel.com/go/turbo.

▪   Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2,
    SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by
    Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel
    microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

▪   Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost
    savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

▪   Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are
    accurate.

▪   © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

Executive Summary
▪ Intel® Distribution for Python covers major
  usages in HPC and Data Science

▪ Achieve faster Python application performance
  right out of the box, with minimal or no
  changes to your code

▪ Accelerate NumPy*, SciPy*, and scikit-learn*
  with integrated Intel® Performance Libraries
  such as Intel® oneMKL (Math Kernel Library) and
  Intel® oneDAL (Data Analytics Library)

▪ Access the latest vectorization and
  multithreading instructions, Numba* and
  Cython*, composable parallelism with
  Threading Building Blocks, and more

Audience:
▪ Analysts
▪ Data Scientists
▪ Machine Learning Developers

Intel® Distribution for Python Architecture
[Diagram: layered architecture of Intel® Distribution for Python]
  Interface:                  Command Line (> python script.py), Scientific Environments, Developer Environments
  Language:                   CPython (GIL)
  Python packages:            Numerical (including daal4py), Parallelism (tbb4py, smp, mpi4py)
  Intel native technologies:  oneMKL, DPC++, oneDAL, TBB, iomp, impi
  Legend:                     Community technology, Intel technology

    Accelerated NumPy and SciPy
    • Optimizations include the use of oneMKL, which provides optimized
      BLAS/LAPACK operations and FFT computations
    • Optimizations also include the use of Intel® C and Fortran compilers to
      enable better use of vectorization
    • The interface works directly with single and double precision NumPy
      arrays
    • Natively supports multidimensional transforms
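
As a minimal sketch of the kind of code these optimizations target, the example below uses only standard NumPy calls (a LAPACK-backed solve, a BLAS-backed matrix product, and a multidimensional FFT on single and double precision arrays). The array sizes and tolerances are invented for illustration. Whether oneMKL kernels actually run underneath depends on how the installed NumPy was built, which this sketch does not check.

```python
import numpy as np

rng = np.random.default_rng(0)

# BLAS/LAPACK-backed operations: linear solve and matrix product.
a = rng.standard_normal((200, 200))
b = rng.standard_normal((200, 5))
x = np.linalg.solve(a, b)           # LAPACK-backed solver
residual = np.abs(a @ x - b).max()  # BLAS-backed matrix product
print("max residual:", residual)

# Multidimensional FFT on single and double precision inputs.
for dtype in (np.float32, np.float64):
    image = rng.standard_normal((64, 64)).astype(dtype)
    spectrum = np.fft.fftn(image)           # 2-D transform in one call
    restored = np.fft.ifftn(spectrum).real  # inverse transform
    print(dtype.__name__, "round trip ok:", np.allclose(restored, image, atol=1e-4))
```

The same script runs unchanged on any NumPy build, which is the "minimal or no changes" point: the acceleration comes from the library underneath, not from the user code.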

oneAPI Data Analytics Library (oneDAL)
Optimized building blocks for all stages of data analytics on Intel Architecture

   GitHub: https://github.com/oneapi-src/oneDAL

    What makes oneDAL faster?

Intel® oneAPI Data Analytics Library (oneDAL) Algorithms
Machine Learning

  Supervised learning
    Regression:                     Linear Regression, Ridge Regression, LASSO
    Regression and classification:  Decision Tree, Random Forest, Gradient Boosting
    Classification:                 AdaBoost, Brown/Logit Boosting, Naïve Bayes, Logistic Regression, kNN, SVM
  Unsupervised learning:            DBSCAN, K-Means Clustering, EM for GMM
  Collaborative filtering:          Alternating Least Squares
  Apriori

  Legend (color-coded on the slide):
    Algorithms supporting Intel GPU (Gen 9 & Gen12) & dGPU
    Algorithms supporting batch processing
    Algorithms supporting batch and distributed processing
     Intel® oneAPI Data Analytics Library (oneDAL) algorithms
     Data Transformation and Analysis

       Basic statistics for datasets:  Low order moments, Quantiles, Order statistics
       Correlation and dependence:     Cosine distance, Correlation distance, Variance-Covariance matrix
       Matrix factorizations:          SVD, QR, Cholesky, tSVD
       Dimensionality reduction:       PCA
       Outlier detection:              Univariate, Multivariate
       Association rule mining (Apriori)
       Optimization solvers (SGD, AdaGrad, lBFGS, CD)
       Math functions (exp, log, …)

       Legend (color-coded on the slide):
         Algorithms supporting batch processing on Intel GPU (Gen 9 & Gen12) & dGPU
         Algorithms supporting batch processing
         Algorithms supporting batch, online and/or distributed processing
     K-Means Using Scikit-learn and daal4py

     ▪ Scikit-learn
     from sklearn.cluster import KMeans
     import pandas as pd

     data = pd.read_csv("./kmeans.csv")               # Load the data

     algo = KMeans(n_clusters=20,                     # Configure K-means
                   init='k-means++', max_iter=5)      # main object
     result = algo.fit(data)                          # Compute the clusters and labels

     result.labels_                                   # Print the results
     result.cluster_centers_

     ▪ daal4py
     from daal4py import kmeans_init, kmeans
     import pandas as pd

     data = pd.read_csv("./kmeans.csv")               # Load the data

     init = kmeans_init(nClusters=20,                 # Compute initial
                        method="plusPlusDense").compute(data)  # centroids

     algo = kmeans(nClusters=20,                      # Configure K-means
                   maxIterations=5, assignFlag=True)  # main object
     result = algo.compute(data, init.centroids)      # Compute the clusters and labels

     result.assignments                               # Print the results
     result.centroids
scikit-learn
Optimized building blocks for all stages of data analytics on Intel Architecture

   GitHub: https://github.com/oneapi-src/oneDAL

The most popular ML package for Python*

Intel Distribution for Python (IDP) Scikit-learn

▪ Common Scikit-learn (Scikit-learn mainline)

    from sklearn.svm import SVC

    X, y = get_dataset()

    clf = SVC().fit(X, y)
    res = clf.predict(X)

▪ Scikit-learn with Intel CPU opts
  Available through Intel conda (conda install daal4py -c intel)

    import daal4py as d4p
    d4p.patch_sklearn()
    from sklearn.svm import SVC

    X, y = get_dataset()

    clf = SVC().fit(X, y)
    res = clf.predict(X)

Same Code, Same Behavior
• Scikit-learn, not scikit-learn-like
• Scikit-learn conformance (mathematical equivalence)
  defined by Scikit-learn Consortium,
  continuously vetted by public CI
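
The patching pattern on this slide can be sketched as a self-contained script. Several things here are assumptions rather than part of the slide: the import path `from daal4py.sklearn import patch_sklearn` is used instead of the slide's `d4p.patch_sklearn()`, the synthetic dataset from `make_classification` stands in for the `get_dataset()` placeholder, and the fallback branch exists only so the sketch still runs where daal4py is not installed.

```python
# Assumption: daal4py may be absent; in that case stock scikit-learn is used.
try:
    from daal4py.sklearn import patch_sklearn
    patch_sklearn()  # apply the patch before importing scikit-learn estimators
    patched = True
except ImportError:
    patched = False

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the slide's get_dataset() placeholder.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Identical estimator code in both cases: same code, same behavior.
clf = SVC().fit(X, y)
res = clf.predict(X)
accuracy = float((res == y).mean())
print(f"patched={patched}, training accuracy={accuracy:.3f}")
```

Only the first few lines differ from a plain scikit-learn script; everything after the imports is unchanged user code.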
Intel optimized Scikit-Learn
Speedup of Intel® oneDAL powered Scikit-Learn over the original Scikit-Learn

  Workload                                  Speedup (x)
  K-means fit, 1M x 20, k=1000                     44.0
  K-means predict, 1M x 20, k=1000                  3.6
  PCA fit, 1M x 50                                  4.0
  PCA transform, 1M x 50                           27.2
  Random Forest fit, higgs1m                       38.3
  Random Forest predict, higgs1m                   55.4
  Ridge Reg fit, 10M x 20                          53.4
  Linear Reg fit, 2M x 100                         91.8
  LASSO fit, 9M x 45                               50.9
  SVC fit, ijcnn                                   29.0
  SVC predict, ijcnn                               95.3
  SVC fit, mnist                                   82.4
  SVC predict, mnist                              221.0
  DBSCAN fit, 500K x 50                            17.3
  train_test_split, 5M x 20                         9.4
  kNN predict, 100K x 20, class=2, k=5            131.4
  kNN predict, 20K x 50, class=2, k=5             113.8

Same Code, Same Behavior
• Scikit-learn, not scikit-learn-like
• Scikit-learn conformance (mathematical equivalence)
  defined by Scikit-learn Consortium,
  continuously vetted by public CI

HW: Intel Xeon Platinum 8276L CPU @ 2.20GHz, 2 sockets, 28 cores per socket
Details: https://medium.com/intel-analytics-software/accelerate-your-scikit-learn-applications-a06cacf44912
Available algorithms
▪ Accelerated IDP Scikit-learn algorithms:
•   Linear/Ridge Regression
•   Logistic Regression
•   ElasticNet/LASSO
•   PCA
•   K-means
•   DBSCAN
•   SVC
•   train_test_split(), assume_all_finite()
•   Random Forest Regression/Classification - DAAL 2020.3
•   kNN (kd-tree and brute force) - DAAL 2020.3

Demo

XGBoost
Optimized building blocks for all stages of data analytics on Intel Architecture

   GitHub: https://github.com/oneapi-src/oneDAL

Gradient Boosting - overview
• Gradient Boosting:
• Boosting algorithm (Decision Trees as base learners)
• Solves many types of ML problems
  (classification, regression, learning to rank)
• Highly accurate, widely used by Data Scientists
• Compute-intensive workload
• Known implementations: XGBoost*, LightGBM*, CatBoost*, Intel® DAAL, …
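
To make the boosting idea concrete, here is a toy sketch in plain Python: gradient boosting for squared loss on a single feature, with one-split decision stumps as base learners. Each round fits a stump to the current residuals and adds a damped copy of it to the ensemble. The data, the learning rate, and all function names are invented for illustration; this is not how XGBoost or oneDAL implement it (they use histogram-based tree building, regularization, and much more).

```python
def fit_stump(xs, residuals):
    """Find the threshold split on a 1-D feature that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the residuals (the negative gradient of squared loss)."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps, history = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return base, stumps, history


xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]  # toy target, invented for illustration
base, stumps, history = boost(xs, ys)
print(f"MSE after round 1: {history[0]:.4f}, after round {len(history)}: {history[-1]:.4f}")
```

The training error shrinks round by round, and every round scans all candidate splits, which is why real implementations are compute intensive and benefit from optimized split finding.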

[Figure: three panels, each labeled "Error"]
DMLC XGBoost* ACCELERATION
▪ Intel® contributed 3 pull requests
  to the XGBoost* project on
  GitHub* during the year
  Goal: performance optimizations
  of ‘hist’ mode for Intel® CPUs
     XGBoost training improvements:

     Dataset: Airline-OHE, 4.69M

     Metric          Library version    Value
     Train time, s   XGBoost 0.81       4481
     Train time, s   XGBoost 1.2.0      243
     Accuracy        XGBoost 0.81       0.841544
     Accuracy        XGBoost 1.2.0      0.842981
     Speedup:        18.4

     Workload description: The Airline dataset was preprocessed with OHE (one-hot encoding);
     after a random permutation, the first 7M rows were selected and split into
     train and test parts (70%-30%).

     HW: 2 x Intel® Xeon Gold 6230R @ 26 cores, OS: CentOS Linux 8 (Core), 193 GB RAM.

     SW: XGBoost versions 1.2 and 0.81 from the xgboost pip channel. Compiler: G++ 7.4. Intel DAAL: version 2020.3, downloaded from conda. Python env: Python 3.7, NumPy 1.18.5, Pandas 0.25.3, Scikit-learn 0.23.2.
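
The reported speedup follows directly from the two training times on this slide. A short worked check, using only the numbers shown (the variable names are illustrative):

```python
# Training times in seconds, as reported for the Airline-OHE workload.
train_time_s = {"XGBoost 0.81": 4481.0, "XGBoost 1.2.0": 243.0}

speedup = train_time_s["XGBoost 0.81"] / train_time_s["XGBoost 1.2.0"]
time_saved = 1 - train_time_s["XGBoost 1.2.0"] / train_time_s["XGBoost 0.81"]

# Accuracy is essentially unchanged between the two versions.
accuracy_delta = 0.842981 - 0.841544

print(f"speedup: {speedup:.1f}x, training time reduced by {time_saved:.1%}")
print(f"accuracy change: {accuracy_delta:+.6f}")
```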
XGB and LGBM prediction acceleration
daal4py Gradient Boosting Model Converters
XGBoost:
import xgboost as xgb
xgb_model = xgb.train(params, X_train) # Train common XGBoost model as usual
import daal4py as d4p
daal_model = d4p.get_gbt_model_from_xgboost(xgb_model) # XGBoost model to DAAL model
daal_prediction = d4p.gbt_classification_prediction(…).compute(X_test, daal_model) # make fast prediction with DAAL

LGBM:
import lightgbm as lgb
lgb_model = lgb.train(params, X_train) # Train common LGBM model as usual
import daal4py as d4p
daal_model = d4p.get_gbt_model_from_lightgbm(lgb_model) # LGBM model to DAAL model
daal_prediction = d4p.gbt_classification_prediction(…).compute(X_test, daal_model) # make fast prediction with DAAL

Convert an already trained XGB/LGBM model to speed up prediction without losing accuracy

Prediction time, s
  Dataset     LGBM     LGBM + daal4py   Speedup   XGB      XGB + daal4py   Speedup
  Higgs       9.156    0.728            12.6      5.514    0.7             7.9
  Mortgage    9.156    0.728            12.6      5.514    0.7             7.9
  MSRank      0.857    0.111             7.7      0.934    0.121           7.7

Accuracy/MSE
  Dataset     LGBM      LGBM + daal4py   XGB       XGB + daal4py
  Higgs       0.75626   0.75626          0.75828   0.75828
  Mortgage    0.49061   0.49061          0.4879    0.4879
  MSRank      0.57101   0.57101          0.57177   0.57177
Demo

Intel® Distribution for Python Architecture
[Diagram: layered architecture of Intel® Distribution for Python, including SDC and Numba]
  Interface:                  Command Line (> python script.py), Scientific Environments, Developer Environments
  Language:                   SDC (Extension), Numba (LLVM IR, Release GIL), CPython (GIL; C++, Release GIL)
  Python packages:            Dataframe, Numerical (including daal4py), Parallelism (tbb4py, smp, mpi4py)
  Intel native technologies:  oneMKL, DPC++, oneDAL, TBB, iomp, impi
  Legend:                     Community technology, Intel technology
Intel® Scalable DataFrame Compiler

▪ Extension for Numba* to accelerate AI workflows
▪ Supports more data types (Series, Dataframes,
  ASCII/Unicode strings)
▪ Compiler, not a library
▪ Scales from laptops to multi-core servers
▪ Open-source project
  GitHub page: https://github.com/IntelPython/sdc
  Documentation: https://intelpython.github.io/sdc-doc/latest/index.html
▪ Available as conda package and pip wheels

Just import Numba and use the decorator
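
The "import Numba and use the decorator" pattern can be sketched with plain Numba on a NumPy array. This is an illustration under assumptions: it does not use SDC itself (SDC extends the same decorator pattern to pandas Series and DataFrames, which is not shown here), the `rolling_sum` function and its data are invented, and the fallback decorator exists only so the sketch runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # JIT-compiles the decorated function to native code
except ImportError:
    # Assumption: without Numba, run the same function as ordinary Python.
    def njit(func):
        return func


@njit
def rolling_sum(values, window):
    """Sum over a sliding window, written as an explicit loop."""
    out = np.empty(values.size - window + 1)
    acc = 0.0
    for i in range(values.size):
        acc += values[i]
        if i >= window:
            acc -= values[i - window]  # drop the element that left the window
        if i >= window - 1:
            out[i - window + 1] = acc
    return out


values = np.arange(10, dtype=np.float64)
result = rolling_sum(values, 3)
print(result)
```

The function body is unchanged Python; only the decorator decides whether it is compiled.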
Intel® SDC
Speedup, SDC vs. Pandas (run_etl)

  Threads        Speedup (x)
  1 thread            1.6991
  4 threads           3.3001
  20 threads         10.9496
  40 threads         14.5491

Intel® Xeon™ Gold 6248 CPU @ 2.50GHz, 2x20 cores
Numba* 0.51.2, Pandas* 1.0.5, SDC 0.37.0
Demo

Modin
▪ Usable and Scalable

[Diagram: a Pandas DataFrame in memory with four CPUs and idle cores,
 next to a Modin DataFrame in memory with four CPUs and full utilization]

To use Modin, replace the pandas import
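
Replacing the pandas import can be sketched as follows. The fallback to stock pandas is an assumption added so the example runs where Modin (and an execution engine such as Ray) is not installed, and the tiny DataFrame is invented; real gains need data large enough to keep all cores busy.

```python
try:
    import modin.pandas as pd  # drop-in replacement that spreads work across cores
except ImportError:
    import pandas as pd        # assumption: fall back so the sketch still runs

# The rest of the script uses the usual pandas API, unchanged.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
totals = df.groupby("key")["value"].sum()
print(totals)
```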
Modin
Execution time, Pandas vs. Modin[ray]

  Library     Time, s
  Pandas     340.0729
  Modin       31.2453

  10.8 speedup

Intel® Xeon™ Gold 6248 CPU @ 2.50GHz, 2x20 cores

▪   Dataset size: 2.4GB
End-to-End Data Pipeline Acceleration
▪ Workload: Train a model using 50 years of Census data
  from IPUMS.org to predict income based on education

▪ Solution: Intel Modin for data ingestion and ETL,
  daal4py and Intel scikit-learn for model training and
  prediction

▪ Perf Gains:
     • read_csv (read from disk and store as a dataframe): 6x
     • ETL operations: 38x
     • Train/test split: 4x
     • ML training (fit & predict) with Ridge Regression: 21x

For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
See backup for configuration details.
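
The four pipeline stages listed above can be sketched end to end with stock pandas and scikit-learn. Everything specific here is an assumption: the data is synthetic rather than the IPUMS census extract, the column names `EDUC` and `INCTOT` are hypothetical placeholders, and the ETL step is a simple filter. Per the slide, the accelerated version swaps in Modin for the dataframe work and the Intel-optimized scikit-learn for training; that swap is not shown in this sketch.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Invented stand-in for the census data: income grows with years of education.
rng = np.random.default_rng(0)
education = rng.integers(8, 21, size=500)
income = 2000.0 * education + rng.normal(0.0, 3000.0, size=500)
csv_text = pd.DataFrame({"EDUC": education, "INCTOT": income}).to_csv(index=False)

# 1. read_csv: load the data (here from an in-memory buffer) into a dataframe.
df = pd.read_csv(io.StringIO(csv_text))

# 2. ETL: drop missing rows and keep plausible income values.
df = df.dropna()
df = df[df["INCTOT"] > 0]

# 3. Train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    df[["EDUC"]], df["INCTOT"], test_size=0.3, random_state=0)

# 4. Ridge regression: fit and predict.
model = Ridge(alpha=1.0).fit(X_train, y_train)
pred = model.predict(X_test)
score = model.score(X_test, y_test)
print(f"R^2 on held-out data: {score:.3f}")
```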
Envision a GPU-enabled Python Library Ecosystem
Data Parallel Python: Extending PyData ecosystem for XPU

Unified Python Offload Programming Model

    with device_context("gpu"):
        a_dparray = dpnp.random.random((1024, 3))
        X_dparray = numba.njit(compute_embedding)(a_dparray)
        res_dparray = daal4py.kmeans().compute(X_dparray)

▪   numpy → dpnp

▪   ndarray → dparray

▪   host memory → unified shared mem

▪   CPU → XPU

Figure: software stack diagram. Optimized Packages for Intel CPUs & GPUs and JIT Compilation sit on a Unified Data & Execution Infrastructure (zero-copy USM array interface, common device execution queues), which runs on the DPC++ RUNTIME with OpenCL, Level 0, and CUDA back ends.
New Additions to Numba’s Language Design
@dppy.kernel: Explicit kernels, low-level kernel programming for expert ninjas

 import dpctl
 import numba_dppy as dppy
 import numpy as np

 @dppy.kernel
 def sum(a, b, c):
     # Each work item adds one pair of elements.
     i = dppy.get_global_id(0)
     c[i] = a[i] + b[i]

 a = np.ones(1024, dtype=np.float32)
 b = np.ones(1024, dtype=np.float32)
 c = np.zeros_like(a)
 with dpctl.device_context("gpu"):
     sum[1024, dppy.DEFAULT_LOCAL_SIZE](a, b, c)

@njit: NumPy-based array programming, auto-offload, high-productivity

 from numba import njit
 import numpy as np
 import dpctl

 @njit
 def f1(a, b):
     c = a + b
     return c

 a = np.ones(1024, dtype=np.float32)
 b = np.ones(1024, dtype=np.float32)
 with dpctl.device_context("gpu"):
     c = f1(a, b)
Seamless interoperability and sharing of resources

 • Different packages share same execution context

 • Data can be exchanged without extra copies and kept on the device

 import dpctl, numba, dpnp, daal4py

 @numba.njit
 def compute(a):    # Numba function
    ...

 with dpctl.device_context("gpu"):
     a_dparray   = dpnp.random.random((1024, 3))
     X_dparray   = compute(a_dparray)
     res_dparray = daal4py.kmeans().compute(X_dparray)    # daal4py function
Portability Across Architectures
 import dpctl
 import numba
 import numpy as np
 import math

 @numba.vectorize(nopython=True)
 def cndf2(inp):
     # Cumulative normal distribution function
     out = 0.5 + 0.5 * math.erf((math.sqrt(2.0) / 2.0) * inp)
     return out

 @numba.njit(parallel={"offload": True}, fastmath=True)
 def blackscholes(sptprice, strike, rate, volatility, timev):
     logterm = np.log(sptprice / strike)
     powterm = 0.5 * volatility * volatility
     den = volatility * np.sqrt(timev)
     d1 = (((rate + powterm) * timev) + logterm) / den
     d2 = d1 - den
     NofXd1 = cndf2(d1)
     NofXd2 = cndf2(d2)
     futureValue = strike * np.exp(-rate * timev)
     c1 = futureValue * NofXd2
     call = sptprice * NofXd1 - c1
     # Put-call parity: put = call - spot + discounted strike
     put = call - sptprice + futureValue
     return put

 # Runs on CPU by default
 blackscholes(...)

 # Runs on GPU
 with dpctl.device_context("gpu"):
     blackscholes(...)

 # In future
 with dpctl.device_context("cuda:gpu"):
     blackscholes(...)
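
When the same function runs on several devices, a scalar reference is handy for checking that results agree. The following is a minimal standard-library sketch of the Black-Scholes formulas, using the textbook put-call parity (put = call - spot + discounted strike). The function names and the sample inputs are illustrative and do not come from the slide.

```python
import math


def cndf(x):
    """Standard normal cumulative distribution via the error function."""
    return 0.5 + 0.5 * math.erf(x / math.sqrt(2.0))


def bs_call_put(spot, strike, rate, volatility, years):
    """Return (call, put) prices for a European option."""
    logterm = math.log(spot / strike)
    powterm = 0.5 * volatility * volatility
    den = volatility * math.sqrt(years)
    d1 = ((rate + powterm) * years + logterm) / den
    d2 = d1 - den
    discounted_strike = strike * math.exp(-rate * years)
    call = spot * cndf(d1) - discounted_strike * cndf(d2)
    # Put-call parity: call + discounted strike = put + spot
    put = call - spot + discounted_strike
    return call, put


# Illustrative inputs: at-the-money option, 5% rate, 20% volatility, 1 year.
call, put = bs_call_put(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"call={call:.4f} put={put:.4f}")
```

Comparing a few such scalar values against the array results from the CPU and GPU runs, within a floating-point tolerance, gives a quick sanity check, especially with `fastmath=True` enabled.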

Scikit-Learn on XPU
Stock on Host:

  from sklearn.svm import SVC
  X, y = get_dataset()

  clf = SVC().fit(X, y)
  res = clf.predict(X)

Optimized on Host:

  import daal4py as d4p
  d4p.patch_sklearn()

  from sklearn.svm import SVC
  X, y = get_dataset()

  clf = SVC().fit(X, y)
  res = clf.predict(X)

Offload to XPU:

  import daal4py as d4p
  d4p.patch_sklearn()
  import dpctl

  from sklearn.svm import SVC
  X, y = get_dataset()

  with dpctl.device_context("gpu"):
      clf = SVC().fit(X, y)
      res = clf.predict(X)

SAME NUMERIC BEHAVIOR as defined by Scikit-learn Consortium & continuously validated by CI
Installing Intel® Distribution for Python* 2021
▪   Anaconda.org (https://anaconda.org/intel/packages)
        > conda create -n idp -c intel intelpython3_core python=3.x
        > conda activate idp
        > conda install intel::numpy

▪   YUM/APT
        https://software.intel.com/content/www/us/en/develop/articles/installing-intel-free-libs-and-python-apt-repo.html
        https://software.intel.com/content/www/us/en/develop/articles/installing-intel-free-libs-and-python-yum-repo.html

▪   Docker Hub
        docker pull intelpython/intelpython3_full

▪   oneAPI
        https://software.intel.com/content/www/us/en/develop/tools/oneapi/ai-analytics-toolkit.html

▪   Standalone Installer
        https://software.intel.com/content/www/us/en/develop/articles/oneapi-standalone-components.html#python

▪   PyPI (+ Intel library Runtime packages, + Intel development packages)
        > pip install intel-numpy
        > pip install intel-scipy
        > pip install mkl_fft
        > pip install mkl_random
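
After installing through any of these channels, a quick check of which packages the active environment can import may save some debugging. This is a minimal sketch using only the standard library; the package list is an assumption based on the names mentioned in this deck, and `installed` is a hypothetical helper.

```python
import importlib.util

# Import names assumed from the packages mentioned in this presentation.
PACKAGES = ["numpy", "scipy", "mkl_fft", "mkl_random", "daal4py"]


def installed(names):
    """Map each top-level module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = installed(PACKAGES)
for name, found in status.items():
    print(f"{name:12s} {'found' if found else 'missing'}")
```

The check only confirms that a module can be located; it does not tell whether the copy found is the Intel-optimized build.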

Get the Most from Your Code Today with Intel® Tech.Decoded

Visit TechDecoded.intel.io to learn how to put key optimization strategies into practice with Intel development tools.

▪   Big Picture Videos: Discover Intel’s vision for key development areas.

▪   Essential Webinars: Gain strategies, practices and tools to optimize application and solution performance.

▪   Quick Hit How-To Videos: Learn how to do specific programming tasks using Intel® tools.

TOPICS: Visual Computing, Code Modernization, Systems & IoT, Data Science, Data Center & Cloud
More Resources

Intel® Distribution for Python
 • Product page – overview, features, FAQs…
 • Training materials – movies, tech briefs, documentation, evaluation guides…
 • Support – forums, secure support…
 • Machine Learning Benchmarks
   • https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics
   • https://github.com/IntelPython/scikit-learn_bench

Thank you