ADVANCED COMPUTER ARCHITECTURES - AA 2014/2015 Website Cristina Silvano email: ...

088949 – ADVANCED COMPUTER ARCHITECTURES
                        AA 2014/2015

 Website: http://home.deib.polimi.it/silvano/aca-como.htm

                    Prof. Cristina Silvano
               email: cristina.silvano@polimi.it
  Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
                        Politecnico di Milano
Goals of the ACA course

   Provide an overview of the most recent and advanced
    computer architectures

   Introduce the basic micro-architectural mechanisms
    found in modern microprocessor architectures

   Provide the reasoning behind the adoption of advanced
    computer architectures

Cristina Silvano – Politecnico di Milano   -2-
ADVANCED COMPUTER
ARCHITECTURES: AN OVERVIEW
Cristina Silvano – Politecnico di Milano   -3-   March 2012
Advanced Computer Architectures:
                Supercomputers
    The first supercomputer reaching the Petascale peak
     performance (10^15 Flops) was installed in 2008.
    Research on supercomputing is pushing towards the
     Exascale (10^18 Flops) to be reached in 2020.

Cristina Silvano – Politecnico di Milano   -4-        March 2013
Top500 ranking of the world’s most
                powerful supercomputers
                                                No. 1 Tianhe-2: 33.86 PetaFlops (Linpack
                                                 performance), 54.9 PetaFlops (peak
                                                 performance) with 17.8 MW power
                                                 dissipation
                                                Site: National Super Computer Center in
                                                 Guangzhou (China)

                                                No. 2 Titan: 17.59 PetaFlops (Linpack
                                                 performance) 27.11 PetaFlops (peak
                                                 performance) with 8.2MW power
                                                 dissipation
                                                Site: Oak Ridge National Laboratory
                                                 (USA)

                                                Both Tianhe-2 and Titan employ
                                                 accelerator/co-processor technology
No. 2 TITAN – Cray XK7, Opteron 2.2GHz, NVIDIA K20X

Exascale supercomputers

    To reach the 20 MW Exascale supercomputer projected for 2020,
     supercomputers must improve their energy efficiency towards
     a goal of 50 GigaFlops/W
    No. 1 Tianhe-2 delivers 1.9 GigaFlops/W, ranking only 40th in the
     Green500 list, which ranks supercomputers by their energy efficiency
    Today the greenest supercomputer in the Green500 achieves 4.5
     GigaFlops/W
    The top 17 positions of the Green500 are currently occupied by
     heterogeneous computing systems
    This dominance is expected to continue in the coming years on the
     way to the target of a 20 MW Exascale supercomputer
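
The 50 GigaFlops/W goal follows directly from dividing the Exascale performance by the 20 MW power budget. The short sketch below redoes that arithmetic with the Top500 figures quoted on these slides (Linpack performance and power dissipation); the function and variable names are illustrative, not part of the course material.

```python
# Energy-efficiency arithmetic with the figures quoted on the slides.
PETA = 1e15
EXA = 1e18
GIGA = 1e9
MEGA = 1e6


def gflops_per_watt(flops: float, watts: float) -> float:
    """Return the energy efficiency in GigaFlops per watt."""
    return flops / watts / GIGA


exascale_target = gflops_per_watt(1 * EXA, 20 * MEGA)    # 10^18 Flops in 20 MW
tianhe2 = gflops_per_watt(33.86 * PETA, 17.8 * MEGA)     # Linpack, No. 1
titan = gflops_per_watt(17.59 * PETA, 8.2 * MEGA)        # Linpack, No. 2
gap = exascale_target / tianhe2                          # required improvement factor

print(f"Exascale target: {exascale_target:.1f} GFlops/W")
print(f"Tianhe-2:        {tianhe2:.2f} GFlops/W")
print(f"Titan:           {titan:.2f} GFlops/W")
print(f"Tianhe-2 is about {gap:.0f}x short of the target")
```

With these inputs Tianhe-2 comes out at roughly 1.9 GigaFlops/W, which matches the figure on the slide and leaves a gap of more than one order of magnitude to the Exascale goal.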

US Dept. of Energy recently announced
                Summit and Sierra supercomputers

Applications driving the demand for more
                computing performance
    Climate
    Astrophysics
    Biology
    Business Analytics

Advanced Computer Architectures:
                 Intel® Core™ i7-3770T Processor
                 (Ivy Bridge, up to 3.70 GHz)

160 mm² die @ 22 nm, 1.40 billion transistors.

    # of Cores                   4
    # of Threads                 8
    Clock Speed                  2.5 GHz
    Max Turbo Frequency          3.7 GHz
    Intel® Smart Cache           8 MB
    Instruction Set              64-bit
    Instruction Set Extensions   SSE4.1/4.2, AVX
    Embedded Options Available   No
    Lithography                  22 nm
    Max TDP                      45 W
    Recomm. Customer Price       TRAY: $294.00
    Max Memory Size              32 GB
    Memory Types                 DDR3-1333/1600
    # of Memory Channels         2
    Max Memory Bandwidth         25.6 GB/s
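
The 25.6 GB/s figure can be reproduced from the memory type and channel count in the table. The sketch below assumes the standard 64-bit data width of a DDR3 channel (an assumption, since the table does not state it) and reads "DDR3-1600" as 1600 mega-transfers per second; the function name is illustrative.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, channels: int,
                        bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s for a DDR memory interface.

    bus_width_bits defaults to 64, the usual DDR3 channel width (assumed here).
    """
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


ddr3_1600 = peak_bandwidth_gb_s(1600, channels=2)   # fastest supported memory type
ddr3_1333 = peak_bandwidth_gb_s(1333, channels=2)   # slower supported memory type

print(f"DDR3-1600, 2 channels: {ddr3_1600:.1f} GB/s")
print(f"DDR3-1333, 2 channels: {ddr3_1333:.1f} GB/s")
```

This is a theoretical peak: sustained bandwidth in real workloads is lower.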
Advanced Computer Architectures:
                Smart Phones

ARM Cortex-A8 core processor
           in Apple A4 System-on-Chip

   Based on the ARMv7 architecture
   It’s a dual-issue in-order execution design
   The Apple A4 at 1 GHz (45nm manufactured by Samsung from March
    2010 to present), a System-on-Chip that combines an ARM Cortex-A8
    and a PowerVR GPU, is in the:
     • Original iPad: April 2010
     • iPhone 4: June 2010 (Black; GSM), February 2011 (Black; CDMA),
        April 2011 (White; GSM & CDMA)
     • iPod Touch (4th generation): September 2010 (Black model),
        October 2011 (White model)
     • Apple TV (2nd generation): Sept. 2010
ARM Cortex-A9 MP core processor in
                 Apple A5 System-on-Chip

   Based on the ARMv7 architecture
   It’s a dual-issue out-of-order execution design
   The Apple A5 at 1 GHz (45nm to 32 nm manufactured by Samsung
    from March 2011 to present), a System-on-Chip that combines a dual
    core ARM Cortex-A9 with NEON SIMD accelerator and a dual core
    PowerVR GPU, is in the:
     • iPad 2 (A5 dual-core 45 nm) – March 2011; (A5 dual-core 32 nm) –
        March 2012
     • iPhone 4S (A5 dual-core 45 nm) – October 2011
     • Apple TV 3rd generation (A5 single-core, 32 nm) – March 2012
     • iPod Touch 5th generation (A5 dual-core 32 nm) – October 2012
     • iPad Mini (A5 dual-core 32 nm) – November 2012

Apple A6 System-on-Chip

   The Apple A6 SoC was introduced in Sept. 2012 for the iPhone 5
   Apple states that it is up to twice as fast and has up to twice the
    graphics power compared to its predecessor, the Apple A5
   The A6 uses a 1.3 GHz custom Apple-designed ARMv7 based dual-core
    CPU, called Swift, and an integrated triple-core PowerVR SGX 543MP3
    GPU.
   The A6 chip for iPhone 5 incorporates 1 GB of LPDDR2-1066 RAM,
    doubling the memory capacity of the iPhone 4S while increasing the
    theoretical memory bandwidth from 6.4 GB/s to 8.5 GB/s.

Apple A6 System-on-Chip

     ARMv7s ISA dual core
     Triple-core PowerVR SGX 543MP3 GPU
     1 MB L2 cache
     1.3 GHz
     32 nm Samsung
     96.71 mm² (22% smaller than A5)

Apple A7 System-on-Chip

   Apple A7 is a 64-bit SoC introduced in Sept. 2013 for the iPhone 5S
   Apple states that it is up to twice as fast and has up to twice the graphics
    power compared to its predecessor the Apple A6.
   The A7 features an Apple-designed 64-bit 1.3-1.4 GHz ARMv8-A dual-core CPU,
    called Cyclone, and an integrated GPU PowerVR G6430 in a four cluster
    configuration
   The A7 has a per-core L1 cache of 64 KB for data and 64 KB for instructions, an L2
    cache of 1 MB shared by both CPU cores, and a 4 MB L3 cache that services the
    entire SoC.
   Compared to A6, the A7 SoC no longer services the accelerometer, gyroscope
    and compass. To reduce power consumption, these functionalities have been
    moved to the new M7 motion coprocessor, a separate ARM-based
    microcontroller from NXP Semiconductors.

Apple A8 System-on-Chip

   Apple A8 is a 64-bit ARM-based SoC introduced in Sept. 2014 for the
    iPhone 6 and iPhone 6 Plus
   Apple states that it has 25% more CPU performance and 50% more graphics
    performance with 50% of the power compared to its predecessor, the A7.
   The A8 features the second generation of the Apple-designed 64-bit 1.4 GHz
    ARMv8-A dual-core CPU, called Cyclone Gen 2, and an integrated PowerVR
    Series 6XT GX6450 quad-core GPU.
   The A8 is manufactured on a 20 nm process by TSMC, which replaced Samsung
    as manufacturer of Apple's mobile device processors. It contains 2 billion
    transistors. It has 1 GB of LPDDR3 RAM included in the package.
   On October 16, 2014, Apple introduced a variant of the A8, the A8X, in the iPad
    Air 2, with improved graphics and CPU performance due to one extra core and
    a higher frequency.

Moore’s Law (1965): The number of transistors on a
                processor will double every 18 to 24 months
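
As a rough illustration of what a fixed doubling period implies, the sketch below projects the growth factor over ten years for the two ends of the 18 to 24 month range. The function name and the ten-year horizon are illustrative choices, not figures from the slides.

```python
def projected_transistors(initial: float, years: float,
                          doubling_months: float) -> float:
    """Transistor count after `years`, doubling every `doubling_months` months."""
    doublings = years * 12 / doubling_months
    return initial * 2 ** doublings


# Growth factor over 10 years, starting from a normalised count of 1.
growth_24 = projected_transistors(1.0, years=10, doubling_months=24)
growth_18 = projected_transistors(1.0, years=10, doubling_months=18)

print(f"Doubling every 24 months: {growth_24:.0f}x in 10 years")
print(f"Doubling every 18 months: {growth_18:.0f}x in 10 years")
```

The gap between the two results shows how sensitive the projection is to the assumed doubling period.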

The end of the historic scaling

   Chip density continues
    to increase
    ~2x every 2 years

   Max Clock
    Frequency Wall

   Power Wall

   Expose parallelism
    at a coarser level
    than a single thread

Stopper: On-Chip Temperature Wall

Paradigm shift : Multi-core architectures

   ARM 9 area across technology nodes:

      Technology    Area
      180 nm        11.8 mm²
      130 nm         5.2 mm²
       90 nm         2.6 mm²
       65 nm         1.4 mm²

   Source: STMicroelectronics
Intel 80 core

NVIDIA Fermi GPU

NVIDIA Tesla GPU

  Kepler GK110 Architecture
  • 7.1B Transistors
  • 15 SMX units (2880 cores)
  • >1TFLOP FP64
  • 1.5MB L2 Cache
  • 384-bit GDDR5
  • PCI Express Gen3

ACA COURSE INFORMATION

ACA Course Schedule

    Schedule: Second Semester 2014-2015 (Spring 2015)

    TUESDAY 13.15 - 15.15
     Location: VS8 Via Valleggio 11, Como Campus
    THURSDAY 15.15 - 18.15
     Location: V07 via Valleggio 11, Como Campus

Contact Information

    Office hours for students:
     Tuesday 15.15 - 16.15 at Polo di Como, Via Anzani 42,
     2nd floor (please send an email to make an appointment).

    Main Contact:
     Students can contact Prof. Cristina Silvano by
     e-mail (cristina.silvano@polimi.it),
     using the following subject line:
    Subject: ACA COMO, Your_Surname, Your_Name,
     Your_POLIMI_ID_NUMBER

ACA Teaching Assistant

    Ing. Amir Hossein Ashouri: amirhossein.ashouri@polimi.it

ACA Course Info

    Teaching Activity: The course is worth 5 CFU and is
     organized in 30 hours of lectures and 20 hours of
     written/tool-based exercises that put into practice the
     concepts presented during the lectures.
    Prerequisites: Basic concepts of logic design and
     computer architectures.

ACA Final Exam
    FINAL EXAM:
     The final examination consists of a WRITTEN EXAM and an OPTIONAL part:
     either an oral presentation OR the discussion of a project prepared
     during the course. The topic for the presentation or project is assigned by
     the professor and covers specific techniques and methodologies; the student
     presents it at the end of the course.
    For each written exam, a max. score of 33 points will be assigned: a max. of
     15 points for the solution of the exercise part and a max. of 18 points
     for the answers to the theory part. The OPTIONAL part can
     provide EXTRA points (from 1 to 2 extra points for the oral presentation
     and 1 to 4 extra points for the project). The additional points given by the
     project will be added to the score of the written exam only if the score
     of the written exam is sufficient (>=18).
    The project/presentation will be assigned at the midterm of the course
     semester and must be concluded and presented by June 25th, 2015
     (firm deadline).

ACA Teaching Material

   Additional information in slides and papers is available
    through the course webpage:
    http://home.dei.polimi.it/silvano/ARC-MULTIMEDIA.htm
     •   If you are using MOZILLA FIREFOX as web browser, please use the
         SAVE AS option and save the PDF file on your laptop for a
         correct visualisation and printing of the PDF slides.
   Reference Book: "Computer Architecture, A Quantitative
    Approach", John Hennessy, David Patterson, Morgan
    Kaufmann, Fifth Edition.

Support for the international students

    The ACA course is offered in English
    Teaching materials (slides/papers/textbook) are available in
     English
    The final exam can be taken in English
    Teaching support is available in English

Overview of the ACA topics

    How to increase performance while decreasing the design cost?
       •   RISC: Reduced Instruction Set Computer
       •   Pipeline

    Can we gain more?
       •   Branch prediction
       •   Instruction Level Parallelism (ILP)
       •   Multithreading
       •   Multiprocessors

    Performance still does not scale?
       •   Memory hierarchy
       •   Cache organization

Main lecture topics (1)

    Review of basic computer architecture definitions and components
     (Central Processing Unit, Memory System, Input/Output Interfaces,
     Communication System)
    Basic performance evaluation metrics of computer architectures
    Memory hierarchy: Basic and advanced concepts. Multi-level caches.
     Performance evaluation, optimisation techniques.
    Central Processing Unit: the RISC approach (Reduced Instruction Set
     Computer).

Main lecture topics (2)

    Techniques for performance optimization:
      • Pipelining: The problem of hazards: structural, control and data
        hazards; Optimization techniques to solve the problem of
        hazards
      • Branch prediction techniques: Static and dynamic branch
        prediction techniques
      • Speculative execution

Sequential vs. Pipelined Instruction Execution

                             I1                                   I2

       IF         ID         EX        MEM   WB    IF      ID    EX     MEM   WB   …

                          10 ns                                 10 ns
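
The timing in the sequential diagram above can be compared with an ideal pipeline in a few lines. This is a minimal sketch under stated assumptions: the 10 ns per instruction is split evenly into five 2 ns stages, and the pipeline has no hazards or stalls. The function names are illustrative.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")
STAGE_NS = 2  # assumption: 10 ns per instruction split evenly over 5 stages


def sequential_time_ns(n_instructions: int) -> int:
    """Each instruction runs through all five stages before the next one starts."""
    return n_instructions * len(STAGES) * STAGE_NS


def pipelined_time_ns(n_instructions: int) -> int:
    """Ideal pipeline: after the fill time, one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return (len(STAGES) + n_instructions - 1) * STAGE_NS


for n in (2, 10, 1000):
    seq = sequential_time_ns(n)
    pipe = pipelined_time_ns(n)
    print(f"{n:5d} instructions: sequential {seq} ns, "
          f"pipelined {pipe} ns, speedup {seq / pipe:.2f}x")
```

For long instruction streams the speedup approaches the number of stages (5 in this sketch); hazards reduce it in practice.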

Main lecture topics (3)

    Instruction Level Parallelism (ILP):
      • Static and dynamic scheduling
      • Superscalar architectures
      • VLIW (Very Long Instruction Word) architectures

Instruction Level Parallelism:
                  Example of 2-issue processor

   Time in clock cycles (clock cycle = 2 ns)

   Cycle   1     2     3     4     5     6     7     8     9
   I1      IF    ID    EX    MEM   WB
   I2      IF    ID    EX    MEM   WB
   I3            IF    ID    EX    MEM   WB
   I4            IF    ID    EX    MEM   WB
   I5                  IF    ID    EX    MEM   WB
   I6                  IF    ID    EX    MEM   WB
   I7                        IF    ID    EX    MEM   WB
   I8                        IF    ID    EX    MEM   WB
   I9                              IF    ID    EX    MEM   WB
   I10                             IF    ID    EX    MEM   WB

   Instructions Per Clock (IPC) = 2
   CPI = Clocks Per Instruction = 0.5
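
The cycle count in the 2-issue diagram can be checked with a small calculation. The sketch assumes an ideal in-order pipeline with five stages, a 2 ns clock as in the diagram, and no hazards; `total_cycles` is an illustrative helper, not part of the course material.

```python
import math

CLOCK_NS = 2   # clock cycle shown in the diagram
STAGES = 5     # IF, ID, EX, MEM, WB


def total_cycles(n_instructions: int, issue_width: int) -> int:
    """Cycles to complete n instructions on an ideal pipeline (no hazards)."""
    issue_groups = math.ceil(n_instructions / issue_width)
    return STAGES + issue_groups - 1


two_issue = total_cycles(10, issue_width=2)   # I1..I10 as in the diagram
scalar = total_cycles(10, issue_width=1)      # same program, single issue
steady_state_cpi = 1 / 2                      # two instructions complete per clock

print(f"2-issue: {two_issue} cycles = {two_issue * CLOCK_NS} ns")
print(f"scalar:  {scalar} cycles = {scalar * CLOCK_NS} ns")
print(f"steady-state CPI = {steady_state_cpi}")
```

The CPI of 0.5 is the steady-state value; over only ten instructions the pipeline fill time still dominates, so the measured IPC stays below 2.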

Beyond ILP: Multithreading
  Threads: Independent sequences of instructions

  [Figure: a single-threaded program vs. a multi-threaded program]
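
A minimal sketch of the idea of threads as independent sequences of instructions, using Python's standard `threading` module. The workload (summing two halves of a range) and all names are illustrative assumptions. The example shows the programming model only, not hardware multithreading: in CPython the two threads are not guaranteed to execute in parallel.

```python
import threading


def sum_range(name: str, start: int, stop: int, results: dict) -> None:
    """One independent instruction sequence: sum the integers in [start, stop)."""
    results[name] = sum(range(start, stop))


# Single-threaded program: one sequence of instructions.
single = sum(range(0, 1000))

# Multi-threaded program: two independent sequences that can be scheduled separately.
results = {}
threads = [
    threading.Thread(target=sum_range, args=("low", 0, 500, results)),
    threading.Thread(target=sum_range, args=("high", 500, 1000, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both sequences to finish before combining results

multi = results["low"] + results["high"]
print(single, multi)
```

Each thread writes to its own key in `results`, so the two sequences share no data and need no synchronisation beyond the final `join`.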
Main lecture topics (4)

    Beyond ILP:
      • Multithreading (Thread Level Parallelism – TLP)
      • Multiprocessors and multicore systems: taxonomy,
        topologies, communication management, memory
        management, cache coherency protocols, example of
        architectures
      • System-on-Chip and Network-on-Chip architectures;
        Digital Signal Processors; Stream processors and
        vector processors; Graphic Processors
