088949 – ADVANCED COMPUTER ARCHITECTURES
AA 2014/2015
Website: http://home.deib.polimi.it/silvano/aca-como.htm
Prof. Cristina Silvano
email: cristina.silvano@polimi.it
Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
Politecnico di Milano
Goals of the ACA course
Provide an overview of the most recent and advanced
computer architectures
Introduce the basic micro-architectural mechanisms
found in modern microprocessor architectures
Provide the reasoning behind the adoption of advanced
computer architectures
Cristina Silvano – Politecnico di Milano
ADVANCED COMPUTER ARCHITECTURES: AN OVERVIEW
Advanced Computer Architectures:
Supercomputers
The first supercomputer to reach Petascale peak performance (10^15 Flops) was installed in 2008.
Research on supercomputing is now pushing towards Exascale (10^18 Flops), expected to be reached by 2020.
Top500 ranking of the world's most powerful supercomputers
No. 1 Tianhe-2: 33.86 PetaFlops (Linpack performance), 54.9 PetaFlops (peak performance), with 17.8 MW power dissipation
Site: National Super Computer Center in Guangzhou (China)
No. 2 Titan: 17.59 PetaFlops (Linpack performance), 27.11 PetaFlops (peak performance), with 8.2 MW power dissipation
Site: Oak Ridge National Laboratory (USA)
Both Tianhe-2 and Titan employ accelerator/co-processor technology.
No. 2 TITAN – Cray XK7, Opteron 2.2 GHz, NVIDIA K20X
Exascale supercomputers
To build an Exascale supercomputer within a 20 MW power budget, projected for 2020, supercomputers must improve their energy efficiency towards a goal of 50 GigaFlops/W (10^18 Flops / 20 MW).
The No. 1 system, Tianhe-2, delivers 1.9 GigaFlops/W, which places it only 40th in the Green500 list, the ranking of supercomputers by energy efficiency.
Today the greenest supercomputer in the Green500 achieves 4.5 GigaFlops/W.
The top 17 positions of the Green500 are currently occupied by heterogeneous computing systems.
This dominance is expected to continue over the coming years as supercomputers push towards the 20 MW Exascale target.
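The 50 GigaFlops/W goal is simply the Exascale performance divided by the power budget. The minimal Python sketch below applies the same ratio to the Linpack performance and power figures quoted for Tianhe-2 and Titan; the helper name gflops_per_watt is hypothetical, and using Linpack (rather than peak) Flops is an assumption that matches how the Green500 ranks systems.

```python
# Energy efficiency = sustained Flops / power, expressed in GigaFlops per watt.
PETA = 1e15
EXA = 1e18
MEGA = 1e6
GIGA = 1e9


def gflops_per_watt(flops: float, watts: float) -> float:
    """Return the energy efficiency in GigaFlops/W (hypothetical helper)."""
    return flops / watts / GIGA


# Exascale target: 10^18 Flops within a 20 MW power envelope.
exascale_target = gflops_per_watt(1 * EXA, 20 * MEGA)

# Linpack performance and power dissipation quoted in the Top500 slide.
tianhe2 = gflops_per_watt(33.86 * PETA, 17.8 * MEGA)
titan = gflops_per_watt(17.59 * PETA, 8.2 * MEGA)

print(f"Exascale target: {exascale_target:.1f} GFlops/W")
print(f"Tianhe-2:        {tianhe2:.2f} GFlops/W")
print(f"Titan:           {titan:.2f} GFlops/W")
print(f"Improvement still needed vs. Tianhe-2: {exascale_target / tianhe2:.0f}x")
```

The ratio reproduces the 1.9 GigaFlops/W quoted for Tianhe-2 and shows it is roughly 26x short of the Exascale goal, which is why energy efficiency rather than raw performance drives the design of these systems.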
The US Dept. of Energy recently announced the Summit and Sierra supercomputers
Applications driving the demand for more computing performance
Climate
Astrophysics
Biology
Business Analytics
Advanced Computer Architectures:
Intel® Core™ i7-3770T Processor
(Ivy Bridge, up to 3.70 GHz)
# of Cores: 4
# of Threads: 8
Clock Speed: 2.5 GHz
Max Turbo Frequency: 3.7 GHz
Intel® Smart Cache: 8 MB
Instruction Set: 64-bit
Instruction Set Extensions: SSE4.1/4.2, AVX
Embedded Options Available: No
Lithography: 22 nm
Die: 160 mm² with 1.40 billion transistors
Max TDP: 45 W
Recommended Customer Price (tray): $294.00
Max Memory Size: 32 GB
Memory Types: DDR3-1333/1600
# of Memory Channels: 2
Max Memory Bandwidth: 25.6 GB/s
Advanced Computer Architectures:
Smart Phones
ARM Cortex-A8 core processor in Apple A4 System-on-Chip
Based on the ARMv7 architecture
It is a dual-issue, in-order execution design
The Apple A4 at 1 GHz (45 nm, manufactured by Samsung from March 2010 to present) is a System-on-Chip that combines an ARM Cortex-A8 and a PowerVR GPU. It is used in the:
• Original iPad, April 2010
• iPhone4: June 2010 (Black; GSM), February 2011 (Black; CDMA),
April 2011 (White; GSM & CDMA)
• iPod Touch (4th generation): September 2010 (Black model),
October 2011 (White model)
• Apple TV (2nd generation): Sept. 2010
ARM Cortex-A9 MP core processor in Apple A5 System-on-Chip
Based on the ARMv7 architecture
It is a dual-issue, out-of-order execution design
The Apple A5 at 1 GHz (45 nm to 32 nm, manufactured by Samsung from March 2011 to present) is a System-on-Chip that combines a dual-core ARM Cortex-A9 with the NEON SIMD accelerator and a dual-core PowerVR GPU. It is used in the:
• iPad 2 (A5 dual-core 45 nm) – March 2011; (A5 dual-core 32 nm) –
March 2012
• iPhone 4S (A5 dual-core 45 nm) – October 2011
• Apple TV 3rd generation (A5 single-core, 32 nm) – March 2012
• iPod Touch 5th generation (A5 dual-core 32 nm) – October 2012
• iPad Mini (A5 dual-core 32 nm) – November 2012
Apple A6 System-on-Chip
The Apple A6 SoC was introduced in Sept. 2012 for the iPhone 5.
Apple states that it is up to twice as fast and has up to twice the graphics power of its predecessor, the Apple A5.
The A6 uses a 1.3 GHz custom Apple-designed ARMv7-based dual-core CPU, called Swift, and an integrated triple-core PowerVR SGX 543MP3 GPU.
The A6 chip for the iPhone 5 incorporates 1 GB of LPDDR2-1066 RAM, providing double the memory capacity of the iPhone 4S while increasing the theoretical memory bandwidth from 6.4 GB/s to 8.5 GB/s.
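Theoretical peak bandwidth is the data rate multiplied by the width of the memory interface (and by the number of channels). The minimal sketch below assumes a 64-bit (8-byte) interface for both phones, which is consistent with both quoted figures; the iPhone 4S data rate of 800 MT/s is an assumption inferred from its 6.4 GB/s. The same formula also reproduces the 25.6 GB/s listed for the dual-channel DDR3-1600 Core i7-3770T. The function name peak_bandwidth_gbs is hypothetical.

```python
def peak_bandwidth_gbs(mega_transfers: float, bus_bits: int, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    mega_transfers: data rate in MT/s (e.g. 1066 for LPDDR2-1066)
    bus_bits:       width of one channel in bits
    channels:       number of independent memory channels
    """
    bytes_per_transfer = bus_bits // 8
    return mega_transfers * 1e6 * bytes_per_transfer * channels / 1e9


iphone4s = peak_bandwidth_gbs(800, 64)               # assumed LPDDR2-800, 64-bit interface
iphone5 = peak_bandwidth_gbs(1066, 64)               # LPDDR2-1066, 64-bit interface assumed
core_i7 = peak_bandwidth_gbs(1600, 64, channels=2)   # DDR3-1600, two 64-bit channels

print(f"iPhone 4S:     {iphone4s:.1f} GB/s")  # 6.4
print(f"iPhone 5:      {iphone5:.1f} GB/s")   # 8.5
print(f"Core i7-3770T: {core_i7:.1f} GB/s")   # 25.6
```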
Apple A6 System-on-Chip (die annotations):
• ARMv7s ISA dual core
• Triple-core PowerVR SGX 543MP3 GPU
• 1 MB L2 cache
• 1.3 GHz
• 32 nm Samsung
• 96.71 mm² (22% smaller than A5)
Apple A7 System-on-Chip
The Apple A7 is a 64-bit SoC introduced in Sept. 2013 for the iPhone 5S.
Apple states that it is up to twice as fast and has up to twice the graphics power of its predecessor, the Apple A6.
The A7 features an Apple-designed 64-bit 1.3-1.4 GHz ARMv8-A dual-core CPU, called Cyclone, and an integrated PowerVR G6430 GPU in a four-cluster configuration.
The A7 has a per-core L1 cache of 64 KB for data and 64 KB for instructions, an L2 cache of 1 MB shared by both CPU cores, and a 4 MB L3 cache that serves the entire SoC.
Unlike the A6, the A7 SoC no longer handles the accelerometer, gyroscope and compass. To reduce power consumption, these functions have been moved to the new M7 motion coprocessor, a separate ARM-based microcontroller from NXP Semiconductors.
Apple A8 System-on-Chip
The Apple A8 is a 64-bit ARM-based SoC introduced in Sept. 2014 for the iPhone 6 and iPhone 6 Plus.
Apple states that it delivers 25% more CPU performance and 50% more graphics performance with 50% of the power compared to its predecessor, the A7.
The A8 features the second generation of the Apple-designed 64-bit 1.4 GHz ARMv8-A dual-core CPU, called Cyclone Gen 2, and an integrated PowerVR Series 6XT GX6450 quad-core GPU.
The A8 is manufactured on a 20 nm process by TSMC, which replaced Samsung as the manufacturer of Apple's mobile device processors. It contains 2 billion transistors and has 1 GB of LPDDR3 RAM included in the package.
On October 16, 2014, Apple introduced a variant of the A8, the A8X, in the iPad Air 2, with improved graphics and CPU performance thanks to one extra CPU core and a higher clock frequency.
Moore’s Law (1965): the number of transistors on a processor will double every 18 to 24 months
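The doubling rule is an exponential, N(t) = N0 · 2^(t / T), where T is the doubling period. The minimal sketch below evaluates both bounds of the law (18 and 24 months) over six years; starting from the 1.4 billion transistors of the Core i7-3770T above is purely illustrative and is not a prediction for any real product. The function name is hypothetical.

```python
def projected_transistors(n0: float, years: float, doubling_period_years: float) -> float:
    """Transistor count after `years`, assuming one doubling every `doubling_period_years`."""
    return n0 * 2 ** (years / doubling_period_years)


start = 1.4e9  # illustrative starting point: Core i7-3770T transistor count
for period in (1.5, 2.0):  # the 18-month and 24-month bounds of the law
    n = projected_transistors(start, years=6, doubling_period_years=period)
    print(f"doubling every {period} years -> {n / 1e9:.1f} billion after 6 years")
```

Six years correspond to 4 doublings (16x, about 22.4 billion) at the 18-month rate but only 3 doublings (8x, about 11.2 billion) at the 24-month rate, so the choice of period matters a lot over a decade.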
The end of the historic scaling
• Chip density is continuing to increase, ~2x every 2 years
• Max clock frequency wall
• Power wall
• Expose parallelism at a coarser level than the single thread
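As a first-order reminder (the standard CMOS dynamic power model, not taken from the slide), dynamic power grows with the switching activity α, the switched capacitance C, the square of the supply voltage V_DD and the clock frequency f:

```latex
P_{\mathrm{dyn}} = \alpha \, C \, V_{DD}^{2} \, f
\qquad\Rightarrow\qquad
\frac{P_{\mathrm{dyn}}(2f)}{P_{\mathrm{dyn}}(f)} = 2
\quad \text{at fixed } \alpha,\ C,\ V_{DD}
```

When V_DD can no longer be lowered at each new technology node, every increase in f shows up directly as more power and heat, which is one way to read the clock frequency and power walls above.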
Stopper: On-Chip Temperature Wall
Paradigm shift: multi-core architectures
ARM9 core area across technology nodes (Source: STMicroelectronics):
• 180 nm: 11.8 mm²
• 130 nm: 5.2 mm²
• 90 nm: 2.6 mm²
• 65 nm: 1.4 mm²
Intel 80-core
NVIDIA Fermi GPU
NVIDIA Tesla GPU Kepler GK110 Architecture
• 7.1B transistors
• 15 SMX units (2880 cores)
• >1 TFLOP FP64
• 1.5 MB L2 cache
• 384-bit GDDR5
• PCI Express Gen3
ACA COURSE INFORMATION
ACA Course Schedule
Schedule: Second Semester 2014-2015 (Spring 2015)
TUESDAY 13.15 - 15.15
Location: VS8 Via Valleggio 11, Como Campus
THURSDAY 15.15 - 18.15
Location: V07 via Valleggio 11, Como Campus
Contact Information
Office hours for students:
Tuesday 15.15 - 16.15 at Polo di Como, Via Anzani 42, 2nd floor (please send an email to make an appointment).
Main Contact:
Students can contact Prof. Cristina Silvano by e-mail (cristina.silvano@polimi.it) using the following subject line:
Subject: ACA COMO, Your_Surname, Your_Name, Your_POLIMI_ID_NUMBER
ACA Teaching Assistant
Ing. Amir Hossein Ashouri: amirhossein.ashouri@polimi.it
ACA Course Info
Teaching Activity: The course is worth 5 CFU and is organized into 30 hours of lectures and 20 hours of written/tool-based exercises that put into practice the concepts presented during the lectures.
Prerequisites: basic concepts of logic design and computer architecture.
ACA Final Exam
The final examination consists of a WRITTEN EXAM and an OPTIONAL part. The optional part is either an oral presentation OR the discussion of a project prepared during the course; the topic of the presentation or project is assigned by the professor, covers specific techniques and methodologies, and is presented by the student at the end of the course.
Each written exam is scored out of a maximum of 33 points: up to 15 points for the exercise part and up to 18 points for the theory part. The OPTIONAL part can provide EXTRA points (1 to 2 extra points for the oral presentation and 1 to 4 extra points for the project). The extra points from the project are added to the written exam score only if the written exam score is sufficient (>= 18).
The project/presentation will be assigned in the middle of the semester and must be completed and presented by June 25th, 2015 (firm deadline).
ACA Teaching Material
Additional information, slides and papers are available through the course webpage:
http://home.dei.polimi.it/silvano/ARC-MULTIMEDIA.htm
• If you are using MOZILLA FIREFOX as your web browser, please use the SAVE AS option to save the PDF slides on your laptop for correct visualisation and printing.
Reference Book: "Computer Architecture: A Quantitative Approach", John Hennessy, David Patterson, Morgan Kaufmann, Fifth Edition.
Support for international students
The ACA course is taught in English
Teaching materials (slides/papers/textbook) are available in English
The final exam can be taken in English
Teaching support is available in English
Overview of the ACA topics
How can we increase performance while decreasing the design cost?
• RISC: Reduced Instruction Set Computer
• Pipeline
Can we gain more?
• Branch prediction
• Instruction Level Parallelism (ILP)
• Multithreading
• Multiprocessors
Performance still does not scale?
• Memory hierarchy
• Cache organization
Main lecture topics (1)
Review of basic computer architecture definitions and components
(Central Processing Unit, Memory System, Input/Output Interfaces,
Communication System)
Basic performance evaluation metrics of computer architectures
Memory hierarchy: Basic and advanced concepts. Multi-level caches.
Performance evaluation, optimisation techniques.
Central Processing Unit: the RISC approach (Reduced Instruction Set
Computer).
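For the memory hierarchy topic, the usual performance metric is the average memory access time, AMAT = hit time + miss rate × miss penalty, applied recursively for multi-level caches (the miss penalty of one level is the AMAT of the level below). The minimal sketch below implements that recurrence; the latencies and miss rates are hypothetical values chosen only to illustrate the computation, and the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheLevel:
    hit_time: float   # cycles to access this level on a hit
    miss_rate: float  # local miss rate: misses / accesses that reach this level


def amat(levels: list[CacheLevel], memory_latency: float) -> float:
    """Average memory access time in cycles for a multi-level cache hierarchy.

    Evaluated from the last level back to L1:
    AMAT_i = hit_time_i + miss_rate_i * AMAT_(i+1), with main memory at the bottom.
    """
    penalty = memory_latency
    for level in reversed(levels):
        penalty = level.hit_time + level.miss_rate * penalty
    return penalty


# Hypothetical parameters, for illustration only.
l1 = CacheLevel(hit_time=1, miss_rate=0.05)
l2 = CacheLevel(hit_time=10, miss_rate=0.20)
print(f"L1 only: {amat([l1], 100):.2f} cycles")      # 6.00
print(f"L1 + L2: {amat([l1, l2], 100):.2f} cycles")  # 2.50
```

With these numbers, adding an L2 cache cuts the average access time from 6 to 2.5 cycles, because most L1 misses are now served in 10 cycles instead of 100.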
Main lecture topics (2)
Techniques for performance optimization:
• Pipelining: the problem of hazards (structural, control and data hazards); optimization techniques to solve hazards
• Branch prediction techniques: Static and dynamic branch
prediction techniques
• Speculative execution
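As one example of a dynamic branch prediction technique, here is a minimal sketch of a branch history table of 2-bit saturating counters, a classic textbook scheme (not necessarily the exact variant covered in the lectures). The table size, the PC indexing with 4-byte instructions, the class name, and the loop trace are all hypothetical.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by the branch PC.

    Counter values 0-1 predict not-taken, 2-3 predict taken, so a strongly
    biased branch must be mispredicted twice in a row before the prediction flips.
    """

    def __init__(self, entries: int = 1024):
        self.entries = entries
        self.table = [1] * entries  # start every entry in "weakly not-taken"

    def _index(self, pc: int) -> int:
        # Drop the 2 low bits (assumed 4-byte instructions), then wrap to the table size.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


# Hypothetical loop branch at PC 0x400: taken 9 times, then falls through; loop run 3 times.
predictor = TwoBitPredictor()
outcomes = ([True] * 9 + [False]) * 3
correct = 0
for taken in outcomes:
    correct += predictor.predict(0x400) == taken
    predictor.update(0x400, taken)
print(f"{correct}/{len(outcomes)} correct")  # 26/30 correct
```

On this trace the predictor misses the very first iteration and then only the loop exits; a 1-bit predictor would additionally miss the first iteration of every repeated loop (24/30), which is the usual motivation for the second bit.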
Sequential vs. Pipelined Instruction Execution
(Figure: in sequential execution, instruction I1 goes through all five stages IF ID EX MEM WB in 10 ns before I2 starts its own IF ID EX MEM WB sequence, which takes another 10 ns, and so on.)
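To make the comparison concrete, the minimal sketch below assumes an ideal five-stage pipeline with balanced stages of 2 ns each (10 ns / 5 stages, matching the 2 ns clock used in the ILP example), with no hazards and no stalls. Sequential execution needs n × 10 ns, while the pipeline needs (5 + n − 1) cycles: 5 to fill it, then one instruction completes per cycle. Function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
STAGE_NS = 2  # assumed latency of each balanced stage (10 ns / 5 stages)


def sequential_time_ns(n_instructions: int) -> int:
    # Each instruction runs through all stages before the next one starts.
    return n_instructions * len(STAGES) * STAGE_NS


def pipelined_time_ns(n_instructions: int) -> int:
    # The first instruction fills the pipeline; afterwards one completes per cycle.
    if n_instructions == 0:
        return 0
    return (len(STAGES) + n_instructions - 1) * STAGE_NS


for n in (1, 10, 1000):
    seq, pipe = sequential_time_ns(n), pipelined_time_ns(n)
    print(f"n={n:5d}: sequential {seq} ns, pipelined {pipe} ns, speedup {seq / pipe:.2f}")
```

The speedup is 1.00 for a single instruction, 3.57 for 10 instructions and 4.98 for 1000, approaching the number of stages (5) only when the pipeline stays full for a long time.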
Main lecture topics (3)
Instruction Level Parallelism (ILP):
• Static and dynamic scheduling;
• Superscalar architectures;
• VLIW (Very Long Instruction Word) architectures;
Instruction Level Parallelism:
Example of a 2-issue processor
(Figure: instructions I1 to I10 flow through the IF ID EX MEM WB pipeline two at a time, with a new pair issued every 2 ns clock cycle: I1 and I2 in the first cycle, I3 and I4 in the second, and so on up to I9 and I10.)
Instructions Per Clock (IPC) = 2
CPI = Clock cycles Per Instruction = 1/IPC = 0.5
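The IPC/CPI figures plug directly into the basic performance equation CPU time = IC × CPI × T_clk. This minimal sketch uses the 2 ns clock and the ten instructions of the figure and, as an idealizing assumption, ignores pipeline fill, drain and stalls; the function name is hypothetical.

```python
def cpu_time_ns(instruction_count: int, cpi: float, clock_period_ns: float) -> float:
    """CPU time = IC x CPI x T_clk (steady state, no stalls)."""
    return instruction_count * cpi * clock_period_ns


CLOCK_NS = 2.0  # clock period shown in the figure
IC = 10         # instructions I1 .. I10

ipc_2issue = 2
cpi_2issue = 1 / ipc_2issue  # 0.5 clock cycles per instruction
cpi_scalar = 1.0             # ideal single-issue pipeline, for comparison

print(cpu_time_ns(IC, cpi_scalar, CLOCK_NS))  # 20.0
print(cpu_time_ns(IC, cpi_2issue, CLOCK_NS))  # 10.0
```

At the same clock, halving the CPI halves the execution time, which is the whole payoff of issuing two instructions per cycle, provided enough independent instructions are available.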
Beyond ILP: Multithreading
Threads: independent sequences of instructions
(Figure: single-threaded program vs. multi-threaded program)
Main lecture topics (4)
Beyond ILP:
• Multithreading (Thread Level Parallelism – TLP)
• Multiprocessors and multicore systems: taxonomy, topologies, communication management, memory management, cache coherency protocols, example architectures
• System-on-Chip and Network-on-Chip architectures; Digital Signal Processors; Stream processors and vector processors; Graphics Processors
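For cache coherency protocols, here is a minimal sketch of a textbook MSI (Modified/Shared/Invalid) snooping protocol that tracks a single cache line across several private caches. It models only state transitions and write-backs, not data, memory or bus arbitration; the class name and the access trace are hypothetical, and real processors typically use richer variants such as MESI or MOESI.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    S = "Shared"
    I = "Invalid"


class MSILine:
    """State of one cache line in each of `n_caches` private caches (toy MSI model)."""

    def __init__(self, n_caches: int):
        self.states = [State.I] * n_caches
        self.log: list[str] = []

    def read(self, cpu: int) -> None:
        if self.states[cpu] is State.I:
            # Read miss (BusRd): a remote Modified copy is written back and demoted to Shared.
            for other, st in enumerate(self.states):
                if other != cpu and st is State.M:
                    self.log.append(f"cache {other} writes back")
                    self.states[other] = State.S
            self.states[cpu] = State.S
        # Read hits in S or M need no bus transaction.

    def write(self, cpu: int) -> None:
        if self.states[cpu] is not State.M:
            # Write miss or upgrade (BusRdX/invalidate): every other copy becomes Invalid.
            for other, st in enumerate(self.states):
                if other != cpu and st is not State.I:
                    if st is State.M:
                        self.log.append(f"cache {other} writes back")
                    self.states[other] = State.I
            self.states[cpu] = State.M
        # Write hits in M need no bus transaction.


line = MSILine(n_caches=2)
line.read(0)   # C0: I -> S
line.read(1)   # C1: I -> S
line.write(0)  # C0: S -> M, C1: S -> I
line.read(1)   # C1 misses: C0 writes back (M -> S), C1: I -> S
print([s.name for s in line.states], line.log)  # ['S', 'S'] ['cache 0 writes back']
```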