ADVANCED COMPUTER ARCHITECTURES - Polimi
088949 – ADVANCED COMPUTER ARCHITECTURES
AA 2017/2018 – Second Semester
http://home.deib.polimi.it/silvano/aca-milano.htm
Prof. Cristina Silvano
email: cristina.silvano@polimi.it
Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
Politecnico di Milano
Goals of the ACA course
• Provide an overview of the most recent and advanced computer architectures
• Introduce the basic microarchitectural mechanisms found in modern microprocessor architectures
• Provide the reasoning behind the adoption of advanced computer architectures
Cristina Silvano – Politecnico di Milano
Advanced Computer Architectures:
Supercomputers
The first supercomputer to reach Petascale peak performance (10^15 Flops) was IBM Roadrunner, installed in 2008 at Los Alamos National Lab (New Mexico).
Research on supercomputing is pushing towards the Exascale (10^18 Flops, i.e. a billion billion operations per second), expected to be reached in 2023.
How to measure performance:
FLOPS, Floating Point Operations per Second
Name          FLOPS
zettaFLOPS    10^21
exaFLOPS      10^18
petaFLOPS     10^15
teraFLOPS     10^12
gigaFLOPS     10^9
megaFLOPS     10^6
kiloFLOPS     10^3
FLOPS         1
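As a quick illustration of the prefixes above, the following Python sketch scales a raw FLOPS value to the largest prefix that fits. The helper name `format_flops` is hypothetical and is not part of the course material.

```python
# Prefixes from the table above, largest first: (power of ten, name).
PREFIXES = [
    (21, "zettaFLOPS"),
    (18, "exaFLOPS"),
    (15, "petaFLOPS"),
    (12, "teraFLOPS"),
    (9, "gigaFLOPS"),
    (6, "megaFLOPS"),
    (3, "kiloFLOPS"),
    (0, "FLOPS"),
]


def format_flops(value):
    """Return `value` (in FLOPS) scaled to the largest prefix not exceeding it."""
    for exponent, name in PREFIXES:
        if value >= 10 ** exponent:
            return f"{value / 10 ** exponent:.2f} {name}"
    # Values below 1 FLOPS are reported without a prefix.
    return f"{value:.2f} FLOPS"


if __name__ == "__main__":
    print(format_flops(1e15))  # Petascale
    print(format_flops(1e18))  # Exascale
```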
TOP500 List
• The TOP500 list ranks the world's most powerful supercomputers.
• The LINPACK Benchmark (introduced by Jack Dongarra) is used to measure a system's floating point computing power.
• LINPACK measures how fast a computer solves a dense n by n system of linear equations Ax = b, which is a common task in engineering.
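To make the idea concrete, here is a minimal pure-Python sketch that solves a dense system Ax = b by Gaussian elimination with partial pivoting and derives a nominal FLOPS figure from the elapsed time. It is not the actual LINPACK code: the function names are hypothetical, and the nominal operation count of 2n^3/3 + 2n^2 is a commonly quoted figure that this sketch assumes.

```python
import random
import time


def solve_dense(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on copies so the caller's data is not modified.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the largest remaining entry of column k to row k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from the rows below the pivot.
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def measure_flops(n, seed=0):
    """Time one solve of a random n by n system and return nominal FLOPS."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    solve_dense(a, b)
    elapsed = time.perf_counter() - start
    # Assumed nominal operation count for solving a dense n by n system.
    nominal_ops = 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2
    return nominal_ops / elapsed


if __name__ == "__main__":
    print(f"{measure_flops(100) / 1e6:.1f} megaFLOPS (nominal)")
```

An interpreted loop like this will report a figure many orders of magnitude below the machines in the list; the point is only to show what is being timed and how the rate is derived.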
www.top500.org
Top500 ranking of the world’s most powerful supercomputers (Nov. 2017)
No. 1: Sunway TaihuLight reaches 93.01 PetaFlops (Linpack performance) and 125.43 PetaFlops peak performance, with 15.37 MW power dissipation. Site: National Supercomputing Center in Wuxi (China)
No. 2: Tianhe-2 (Milky-Way-2) reaches 33.86 PetaFlops (Linpack performance) and 54.9 PetaFlops peak performance, with 17.8 MW power dissipation. Site: National Super Computer Center in Guangzhou (China)
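The gap between Linpack and peak performance can be expressed as a simple ratio. The sketch below uses the figures quoted above; the dictionary layout and the function name `linpack_efficiency` are assumptions made for illustration.

```python
# Figures quoted above for the Nov. 2017 list, in PetaFlops.
SYSTEMS = {
    "Sunway TaihuLight": {"linpack": 93.01, "peak": 125.43},
    "Tianhe-2": {"linpack": 33.86, "peak": 54.9},
}


def linpack_efficiency(linpack, peak):
    """Fraction of the peak performance actually sustained on Linpack."""
    return linpack / peak


if __name__ == "__main__":
    for name, perf in SYSTEMS.items():
        ratio = linpack_efficiency(perf["linpack"], perf["peak"])
        print(f"{name}: {ratio:.1%} of peak")
```

With these numbers the two systems sustain roughly 74% and 62% of their peak performance, respectively.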
Top500 ranking: the most powerful Italian supercomputer (Nov. 2017)
No. 14 in Top500 and No. 2 in Europe: Marconi Intel Xeon Phi, with 7.47 PetaFlops (Linpack performance) and 15.37 PetaFlops (peak performance) on 314,384 cores. Site: Casalecchio di Reno, Bologna (Italy)
Marconi is Cineca's Tier-0 system, co-designed by Cineca and Lenovo. It is based on the Lenovo NeXtScale platform and the Intel® Xeon Phi™ product family, alongside Intel® Xeon® processors and the Intel Omni-Path interconnect.
Exascale Supercomputers
To fit an Exascale supercomputer (projected for 2023) within a 20 MW power budget, supercomputers must reach an energy efficiency of 50 GigaFlops/W.
The No. 1 system, Sunway TaihuLight, delivers 6 GigaFlops/W and ranks only 20th in the Green500 list, which ranks supercomputers by their energy efficiency.
Today the greenest supercomputer in the Green500, installed in Japan, achieves 17 GigaFlops/W.
The top positions in the Green500 are all occupied by heterogeneous systems (based on accelerator/co-processor technology) equipped with Intel Xeon processors and with NVIDIA Tesla P100 and NVIDIA Volta GV100 GPUs to further accelerate the computation.
This dominance is expected to continue over the coming years on the way to the 20 MW Exascale target.
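The efficiency figures above follow from dividing sustained performance by power. The sketch below reproduces the 50 GigaFlops/W target and the roughly 6 GigaFlops/W of Sunway TaihuLight, using its Linpack performance (93.01 PetaFlops) and power dissipation (15.37 MW) from the Nov. 2017 list. The function and constant names are hypothetical.

```python
def gflops_per_watt(performance_flops, power_watts):
    """Energy efficiency in GigaFlops per Watt."""
    return performance_flops / power_watts / 1e9


# Exascale target: 10^18 Flops within a 20 MW budget.
EXASCALE_FLOPS = 1e18
EXASCALE_WATTS = 20e6

# Sunway TaihuLight, Nov. 2017: Linpack performance and power dissipation.
SUNWAY_FLOPS = 93.01e15
SUNWAY_WATTS = 15.37e6

if __name__ == "__main__":
    target = gflops_per_watt(EXASCALE_FLOPS, EXASCALE_WATTS)
    sunway = gflops_per_watt(SUNWAY_FLOPS, SUNWAY_WATTS)
    print(f"Exascale target: {target:.1f} GigaFlops/W")
    print(f"Sunway TaihuLight: {sunway:.2f} GigaFlops/W")
    print(f"Required improvement: {target / sunway:.1f}x")
    # Power an Exascale machine would draw at Sunway's efficiency, in MW.
    print(f"Exascale at Sunway efficiency: {EXASCALE_FLOPS / (sunway * 1e9) / 1e6:.0f} MW")
```

At Sunway's efficiency an Exascale machine would draw well over 100 MW, which is why the efficiency has to improve by close to an order of magnitude.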
US Dept. of Energy Announced Summit and Sierra Supercomputers
Applications driving the demand for more computing performance
Climate
Astrophysics
Biology
Business Analytics
Performance Trend
Performance of HPC over the last years from the Top500
Source: Jack Dongarra, U. of Tennessee, Oak Ridge National Lab, U. of Manchester
Advanced Computer Architectures:
Intel® Core™ i7-3770T Processor
160 mm² die @ 22 nm
1.40 billion transistors
Next generations: Broadwell, Skylake, Kaby Lake at 14 nm (2014); Cannonlake at 10 nm (2H 2017); Ice Lake at 10 nm (2018)
# of Cores                    4
# of Threads                  8
Clock Speed                   2.5 GHz
Max Turbo Frequency           3.7 GHz
Intel® Smart Cache            8 MB
Instruction Set               64-bit
Instruction Set Extensions    SSE4.1/4.2, AVX
Embedded Options Available    No
Lithography                   22 nm
Max TDP                       45 W
Recomm. Customer Price        TRAY: $294.00
Max Memory Size               32 GB
Memory Types                  DDR3-1333/1600
# of Memory Channels          2
Max Memory Bandwidth          25.6 GB/s
NVIDIA Fermi GPU
NVIDIA Kepler GPU
Kepler GK110 Architecture
• 7.1B Transistors
• 15 SMX units (2880 cores)
• >1 TFLOP FP64
• 1.5 MB L2 Cache
• 384-bit GDDR5
• PCI Express Gen3
NVIDIA Tesla P100 with Pascal GP100 GPU
The Green500 list ranks the top 500 supercomputers in the world by energy efficiency for sustainable supercomputing
Advanced Computer Architectures:
Smart Phones
Feature        A9-based models                       iPhone 7 / iPhone 7 Plus
Display        Retina HD with 3D touch, 4.7-inch     Retina HD with 3D touch, 4.7-inch / 5.5-inch
Camera         12MP                                  New 12MP (iPhone 7 Plus: New 12MP ++)
Videocamera    5MP                                   7MP
Chip           A9 chip 64-bit                        A10 Fusion chip 64-bit
Coprocessor    M9 coprocessor                        M10 coprocessor
Other          -                                     Waterproof, stereo audio
OS             iOS 10                                iOS 10
Storage        32GB, 128GB                           32GB, 128GB, 256GB
Apple A8 System-on-Chip
Apple A8 is a 64-bit ARM-based SoC introduced in Sept. 2014 for the iPhone 6 and iPhone 6 Plus.
Apple states that it has 25% more CPU performance and 50% more graphics performance while drawing 50% of the power compared to its predecessor, the A7.
The A8 features the second generation of the Apple-designed 64-bit 1.4 GHz ARMv8-A dual-core CPU, called Cyclone Gen 2, and an integrated PowerVR Series 6XT GX6450 quad-core GPU.
The A8 is manufactured on a 20 nm process by TSMC, which replaced Samsung as manufacturer of Apple's mobile device processors. It contains 2 billion transistors and has 1 GB of LPDDR3 RAM included in the package.
On October 16, 2014, Apple introduced a variant of the A8, the A8X, in the iPad Air 2, with improved graphics and CPU performance due to one extra core and a higher frequency.
Apple A9 System-on-Chip
Apple A9 is a 64-bit ARM-based SoC introduced in Sept. 2015 for the iPhone 6S and iPhone 6S Plus.
Apple states that it has 70% more CPU performance and 90% more graphics performance compared to its predecessor, the A8.
It is one of the most powerful mobile chips on the market today, along with the Samsung Exynos 8890 and the Qualcomm Snapdragon 820.
The A9 features the Apple-designed 64-bit 1.85 GHz ARMv8-A dual-core CPU, called Twister, and an integrated PowerVR Series 7XT GT7600 six-core GPU.
The A9 is manufactured by two companies: on a 14 nm FinFET process by Samsung and on a 16 nm FinFET process by TSMC.
The A9 has 2 GB of LPDDR4 RAM included in the package.
Apple introduced a variant of the A9, the A9X, in the iPad Pro, with the M9 motion coprocessor embedded in it.
Apple A10 Fusion
Apple A10 Fusion is a 64-bit ARM-based SoC designed by Apple and introduced in Sept. 2016 for the iPhone 7 and iPhone 7 Plus.
Apple states that it has 40% more CPU performance and 50% more graphics performance compared to its predecessor, the A9.
The A10, with a die area of 125 mm² and 3.3 billion transistors (including GPU and cache), features two Apple-designed 64-bit 2.34 GHz ARMv8-A cores called Hurricane and two energy-efficient 64-bit cores codenamed Zephyr (similar to the ARM big.LITTLE technology).
The A10 integrates a newly designed PowerVR Series 7XT GT7600 six-core GPU.
The A10 is manufactured on a 16 nm FinFET process by TSMC.
Energy efficiency underlies all markets
Energy efficiency is of paramount importance for all application markets (automotive, consumer, mobile, healthcare and beyond) and for target systems ranging from sensors, cyber-physical systems and embedded systems up to servers and HPC systems.
Squeezing of computing cores
Figure: an ARM9 core (1.4 mm² at 65 nm in 2005) across technology nodes: 65 nm (2005), 45 nm (2007), 32 nm (2009), 22 nm (2011), 14 nm (2013). Source: STMicroelectronics
… entering the multi/many-core era
Figure: an ARM9 core (1.4 mm² at 65 nm in 2005) across technology nodes: 65 nm (2005), 45 nm (2007), 32 nm (2009), 22 nm (2011), 14 nm (2013). Source: STMicroelectronics
What are the barriers to further scaling?
• Transistor density increases ~2x every 2 years
• Frequency wall
• Power wall
• Utilisation wall
… the end of Dennard scaling
… increasing power densities
… entering the dark silicon era
The dark silicon problem
The power wall and the utilisation wall represent the main barriers to efficient scaling in the multi/many-core era.
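A first-order model helps to see where these two walls come from. The sketch below assumes the textbook relation for dynamic power, P = C · V² · f, and a linear scaling factor s per technology generation (about 1.4, so that transistor density doubles). Under ideal Dennard scaling the supply voltage shrinks with the feature size and power density stays constant. When the voltage stops scaling, power density grows by s² per generation, so a shrinking fraction of a fixed-area die can switch within a fixed power budget. The function names, and the assumption that frequency keeps scaling by s, are simplifications made for this illustration and do not come from the slides.

```python
def scaling_step(s, voltage_scales):
    """Relative change over one generation for linear scaling factor s.

    Returns (power per transistor, transistors per unit area, power density).
    """
    capacitance = 1.0 / s  # smaller devices have less capacitance
    voltage = 1.0 / s if voltage_scales else 1.0  # constant after Dennard scaling ends
    frequency = s  # assumed to keep scaling in this simple model
    power_per_transistor = capacitance * voltage ** 2 * frequency
    transistors_per_area = s ** 2
    return (
        power_per_transistor,
        transistors_per_area,
        power_per_transistor * transistors_per_area,
    )


def usable_fraction(s, generations, voltage_scales):
    """Fraction of a fixed-area die that can switch within a fixed power budget."""
    _, _, density = scaling_step(s, voltage_scales)
    return min(1.0, 1.0 / density ** generations)


if __name__ == "__main__":
    s = 2 ** 0.5  # transistor density doubles every generation
    for gen in range(1, 5):
        ideal = usable_fraction(s, gen, voltage_scales=True)
        fixed_v = usable_fraction(s, gen, voltage_scales=False)
        print(f"generation {gen}: Dennard {ideal:.2f}, constant voltage {fixed_v:.3f}")
```

In this model the usable fraction halves with every generation once the voltage is fixed; real designs mitigate this with lower clock frequencies, specialised cores and power gating, which the sketch does not capture.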
Dark silicon: fraction of the die not usable due to the power budget
ACA COURSE INFORMATION
Contact Information
Office hours for students:
Monday 14.00 - 15.00 at DEIB, Via Ponzio 34/5 First floor –
Internal phone number: 3692 (please send an email to get an
appointment).
Main Contact:
The students can contact prof. Cristina Silvano by
e-mail (cristina.silvano@polimi.it)
by indicating:
Subject: ACA COURSE Milano, Your_Surname,
Your_Name, Your_POLIMI_ID_NUMBER
ACA Teaching Assistants
Prof. Giovanni Agosta
e-mail (giovanni.agosta@polimi.it)
Prof. Gerardo Pelosi
e-mail (gerardo.pelosi@polimi.it)
ACA Course Info
Teaching Activity: The course is worth 5 CFU and is organized in 30 hours of lectures and 20 hours of written/tool-based exercises to practice the concepts presented during the lectures.
Prerequisites: Basic concepts of logic design and computer architectures.
ACA Final Exam
FINAL EXAM:
The final exam consists of a written exam.
For each written exam, a max. score of 32 points is assigned across 6 questions: max. 16 points for the exercise part (3 questions) and max. 16 points for the theory part (3 questions).
Students may ask the instructor for an OPTIONAL project. The project must be completed before each written exam session (firm deadline). The project gives an additional score of up to 12 points, which is added to the score of the written exam only if the written exam score is sufficient (>=18 points).
ACA Teaching Material
Additional information in slides and papers available
through Beep and the course webpage:
http://home.deib.polimi.it/silvano/aca-milano.htm
If you use Mozilla Firefox as your web browser, use the SAVE AS option to save the PDF slides on your laptop; this ensures that they are displayed and printed correctly.
Reference Book: "Computer Architecture, A Quantitative
Approach", John Hennessy, David Patterson, Morgan
Kaufmann, Fourth Edition / Fifth Edition
ACA Course
ACA course is offered in English
Teaching materials (slides/papers/textbook) are
available in English
Final exam can be done in English
Teaching support available in English and Italian
Students in the M-Z group must follow the parallel ACA course session held by prof. Donatella Sciuto. The objectives and program of the two sessions are aligned, and the text of the final written exam is the same.