Intersect360 Research White Paper:
NEW AMD CPUs and GPUs CONTINUE MOMENTUM INTO HPC FUTURE

EXECUTIVE SUMMARY
The HPC industry is in the midst of an era of expansion. Intersect360 Research studies have
shown the challenge that organizations face is the need to serve not only their ravenous
technical computing applications, but also the adopted appetites of data science and
machine learning. As this trend has progressed, it has led to a steady, corresponding shift in
computational architecture for HPC.
Today, HPC relies on new concepts of scalability. Most new HPC deployments are
heterogeneous; in addition to the CPUs that run the system, they are powered by
complementary co-processors, usually in the form of GPUs. The fields of machine learning
and data science, combined with the attendant computational power of GPUs, offer
tremendous upside for the application of HPC to an ever-increasing array of endeavors.
Among all the HPC processing options to emerge in the past few years—and there have
been many—the ones with the most momentum come from Advanced Micro Devices
(AMD). AMD has a nearly monomaniacal focus on performance, and the company has
impressively maintained consistent messaging company-wide, across generations of
development. AMD has performed benchmark testing in which its EPYC™ processors
outperform comparable Intel Xeon® parts on commonly used HPC applications. As a result
of this dedicated effort, AMD perception and adoption are on the rise among HPC users.
The future of supercomputing is solidly heterogeneous, incorporating both CPUs and GPUs.
AMD serves HPC not only with AMD EPYC CPUs, but also with AMD Instinct™ accelerators,
and these are designed to work well together. Upcoming generations of EPYC and Instinct
will be integrated with AMD’s high-speed Infinity architecture, boosting inter-processor
communication between CPU and GPU.
AMD is addressing heterogeneous programming in two ways. First, with coherent memory,
programming is more straightforward, with fewer lines of code or custom calls. Second, AMD
supports an open “ROCm™” (pronounced “rock ’em”) programming and software
environment, making applications more portable to future generations of processors,
regardless of where they come from. By bringing together new technologies and new
workloads, AMD provides a compelling vision of scalability into the future of HPC.

© 2021 Intersect360 Research.
White paper sponsored by AMD. Full neutrality statement at https://www.intersect360.com/features/neutrality-statement.
P.O. Box 60296 | Sunnyvale, CA 94088 | Tel. (888) 256-0124
www.Intersect360.com | info@Intersect360.com
MARKET DYNAMICS: THE NEW SCALABILITY
One basic characteristic of High Performance Computing (HPC) is that it doesn’t hold still. No amount of scientific discovery, product enhancement, or engineering achievement is final; it is merely the next evolutionary step in ongoing advancement. Every solution unveils the next question, and the tools of HPC must also improve and evolve to solve each new generation of challenges.

Due to the nature of this ongoing expansion, HPC has always included notions of scalability. Invariably, scalability was tied to notions of “more”: more processors, more computations, more bandwidth. But thanks to added complexity due to a confluence of trends, scalability in HPC is more complicated than ever, and new insights and discoveries are increasingly linked to combinations of factors.

The New Scalability: Applications
The HPC industry is in the midst of an era of expansion beyond its everyday, relentless pursuit of knowledge. The last decade has seen the introduction of successive, related uber-trends: the first, big data, which established data science and analytics as high-performance enterprise workloads; the second, artificial intelligence (AI), which popularized machine learning as an alternate, experiential approach to computing. Both are enabled by the creation and accessibility of vast amounts of data to complement raw computation.

As they evolve, data science and machine learning need not be independent from traditional HPC. These techniques can augment preexisting computational methods, unlocking new computing capabilities. Applications in financial services, manufacturing, oil exploration, and bio-sciences are all data-intensive and well-suited to the incorporation of machine learning. Intersect360 Research studies have shown the strong majority of HPC-using organizations have incorporated data science or machine learning workloads into their environments. Note that this does not imply a corresponding tripling of budgets; far from it, although there has been an increase. The corresponding challenge that organizations face is the need to serve not only their ravenous technical computing applications, but also the adopted appetites of data science and machine learning. As this trend has progressed, it has led to a steady, corresponding shift in computational architecture for HPC.
The New Scalability: Architectures
As HPC applications have continued to evolve, the systems that power them have had to
continue to evolve as well. Just as no application is complete for all time, no supercomputer
has ever proved powerful enough that it could not eventually be saturated. Over time, HPC
architectures have progressed through their own eras to achieve greater scalability for
expanding application sets. Vector processors were replaced by RISC scalar. RISC gave way to
x86 as single-system deployments were replaced by clusters.

For years, cluster deployments were largely homogeneous. They ran x86 processors in dual-
socket, industry-standard servers, over some type of networking fabric. One of the
advantages this homogeneity presented was portability. Applications could be migrated from
one cluster to the next without much care to vendor. Components like CPUs, memory, and
network could be upgraded, but the model remained the same.
But innovation isn’t innovation if someone else gets there first. Furthermore, new application
requirements—such as those posed by AI—mean that one single processor or configuration
isn’t always best for every workload. This set of circumstances has conspired to help establish
a new computational norm in HPC: accelerated computing. Today, most new HPC
deployments are heterogeneous; in addition to the CPUs that run the system, they are
powered by complementary co-processors, usually in the form of GPUs.
High-power GPUs—graphics processing units—have their roots in gaming and
entertainment. The processing elements that power these high-end visual effects are strong
engines for floating-point performance, and with the right programming tools, they can be
harnessed to boost mathematical performance for scientific computing. As GPUs have
evolved as accelerators, differences have begun to emerge in the needs between HPC and
graphics, demanding further specialization.
The tide of GPU computing was already rising in HPC before AI burst onto the scene. As it
happened, GPUs were well suited to machine learning, for both training and inference. AI
accelerated the adoption of GPUs for HPC, particularly for mixed-workload environments.
And there has been a budgetary effect as well: Over 60% of HPC users are running machine
learning as part of their environments, leading to frequent increases in budgets for high-
performance workloads—in some cases more than doubling (see chart below).1
The New Scalability: Challenges
The fields of machine learning and data science, combined with the attendant computational
power of GPUs, offer tremendous upside for the application of HPC to an ever-increasing
array of endeavors. All that’s left to worry about is the software.
Application development and optimization are perpetual challenges in HPC. Not only does
the developer need to create algorithms for simulating complex phenomena, but the
resulting programs have to be run by vast arrays of independent processing elements. Single,
massive problems must be decomposed into digestible computations that can be run quickly,
with interim calculations compared and reassigned in step after step.
Heterogeneous computing—the use of multiple types of processors, such as CPUs together
with GPUs—makes programming more complicated. The programmer needs to specify not

1
    Intersect360 Research, HPC User Budget Map survey data, 2020.

only how to decompose a problem across multiple nodes, but also which portions of work to
assign to each type of element.
In the past, proprietary programming languages have been a hindrance for custom co-
processors. Recently, programming for GPUs has become more widespread, with more
accessible languages and libraries; however, the most commonly used tools are still
proprietary to particular brands of GPUs.
The net result is that HPC has, for the time being, returned to an era of specialization. Great
innovations are possible, but this has come at the expense of portability; application
advancements have been tied to particular vendors’ product lines.
When the HPC landscape was dominated by Intel x86 processors with few options, this wasn’t a particular concern. Today, with a diversity of options between CPUs, GPUs, and other components, users are less certain in committing to one vendor only. Portability is still an issue, as software investment protection remains important in the continual updating of capabilities. Furthermore, as machine learning and data science beget new applications, there is an interest in securing a bridge to the future.

Effect on High-Performance Workload Budget Related to Incorporation of Machine Learning
Source: Intersect360 Research, 2021

INTERSECT360 RESEARCH ANALYSIS
An “EPYC” Comeback for AMD
Among all the HPC processing options to emerge in the past few years—and there have
been many—the ones with the most momentum come from Advanced Micro Devices
(AMD), a company on a major upward trend. AMD gained attention in HPC when it
announced its first generation of EPYC x86 processors. With the second generation, AMD

gained renown, becoming the first microprocessor vendor to go to market with a 7-nanometer (7nm) manufacturing process. Now, with the launch of third-generation AMD EPYC processors, AMD has continued its monomaniacal focus on winning the performance race.
The AMD “Zen 3” core architecture at the heart of the EPYC 7003 CPU delivers a range of
optimizations, resulting in up to 19% improvement in operations per cycle, according to AMD.
This performance boost comes from a combination of enhancements over the “Zen 2” core
architecture, such as increased bandwidth for both loads and stores, and improved latency for
certain calculations. (See chart.)

                                AMD “Zen 3” Architecture Enhancements vs. “Zen 2”2
                                                            Source: AMD, 2021

The AMD EPYC 7003 enhancements go beyond the architecture performance. The complete
SOC (system on chip) has enhanced memory and cache functionality and additional security
features, while maintaining socket compatibility with previous AMD EPYC versions. In
particular, the L3 cache is integrated into a single, large 32MB reservoir, rather than two
smaller ones, allowing full cache allocation to any single core that may need it, thereby
benefiting applications with databases that may fit into the single, larger cache.
In one other subtle improvement, the AMD Infinity Fabric™ clock is now synchronous with
DRAM memory. This helps reduce latency in waiting for data, resulting in an improvement for
memory-sensitive applications, which are common in HPC.
These enhancements add up to real-world results on applications at the heart of HPC. AMD
has performed benchmark testing in which its EPYC processors outperform comparable Intel

2
    AMD claim, based on AMD internal testing as of February 1, 2021, average performance improvement at ISO-frequency on an AMD
    EPYC™ 72F3 (8C/8T, 3.7 GHz), compared to an AMD EPYC™ 7F32 (8C/8T, 3.7 GHz), per-core, single thread, using a select set of
    workloads including estimated SPECrate®2017_int_base, SPECrate®2017_fp_base, and representative server workloads.

Xeon processors on commonly used HPC applications, with average improvements ranging
from 43% on crash-test simulations up to 99% for computational fluid dynamics. (See chart.)

                  AMD EPYC 7003 Comparative Benchmark Results on HPC Applications3
                        2x AMD EPYC™ 75F3 (32 core) vs. 2x Intel® Xeon® Gold 6258R (28 core),
                                Average Performance Across Representative Workloads
                                                Source: AMD, 2021

As a result of this dedicated effort, AMD perception is on the rise among HPC users. In a 2016 Intersect360 Research study, 36% of HPC users said they had a favorable forward-looking impression of AMD CPUs. In a similar study in 2020, that percentage had risen to 78%—higher even than the percentage of users with a favorable future impression of Intel CPUs (69%). (See chart below.)

And with AMD now shipping its third generation of EPYC CPUs, this positive sentiment is beginning to show up in real customer deployments. AMD was named as the processor vendor in only 5% of surveyed HPC systems in 2017 and 2018 combined.4 Today, 23% of HPC users say they have AMD EPYC processors in widespread use. An additional 47% are testing

3
    AMD internal testing. All tests compare 2x EPYC™ 75F3 (32C) to 2x Intel® Xeon® Gold 6258R (28C) processors. WRF version 4.1.5
    comparison based on testing completed on 2/17/2021 on an AMD reference platform compared to an Intel server on a production
    system. ANSYS® CFX® 2021.1 comparison based on testing as of 2/5/2021 measuring the time to run the Release 14.0 test case
    simulations (converted to jobs/day – higher is better). ANSYS® LS-DYNA® version 2021.1 comparison based on testing as of 2/5/2021
    measuring the time to run neon, 3cars, PPT-short, odb10m-short, and car2car test case simulations (converted to jobs/day – higher is
    better); results were AMD 17,555 total seconds versus Intel 28,774 total seconds, for ~81.0% more per node or ~59% more per core
    performance advantage; the 3cars test case gain individually was ~126% more per node or ~98% more per core jobs/day performance.
    ESI Virtual Performance Solution (VPS better known as PAM-CRASH®) version 2020.0 comparison based on testing as of 2/5/2021
    measuring the neon test case simulation (converted to jobs/day – higher is better) for ~43% more per node or ~25% more per core
    jobs/day performance. Star-CCM+ 2020.3 comparison based on testing as of 2/5/2021 measuring the average seconds to complete 11
    test cases and converted to jobs/day (higher is better); the KCS Marine Hull with No Rudder in Fine Waves test case individually was
    ~79% more per node or ~57% per core performance. Results may vary.
4
    Intersect360 Research HPC User Site Census surveys across 2017 and 2018, total proportion of systems for which AMD was identified as
    CPU provider, including half-system credit for systems in which AMD was a shared CPU provider.

or using AMD EPYC at some level, giving AMD CPUs a presence in 70% of HPC sites, an
astonishing transition from three years ago. (See charts below.)

Percent of HPC Users with Favorable Forward-Looking Impressions of CPUs5
Source: Intersect360 Research, 2021

Current Penetration of AMD CPUs Among Surveyed HPC Sites6
Source: Intersect360 Research, 2021

5
    Intersect360 Research data from multiple studies. 2016: Special study: “Processing Elements for HPC”; question, “Overall, how favorable
    is your forward-looking impression of each of the following, with respect to your HPC workloads? (1 = Completely unfavorable; 5 =
    completely favorable)”; scores are combined percentage 4 and 5 for AMD Opteron versus Intel Xeon. 2020: “Vendor Satisfaction and
    Loyalty in HPC”; question, “What is your impression of each of the following vendors' future prospects for HPC?” (five-point scale); scores
    are combined top-two responses (Very Impressed; Impressed) for AMD EPYC CPUs versus Intel Xeon CPUs.
6
    Intersect360 Research HPC Technology Survey, 2021.

AMD’s momentum shows up in more than user surveys. The U.S. Department of Energy
(DOE) has selected AMD as the processor vendor for the pending Frontier and El Capitan
supercomputers. These will each be among the world’s earliest “Exascale”-class systems, with
peak rated speeds of over an Exaflop: one quintillion calculations per second7 for scientific,
64-bit calculations. Frontier, when it arrives, is expected to be the first Exascale
supercomputer in the U.S.,8 perhaps in the world, and El Capitan is projected to be the
world’s first supercomputer at 2 Exaflops peak performance.9
Marrying CPU and GPU
It wasn’t the EPYC CPU alone that attracted the DOE to AMD for Frontier and El Capitan. As
GPU-accelerated applications have become more common, few HPC users want to give up
their acceleration. The future of supercomputing is solidly heterogeneous, incorporating both
CPUs and GPUs for HPC, data science, and machine learning.
For the past ten years, Intel has been the dominant provider of CPUs, and the dominant provider of computational GPUs has been NVIDIA. While this situation has worked well enough thus far, the polarization it presents is worrisome. NVIDIA and Intel compete more than they cooperate, both in processing and in networking, and programming for one company’s solutions does not translate to the other’s.

Enter AMD, again. AMD serves HPC not only with its EPYC CPUs, but also with AMD Instinct GPUs, and these are designed to work well together. In fact, the upcoming generations of EPYC and Instinct will be integrated with AMD’s high-speed Infinity architecture, boosting inter-processor communication between CPU and GPU.
Since they work in concert over a high-bandwidth, low-latency connection, AMD will be first
to market with a feature not previously seen for heterogeneous architectures: coherent
memory. With a single, coherent memory space across an integrated chipset, programmers
can assign individual calculations, subroutines, or loops without moving data. “Pass the
pointer, not the data” has been a mantra for advocates of coherent memory; now this
concept is applicable to heterogeneous computing as well.
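As a rough illustration of what “pass the pointer” means for a developer, compare a conventional staged-copy flow with a unified, coherent allocation in this HIP-style pseudocode sketch (simplified for illustration: the kernel and buffer names are invented, error checking is omitted, and this conveys the concept rather than serving as runnable code):

```
// Staged copies: the host must shuttle data to and from device memory.
hipMalloc(&d_buf, bytes);                               // device-only allocation
hipMemcpy(d_buf, h_buf, bytes, hipMemcpyHostToDevice);  // copy in
scale_kernel<<<blocks, threads>>>(d_buf, n);
hipMemcpy(h_buf, d_buf, bytes, hipMemcpyDeviceToHost);  // copy out

// Coherent, unified memory: pass the pointer, not the data.
hipMallocManaged(&buf, bytes);              // one allocation, visible to CPU and GPU
scale_kernel<<<blocks, threads>>>(buf, n);  // no staging copies in the source
hipDeviceSynchronize();                     // CPU then reads buf directly
```

The second form has fewer lines and, more importantly, no explicit data-movement calls to reason about, which is the simplification the coherent-memory approach promises.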
AMD Instinct MI100 Accelerator
Reaching Exascale levels of performance doesn’t happen merely by ganging more elements
together; the individual elements need to get faster as well. AMD has advanced the CPU
components with its EPYC processor line, and in November 2020, in conjunction with the
annual SC conference10 for the worldwide supercomputing community, AMD announced its
latest GPU offering for HPC, the AMD Instinct MI100.

7
     One quintillion = one billion billion = one million million million = 10^18 = 1,000,000,000,000,000,000.
8
     https://www.hpcwire.com/2020/10/01/auroras-troubles-move-frontier-into-pole-exascale-position/.
9
     https://www.amd.com/en/products/exascale-era.
10
     http://supercomputing.org/index.php.

The MI100 brings AMD’s focus on HPC performance to its GPU line. AMD is promoting MI100 as the first GPU with over 10 Teraflops of performance, with 11.5 Teraflops of peak 64-bit performance.11

The MI100 is the first GPU based on the new AMD CDNA™ (Compute DNA) architecture, which separates GPU designs between those focused on computation (AMD CDNA) and those focused on graphics (AMD RDNA, for Radeon™ DNA). The AMD CDNA architecture offers a new core design with double the computational efficiency of previous AMD GPUs,12 along with “Matrix Core Technology” that targets acceleration for HPC and AI applications. The MI100 with AMD CDNA has 32GB of HBM2 memory, with over 1.2 Terabytes per second of memory throughput,13 and the second-generation Infinity architecture, with up to a 37% speed-up of GPU-to-GPU communication over the first generation.14

As much as it targets HPC as a primary market, MI100 does not ignore AI, which is an integrated part of HPC environments. The AMD CDNA architecture supports mixed-precision workloads, with FP32 and FP16 matrix, bfloat16, INT8, and INT4 for machine learning. AMD states MI100 is nearly seven times faster than its previous Radeon GPUs for FP16 workloads for AI.15
Programming for the Future
All the computing power in the world doesn’t help if you can’t program for it. AMD is
addressing this in two ways. First, with coherent memory, programming is more
straightforward, with fewer lines of code or custom calls. (See diagrams below.) This is useful
for the ongoing development of new applications that take advantage of heterogeneous
computing. Second, AMD supports programming models that are open, rather than

11
     AMD claim: Calculations conducted by AMD Performance Labs as of September 18, 2020 for the AMD Instinct™ MI100 (32GB HBM2
     PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.54 TFLOPS peak double precision (FP64), 46.1 TFLOPS
     peak single precision matrix (FP32), 23.1 TFLOPS peak single precision (FP32), 184.6 TFLOPS peak half precision (FP16) peak theoretical,
     floating-point performance. Published results on the NVIDIA Ampere A100 (40GB) GPU accelerator resulted in 9.7 TFLOPS peak double
     precision (FP64), 19.5 TFLOPS peak single precision (FP32), 78 TFLOPS peak half precision (FP16) theoretical, floating-point
     performance. Server manufacturers may vary configuration offerings yielding different results. MI100-03.
12
     AMD claim: AMD Instinct™ MI100 accelerators provide 120 compute units and 7,680 stream cores in a 300W accelerator card. Radeon
     Instinct™ MI50 accelerators provide 60 compute units (CUs) and 3,840 stream cores in a 300W accelerator card. MI100-09.
13
     AMD claim: Calculations by AMD Performance Labs as of Oct 5th, 2020 for the AMD Instinct™ MI100 accelerator designed with AMD
     CDNA 7nm FinFET process technology at 1,200 MHz peak memory clock resulted in 1.2288 TB/s peak theoretical memory
     bandwidth performance. The results calculated for Radeon Instinct™ MI50 GPU designed with “Vega” 7nm FinFET process technology
     with 1,000 MHz peak memory clock resulted in 1.024 TB/s peak theoretical memory bandwidth performance. AMD CDNA-04.
14
     AMD claim: Calculations as of SEP 18th, 2020. AMD Instinct™ MI100 accelerators support PCIe® Gen4 providing up to 64 GB/s peak
     theoretical transport data bandwidth from CPU to GPU per card. AMD Instinct™ MI100 accelerators include three Infinity Fabric™ links
     providing up to 276 GB/s peak theoretical GPU to GPU or Peer-to-Peer (P2P) transport rate bandwidth performance per GPU card.
     Combined with PCIe Gen4 support, this provides an aggregate GPU card I/O peak bandwidth of up to 340 GB/s. Server manufacturers
     may vary configuration offerings yielding different results. MI100-06.
15   AMD claim: Calculations performed by AMD Performance Labs as of September 18, 2020 for the AMD Instinct™ MI100 accelerator at
     1,502 MHz peak boost engine clock resulted in 184.57 TFLOPS peak theoretical half precision (FP16) and 46.14 TFLOPS peak theoretical
     single precision (FP32 Matrix) floating-point performance. The results calculated for Radeon Instinct™ MI50 GPU at 1,725 MHz peak
     engine clock resulted in 26.5 TFLOPS peak theoretical half precision (FP16) and 13.25 TFLOPS peak theoretical single precision (FP32
     Matrix) floating-point performance. Server manufacturers may vary configuration offerings yielding different results. MI100-04.

© 2021 Intersect360 Research.
White paper sponsored by AMD. Full neutrality statement at https://www.intersect360.com/features/neutrality-statement.
P.O. Box 60296 | Sunnyvale, CA 94088 | Tel. [888] 256-0124
www.Intersect360.com | info@Intersect360.com
proprietary, making them more portable to future generations of processors, regardless of
where they come from.

Figure: Programming Heterogeneous Computing with Coherent Memory (Source: AMD, 2020)

Figure: AMD Commitment to Open-Source Development for Heterogeneous Computing (Source: AMD, 2020)

To achieve this, AMD is putting forward its own open-development environment, AMD ROCm
(pronounced “Rock ’em”). ROCm is the critical piece that will determine software
performance and scalability in HPC environments. (See chart below.)

And what about applications that already exist? Many users have already spent years porting
and optimizing applications with NVIDIA’s CUDA tools. Here AMD has a potentially critical
solution: HIP (Heterogeneous-Compute Interface for Portability), which is a programming
model that enables applications to run on both AMD and NVIDIA hardware with the same
code base. HIP provides an abstraction layer to call the optimized code, compiled for each
architecture. Through a pair of “HIPify” porting tools, AMD offers automated conversion of
applications written to run only on NVIDIA CUDA to be compatible with open AMD HIP. AMD
cites examples in which 90% to 95% or more of the code converts automatically, with no user
intervention.16
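The mechanics behind that high automatic-conversion rate are straightforward: much of the CUDA runtime API maps one-to-one onto HIP equivalents, so the porting tools can rename identifiers textually. The sketch below illustrates the idea with a small, hypothetical subset of the real mapping table (the actual HIPify tools cover hundreds of identifiers and also handle kernel-launch syntax); the `hipify` function here is an illustration, not AMD's tool.

```python
# Illustrative subset of the CUDA-to-HIP renamings applied by AMD's
# HIPify porting tools. (Hypothetical sketch; the real tools cover
# hundreds of identifiers.)
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def hipify(source: str) -> str:
    """Sketch of the HIPify approach: mechanical identifier renaming."""
    # Replace longer identifiers first so cudaMemcpyHostToDevice is not
    # partially clobbered by the shorter cudaMemcpy rule.
    for cuda_name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        source = source.replace(cuda_name, CUDA_TO_HIP[cuda_name])
    return source

cuda_snippet = """#include <cuda_runtime.h>
float *d_a;
cudaMalloc(&d_a, n * sizeof(float));
cudaMemcpy(d_a, h_a, n * sizeof(float), cudaMemcpyHostToDevice);
cudaFree(d_a);
"""

print(hipify(cuda_snippet))
```

Because HIP keeps the CUDA programming model, the converted source compiles for NVIDIA GPUs (where HIP calls forward to CUDA) as well as AMD GPUs, which is what makes the single-code-base claim possible.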

Features of the AMD ROCm Software Environment
Source: AMD, 2021

[Figure: the ROCm™ software stack (DC tools, dev tools, comm & math libraries, compilers) on validated, optimized systems, supporting HPC industry solutions across life sciences, oil & gas, academic research, and government labs, and workloads such as image recognition, recommendation engines, language processing, and data security. Highlights:]

◢ Complete set of libraries, tools and management APIs
◢ Open-source compiler for OpenMP and HIP
◢ Easy, automated tools to convert CUDA code to HIP with virtually no performance loss
◢ Scales from Workstation to Cloud to Exascale
◢ Open ecosystem for 3rd party development

Here the partnership with DOE will pay dividends for AMD as well. In a 2019 vendor profile
of AMD, Intersect360 Research wrote: “AMD need not wait for the installation of Frontier to
begin reaping the benefits of its affiliation with the DOE labs. … DOE researchers now have a
vested interest in seeing their scientific research applications run well, at scale, on AMD
architectures. Furthermore, the DOE is committed to open science; researchers will have an
incentive to push any porting or optimization to the broader scientific community.” 17
Bringing It All Together
HPC is continuing to expand and diversify in ways that are both exciting and frightening. New
workloads, new architectures, and new frontiers of scalability will lead to innovations and
discoveries beyond yesterday’s conception. But with expanding possibility comes the added
challenge of harnessing all that power.
AMD is back in the HPC game, and in many ways, AMD is back in the lead. Winning on
price/performance involves innovation on two fronts: maximizing raw performance and

16   https://www.admin-magazine.com/HPC/Articles/Porting-CUDA-to-HIP.
17   Intersect360 Research, Vendor Overview and Outlook: AMD in HPC, November 2019.

optimizing efficiency. AMD is targeting both: it offers both CPUs and GPUs, each aiming for
HPC supremacy, connected over its own integrated fabric.
Additionally, AMD has an open software environment that not only protects existing
investments with automated conversion tools but also promotes open software development
in the future. And AMD has a critical relationship with the DOE that will provide support and
stability for a new generation of supercomputing. Incorporating CPU, GPU, and software into
a consolidated, open-community environment is something no other company has done yet.
By bringing together new technologies and new workloads, AMD provides a compelling
vision of scalability into the future of HPC.

For more information about AMD solutions for HPC, visit www.AMD.com/HPC.

AMD, the AMD logo, EPYC, Infinity, AMD Instinct, ROCm, Radeon, AMD CDNA, AMD RDNA and
combinations thereof are trademarks of Advanced Micro Devices, Inc.
