SOFTWARE DEFINED HARDWARE: THE NEW ERA OF COMPUTING INFRASTRUCTURE Accenture Labs

Executive summary
As companies embrace digital transformation and intelligent process
automation, demand for computing services continues to skyrocket. To
keep up, infrastructure providers are continuously adding more and more
general-purpose processors into their compute environments. However,
energy consumption and the associated costs are quickly becoming
bottlenecks and limiting their growth.
Adding to this complication is the fact that the computing speed-up experienced in the
last five decades—driven by Moore’s law, shrinking transistors and higher-density chips—is
slowing down. In just 10 years, when transistors shrink to the size of an atom, this
speed-up is expected to come to a halt. Many researchers are working on the next generation of
transistors, with new materials, design and fabrication technologies; but these new transistors
are at least a decade away from mass production, leaving a critical gap between the predicted
end of Moore’s law and the readiness of new approaches to pick up where it leaves off.

Many infrastructure providers, such as Microsoft and Baidu, are embracing hardware
accelerators and specialized computers to continue scaling their compute environments
without the commensurate energy consumption of general-purpose processors. In terms
of accelerators, General Purpose Graphic Processing Units (GPGPUs) are the most common.
Application-specific Integrated Circuits (ASICs) are also becoming
more popular, with Google creating its own Tensor Processing Unit
to support AI applications. But the most disruptive hardware
accelerators are powered by Field Programmable Gate Arrays (FPGAs).
With additional tools and frameworks such as Open Computing
Language (OpenCL), FPGAs allow traditional software developers to
embed custom logic into the hardware—effectively creating their own
custom hardware accelerators—hence the term “software-defined
hardware.” Several cloud providers such as Amazon and Microsoft are
already offering FPGA cloud services, dubbed FPGA-as-a-service. In
addition to accelerators, specialized computers such as adiabatic quantum
computers (www.accenture.com/quantum) and neuromorphic computers, along
with traditional supercomputers, are being used to solve specific computation problems.

The future of the enterprise computing infrastructure will consist of a diverse set of
computational hardware. But it’s critical to note that each of these approaches will only
scale specific types of applications and functions, whether those are artificial intelligence
(AI), data transformation, security, or other segments of collective enterprise needs. To
create performant applications at the right cost, companies will need to orchestrate the
right computing workload across a range of evolving computing hardware.

2   |   Software-Defined Hardware
Implications
Companies must consider the implications for future infrastructure and software decisions
in four key areas:

               Growing demand for AI, data transformation and secured applications has
               companies increasingly turning to hardware accelerators for their unique ability
               to scale these applications and functions. As a result, hardware accelerators are
               playing larger roles in the overall computing infrastructure, to the point where
               they will become a standard component. Companies must therefore manage and
               share hardware accelerators like other first-class infrastructure resources.

               Software-defined hardware is blurring the line between hardware and software.
               The growing interdependency between the two implies there will be more
               software that can only run on specific hardware. Thus, the process of selecting
               hardware and software—which has traditionally been decoupled—is becoming
               more complicated. The same applies to the dependencies between software and
               cloud providers, as different cloud providers are using different types of hardware
               accelerators, which impacts the type of software they can accelerate. Companies
               must ensure that their cloud providers can address these dependencies and
               continue to scale their software as demand continues to increase.

               More fragmentation or “balkanization” of cloud providers will occur due to
               the increasing hardware-software dependency. Thus, the market will see more
               software and services that are only available in specific cloud infrastructure. Cloud
               providers will be quick to take advantage of this balkanization effect, building up an
               ecosystem of partners of high-performance software applications that work only
               within the cloud provider’s infrastructure confine. This will become a major source
               of differentiation among competitors. Selecting the right cloud providers will
               become even more critical as it may limit companies from using certain software.
               Similarly, before building new software, companies should ensure their selected
               software components are aligned with and available from the cloud providers.

               Services, processes and frameworks (which may include containerization
               technologies, microservices and APIs) that allow easy orchestration of the right
               workload on the right compute infrastructure—whether within the same cloud
               provider or across different cloud providers—will play a key role in the future
               compute infrastructure for many large enterprises. This has a direct impact on
               how these companies should design their new software.

The remainder of this document covers the research, analysis and market examples used to
derive these findings. Further, it outlines the steps companies can take now to prepare for a
future of software-defined hardware.

Computational
power hits a plateau
The demand for computing power continues to rise. The 2017 revenue
growth of infrastructure-as-a-service was 36.8 percent, and spending
growth on IT infrastructure products was 15.3 percent year-over-year in
the same timeframe.1 But Moore’s law, which drove the last five decades
of chip development, is slowing and is expected to hit its limit in the next
decade. Already, Intel has indicated that instead of doubling the number
of transistors on a chip every 18 months, the timeframe will ratchet up to
as much as 36 months.
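The compounding effect of that slower cadence is easy to quantify. Below is a minimal sketch in pure Python; the starting transistor count is illustrative only, not a figure from this report:

```python
# Illustrative only: compare transistor-count growth under two doubling cadences.
def transistor_count(initial, months_elapsed, doubling_months):
    """Transistors on a chip after a period, doubling every `doubling_months`."""
    return initial * 2 ** (months_elapsed / doubling_months)

start = 1e9   # hypothetical 1-billion-transistor chip today
decade = 120  # months

fast = transistor_count(start, decade, 18)  # classic Moore's-law pace
slow = transistor_count(start, decade, 36)  # the slowed cadence Intel describes

print(f"18-month doubling after a decade: {fast:.2e}")
print(f"36-month doubling after a decade: {slow:.2e}")
print(f"Gap after one decade: {fast / slow:.0f}x")
```

Stretching the doubling period from 18 to 36 months leaves chips roughly an order of magnitude behind the historical trend after just one decade.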
This reversal of a fundamental tenet of computing can be attributed to the challenges of
manufacturing circuits and nanometer-scale transistors. The latest Intel processor, Kaby Lake—
first released in 2016—is built on a 14-nanometer (nm) process, with features smaller than a
typical virus; and the 10 nm Cannon Lake processor is expected to be released in 2019. Not to
be outdone, Samsung has announced early production of a 4 nm chip with full release scheduled
for 2020. As a reference, 4 nm is the size of a “quantum dot” of just seven atoms in a single
silicon crystal.

A single-atom transistor chip is within reach, but it will require the creation of new process
and etching technologies for mass-scale production. Remarkably, such a transistor was first
created in 2012, but it had to be operated at temperatures barely above absolute zero.

Figure 1 shows the relative design cost for each successively smaller chip generation. Design
costs have risen steadily due to the complexity involved, which only serves to extend the
production timeline.

Figure 1: Increasing design cost for reduced chip component size
(Source: Accenture analysis)

[Chart: design cost ($M), on a scale of 0 to 800, increasing as the chip manufacturing
process shrinks from 20 nm through 16, 10 and 7 nm down to 5 nm]

Even if the physics issues can be addressed, the cost of fabrication for atom-sized chips
will be staggering. For instance, the cost to build a fabrication plant for 14 nm chips is more
than $5 billion; for 5 nm chips, it is estimated to cost roughly $16 billion. This projection
does not account for the additional complexity involved in changing to new materials (e.g.,
germanium), new structures (e.g., 3D stacking) and completely new fabrication processes (e.g.,
carbon nanotubes). All of these factors could seriously impact the commercial viability of
upcoming chips.

Hardware
accelerators to the rescue
Infrastructure providers—and the companies that rely on their
services—are increasingly turning to hardware accelerators to provide
the necessary compute power and speed at scale. Traditional central
processing units (CPU) were designed to run a wide range of tasks.
However, for specific or repetitive functions, especially the ones that can
be executed in parallel, hardware accelerators can help run the task more
efficiently in terms of time and power consumption.
There are a range of hardware accelerators, the most common of which is the Graphics
Processing Unit or GPU. Other common types include Application-Specific Integrated
Circuits (ASIC) and Field Programmable Gate Arrays (FPGA). Figure 2 provides a high-
level comparison of the different types of accelerators.

Figure 2: Relative hierarchy of hardware accelerators
(Source: Accenture analysis)

Central Processing Unit (CPU)
• Designed for general-purpose applications
• Relative performance: 1
• Flexibility: general purpose
• Market: market agnostic
• Ease of programming: widely available programming skills
• Key players: Intel, AMD, ARM

Graphics Processing Unit (GPU)
• Designed for graphics-related computations
• Relative performance: 100
• Flexibility: special-purpose processor
• Market: somewhat restricted market
• Ease of programming: requires specialized skills
• Key players: NVIDIA, AMD, Intel

Field Programmable Gate Array (FPGA)
• Array of programmable blocks with a programmable interconnect
• Relative performance: 1,000
• Flexibility: field (re)programmable
• Market: somewhat restricted market
• Ease of programming: requires specialized skills
• Key players: Xilinx, Intel (Altera), Actel

Application-Specific Integrated Circuit (ASIC)
• Custom designed for specific functionality
• Relative performance: 10,000–100,000
• Flexibility: application specific
• Market: market specific
• Ease of programming: rigid; interface only
• Key players: NEC, LSI, Samsung

In the hierarchy of processor performance ranging from general purpose CPUs to ASIC,
there is a tradeoff between flexibility and efficiency, with efficiency increasing by orders
of magnitude when any given application is implemented higher up in that hierarchy.

GPU applications expanded
in the last two decades
GPUs, the most common hardware accelerator, are continuing to gain traction. The chip was
first introduced to accelerate graphics-related tasks such as quickly rendering the shadow
of an object. However, GPUs can also be used to speed up non-graphics tasks such as signal
processing, virus pattern recognition and medical image recognition.

Companies like NVIDIA have been capitalizing on this more general use of GPU, which is also
known as General-Purpose Computing on GPU (GPGPU). NVIDIA’s DGX-1 system, coupled
with its Volta GPU, for example, is designed specifically to support AI applications such as
deep learning training, inference and accelerated analytics, all in one system. This approach
has nearly tripled the company’s datacenter-segment revenue to a record $409 million,
up 186 percent year-over-year.2 Most cloud providers like Amazon, Microsoft and
Google also allow their customers to tap into the power of GPGPU through GPU cloud.

ASIC is performant
but expensive to produce
An Application-Specific Integrated Circuit (ASIC) is a chip designed for one specific purpose.
Because it is highly optimized for a specific function, it typically operates at a higher level
of efficiency than its CPU- or GPGPU-counterparts. Google, for example, created its own
ASIC-based accelerator called a Tensor Processing Unit (TPU) to speed up machine learning
applications. In addition to being faster, ASICs typically consume less power. This is one of
the reasons Microsoft, for example, created its own ASIC-based Holographic Processing
Unit (HPU) to process data from various sensors in its HoloLens units.

Given that ASIC requires a large one-time, up-front investment for design and manufacture—
sometimes in the millions of dollars—this type of chip targets high-production volume.
Further, because ASIC functionality is fixed once manufactured, it is hard to quickly refine or
update functionality. As such, ASICs traditionally target functionality that is relatively stable.
In recent years, demand for ASIC has grown due to its widespread use in smartphones and
tablets to meet the need for bandwidth. A recent study estimates the global ASIC market
will grow at a CAGR of 17.01 percent during the period 2017 to 2021.3

Democratization of custom
hardware accelerators
Field Programmable Gate Arrays (FPGA) have been in existence for decades. However, the
original circuit designs were bulky and hard to program and interface with, limiting their
use. Until recently, FPGA was used primarily by engineers specializing in digital hardware
design [i.e., using VHSIC Hardware Description Language (VHDL) or Verilog]—mostly as a
hardware prototyping tool.4

FPGA is different from other hardware accelerators in that it does not have any specific
functionality when manufactured. As a hardware vessel, it needs to be programmed. But
once the software logic is embedded into it, running algorithms at the hardware level may
yield orders of magnitude in performance improvement. Unlike ASIC-based accelerators,
which may take months to design and manufacture, FPGA-based accelerators can be
developed in a matter of weeks. And unlike GPGPU, FPGA’s functionality is not confined
to graphics-related operations.

Perhaps the biggest strength of FPGA, however, is its ability to be reprogrammed. Its
functionality can be further refined and upgraded on the fly—perfect for quickly evolving
areas such as machine learning.

Using FPGA, Microsoft has reported that—for certain types of computing—it has achieved up
to a 150 to 200x improvement in data throughput, up to a 50-fold improvement in energy
efficiency compared to a CPU, and roughly 75 percent lower latency. Other examples
of companies using FPGA as accelerators are:

          • China-based Baidu has adopted FPGA to accelerate SQL processing.5

          • Microsoft Azure uses FPGA to route network traffic, and Office 365 uses FPGA
            for encryption and compression.6

          • Nervana Systems—recently acquired by Intel—has developed FPGA for deep
            learning; another startup, DeePhi, is doing the same.7

As shown in Figure 3, market leaders like Intel, Microsoft and Amazon are at the forefront
of FPGA adoption. Intel, through its Altera acquisition, and Amazon, through its FPGA-
equipped Elastic Compute Cloud (EC2), have demonstrated their commitment to the future
of FPGA-driven hardware acceleration.

Figure 3: Market leaders upbeat about the future of FPGA
(Source: Accenture analysis)

                            •   In 2015, Intel bought Altera, a maker of FPGA, for $16.7 billion (its largest
                                acquisition to date).8
                            •   Intel is baking FPGA into its Xeon-based servers as a CPU accelerator.
                                This was announced at the 2016 Open Compute Project Summit to help
                                accelerate adoption.9

                            •   Microsoft is now putting FPGA on PCI Express networking cards in every
                                new server it deploys in its data centers.10
                            •   Microsoft Bing’s machine learning algorithms on FPGA yielded 40-100x
                                performance improvements.
                             •   Microsoft has announced the availability of Brainwave, an FPGA-based
                                 system for ultra-low latency deep learning for Azure.15

                             •   In April 2017, Amazon released an FPGA-equipped EC2 offering called F1
                                 (i.e., FPGA Cloud) to allow its customers to create custom hardware
                                 accelerators.
                             •   F1 comes with tools to develop, simulate, debug and compile hardware
                                 acceleration code, including an FPGA Developer AMI and a Hardware
                                 Developer Kit.10

“By 2020, a third of all servers inside all the major cloud
computing companies will include FPGA.”
                                                 —DIANE BRYANT, Group President, Data Center, Intel

FPGA-accelerated tools are gaining popularity as well, with interesting use cases of FPGA-
accelerated platforms and reference frameworks beginning to emerge. For example,
Bigstream claims to offer 2 to 5x hyper-acceleration of Spark applications using FPGA.
Bigstream uses acceleration techniques like native compilation, vectorization, locality
optimization and custom data connectors to provide faster time-to-insight at significantly
lower cost.

Figure 4: Sample FPGA-accelerated tools

                            •   The DRAGEN engine is a software framework and constituent library of
                                hardware accelerator blocks, implemented in FPGA.
                            •   The platform solves two key unmet needs in big data genomics:
                                compute and storage.
                            •   It offers a scalable, accelerated and cost-efficient analysis solution for
                                genomics applications.

                            •   Ryft outperforms the fastest data analytics platforms by 200x or more.
                            •   Ryft Cloud enables users to get fast, actionable insight from their
                                cloud-based data 72x faster than is currently possible with commodity
                                cloud infrastructures.
                            •   Ryft ONE accelerator makes data analytics fast and simple by
                                combining heterogeneous FPGA/x86 compute, SSD-based storage,
                                a library of analytics algorithms and an open API.

With more companies turning to FPGA, prices have been declining as well: a developer
board now starts at as low as $80 and a developer kit at $20,000.12 Further, more software
capabilities (see OpenCL below) are being developed to allow people to create their own
custom accelerators (hence the democratization of FPGA).13

The rise of custom
accelerator marketplace
An interesting aspect of an FPGA-based accelerator is that it has two
independent components:

1. The FPGA hardware
2. The program to be deployed to the FPGA

The latter is a digital asset—just like music, movies and apps—which
can easily be bought and sold. As such, it is ripe for an app store-like
marketplace to monetize such programs online.
Amazon and companies like Accelize have started capitalizing on this phenomenon.
For instance, Accelize not only provides a marketplace to buy and sell programs for
FPGA accelerators in the form of AFI files (a specific file format deployable to AWS FPGA
cloud), but also provides services like digital rights management for these files, and the
associated payment services to allow AFI developers to monetize their work. Accelize
is also aggressively forming alliances to create an ecosystem of partners to hasten the
development and use of FPGA accelerators.

OpenCL plays a big role
in driving hardware
accelerators
Open Computing Language (OpenCL) is a framework for writing
programs that execute across heterogeneous platforms consisting
of CPUs, GPUs, FPGAs and other accelerators. Prior to OpenCL,
programmers had to use vendor-provided toolkits (such as NVIDIA CUDA)
to use an accelerator. Since the code was vendor-specific, it locked the
software implementation to a specific vendor and limited the number of
different accelerators an application could use at any particular time.
OpenCL was originally developed by Apple in 2008, and is now managed by a large,
industry-wide consortium—Khronos Group—with 100+ members including IBM, Google,
Amazon, Microsoft and Baidu. This cross-platform framework has been deployed in a wide
range of applications, from machine learning, gaming and creative tools, to scientific and
medical applications. OpenCL is based on the C programming language, and bindings have
since been developed for other languages, including Python and JavaScript.
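To make the programming model concrete, the sketch below shows a standard OpenCL C kernel for element-wise vector addition, held as a string the way a host program would pass it to the OpenCL compiler, alongside a pure-Python function that computes the same result. This is an illustrative sketch only; the host-side setup (platform, context, queue, buffers) is omitted.

```python
# OpenCL C kernel source: each work-item adds one pair of elements in parallel.
# A host API (e.g., pyopencl or the OpenCL C API) would compile this string
# and enqueue it on a CPU, GPU or FPGA device.
VECTOR_ADD_KERNEL = """
__kernel void vector_add(__global const float *a,
                         __global const float *b,
                         __global float *out) {
    int gid = get_global_id(0);  /* index of this work-item */
    out[gid] = a[gid] + b[gid];
}
"""

def vector_add_reference(a, b):
    """Pure-Python equivalent of the kernel: one 'work-item' per element."""
    return [x + y for x, y in zip(a, b)]

print(vector_add_reference([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
# → [11.0, 22.0, 33.0]
```

The same kernel source can target any conformant device, which is exactly the vendor neutrality the framework was designed to provide.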

Future of compute
infrastructure
Given the approaching end of Moore’s law and the demand for compute
solutions that scale across a variety of application needs, the future
of compute infrastructure will consist of a diverse set of hardware.
The movement toward hardware accelerators is also supplemented
with active development on specialized computers to augment the
general-purpose computers used today. For example, D-Wave’s quantum
computers can be used to rapidly solve optimization problems (see
www.accenture.com/quantum). IBM’s TrueNorth, a neuromorphic computer,
is great for pattern recognition—a key capability for AI applications. While
these specialized computers are not covered in this article, it is important
to acknowledge their roles in the future of compute infrastructure.
Given their flexibility and pervasiveness, traditional general-purpose computers will evolve
toward working as orchestrators, directing specialized computers and accelerators to do specific
tasks, as well as covering areas beyond those specific tasks. Figure 5 shows how general-purpose
hardware and specialized hardware will most likely fit together, and this structure is already
emerging. For example, 1Qbit (see www.accenture.com/quantum) provides an interface between
traditional computers and quantum computers; in the process, the interface translates the
business problem into a form that is recognizable by a quantum computer.
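The orchestration role described above can be sketched in a few lines: a general-purpose host routes each workload to whichever specialized backend is registered for it, and falls back to the CPU otherwise. All class and backend names here are hypothetical, for illustration only.

```python
# Hypothetical sketch: a CPU-side orchestrator routing workloads to
# specialized backends, with a general-purpose fallback.
from typing import Callable, Dict

class Orchestrator:
    def __init__(self) -> None:
        # Map workload kinds (e.g., "inference") to accelerator handlers.
        self._backends: Dict[str, Callable[[dict], str]] = {}

    def register(self, kind: str, handler: Callable[[dict], str]) -> None:
        self._backends[kind] = handler

    def run(self, kind: str, payload: dict) -> str:
        # Route to a specialized backend if one exists, else run on the CPU.
        handler = self._backends.get(kind, self._run_on_cpu)
        return handler(payload)

    @staticmethod
    def _run_on_cpu(payload: dict) -> str:
        return f"cpu:{payload['task']}"

orch = Orchestrator()
orch.register("inference", lambda p: f"fpga:{p['task']}")
orch.register("training", lambda p: f"gpu:{p['task']}")

print(orch.run("inference", {"task": "rank-results"}))  # routed to the FPGA
print(orch.run("batch-etl", {"task": "transform"}))     # CPU fallback
```

In practice each handler would sit behind a microservice API or a common framework such as OpenCL rather than a Python callable, but the routing decision is the same.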

Figure 5: Diversity of compute infrastructure in the future
(Source: Accenture analysis)

[Diagram: applications sit at the top; a general-purpose CPU connects to them through
common H/W frameworks (e.g., OpenCL) and micro-services (APIs); beneath sit specialized
hardware accelerators (GPUs, FPGAs, ASICs, other accelerators) and specialized computers
(quantum computers, neuromorphic computers, other HPCs)]

Business implications
and next steps
From managing supply chains in real time to predicting the evolution
of cancerous cells in the human body, the world is undoubtedly moving
toward computational heterogeneity to meet the computing demands of
the next decade. Such changes will not happen overnight, but with recent
developments in quantum computing and neuromorphic computing, the
landscape is changing fast.

Historically, companies have treated hardware and software as largely independent entities,
managed by disparate groups. But the future of computing will bring widespread and far-
reaching change. With hardware accelerators, the layer of separation between hardware
and software is now blurring, creating a tighter coupling between the two—a coupling
that demands changes throughout the organization. Critically, this new coupling will also
create an infrastructure divergence for cloud providers like Google, Microsoft and Amazon,
as they become even more specialized in terms of which type of hardware they choose to
accelerate their services.

Companies may need to work with more cloud providers in order to meet their software
needs, as software becomes more tightly linked to specific hardware approaches.
Conversely, if companies restrict themselves to a specific subset of infrastructure/
cloud providers, they may limit their ability to use best-of-breed software available in the
future. Companies choosing specialized hardware to improve performance must do so
thoughtfully, as it increases the complexity of the software and the dependency on certain
types of hardware, especially in the context of hybrid cloud. What’s more, as companies
accept more complexity in their architectural solutions, they must also contend with the fact
that the skills available to develop and maintain such environments become more limited.

Hardware accelerators have much to offer, and it’s no surprise that they are increasingly
being deployed across the enterprise. But companies must carefully manage their
architectures to ensure that the benefits from acceleration are commensurate with the
complexity of managing and maintaining the system in the long term. General-purpose CPUs
will continue to be the main workhorse for running general workloads, as well as for
orchestrating and parceling out work to specialized hardware. Software layering,
componentization and API approaches can be used to isolate the specialized workload. It will
be essential to
revisit existing software development processes, tools and architecture to prepare for—and
maximize—the evolution to software-defined hardware.
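As one way to read "software layering, componentization and API approaches," the sketch below isolates a hardware-specific compression path behind a stable interface, so application code does not change when the backend swaps. The class and function names are hypothetical, and the "FPGA" path simply delegates to the CPU baseline so the sketch stays runnable anywhere.

```python
# Hypothetical sketch: isolating accelerator-specific code behind a stable
# interface so the application never depends on the hardware beneath it.
import zlib
from abc import ABC, abstractmethod

class Compressor(ABC):
    """Stable, application-facing API; implementations may vary per provider."""
    @abstractmethod
    def compress(self, data: bytes) -> bytes: ...

class CpuCompressor(Compressor):
    # Portable baseline using the standard library.
    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)

class FpgaCompressor(Compressor):
    # Stand-in for a provider-specific accelerated path; it delegates to
    # the CPU baseline here so the example runs without special hardware.
    def compress(self, data: bytes) -> bytes:
        return CpuCompressor().compress(data)

def archive(payload: bytes, backend: Compressor) -> bytes:
    # Application code depends only on the Compressor interface.
    return backend.compress(payload)

data = b"software-defined hardware " * 100
assert archive(data, CpuCompressor()) == archive(data, FpgaCompressor())
print("compressed to", len(archive(data, CpuCompressor())), "bytes")
```

Because the interface, not the implementation, is the contract, a company can move the accelerated path between clouds, or fall back to the CPU, without touching the calling code.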

AUTHORS
PAUL DAUGHERTY
Chief Technology & Innovation Officer
Accenture

EDY LIONGOSARI
Chief Research Scientist
Accenture Labs

PRANAV KUDESIA
Tech Research Lead
Accenture Research

CONTRIBUTORS
COLIN PURI
TERESA TUNG
CARL DUKATZ
RENEE BYRNES

REFERENCES

1. https://www.forbes.com/sites/louiscolumbus/2017/04/29/roundup-of-cloud-computing-forecasts-2017/#acb92fb31e87
2. https://www.fool.com/investing/2017/05/17/nvidia-delivers-stunning-ai-growth-on-a-solid-gami.aspx
3. http://dailynewsks.com/2017/05/application-specific-ic-asic-market-shares-competitive-landscape-analysis-challenges-2021/
4. http://www.ni.com/white-paper/6983/en/
5. https://www.nextplatform.com/2016/08/24/baidu-takes-fpga-approach-accelerating-big-sql/
6. http://anonhq.com/microsoft-bets-future-reprogrammable-computer-chip/
7. https://www.nextplatform.com/2016/08/23/fpga-based-deep-learning-accelerators-take-asics/
8. http://www.eweek.com/servers/intel-begins-shipping-xeon-chips-with-fpga-accelerators
9. https://newsroom.intel.com/news/intel-eases-use-fpga-acceleration-combines-platforms-software-stack-ecosystem-solutions/
10. https://aws.amazon.com/blogs/aws/ec2-f1-instances-with-fpgas-now-generally-available/
11. http://anonhq.com/microsoft-bets-future-reprogrammable-computer-chip/
12. https://www.adafruit.com/category/69
13. http://www.barrons.com/articles/nvidia-amd-xilinx-to-benefit-from-rise-of-gpu-fpga-says-jefferies-1479222054
14. http://spectrum.ieee.org/semiconductors/design/the-death-of-moores-law-will-spur-innovation
15. https://techcrunch.com/2017/08/22/microsoft-brainwave-aims-to-accelerate-deep-learning-with-fpgas/

ABOUT ACCENTURE LABS

Accenture Labs incubates and prototypes new concepts through applied R&D projects that are
expected to have a significant strategic impact on clients’ businesses. Our dedicated team of
technologists and researchers work with leaders across the company to invest in, incubate and
deliver breakthrough ideas and solutions that help our clients create new sources of business
advantage. Accenture Labs is located in seven key research hubs around the world and
collaborates extensively with Accenture’s network of nearly 400 innovation centers, studios
and centers of excellence globally to deliver cutting-edge research, insights and solutions to
clients where they operate and live. For more information, please visit
www.accenture.com/labs

ABOUT ACCENTURE RESEARCH

Accenture Research identifies and anticipates game-changing business, market and technology
trends through provocative thought leadership. Our 250 researchers partner with world-class
organizations such as MIT and Singularity to discover innovative solutions for our clients.

ABOUT ACCENTURE

Accenture is a leading global professional services company, providing a broad range of
services and solutions in strategy, consulting, digital, technology and operations. Combining
unmatched experience and specialized skills across more than 40 industries and all business
functions—underpinned by the world’s largest delivery network—Accenture works at the
intersection of business and technology to help clients improve their performance and create
sustainable value for their stakeholders. With 449,000 people serving clients in more than
120 countries, Accenture drives innovation to improve the way the world works and lives.
Visit us at www.accenture.com.

Copyright © 2018 Accenture
All rights reserved.

Accenture, its logo, and High Performance Delivered are trademarks of Accenture.