The New Big Data World - GETTING MORE VALUE FROM DATA: THE RISE OF HIGH-PERFORMANCE DATABASES - TECHNOLOGY WHITE PAPER MARCH 2019 - Bryan, Garnier ...

Page created by Christopher Hartman

Hobbies & Interests

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

The New Big Data World - GETTING MORE VALUE FROM DATA: THE RISE OF HIGH-PERFORMANCE DATABASES - TECHNOLOGY WHITE PAPER MARCH 2019 - Bryan, Garnier ...

The New Big Data World
GETTING MORE VALUE FROM DATA:
THE RISE OF HIGH-PERFORMANCE DATABASES

TECHNOLOGY WHITE PAPER MARCH 2019

Contents

THE NEW BIG DATA WORLD – GETTING MORE VALUE FROM DATA:
THE RISE OF HIGH-PERFORMANCE DATABASES

Foreword - The database world has changed                 1

1.   TECTONIC SHIFTS IN THE DATABASE MARKET               2
     Relational database vendors are challenged           2

     Mainstream players have reacted with acquisitions    6

     SAP pushes in-memory databases to primetime          8

2.   NEW DATA MANAGEMENT TECHNOLOGIES                    10
     From SQL to NoSQL to NewSQL                         10

     Hadoop for big data: the rise and fall?             14

     NoSQL databases: simpler than SQL                   16

     NewSQL databases                                    20
                                                              The database market has seen seismic change in the last decade.
     Cloud data warehousing                              22   Trends including cloud computing, Big Data and IOT, AI and automation
3.   CLOUD-BASED DATA WAREHOUSING: THE SNOWFLAKE CASE    24   have contributed to an explosion in data volumes. This has shaken up
     Snowflake, a ‘superfunded’ company                  24   the typical enterprise data management ecosystem and created fertile
                                                              ground for a new generation of startups offering high-volume,
     Snowflake’s approach: 100% ‘as-a-service’           26
                                                              high-performance applications. Many mainstream relational database
4.   IN-MEMORY ANALYTICS DATABASE: THE EXASOL CASE       28   players have reacted by acquiring the newcomers. Other big players
     Exasol, a hidden champion                           28   such as SAP have championed in-memory databases.
5.   CONCLUSION                                          30
                                                              Relational databases still dominate globally. But while the segment is expected to keep growing,
                                                              driven by factors such as in-memory SQL databases, its global share is forecast to fall steadily.
                                                              At the other end of the market, NoSQL and Hadoop databases surged by an estimated 66%
                                                              per year from 2013-18 and are expected to account for 18% of the total database market
                                                              by 2021. The excitement around new database technologies has been reflected in a wave of
                                                              pre- IPO funding rounds dominated by Cloudera and Snowflake. However, cloud innovations and
                                                              plummeting storage costs have driven consolidation in the Hadoop space. NoSQL databases,
                                                              which offer simplicity and speed in big data applications, and NewSQL, which adds the ACID
                                                              guarantees of relational databases, have seen significant activity and interest.
                                                              To examine some aspect of the evolving database market in more detail, we profile cloud-based
                                                              data warehousing software vendor Snowflake, a ‘superfunded’ company offering data warehouse
                                                              as-a-service; and we look at Exasol, which develops in-memory analytics for big data use cases.
                                                              In conclusion, we see a continuing tussle between Oracle and Amazon Web Services for the
                                                              #1 spot in the database market, Hadoop being challenged by its own complexity, acquisitions
                                                              in new database technologies as incumbents continue to battle AWS, and plenty of potential
                                                              unicorns in the data management space.

                                                                                                                                                                  1

1. Tectonic shifts in the
    database market

Relational database vendors                             the backbone of mainstream              Over the last decade, the database          FIG2:2 ANNUAL SIZE OF THE ‘GLOBAL DATASPHERE’ IN ZETTABYTES (2010-2025E)
                                                                                                                                          FIG.
are challenged                                          enterprise application software like    market has undergone massive
                                                        ERP or CRM that flourished in the       changes, thanks to the nature of data
Gartner estimated the size of the                                                                                                                                                                                                                                                175 ZB
                                                        1990s and 2000s. (SQL was first         being generated, the volume, and                       180
database management systems
                                                        developed by IBM in 1974 and made       the processing capability needed to
market at USD 38.8bn for 2017,                                                                                                                         160
                                                        commercially available by Oracle        make sense of all the data. According
up 12.7%1, after an 8.1% rise in
                                                        five years later). As they constantly   to an IDC white paper published in                     140
2016 and a 1.6% decline in 2015.
                                                        improved in performance, driven by      November 20182, the total volume of
For more than two decades, this                                                                                                                        120
                                                        the data warehousing, data mining       global data has exploded, rocketing
market has been dominated by three
                                                        and business intelligence tools that    to 33 zettabytes (ZB) in 2018 from

                                                                                                                                           Zetabytes
                                                                                                                                                       100
players: Oracle, which had 41%
                                                        appeared throughout the 1980s           less than 5 ZB in 2005. It is projected
market share in 2016, IBM (20%),
                                                        and 90s, relational databases were      to reach 163 ZB in 2025. One                           80
and Microsoft (15%). Relational
                                                        fast enough to address all of an        zettabyte is equivalent to a trillion
databases based on Structured                                                                                                                          60
                                                        organisation’s data processing needs.   gigabytes (GB).
Query Language (SQL) have become
                                                                                                                                                       40

                                                                                                                                                        20
FIG. 1: DATABASE MANAGEMENT SYSTEMS MARKET SHARES (2016)
                                                                                                                                                         0
                                                                                                                                                             2010   2011    2012     2013      2014      2015     2016   2017   2018   2019   2020      2021    2022   2023    2024    2025
               FIG 1
                                                                                                                                          Source: IDC and Seagate, The Digitisation of the World, November 2018
                                                                            MICROSOFT

                                                                                                                                          IT departments of large and small                           inexpensive technologies such                  Google Cloud BigTable (wide column

                                                                            20%                                                           companies alike are scrambling
                                                                                                                                          to bridge this data infrastructure
                                                                                                                                                                                                      as software-defined storage (SDN),
                                                                                                                                                                                                      they made data storage and the
                                                                                                                                                                                                                                                     store), Google Cloud Spanner
                                                                                                                                                                                                                                                     (NewSQL database), Google Cloud
                                                                                                 IBM                                      gap as they failed to anticipate this                       general use of data analytics                  Datastore (NoSQL database) and

                                                                                        15%                                               data explosion, which was primarily
                                                                                                                                          caused by six converging trends:
                                                                                                                                                                                                      accessible to organisations of all
                                                                                                                                                                                                      sizes and across all departments.
                                                                                                                                                                                                                                                     Google BigQuery (data warehouse);
                                                                                                                                                                                                                                                     3) Microsoft launched Microsoft Azure

                                                    41%
                                                                                                                                                                                                      These three players developed their            SQL Database and Microsoft Azure
                                                                                                                                          Cloud computing                                             own data management systems:                   Data Warehouse.
                                                                                                                                          The public cloud services Amazon                            1) Amazon launched Amazon RDS

                                                                                  24%
                                  ORACLE                                                                                                                                                              (relational database), Amazon                  Big Data
                                                                                                                                          Web Services (2002), Google Cloud
                                                                                                                                          Platform (2008) and Microsoft                               DynamoDB (NoSQL database),                     As we have already seen, data volumes
                                                                                                                                          Azure (2010) have been designed                             Amazon Aurora (relational database)            have increased massively. Social media
                                                                                                                                          to facilitate agile projects and                            and Amazon Redshift (data                      platforms create plenty of it: the overall
                                                                                          OTHER                                           provide flexible, demand-oriented                           warehouse); 2) Google launched                 data footprint of eBay, Facebook
                                                                                          (AMAZON, TERADATA, SAP…)
                                                                                                                                          infrastructure and services. Using                          Google Cloud SQL (SQL database),               and Twitter now is 500, 300 and 300

 Gartner, State of the Operational DBMS market, 2018, 6th September 2018.
 1

 IDC and Seagate, The Digitisation of the World, November 2018.
 2

2 | THE NEW BIG DATA WORLD                                                                                                                                                                                                                                                                        3

petabytes, respectively (a petabyte        such as cameras or vending machines,            and include the most recent data,       External data                            self-driving cars and predictive           stored in a ‘data lake’, which can
is equivalent to one million GB). And      has created new challenges for data             meaning that the patience of users                                               analytics based on self-learning           then be connected to enterprise
                                                                                                                                   Data is no longer created only within
the Internet of Things (IoT) – data from   storage and analysis.                           waiting for reports, analyses and                                                algorithms are radically changing          data warehouses.
                                                                                                                                   the organisation itself. It can now be
phones, cameras, connected cars,                                                           results is increasingly short.                                                   business processes.
                                           Self-service data analytics                                                             easily enriched with all manner of                                                  Another consequence of these
wearables, sensors, operational control
                                                                                           Operational business intelligence       external sources, for example: social    These trends have shaken up the            changes is the advent of Amazon
centers – generates vast volumes of        By design, access to legacy data
                                                                                           and automation                          media, public statistics, geospatial     typical enterprise data management         Web Services (AWS) as the fourth
data and needs new technologies            warehouses was restricted to                                                            data, weather forecasts, competition     ecosystem. It is no longer limited to      largest database vendor in the market
for analysis, more scalable storage,       centralised IT departments. The                 Data analytics no longer simply         analysis and event-driven data.          structured data, relational databases,     (with a Cloud database offering).
streaming solutions and operational data   decentralisation of data and the                measures historical data. It is being
systems. New applications also need to                                                                                                                                      ETL (Extract Transform Load) tools,        In 2017, AWS had an estimated
                                           advent of more easy-to-use analysis             built into all kinds of operational     Artificial intelligence
be created on top of data, for example                                                                                                                                      enterprise data warehouses, and            9%+ market share of the whole
                                           tools like Tableau, Qlik, or Microsoft          processes and is predictive,
to offer dynamic pricing or personalised                                                                                           With recent advances in infrastructure   traditional business intelligence tools.   database market and revenues up
                                           Power BI enabled staff to access and            deterministic and self-learning.
marketing campaigns. The complexity                                                                                                and tools, data science is much          In particular, structured data,            an impressive 112% in this area.
                                           analyse data and get more insights              Completely automated and self-
of IoT, where data is processed either                                                                                             more accessible and now allows           semi-structured data, unstructured
                                           from it. User requirements have also            learning processes make decisions
on servers, public or private clouds, or                                                                                           for automated decision-making.           data and machine-based data
                                           changed dramatically. Dashboards                based on data at a speed and
at the endpoint or the edge of devices                                                                                             Applications such as face recognition,   (IoT, logs, sensors) can be now
                                           can be updated within seconds                   accuracy that humans cannot match.

FIG. 3: TYPICAL MODERN DATA MANAGEMENT ECOSYSTEM

                   STRUCTURED DATA         SEMI-STRUCTURED DATA            UNSTRUCTURED DATA      MACHINE-GENERATED DATA

                      RELATIONAL
                       DATABASES
                                                                  NON-
                     (SQL, NEWSQL)
                                                              RELATIONAL
                                                              DATABASES
                                                                (NOSQL)

                       ETL TOOLS
                  (EXTRACT TRANSFORM
                         LOAD)

                                                          DATA LAKE

                      DATA
                   WAREHOUSE

                          BUSINESS ANALYTICS AND BUSINESS INTELLIGENCE TOOLS

4 | THE NEW BIG DATA WORLD                                                                                                                                                                                                                                  5

Mainstream players have                           in-memory databases, but these were       These newly acquired technologies
reacted with acquisitions                         confined to high-end applications.        contained their own weaknesses.
                                                                                            For example, they are all built on a
Relational database management                    In the mid-2000s, start-ups including     kernel of the PostgreSQL database
software leaders have had to                      Vertica, Greenplum, Netezza,              (a pre-existing database based on
embrace these trends, but at the                  DatAllegro, Aster Data, ParAccel          Ingres) which was built in the 1970s,
same time deal with hundreds of                   and Exasol acknowledged that              revamped in the 1990s, and difficult
thousands of customers who need                   greater performance was going to          to change. The legacy players
backwards compatibility with legacy               be needed to deal with increasing         attempted to lock in the customers
IT environments. This constraint made             data volumes. They developed              by offering ‘end-to-end’ business
it practically impossible to replace              solutions in the form of large-volume,    solutions (from databases to BI
30-40-year-old code and rebuild their             high-performance data warehousing         solutions) of which the relational
solutions from scratch. So instead                applications designed and built for       database was only one component.
of migrating their database product               analytical use cases. Their products      For example, for use in corporate
offerings to the new technology                   started to see success and most           data centers deployed as private
platforms, these large incumbents                 of them were acquired by global IT        clouds, Oracle launched in 2008 its
kept their old platforms and tried to             players (see Fig. 4). ParAccel, which     Exadata Database Machine, which
fix their legacy systems by adding                has served as the core component          is an Oracle database running on
elements of the new technologies                  of Amazon Redshift, was acquired          dedicated hardware. Exadata was
to their existing products. The result            by Actian (formerly Ingres) in 2013.      adapted to public clouds in 2015.
has been a mix of approaches and                  These acquisitions enabled global
a hybrid architecture. Alternative                vendors to temporarily address
solutions emerged, such as                        performance issues and consolidate
column-based data stores                          their market positions.
(like Sybase IQ in 1995) and

FIG. 4: ACQUISITIONS IN DATABASES AND DATA WAREHOUSING

 Company                                          Funding                                  Price/acquirer

 DatAllegro                                       USD 63m                                  USD 200m by Microsoft (2008)

 Netezza                                          USD 63m                                  USD 1,700m by IBM (2010)

 Greenplum                                        USD 97m                                  USD 300m by EMC (2010)

 Vertica Systems                                  USD 31m                                  USD 350m by HP (2011)

 Aster Data                                       USD 53m                                  USD 295m by Teradata (2011)

Source: Company Data; Bryan, Garnier & Co ests.

6 | THE NEW BIG DATA WORLD                                                                                                          7

SAP pushes in-memory                     FIG. 5: SAP HANA REVENUES (2011-2013) (EUR M)                                                   In response to the success of SAP                            governed data platform, with several            data – including semi-structured and
databases to primetime                   FIG 5                                                                                           HANA, which grew from nothing in                             product extensions and innovations for          unstructured external data like clicks,
                                                                                                                                         2010 to EUR 633m in revenues by                              orchestrating and managing enterprise           logs, IoT, text, images, and video –
Before it launched HANA, SAP was
not a database player, even though                                                                        633                            2013 and now reaches over 21,000
                                                                                                                                         customers, relational database
                                                                                                                                                                                                      business data within multi-cloud and
                                                                                                                                                                                                      hybrid environments. To integrate Big
                                                                                                                                                                                                                                                      before it is surfaced through SAP HANA
                                                                                                                                                                                                                                                      for further, more detailed analysis.
in 1997 it had acquired Max DB, a
                                                                                                                                         incumbents moved to embrace                                  Data stored in Hadoop with enterprise
relational database with an in-memory                                                                                                                                                                                                                 As of today, the key components
                                                                                                                                         in-memory technology and protect                             data in SAP HANA, SAP offers a
engine that was developed in the
late 1970s at the Technical University
                                                                                      392                                                their installed base. During 2013 and
                                                                                                                                         2014, IBM, Oracle and Microsoft all
                                                                                                                                                                                                      separate in-memory query engine, SAP
                                                                                                                                                                                                      Vora, which was launched in 2015. And
                                                                                                                                                                                                                                                      of SAP HANA Data Management
                                                                                                                                                                                                                                                      Suite include SAP Data Hub -
of Berlin. For years, SAP relied on                                                                                                                                                                                                                   which creates a central metadata
                                                                                                                                         launched in-memory options on top                            with the acquisition in 2016 of Big Data
IBM, Oracle and Microsoft to provide                                                                                                                                                                                                                  management catalog; SAP HANA;
                                                                                                                                         of their relational databases, rather                        as-a-service (BDaaS) provider Altiscale,
the database that ran its enterprise
application software products. In
                                                                 160                                                                     than releasing an independent                                SAP was able to create its SAP Cloud            SAP Enterprise Architecture
                                                                                                                                                                                                                                                      Designer, which creates architecture
                                                                                                                                         in-memory database.                                          Platform Big Data Services offering
2010, the acquisition of Sybase for                                                                                                                                                                                                                   documentation; and SAP Cloud
                                                                                                                                                                                                      based on Hadoop and Spark. This acts
USD 5.8bn gave SAP the Sybase ASE                                                                                                        Over the years, SAP HANA has                                                                                 Platform Big Data Services.
                                                                                                                                                                                                      as a data refinery, with the ability to
relational database (which served as                                 2011                2012                 2013                       evolved to become an open and                                cleanse and transform terabytes of raw
the basis for Microsoft SQL Server)
and the Sybase IQ column store.          Source: SAP. From 2014 onwards, SAP stopped reporting SAP HANA revenues

The launch in November 2010 of the       The HANA technology had several                        designed eliminate the use of ETL        FIG. 6: ARCHITECTURE OF THE SAP HANA DATA MANAGEMENT SUITE
SAP HANA in-memory database, sold        advantages from the start: 1) speed                    (Extract Transform Load) tools, stored
                                                                                                                                             SAP INTELLIGENT ENTERPRISE SUITE                         SAP LEONARDO AND SAP ANALYTICS CLOUD                  THIRD-PARTY APPLICATIONS
as an application running on certified   – the ability to handle several billion                procedures, materialised views and
hardware, was a revolution in the        records in seconds; 2) scalability                     OLAP cubes.
database world. SAP now mandated         - designed to positively respond                                                                                                                                 SAP HANA DATA MANAGEMENT SUITE
its own in-memory database for           to the increase in the number of                       SAP HANA was made possible by                                                                                                                                                                     SAP ADD-ON
                                                                                                                                                                                                                                                                                                  API SERVICES
                                                                                                the arrival of standard servers at

                                                                                                                                                          HYBRID CLOUD MANAGEMENT
                                                                                                                                                                                                                                                                                                 AND PRODUCTS
running its business applications, at    cores per server; 3) in-memory data
                                                                                                affordable prices, equipped with

                                                                                                                                                                                                                                                                         SAP CLOUD PLATFORM
the expense of third-party relational    compression between 5 and 20x; and                                                                THIRD-PARTY
                                                                                                                                            PRODUCTS
                                                                                                                                                                                                                                                                                                    SAP HANA
                                                                                                                                                                                                                                                                                                     SPATIAL
                                                                                                                                                                                      IN-MEMORY          DATA            DATA             DATA             DATA
database providers. The initial          4) data management simplification                      several terabytes of memory and            AND SERVICES                              TRANSACTION      DISCOVERY     ORCHESTRATION       CLEANSING        STORAGE                                    SERVICES
                                                                                                                                                                                                                                                                                                      HANA
purpose of SAP HANA was to carry         – with no need to create data                          multi-core processors. With blade             SPARK
                                                                                                                                                                                          AND
                                                                                                                                                                                       ANALYTICS
                                                                                                                                                                                                         AND
                                                                                                                                                                                                     GOVERNANCE
                                                                                                                                                                                                                         AND
                                                                                                                                                                                                                     INTEGRATION
                                                                                                                                                                                                                                           AND
                                                                                                                                                                                                                                       ENRICHMENT
                                                                                                                                                                                                                                                           AND
                                                                                                                                                                                                                                                         COMPUTE                                  BLOCKCHAIN
                                                                                                                                             HADOOP                                                                                                                                                 SERVICES
out real-time data analytics and         aggregates, there is no duplication.                   servers then sold for EUR 40,000           THIRD-PARTY                                                                                                                                                HANA

from 2011, SAP used it to power                                                                 a unit (now down to EUR 15,000),            DATABASES                                                                                                                                              STREAMING
                                                                                                                                           THIRD-PARTY                                                                                                                                             ANALYTICS
SAP Business Warehouse, its data         In addition, HANA processed                            massively parallel computing                   DATA                                                                                                                                                OTHER SAP
                                                                                                                                           MANAGEMENT                                                                                     SAP ENTERPRISE                                             CLOUD
warehousing system. But SAP HANA         in-memory data in both lines and                       became widespread. In-memory data                                                              SAP HANA
                                                                                                                                                                                                                                       ARCHITECTURE DESIGNER                                      PLATFORM +
                                                                                                                                                                                                                                                                                                 SAP LEONARDO
was quickly used to manage data          columns, and could handle data                         processing technology has therefore                                                                                                                                                                 SERVICES
                                                                                                                                                                                            SAP DATA HUB                                 SAP CLOUD PLATFORM,
transactions, powering SAP Business      management standards for                               long been reduced to the notion of a                                                      SAP EIM SOLUTIONS                                BIG DATA SERVICES
Suite (2013) then its replacement SAP    relational databases and OLAP                          cache or database accelerator. While
S/4HANA (2015) and all SAP cloud         (OnLine Analytical Processing)                         all operations take place in-memory,
applications. Today, SAP HANA is         multi-dimensional analysis. Finally,                   each transaction is automatically                                                   BUSINESS
                                                                                                                                                                                                  CLOUD
                                                                                                                                                                                                APPLICATION       IOT        SPATIAL         SOCIAL          IMAGE
the mandatory database for running       with its ability to handle transactional               captured and copied onto a disk with        ON-PREMISES                               DATA
                                                                                                                                                                                                   DATA                                                                                       MULTI-CLOUD

SAP’s Core and Cloud applications.       and analytical data, SAP HANA was                      redundancy and high availability.

                                                                                                                                         Source: SAP

8 | THE NEW BIG DATA WORLD                                                                                                                                                                                                                                                                                       9

2. New data management
    technologies
                                          FIG 7

From SQL to NoSQL                         FIG. 7: DATABASE MANAGEMENT SYSTEMS MARKET (2013-2021E)
to NewSQL
Relational databases still account        60,000               DYNAMIC DBMS (NOSQL, HADOOP)
for around 82% of the global market                            LEGACY NON-RELATIONAL DBMS

                                          50,000               RELATIONAL DBMS
for database management systems.
However, this has fallen from 87%         40,000
in 2013, and IDC anticipates that it
                                          30,000
will fall further, to 77% by 2021. That
said, the relational database segment     20,000

is still expected to grow by 6-7%         10,000
per year over the next three years,
                                               0
driven in particular by in-memory                             2013               2014        2015          2016                2017          2018E        2019E                 2020E       2021E
SQL databases or in-memory options
and the strong growth of NewSQL           Source: IDC

databases. NoSQL, NewSQL will
not replace SQL databases, but
co-exist within technology stacks
for the foreseeable future. At the        FIG. 8: PRE-IPO FUNDING OF NO/NEWSQL, HADOOP AND CLOUD EDW
                                          FIG 8
other end of the market, NoSQL and        VENDORS (USD M)

Hadoop databases have increased
their market share from 1% in 2013
to an estimated 11% in 2018, with             1,000
revenues up from USD 366m to                                    929
USD 4.6bn. These databases have
surged by an estimated 66% per
year on average over the period. IDC
anticipates that this segment will see
average annual growth of 33% from
                                                                                 311 280
2017-2021, reaching USD 10bn or                                                          248 247
18% of the total database market
                                                                                                                                190 173 162 160 146
                                                                                                                                                    108
by 2021.
                                                   CLOUDERA

                                                                     SNOWFLAKE

                                                                                   MONGODB

                                                                                             MAPR

                                                                                                    HORTONWORKS

                                                                                                                  DATABRICKS

                                                                                                                                  DATASTAX

                                                                                                                                              MARKLOGIC

                                                                                                                                                          ELASTIC

                                                                                                                                                                    COUCHBASE

                                                                                                                                                                                    NEO4J

                                                                                                                                                                                              MEMSQL

The excitement around NoSQL,
NewSQL and Hadoop, which
are aimed at replacing relational
databases, is reflected by the funding
of a significant number of startups in    Source: Crunchbase

these areas (see Fig. 8).

10 | THE NEW BIG DATA WORLD                                                                                                                                                                            11

The table below recaps the financing       Yellowbrick Data (USD 92m), Neo4j          and Elastic – the four companies        FIG. 10: MARKET CAPITALISATION OF NEW DATA MANAGEMENT TECH VENDORS
rounds that have taken place since         (USD 80m), NuoDB (USD 31m) and             in this space that have gone public
early 2016 in the database and data        MemSQL (USD 30m). In 2017, the             since 2014 – show strong investor        Elastic                                      USD 5.19bn                        +107% since IPO

warehousing areas. 2018 saw plenty         two biggest rounds were Databricks         appetite for these stocks as they        Mongo DB                                     USD 4.59bn                        +181% since IPO
of activity, with USD 713m raised          (USD 140m), and Snowflake again            generate strong growth, reduce
                                                                                                                               Cloudera                                     USD 1.72bn                        -23% since IPO
for Snowflake Computing, which             (USD 105m).                                their losses, and become potential
is seen as the new ‘unicorn’ in the                                                   champions in their specific area or      Hortonworks                                  USD 1.24bn*                       +88% since IPO
space, over two rounds. The other          Valuation multiples achieved by            M&A targets for the likes of IBM,
                                                                                                                               Vertica Systems                              USD 31m
significant rounds for this year include   Hortonworks, Cloudera, MongoDB             Oracle, SAP and others.
                                                                                                                               Aster Data                                   USD 53m

FIG. 9: FINANCING ROUNDS FOR NO/NEW SQL, HADOOP AND CLOUD EDW VENDORS                                                         *As of 2 January 2019, when it left the stock market following merger with Cloudera.

 Date                Company               Round           No. investors     Amount              Lead financing

 Dec. 2018           NuoDB                 Venture         5                 USD 31m             Temenos
                                                                                                                              Current EV/(N+1) sales multiples are impressive for Elastic and MongoDB:
 Nov. 2018           Neo4j                 Series E        5                 USD 80m             Morgan Stanley, One Peak

 Oct. 2018           Yellowbrick Data      Series B        2                 USD 48m             Next47
                                                                                                                                FIG11:
                                                                                                                              FIG.  11 EV/(N+1) SALES MULTIPLES AS OF 8TH JANUARY 2019
 Oct. 2018           Snowflake Computing   Series F        8                 USD 450m            Sequoia

 Oct. 2018           Elastic               IPO                               USD 252m            Valuation: USD 2.5bn

 Aug. 2018           Yellowbrick Data      Series A        5                 USD 44m             Menlo Ventures                                    19
 May 2018            MemSQL                Series D        5                 USD 30m             Glynn, GV
                                                                                                                                                                                 18.5
 Jan. 2018           Snowflake Computing   Series E        3                 USD 263m            Altimeter, Iconiq, Sequoia

 Oct. 2017           MongoDB               IPO                               USD 256m            Valuation: USD 1.6bn

 Sep. 2017           MariaDB               Series C        6                 USD 27m             Alibaba

 Sep. 2017           Snowflake Computing   Series D        7                 USD 105m            Iconiq

 Sep. 2017           MapR Technologies     Venture         7                 USD 56m             Lightspeed

 Aug. 2017           Databricks            Series D        4                 USD 140m            Andreessen Horowitz

 May 2017            MariaDB               Venture         1                 EUR 25m             European Investment Bank
                                                                                                                                                                                                                     3.2
 May 2017            Cockroach Labs        Series B        5                 USD 27m             Redpoint

 Apr. 2017           Cloudera              IPO                               USD 225m            Valuation: USD 1.9bn

 Dec. 2016           Databricks            Series C        3                 USD 60m             New Enterprise Associates                      ELASTIC                        MONGODB                        CLOUDERA

 Nov. 2016           Neo4j                 Series D        4                 USD 36m
                                                                                                                              Source: Company data; Thomson Reuters
 Aug. 2016           MapR Technologies     Venture         7                 USD 50m             Future Fund

 Jul. 2016           Elastic               Series D        2                 USD 58m

 Apr. 2016           MemSQL                Series C        7                 USD 36m             Caffeinated, REV

 Mar. 2016           Cockroach Labs        Series A        4                 USD 20m             Index

 Mar. 2016           Couchbase             Series F        7                 USD 30m             Sorenson

 Feb. 2016           NuoDB                 Series B        3                 USD 17m

Source: Crunchbase

12 | THE NEW BIG DATA WORLD                                                                                                                                                                                                     13

Hadoop for big data:                               System or HDFS; and processing,                       computation and data are distributed    FIG. 13: REVENUES OF CLOUDERA AND HORTONWORKS (2014-2018E) (USD M)
                                                                                                                                                 FIG 13
the rise and fall?                                 the MapReduce programming model.                      via high-speed networking.
                                                   Hadoop splits files into large blocks
The initial release of Apache Hadoop                                                                     An ecosystem of additional software
                                                   and distributes them across nodes in                                                          500
took place in 2006. This distributed                                                                     packages for managing data has
                                                                                                                                                             HORTONWORKS (FYE 31/12 N)
                                                                                                                                                                                                                                445.0
                                                   a cluster. It then transfers packaged                                                         450         CLOUDERA (FYE 31/1 N+1)
file system is a collection of open-                                                                     been developed around Hadoop
                                                   code into nodes to process the data                                                           400
                                                                                                                                                                                                                367.4
source software tools that facilitate                                                                    (See Fig.12).                                                                                                                  340.5
                                                   in parallel. This approach allows the                                                         350
the use of computer clusters to solve                                                                                                            300
                                                   dataset to be processed faster and                                                                                                          261.0                    261.8
issues involving massive amounts                                                                         Hadoop can be deployed on-premise,      250
                                                   more efficiently than in a conventional
of data. Its core has two elements:                                                                      or in a public or private cloud:        200
                                                                                                                                                                             166.0                      184.5
                                                   supercomputer architecture, which
storage, Hadoop Distributed File                                                                         Microsoft Azure (with Microsoft Azure
                                                                                                                                                 150
                                                                                                                                                         109.1                         121.9
                                                   uses a parallel file system where                                                             100
                                                                                                         HDInsight), Amazon EC2 and S3             50             46.0
                                                                                                         (with Amazon Elastic MapReduce),           0
FIG. 12: THE HADOOP DATA MANAGEMENT ECOSYSTEM                                                            Google Cloud Platform (Google Cloud                  2014                 2015           2016              2017          2018E

                                                                                                         Dataproc), SAP Cloud Platform (SAP      Source: Cloudera; Hortonworks
Spark               Cluster-computing framework for data analytics and machine-learning algorithms
                                                                                                         Big Data Services), and Oracle Cloud
Hive                Data warehousing software providing a SQL-like query language, HiveQL                Platform (Oracle Big Data Cloud).
                    Non-relational distributed database for storing large quantities of sparse data,                                             In October 2018, Cloudera                             Plummeting storage costs                 to load it into special storage. Other
HBase                                                                                                    The main Hadoop software
                    for serving data-driven websites                                                                                             announced it was acquiring                                                                     cloud database services - Snowflake,
                                                                                                         distributions are Cloudera,                                                                   On cloud storage services like AWS
Phoenix             Massively parallel relational database based on SQL, using HBase as its data store                                           Hortonworks through an all-stock                                                               Google BigTable, Amazon Aurora,
                                                                                                         Hortonworks and MapR. Hortonworks                                                             S3, Azure Blob Storage, and Google
                                                                                                                                                 ‘merger of equals’, leaving Cloudera                                                           and Microsoft Cosmos - are similarly
Impala              Massively parallel SQL query engine for analytics
                                                                                                         and Cloudera have been listed in the                                                          Cloud Storage, a terabyte of cloud
                                                                                                                                                 shareholders with 60% of the                                                                   massive-scale, highly flexible,
Pig                 Platform for creating programmes on Hadoop                                           US since December 2014 and April                                                              object storage costs about USD 20 a
                                                                                                                                                 combined company. The transaction                                                              globally distributed ‘pay-as-you-
                                                                                                         2017, respectively. Their revenues                                                            month, compared to around USD 100
ZooKeeper           Distributed computing system                                                                                                 was completed in January 2019.                                                                 go’ databases.
                                                                                                         surged at over +40% per year until                                                            a month on Hadoop Distributed File
Flume               Log data collection                                                                  2017, after which growth rates slowed   We understand the deal as a signal                    System (HDFS).                           Python and R data science running
                    Data transfer between relational databases and Hadoop
                                                                                                         in 2018, when the mid-point of          the Hadoop market could no longer                                                              on containers
Sqoop                                                                                                                                                                                                  Faster, better, and cheaper
                                                                                                         company guidance implied +21% for       sustain two big competitors, with
Oozie               Workflow scheduling                                                                                                                                                                cloud databases                          Hadoop’s revolutionary innovation
                                                                                                         Cloudera and +30% for Hortonworks.      several trends driving the change:
                                                                                                                                                                                                                                                was that it offered both a storage and
Storm               Distributed stream processing computation framework                                  Cloudera’s shares dropped 40% on                                                              The advent of cloud services like
                                                                                                                                                                                                                                                compute environment for applications
                                                                                                         4th April 2018 when it reported its     The shift to public cloud                             Google BigQuery in 2011 eliminated
                                                                                                                                                                                                                                                written in Java. However, this was out
                                                                                                         FY18 results due to disappointing       AWS, Microsoft Azure, and Google                      the need to run Hadoop or Spark.
                                                                                                                                                                                                                                                of step with data scientists working
                                                                                                         guidance for FY19.                      Cloud offer their own managed                         While Apache Spark is used to handle
                                                                                                                                                                                                                                                on machine learning in Python and
                                                                                                                                                 Hadoop/Spark services as part of                      ad-hoc distributed SQL queries,
                                                                                                                                                                                                                                                R, who need native deployment
                                                                                                                                                 fully integrated offerings that are                   Google BigQuery runs ad-hoc queries
                                                                                                                                                                                                                                                of Python/R models to iterate and
                                                                                                                                                 cheaper to buy and scale.                             on any amount of data stored in its
                                                                                                                                                                                                                                                improve machine learning models.
                                                                                                                                                                                                       object storage service without having

14 | THE NEW BIG DATA WORLD                                                                                                                                                                                                                                                         15

NoSQL databases:                        There are four main categories of         Key-value (KV) stores                       FIG. 14: CLASSIFICATION OF NOSQL DATABASES
simpler than SQL                        NoSQL database:
                                                                                  These databases use dynamic data
                                                                                                                               Wide column stores                           Document-oriented databases   Key-value stores     Graph databases
The concept of ‘NoSQL’ databases        Wide column stores                        such as session data and machine-
emerged around 2009. The term was                                                 generated data used for analysis and         Apache Accumulo (based on Google BigTable)   Amazon DocumentDB             Aerospike Database   AllegroGraph

introduced by IT developer Johan        Data is collected in very large           systems and device coordination.             Apache Cassandra (DataStax)                  Apache CouchDB                Amazon DynamoDB      Apache Giraph
Oskarsson to label a growing number     structures for high-performance           They treat the data as a single
of non-relational, distributed data     analytics. By storing data in columns     collection that may have different fields    Apache HBase                                 ArangoDB                      Apache Ignite        ArangoDB

stores including Google BigTable,       rather than rows, the database an         for every record. Key-value databases        Druid                                        BaseX                         Apache ZooKeeper     Objectivity InfiniteGraph
Amazon Dynamo and the newly             access the data it needs to answer        often use far less memory to store the
launched Amazon DocumentDB.             a query more precisely, without the       same data than relational databases,
                                                                                                                               Micro Focus Vertica                          Clusterpoint                  ArangoDB             MarkLogic

Most early NoSQL systems did not        need to scan and discard unwanted         which can lead to large performance                                                       Couchbase Server              Basho Riak           Neo4J

attempt to provide the atomicity,       data in rows. Wide column stores          gains in certain workloads.
                                                                                                                                                                            Elasticsearch                 Couchbase Server     OpenLink Virtuoso
consistency, isolation and durability   are well-suited for data warehouses
(ACID) guarantees that relational       involving highly complex queries.         Graph databases                                                                           Microsoft Azure Cosmos DB     FoundationDB         SAP OrientDB

databases usually offer. NoSQL                                                    These are organised in a graph                                                            IBM Notes/Domino              InfinityDB
                                        Document-oriented databases
databases enable the storage and                                                  structure for relationship and pattern
                                                                                                                                                                            MarkLogic                     Oracle BerkeleyDB
retrieval of data modelled in other     These are used for managing semi-         analysis, for example fraud detection
means than through tabular relations.   structured data in document formats       or semantic text analysis. The graph                                                      MongoDB                       SAP OrientDB

They are increasingly used in big       such as JSON or XML. Document             directly relates data items in the                                                        Qualcomm Qizx                 SciDB
data and real-time web applications     databases store all the information       store to a collection of nodes of
due to design simplicity, simpler       for a given object in a single instance   data and edges representing the                                                           RethinkDB

scaling to clusters of machines,        in the database, and every stored         relationships between the nodes. The                                                      SAP OrientDB
and finer control over availability.    object can be different from every        relationships allow data in the store to
The data structures used by NoSQL       other. This makes mapping objects         be linked together directly. Querying       Source: Bryan, Garnier & Co
databases (key-value, wide column,      into the database a simple task and       relationships within a graph database
graph, or document) make some           typically eliminates anything similar     is fast because they are perpetually
operations faster in NoSQL than in      to an object-relational mapping. This     stored within the database itself.
relational databases.                   makes document stores attractive for      Relationships can be intuitively
                                        programming web applications, which       visualised using graph databases,
                                        are subject to continual changes, and     making them useful for heavily
                                        where speed of deployment is an           inter-connected data.
                                        important issue.

16 | THE NEW BIG DATA WORLD                                                                                                                                                                                                                                17

MongoDB was the first listed            FIG 1515: MONGODB - REVENUES AND NON-GAAP OPERATING MARGIN (FYE 31/1)
                                        FIG.
pure-play NoSQL database software
vendor, with a NASDAQ IPO in                        REVENUES (USD M) (LEFT SCALE)
                                                    NON-GAAP OPERATING MARGIN (%) (RIGHT SCALE)
October 2017. The company is still                                                                                 -36.2%
heavily loss-making, with a                                                                       -49.2%
non-GAAP operating margin                                                        -64.0%                             -229.0
forecasted at -36% at the mid-point
of company guidance for FY19.                                     -91.7%
Losses are reducing year-on-year
while revenues are growing around                                                                  154.5
50% annually.

                                                                                    101.4
The second NoSQL database
pure-player to be listed was Elastic,
which develops and sells the                -168.1%                    65.3
                                              - 40.8
Elasticsearch document-oriented
database and search and analytics
                                                FY15                    FY16         FY17          FY18              FY19E
engine. Listed on NYSE in October
2018, this company is also loss-        Source: MongoDB
making, with a non-GAAP operating
margin of -20% for the fiscal year
                                        FIG 1616: ELASTIC - REVENUES AND NON-GAAP OPERATING MARGIN (FYE 30/4)
                                        FIG.
ending 30th April 2018. Revenues,
on the other hand, grew 81% during                  REVENUES (USD M) (LEFT SCALE)
that fiscal year.                                   NON-GAAP OPERATING MARGIN (%) (RIGHT SCALE)

                                                                                                           255
                                                                                   159.9
                                                                                                       -17.6%
                                                                                -20.0%
                                                           88.2

                                                       -31.2%
                                                           FY17                     FY18                   FY19E

                                        Source: Elastic; Bryan, Garnier & Co

18 | THE NEW BIG DATA WORLD                                                                                                  19

NewSQL databases                          The issue with NoSQL databases          There are three main categories          FIG. 17: CLASSIFICATION OF NEWSQL DATABASES
                                          is their lack of compliance with the    of NewSQL database:
One of the reasons why relational
                                          ACID rules when processing                                                        Shared-noting databases         Storage engines for SQL                                         Transparent sharding
databases have been so successful                                                 Shared-nothing (SN) databases
                                          ‘high-profile’ data (e.g. financial
for decades is that they have a set                                                                                         Altibase                        Infobright                                                      ScaleBase
                                          data or orders) with strong             These are designed to operate in
of characteristics that guarantee                                                                                           Apache Ignite                   MyRocks                                                         Vitess
                                          transactional and consistency           a distributed cluster of ‘shared-
their validity: Atomicity, Consistency,
                                          requirements. NewSQL – a concept        nothing’ nodes in which each node         CockroachDB                     MariaDB Columnstore
Isolation, and Durability (ACID).
                                          which emerged in 2011 from the          is independent and self-sufficient,
Atomicity guarantees each                                                                                                   Exasol                          Microsoft SQL Server (with ColumnStore and InMemory features)
                                          451 Group analyst Matthew Aslett        and owns a subset of the data,
transaction is treated as a single unit
                                          – seeks to offer the same scalable      while none of the nodes shares            Google Spanner                  Oracle InnoDB
that either succeeds completely or
                                          performance of NoSQL databases          memory or disk storage. Some SN
fails completely. Consistency means                                                                                         GridGain                        Oracle MySQL Cluster
                                          for transaction workloads while still   database platforms are in-memory,
that any data written to the database
                                          maintaining the ACID guarantees of a    like Altibase, Apache Ignite, Exasol,     HarperDB                        TokuDB
must be valid according to all defined
                                          relational database. Both NoSQL and     GridGain, MariaDB Clustrix,               MariaDB Clustrix
rules – this prevents database
                                          NewSQL support the relational data      and VoltDB.
corruption by an illegal transaction,                                                                                       MemSQL
                                          model and use SQL as their primary
but does not guarantee a transaction                                              Storage engines for SQL
                                          interface. Applications typically                                                 Microsoft Azure Cosmos DB
is correct. Isolation ensures that
                                          targeted by NewSQL databases            Optimised storage engines for SQL,
concurrently executed transactions                                                                                          NuoDB
                                          have a large number of short-lived,     these databases providing the same
leave the database in the same state
                                          repetitive transactions that touch a    programming interface as SQL but          Trafodion
which would have been obtained
                                          small subset of data. Unlike NoSQL      scale better than built-in engines.
if the transactions were executed                                                                                           VoltDB
                                          databases, NewSQL databases are
sequentially. Finally, durability                                                 Transparent sharding
                                          not easy to categorise by technical                                              Source: Bryan, Garnier & Co
guarantees that once a transaction
                                          approach as they vary significantly     These NewSQL databases provide
has been committed, it will remain
                                          in their internal architecture.         a horizontal data micro-partition
committed even in the case of a
                                          Some support only transactional         layer to automatically split databases
power outage or crash.
                                          workloads, but others support hybrid    across multiple nodes.
                                          transactional/analytical workloads,
                                          which some NoSQL databases
                                          support too.

20 | THE NEW BIG DATA WORLD                                                                                                                                                                                                                        21

Cloud data warehousing                   While the on-premise data
                                                                                   As for access, Amazon and
                                         warehousing software market
An increasing number of data                                                       Google provide a fully managed
                                         is dominated by Oracle, IBM,
warehouses are now available in the                                                data warehousing service that
                                         Microsoft, Teradata and SAP, the
cloud rather than on-premise. Cloud                                                is available exclusively on their
                                         cloud data warehouse market is led
data warehouses have the advantage                                                 own public cloud infrastructure.
                                         by Amazon Web Services (Redshift),
of being provisioned in minutes and                                                Oracle’s cloud data warehouse is
                                         Snowflake, Google (BigQuery),
without any technical expertise.                                                   only available within the Oracle
                                         and Oracle (Autonomous Data
They suit small and mid-sized                                                      Cloud. IBM is available as a
                                         Warehouse Cloud).
companies, business analysts, and                                                  managed cloud service on IBM’s
other non-technical users who want       Challengers in this market include        cloud infrastructure (SoftLayer),
to access, store and process large       Teradata (IntelliCloud), IBM (Db2         although it can also be deployed
amounts of data at low cost. Initially   Warehouse on Cloud), Microsoft            on all major cloud providers with
focused on simple data loading and       (Azure SQL Data Warehouse), and           the potential ‘downside’ of the
reporting, cloud data warehouses         Cloudera (Hortonworks HDP). Smaller       customer having to manage the
now support complex cases such           players exist, like MarkLogic, Pivotal,   software themselves. However,
as IoT analytics, real-time analytics,   Exasol and Yellowbrick Data, while        most of the other independent
fraud analysis and 360-degree            some large players like Alibaba, Micro    players have a multi-cloud
customer views at the hundreds of        Focus (Vertica) and Huawei have a         approach, using Amazon Web
terabyte (and even petabyte) scale.      weak presence.                            Services, Microsoft Azure, Google
And while cloud data warehouses                                                    Cloud and/or their own data
were initially only deployed for new                                               centers to run their software.
projects, some organisations are                                                   Finally, Yellowbrick Data - which
migrating their on-premise data                                                    exited stealth in July 2018 and
warehouses to the cloud, whether it                                                since then has raised USD 92m -
is a public, private or hybrid cloud.                                              has its data warehouse available
                                                                                   on-premise, on private clouds, on
                                                                                   colocation and on the edge (IoT),
                                                                                   and plans to be available
                                                                                   on public clouds in 2019.

22 | THE NEW BIG DATA WORLD                                                                                            23

3. Cloud-based data warehousing:
    the Snowflake case

                                   Snowflake,                              database performance efforts for its       FIG. 18: SNOWFLAKE COMPUTING – SUMMARY OF FUNDING ROUNDS                                             In January 2018, CEO Bob Muglia
                                   a ‘superfunded’ company                 parallel systems. Thierry Cruanes                                                                                                               publicly stated that the USD 263m
 “In many cases, Snowflake is                                              spent 13 years at Oracle, working on                                                                                                            round would probably be the last
 the only product that’s viable    The cloud-based data warehousing
                                                                           the optimisation and parallelisation        Date                    Lead investor(s)        Other investors          Capital raised/valuation   before a possible IPO. At that time,
 for customers to move to.         software vendor Snowflake
                                                                           layers in Oracle databases. Before                                                                                                              the company had more than 1,000
 Customers demand a level of       Computing is, in our view, a good                                                                                                   Redpoint ventures
                                                                           Oracle, he spent seven years at the         June 2015               Altimeter Capital       Sutter Hill              USD 71m                    customers, up from 450 in April
 scale that Amazon Redshift        example of the new generation                                                                                                       Wing VC
                                                                           IBM and Sema, working on data                                                                                                                   2017, and expected this to double
 just can’t do. We can scale       of successful data management                                                                               Iconiq Capital
                                                                           mining techniques. Finally, Marcin          April 2017                                      Existing investors       USD 105m                   over 2018. Snowflake employed 330
 to a large number of queries.     companies that have emerged from                                                                            Madrona Venture Group
                                                                           Zukowski was co-founder and CEO                                                                                                                 staff, up from 175 in April 2017, and
 Because we’re just a very large   the cloud revolution. The company                                                                           Iconiq Capital
                                                                           of Vectorwise, an analytical database       January 2018            Altimeter Capital                                USD 263m/USD 1.5bn         expected this to also almost double,
 cloud service, we throw the       develops and sells a cloud-based                                                                            Sequoia Capital
                                                                           software vendor sold to Actian in                                                                                                               to 600, during 2018. The two large
 power of the cloud at running     storage and analytics platform, the
                                                                           2010. During its first two years of         October 2018
                                                                                                                                               Sequoia Capital
                                                                                                                                                                       Existing investors
                                                                                                                                                                                                USD 450m/USD 3.5bn         funding rounds in 2018 was aimed
 things at the same time. Every    Snowflake Elastic Data Warehouse                                                    (pre- IPO)                                      Meritech Capital
                                                                           existence, Snowflake operated in                                                                                                                helping Snowflake in four areas:
 single day, we run almost 20      (EDW), which uses a cloud-based
                                                                           stealth mode, looking to create data
 million queries a day.”           infrastructure and has been available                                              Source: Crunchbase                                                                                   1.	continuing to expand its multi-
                                                                           warehouse software designed for            FIG 19
                                   since June 2015. As a testimony to                                                                                                                                                          cloud strategy beyond Amazon
                                                                           the cloud.
 Bob Muglia, CEO, Snowflake        its success, large companies like                                                                                                                                                           Web Services and Microsoft Azure;
 Computing.                        Capital One, Nielsen, and Adobe use                                                FIG. 19: SNOWFLAKE COMPUTING - TOTAL FUNDING AND VALUATION (USD M)
                                                                           The company came out of stealth
                                   Snowflake EDW. New customers in         mode in 2014, just after the                                                                                                                    2.	growing sales teams across and
                                                                                                                      4,000         TOTAL VALUATION
                                   2018 include leading brands such as     appointment of CEO Bob Muglia. At                                                                                                                   outside the US to meet increasing
                                                                                                                      3,500
                                                                                                                                    FUNDING ROUND                                                           3,500
                                   Netflix, Office Depot, Netgear, and     Microsoft between 1988 and 2011, he                                                                                                                 demand for the only cloud-built
                                   Yamaha. In our view, Snowflake’s        had worked as the product manager
                                                                                                                      3,000                                                                                                    data warehouse;
                                   success can be attributed to the        for SQL Server, Vice-President             2,500

                                   strategic vision of its founders, the                                              2,000                                                                                                3.	investing further in Snowflake’s
                                                                           Windows NT, Server Application
                                   involvement of its CEO Bob Muglia,                                                 1,500
                                                                                                                                                                                            1,500                              data warehouse-as-a-service by
                                                                           Group and .NET Services Group,
                                   and a successful funding series.                                                   1,000                                                                                                    growing its engineering team in
                                                                           then Senior Vice-President Servers &
                                                                           Tools. Then at Juniper Networks he           500
                                                                                                                                                                                             263              450              San Mateo, CA, Bellevue, WA,
                                   Snowflake Computing was founded                                                                     26                 45            105                                                    and Berlin, Germany);
                                                                           served as EVP Software & Solutions              0
                                   in 2012 in San Mateo, California,                                                                  2014             JUN 2015        APR 2017             JAN 2018         OCT 2018
                                                                           (2011-2013).
                                   by CTO Benoît Dageville, Thierry                                                                                                                                                        4.	delivering innovations such
                                                                                                                      Source: Snowflake company data
                                   Cruanes (Product Architect) and                                                                                                                                                             as Snowflake Data Sharing, also
                                                                           In June 2015, Snowflake announced
                                   Marcin Zukowski. Benoît Dageville                                                                                                                                                           known as ‘The Data Sharehouse’,
                                                                           the general availability of Snowflake
                                   spent 15 years at Oracle as the                                                                                                                                                             which further consolidates its
                                                                           EDW, as well as the first in a series of
                                   lead architect for parallel execution                                                                                                                                                       lead over legacy cloud and
                                                                           funding rounds (See Fig. 18 and 19).
                                   and a key architect in the SQL                                                                                                                                                              on-premise solutions.
                                   Manageability group. Before Oracle,
                                   he worked at Bull, where he helped
                                   define the architecture and lead

24 | THE NEW BIG DATA WORLD                                                                                                                                                                                                                                       25

Snowflake’s approach:                     The Snowflake EDW platform has the        FIG. 20: SNOWFLAKE EDW PRODUCT ARCHITECTURE
                                                                100% ‘as-a-service’                       following features:

                                                                Snowflake’s data warehouse-as-            •	The cloud storage layer is
                                                                a-service uses the SQL language              engineered to scale independently                                                                 AD-HOC ANALYTICS

                                                                to manage data. It can work with a           of compute resources. It can                                              TRANSACTIONS                                          SECURITY
                                                                mix of structured data such as CSV           process data loading and unloading
                                                                files and tables, as well as semi-           without impacting running queries.
                                                                structured data like JSON, Avro, XML         Snowflake uses micro-partitions                                   MANAGEMENT                                                         METADATA
                                                                and Parquet. It uses the Amazon              (‘sharding’) to store customer data,
                                                                Web Services S3 service as a cloud           which are compressed in columns
                                                                infrastructure. Since September 2018,        and encrypted.
                                                                                                                                                                                  LOADING
                                                                                                                                                                                                            CENTRALISED                          DEVELOPMENT

                                                                Snowflake has also been generally
                                                                available on Microsoft Azure,             •	On the compute layer, data
                                                                                                                                                                                                             STORAGE
                                                                                                                                                                               OPTIMISATION                                                       METADATA
                                                                integrated with Azure services such          processing is performed by virtual
                                                                as Microsoft Azure Data Lake Store           warehouses, which retrieve the
                                                                and Microsoft Azure Power BI. The            data required from the storage
                                                                platform also employs several new                                                   Source: Snowflake Computing, company data
                                                                                                             layer to satisfy queries, then
                                                                features, for example limitless storage      cache it locally with computing
                                                                                                                                                    The company was initially competing               Bob Muglia’s long-term vision for
                                                                accounts, accelerated networking             resources along with the caching
                                                                                                                                                    with Amazon Redshift, another                     Snowflake is that the company’s
                                                                and storage soft delete. As of today,        of query resources to improve the
                                                                                                                                                    cloud-based data warehousing                      data lake technology becomes
                                                                Snowflake is available in the Azure          performance of future queries.
                                                                                                                                                    system. However, as an increasing                 a platform for building data
                                                                East-US-2 region (in Virginia), with         Multiple data warehouses can
                                                                                                                                                    number of large firms move towards                applications. This could be for data
                                                                more regions to be added in the              simultaneously operate on the
                                                                                                                                                    cloud adoption and look for solutions             sharing between customers or to sell
                                                                future3. Whether it is running on            same data.
                                                                                                                                                    that can handle structured, semi-                 that data. He foresees companies
                                                                Amazon Web Services or Microsoft
                                                                                                                                                    structured and transactional data,                building businesses on top of the
                                                                Azure, from the US and Dublin,            •	The services layer authenticates
                                                                                                                                                    Snowflake has been increasingly                   data stored in the Snowflake data
                                                                Snowflake generally costs USD 40             user sessions, manages,
                                                                                                                                                    competing with Oracle, IBM and                    warehouse much as companies
                                                                per terabyte of data per month for           enforces security, performs query
                                                                                                                                                    Teradata. The company often picks                 have built businesses on top of the
                                                                on-demand storage, and USD 23                compilation and optimisation,
                                                                                                                                                    up customers disillusioned with                   Salesforce platform.
                                                                per terabyte per month for capacity          and coordinates transactions.
                                                                                                                                                    Hadoop, and it benefits from the
                                                                storage. From Frankfurt, pricing is          It uses a distributed metadata
                                                                                                                                                    IoT wave as the semi-structured
                                                                USD 45 and USD 24.50 respectively.           store for global state management.
                                                                                                                                                    data generated by sensors is
                                                                                                             Metadata processing is powered
                                                                                                                                                    complicated to manage with a
                                                                                                             by a separate integrated
                                                                                                                                                    structured approach.
                                                                                                             sub-system, which allows data
                                                                                                             processing without using query
                                                                                                             computer resources.

3
    Microsoft Azure has at least 14 cloud regions in the US and 54 globally.

26 | THE NEW BIG DATA WORLD                                                                                                                                                                                                                                    27

4. In-memory analytics database:
    the Exasol case

                                      Exasol, a hidden champion                    Exasol has offices in Germany,             The main memory served as a large
                                                                                   France, the UK and the US, and sells       cache, from which content was placed
 “Exasol’s vision is to provide       Headquartered in Nuremberg
                                                                                   its products through partners in the       in local storage when it was not
 the most extensible, powerful        (Germany), Exasol develops and
                                                                                   rest of Europe, Israel and the USA,        efficient to fit everything into the cache.
 and platform independent             sells a high-performance, in-memory,
                                                                                   but also directly to customers in India,   That way, it was possible to analyse
 data analytics framework for         column-oriented relational database
                                                                                   Australia and South America. As of         more data even more quickly than with
 customers to govern, connect         management and data warehousing
                                                                                   today, more than 140 companies             a pure in-memory database. Current
 and understand the entirety of       system specifically designed
                                                                                   use Exasol to run business                 main shareholders are board members,
 the data flowing through their       for analytics. It launched its first
                                                                                   intelligence and data analytics,           founders, the German State-owned
 organisations. As they transition    product in 2005. Either standalone
                                                                                   including well-known names such as         bank KfW and the Swiss early-stage
 to data-driven processes,            or integrated with Hadoop, Exasol
                                                                                   Adidas, Zalando, GfK, King Digital         venture firm Mountain Partners.
 companies need to see and            is used in a wide range of Big Data
                                                                                   Entertainment, Vodafone, Xing,
 leverage all the data necessary      use cases, including accelerating
                                                                                   Revolut, Dailymotion, and Postbank.
 for the success of their business,   standard reporting, running multi-user                                                  FIG. 21: EXASOL PRODUCT ARCHITECTURE
 wherever the data may be. For        ad-hoc analytics, and performing             Exasol’s technology was created by
 this they will need an incredibly    complex modelling. Exasol’s ability          one of its founders, Falko Mattasch,                                                                  APPLICATIONS
 powerful and intelligent data        to pull in data in real-time allows          in the 1990s. While researching
 framework which will permit          end-users to compress the multi-             intelligent algorithms in the Parallel                                                                                 REAL-TIME
 new tools, technologies and          step process of data collection and          Processing Department at the
                                                                                                                                      DATA                  ADVANCED             PREDICTIVE
                                                                                                                                                                                                           AD-HOC
                                                                                                                                                                                                                              BUSINESS            LOGICAL DATA
                                                                                                                                     SCIENCE                ANALYTICS            ANALYTICS                                  INTELLIGENCE           WAREHOUSE
 processes to be implemented          pre-processing into a rapid execution        University of Jena (Germany), he
                                                                                                                                                                                                          REPORTING

 on demand, as smoothly and as        analytics engine. This means users           showed how data analytics could
 simply as possible.”                 can perform quick ad-hoc analysis            benefit from parallel processing. After
                                      and answer critical business                 leaving academia, he implemented a
 Aaron Auld, CEO, Exasol              questions in minutes.                        new prototype for a parallel database                                                                      ANALYTICS
                                                                                   system, which served as the basis for
                                       Exasol’s core differentiation lies in its
                                                                                   the technology behind Exasol. It took
                                      proprietary technology architecture
                                                                                   years for the company to develop a                  SQL          MAP REDUCE      R           PYTHON          JAVA           LUA         SKYLINE   GEOSPATIAL       GRAPH
                                      that combines in-memory plus MPP
                                                                                   complete database, as everything
                                      (Massive Parallel Processing) which
                                                                                   was created in-house without
                                      results in extreme high querying                                                                                                      EXASOL – THE PARALLEL IN-MEMORY DATABASE
                                                                                   recourse to open source database
                                      speed at large data volumes. Their
                                                                                   cores such as PostgreSQL. This led
                                      platform integrates with a wide range
                                                                                   to a software system that offered
                                      of front-end analytics tools, such as
                                                                                   optimum performance and was
                                      Tableau and Microstrategy.
                                                                                   highly scalable and ‘tuning-free’. As                  EXASTORAGE
                                                                                                                                                                        HADOOP SYSTEMS
                                                                                                                                                                                                               DATABASES                   CLOUD STORAGE
                                                                                   data was continuously changed and
                                                                                   updated, Exasol designed the system
                                                                                   so that not all data had to be stored                                                                        DATA
                                                                                   in the main memory.
                                                                                                                              Source: Exasol company data

28 | THE NEW BIG DATA WORLD                                                                                                                                                                                                                                      29

5. Conclusion

In this report, we have reviewed Database is adding resilience. As for market consolidation, we
the deep changes happening in Oracle also has the financial might saw a significant wave of M&A
the database world, and the new to acquire cloud database or data deals between 2008 and 2013,
generation of players aiming to warehouse players if they add value before the advent of NoSQL and
challenge longstanding leaders such to its offerings and for shareholders. NewSQL databases and cloud data
as Oracle, Microsoft and IBM. The So, while it’s not dying yet, its warehouses. Many small database
question now is how the competitive competitive position is not as players have emerged since the
landscape will be reshaped ten years comfortable as it used to be. beginning of the decade, with only
on. Will Oracle remain as the #1 a handful achieving significant scale
database player? Can Amazon Web AWS can steal the #1 position in or funding. Nonetheless, some have
Services be the new #1 in this market? databases. But with a market share become ‘unicorns’, most recently
Is Hadoop definitely a fallen ‘star’? is at 9%, it could take a long time to Snowflake Computing in data
How the market will consolidate? get there. However, Amazon’s growth warehousing. We believe there will
What could be the next database comes from the cloud: for every cloud be an opportunity for incumbent
‘unicorns’? Let’s try to address customer it gets, there is a potential database companies to acquire
these questions. customer for DynamoDB, Aurora and some of these players to stay
Redshift. That said, its database and competitive with AWS. As Elastic
Oracle, in our view, will be increasingly data warehouse offerings are tied to its and MongoDB demonstrated, EV/
challenged by Amazon Web Services, cloud, so a Microsoft Azure or Google sales valuation multiples can be very
but this does not mean it will lose its Cloud customer will not use AWS rich. Consequently, adding a sizeable
leadership in the database market. database or data warehouse services. premium to the valuation of the next
Recently, AWS CEO Andy Jassy IPOs in this space is tempting.
declared that by the end of 2018 Hadoop is losing ground because
or mid-2019, 88% of Amazon’s of its complexity but is not dead in Finally, who are the new unicorns? In
databases that ran on Oracle would our view. The 2019 merger between data management, it’s now MongoDB,
be on an Amazon database instead. Cloudera and Hortonworks means Elastic and Snowflake. There are
He also noted that Amazon moved its growth is unlikely to be as strong as plenty of potential unicorns out
data warehouse from Oracle to its own expected. We think that independent there too: we think Neo4j, DataStax
service, Redshift in early November players may well disappear, probably (Cassandra), Databricks, Cockroach
2018. SAP’s long-term ambition is through M&A. However, almost all the Labs, MemSQL, Yellowbrick Data,
to replace Oracle databases running large software vendors have made Couchbase, and MariaDB could be the
on its application software with significant investment in Hadoop open next ones. In Europe, there are only
SAP HANA. Other defections may source technologies, which suggests a few players, such as Elastic in The
follow. But Oracle has the strength it is here to stay and will probably Netherlands and MariaDB in Finland,
of an installed base (several hundred evolve, supported by the open source but Exasol, a German company, has
thousand of customers) which is not Apache Software Foundation. potential to benefit from the cloud data
fully dependent on Amazon or SAP. warehouse wave as its technology is
Oracle’s databases are also evolving, based on a solid and powerful
and the new Oracle Autonomous in-memory database.

30 | THE NEW BIG DATA WORLD 31

White Paper Authors                                                                                                       Corporate Transactions
            Falk Müller-Veerse                        Gregory Ramirez
            Partner                                   Equity Research Analyst                                             Bryan, Garnier & Co leverage in-depth sector expertise to create fruitful and long lasting relationships between investors and
            Technology                                Software & IT Services                                              European growth companies.
            fmuellerveerse@bryangarnier.com           gramirez@bryangarnier.com

            Berk Kirca                                Priyanshu Bhattacharya
            Vice President                            Vice President                                                                    understanding your
                                                                                                                                        audience

            Investment Banking                        Investment Banking
                                                                                                                                         Acquired by               Strategic Investment               Acquired by              Private Placement             Acquired by
            bkirca@bryangarnier.com                   pbhattacharya@bryangarnier.com

                                                                                                                                                                                                                                   and Cisco

Technology Team                                                                                                                      Undisclosed
                                                                                                                                     Sole Advisor to the Sellers
                                                                                                                                                                    Undisclosed
                                                                                                                                                                    Sole Advisor to the Sellers
                                                                                                                                                                                                  Undisclosed
                                                                                                                                                                                                  Sole Advisor to the Seller
                                                                                                                                                                                                                               Undisclosed
                                                                                                                                                                                                                                Sole Financial Advisor
                                                                                                                                                                                                                                                         Undisclosed
                                                                                                                                                                                                                                                         Sole Advisor to the Sellers

INVESTMENT BANKING

PARIS                                                                                                                     About Bryan, Garnier & Co
            Greg Revenu                               Olivier Beaudouin                Guillaume Nathan
            Managing Partner                          Partner, Technology              Partner, Digital Media
                                                                                                                          Bryan, Garnier & Co is a European, full service growth-focused independent investment banking partnership founded in 1996.
            grevenu@bryangarnier.com                  & Smart Industries               & Business Services
                                                                                                                          The firm provides equity research, sales and trading, private and public capital raising as well as M&A services to growth companies
                                                      obeaudouin@bryangarnier.com      gnathan@bryangarnier.com
                                                                                                                          and their investors. It focuses on key growth sectors of the economy including Technology, Healthcare, Consumer and Business
                                                                                                                          Services. Bryan, Garnier & Co is a fully registered broker dealer authorised and regulated by the FCA in Europe and the FINRA
            Thibaut De Smedt                          Philippe Patricot                                                   in the U.S. Bryan, Garnier & Co is headquartered in London, with additional offices in Paris, Munich and New York. The firm
            Partner,                                  Managing Director,                                                  is a member of the London Stock Exchange and Euronext.
            Application Software                      Technology
            & IT Services                             ppatricot@bryangarnier.com
            tdesmedt@bryangarnier.com
MUNICH                                                                                                                    Bryan, Garnier & Co Technology Equity Research Coverage
            Falk Müller-Veerse                        Lars Dürschlag
            Partner                                   Managing Director
            Technology                                lduerschlag@bryangarnier.com
            fmuellerveerse@bryangarnier.com

EQUITY RESEARCH ANALYST TEAM

            Richard-Maxime Beaudoux                   Thomas Coudry                    Gregory Ramirez
            Video Games, Payment                      Telecoms & Media                 Equity Research Analyst
            & Security                                tcoudry@bryangarnier.com         Software & IT Services
            rmbeaudoux@bryangarnier.com                                                gramirez@bryangarnier.com

            David Vignon                              Fréderic Yoboué
            Industrial Software                       Semiconductors
            dvignon@bryangarnier.com                  fyoboue@bryangarnier.com

EQUITY CAPITAL MARKETS                        EQUITY DISTRIBUTION

            Pierre Kiecolt-Wahl                       Nicolas-Xavier de Montaigut      Nicolas d’Halluin                                                                           5 Analysts | 50+ Stocks Covered
            Partner                                   Partner, Head of Distribution    Partner, Head of US Distribution
            pkiecoltwahl@bryangarnier.com             nxdemontaigut@bryangarnier.com   ndhalluin@bryangarnier.com         With more than 150 professionals based in London, Paris, Munich and New York, Bryan, Garnier & Co
                                                                                                                          combines the services and expertise of a top-tier investment bank with a long-term client focus.

32 | THE NEW BIG DATA WORLD

LONDON                                              PARIS                                             MUNICH                                           NEW YORK
Beaufort House                                      26 Avenue des Champs Elysées                      Widenmayerstrasse 29                             750 Lexington Avenue
15 St. Botolph Street                               75008 Paris                                       80538 Munich                                     New York, NY 10022
London, EC3A 7BB                                    France                                            Germany                                          USA
UK

T: +44 (0) 20 7332 2500                             T: +33 (0) 1 56 68 75 00                          T: +49 89 242 262 11                             T: +1 (0) 212 337 7000
F: +44 (0) 20 7332 2559                             F: +33 (0) 1 56 68 75 01                          F: +49 89 242 262 51                             F: +1 (0) 212 337 7002

Authorised and regulated by the Financial           Regulated by the Financial Conduct                                                                 FINRA and SIPC member
Conduct Authority (FCA)                             Authority (FCA) and the Autorité de Contrôle
                                                    prudential et de resolution (ACPR)

IMPORTANT INFORMATION

This document is classified under the FCA Handbook                not reflected in this Report. The Firm or an associated         United States to “Major US Institutional Investors”
as being investment research (independent research).              company may have a consulting relationship with a               as defined in SEC Rule 15a-6 and may not be furnished
Bryan Garnier & Co Limited has in place the measures              company which is the subject of this Report.                    to any other person in the United States. Each Major
and arrangements required for investment research                                                                                 US Institutional Investor which receives a copy of this
as set out in the FCA’s Conduct of Business Sourcebook.           This Report may not be reproduced, distributed                  Report by its acceptance hereof represents and agrees
                                                                  or published by you for any purpose except with the             that it shall not distribute or provide this Report to any other
This report is prepared by Bryan Garnier & Co Limited,            Firm’s prior written permission. The Firm reserves all          person. Any US person that desires to effect transactions
registered in England Number 03034095 and its MIFID               rights in relation to this Report.                              in any security discussed in this Report should call or write
branch registered in France Number 452 605 512.                                                                                   to our US affiliated broker, Bryan Garnier Securities, LLC.
Bryan Garnier & Co Limited is authorised and                      Past performance information contained in this Report           750 Lexington Avenue, New York NY 10022.
regulated by the Financial Conduct Authority (Firm                is not an indication of future performance. The information     Telephone: 1-212-337-7000.
Reference Number 178733) and is a member of the                   in this report has not been audited or verified by an
London Stock Exchange. Registered address:                        independent party and should not be seen as an                  This Report is based on information obtained from
Beaufort House 15 St. Botolph Street, London EC3A 7BB,            indication of returns which might be received by investors.     sources that Bryan Garnier & Co Limited believes to
United Kingdom.                                                   Similarly, where projections, forecasts, targeted or            be reliable and, to the best of its knowledge, contains
                                                                  illustrative returns or related statements or expressions       no misleading, untrue or false statements but which it has
This Report is provided for information purposes only             of opinion are given (“Forward Looking Information”)            not independently verified. Neither Bryan Garnier & Co
and does not constitute an offer, or a solicitation of an         they should not be regarded as a guarantee, prediction          Limited and/or Bryan Garnier Securities LLC make
offer, to buy or sell relevant securities, including securities   or definitive statement of fact or probability. Actual events   no guarantee, representation or warranty as to its
mentioned in this Report and options, warrants or rights          and circumstances are difficult or impossible to predict        accuracy or completeness. Expressions of opinion
to or interests in any such securities. This Report is for        and will differ from assumptions. A number of factors,          herein are subject to change without notice. This Report
general circulation to clients of the Firm and as such is         in addition to the risk factors stated in this Report, could    is not an offer to buy or sell any security.
not, and should not be construed as, investment advice or         cause actual results to differ materially from those in any
a personal recommendation. No account is taken of the             Forward Looking Information.                                    Bryan Garnier Securities, LLC and/or its affiliate, Bryan
investment objectives, financial situation or particular needs                                                                    Garnier & Co Limited may own more than 1% of the
of any person.                                                    Disclosures specific to clients in the United Kingdom           securities of the company(ies) which is (are) the subject
                                                                  This Report has not been approved by Bryan Garnier              matter of this Report, may act as a market maker in the
The information and opinions contained in this Report             & Co Limited for the purposes of section 21 of the              securities of the company(ies) discussed herein, may
have been compiled from and are based upon generally              Financial Services and Markets Act 2000 because it is           manage or co-manage a public offering of securities for
available information which the Firm believes to be               being distributed in the United Kingdom only to persons         the subject company(ies), may sell such securities to
reliable but the accuracy of which cannot be guaranteed.          who have been classified by Bryan Garnier & Co Limited          or buy them from customers on a principal basis and
All components and estimates given are statements                 as professional clients or eligible counterparties. Any         may also perform or seek to perform investment banking
of the Firm, or an associated company’s, opinion only             recipient who is not such a person should return the            services for the company(ies).
and no express representation or warranty is given or             Report to Bryan Garnier & Co Limited immediately
should be implied from such statements. All opinions              and should not rely on it for any purposes whatsoever.          Bryan Garnier Securities, LLC and/or Bryan Garnier & Co
expressed in this Report are subject to change without                                                                            Limited are unaware of any actual, material conflict
notice. To the fullest extent permitted by law neither the        NOTICE TO US INVESTORS                                          of interest of the research analyst who prepared this
Firm nor any associated company accept any liability                                                                              Report and are also not aware that the research analyst
whatsoever for any direct or consequential loss arising           This research report (the “Report”) was prepared                knew or had reason to know of any actual, material
from the use of this Report. Information may be available         by Bryan Garnier & Co Limited for information purposes          conflict of interest at the time this Report is distributed
to the Firm and/or associated companies which are                 only. The Report is intended for distribution in the            or made available.

                                                                                                                                                                                                     03.19-TCS

You can also read