The time for serverless is now!
Serverless Architecture Whitepaper

Up in the Cloud: Step by step towards serverless applications, platforms and a cloud-native ecosystem

  @ServerlessCon # SLA_con   www.serverless-architecture.io
Content

Serverless Development

First things first    3
Your first step towards serverless application development

Quarkus: Modernizing Java to keep pace in a cloud-native world    7
Scaling the modern app world

Serverless Architecture & Design

Why platform as a service is such a great model    9
Looking into the future of PaaS

The time for serverless is now – tips for getting started    11
If not now, when?

Building a data platform on Google Cloud Platform    13
Laying the groundwork for big data

Migrating big data workloads to Azure HDInsight – Smoothing the path to the cloud with a plan    17
Strategies for big data migration

Serverless Engineering & Operations

Cloud-Native DevOps    20
The driving force behind the digital transformation of modern enterprises

Serverless Security    25
Basic considerations on the subject of serverless architecture security
Serverless Development

Your first step towards serverless application development

First things first
In this article, Kamesh Sampath shows us how to master the first steps
on the journey towards a serverless application. He shows how to set up
the right environment and takes us through its deployment.

by Kamesh Sampath

In the first part of this article, we will deal with setting up a development environment that is suitable for Knative in version 0.6.0. The second part deals with the deployment of your first serverless microservice. The basic requirement for using Knative to create serverless applications is a solid knowledge of Kubernetes. If you are still inexperienced, you should complete the official basic Kubernetes tutorial [1].
   Before we get down to the proverbial "can do", a few tools and utilities have to be installed:

• Minikube [2]
• kubectl [3]
• kubens [4]

For Windows users, WSL [5] has proven to be quite useful, so I recommend installing that as well.

Setting up Minikube
Minikube is a single-node Kubernetes cluster that is ideal for everyday development with Kubernetes. After the setup, the following steps must be performed to make Minikube ready for deployment with Knative Serving. Listing 1 shows what this looks like in the code.
   First, a Minikube profile must be created, which is what the first line achieves. The second command is then used to set up a Minikube instance with 8 GB RAM, 6 CPUs and 50 GB hard disk space. The boot command also contains a few additional configurations for the Kubernetes cluster that are necessary to get Knative up and running. It is also important that the Kubernetes version used is not older than 1.12.0, otherwise Knative will not work. If Minikube doesn't start immediately, that is completely normal; it can take a few minutes until the initial startup is complete, so you should be a little patient when setting it up.

 Listing 1

   minikube profile knative

   minikube start -p knative --memory=8192 --cpus=6 \
     --kubernetes-version=v1.12.0 \
     --disk-size=50g \
     --extra-config=apiserver.enable-admission-plugins="LimitRanger,NamespaceExists,NamespaceLifecycle,ResourceQuota,ServiceAccount,DefaultStorageClass,MutatingAdmissionWebhook"

Setting up an Istio Ingress Gateway
Knative requires an Ingress Gateway to route requests to Knative Services. In addition to Istio [6], Gloo [7] is also supported as an Ingress Gateway. For our example, we will use Istio. The following steps show how to perform a lightweight installation of Istio that contains only the Ingress Gateway:

 curl -L https://raw.githubusercontent.com/knative/serving/release-0.6/third_party/istio-1.1.3/istio-lean.yaml \
   | sed 's/LoadBalancer/NodePort/' \
   | kubectl apply --filename -

Like the setup of Minikube, the deployment of the Istio Pod takes a few minutes.
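
If you prefer not to poll the namespace by hand, you can also block until the gateway reports ready before moving on. This is only a minimal sketch, assuming the deployment name created by the lean install above:

 # Wait (up to five minutes) until the Istio ingress gateway deployment is available
 kubectl --namespace istio-system wait --for=condition=Available \
   deployment/istio-ingressgateway --timeout=300s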


With the command kubectl --namespace istio-system get pods --watch you can see the status; the watch is finished with Ctrl + C. Whether the deployment was successful or not can easily be determined with the command kubectl --namespace istio-system get pods. If everything went well, the output should look like Listing 2.

 Listing 2

   NAME                                     READY   STATUS    RESTARTS   AGE
   cluster-local-gateway-7989595989-9ng8l   1/1     Running   0          2m14s
   istio-ingressgateway-6877d77579-fw97q    2/2     Running   0          2m14s
   istio-pilot-5499866859-vtkb8             1/1     Running   0          2m14s

Installing Knative Serving
The installation of Knative Serving [8] allows us to run serverless workloads on Kubernetes. It also provides automatic scaling and tracking of revisions. You can install Knative Serving with the following commands:

 kubectl apply --selector knative.dev/crd-install=true \
   --filename https://github.com/knative/serving/releases/download/v0.6.0/serving.yaml

 kubectl apply \
   --filename https://github.com/knative/serving/releases/download/v0.6.0/serving.yaml \
   --selector networking.knative.dev/certificate-provider!=cert-manager

Again, it will probably take a few minutes until the Knative pods are deployed; with the command kubectl --namespace knative-serving get pods --watch you can check the status. As before, the check can be aborted with Ctrl + C. With the command kubectl --namespace knative-serving get pods you can check if everything is running. If this is the case, an output like in Listing 3 should be displayed.

 Listing 3

   NAME                               READY   STATUS    RESTARTS   AGE
   activator-54f7c49d5f-trr82         1/1     Running   0          27m
   autoscaler-5bcd65c848-2cpv8        1/1     Running   0          27m
   controller-c795f6fb-r7bmz          1/1     Running   0          27m
   networking-istio-888848b88-bkxqr   1/1     Running   0          27m
   webhook-796c5dd94f-phkxw           1/1     Running   0          27m

Deploy demo application
The application we want to create for demonstration is a simple greeting machine that outputs "Hi". For this we use an existing Linux container image, which can be found on the Quay website [9]. The first step is to create a traditional Kubernetes deployment that can then be modified to use serverless functionality. This will make clear where the actual differences lie and how to make existing deployments serverless with Knative.

Create a Kubernetes resource file
The following steps show how to create a Kubernetes resource file. To do this, you must first create a new file called app.yaml, into which the code in Listing 4 must be copied.

Create the deployment and service
By applying the previously created YAML file, we can create the deployment and service. This is done using the kubectl apply --filename app.yaml command. Also at this point, the command kubectl get pods --watch can be used to get information about the status of the application, while Ctrl + C terminates the watch. If all went well, we should now have a deployment called greeter and a service called greeter-svc (Listing 5).
   To activate a service, you can also use a Minikube shortcut like minikube service greeter-svc, which opens the service URL in your browser. If you prefer to use curl to open the same URL, you have to use the command curl $(minikube service greeter-svc --url). Now you should see a text that looks something like this: Hi greeter => '9861675f8845' : 1

Migrating the traditional Kubernetes deployment to serverless with Knative
The migration starts by simply copying the app.yaml file, naming it serverless-app.yaml and updating it to the lines shown in Listing 6.

   Session: From Monolith to Serverless: Rethinking your Architecture
   Michael Dowden
   It's easy to understand the benefits of serverless, but it's not always easy to understand how this will impact our software architecture. In this talk we will deconstruct a set of requirements and walk through the architecture of both a traditional service-oriented architecture and a modern serverless architecture. You'll leave with a better understanding of how to design event-driven systems and serverless APIs, along with some alternatives to the traditional RESTful API layer.


 Listing 4

   ---
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: greeter
   spec:
     selector:
       matchLabels:
         app: greeter
     template:
       metadata:
         labels:
           app: greeter
       spec:
         containers:
         - name: greeter
           image: quay.io/rhdevelopers/knative-tutorial-greeter:quarkus
           resources:
             limits:
               memory: "32Mi"
               cpu: "100m"
           ports:
           - containerPort: 8080
           livenessProbe:
             httpGet:
               path: /healthz
               port: 8080
           readinessProbe:
             httpGet:
               path: /healthz
               port: 8080
   ---
   apiVersion: v1
   kind: Service
   metadata:
     name: greeter-svc
   spec:
     selector:
       app: greeter
     type: NodePort
     ports:
     - port: 8080
       targetPort: 8080

 Listing 5

   $ kubectl get deployments
   NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
   greeter   1         1         1            1           16s

   $ kubectl get svc
   NAME          TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
   greeter-svc   NodePort   10.110.164.179                 8080:31633/TCP   50s

 Listing 6

   ---
   apiVersion: serving.knative.dev/v1alpha1
   kind: Service
   metadata:
     name: greeter
   spec:
     template:
       metadata:
         labels:
           app: greeter
       spec:
         containers:
         - image: quay.io/rhdevelopers/knative-tutorial-greeter:quarkus
           resources:
             limits:
               memory: "32Mi"
               cpu: "100m"
           ports:
           - containerPort: 8080
           livenessProbe:
             httpGet:
               path: /healthz
           readinessProbe:
             httpGet:
               path: /healthz

 Listing 7

   $ kubectl get deployments
   NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
   greeter                    1         1         1            1           30m
   greeter-bn8cm-deployment   1         1         1            1           59s

 Listing 8

   $ kubectl get services
   NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP                                            PORT(S)    AGE
   greeter                 ExternalName                    istio-ingressgateway.istio-system.svc.cluster.local               114s
   greeter-bn8cm           ClusterIP      10.110.208.72                                                           80/TCP     2m21s
   greeter-bn8cm-metrics   ClusterIP      10.100.237.125                                                          9090/TCP   2m21s
   greeter-bn8cm-priv      ClusterIP      10.107.104.53                                                           80/TCP     2m21s

 Listing 9

   $ kubectl get services.serving.knative.dev
   NAME      URL                                  LATESTCREATED   LATESTREADY     READY   REASON
   greeter   http://greeter.default.example.com   greeter-bn8cm   greeter-bn8cm   True

 Attention
 In a Minikube deployment we will have neither a LoadBalancer nor DNS to resolve anything to *.example.com or a service URL like http://greeter.default.example.com. To call a service, the host header must be used with http/curl.


If we compare the traditional Kubernetes application (app.yaml) with the serverless application (serverless-app.yaml), we find three things. Firstly, no additional service is needed, as Knative will automatically create and route the service. Secondly, since the routing of the service is handled by Knative rather than defined manually, there is no need for selectors anymore, so the following lines of code are omitted:

 selector:
   matchLabels:
     app: greeter

Lastly, under TEMPLATE | SPEC | CONTAINERS, name: is omitted because the name is automatically generated by Knative. In addition, no ports need to be defined for the liveness and readiness probes.

Deploying the serverless app
The deployment follows the same pattern as before, using the command kubectl apply --filename serverless-app.yaml. The following objects should have been created after the successful deployment of the serverless application: the deployment should now have been added (Listing 7), and a few new services should also be available (Listing 8), including the ExternalName service, which points to istio-ingressgateway.istio-system.svc.cluster.local. There should also be a Knative service available with a URL to which requests can be sent (Listing 9).
   To be able to call a service, the request must go through the ingress or gateway (in our case Istio). To find out the address of the Istio gateway we have to use in the http/curl call, the following command can be used:

 IP_ADDRESS="$(minikube ip):$(kubectl get svc istio-ingressgateway \
   --namespace istio-system \
   --output 'jsonpath={.spec.ports[?(@.port==80)].nodePort}')"

The command retrieves the NodePort of the service istio-ingressgateway in the namespace istio-system. Once we have the NodePort of the istio-ingressgateway, we can call the greeter service via $IP_ADDRESS by passing the host header with http/curl calls:

 curl -H "Host:greeter.default.example.com" $IP_ADDRESS

Now you should get the same answer as with the traditional Kubernetes deployment (Hi greeter => '9861675f8845' : 1). If you allow the deployment to be idle for about 90 seconds, the deployment will be terminated. At the next call, the deployment is then reactivated and the request is answered.
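
To watch this scale-to-zero behaviour for yourself, you can keep a pod watch open while calling the service again. This small sketch only combines commands already shown above; the roughly 90-second idle window is the behaviour described in the text:

 # Terminal 1: watch the greeter pods disappear after the idle period and come back on demand
 kubectl get pods --watch

 # Terminal 2: trigger a new request once the pods are gone
 curl -H "Host:greeter.default.example.com" $IP_ADDRESS
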
Congratulations, you have successfully deployed and called your first serverless application!

   Kamesh is a Principal Software Engineer at Red Hat. As part of his additional role as Director of Developer Experience at Red Hat, he actively educates on Kubernetes/OpenShift, Service Mesh, and serverless technologies. With a career spanning close to two decades, most of Kamesh's career was with the services industry, helping various enterprise customers build Java-based solutions. Kamesh has been a contributor to open source projects for more than a decade and he now actively contributes to projects like Knative, Quarkus, Eclipse Che etc. As part of his developer philosophy he strongly believes in: "Learn more, do more and share more!"

   Session: Serverless Development Lifecycle: How to Design, Structure and Maintain Serverless Projects
   Jorge Vargas
   We've been working with AWS Lambda for more than 2 years in production; we started with a small and iterative process, which helped us to quickly adapt and iterate our projects, but we also made a lot of mistakes along the way. We didn't know how to automate our serverless deployments, how to manage serverless tasks or how to structure our projects. This led to manual work, maintainability and orchestration issues. Luckily, after 2 years we've learned a lot about what to do and what not to do when it comes to serverless systems on AWS. We would like to show our initial mistakes, how we overcame them and what has worked for our team. We will also talk about how to be as lean as possible while maintaining flexibility and robustness in a serverless system. We will show how to use, configure and extend a serverless framework and how to combine it with Terraform. Additionally, we will show how to easily structure, develop, build and deploy AWS components, Lambda functions, and lambda@edge functions.

 Links & Literature

 [1] https://kubernetes.io/docs/tutorials/kubernetes-basics/
 [2] https://kubernetes.io/docs/tasks/tools/install-minikube/
 [3] https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl-on-linux
 [4] https://github.com/ahmetb/kubectx/blob/master/kubens/
 [5] https://docs.microsoft.com/en-us/windows/wsl/install-win10
 [6] https://istio.io
 [7] https://gloo.solo.io
 [8] https://knative.dev/docs/serving/
 [9] https://quay.io/rhdevelopers/knative-tutorial-greeter


Scaling the modern app world

Quarkus: Modernizing Java to keep pace in a cloud-native world
Java is no spring chicken and some are even referring to it as a “vintage
language”. Despite its popularity, there are some complaints about it. In
our new cloud-native world, why does Java need to evolve? In order to
evolve to keep up with modern, cloud-native apps, Java needs to keep all
of what makes it so dependable, while also being able to function in new
app environments.

by Rich Sharples

Don't worry, you are not the only one who feels old when you hear Java being described as a "vintage" programming language. While Java has been around since 1995, it is certainly not ready to retire (or rather, be retired), and continues to rank among the top languages on the TIOBE index [1]. In fact, no other language has been so popular for so long.
   However, it is not without its issues, including sometimes being too clunky to keep up with some of the newer programming languages, not being agile and flexible enough to work in this new world of containers, and not being really relevant in applications that are not coded to be Java first. While they say you can't teach an old dog new tricks, you can rethink how it performs the tricks it already knows.
   This piece will discuss what the community can do to help the language keep up with modern application development trends, to ensure that it continues to have a place in the new cloud-native programming world.

Why has Java stood the test of time?
It has been said that Java is having a "Renaissance Moment" in which the programming language keeps evolving. In fact, there is so much demand for new innovations that release cycles have been shortened to every six months, and Java 13 was just recently announced at this year's Oracle OpenWorld. In addition to never dipping below number two on the TIOBE index, SlashData [2] has predicted that there will be 7.6 million Java developers by the end of 2019.
   Java has many advantages, including being designed for ease of use, and it is often said that it is easier to write, compile and debug in Java than in any other programming language. This, coupled with the fact that it ranks among the top programming languages used by companies in the Fortune 25, means that it continues to remain relevant, even as shiny new programming languages like Rust, Elixir, and Swift come onto the scene.

Why does Java need to evolve?
The disconnect between modern application development and Java is that the apps built on newer programming languages tend to be more lightweight, agile and flexible, often running in containers, which traditionally Java has not been well equipped for.
   Common complaints include:


• Java is too fat, often starting with libraries that are not used. This does not bode well for microservices architectures, but does work when the Java application is being used to solve a more complex problem.
• It still follows the "write once, run anywhere" principle, meaning that any device that has a Java Virtual Machine (JVM) should be able to successfully – meaning without it being altered – run a Java app. While this is generally a good feature, it is not as important when targeting containers.
• Java has a longer start-up time when compared with newer apps, which goes back to it being really good at having everything it needs to solve complex problems, but leaves something to be desired for simpler processes.
• Having too many libraries, and therefore a large package size, slows down the start-up time and makes the Java app less agile.
• Some also say that Java is too verbose and that more modern languages can do the same thing with less code.
• Java is a very dynamic language, which is part of what makes it so productive and agile, but this can also lead some frameworks to abuse the dynamic capabilities, resulting in longer startup time and large memory overhead.
• It is not always the best-equipped language to handle event-driven architectures where concurrency and throughput are more important. Java's plan to address this is through Fibers.

In order to evolve to keep up with modern, cloud-native apps, Java needs to keep all of what makes it so dependable, while also being able to function in new app environments. Part of Java's renaissance moment is that developers are beginning to realize that, and are doing what they can to modernize Java while not straying too far from the tried and true benefits of the language. This can allow the millions of current Java developers to expand the work they can do without having to learn an entirely new language and shift how they work.

Java in the modern application development world
When I say modern application development, I am referring to environments like Kubernetes and serverless, both of which rely on containers for deploying code into production, which, up until very recently, Java has been incompatible with.
   Long-time Java leaders like Red Hat are aiming to make it a key player in these environments, through initiatives like Quarkus, a Kubernetes-native Java framework tailored for GraalVM and OpenJDK HotSpot. By offering developers the ability to use Java in a unified reactive and imperative programming model, Quarkus aims to enable developers to work within Kubernetes and serverless environments without having to learn a new paradigm. It can deliver new runtime efficiencies to try to tackle some of what currently makes Java stuck in the past, including faster startup time, lower memory utilization and a smaller application and container image footprint.
   Through frameworks like Quarkus, I believe Java will be better equipped to scale in the modern application development landscape and continue to not only evolve but also innovate. Because that is what is key here – creating a path to the future for cloud-native Java and, in doing so, keeping Java at the center of enterprise innovation.

   Rich Sharples is the Senior Director of Product Management in the Application Platforms Business Group at Red Hat. He has spent the last twenty years evangelizing, using, and designing enterprise middleware. He previously worked for Forte Software and Sun Microsystems and as an independent software developer and consultant building large distributed software systems for the space, transport, and energy sectors. He also serves on the Node.js Foundation Board of Directors. In his spare time he enjoys running, cycling, and anything that gets him outdoors.
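
As a rough, hands-on illustration of the Quarkus workflow described above (not taken from the article, and with placeholder group and artifact IDs), a project can be scaffolded with the Quarkus Maven plugin and compiled to a native executable for exactly the fast-startup, low-memory scenario discussed:

 # Scaffold a new Quarkus project (IDs are placeholders)
 mvn io.quarkus:quarkus-maven-plugin:create \
   -DprojectGroupId=org.example -DprojectArtifactId=getting-started

 # Build a native executable; the container-build flag avoids needing a local GraalVM
 cd getting-started
 ./mvnw package -Pnative -Dquarkus.native.container-build=true
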
   Session: The Real Cost of Pay-Per-Use in Serverless
   Ran Ribenzaft
   Pay-per-use is one of the major drivers of serverless adoption. Small startups love it because their monthly bill is almost zero. Large organizations are attracted to improving their IT spending when the old servers have very low utilization and are mostly idle. While it sounds promising, it also produces a massive challenge – paying per use means that you don't know how much you are going to pay. Why? Because most of us don't know exactly how much we are going to use. In addition, new and unique challenges arise – a bug in the code can suddenly lead to a very high cloud bill, as well as an external API that has a very slow response, causing us to pay for this additional time.

 Links & Literature

 [1] https://www.tiobe.com/tiobe-index/
 [2] https://slashdata-website-cms.s3.amazonaws.com/sample_reports/ZAamt00SbUZKwB9j.pdf

Serverless Architecture & Design

Looking into the future of PaaS

Why platform as a service is such a great model

By now, we have all become used to Software as a Service (SaaS), but for many, the idea of platform as a service is still fairly new. PaaS offers enterprise-ready infrastructure, systems, and tools and is the logical next step after SaaS. What can we expect from the future of platform as a service? For one, the fundamental requirement for using PaaS – fast and stable internet access – has only become cheaper and easier to obtain.

by MicroStartups

Platform as a service (PaaS) involves providing an instantly deployable set of cloud computing services that can scale to meet the user's needs and is typically priced by resource use. Though such services have been around for some time, their practicality has steadily trended up over the years as systems have been updated and the fundamental requirement to use them – fast and stable internet access – has become cheaper and easier.
   There are currently two suites dominating the cloud computing field, AWS (Amazon Web Services) and Microsoft Azure [1], and they're changing how the business world approaches digital activity in general. It's clear that there's a huge long-term market for this type of system – Microsoft and Amazon, two giants of the business world, wouldn't have made such commitments to theirs otherwise.
   So, what is it about the platform as a service (PaaS) model that makes it so compelling? How is it changing how business is done, and what might the future hold? Let's discuss.

What PaaS involves
What we've seen over the last decade is the steady online migration of local tasks, ultimately constituting the biggest portent of the still-nascent PaaS industry: SaaS (software as a service). We first saw the advent of internet-supported suites such as Adobe's Creative Cloud, and everyone got used to that (by now, it's fair to say that Creative Cloud has some serious clout [2]) – then came the advent of software solutions accessible only through online portals, with Salesforce being one of the earliest success stories.
   People jumped on SaaS because it skirts the need to invest heavily in expensive hardware such as server equipment, as well as the awkward setup phase of getting software up and running across different devices. Provided you can reach a business-level internet connection, you can use SaaS – the pricing keeps going down alongside bandwidth costs, and since location doesn't matter, you can work from anywhere without needing an office.
   PaaS takes this concept to its logical zenith. Instead of simply offering software that can be accessed through the internet, it offers enterprise-ready infrastructure plus all the systems you need to put it to good use: tools for development, design, testing, CRM, CRO, and essentially anything on the market the provider has created (or licenced).

How PaaS brings simplicity and efficiency to business


No matter what you want to achieve as a business with a keen eye on the digital world, you can drive it through a PaaS solution with no need to plan ahead, go through cumbersome setup procedures, or painstakingly pick out integration-capable tools. The tools and options offered through a PaaS system are guaranteed to work together.
   The advantages don't stop there. Think about how costly it can be to acquire licences for high-level software suites [3], particularly if you're a small business. Because PaaS companies work in bulk, tying tools together and making them available to many clients, it typically works out as significantly cheaper to use a PaaS service. What's more, since PaaS pricing is determined by resource use, you don't lose out if you don't use it for a while.
   There are some reasons why people dislike PaaS services, of course: they're wary about cloud-based services because they want full control over their data, or they don't have sufficiently stable internet connections, or they fear getting invested in a system only to see the pricing increase exponentially and leave them in a tough spot (replatforming is tough enough in the SaaS-driven ecommerce world [4], so imagine attempting the same thing for an industry-spanning enterprise company).
   On the whole, though, there's every reason to be positive – if you can get good internet access, the other fears are unfounded. The implementation of GDPR has changed how data security is viewed, and as long as viable PaaS alternatives keep appearing, we'll avoid a duopoly.

The role of Azure in Microsoft's business model
Where Amazon yields all of its cloud computing revenue from AWS, Microsoft has Office 365 and Dynamics alongside Azure, and together they recently achieved an annual revenue of $9.6 billion (as announced in the latest quarterly earnings). This is a significant portion of Microsoft's bottom line, and it stands to keep getting bigger.
   This makes a lot of sense when you think about Microsoft's identity today. While it doesn't really get to compete at the high end as far as consumer products go (the Surface line has never threatened Apple's MacBooks and iPads, for example), it maintains a stranglehold on business operating systems that makes it a compelling choice in the PaaS world. The more it can offer online, the more people will be inclined to invest in the Microsoft ecosystem – if you're going to migrate everything to Azure [5], you might as well go all-in on the company.

What can we expect in the future?
For users, we should see the convenience of IT resources continue to rise as the PaaS model becomes further entrenched in everyday business. Just as SaaS seemed novel one minute and near-mandatory the next, expectations will soon have adjusted to the extent that using on-site platforms will seem antiquated and cumbersome.
   For PaaS in general, the best thing for the digital landscape will be an influx of competitors to give people realistic alternatives to Azure and Amazon's AWS. Narrowed options lead to stagnation and profiteering, after all. Ten years from now, every business should have a wide range of PaaS options from which it can select an ecosystem that perfectly suits its needs.

   MicroStartups is a business community that celebrates inspiring startups, small businesses, and entrepreneurs. Whether you're a solopreneur or a startup making your way in the business world, we're here to help. For the latest news, inspiring stories and actionable advice, follow us on Twitter @getmicrostarted.

   Session: Let a thousand flowers bloom? Scaling Serverless SaaS in the enterprise
   Holger Reinhardt
   How to scale and converge serverless microservice architecture and patterns using engineering blueprints. Using developer evangelism and Inner Source to drive adoption of common patterns across the organisation, create composable applications and accelerate time to market.

 Links & Literature

 [1] https://www.zdnet.com/article/top-cloud-providers-2019-aws-microsoft-azure-google-cloud-ibm-makes-hybrid-move-salesforce-dominates-saas/
 [2] https://www.vice.com/en_us/article/3kgw83/is-adobes-creative-cloud-too-powerful-for-its-own-good
 [3] https://www.financialdirector.co.uk/2018/07/31/the-hidden-costs-of-software-licensing-and-how-to-avoid-them/
 [4] https://www.shopify.com/enterprise/ecommerce-replatforming-guide
 [5] https://jaxenter.com/cloud-big-data-azure-161270.html


If not now, when?

The time for serverless is now – tips for getting started
Just because everyone is talking about something, that doesn’t mean it’s
actually worth your time. Chris Wahl shares his experiences getting to grips
with serverless technology, what he learned throughout the process, and
whether, ultimately, serverless is something worth considering.

by Chris Wahl

The world of IT operations is rife with all sorts of "hot new things" being lauded by thought leaders and vendors alike. When your job is to reliably and consistently deliver services and applications to engage and delight your users, it's tough to absorb the idea of serverless. Even the name makes it sound like you're going to be tricked into throwing away your servers somehow!
   With that said, the ability to deploy code via functions that can be run anywhere with the underlying layers abstracted sounded interesting to my team of engineers based on our collective experiences. In this post, I'll go over some of the initial serverless sticker shock items from the past few years to help you prepare to bring IT operations into the world of serverless to drive higher value and ultimately do less manual work.

Where to start?
There are a number of places you can start. My team's first use case with serverless was tackling a long list of cron jobs and scheduled tasks that were often just soaking up idle CPU time on a job or batch server somewhere. This is often the case with on-premises infrastructure; you will find a little pizza box server or set of virtual machines that spend most of their time just waiting to run code based on dates and times.
   In one specific example, a series of data center environments used for demonstration purposes were being reset to their base configurations on a nightly basis. Rather than constructing a container or server somewhere within the data center to do this, all of the baseline functions were migrated over to AWS Lambda functions and triggered based on a CloudWatch Event set to a nightly datetime value (a rough sketch of this pattern follows the list below). This had two immediate benefits:

1. The team no longer had to maintain the cron servers. This eliminated thankless and time-intensive patching, securing, and maintenance tasks.
2. The total cost of ownership (TCO) was reduced by CapEx (more resources available for other workloads, which avoids hardware spend) and OpEx (we can focus on other things). Now, it only costs a few pennies a month to run the majority of our cron functions.
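
The following is only a hypothetical sketch of that pattern with the AWS CLI; the rule name, schedule, function name, region and account ID are all placeholders, and in practice such resources are usually managed through a framework or infrastructure-as-code tool:

 # Create a nightly schedule rule (CloudWatch Events / EventBridge)
 aws events put-rule --name nightly-reset \
   --schedule-expression "cron(0 2 * * ? *)"

 # Allow the rule to invoke an existing Lambda function
 aws lambda add-permission --function-name reset-demo-environments \
   --statement-id nightly-reset --action lambda:InvokeFunction \
   --principal events.amazonaws.com

 # Point the rule at the function
 aws events put-targets --rule nightly-reset \
   --targets "Id"="1","Arn"="arn:aws:lambda:REGION:ACCOUNT_ID:function:reset-demo-environments"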


Operationalize first, optimize later
However, things weren't just magical and rosy from the first pass. Due to the iterative nature of constructing a function, you'll often want to start by getting your workflow operational and understanding all of the various requirements, permissions, and non-obvious caveats. Later, after having learned more about how serverless operates, you can make more passes across the functional code to streamline your workflow. For example, once our team had the on-premises workflow operating in serverless functions, we then took another pass to optimize, refactor, and slim things down.
   In addition, we learned a few things the hard way (a configuration sketch follows the list below):

1. Longer timeouts for functions become necessary when calling back to environments that require the construction of a network that lives within a VPC. This is often due to the cold start period, where the default of 30 seconds is not quite enough.
2. Python or Go can offer a much better user experience compared to other scripting languages, such as PowerShell, and can help solve the problem of hitting a timeout due to cold starts.
3. Encrypted environment variables are good for storing tokens and keys. However, that's not enough – you'll still want to rotate these objects on a regular basis. Our goal was to leverage a rotation service along with an encrypted vault in the future, but the use of environment variables helped kick-start things in the beginning.
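
As a purely illustrative follow-up to points 1 and 3 above – the function name, timeout value and key ARN are placeholders, not values from this team's setup – both settings can be adjusted on an existing function with the AWS CLI:

 # Raise the function timeout to cope with cold starts when attached to a VPC
 aws lambda update-function-configuration \
   --function-name reset-demo-environments --timeout 120

 # Store a token as an environment variable encrypted at rest with a customer-managed KMS key
 aws lambda update-function-configuration \
   --function-name reset-demo-environments \
   --environment "Variables={API_TOKEN=example-value}" \
   --kms-key-arn arn:aws:kms:REGION:ACCOUNT_ID:key/KEY_ID
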
The biggest investment with serverless
As you seek to adopt serverless, you might ask: what was the biggest time investment for the move towards serverless? The answer to that would easily be documentation. The need for design documentation is critical. Because your serverless functions are typically standalone, loosely organized, and triggered based on a variety of different inputs (API gateways, CloudWatch Events, Lex inputs, and so on), it's a good idea to maintain a high-level workflow that programmatically describes the triggers, functions, and outputs. We use a combination of git-backed repositories that contain JSON files along with Confluence pages that have embedded documentation and links to functions.
   Today, some of our functions can be called via Slack commands that are triggered via an API Gateway. One, for example, is used to construct new GitHub repositories that have the correct naming standards, contributor file, code of conduct, labels (tags), and license file. An authorized user simply uses the slash command to answer a few questions (without needing any authority in GitHub) and a new, compliant repository is generated and handed over. This offers the ability to:

1. Pass along workflows that are complicated or security sensitive to authorized users in a ChatOps style.
2. Collaborate using a messaging platform where the team can see what is being done in real time and offer input, troubleshooting, or guidance.

The time for serverless is now
If you're on the fence about diving into serverless technologies, now is the time to make your move. Getting your hands on the workflows presented by serverless options such as AWS Lambda will put you ahead of the pack in the world of IT operations and is actually quite fun to learn, configure, and operate.

   Session: Serverless Patterns Made Simple with Real-World Use Cases
   Sheen Brisals
   Patterns are common in all walks of life, including in serverless computing! Software architectural patterns are often viewed as complex constructs that are beyond the grasp of many engineers. When mixed with cloud computing and serverless, the perception takes it even further. Textbook-style narratives of serverless patterns often fail to connect with engineers. Sharing a true serverless journey with the community brings authenticity and acts as a great learning platform. This is exactly what this talk aims to achieve. The Shopper and Consumer Technology team at LEGO have successfully migrated their legacy eCommerce platform onto a cloud-based serverless solution on AWS. This employed a number of technologies, best practices and of course serverless patterns that helped to accelerate the overall process. In this talk, I will touch upon the experience of the migration journey and focus mainly on the patterns that we came across and implemented. Each of those architectural patterns will be associated with a use case from our journey that everyone can easily relate to. What better way to equip the generation of serverless engineers to be efficient and innovative than sharing the knowledge?

   Chris Wahl is Chief Technologist at Cloud Data Management company Rubrik. He has acquired over two decades of IT experience in enterprise infrastructure design, service orchestration, and building policy-based automation. As an independent author of the award-winning Wahl Network blog and host of the Datanauts Podcast, Chris focuses on creating content that revolves around next-generation data centers, workflow automation, building operational excellence, and evangelizing solutions that benefit the technology community.


Laying the groundwork for big data

Building a data platform on Google Cloud Platform

At the moment, big data is very popular and there is a wide variety of products available for handling data. In this article, read a case study about how a German startup tackled its data problems and built a common data platform into its architecture. The data platform consists of four components: ingestion, storage, process, and provisioning.

by Claire Fautsch

Introduction
Joblift is a Germany-based startup. Its core business is a meta-search engine for job seekers in Germany, the Netherlands, France, the UK and, since 2018, the US. We aggregate jobs from various job boards, agencies and companies. Our aim is to provide job seekers a single page listing all jobs instead of having to browse multiple websites.
   Since its beginnings, Joblift has focused on the so-called arbitrage business, i.e., buying traffic via various marketing channels and forwarding it to its own customers. The profit is made from the difference in price. As this is a successfully running business right now, the time has come to take the platform to the next level.
   Our vision is to become the job board of the future. Instead of helping our users search for a job, we want to recommend them their perfect match. The foundation for this new line of business is data.

Challenges
In order to be able to provide the perfect matching job to a user, we need to gain as much knowledge and insight about them as possible. This means every move of a user on our platform needs to be tracked, analyzed, put into context and modeled into information.
   However, especially for new users, we do not have much insight yet. Thus, we want to predict their behaviour based on other users showing a similar user journey. In our case, this could be, for example, another user simply searching for the same job title. Or, in a more complex scenario, we could also consider users attracted via the same marketing campaigns.
   To solve this, we want to put data from different users in relation. One possibility to do this is to use machine learning algorithms. They can be used to predict possible relations with a certain level of confidence. Together with facts we already know about a user (e.g., "User clicked on Job XY" or "Users having clicked on Job XY also clicked on Job AB"), we can store those predictions in a knowledge graph ("User is likely to click on Job AB").
   The foundation of all this is big and fast data. The arising challenge is consequently how to ingest, store, process and provide the accumulated data.

Solution
To tackle this challenge, we decided to build a common data platform into our architecture. The data platform is a set of infrastructure components, tools and processes with the sole aim of collecting data, storing it, and providing it to all possible use cases in an efficient way.

www.serverless-architecture.io                            @ServerlessCon # SLA_con                                   13
WHITEPAPER                                            Serverless Architecture & Design

As Joblift's existing architecture is deployed on the Google Cloud Platform (GCP) [1], we decided to stay with GCP for our data platform as well. Google offers a wide range of managed products, and wherever fitting, we stuck to those managed products. For us, the big advantage is that once we decide to get started with a technology, we do not need to invest much time in setting up the infrastructure. The drawback is that we might not have all the flexibility we would have with self-managed products, and we have a vendor lock-in. In our case, however, velocity clearly wins over flexibility.
In those cases where we decided not to use a managed product, we deployed the respective components directly to our Kubernetes cluster, which also runs on GCP.

Architecture
We aimed to clearly separate the data platform from the use cases using it. Our data platform consists of four separate components, which we will outline in more detail below, namely:

• Ingestion
• Storage
• Processing
• Provisioning

Ingestion
In a first step, we need to get the data into the platform. We have a wide range of different data sources. They provide either continuous data (e.g., in the form of transactional events) or batched data (e.g., daily job updates from various job boards). Looking at the GCP managed products, the obvious choice for the ingestion component would have been Cloud Pub/Sub, GCP's managed solution for message and event ingestion.
However, on Pub/Sub it was at the time not possible to replay or re-read messages, which is a crucial feature for us (this has changed in the meantime). Also, Pub/Sub does not guarantee a certain ordering. For those reasons, and because we already had a fully set up Kafka cluster running on Kubernetes in GCP, we chose Kafka for ingesting streaming data.
By setting the message key properly, we have a guaranteed order within each partition, and thanks to the retention we are able to replay messages. Additionally, we can have several consumers for the same data without having to do fan-outs as on Pub/Sub (which would increase costs). If at some point we need the additional features Cloud Pub/Sub provides, we can take advantage of the available connectors to connect Kafka to Cloud Pub/Sub via Kafka Connect.
For getting batches of data into our system, we mainly use FTP or HTTP.
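To illustrate the keying mentioned above, here is a minimal sketch of a keyed producer using the confluent-kafka Python client. The broker address, topic name and payload are hypothetical; the point is simply that messages sharing a key land on the same partition and therefore keep their relative order.

  from confluent_kafka import Producer

  # Hypothetical broker and topic; the message key (here: the user ID)
  # determines the partition, so all events of one user keep their order.
  producer = Producer({'bootstrap.servers': 'kafka:9092'})

  def publish_event(user_id, payload):
      producer.produce('tracking-events', key=user_id, value=payload)

  publish_event('user-42', '{"type": "click", "jobId": "XY"}')
  producer.flush()  # block until all buffered messages have been delivered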
Storage
Once the data is in the system, we want to store it in its raw format. We opted here for a two-tier storage architecture. On one side, we have long-term storage, referred to as cold storage; data in the cold storage has unlimited retention but slower access times. On the other side, there is short-term storage, or hot storage, with limited retention but faster access times.
For the hot storage, we opted for Cloud Bigtable, GCP's managed NoSQL wide-column database service, which we use via the HBase API. The drawback of Bigtable/HBase, compared to Cassandra, for example, is that there is no possibility of a secondary index. We can live with this drawback: in case we need different row keys, we duplicate our data with the appropriate row keys. This also guarantees us fast access.
For the cold storage, we simply use Google Cloud Storage (GCS), which is designed for durable storage.
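As a rough illustration of the row-key duplication described above, the following sketch writes the same event under two row keys (one user-centric, one job-centric) using the google-cloud-bigtable Python client. The project, instance, table, column family and key layouts are invented for the example.

  from google.cloud import bigtable

  client = bigtable.Client(project='my-gcp-project')
  table = client.instance('hot-storage').table('events')

  # The same click event is stored twice, under a user-centric and a
  # job-centric key, so both access patterns remain a single-row lookup.
  for row_key in (b'user#42#2020-01-01T12:00:00', b'job#XY#2020-01-01T12:00:00#42'):
      row = table.direct_row(row_key)
      row.set_cell('event', b'type', b'click')
      row.set_cell('event', b'job_id', b'XY')
      row.commit()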

Data processing
For the data processing that happens inside our data platform, we have various approaches.
For streaming data, we mostly rely on Java microservices using Kafka streaming technology for input and output. This gives us a maximum of flexibility, paired with high scalability and low complexity. Using GitLab CI/CD pipelines, we build Docker images and deploy them to our GCP-hosted Kubernetes cluster using Helm charts. Adhering to the microservice principle and using asynchronous communication via Kafka, we can ensure a high level of continuous integration and deployment. New requirements can thus be implemented at an appropriate speed.
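The pattern behind these services is a simple consume-transform-produce loop. Our services are written in Java with Kafka's streaming APIs; purely as an illustration of the pattern, here is a minimal Python sketch with hypothetical topic names, again using the confluent-kafka client.

  from confluent_kafka import Consumer, Producer

  consumer = Consumer({'bootstrap.servers': 'kafka:9092',
                       'group.id': 'event-enricher',
                       'auto.offset.reset': 'earliest'})
  producer = Producer({'bootstrap.servers': 'kafka:9092'})
  consumer.subscribe(['tracking-events'])

  while True:
      msg = consumer.poll(1.0)
      if msg is None or msg.error():
          continue
      # Stand-in for the real enrichment/transformation logic
      enriched = msg.value().decode('utf-8').upper()
      producer.produce('tracking-events-enriched', key=msg.key(), value=enriched)
      producer.poll(0)  # serve delivery callbacks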
For modeling our more advanced ETL data pipelines, we opted for Apache Airflow, or rather its GCP-managed version, Cloud Composer. It allows us to define and schedule pipelines in the form of a directed acyclic graph (DAG). One of the main advantages of using Cloud Composer is that it integrates neatly with other GCP products, such as GCS: the DAGs, defined in Python, can simply be uploaded to GCS, where they are automatically picked up by Cloud Composer's scheduling component.
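A minimal DAG of the kind described here could look as follows; the DAG ID, schedule and tasks are placeholders, and the file would simply be uploaded to the Composer environment's GCS bucket.

  from datetime import datetime
  from airflow import DAG
  from airflow.operators.bash_operator import BashOperator

  default_args = {'owner': 'data-platform', 'start_date': datetime(2020, 1, 1)}

  with DAG('daily_job_etl', default_args=default_args,
           schedule_interval='@daily', catchup=False) as dag:
      extract = BashOperator(task_id='extract_raw_events',
                             bash_command='echo "extract from cold storage"')
      transform = BashOperator(task_id='transform_events',
                               bash_command='echo "transform"')
      load = BashOperator(task_id='load_to_warehouse',
                          bash_command='echo "load"')

      # Pipeline order: extract -> transform -> load
      extract >> transform >> load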
  Session: Serverless Integration Architectures
  Samuel Vandecasteele
  Serverless is changing the way we architect our cloud environments. It leverages all the advantages of the cloud without the operational overhead. By gradually moving towards more serverless paradigms, big steps can be taken in offering more reliable, scalable and cost-efficient IT services. With FaaS (function as a service), cloud-native messaging and serverless API management, the major building blocks for a new generation of enterprise integration architectures are available, making integration use cases ideal candidates as early adopters of this serverless movement. In this session, we go deeper into how serverless is changing the integration landscape. We cover serverless integration architectures, investigate the ecosystem of available tools and frameworks, cover best practices and answer questions like: "Are the famous enterprise integration patterns still relevant? Will FaaS replace my ESB? How to choose between iPaaS and FaaS? How do different public vendors and ecosystems relate to one another?"

Data provisioning
Gathering all that data is well and good, but it is worthless if we are not able to provide our use cases with the right data in an efficient way. Consequently, the aim of our data platform is not only to gather and store data, but also to provide the correct data in the right way.
For providing streaming access to our data, we again opted for Kafka. Combined with Avro and the Kafka Schema Registry, this gives us a clearly defined interface contract, including proper versioning.
For batch access, the provisioning depends on the use case and is optimized for that respective use case.
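To sketch what such a versioned contract can look like, here is an example using the Avro producer from the confluent-kafka Python package; the schema, topic and registry URL are made up, and the Schema Registry can then enforce compatibility rules as the schema evolves.

  from confluent_kafka import avro
  from confluent_kafka.avro import AvroProducer

  # Hypothetical value schema registered under the topic's subject
  value_schema = avro.loads("""
  {
    "type": "record",
    "name": "ClickEvent",
    "fields": [
      {"name": "userId", "type": "string"},
      {"name": "jobId",  "type": "string"}
    ]
  }
  """)

  producer = AvroProducer({'bootstrap.servers': 'kafka:9092',
                           'schema.registry.url': 'http://schema-registry:8081'},
                          default_value_schema=value_schema)
  producer.produce(topic='click-events', value={'userId': '42', 'jobId': 'XY'})
  producer.flush()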
Use cases
Having all this data available and being able to provide it, we now want to use it to fulfill our business needs. In the following sections, we outline several of our use cases.

Analytics
Before even tackling our new vision, one of the first use cases for our data is analytics. To provide our business analysts with an easy way of creating business reports on the one hand, and ad-hoc analyses on the other, we chose to build an analytics data warehouse. In the landing zone of our data warehouse, we store relevant raw and pre-aggregated data, which can then be used for reporting and analysis.
We opted for Google BigQuery for our data warehouse. It is fully managed and can be queried using simple SQL. Additionally, with its BI Engine it comes with all that is needed to build reports and dashboards. On top of BigQuery, we use Google Data Studio to create interactive dashboards that support our business and salespeople in their daily decision making.
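As an illustration, a report query against a hypothetical landing-zone table can be run directly from Python with the google-cloud-bigquery client; the dataset, table and column names below are invented for the example.

  from google.cloud import bigquery

  client = bigquery.Client()  # project and credentials come from the environment

  query = """
      SELECT job_title, COUNT(*) AS clicks
      FROM `my-project.analytics_landing.click_events`
      WHERE DATE(event_time) = CURRENT_DATE()
      GROUP BY job_title
      ORDER BY clicks DESC
      LIMIT 10
  """
  results = client.query(query).result()  # run the query and wait for the rows
  for row in results:
      print(row.job_title, row.clicks)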
                                                                         Functions, Lambda, Event Grid, Simple
Knowledge graph                                                 Workflow Service or Logic Apps, which one should
In view of our vision, our goal is to provide the best          I choose? Should I go for microservices? Event-
matching job to every user. The foundation of Joblift           driven? Lambda architecture? Deploy on server-
                                                                less? Containers? Modern Compute? Let’s put a bit
is a search engine. Users can search for jobs. The cur-
                                                                of order in all that. Enter the Modern Architecture,
rent flow is a user coming to our page and searching
                                                                the foundation for the new wave of cloud services
either for any job in a given city (location search) or for     and much more. This session will be focused on
a specific job (expert search). The executed search that is     application and infrastructure architecture, with
based on the keywords entered in the search mask. Un-           live examples based on the cloud, perspectives and
fortunately, the results provided by such a simple search       roadmap of the corresponding services at Amazon
often lack some relevance. Thus, we want to enhance             and Microsoft.
the results in order to provide the best matching job.
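Queries against JanusGraph are typically written as Gremlin traversals. The sketch below, using the gremlinpython driver, shows the general idea of a "users who clicked the same jobs" traversal; the endpoint, labels and property names are assumptions, not our actual schema.

  from gremlin_python.process.anonymous_traversal import traversal
  from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

  g = traversal().withRemote(
      DriverRemoteConnection('ws://janusgraph:8182/gremlin', 'g'))

  # Jobs clicked by users who clicked the same jobs as user 42
  recommendations = (g.V().has('user', 'userId', '42')
                      .out('clicked')    # jobs user 42 clicked
                      .in_('clicked')    # other users who clicked those jobs
                      .out('clicked')    # jobs those users clicked
                      .dedup()
                      .values('jobId')
                      .limit(10)
                      .toList())
  print(recommendations)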

Machine learning
To fill our knowledge graph not only with facts but also with probable relations ("user is likely to click on job XY"), we want to make use of machine learning algorithms. We want to predict a user's journey based on the knowledge we have from other users. Another use for machine learning algorithms is the clustering and categorization of our jobs. Machine learning algorithms need data, ideally lots of historic data, and this data is provided via the cold storage described previously.

  Session: Modern Architecture in the Cloud of 2020
  Marius Zaharia
  Today, the large public clouds – Azure and AWS – deploy a diversity of services and features at high speed. Between Azure Functions, Lambda, Event Grid, Simple Workflow Service or Logic Apps, which one should I choose? Should I go for microservices? Event-driven? Lambda architecture? Deploy on serverless? Containers? Modern compute? Let's put a bit of order in all that. Enter Modern Architecture, the foundation for the new wave of cloud services and much more. This session will focus on application and infrastructure architecture, with live examples based on the cloud, and on the perspectives and roadmap of the corresponding services at Amazon and Microsoft.

Takeaways
We have successfully laid the groundwork to operate a big and fast data platform, and while it clearly showed its dependability and the proof of concept was successful, we also had some key learnings.
Big data being very popular at the moment, there is a wide range of open and closed source, managed and unmanaged products available out there. Faced with such an overwhelming amount of choice, we learned that sometimes you just have to take a decision on which product to use. After some time of evaluation, we decided that for the sake of velocity we would use GCP managed products wherever possible. Those products are optimized for a dedicated purpose and fulfill that purpose with flying colors. Some of them are simply the managed version of popular open source products, such as Cloud Composer being the managed version of Apache Airflow. Also, still being a startup, it is quite a relief not to have the additional effort of operating a self-managed component.
Right with that decision came the second learning: managed products often feel like a black box. While for self-deployed services you have all the liberty of configuration and full insight, a managed product usually comes with a limited set of configurable parameters.
For example, with Cloud Composer we had the problem that its logs get fed into Stackdriver (GCP's monitoring solution). For our existing microservices, however, we use the complete ELK stack for storing log files, and Prometheus and Grafana for monitoring and alerting. While Stackdriver integrates with Grafana, it just doesn't run as smoothly as our existing setup. Especially when analyzing issues, the Stackdriver setup proves less flexible for us.
Finally, an additional learning is to fail fast. Documentation is often sparse, and every use case is different. Despite all theoretical knowledge, it can still happen that your chosen solution just doesn't work out in the end, for reasons you simply weren't aware of or didn't consider.
Thus, we opted for an agile development process, starting with quickly building prototypes to prove feasibility (or the contrary). When something doesn't work out, don't try by all means to stick to the initial idea just because it reads great on paper. Learn from what you did and move on with an improved solution that tackles the encountered issues. Even if this sometimes feels like throwing two weeks of work away, it turns out to be far more efficient, and in the end there is a stable and dependable solution.

Conclusion and outlook
The combination of managed and self-deployed services has so far proven to be a good choice for our data platform. Having proven its dependability with the first set of use cases, the next steps are now to implement further use cases. Additionally, we would like to set up metadata management to get full data lineage of our data and gain even more insights. Here we could imagine another one of Google's products coming into play, namely Data Catalog.

Dr. Claire Fautsch is an Engineering Manager at Joblift GmbH, where she leads a team of backend developers while still enjoying hands-on coding time as well. Previously she worked at Goodgame Studios as a Java developer and as an IT consultant. Dr. Fautsch obtained her PhD in Computer Science on the topic of Information Retrieval, as well as her bachelor's and master's degrees in mathematics, at the University of Neuchâtel (Switzerland). She enjoys exploring new technologies and never says no to a nice challenge.

Links & Literature
[1] https://cloud.google.com
WHITEPAPER                                           Serverless Architecture & Design

Strategies for big data migration

Migrating big data workloads to Azure HDInsight – Smoothing the path to the cloud with a plan
Migrating to the cloud isn't the easiest of tasks; however, you can limit its complexity. Smooth the path for migrating big data to the cloud with a step-by-step plan. Learn the right questions to ask before migrating big data workloads to Azure HDInsight in order to ensure an error-free migration.

by Shivnath Babu

Migrating big data workloads to the cloud remains a key priority as well as a challenge for business leaders. Many are looking to AI and predictive analytics to increase performance and throughput and to reduce application, data, and processing costs as a way out of the complexities of the big data operations landscape.
Planning is key, and there are some sensible questions to ask to ensure the planning phase runs smoothly and sets the project up for success. The organisation must understand its current environment, determine high-priority applications to migrate, and set a performance baseline to be able to measure and compare on-premises clusters versus Azure HDInsight clusters.

• What does my current on-premises cluster look like, and how does it perform?
• How much disk, compute, and memory am I using today?
• Which of my workloads are best suited for migration to the cloud?
• What are my HDInsight resource requirements?
• Should I use manual scaling or auto-scaling HDInsight clusters, and with what VM sizes?
Overall, organisations that understand that the path to the cloud isn't paved with rainbows know they need to reduce the complexity of delivering reliable application performance when migrating data from on-premises or from a different cloud platform onto HDInsight. Application Performance Management (APM) solutions play a vital role here, bringing a host of services that provide unified visibility and operational intelligence to plan and optimise the migration process. It is strongly recommended to make use of such solutions in order not to run into the common challenges that crop up time and again.
An APM will automate and optimise some of these major areas to simplify the overall project:

• Identify the current big data landscape and platforms for baselining performance and usage
• Make use of AI and predictive analytics to increase performance and throughput and to reduce application, data, and processing costs in an elastic cloud environment
• Automatically size cluster nodes and tune configurations for the best throughput for big data workloads
• Find, tier, and optimise storage choices in HDInsight for hot, warm, and cold data

An organisation must understand its current environment, determine high-priority applications to migrate, and set a performance baseline to be able to measure and compare its on-premises clusters versus its Azure HDInsight clusters.

In the on-premises environment
• What does my current on-premises cluster look like, and how does it perform?
• How much disk, compute, and memory am I using today?
• Who is using it, and what apps are they running?
• Which of my workloads are best suited for migration to the cloud?
• Which big data services (Spark, Hadoop, Kafka, etc.) are installed?
• Which datasets should I migrate?

Azure HDInsight environment
• What are my HDInsight resource requirements?
• How do my on-premises resource requirements map to HDInsight?
• How much and what type of storage would I need on HDInsight, and how will my storage requirements evolve over time?
• Would I be able to meet my current SLAs or better them once I've migrated to HDInsight?
• Should I use manual scaling or auto-scaling HDInsight clusters, and with what VM sizes?
  Session: Event-Driven Serverless Microservices in Azure
  Rainer Stropek
  With Azure Function Apps, Microsoft has been supporting serverless microservices for quite a while. In this session, Rainer Stropek focuses on the current version of Microsoft's Functions SDK and demonstrates how it integrates with the rest of the Azure platform to design and implement event-driven software. This will be a demo-heavy session (C#, .NET Core) with only a few slides.

Baselining on-premises performance and resource usage
To effectively migrate big data pipelines from physical to virtual data centres, one needs to understand the dynamics of the on-premises workloads: usage patterns, resource consumption, dependencies and a host of other factors.
It is vital to get detailed reports on the on-premises clusters, including total memory, disk, number of hosts, and number of cores used. A cluster discovery report also delivers insights on cluster topology, running services, operating system version and more. Resource usage heatmaps can be used to determine any unique needs for Azure.
It is also key to gain app usage insights from cluster workload analytics and data insights. When the business can highlight application workload seasonality by user, department, application type, etc., it helps calibrate and make the best use of Azure resources. This type of reporting can greatly aid HDInsight cluster design choices (size, scale, storage, scalability options, etc.) to maximise the ROI on Azure spending.
Don't neglect the search for the best cloud storage strategy: look at specific metrics on the usage patterns of tables and partitions in the on-premises cluster.
Next, consider identifying unused or 'cold' data. Once identified, one can decide on the appropriate layout for the data in the cloud and make the best use of the Azure budget. Based on this information, one can distribute datasets most effectively across HDInsight storage options.

Data migration

Migrate on-premises data to Azure
There are two main options to migrate data from on-premises to Azure:

1. Transferring data over the network with TLS (see the sketch below)
2. Shipping data offline
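For option 1, tools such as Hadoop DistCp or AzCopy are commonly used. Purely as a minimal illustration of a network transfer (which runs over HTTPS/TLS), the following Python sketch uploads one file with the azure-storage-blob client; the storage account connection string, container and paths are placeholders.

  from azure.storage.blob import BlobServiceClient

  service = BlobServiceClient.from_connection_string('<storage-connection-string>')
  blob = service.get_blob_client(container='hdinsight-landing',
                                 blob='raw/events/2020-01-01.csv')

  with open('/data/exports/events-2020-01-01.csv', 'rb') as source:
      blob.upload_blob(source, overwrite=True)  # transferred over HTTPS (TLS)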
