Netflix: Building Up and Scaling Out on Open Source

Page created by Jessica Rivera
 

                                © Black Duck 2013
Presenters

                        Adrian Cockcroft is the director of architecture for the Cloud Systems team at Netflix. He is
                        focused on availability, resilience, performance, and measurement of the Netflix cloud
                        platform, and has presented at many conferences, including QCon San Francisco, Beijing
                        and Tokyo. Adrian is also well known as the author of several books written while he was a
                        Distinguished Engineer at Sun Microsystems: Sun Performance and Tuning; Resource
                        Management; and Capacity Planning for Web Services.
                        From 2004-2007 he was a founding member of eBay Research Labs. He graduated with a
                        BSc in Applied Physics from The City University, London.

    Andrew Aitken - Founder and GM of Olliance Consulting, the leading open source business
    and strategy consultancy and a division of Black Duck. With 15+ years of industry
    experience, Andrew is a recognized expert on strategies for FOSS commercialization and a
    leader in the open source community. He is the founder of the industry’s only “think tank” on
    the future of commercial open source, a bi-annual event held in Napa, CA and Paris, France,
    and regularly attended by the leading CEOs and visionaries. He has served as an expert
    witness on the issues of open source and been an invited guest lecturer at Stanford’s
    Entrepreneur program. Andrew has chaired and spoken internationally at multiple industry
    conferences, sits on the Board of Advisors of SugarCRM, DotNetNuke, and Funambol, and
    has personally worked with companies such as IBM, Microsoft, Intel and the U.S. Navy.

Olliance Consulting, a division of Black Duck

Open Source Strategy: Our Experience, Your Success
    The world’s leading organizations turn to Olliance Consulting to create
    and implement open source strategies to achieve business success.
    With more than a decade of experience and hundreds of engagements
    assisting companies ranging from start-ups to the world’s largest
    corporations, Olliance creates innovative strategies to leverage the
    strategic, financial and technological advantages of open source
    software and methods.
Profile
      – Open Source Software Industry’s leading business consultancy
      – Over 700 engagements to date
      – Trusted Advisor to leading Fortune 2000 companies

Open Source Think Tank

     The Open Source Think Tank is an invitation-only conference for 140 CEOs, CIOs, CTOs,
     legal experts, investors and other senior executives engaged in open source software. It is an
     annual event held in Napa, CA, regularly attended by the industry’s leading CEOs and
     visionaries.

                                Visit osthinktank.com
Software is Eating the World
      Marc Andreessen – 2011

Cloud Native Open Source at
           Netflix
               June 2013
            Adrian Cockcroft
      @adrianco #netflixcloud @NetflixOSS
   http://www.linkedin.com/in/adriancockcroft
Cloud Native

NetflixOSS – Cloud Native On-Ramp

Netflix Open Source Cloud Prize
We are Engineers

      We solve hard problems
We build amazing and complex things
   We fix things when they break
We strive for perfection

        Perfect code
      Perfect hardware
     Perfectly operated
But perfection takes too long…

         So we compromise
      Time to market vs. Quality
     Utopia remains out of reach
Where time to market wins big

             Web services
     Agile infrastructure - cloud
      Continuous deployment
How Soon?

   Code features in days instead of months
    Hardware in minutes instead of weeks
Incident response in seconds instead of hours
Tipping the Balance

  Utopia   Dystopia
A new engineering challenge

 Construct a highly agile and highly
available service from ephemeral and
      often broken components
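
A rough way to see why this challenge is tractable: if each instance is up independently with probability a, then at least one of n replicas is up with probability 1 - (1 - a)^n. The sketch below is only an illustration; the independence assumption, the function name and the 95% figure are assumptions, not numbers from the talk.

```python
def combined_availability(per_instance: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent instances is up."""
    if not 0.0 <= per_instance <= 1.0:
        raise ValueError("per_instance must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # All replicas are down with probability (1 - a)^n; take the complement.
    return 1.0 - (1.0 - per_instance) ** replicas


if __name__ == "__main__":
    # Hypothetical figure: each instance is up 95% of the time.
    for n in (1, 2, 3):
        print(n, round(combined_availability(0.95, n), 6))
```

Real failures are often correlated (for example, a whole zone going down), so this gives an upper bound on what redundancy alone buys.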
Inspiration
Netflix Streaming

A Cloud Native Application
Netflix Member Web Site Home Page
    Personalization Driven – How Does It Work?
How Netflix Streaming Works
Consumer Electronics
    Customer Device (PC, PS3, TV…)
AWS Cloud Services
    Web Site or Discovery API: User Data, Personalization
    Streaming API: DRM, QoS Logging
CDN Edge Locations
    OpenConnect CDN Boxes: CDN Management and Steering, Content Encoding
Content Delivery Service
Open Source Hardware Design + FreeBSD, bird, nginx
Nov 2012 Streaming Bandwidth: 18x
March 2013 Mean Bandwidth: +39% 6mo
25x     Amazon Video 1.31%
Real Web Server Dependencies Flow
         (Netflix Home page business transaction as seen by AppDynamics)

Each icon is three to a few hundred instances across three AWS zones.
Diagram labels: Start Here; Web service; Cassandra; memcached; S3 bucket;
Personalization movie group choosers (for US, Canada and Latam)
New Anti-Fragile Patterns
Micro-services and Chaos engines
Highly available systems composed
   from ephemeral components
     Open Source is the default
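
To make the "chaos engine" idea concrete, here is a minimal sketch of a routine that picks a random instance to terminate so that services are continually exercised against instance loss. It is not the code of any NetflixOSS tool: the function names, the in-memory `groups` dictionary and the rule of never killing the last instance in a group are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional


def pick_victim(groups: Dict[str, List[str]], rng: random.Random,
                probability: float = 1.0) -> Optional[str]:
    """Pick one instance id to terminate, or None if nothing is chosen this round."""
    # Assumption: only groups with spare capacity (more than one instance) are eligible.
    candidates = sorted(g for g, instances in groups.items() if len(instances) > 1)
    if not candidates or rng.random() >= probability:
        return None
    group = rng.choice(candidates)
    return rng.choice(groups[group])


def terminate(groups: Dict[str, List[str]], instance_id: str) -> bool:
    """Remove the instance from its group (stand-in for a real cloud API call)."""
    for instances in groups.values():
        if instance_id in instances:
            instances.remove(instance_id)
            return True
    return False


if __name__ == "__main__":
    fleet = {"api": ["i-1", "i-2", "i-3"], "web": ["i-9"]}  # hypothetical ids
    victim = pick_victim(fleet, random.Random())
    if victim is not None:
        terminate(fleet, victim)
    print(victim, fleet)
```

In a real deployment the autoscaling layer would replace the terminated instance, which is what turns random failure into a routine, tested event.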
Cloud Native

Master copies of data are cloud resident
 Everything is dynamically provisioned
      All services are ephemeral
How to get to Cloud Native

   Freedom and Responsibility for Developers
    Decentralize and Automate Ops Activities
Integrate DevOps into the Business Organization
Netflix BusDevOps Organization
                                   Chief Product Officer

                                VP Product     VP UI           VP Discovery    VP Platform
                                Management     Engineering     Engineering

                                Directors      Directors       Directors       Directors
                                Product        Development     Development     Platform

Code, independently updated                    Developers +    Developers +    Developers +
continuous delivery                            DevOps          DevOps          DevOps

Denormalized, independently                    UI Data         Discovery       Platform
updated and scaled data                        Sources         Data Sources    Data Sources

Cloud, independently updated                   AWS             AWS             AWS
and scaled infrastructure
Four Transitions
• Management: Integrated Roles in a Single Organization
   – Business, Development, Operations -> BusDevOps

• Developers: Denormalized Data – NoSQL
   – Decentralized, scalable, available, polyglot

• Responsibility from Ops to Dev: Continuous Delivery
   – Decentralized small daily production updates

• Responsibility from Ops to Dev: Agile Infrastructure - Cloud
   – Hardware in minutes, provisioned directly by developers
Cost reduction -> Slow down developers -> Less competitive -> Less revenue -> Lower margins

Process reduction -> Speed up developers -> More competitive -> More revenue -> Higher margins

                                        What’s Different?

                           Get out of the way of innovation
                         Best of breed, provisioned by the hour
                          Choices based on features and scale
                           Almost everything is Open Source
Decentralized Deployment
Asgard
http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
Ephemeral Instances
  • Largest services are autoscaled
  • Average lifetime of an instance is 36 hours
  Chart labels: Push, Autoscale Up, Autoscale Down
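
As an illustration of what "autoscaled" can mean in practice, the sketch below computes a desired instance count from the current per-instance load and a target load, clamped to a minimum and maximum group size. This is a generic proportional rule written for this example, not the AWS Auto Scaling algorithm or Netflix's policy; the function name, parameters and numbers are assumptions.

```python
import math


def desired_capacity(current: int, load_per_instance: float, target_load: float,
                     minimum: int, maximum: int) -> int:
    """Instance count that would bring per-instance load back to the target."""
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    # Total load is current * load_per_instance; divide by the target per instance.
    wanted = math.ceil(current * load_per_instance / target_load)
    # Never scale below the floor or above the ceiling of the group.
    return max(minimum, min(maximum, wanted))


if __name__ == "__main__":
    # Hypothetical group: 10 instances, each handling 90 requests/s, target 60.
    print(desired_capacity(10, 90.0, 60.0, minimum=3, maximum=100))
```

Scaling up when load rises and back down when it falls is one reason instances in such a group tend to be short-lived.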
Global Deployment
Cross Region Use Cases
• Geographic Isolation
  – US to Europe replication of subscriber data
  – Read intensive, low update rate
  – Production use since late 2011

• Redundancy for regional failover
  – US East to US West replication of everything
  – Includes write intensive data, high update rate
  – Testing now
Managing Multi-Region Availability

DNS providers: AWS Route53, DynECT DNS, UltraDNS (managed through Denominator)

Two regions, each with:
  • Regional Load Balancers
  • Zone A, Zone B, Zone C
  • Cassandra Replicas in every zone

                     Denominator – manage traffic via multiple DNS providers
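
The idea of steering traffic through several DNS providers behind one interface can be sketched as follows. This is not Denominator's API (Denominator is a Java library); `DnsProvider`, `InMemoryProvider`, `steer_traffic` and the host names are hypothetical, chosen only to show the shape of the abstraction.

```python
from typing import Dict, List, Protocol


class DnsProvider(Protocol):
    """Minimal interface that every provider adapter is assumed to offer."""
    name: str

    def set_cname(self, record: str, target: str) -> None: ...

    def get_cname(self, record: str) -> str: ...


class InMemoryProvider:
    """Stand-in for a real provider client; keeps records in a dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._records: Dict[str, str] = {}

    def set_cname(self, record: str, target: str) -> None:
        self._records[record] = target

    def get_cname(self, record: str) -> str:
        return self._records[record]


def steer_traffic(providers: List[DnsProvider], record: str, target: str) -> List[str]:
    """Point `record` at `target` on every provider; return the providers updated."""
    updated = []
    for provider in providers:
        provider.set_cname(record, target)
        updated.append(provider.name)
    return updated


if __name__ == "__main__":
    providers = [InMemoryProvider("provider-a"), InMemoryProvider("provider-b")]
    print(steer_traffic(providers, "api.example.com", "us-west-2.lb.example.com"))
```

Keeping the same record on more than one provider means a regional failover does not depend on any single DNS vendor being reachable.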
Benchmarking Global Cassandra
                 Write intensive test of cross region capacity
                16 x hi1.4xlarge SSD nodes per zone = 96 total
Diagram labels: Test Load, Validation Load
  • 1 Million writes CL.ONE
  • 1 Million reads CL.ONE with no data loss

Regions: US-West-2 Region - Oregon, US-East-1 Region - Virginia
Each region: Zone A, Zone B, Zone C, each with Cassandra Replicas

Inter-Zone Traffic
Inter-Region Traffic: up to 9Gbits/s, 83ms
18TB S3
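
The node count on this slide follows from the topology: 16 nodes per zone, three zones per region, two regions. A small check, with constant names invented for this example:

```python
NODES_PER_ZONE = 16     # hi1.4xlarge SSD nodes per zone, from the slide
ZONES_PER_REGION = 3    # Zone A, Zone B, Zone C
REGIONS = 2             # US-West-2 (Oregon) and US-East-1 (Virginia)


def total_nodes(nodes_per_zone: int, zones_per_region: int, regions: int) -> int:
    """Total cluster size for a symmetric multi-region layout."""
    return nodes_per_zone * zones_per_region * regions


if __name__ == "__main__":
    print(total_nodes(NODES_PER_ZONE, ZONES_PER_REGION, REGIONS))
```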
Cloud Native Big Data
Netflix Dataoven
Data Pipelines
  • From cloud services: ~100 Billion Events/day (Ursula)
  • From C*: Terabytes of Dimension data (Aegisthus)
Data Warehouse: Over 2 Petabytes
Hadoop Clusters – AWS EMR: 1300 nodes, 800 nodes, Multiple 150 nodes Nightly
Other diagram labels: RDS Metadata, Gateways, Tools
A Cloud Native Open Source Platform
Beware of Geeks Bearing Gifts: Strategies for an
         Increasingly Open Economy
      Simon Wardley - Researcher at the Leading Edge Forum
How did Netflix get ahead?
Netflix BusDevOps Org            Traditional IT Operations
• Doing it since 2009            • Taking their time
• SaaS Applications              • Pilot private cloud projects
• PaaS for agility               • Beta quality installations
• Public IaaS for AWS features   • Small scale
• Big data in the cloud          • Integrating several vendors
• Integrating many APIs          • Paying big $ for software
• FOSS from github               • Paying big $ for consulting
• Renting hardware for 1hr       • Buying hardware for 3yrs
• Coding in Java/Groovy/Scala    • Hacking at scripts
Netflix Platform Evolution

  2009-2010               2011-2012                 2013-2014

Bleeding Edge             Common                     Shared
  Innovation               Pattern                   Pattern

          Netflix ended up several years ahead of the
          industry, but it’s becoming commoditized now
Making it easy to follow
Exploring the wild west each time vs. laying down a shared route
Goals
  • Establish our solutions as Best Practices / Standards
  • Hire, Retain and Engage Top Engineers
  • Build up Netflix Technology Brand
  • Benefit from a shared ecosystem
How does it all fit together?
Example Application – RSS Reader

Zuul Traffic Processing and Routing
Zuul Architecture
http://techblog.netflix.com/2013/06/announcing-zuul-edge-service-in-cloud.html
Zuul Components
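
Zuul processes each request through filters grouped into phases (pre, routing and post). The sketch below shows that pattern in a simplified form; it is not Zuul's API (Zuul filters are written for the JVM), and the filter functions, the context keys and the origin names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
# A filter is (phase, order, function); lower order runs first within a phase.
Filter = Tuple[str, int, Callable[[Context], None]]

PHASES = ("pre", "route", "post")


def run_filters(filters: List[Filter], ctx: Context) -> Context:
    """Run every filter phase by phase, ordered by its order value within a phase."""
    for phase in PHASES:
        in_phase = sorted((f for f in filters if f[0] == phase), key=lambda f: f[1])
        for _, _, fn in in_phase:
            fn(ctx)
    return ctx


def add_request_id(ctx: Context) -> None:
    ctx["request_id"] = "req-1"  # hypothetical: a real filter would generate an id


def choose_origin(ctx: Context) -> None:
    # Hypothetical routing rule: send flagged requests to a canary cluster.
    ctx["origin"] = "api-canary" if ctx.get("canary") else "api-main"


def add_origin_header(ctx: Context) -> None:
    ctx["response_headers"] = {"X-Origin": ctx["origin"]}


if __name__ == "__main__":
    filters = [("post", 10, add_origin_header),
               ("route", 10, choose_origin),
               ("pre", 10, add_request_id)]
    print(run_filters(filters, {"canary": True}))
```

Because the phases fix the execution order, filters can be registered in any order and updated independently of one another.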
What’s Coming Next?

  • Better portability
  • Higher availability
  • Easier to deploy
  • Contributions from end users
  • Contributions from vendors

More Features
More Use Cases
Vendor Driven Portability
     Interest in using NetflixOSS for Enterprise Private Clouds

  • “It’s done when it runs Asgard”
     – Functionally complete
     – Demonstrated March
     – Released June in V3.3

  • Growing vendor interest
     – OpenStack “Heat” getting there

  • Some vendor interest
     – Needs AWS compatible Autoscaler

              Another very large vendor planning to
              demo NetflixOSS at July 17th Meetup
AWS 2009
Baseline features needed to support NetflixOSS

            Eucalyptus 3.3
Boosting the @NetflixOSS Ecosystem
Judges

  • Aino Corry – Program Chair for QCon/GOTO
  • Martin Fowler – Chief Scientist Thoughtworks
  • Simon Wardley – Strategist
  • Werner Vogels – CTO Amazon
  • Joe Weinman – SVP Telx, Author “Cloudonomics”
  • Yury Izrailevsky – VP Cloud Netflix
Timeline
  • Github: Registration Opened March 13
  • Github: Apache Licensed Contributions
  • Github: Close Entries September 15
  • AWS Re:Invent: Award Ceremony Dinner, November

Judging
  • Entrants
  • Netflix Engineering: Conforms to Rules, Working Code, Community Traction
  • Nominations, Categories
  • Six Judges
  • Winners

Prizes
  • Ten Prize Categories
  • $10K cash
  • $5K AWS
  • Trophy
  • AWS Re:Invent Tickets
Functionality and scale now, portability coming

   Moving from parts to a platform in 2013

 Netflix is fostering a cloud native ecosystem

      Rapid Evolution - Low MTBIAMSH
      (Mean Time Between Idea And Making Stuff Happen)
Slideshare NetflixOSS Details
•   Lightning Talks Feb S1E1
     – http://www.slideshare.net/RuslanMeshenberg/netflixoss-open-house-lightning-talks

•   Asgard In Depth Feb S1E1
     – http://www.slideshare.net/joesondow/asgard-overview-from-netflix-oss-open-house

•   Lightning Talks March S1E2
     – http://www.slideshare.net/RuslanMeshenberg/netflixoss-meetup-lightning-talks-and-roadmap

•   Security Architecture
     – http://www.slideshare.net/jason_chan/

•   Cost Aware Cloud Architectures – with Jinesh Varia of AWS
     – http://www.slideshare.net/AmazonWebServices/building-costaware-architectures-jinesh-varia-aws-and-adrian-cockroft-netflix
Takeaway

NetflixOSS makes it easier for everyone to become Cloud Native

  Open Source is not just the default, it’s a strategic weapon

                   @adrianco #netflixcloud @NetflixOSS
Q&A

Amazon Cloud Terminology Reference
     See http://aws.amazon.com/ (this is not a full list of Amazon Web Services features)

•    AWS – Amazon Web Services (common name for Amazon cloud)
•    AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
•    EC2 – Elastic Compute Cloud
      –   Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
      –   Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
      –   Reserved Instances – pre-paid to reduce cost for long term usage
      –   Availability Zone – datacenter with own power and cooling hosting cloud instances
      –   Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
•    ASG – Auto Scaling Group (instances booting from the same AMI)
•    S3 – Simple Storage Service (http access)
•    EBS – Elastic Block Storage (network disk filesystem can be mounted on an instance)
•    RDS – Relational Database Service (managed MySQL master and slaves)
•    DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)
•    SQS – Simple Queue Service (http based message queue)
•    SNS – Simple Notification Service (http and email based topics and messages)
•    EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
•    ELB – Elastic Load Balancer
•    EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
•    VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)
•    DirectConnect – secure pipe from AWS VPC to external datacenter
•    IAM – Identity and Access Management (fine-grained, role-based security keys)