Automated transformation of ETL, data warehouse, and analytics to Snowflake - Address key challenges and move your data warehouse to the cloud ...

Page created by Lucille Cooper
 
WHITE PAPER

Automated transformation
of ETL, data warehouse, and
analytics to Snowflake
Address key challenges and move your data warehouse
to the cloud
Enterprises are increasingly moving to next-generation cloud data warehouses to reduce
infrastructure administration overheads, achieve business agility, and enable
uncompromising simplicity. A cloud-based data warehouse like Snowflake provides a decoupled
architecture, eliminates the need for remodeling, and facilitates unified data across
hybrid sources. Enterprises moving to Snowflake get several benefits, including full SQL
support, serverless architecture, strong partnerships with BI and ETL tools, and ease
of maintenance.

However, enterprises face multiple challenges in dealing with code, business logic, and
analytics jobs while moving to Snowflake. For example, workloads must have the exact
target-native equivalent to match the production performance SLAs. To achieve this,
enterprises need to:

- Thoroughly assess the existing inventory of workloads
- Identify the chain of workloads to be moved
- Match the source and target data
- Convert scripts, business logic, reporting logic, etc.
- Validate the migrated logic before putting it into production

Automation toolsets can effectively deal with these intricacies to make the migration
seamless and risk-free.

Key considerations when migrating to Snowflake
There are several factors that enterprises need to consider when deciding their
migration path.

Business considerations:
• Avoid functional, operational, or end-user disruptions
• Reuse data, code, business logic, reports, database views, etc. from the legacy
  environment in the cloud
• Manage risk by planning for phased offload and comprehensive validation of migrated
  workloads
• Ensure a vendor-agnostic approach
• Assess existing inventory of workloads to decide the migration scale

Technical considerations:

• Visualize and identify what needs to be migrated in a phased manner
• Leverage automation for prescriptive recommendations, code transformation, and
  optimization
• Optimize poorly performing and resource-intensive workloads
• Identify technical debt in existing schema, code, etc.
• Identify complex interdependencies between the workloads and the future-state architecture
• Automate logic transformation to a target-native engine of your choice
• Run workloads on dual environments until the new environment and applications stabilize
• Accelerate decommissioning of legacy systems after the parallel run period

Key questions to ask:
• Which workloads can be migrated with minimal effort?
• How can we leverage our existing investments?
• What is the extent of automation possible?
• What is the level of risk and uncertainty involved?
• Which workloads should be migrated as-is, which can be optimized for performance,
  and which need a complete overhaul?

REUSE: Embrace what you already have
AUTOMATE: Leverage automation for faster time-to-value
OPTIMIZE: Meet performance SLAs on cloud
CERTIFY: Validate migrated workloads before putting them into production

A typical migration checklist

Migrate ‘as-is’ or ‘total re-engineering’?

Whether to move data and processes in one bulk operation or deploy a staged approach
depends on several factors. These include the nature of your current data analytics
platform, the types and number of data sources, and your future business plans.

What you need is an intelligent solution that helps you strike a fine balance between the
two approaches, attain agility and reliability, and make your existing workloads work best
in the new environment. This balance benefits the end-to-end data warehouse
modernization effort. It provides an opportunity to:
1. Migrate the already optimized workloads as-is
2. Fine-tune expensive, resource-intensive, and poorly performing workloads
3. Archive/destroy unimportant/unused workloads
4. Completely re-engineer workloads that contain poor logic
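
As a rough illustration, the four options above could be expressed as a simple triage rule. This is a minimal sketch, not the logic of any particular tool: the `Workload` fields and both thresholds (`unused_after_days`, `slow_runtime_minutes`) are hypothetical and would have to come from a real assessment of the legacy environment.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    days_since_last_run: int     # hypothetical usage metric
    avg_runtime_minutes: float   # hypothetical cost metric
    has_poor_logic: bool         # e.g. flagged during code review


def triage(w: Workload, unused_after_days: int = 365,
           slow_runtime_minutes: float = 60.0) -> str:
    """Pick one of the four migration options (thresholds are assumptions)."""
    if w.days_since_last_run > unused_after_days:
        return "archive"
    if w.has_poor_logic:
        return "re-engineer"
    if w.avg_runtime_minutes > slow_runtime_minutes:
        return "fine-tune"
    return "migrate as-is"
```

For example, `triage(Workload("daily_load", 1, 5.0, False))` returns `"migrate as-is"`, while a workload that has not run for two years is routed to `"archive"` regardless of its other attributes.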

Key challenges when migrating from a legacy data
warehouse to Snowflake
The transition from any RDBMS to Snowflake is not easy. Enterprises have typically built ETL
pipelines to push data to legacy warehouses, customized visualization tools to pull data out
of their warehouses, and designed client applications that depend closely on data from their
warehouse. The top challenges faced when migrating workloads from a legacy
environment to Snowflake are:
• Risk of moving mission-critical applications already in production
• Multiple ETL/ELT jobs in progress on the legacy environment
• Identifying optimal cloud data architecture components
• Transition to cloud-native capabilities (native schedulers, ingestion, governance, metadata
  management, etc.)
• Manual transformation of data types and SQL compliance
• Query (semantic and syntactic) and data validation
• Decommissioning legacy systems

Addressing the key challenges
Data type mapping and schema conversion

One of the key challenges when migrating to Snowflake is to match RDBMS data types
to Snowflake data types. This requires creating the database structure and typically
involves using DDL exports from the enterprise data warehouse, converting them to
Snowflake-compatible DDL, and executing it.

The Impetus Workload Transformation Solution automatically transforms more than
95% of the RDBMS table definitions/schema to their Snowflake equivalents. Any remaining
DDL scripts are then converted manually by our experts, completing the end-to-end
transformation.

The tool maps all data types and handles a variety of complex use cases automatically.
For instance, for Teradata to Snowflake transformations, it can handle complex data
types such as FLOAT BETWEEN, PERIOD, TIME WITH TIME ZONE, CLOB, BLOB, VARBYTE,
and many more.
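
To make the idea of type mapping concrete, here is a minimal sketch of a lookup-based mapper. The mapping table is an illustrative assumption based on commonly used Teradata-to-Snowflake equivalents, not the solution's actual rules, and the function name `map_type` is hypothetical. Types it does not recognize (for example PERIOD or TIME WITH TIME ZONE) are returned unchanged and flagged for manual review rather than guessed.

```python
import re
from typing import Tuple

# Illustrative mapping only (an assumption); a real migration needs a reviewed, complete table.
SIMPLE_TYPE_MAP = {
    "CLOB": "VARCHAR",
    "BLOB": "BINARY",
    "VARBYTE": "BINARY",
}


def map_type(source_type: str) -> Tuple[str, bool]:
    """Return (target_type, needs_review) for one source column type."""
    normalized = " ".join(source_type.upper().split())
    match = re.fullmatch(r"([A-Z ]+?)\s*(\(.*\))?", normalized)
    if match is None:
        return normalized, True
    base, params = match.group(1), match.group(2) or ""
    if base in SIMPLE_TYPE_MAP:
        target = SIMPLE_TYPE_MAP[base]
        # Carry over a plain numeric length such as (100); drop anything else.
        if re.fullmatch(r"\(\d+\)", params):
            target += params
        return target, False
    # Unknown or complex type: leave it untouched and flag it for a human.
    return normalized, True
```

With this table, `map_type("varbyte(100)")` yields `("BINARY(100)", False)`, while `map_type("PERIOD(DATE)")` yields `("PERIOD(DATE)", True)`.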

In addition to automated schema conversion, database views built on top of the
schema are also auto-converted. However, recursive views need manual intervention
for optimum performance. A comprehensive migration is achieved when the tool
intelligently handles interdependencies between entities such as tables, views, and
queries. The tool produces a graphical dependency structure highlighting all the
entities that are directly recommended for migration and the entities that are
dependent on those entities.
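
The dependency analysis described above can be sketched as a graph traversal. The example below is a minimal illustration under stated assumptions: `depends_on` is a hypothetical dictionary mapping each entity to the entities it reads from, and `transitive_dependents` is an illustrative helper, not part of any product API.

```python
from collections import deque


def transitive_dependents(depends_on, selected):
    """Return every entity that directly or indirectly depends on the selected ones."""
    # Invert "entity -> things it reads from" into "entity -> things that read from it".
    used_by = {}
    for entity, sources in depends_on.items():
        for source in sources:
            used_by.setdefault(source, set()).add(entity)

    found, queue = set(), deque(selected)
    while queue:
        current = queue.popleft()
        for dependent in used_by.get(current, ()):
            if dependent not in found and dependent not in selected:
                found.add(dependent)
                queue.append(dependent)
    return found


# Hypothetical inventory: two views built on tables, one unrelated report.
depends_on = {
    "v_sales": {"orders", "customers"},
    "v_sales_monthly": {"v_sales"},
    "rpt_inventory": {"stock"},
}
```

Selecting the `orders` table for migration then surfaces `v_sales` and `v_sales_monthly` as entities that must move with it, while `rpt_inventory` is unaffected.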

Automated logic conversion

How does automated conversion simplify the transformation journey, mitigate migration
risks, and accelerate time-to-market? Consider the example of converting RDBMS source
code into Snowflake SQL and scripts into Python. Here are some of the areas where
automation brings immense value:

• SQL query conversion – Automated conversion of SQL queries to SnowSQL
• PL/SQL query conversion – Automated conversion of PL/SQL statements, including
  arguments, variables, exception handling, etc. across various statements.
  Other conditional statements, loops, dynamic SQLs, cursors, etc. are handled with
  equal dexterity
• Script conversion (BTEQ, FLOAD, and FEXP) – Automated conversion of a variety of
  script types into Python + SnowSQL with different complex UDFs and keywords such as
  ERRORCODE, ACTIVITYCOUNT, etc. for a variety of statement types
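
To illustrate what a converted script might need to emulate, the sketch below reproduces the two BTEQ status variables mentioned above, ERRORCODE and ACTIVITYCOUNT, on top of a Python DB-API cursor. It is an assumption-laden example, not the output of the conversion tool: `run_statement` is a hypothetical helper, and the standard-library `sqlite3` module stands in for the target database only so that the example runs anywhere. A driver that exposes a numeric `errno` on its exceptions would have that value returned instead of the generic code 1.

```python
import sqlite3


def run_statement(cursor, sql):
    """Run one statement; return (error_code, activity_count) like BTEQ's status variables."""
    try:
        cursor.execute(sql)
    except Exception as exc:  # DB-API drivers raise driver-specific subclasses
        # Use the driver's numeric code when it exposes one; otherwise a generic non-zero code.
        return getattr(exc, "errno", None) or 1, 0
    # rowcount is -1 when the driver cannot report a count (e.g. DDL), so clamp it to 0.
    return 0, max(cursor.rowcount, 0)


conn = sqlite3.connect(":memory:")  # stand-in database for the sketch
cur = conn.cursor()
run_statement(cur, "CREATE TABLE stage_orders (id INTEGER)")
error_code, activity_count = run_statement(
    cur, "INSERT INTO stage_orders VALUES (1), (2)"
)
# A BTEQ-style ".IF ERRORCODE <> 0 THEN .QUIT" becomes an ordinary Python check.
if error_code != 0:
    raise SystemExit(error_code)
```

Here the insert succeeds, so `error_code` is 0 and `activity_count` is 2.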

A systematic approach to Snowflake transformation
The Impetus Workload Transformation Solution brings together data-driven
decision support, automation, and cloud data platform expertise to address these
challenges through a 4-step process.

STEP 1: Assessment and prescription
• Automated legacy data warehouse inventory and profiling
• Identification of workloads (metadata, data, etc.) and dependencies
• Creation of optimized schema (clustering keys, Parquet format/file size for S3 uploads, etc.)
• Grouping of workloads into migration units

STEP 2: Transformation

• Up to 90% automated code conversion to SnowSQL
• Automated data migration to an optimized schema
• Automated handling of data types, nested views, intervals, loops, UDFs, procedures, etc.
• Creation of patterns for the target platform (ingestion, data sync, recon, lineage, security,
  orchestration, etc.)
• Auto-generation of patterns for newer migrations
• Query-editing for optimized fixes and performance tuning

STEP 3: Validation

• Pipeline-based automated validation of the transformed code
 – Row and cell-level validation of code and error reporting
 – Pluggable validation transformation for instant verification of transformed code
• Data-based validation of transformed code
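
Row and cell-level validation can be sketched as a keyed comparison of the result sets produced by the legacy and the transformed code. This is a minimal in-memory illustration under stated assumptions: rows are plain dictionaries, `key` names a unique column, and `compare_rows` is a hypothetical helper. Validation at warehouse scale would more likely rely on checksums or set-based SQL.

```python
def compare_rows(source_rows, target_rows, key):
    """Compare two result sets keyed by `key`; return a list of mismatch tuples."""
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}
    issues = []
    for k in sorted(source.keys() | target.keys()):
        if k not in target:
            issues.append((k, None, "missing in target"))
        elif k not in source:
            issues.append((k, None, "unexpected in target"))
        else:
            # Cell-level check: compare every column that appears on either side.
            for column in sorted(source[k].keys() | target[k].keys()):
                s, t = source[k].get(column), target[k].get(column)
                if s != t:
                    issues.append((k, column, f"{s!r} != {t!r}"))
    return issues


legacy = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
migrated = [{"id": 1, "amt": 10}, {"id": 2, "amt": 25}, {"id": 3, "amt": 5}]
report = compare_rows(legacy, migrated, key="id")
```

For this sample data, `report` contains one cell-level mismatch for row 2 (`amt`) and one row-level finding for the unexpected row 3.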

STEP 4: Execution

• Deliver a target-specific executable package
 – Cloud-native orchestration and execution on production
• Optimal performance through parallel execution
 – Parallel execution recommendations through exhaustive data-driven assessment
 – Generation of required artifacts in the transformation output
 – Parallel execution of the generated artifacts on production
• Productionalization support
 – End-to-end transitioning into production and operationalization
 – Capacity optimization
 – Environment stabilization through parallel-run period
 – Implicit data governance and compliance on cloud
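
One way to derive parallel execution recommendations is to group jobs into waves, where every job in a wave has all of its prerequisites satisfied by earlier waves. The sketch below uses the standard-library `graphlib` module (Python 3.9 or later). The job names and the `execution_waves` helper are hypothetical, and the text does not state how the solution itself computes its recommendations.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def execution_waves(depends_on):
    """Group jobs into waves; each wave depends only on jobs in earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical jobs: each key lists the jobs that must finish before it starts.
jobs = {
    "load_orders": set(),
    "load_customers": set(),
    "build_sales_view": {"load_orders", "load_customers"},
    "refresh_report": {"build_sales_view"},
}
waves = execution_waves(jobs)
```

Here the two load jobs form the first wave and can run side by side, followed by `build_sales_view` and then `refresh_report`. Each wave could then be handed to a thread pool or a cloud-native orchestrator.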

The Impetus Workload Transformation Solution creates a fine balance between migrating
as-is and total re-engineering. It helps eliminate technical debt and ensures agility and
quality while moving your legacy data warehouse to Snowflake.

From ENTERPRISE DATA WAREHOUSE to MODERN DATA PLATFORM

ASSESS
• Inventory listing
• Lineage – dependency analysis
• Target-specific recommendations
• Capacity planning
• Resource estimation

TRANSFORM (logic transformation)
• Auto-transform DML, DDL, procedures, ETL, jobs, data
• Packaged using cloud-native wrappers
• Repeatability, extensibility

VALIDATE
• Pipeline-based validation:
 – Schema, metadata, data
 – Data-based code
 – Pre and post processed data

EXECUTE
• Executable package with cloud-native orchestrators
• Parallel execution
• CI/CD and transition support

The Impetus Workload Transformation Approach

Benefits of the Impetus Workload Transformation Solution
 • Reuse all your existing investments
 • Automatically transform decades of effort in 12-20 weeks
 • Fast and reliable end-to-end EDW transformation
 • 50% time and cost savings compared to manual migration
 • Strategize between migrating as-is and total re-engineering to achieve the most with
   the least effort
 • Proven to reduce development, testing, and validation effort compared to
   manual migration
 • Extensive experience in delivering projects for Fortune 100 companies
 • Caters to any industry, any domain, any use case
 • Up to 60% more cost-effective, with significant savings and faster time-to-value
 • Meet performance SLAs
 • Decrease risk and uncertainty
 • Maximize ROI

Impetus Technologies is focused on enabling a unified, clear, and present view for the intelligent enterprise by enabling data warehouse modernization, unification of data sources,
self-service ETL, advanced analytics, and BI consumption. For more than a decade, Impetus has been the 'Partner of Choice' for several Fortune 500 enterprises in transforming their data
and analytics lifecycle. The company brings together a unique mix of software products, consulting services, and technology expertise. Our solutions include the industry's only platform for
the automated transformation of legacy systems to the cloud/big data environment and StreamAnalytix – a self-service ETL and machine learning platform.

© 2020 Impetus Technologies, Inc. All rights reserved. Product and company names mentioned herein may be trademarks of their respective companies. Jan 2021