Migration (SQL Server → AWS Lakehouse + Redshift + Domo)

Project Overview

This engagement focused on modernizing a legacy enterprise analytics ecosystem that was heavily dependent on SQL Server, SSIS, stored procedures, and on-prem reporting tools.
The organization required a scalable cloud-native platform capable of supporting both serverless analytics and high-performance warehouse workloads for BI.

Rather than performing a simple lift-and-shift, we implemented a multi-track modernization strategy on AWS, ensuring data governance, historical retention, cost control, and long-term extensibility for analytics and AI workloads.

Role: Senior Data Engineer / Migration Lead
Tech Stack: AWS S3, AWS Glue, Glue Crawlers, Athena, Delta Lake, AWS Lambda, Step Functions, EventBridge, Redshift, RDS PostgreSQL, Domo, Terraform/CloudFormation, CloudWatch, IAM, Python, SQL

Migration Strategy (Two Parallel Tracks)

To avoid disruption and reduce business risk, the migration was executed in two structured tracks:

Track 1: SQL Server / RDS → AWS Lakehouse (Serverless Analytics)

This track focused on building a serverless Data Lakehouse platform where business teams could query data using Athena while maintaining schema governance and historical retention.

Goal: Replace heavy dependency on on-prem SQL Server compute with scalable storage + serverless query.

Track 2: SQL Server / Redshift → BI Analytics Platform (Domo Enablement)

This track focused on enabling enterprise BI reporting by building a curated warehouse layer in Redshift and supporting downstream dashboards and KPI reporting in Domo.

Goal: Provide high-performance curated datasets for BI while maintaining security, lineage, and certification workflows.

Legacy Platform (Before Migration)

The legacy data ecosystem consisted of:

SQL Server OLTP + ODS workloads
SSIS packages for ingestion and transformation
Stored procedures for batch orchestration and aggregations
Power BI dashboards built on top of SQL Server views
Informatica jobs for integration workflows
On-prem scheduling and runbooks for dependency handling
Separate historical warehouse structures (HDS) maintained through incremental loads

Major limitations included:

High cost of SQL Server compute and licensing
Slow batch processing during peak windows
Difficult lineage tracking across SSIS and stored procedures
Limited ability to scale storage and historical retention
BI reporting impacted during heavy ETL load windows

Target Architecture (AWS Modernized Platform)

We implemented a layered enterprise architecture aligned to industry-standard Data Lakehouse principles.

Data Zones

The platform was designed with strict separation of layers:

Raw Layer (Landing / Immutable)
ODS Layer (Operational standardized datasets)
HDS Layer (Historical retention and slowly changing datasets)
Curated / Gold Layer (Business-ready marts for BI & analytics)

This ensured the platform supported both operational use cases and long-term historical reporting.

Track 1: AWS Lakehouse Architecture (S3 + Glue + Athena)

Storage Foundation (S3 Data Lake)

Designed S3 structure with standardized prefixing:
- /raw/
- /ods/
- /hds/
- /curated/
Implemented partitioning standards (date-based, domain-based)
Enabled lifecycle policies for cost optimization (archive tiers)

Metadata & Catalog (AWS Glue Crawlers)

To ensure discoverability and governance:

Created Glue databases per domain (finance, risk, operations, customer)
Configured Glue Crawlers for schema inference and automated table registration
Enabled partition discovery for scalable query performance
Standardized naming conventions to avoid table sprawl

Query & Analytics Layer (Athena + Delta Table Enablement)

To support serverless analytics:

Built Athena query patterns aligned with BI use cases
Enabled Delta table querying patterns for scalable table operations
Introduced data format standards (Parquet/Delta-compatible formats)
Implemented schema validation to prevent breaking downstream dependencies

This allowed analysts to query multi-terabyte datasets without requiring dedicated warehouse compute.

Transformation & Processing

Transformations were implemented using AWS-native pipelines:

Python-based ETL jobs running on AWS Glue / Spark runtime
Data quality checks and schema enforcement before promoting to ODS/HDS
Incremental loads with watermarking logic for efficient processing
Partition optimization to avoid scan-heavy queries

Orchestration & Dependency Management

Instead of monolithic SSIS chains, we built a cloud-native orchestration layer:

AWS Step Functions for workflow orchestration
EventBridge schedules for batch triggering
Lambda functions for lightweight dependency checks
Automated retries and failure routing
CloudWatch alarms for monitoring and alerting

This improved pipeline stability and reduced operational support load.

Track 2: Redshift + Domo Analytics Enablement

Redshift Warehouse Layer

For curated analytics workloads requiring high performance:

Designed dimensional models and reporting marts
Implemented incremental load pipelines into Redshift
Optimized distribution keys and sort keys for BI performance
Established standardized views for reporting consistency

This provided a centralized warehouse layer optimized for enterprise reporting.

Domo Integration & BI Modernization

Domo was used as the primary BI and reporting tool, requiring:

Dataset publishing pipelines into Domo
Controlled certification workflows for production KPI datasets
Ownership, access controls, and governance alignment
Automated refresh scheduling and dependency tracking

Key BI deliverables included:

Curated KPI datasets for finance and operations
Certified datasets with consistent naming and definitions
Performance improvements for high-usage dashboards

Infrastructure as Code (IaC) – Production Grade Setup

All infrastructure was provisioned and managed using Infrastructure as Code to ensure repeatability, governance, and environment consistency.

Provisioned components included:

S3 buckets and standardized folder structures
Glue databases, crawlers, and job configurations
IAM roles with least privilege policies
Athena workgroups and query output governance
Step Functions state machines
Lambda functions for orchestration triggers
CloudWatch monitoring, logs, and alert rules
Redshift cluster provisioning and networking rules
Security policies for encryption (SSE-S3/KMS) and audit controls

This approach ensured environments could be replicated across Dev/Test/Prod without manual setup.

Data Governance, Security & Compliance

To support enterprise compliance requirements:

Enforced encryption at rest (KMS) for sensitive datasets
Applied IAM-based access controls per domain/team
Implemented table-level access restrictions
Standardized audit logging and pipeline traceability
Introduced lineage tracking through metadata registration patterns

Key Engineering Challenges & Solutions

Challenge 1: Breaking Stored Procedure Dependencies

Legacy workflows heavily depended on SQL Server stored procedures, making direct migration risky.

Solution

Decomposed stored procedures into modular transformation layers
Migrated business logic into reusable SQL/PySpark modules
Implemented validation checkpoints between ODS and curated marts

Challenge 2: Historical Data Retention (HDS Model)

The business required multi-year historical tracking for regulatory and trend analytics.

Solution

Built an HDS framework with incremental snapshots
Implemented slowly changing logic for key entities
Maintained audit columns and change tracking metadata

Challenge 3: Schema Drift and Source System Instability

Incoming feeds frequently changed column types and naming patterns.

Solution

Implemented schema enforcement before promotion to ODS
Used Glue Crawler versioning strategies
Added automated schema mismatch alerts

Challenge 4: Cost Control and Query Optimization

Athena queries and unmanaged table growth could lead to uncontrolled costs.

Solution

Enforced partitioning and Parquet-based storage standards
Implemented Athena workgroup controls for query governance
Applied lifecycle rules to archive low-access historical partitions
Optimized curated datasets for BI consumption to reduce scan volumes

Results & Business Impact

Successfully modernized a legacy SQL Server + SSIS ecosystem into a scalable AWS-native data platform.
Enabled serverless analytics using Athena, reducing dependency on expensive always-on compute.
Improved reporting performance by delivering optimized curated marts in Redshift.
Established standardized Raw → ODS → HDS → Curated pipeline design for long-term scalability.
Increased platform reliability through workflow orchestration, automated retries, and monitoring.
Improved governance by implementing cataloging, naming standards, and certification-ready datasets for BI.

Key Takeaways

This migration was executed as a modernization program, not a lift-and-shift.
The final architecture delivered a production-grade AWS Data Lakehouse + Warehouse ecosystem, enabling both cost-efficient serverless analytics and high-performance BI reporting through Redshift and Domo.

The platform is now designed to scale for future initiatives such as:

real-time streaming ingestion
ML feature store development
enterprise data governance expansion
automated data quality scorecards
AI-ready curated datasets

Share on

Bluesky Facebook LinkedIn Mastodon X (formerly Twitter)

Vamsi Thokala

Project Overview

Migration Strategy (Two Parallel Tracks)

Track 1: SQL Server / RDS → AWS Lakehouse (Serverless Analytics)

Track 2: SQL Server / Redshift → BI Analytics Platform (Domo Enablement)

Legacy Platform (Before Migration)

Target Architecture (AWS Modernized Platform)

Data Zones

Track 1: AWS Lakehouse Architecture (S3 + Glue + Athena)

Storage Foundation (S3 Data Lake)

Metadata & Catalog (AWS Glue Crawlers)

Query & Analytics Layer (Athena + Delta Table Enablement)

Transformation & Processing

Orchestration & Dependency Management

Track 2: Redshift + Domo Analytics Enablement

Redshift Warehouse Layer

Domo Integration & BI Modernization

Infrastructure as Code (IaC) – Production Grade Setup

Data Governance, Security & Compliance

Key Engineering Challenges & Solutions

Challenge 1: Breaking Stored Procedure Dependencies

Challenge 2: Historical Data Retention (HDS Model)

Challenge 3: Schema Drift and Source System Instability

Challenge 4: Cost Control and Query Optimization

Results & Business Impact

Key Takeaways

Share on