Migration (SQL Server → AWS Lakehouse + Redshift + Domo)
Project Overview
This engagement focused on modernizing a legacy enterprise analytics ecosystem that was heavily dependent on SQL Server, SSIS, stored procedures, and on-prem reporting tools.
The organization required a scalable cloud-native platform capable of supporting both serverless analytics and high-performance warehouse workloads for BI.
Rather than performing a simple lift-and-shift, we implemented a multi-track modernization strategy on AWS, ensuring data governance, historical retention, cost control, and long-term extensibility for analytics and AI workloads.
Role: Senior Data Engineer / Migration Lead
Tech Stack: AWS S3, AWS Glue, Glue Crawlers, Athena, Delta Lake, AWS Lambda, Step Functions, EventBridge, Redshift, RDS PostgreSQL, Domo, Terraform/CloudFormation, CloudWatch, IAM, Python, SQL
Migration Strategy (Two Parallel Tracks)
To avoid disruption and reduce business risk, the migration was executed in two structured tracks:
Track 1: SQL Server / RDS → AWS Lakehouse (Serverless Analytics)
This track focused on building a serverless Data Lakehouse platform where business teams could query data using Athena while maintaining schema governance and historical retention.
Goal: Replace heavy dependency on on-prem SQL Server compute with scalable storage + serverless query.
Track 2: SQL Server / Redshift → BI Analytics Platform (Domo Enablement)
This track focused on enabling enterprise BI reporting by building a curated warehouse layer in Redshift and supporting downstream dashboards and KPI reporting in Domo.
Goal: Provide high-performance curated datasets for BI while maintaining security, lineage, and certification workflows.
Legacy Platform (Before Migration)
The legacy data ecosystem consisted of:
- SQL Server OLTP + ODS workloads
- SSIS packages for ingestion and transformation
- Stored procedures for batch orchestration and aggregations
- Power BI dashboards built on top of SQL Server views
- Informatica jobs for integration workflows
- On-prem scheduling and runbooks for dependency handling
- Separate historical warehouse structures (HDS) maintained through incremental loads
Major limitations included:
- High cost of SQL Server compute and licensing
- Slow batch processing during peak windows
- Difficult lineage tracking across SSIS and stored procedures
- Limited ability to scale storage and historical retention
- BI reporting impacted during heavy ETL load windows
Target Architecture (AWS Modernized Platform)
We implemented a layered enterprise architecture aligned to industry-standard Data Lakehouse principles.
Data Zones
The platform was designed with strict separation of layers:
- Raw Layer (Landing / Immutable)
- ODS Layer (Operational standardized datasets)
- HDS Layer (Historical retention and slowly changing datasets)
- Curated / Gold Layer (Business-ready marts for BI & analytics)
This ensured the platform supported both operational use cases and long-term historical reporting.
Track 1: AWS Lakehouse Architecture (S3 + Glue + Athena)
Storage Foundation (S3 Data Lake)
- Designed S3 structure with standardized prefixing:
/raw//ods//hds//curated/
- Implemented partitioning standards (date-based, domain-based)
- Enabled lifecycle policies for cost optimization (archive tiers)
Metadata & Catalog (AWS Glue Crawlers)
To ensure discoverability and governance:
- Created Glue databases per domain (finance, risk, operations, customer)
- Configured Glue Crawlers for schema inference and automated table registration
- Enabled partition discovery for scalable query performance
- Standardized naming conventions to avoid table sprawl
Query & Analytics Layer (Athena + Delta Table Enablement)
To support serverless analytics:
- Built Athena query patterns aligned with BI use cases
- Enabled Delta table querying patterns for scalable table operations
- Introduced data format standards (Parquet/Delta-compatible formats)
- Implemented schema validation to prevent breaking downstream dependencies
This allowed analysts to query multi-terabyte datasets without requiring dedicated warehouse compute.
Transformation & Processing
Transformations were implemented using AWS-native pipelines:
- Python-based ETL jobs running on AWS Glue / Spark runtime
- Data quality checks and schema enforcement before promoting to ODS/HDS
- Incremental loads with watermarking logic for efficient processing
- Partition optimization to avoid scan-heavy queries
Orchestration & Dependency Management
Instead of monolithic SSIS chains, we built a cloud-native orchestration layer:
- AWS Step Functions for workflow orchestration
- EventBridge schedules for batch triggering
- Lambda functions for lightweight dependency checks
- Automated retries and failure routing
- CloudWatch alarms for monitoring and alerting
This improved pipeline stability and reduced operational support load.
Track 2: Redshift + Domo Analytics Enablement
Redshift Warehouse Layer
For curated analytics workloads requiring high performance:
- Designed dimensional models and reporting marts
- Implemented incremental load pipelines into Redshift
- Optimized distribution keys and sort keys for BI performance
- Established standardized views for reporting consistency
This provided a centralized warehouse layer optimized for enterprise reporting.
Domo Integration & BI Modernization
Domo was used as the primary BI and reporting tool, requiring:
- Dataset publishing pipelines into Domo
- Controlled certification workflows for production KPI datasets
- Ownership, access controls, and governance alignment
- Automated refresh scheduling and dependency tracking
Key BI deliverables included:
- Curated KPI datasets for finance and operations
- Certified datasets with consistent naming and definitions
- Performance improvements for high-usage dashboards
Infrastructure as Code (IaC) – Production Grade Setup
All infrastructure was provisioned and managed using Infrastructure as Code to ensure repeatability, governance, and environment consistency.
Provisioned components included:
- S3 buckets and standardized folder structures
- Glue databases, crawlers, and job configurations
- IAM roles with least privilege policies
- Athena workgroups and query output governance
- Step Functions state machines
- Lambda functions for orchestration triggers
- CloudWatch monitoring, logs, and alert rules
- Redshift cluster provisioning and networking rules
- Security policies for encryption (SSE-S3/KMS) and audit controls
This approach ensured environments could be replicated across Dev/Test/Prod without manual setup.
Data Governance, Security & Compliance
To support enterprise compliance requirements:
- Enforced encryption at rest (KMS) for sensitive datasets
- Applied IAM-based access controls per domain/team
- Implemented table-level access restrictions
- Standardized audit logging and pipeline traceability
- Introduced lineage tracking through metadata registration patterns
Key Engineering Challenges & Solutions
Challenge 1: Breaking Stored Procedure Dependencies
Legacy workflows heavily depended on SQL Server stored procedures, making direct migration risky.
Solution
- Decomposed stored procedures into modular transformation layers
- Migrated business logic into reusable SQL/PySpark modules
- Implemented validation checkpoints between ODS and curated marts
Challenge 2: Historical Data Retention (HDS Model)
The business required multi-year historical tracking for regulatory and trend analytics.
Solution
- Built an HDS framework with incremental snapshots
- Implemented slowly changing logic for key entities
- Maintained audit columns and change tracking metadata
Challenge 3: Schema Drift and Source System Instability
Incoming feeds frequently changed column types and naming patterns.
Solution
- Implemented schema enforcement before promotion to ODS
- Used Glue Crawler versioning strategies
- Added automated schema mismatch alerts
Challenge 4: Cost Control and Query Optimization
Athena queries and unmanaged table growth could lead to uncontrolled costs.
Solution
- Enforced partitioning and Parquet-based storage standards
- Implemented Athena workgroup controls for query governance
- Applied lifecycle rules to archive low-access historical partitions
- Optimized curated datasets for BI consumption to reduce scan volumes
Results & Business Impact
- Successfully modernized a legacy SQL Server + SSIS ecosystem into a scalable AWS-native data platform.
- Enabled serverless analytics using Athena, reducing dependency on expensive always-on compute.
- Improved reporting performance by delivering optimized curated marts in Redshift.
- Established standardized Raw → ODS → HDS → Curated pipeline design for long-term scalability.
- Increased platform reliability through workflow orchestration, automated retries, and monitoring.
- Improved governance by implementing cataloging, naming standards, and certification-ready datasets for BI.
Key Takeaways
This migration was executed as a modernization program, not a lift-and-shift.
The final architecture delivered a production-grade AWS Data Lakehouse + Warehouse ecosystem, enabling both cost-efficient serverless analytics and high-performance BI reporting through Redshift and Domo.
The platform is now designed to scale for future initiatives such as:
- real-time streaming ingestion
- ML feature store development
- enterprise data governance expansion
- automated data quality scorecards
- AI-ready curated datasets
