Production-Grade ETL Pipeline

Extract from CSV and PostgreSQL, transform with star schema modeling, load into Snowflake with SCD Type 2 history tracking.

Data Flow Architecture

End-to-end pipeline from source to warehouse

CSV Files (raw flat files) and PostgreSQL (source database)
  → Extract (read & validate)
  → Transform (clean, join, model)
  → Load (merge & upsert)
  → Snowflake (data warehouse)
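The extract → transform → load flow above can be sketched end to end. This is a minimal illustration, not the project's actual code: the column names (`sale_id`, `customer`, `revenue`) are invented, and sqlite3 stands in for Snowflake so the example runs anywhere.

```python
import csv
import io
import sqlite3

# Hypothetical raw flat file; real sources are CSV files and PostgreSQL.
RAW_CSV = """sale_id,customer,revenue
1,Vikrant,120.50
2,Priya,80.00
"""

def extract(text):
    # Extract: read & validate — drop rows missing a sale_id.
    return [r for r in csv.DictReader(io.StringIO(text)) if r["sale_id"]]

def transform(rows):
    # Transform: clean — cast types, trim whitespace.
    return [(int(r["sale_id"]), r["customer"].strip(), float(r["revenue"]))
            for r in rows]

def load(conn, records):
    # Load: merge & upsert keyed on sale_id (Snowflake would use MERGE).
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?) "
        "ON CONFLICT(sale_id) DO UPDATE SET revenue = excluded.revenue",
        records,
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer TEXT, revenue REAL)")
load(conn, transform(extract(RAW_CSV)))
total = conn.execute("SELECT SUM(revenue) FROM sales").fetchone()[0]
print(total)  # 200.5
```

Re-running `load` with the same records is idempotent: the upsert overwrites rather than duplicates, which is what makes the load stage safe to retry.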

Star Schema

Central fact table surrounded by dimension tables

dim_customer: customer_key, name, email, segment, region
dim_product: product_key, name, category, price, brand
dim_date: date_key, year, quarter, month, day_of_week
dim_store: store_key, name, city, state, manager
dim_promotion: promo_key, name, type, discount_pct, channel

fact_sales (center):
  sale_key
  customer_key, product_key, date_key, store_key, promo_key (foreign keys)
  quantity, revenue, discount (measures)
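The star schema above can be written out as DDL. A sketch using sqlite3 for portability (the real DDL would be Snowflake SQL); column lists follow the diagram, and the sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT,
                           email TEXT, segment TEXT, region TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT,
                           category TEXT, price REAL, brand TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, year INTEGER,
                           quarter INTEGER, month INTEGER, day_of_week INTEGER);
CREATE TABLE dim_store    (store_key INTEGER PRIMARY KEY, name TEXT,
                           city TEXT, state TEXT, manager TEXT);
CREATE TABLE dim_promotion(promo_key INTEGER PRIMARY KEY, name TEXT,
                           type TEXT, discount_pct REAL, channel TEXT);
-- Central fact table: one foreign key per dimension plus the measures.
CREATE TABLE fact_sales (
    sale_key     INTEGER PRIMARY KEY,
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    promo_key    INTEGER REFERENCES dim_promotion(promo_key),
    quantity INTEGER, revenue REAL, discount REAL
);
""")

# A typical star-schema query: one join per dimension needed, then aggregate.
conn.execute("INSERT INTO dim_customer VALUES (1, 'Vikrant', 'v@example.com', 'Retail', 'SA')")
conn.execute("INSERT INTO fact_sales (sale_key, customer_key, quantity, revenue, discount) "
             "VALUES (10, 1, 2, 50.0, 0.0)")
row = conn.execute("""
    SELECT c.region, SUM(f.revenue)
    FROM fact_sales f JOIN dim_customer c USING (customer_key)
    GROUP BY c.region
""").fetchone()
print(row)  # ('SA', 50.0)
```

The payoff of the star shape is that every analytic query has the same form: join the fact table to whichever dimensions the question mentions, then group and aggregate.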

SCD Type 2 - History Tracking

How dimension records evolve over time

Before Update

| Key | Name    | City     | Valid From | Valid To   | Current |
|-----|---------|----------|------------|------------|---------|
| 101 | Vikrant | Adelaide | 2023-01-15 | 9999-12-31 | Y       |

After Update (City changed)

| Key | Name    | City      | Valid From | Valid To   | Current |
|-----|---------|-----------|------------|------------|---------|
| 101 | Vikrant | Adelaide  | 2023-01-15 | 2024-06-01 | N       |
| 102 | Vikrant | Melbourne | 2024-06-01 | 9999-12-31 | Y       |
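The before/after tables above can be reproduced in code. A sketch of the SCD Type 2 update logic, again with sqlite3 standing in for the warehouse (a Snowflake implementation would typically use a single MERGE statement instead of the update-then-insert pair):

```python
import sqlite3

HIGH_DATE = "9999-12-31"  # open-ended "valid to" sentinel

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,   -- surrogate key
    name TEXT, city TEXT,
    valid_from TEXT, valid_to TEXT, is_current TEXT)""")
conn.execute("INSERT INTO dim_customer VALUES "
             "(101, 'Vikrant', 'Adelaide', '2023-01-15', ?, 'Y')", (HIGH_DATE,))

def scd2_update(conn, name, new_city, change_date):
    cur = conn.execute(
        "SELECT customer_key, city FROM dim_customer "
        "WHERE name = ? AND is_current = 'Y'", (name,)).fetchone()
    if cur and cur[1] != new_city:
        # Expire the current version: close its validity window, flag N.
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 'N' "
            "WHERE customer_key = ?", (change_date, cur[0]))
        # Open a new version under a fresh surrogate key.
        conn.execute(
            "INSERT INTO dim_customer (name, city, valid_from, valid_to, is_current) "
            "VALUES (?, ?, ?, ?, 'Y')",
            (name, new_city, change_date, HIGH_DATE))

scd2_update(conn, "Vikrant", "Melbourne", "2024-06-01")
history = conn.execute(
    "SELECT customer_key, city, valid_to, is_current "
    "FROM dim_customer ORDER BY customer_key").fetchall()
print(history)
# [(101, 'Adelaide', '2024-06-01', 'N'), (102, 'Melbourne', '9999-12-31', 'Y')]
```

Because old versions are expired rather than overwritten, any fact row that joins on surrogate key 101 still sees Adelaide, preserving point-in-time reporting.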

SQL Transformations

Key queries powering the pipeline

- Star Schema Load: populate the dimension tables and fact_sales from staged source data
- SCD Type 2 Merge: expire changed dimension rows and insert new current versions
- Quality Check: compute the data quality score over the loaded tables
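As one illustration of the quality-check step, a data quality score can be computed as the share of fact rows passing basic rules. The specific rules here (non-negative revenue, positive quantity, customer key resolves against the dimension) are assumptions for the sketch, not the pipeline's actual checks, and sqlite3 again stands in for Snowflake:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY);
CREATE TABLE fact_sales (sale_key INTEGER PRIMARY KEY, customer_key INTEGER,
                         quantity INTEGER, revenue REAL);
INSERT INTO dim_customer VALUES (1);
INSERT INTO fact_sales VALUES (10, 1, 2, 50.0);   -- passes all rules
INSERT INTO fact_sales VALUES (11, 99, 1, 10.0);  -- orphan customer_key
INSERT INTO fact_sales VALUES (12, 1, 0, -5.0);   -- bad quantity and revenue
INSERT INTO fact_sales VALUES (13, 1, 3, 30.0);   -- passes all rules
""")

# Score = 100 * passing rows / total rows; LEFT JOIN exposes orphan keys as NULL.
score = conn.execute("""
    SELECT 100.0 * SUM(CASE WHEN f.revenue >= 0
                             AND f.quantity > 0
                             AND c.customer_key IS NOT NULL
                            THEN 1 ELSE 0 END) / COUNT(*)
    FROM fact_sales f
    LEFT JOIN dim_customer c USING (customer_key)
""").fetchone()[0]
print(score)  # 50.0
```

Running the check after every load, and failing the pipeline when the score drops below a threshold, turns the quality score from a dashboard number into a gate.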