Implementing Medallion Architecture: Bronze, Silver, Gold Data Layers
A practical guide to implementing the medallion architecture pattern for enterprise data platforms with dbt, Snowflake, and Apache Airflow for progressive data refinement.
Awwaltech
Data Engineering Team
Progressive Data Refinement
The medallion architecture organizes data into three layers of progressive quality refinement. Raw ingested data lands in Bronze, cleaned and validated data moves to Silver, and business-ready aggregated data lives in Gold. Each layer serves different users with different quality requirements.
Bronze Layer: Preserve Everything
The Bronze layer is an append-only landing zone. Every record from every source system is stored with its original schema, ingestion timestamp, and source metadata. We never transform data in Bronze — the goal is a complete, immutable record of everything the organization has ever received.
-- Bronze table: raw, untransformed, append-only
CREATE TABLE bronze.orders_raw (
_ingested_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP(),
_source_system VARCHAR(100),
_batch_id VARCHAR(50),
_raw_payload VARIANT -- Store original JSON without schema enforcement
);
This design protects against schema evolution issues. When a source system changes its API response format, Bronze continues ingesting without breaking. The transformation logic in Silver handles both old and new formats.
Silver Layer: Clean and Conform
Silver applies data quality rules: type casting, null handling, deduplication, and schema conformance. Each Silver model is a dbt model with built-in tests validating business rules.
-- dbt model: silver/orders_cleaned.sql
WITH deduplicated AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY order_id
ORDER BY _ingested_at DESC
) AS rn
FROM {{ source('bronze', 'orders_raw') }}
)
SELECT
raw_payload:order_id::VARCHAR AS order_id,
raw_payload:customer_id::VARCHAR AS customer_id,
TRY_TO_DECIMAL(raw_payload:total_amount::VARCHAR, 10, 2) AS total_amount,
TRY_TO_TIMESTAMP(raw_payload:created_at::VARCHAR) AS order_created_at,
_ingested_at AS processed_at
FROM deduplicated
WHERE rn = 1
AND order_id IS NOT NULL
Gold Layer: Business-Ready
Gold tables are optimized for specific analytical use cases. Each Gold model is designed in collaboration with business stakeholders and serves a defined set of dashboards or reports. Gold models aggregate Silver data, apply business logic, and pre-compute expensive calculations.
The key principle: Gold tables should be queryable by business analysts without requiring SQL expertise beyond basic filtering and grouping. Complex joins, window functions, and business rule calculations happen in the dbt transformation — not in the BI tool.
Orchestration with Airflow
Our Airflow DAG structure mirrors the medallion layers. Bronze DAGs run on ingestion schedules (real-time for streaming sources, hourly for batch APIs). Silver DAGs trigger on Bronze completion. Gold DAGs run after Silver, with additional scheduling for time-based aggregations (daily summaries, weekly rollups).
Each DAG includes data quality checkpoints using dbt tests. If Silver cleaning fails quality thresholds — more than 5% null rate on required fields, for example — the pipeline halts and alerts the data engineering team before bad data reaches Gold.