Skip to main content
All Articles
Engineering13 min read

Implementing Medallion Architecture: Bronze, Silver, Gold Data Layers

A practical guide to implementing the medallion architecture pattern for enterprise data platforms with dbt, Snowflake, and Apache Airflow for progressive data refinement.

Awwaltech

Data Engineering Team

March 18, 2026
Data EngineeringSnowflakedbtAirflowAnalytics

Progressive Data Refinement

The medallion architecture organizes data into three layers of progressive quality refinement. Raw ingested data lands in Bronze, cleaned and validated data moves to Silver, and business-ready aggregated data lives in Gold. Each layer serves different users with different quality requirements.

Bronze Layer: Preserve Everything

The Bronze layer is an append-only landing zone. Every record from every source system is stored with its original schema, ingestion timestamp, and source metadata. We never transform data in Bronze — the goal is a complete, immutable record of everything the organization has ever received.

-- Bronze table: raw, untransformed, append-only
CREATE TABLE bronze.orders_raw (
    _ingested_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP(),
    _source_system  VARCHAR(100),
    _batch_id       VARCHAR(50),
    _raw_payload    VARIANT  -- Store original JSON without schema enforcement
);

This design protects against schema evolution issues. When a source system changes its API response format, Bronze continues ingesting without breaking. The transformation logic in Silver handles both old and new formats.

Silver Layer: Clean and Conform

Silver applies data quality rules: type casting, null handling, deduplication, and schema conformance. Each Silver model is a dbt model with built-in tests validating business rules.

-- dbt model: silver/orders_cleaned.sql
WITH deduplicated AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY order_id
            ORDER BY _ingested_at DESC
        ) AS rn
    FROM {{ source('bronze', 'orders_raw') }}
)
SELECT
    raw_payload:order_id::VARCHAR AS order_id,
    raw_payload:customer_id::VARCHAR AS customer_id,
    TRY_TO_DECIMAL(raw_payload:total_amount::VARCHAR, 10, 2) AS total_amount,
    TRY_TO_TIMESTAMP(raw_payload:created_at::VARCHAR) AS order_created_at,
    _ingested_at AS processed_at
FROM deduplicated
WHERE rn = 1
    AND order_id IS NOT NULL

Gold Layer: Business-Ready

Gold tables are optimized for specific analytical use cases. Each Gold model is designed in collaboration with business stakeholders and serves a defined set of dashboards or reports. Gold models aggregate Silver data, apply business logic, and pre-compute expensive calculations.

The key principle: Gold tables should be queryable by business analysts without requiring SQL expertise beyond basic filtering and grouping. Complex joins, window functions, and business rule calculations happen in the dbt transformation — not in the BI tool.

Orchestration with Airflow

Our Airflow DAG structure mirrors the medallion layers. Bronze DAGs run on ingestion schedules (real-time for streaming sources, hourly for batch APIs). Silver DAGs trigger on Bronze completion. Gold DAGs run after Silver, with additional scheduling for time-based aggregations (daily summaries, weekly rollups).

Each DAG includes data quality checkpoints using dbt tests. If Silver cleaning fails quality thresholds — more than 5% null rate on required fields, for example — the pipeline halts and alerts the data engineering team before bad data reaches Gold.