December 7, 2024 | Architecture

What Is Medallion Architecture?

If you're working with data lakes or modern data platforms like Databricks, you've probably encountered the term "medallion architecture." It's a design pattern that organizes data into layers of increasing refinement, and it has become a widely adopted approach for building scalable, maintainable data systems.

The Three Layers

Medallion architecture divides your data lake into three distinct layers, named after the bronze, silver, and gold medal tiers:

Bronze (Raw Data) → Silver (Cleaned Data) → Gold (Business-Ready)

Bronze Layer (Raw)

The bronze layer stores data exactly as it arrived from source systems. No transformation, no cleaning: just raw data preserved in its original form.

Why keep raw data? It provides an audit trail and safety net. If something goes wrong downstream, you can always go back to the source. If business requirements change (and they will), you have the original data to work with.

"But won't that waste storage?" Storage is cheap. Debugging data issues when you've lost the original is expensive.
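A minimal sketch of bronze-style ingestion in plain Python, assuming records arrive as raw text lines. The function name `ingest_to_bronze` and the metadata fields `_source` and `_ingested_at` are hypothetical; on a real platform this would be an append to a table rather than to a Python list. The point it illustrates is that the payload is stored untouched, even when it is malformed.

```python
from datetime import datetime, timezone


def ingest_to_bronze(bronze: list[dict], raw_lines: list[str], source: str) -> int:
    """Append raw records to a bronze store without parsing or cleaning them.

    Hypothetical helper: the raw text is kept verbatim, and only ingestion
    metadata (source name, UTC timestamp) is added alongside it.
    """
    ingested_at = datetime.now(timezone.utc).isoformat()
    for line in raw_lines:
        bronze.append({
            "_source": source,            # where the record came from
            "_ingested_at": ingested_at,  # when it landed
            "raw": line,                  # original payload, unmodified
        })
    return len(raw_lines)


bronze: list[dict] = []
ingest_to_bronze(
    bronze,
    ['{"order_id": "1", "amount": "19.99"}', "not valid json"],
    source="orders_api",  # hypothetical source name
)
```

Note that the malformed second line is still stored; deciding what to do with it is deferred to the silver layer.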

Silver Layer (Cleaned)

The silver layer is where transformation happens. Data gets:
- Cleaned: Fix data types, handle nulls, remove duplicates
- Standardized: Consistent formats, naming conventions, schemas
- Validated: Business rules applied, quality checks enforced
- Joined: Related data connected together

This layer is queryable and useful, but it's still fairly granular. Think of it as "clean operational data."
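The cleaning, standardizing, and deduplication steps of the silver layer can be sketched in plain Python as a function from raw records to typed rows. This assumes each bronze record is a dict holding the original JSON text under a `raw` key, and a hypothetical order schema (`order_id`, `amount`, `region`); joins are omitted to keep it short.

```python
import json
from decimal import Decimal, InvalidOperation


def bronze_to_silver(bronze_records: list[dict]) -> list[dict]:
    """Parse, type, standardize, and deduplicate raw order records.

    Hypothetical schema: order_id (str), amount (Decimal), region (upper-case str).
    Unparseable records are skipped here; a real pipeline would more likely
    route them to a quarantine table.
    """
    seen_ids: set[str] = set()
    silver: list[dict] = []
    for record in bronze_records:
        try:
            data = json.loads(record["raw"])
            amount = Decimal(str(data["amount"]))  # fix data types
        except (json.JSONDecodeError, KeyError, TypeError, InvalidOperation):
            continue
        order_id = str(data.get("order_id") or "").strip()
        if not order_id or order_id in seen_ids:  # skip missing and duplicate IDs
            continue
        seen_ids.add(order_id)
        silver.append({
            "order_id": order_id,
            "amount": amount,
            # standardize formatting and replace nulls with a sentinel value
            "region": str(data.get("region") or "UNKNOWN").strip().upper(),
        })
    return silver


bronze = [
    {"raw": '{"order_id": "1", "amount": "19.99", "region": " emea "}'},
    {"raw": '{"order_id": "1", "amount": "19.99", "region": "EMEA"}'},  # duplicate
    {"raw": '{"order_id": "2", "amount": 5, "region": null}'},
    {"raw": "not valid json"},
]
silver = bronze_to_silver(bronze)
```

The duplicate and the unparseable line drop out, leaving two typed rows with a consistent `region` format.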

Gold Layer (Business-Ready)

The gold layer contains curated, aggregated data optimized for specific business use cases. This is what powers dashboards, reports, and analytics.

Examples:
- sales_summary_daily: Pre-aggregated daily sales metrics
- customer_360: A unified view of each customer
- revenue_by_region_monthly: Business-ready KPIs

Gold tables are designed for performance and usability, not flexibility. They answer specific questions fast.
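To make the `sales_summary_daily` example concrete, here is a plain-Python aggregation over cleaned order rows. The input fields (`order_date`, `amount`) and the output columns are assumptions for illustration; on a lakehouse platform this would more likely be a SQL `GROUP BY` written to a table.

```python
from collections import defaultdict
from decimal import Decimal


def build_sales_summary_daily(silver_orders: list[dict]) -> list[dict]:
    """Aggregate cleaned orders into one row per day (hypothetical gold table)."""
    totals: dict[str, Decimal] = defaultdict(Decimal)  # Decimal() starts at 0
    counts: dict[str, int] = defaultdict(int)
    for order in silver_orders:
        day = order["order_date"]  # assumed ISO date string, e.g. "2024-12-07"
        totals[day] += order["amount"]
        counts[day] += 1
    return [
        {"date": day, "order_count": counts[day], "revenue": totals[day]}
        for day in sorted(totals)
    ]


orders = [
    {"order_id": "1", "order_date": "2024-12-06", "amount": Decimal("19.99")},
    {"order_id": "2", "order_date": "2024-12-07", "amount": Decimal("5.00")},
    {"order_id": "3", "order_date": "2024-12-07", "amount": Decimal("12.50")},
]
summary = build_sales_summary_daily(orders)
```

Dashboards then read one small row per day instead of scanning every order, which is the trade of flexibility for speed described above.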

Layer Purpose Summary
Bronze = "What happened?" (raw facts)
Silver = "What does it mean?" (cleaned data)
Gold = "What should we do?" (business metrics)

Why This Pattern Works

1. Separation of Concerns

Each layer has a clear responsibility. Bronze handles ingestion. Silver handles transformation. Gold handles presentation. Teams can work independently.

2. Data Quality Progression

Quality improves at each stage. You can catch issues early without corrupting downstream data.
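One common way to catch issues early is to split incoming rows into a passing set and a quarantine set, so failing rows never reach downstream tables. The sketch below assumes hypothetical rule names and fields; quarantining is a general technique that fits this pattern, not something medallion architecture itself prescribes.

```python
from typing import Callable

Rule = Callable[[dict], bool]

# Hypothetical quality rules: each returns True when the row passes.
RULES: dict[str, Rule] = {
    "amount_non_negative": lambda row: row.get("amount") is not None and row["amount"] >= 0,
    "has_customer_id": lambda row: bool(row.get("customer_id")),
}


def apply_quality_checks(
    rows: list[dict], rules: dict[str, Rule]
) -> tuple[list[dict], list[dict]]:
    """Split rows into (passed, quarantined); quarantined rows record which rules failed."""
    passed: list[dict] = []
    quarantined: list[dict] = []
    for row in rows:
        failures = [name for name, rule in rules.items() if not rule(row)]
        if failures:
            quarantined.append({**row, "_failed_rules": failures})
        else:
            passed.append(row)
    return passed, quarantined


rows = [
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "", "amount": 5.0},
    {"customer_id": "c3", "amount": -2.0},
]
passed, quarantined = apply_quality_checks(rows, RULES)
```

Only the first row continues downstream; the other two are kept aside with the names of the rules they broke, which makes the issue visible without corrupting later layers.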

3. Reprocessing Is Safe

Since bronze preserves raw data, you can reprocess silver and gold layers when requirements change. No need to re-ingest from sources.
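Because bronze keeps the original payloads, a requirement change becomes a replay: rerun the silver and gold logic over bronze with the new rule. In this sketch the hypothetical change is that amounts reported in cents must now be converted to dollars; the record layout, the `unit` field, and the function names are all illustrative assumptions.

```python
import json
from collections import defaultdict
from decimal import Decimal

# Bronze: raw payloads, never modified after ingestion.
BRONZE = [
    '{"order_id": "1", "region": "EMEA", "amount": "1999", "unit": "cents"}',
    '{"order_id": "2", "region": "EMEA", "amount": "5.00", "unit": "dollars"}',
    '{"order_id": "3", "region": "APAC", "amount": "12.50", "unit": "dollars"}',
]


def to_silver(raw_records: list[str], normalize_units: bool) -> list[dict]:
    """Parse raw records; optionally apply the new unit-normalization rule."""
    rows = []
    for raw in raw_records:
        data = json.loads(raw)
        amount = Decimal(data["amount"])
        if normalize_units and data.get("unit") == "cents":
            amount /= 100  # new requirement: report everything in dollars
        rows.append({"order_id": data["order_id"], "region": data["region"], "amount": amount})
    return rows


def to_gold(silver_rows: list[dict]) -> dict[str, Decimal]:
    """Revenue by region (a stand-in for a gold table)."""
    revenue: dict[str, Decimal] = defaultdict(Decimal)
    for row in silver_rows:
        revenue[row["region"]] += row["amount"]
    return dict(revenue)


# Original logic, then a rebuild after the requirement changes; no re-ingestion needed.
gold_v1 = to_gold(to_silver(BRONZE, normalize_units=False))
gold_v2 = to_gold(to_silver(BRONZE, normalize_units=True))
```

`BRONZE` is read twice but never rewritten, so the corrected gold table comes entirely from data already on hand.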

4. Performance Optimization

Gold tables can be heavily optimized for query patterns without affecting the flexibility of silver tables.

Common Pitfalls

Skipping layers. It's tempting to go straight from raw to gold. Don't. You'll regret it when requirements change.

Gold tables that are too generic. Gold should serve specific use cases. If it's trying to do everything, it belongs in silver.

Not documenting transformations. Each layer should have clear documentation of what transformations were applied.

Treating bronze as temporary. Bronze is permanent storage, not a staging area. Size your storage accordingly.

When to Use Medallion Architecture

Medallion architecture works best when:
- You have multiple data sources feeding into a central platform
- Data quality varies by source
- Business requirements are likely to evolve
- Multiple teams need to access data at different levels of refinement

It may be overkill for small, simple datasets with stable requirements. A single well-designed table might suffice.

Getting Started

If you're starting fresh, begin with these steps:

1. Define your bronze layer: What sources? What format? How frequently?
2. Design silver transformations: What cleaning is needed? What's the target schema?
3. Identify gold use cases: What questions need answering? What aggregations?
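The questions in these steps can be captured as a small declarative plan before writing any pipeline code. This is one possible shape, sketched with Python dataclasses; the class names, field names, and example values (`orders_api`, `json`, `hourly`, `orders_clean`) are hypothetical placeholders, not a configuration format from any particular platform.

```python
from dataclasses import dataclass, field


@dataclass
class BronzeSource:
    name: str       # What source?
    format: str     # What format?
    frequency: str  # How frequently?


@dataclass
class SilverTable:
    name: str
    target_schema: dict[str, str]                            # What's the target schema?
    cleaning_steps: list[str] = field(default_factory=list)  # What cleaning is needed?


@dataclass
class GoldTable:
    name: str
    question: str     # What question does it answer?
    aggregation: str  # What aggregation produces it?


@dataclass
class MedallionPlan:
    bronze: list[BronzeSource]
    silver: list[SilverTable]
    gold: list[GoldTable]


plan = MedallionPlan(
    bronze=[BronzeSource(name="orders_api", format="json", frequency="hourly")],
    silver=[SilverTable(
        name="orders_clean",
        target_schema={"order_id": "string", "order_date": "date", "amount": "decimal(10,2)"},
        cleaning_steps=["cast types", "drop null order_id", "deduplicate on order_id"],
    )],
    gold=[GoldTable(
        name="sales_summary_daily",
        question="How much did we sell each day?",
        aggregation="sum(amount) and count(*) grouped by order_date",
    )],
)
```

Writing the plan down this way keeps the first version small: one source, one silver table, one gold table, with room to add entries as needs grow.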

Start simple. You can add complexity as needs evolve.

Medallion architecture pairs well with data lakehouses and modern platforms like Databricks.

