Back to Blog
November 29, 2024Data Fundamentals

What Is a Data Model?

A data model is a blueprint for how your data is organized. It defines what data you store, how pieces of data relate to each other, and the rules that govern them.

Think of it like an architect's drawing for a building. Before you build, you plan: how many floors, where the walls go, how rooms connect. A data model does the same for your data.

Why Data Models Matter

Without a data model, data becomes chaos. Different systems define "customer" differently. Sales has one version, marketing another, support a third. Reports conflict. Nobody trusts the numbers.

A good data model provides: - Consistency - Everyone uses the same definitions - Clarity - Relationships between data are explicit - Quality - Rules prevent bad data from entering - Efficiency - Queries run faster when data is well-organized

The Cost of No Model
Companies without clear data models spend 40-60% of their analytics time just finding and reconciling data. That's before any actual analysis happens.

Types of Data Models

Conceptual Model - High-level view. What entities exist and how they relate. "Customers place Orders containing Products." No technical details.

Logical Model - More detail. What attributes each entity has. Customers have name, email, created_date. Orders have total, status, order_date.

Physical Model - Implementation details. Actual table names, column types, indexes, constraints. What gets built in the database.

Most business discussions focus on conceptual and logical models. Physical models are for developers and database administrators.

Core Concepts

Entities - The things you track. Customers, Products, Orders, Employees. Usually become tables in your database.

Attributes - Properties of entities. A Customer has name, email, phone. An Order has total, date, status.

Relationships - How entities connect. A Customer places many Orders. An Order contains many Products. Relationships define how data links together.

Keys - Unique identifiers. Customer ID, Order ID. How you reference specific records and connect tables.

Common Patterns

One-to-Many: One customer has many orders. One department has many employees. The most common relationship.

Many-to-Many: Orders contain products. Products appear in orders. Need a linking table (order_items) to model this.

Hierarchies: Employees report to managers who report to directors. Products belong to categories that belong to departments.

Data Modeling for Analytics

Analytics has its own modeling approaches:

Star Schema - A central fact table (like sales transactions) surrounded by dimension tables (customer, product, date, store). Simple, fast for queries, widely used.

Snowflake Schema - Like star, but dimensions are normalized (broken into more tables). More complex, saves storage, sometimes slower.

Dimensional Modeling - The overall approach (made famous by Ralph Kimball) of thinking in terms of facts (what happened) and dimensions (the context).

If you're building a data warehouse for analytics, you'll likely use dimensional modeling. It's optimized for the questions business users ask.

Practical Example

Say you're modeling an e-commerce business:

Entities: - Customers (who buys) - Products (what's sold) - Orders (transactions) - Order Items (line items within orders)

Relationships: - Customer → Orders (one to many) - Order → Order Items (one to many) - Product → Order Items (one to many)

Key Questions the Model Answers: - What did a customer buy? - How is a product selling? - What's the revenue trend?

The model makes these questions easy to answer with simple queries.

Common Modeling Mistakes

No single customer view - Customer data fragmented across systems with no way to connect them.

Inconsistent definitions - "Active customer" means different things in different reports.

Over-normalization - Breaking data into too many tables, making queries complex and slow.

Under-normalization - Duplicating data everywhere, leading to inconsistencies and update headaches.

Ignoring time - Not tracking how data changes over time. Losing history.

Getting Your Model Right

1. Start with business questions - What do you need to analyze? Work backward to what data you need.

2. Define your entities clearly - What is a customer? When does a lead become a customer? Get agreement.

3. Document everything - A model nobody understands is useless. Keep a data dictionary.

4. Plan for change - Business evolves. Build flexibility into your model.

5. Test with real queries - Make sure the model actually supports the questions you need to answer.

A good data model powers everything else. Learn about data warehouses that implement these models and SQL for querying them.

---

Sources: - IBM: What Is Data Modeling? - Kimball Group: Dimensional Modeling Techniques - erwin: What Is Data Modeling?

Ready to Talk Data Strategy?

Let's discuss how we can help with your data challenges.

Book a Call