Data warehouse design patterns for mid-market businesses
Why mid-market businesses need a data warehouse
A data warehouse is a centralised repository that consolidates data from multiple source systems into a structure optimised for analysis and reporting. If your business relies on reports cobbled together from spreadsheets, CSV exports, and manual data pulls from different applications, you’ve already felt the pain a warehouse solves.
The problems it addresses
- Inconsistent numbers - finance, sales, and operations each report different revenue figures because they pull from different sources with different definitions.
- Slow reporting - analysts spend 80% of their time finding and cleaning data and 20% analysing it.
- No single source of truth - critical business questions require querying multiple systems and reconciling the results by hand.
- Operational system strain - running heavy analytical queries against production databases degrades application performance.
Mid-market businesses often assume data warehouses are only for large enterprises with dedicated data teams. That was true a decade ago. Modern cloud-based warehouses and tooling have reduced the cost and complexity to the point where a two-person data team can operate a warehouse that serves the entire business.
Core design patterns
Star schema
The star schema is the most common warehouse design pattern. It organises data into:
- Fact tables - contain the quantitative data you want to analyse (sales transactions, support tickets, website visits). Each row represents an event or measurement.
- Dimension tables - contain the descriptive attributes you use to filter, group, and label your facts (customer details, product categories, date hierarchies, geographic regions).
The star schema gets its name from the visual layout: a central fact table surrounded by dimension tables, connected by foreign keys.
Advantages:
- Simple to understand and query
- Optimised for the way business users think ("show me sales by region by month")
- Excellent performance for common BI tool patterns
- Well-supported by every BI platform
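The shape is easiest to see in code. Below is a minimal star-schema sketch using Python's built-in sqlite3 as a stand-in for a warehouse; the table and column names (fct_sales, dim_customer, dim_date) and the sample rows are illustrative, not a prescribed naming convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE fct_sales (
    sale_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_customer VALUES (1, 'Acme', 'North'), (2, 'Globex', 'South');
INSERT INTO dim_date VALUES (1, '2024-01-15', '2024-01'), (2, '2024-02-03', '2024-02');
INSERT INTO fct_sales VALUES (1, 1, 1, 100.0), (2, 2, 1, 250.0), (3, 1, 2, 75.0);
""")

# "Show me sales by region by month": the fact table joined to its dimensions.
rows = conn.execute("""
    SELECT c.region, d.month, SUM(f.amount) AS revenue
    FROM fct_sales f
    JOIN dim_customer c ON f.customer_id = c.customer_id
    JOIN dim_date d ON f.date_id = d.date_id
    GROUP BY c.region, d.month
    ORDER BY c.region, d.month
""").fetchall()
print(rows)  # [('North', '2024-01', 100.0), ('North', '2024-02', 75.0), ('South', '2024-01', 250.0)]
```

Every business question follows the same pattern: join the fact table to one or more dimensions, filter or group by dimension attributes, aggregate the measures.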
Snowflake schema
A snowflake schema normalises dimension tables into sub-dimensions. Instead of a single product dimension with category and subcategory columns, you’d have separate tables for products, categories, and subcategories linked by foreign keys.
Advantages:
- Reduces data redundancy
- Easier to maintain when dimension attributes change frequently
Disadvantages:
- More complex queries with additional joins
- Slower query performance in most cases
- Harder for business users to navigate
For most mid-market warehouses, the star schema is the better starting point. Snowflake schemas add complexity that’s rarely justified unless you have specific normalisation requirements.
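To make the trade-off concrete, here is the product example above as a snowflake: the single product dimension normalised into sub-dimension tables that must be joined back together at query time. Again sketched with sqlite3 and illustrative names (dim_product, dim_subcategory, dim_category).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_subcategory (
    subcategory_id INTEGER PRIMARY KEY,
    category_id INTEGER REFERENCES dim_category(category_id),
    subcategory TEXT
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    subcategory_id INTEGER REFERENCES dim_subcategory(subcategory_id),
    product TEXT
);
INSERT INTO dim_category VALUES (1, 'Hardware');
INSERT INTO dim_subcategory VALUES (1, 1, 'Laptops');
INSERT INTO dim_product VALUES (1, 1, 'UltraBook 14');
""")

# Reassembling what a star schema would store as one row in one product
# dimension now costs two extra joins:
row = conn.execute("""
    SELECT p.product, s.subcategory, c.category
    FROM dim_product p
    JOIN dim_subcategory s ON p.subcategory_id = s.subcategory_id
    JOIN dim_category c ON s.category_id = c.category_id
""").fetchone()
print(row)  # ('UltraBook 14', 'Laptops', 'Hardware')
```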
Wide (denormalised) tables
An increasingly popular pattern in modern cloud warehouses: collapse facts and dimensions into wide, flat tables with many columns. Cloud warehouses like BigQuery, Snowflake, and Databricks handle wide tables efficiently thanks to columnar storage.
This pattern trades storage efficiency for query simplicity. Analysts can answer questions without writing joins, which reduces errors and speeds up exploration.
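In practice the wide table is usually materialised once from the modelled star, so analysts query it with no joins at all. A minimal sketch, again with sqlite3 and illustrative names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE fct_sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO dim_customer VALUES (1, 'Acme', 'North');
INSERT INTO fct_sales VALUES (1, 1, 100.0);

-- Materialise the join once; analysts then query the flat table directly.
CREATE TABLE wide_sales AS
SELECT f.sale_id, f.amount, c.name AS customer_name, c.region
FROM fct_sales f JOIN dim_customer c USING (customer_id);
""")

rows = conn.execute("SELECT customer_name, region, amount FROM wide_sales").fetchall()
print(rows)  # [('Acme', 'North', 100.0)]
```

Dimension attributes are duplicated across every fact row, which is exactly the storage cost that columnar warehouses make cheap.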
ETL vs ELT
Data needs to be extracted from source systems, transformed into the warehouse schema, and loaded into the warehouse. The order of the last two steps defines two approaches.
ETL (extract, transform, load)
Data is transformed before loading into the warehouse. This was the standard approach when warehouse compute was expensive and storage was limited. Transformations happen in a separate processing layer (an ETL tool or custom scripts).
Best for: on-premise warehouses with limited compute, heavily regulated environments where only clean data should enter the warehouse.
ELT (extract, load, transform)
Data is loaded raw into the warehouse first, then transformed using the warehouse’s own compute engine. This is the dominant pattern in modern cloud warehouses, which offer cheap storage and powerful compute.
Best for: cloud-based warehouses, agile analytics teams that need to iterate quickly on transformations, environments where raw data preservation is valuable.
The modern consensus favours ELT for new warehouse projects. Tools like dbt (data build tool) have standardised the transformation layer, making ELT accessible to teams without heavy engineering backgrounds.
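The ELT flow can be sketched end to end: load records exactly as they arrive, then cast and filter them with SQL inside the warehouse, leaving the raw layer untouched. sqlite3 stands in for a cloud warehouse here, and the record shape is illustrative.

```python
import json
import sqlite3

# Extract: raw records as they arrive from a source system, bad values and all.
raw = [
    {"id": 1, "amount": "100.50", "currency": "GBP"},
    {"id": 2, "amount": "not-a-number", "currency": "GBP"},
]

conn = sqlite3.connect(":memory:")

# Load: untyped and untransformed, preserving the source exactly.
conn.execute("CREATE TABLE raw_orders (payload TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?)", [(json.dumps(r),) for r in raw])

# Transform: cast and filter with the warehouse's own SQL engine.
conn.execute("""
    CREATE TABLE stg_orders AS
    SELECT json_extract(payload, '$.id') AS order_id,
           CAST(json_extract(payload, '$.amount') AS REAL) AS amount
    FROM raw_orders
    WHERE json_extract(payload, '$.amount') GLOB '[0-9]*'
""")
rows = conn.execute("SELECT order_id, amount FROM stg_orders").fetchall()
print(rows)  # [(1, 100.5)]
```

Because the raw layer survives, a transformation bug can be fixed and replayed without re-extracting from the source - the property that makes rapid iteration safe.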
The modern data stack
A practical mid-market data warehouse typically comprises four layers.
Ingestion layer
Tools that extract data from source systems and load it into the warehouse.
- SaaS connectors - tools such as Fivetran, Airbyte, and Stitch extract data from common applications (CRMs, ERPs, marketing platforms, databases) with minimal configuration.
- Custom pipelines - for proprietary systems or unusual data sources, Python scripts or lightweight orchestration tools (Dagster, Prefect) handle extraction.
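The core of most custom pipelines is the incremental-extraction pattern: keep a cursor for the last record loaded and pull only newer records on each run. A minimal sketch; the record shape and field names are illustrative, and in production the cursor would be persisted (a state file or warehouse table) and `records` would come from an API or database query.

```python
def extract_incremental(records, since):
    """Return records updated after the stored cursor (everything on the first run)."""
    return [r for r in records if since is None or r["updated_at"] > since]

def advance_cursor(records, since):
    """New cursor value: the latest updated_at seen, or the old cursor if nothing new."""
    return max([r["updated_at"] for r in records], default=since)

records = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-02-01"},
]
new = extract_incremental(records, since="2024-01-15")
cursor = advance_cursor(new, since="2024-01-15")
print(new, cursor)  # only record 2 is extracted; the cursor advances to its timestamp
```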
Storage layer (the warehouse)
The cloud warehouse that stores and processes your data.
- Snowflake - strong all-rounder with excellent concurrency and data sharing capabilities
- Google BigQuery - serverless, no infrastructure to manage, cost-effective for intermittent workloads
- Amazon Redshift - tightly integrated with AWS ecosystem
- Databricks - lakehouse pattern combining warehouse and data lake, strong for teams that also do data science
For most mid-market deployments, the choice between these platforms matters less than getting started. All are capable, and migration between them, while not trivial, is feasible.
Transformation layer
Tools that model and transform raw data within the warehouse.
dbt has become the de facto standard. It lets analysts write SQL-based transformations, version them in Git, test them automatically, and document them. This brings software engineering practices to data transformation without requiring a software engineering team.
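To show what dbt expresses, here is the equivalent done by hand: a model is a SQL transformation materialised as a table, and dbt's built-in unique and not_null tests are queries that must return nothing. This is an imperative sketch with sqlite3 and illustrative names, not dbt's actual syntax - in dbt the model is a SQL file and the tests are declared in YAML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_customers (customer_id INTEGER, email TEXT);
INSERT INTO stg_customers VALUES (1, 'A@Example.com'), (2, 'b@example.com');

-- The "model": a transformation materialised as a table.
CREATE TABLE dim_customer AS
SELECT customer_id, LOWER(email) AS email FROM stg_customers;
""")

# The "tests": unique and not_null on the key column, checked by hand.
dupes = conn.execute(
    "SELECT customer_id FROM dim_customer GROUP BY customer_id HAVING COUNT(*) > 1"
).fetchall()
nulls = conn.execute(
    "SELECT COUNT(*) FROM dim_customer WHERE customer_id IS NULL"
).fetchone()[0]
assert dupes == [] and nulls == 0
```

dbt's contribution is making the SQL declarative, versioned, tested, and documented - the boilerplate above disappears into configuration.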
BI layer
The tools that business users interact with to explore data and build dashboards. Covered in depth in our business intelligence guide.
An incremental adoption path
You don’t need to build the entire stack on day one. A phased approach manages risk and delivers value early.
Phase 1: Centralise (4-6 weeks)
Pick your two or three most critical data sources (typically your CRM, financial system, and one operational system). Set up a cloud warehouse, configure ingestion, and load raw data.
Even without transformation, having data from multiple systems in one place unlocks analysis that was previously impossible.
Phase 2: Model (4-8 weeks)
Build your first star schema models in dbt. Start with the metrics your leadership team asks for most often - revenue, pipeline, customer counts, operational KPIs. Create a single, authoritative definition for each metric.
Phase 3: Serve (2-4 weeks)
Connect a BI tool to the warehouse and build dashboards for the metrics defined in Phase 2. Train key users on self-service exploration.
Phase 4: Expand (ongoing)
Add more data sources, build more models, and onboard more users. Each iteration follows the same pattern: ingest, model, serve.
Common mistakes
Boiling the ocean
Trying to warehouse every data source simultaneously guarantees delays and scope creep. Start with the data that supports your most critical business questions.
Neglecting data quality
A warehouse that contains bad data erodes trust faster than no warehouse at all. Implement data quality checks (row counts, null checks, referential integrity) in your transformation layer from the start.
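The three checks named above each reduce to a query whose result must be zero. A sketch against sqlite3, with illustrative table names; in a real stack these would live as tests in the transformation layer and fail the pipeline run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_sales (sale_id INTEGER, customer_id INTEGER);
CREATE TABLE fct_sales (sale_id INTEGER, customer_id INTEGER);
CREATE TABLE dim_customer (customer_id INTEGER);
INSERT INTO raw_sales VALUES (1, 1), (2, 2);
INSERT INTO fct_sales VALUES (1, 1), (2, 2);
INSERT INTO dim_customer VALUES (1), (2);
""")

def passes(sql):
    """A check passes when the query returns zero."""
    return conn.execute(sql).fetchone()[0] == 0

checks = {
    # Row counts: nothing lost between the raw and modelled layers.
    "row_count": passes(
        "SELECT (SELECT COUNT(*) FROM raw_sales) - (SELECT COUNT(*) FROM fct_sales)"
    ),
    # Null check: key columns must always be populated.
    "not_null": passes("SELECT COUNT(*) FROM fct_sales WHERE customer_id IS NULL"),
    # Referential integrity: every fact row points at a real dimension row.
    "ref_integrity": passes("""
        SELECT COUNT(*) FROM fct_sales f
        LEFT JOIN dim_customer c USING (customer_id)
        WHERE c.customer_id IS NULL
    """),
}
assert all(checks.values()), checks
```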
No ownership
A warehouse without an owner becomes a dumping ground. Assign a data team or individual as steward, responsible for data models, quality, documentation, and user support.
Ignoring access control
Just because data is centralised doesn’t mean everyone should see everything. Implement role-based access from day one, especially for financial, HR, and customer data.
Over-engineering early
Start with simple patterns (star schema, basic dbt models, a single BI tool) and add complexity only when specific requirements demand it. Premature optimisation wastes effort and introduces fragility.
Building the capability
A mid-market data warehouse doesn’t require a large team. One to two people with SQL skills and an understanding of the business can operate the stack described here. What it does require is deliberate design, consistent investment, and alignment between technical implementation and business questions.
ITHQ’s data, analytics, and database engineering team helps businesses design and build warehouse solutions that scale. We also advise on cloud architecture to ensure the underlying platform is cost-effective and well-governed, and on business technology strategy to align your data investments with organisational priorities.
Next steps
If your reporting depends on spreadsheets, manual data pulls, or querying production databases, a data warehouse will transform how your business uses information.
Contact ITHQ to discuss your data landscape and design a warehouse strategy that fits your scale and budget.