When building your dbt models for Lightdash, following these best practices will help you create a better experience in Lightdash for your end users and improve query performance.

Use wide, flat tables in the BI layer

We recommend using wide, flat tables in the BI layer because this minimizes complex joins that need to be handled at runtime.

Why wide tables work better

Modern columnar data warehouses (like Snowflake, BigQuery, and Redshift) are optimized for wide table formats. The star schema was initially introduced to optimize performance for row-based data warehouses, but with today’s columnar warehouses, wide and flat is the way to go. Wide tables offer several advantages:
  • Faster query performance: Fewer joins mean faster queries at runtime
  • Better user experience: All related fields appear together in a single section in the Lightdash sidebar, making it easier for business users to find what they need
  • Simpler to understand: End users don’t need to understand complex relationships between multiple tables
  • More accurate AI agents: AI agents have more context when working with wide tables, so they provide more accurate answers

How to implement wide tables

If your data is already modeled in a star schema upstream, you can maintain that structure in your transformation layer, then combine the models into wide tables that you surface in the BI layer. Build these wider tables for key business areas where you might want specialized analysis or AI agent capabilities, such as:
  • Revenue and sales
  • Marketing and campaigns
  • Operations and logistics
  • Customer behavior
For example, instead of having separate orders, customers, and products tables that need to be joined, create a wide orders_enriched table that includes all the relevant customer and product information alongside the order data.
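As a sketch, an orders_enriched model could pre-join those three models in dbt (the model and column names here are hypothetical, and the grain is one row per order line):

```sql
-- models/orders_enriched.sql -- hypothetical model and column names
-- Grain: one row per order line, so customer and product joins don't fan out
select
    oi.order_id,
    oi.order_line_id,
    oi.quantity,
    o.ordered_at,
    o.order_status,
    c.customer_name,
    c.customer_segment,
    p.product_name,
    p.product_category
from {{ ref('order_items') }} as oi
left join {{ ref('orders') }} as o on oi.order_id = o.order_id
left join {{ ref('customers') }} as c on o.customer_id = c.customer_id
left join {{ ref('products') }} as p on oi.product_id = p.product_id
```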

One schema.yml file per dbt model

We recommend structuring your dbt project with one .yml file per model (that is, one per .sql file). This approach makes it easier to:
  • Navigate through your YAML files as your project grows
  • Manage and maintain individual models
  • Avoid cluttered shared schema files
  • Keep related configuration together
Here’s an example structure:
models/
  orders.sql
  orders.yml
  customers.sql
  customers.yml
  products.sql
  products.yml
While Lightdash supports having all model definitions in a single schema.yml file at the directory level, we’ve found that one file per model scales better as your project grows.
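As a sketch, the orders.yml in the structure above might contain (the column names are hypothetical):

```yaml
version: 2

models:
  - name: orders
    description: One row per order.
    columns:
      - name: order_id
        description: Primary key for the orders table.
      - name: order_total
        description: Total order value in USD.
```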

What about star schema?

While we recommend wide flat tables, we do support joins in Lightdash and via AI agents, so you have the flexibility to build out your semantic layer in a way that works best for your team. If you’re using a star schema, keep in mind:
  • Fields get split into multiple sections in the Lightdash sidebar, which can be less intuitive for business users
  • Cross-model references in underlying values become more complex to manage
  • Now that Lightdash has fanout protection, the main performance concern with joins is mitigated
One approach is to maintain your star schema upstream for data modeling purposes, then materialize wide summary tables for specific business use cases as needed. This gives you the best of both worlds: clean data modeling practices upstream and optimized tables for BI consumption.

Optimizing query performance and warehouse costs

All Lightdash queries run against your data warehouse. Beyond using wide, flat tables (covered above), these additional strategies help improve performance and reduce costs.
| Strategy | Performance impact | Cost impact |
| --- | --- | --- |
| Materialize as tables | High | High |
| Index and partition data | High | High |
| Minimize joins | High | Medium |
| Enable caching | Medium | High |
| Limit exposed models | Low | Medium |
| Monitor usage | Visibility | — |

Materialize models as tables

Views re-execute SQL on every query. Tables store pre-computed results.
-- Recommended for frequently queried models
{{ config(materialized='table') }}

-- For large datasets with append-only updates
{{ config(materialized='incremental') }}
Schedule dbt runs (daily or hourly) to keep tables fresh while avoiding on-demand computation.

Index and partition your data

Proper indexing and partitioning in your data warehouse can dramatically improve query performance and reduce costs. These optimizations happen at the warehouse level and benefit all queries, including those from Lightdash.
  • Partitioning divides large tables into smaller segments based on a column (typically a date). Queries that filter on the partition column only scan the relevant partitions instead of the entire table.
  • Clustering/indexing organizes data within partitions to speed up filtering and sorting on frequently queried columns.
| Warehouse | Partitioning | Clustering/Indexing |
| --- | --- | --- |
| BigQuery | partition_by | cluster_by |
| Snowflake | Automatic micro-partitions | cluster_by |
| Redshift | dist and sort keys | sort keys |
| Databricks | partition_by | zorder |
Example dbt configuration for BigQuery:
{{ config(
    materialized='table',
    partition_by={
      "field": "created_at",
      "data_type": "date",
      "granularity": "day"
    },
    cluster_by=["customer_id", "status"]
) }}
Best practices:
  • Partition by date columns used in time-based filters (e.g., created_at, order_date)
  • Cluster by columns frequently used in WHERE clauses or GROUP BY
  • Review your warehouse’s query history to identify high-cost queries that could benefit from partitioning
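For Snowflake, where micro-partitioning is automatic, a comparable dbt configuration only needs clustering keys (the column names here are hypothetical):

```sql
{{ config(
    materialized='table',
    cluster_by=['created_at', 'customer_id']
) }}
```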

Minimize joins at query time

Pre-join data in your dbt models rather than joining at query time. As discussed in wide, flat tables, this approach outperforms runtime joins.

Leverage caching

Caching stores query results so repeat visits skip the warehouse entirely. Most effective for:
  • Frequently accessed dashboards
  • Charts without dynamic time filters
  • Scheduled deliveries

Limit models exposed to the BI layer

Only expose production-ready, optimized models to your users in Lightdash. Staging models, intermediate transformations, work-in-progress models, and tables that are not yet optimized for query performance should remain hidden from end users to avoid confusion and ensure they’re working with reliable, performant data.
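One way to implement this, assuming your Lightdash project settings are configured to only show models with a given tag, is to tag production-ready models in dbt (the tag name here is hypothetical):

```sql
-- models/orders_enriched.sql
-- 'lightdash' is a hypothetical tag that matches a tag filter
-- configured in your Lightdash project settings
{{ config(tags=['lightdash']) }}
```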

Monitor query usage

Query tags help you identify optimization opportunities:
  • Tables that need materialization or indexing
  • Expensive queries to optimize
  • Usage patterns for caching decisions
  • Cost attribution by dashboard, chart, or user
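As a sketch of usage monitoring on Snowflake (assuming your Lightdash queries carry a query tag; the tag filter below is hypothetical), you can aggregate elapsed warehouse time from the account usage views:

```sql
-- QUERY_HISTORY is Snowflake's account usage view of past queries;
-- TOTAL_ELAPSED_TIME is reported in milliseconds
select
    query_tag,
    count(*) as query_count,
    sum(total_elapsed_time) / 1000 as total_seconds
from snowflake.account_usage.query_history
where query_tag ilike '%lightdash%'  -- hypothetical tag filter
  and start_time >= dateadd('day', -7, current_timestamp())
group by query_tag
order by total_seconds desc;
```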