
dbt-model-diff — Data-level diffs across Git branches

An open-source CLI that compares the actual data output of a dbt model between two Git refs (e.g. main vs a feature branch). It builds both versions, snapshots results into an isolated schema, and reports deterministic differences for CI, PR reviews, and safer refactors.

Open source · 2025 · dbt · Data quality tooling · Postgres + Redshift
View on GitHub
Situation
dbt changes are usually reviewed via SQL diffs, compiled code, or manifest changes — but these don’t answer the real question: did the data change? Small refactors can cause silent row drops, duplicates, or schema drift.
Task
Build a lightweight tool to compare model outputs between main and a feature branch, and make the result usable for CI checks and PR feedback.
Action
Implemented a CLI that builds the model on both refs using Git worktrees, snapshots results into an isolated schema, computes row diffs and column profiles using warehouse-native SQL, and renders output as rich, json, or markdown.
Result
Enables safer dbt refactors with deterministic data regression checks. Teams can validate whether changes affect row counts, schema shape, or key-based records — and plug it into CI workflows.

High-level architecture

The tool creates isolated Git worktrees for base/head, builds both with dbt, snapshots results into a diff schema, then runs comparisons and outputs a report.

Step 1 – Create Git worktrees

The tool checks out base and head refs into isolated worktrees so builds don't conflict.
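
As a rough illustration of this step, the sketch below shows one way to check out two refs into separate worktrees from Python. The function names and the `base`/`head` folder layout are assumptions made for this example, not the tool's actual internals; only the `git worktree add --detach` and `git worktree remove --force` commands are standard Git.

```python
import subprocess
from pathlib import Path


def worktree_add_cmd(repo: Path, ref: str, dest: Path) -> list[str]:
    """Build the git command that checks out `ref` into a detached worktree at `dest`."""
    return ["git", "-C", str(repo), "worktree", "add", "--detach", str(dest), ref]


def worktree_remove_cmd(repo: Path, dest: Path) -> list[str]:
    """Build the git command that removes the worktree at `dest`."""
    return ["git", "-C", str(repo), "worktree", "remove", "--force", str(dest)]


def create_worktrees(repo: Path, base_ref: str, head_ref: str, root: Path) -> dict[str, Path]:
    """Create one worktree per ref under `root` (hypothetical layout: root/base, root/head)."""
    paths = {"base": root / "base", "head": root / "head"}
    for name, ref in (("base", base_ref), ("head", head_ref)):
        # check=True raises CalledProcessError if git fails (e.g. unknown ref).
        subprocess.run(worktree_add_cmd(repo, ref, paths[name]), check=True)
    return paths
```

Because each ref lives in its own folder, each dbt run writes its `target/` artifacts separately, and the developer's own checkout is never switched to another branch.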

Git worktrees
  • Creates isolated folders for base + head refs
  • Prevents dbt artifact collisions
  • Keeps your working tree untouched
dbt build + snapshot
  • Runs dbt build on both refs
  • Resolves relation via manifest.json
  • Copies results into diff schema as {model}__base and {model}__head
Compare + report
  • Row counts + schema diff
  • Column profile: null %, uniqueness
  • Key-based row diff (added/removed/changed)
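
To make the compare step concrete, here is a minimal sketch of a key-based diff and a column profile run as plain SQL over the two snapshot tables. It uses an in-memory SQLite database as a stand-in for Postgres or Redshift so it can run anywhere; the helper names, the sample rows, and the definition of uniqueness as distinct values divided by total rows are all assumptions for illustration, not the tool's actual queries.

```python
import sqlite3


def key_diff(conn, model, key, columns):
    """Count added/removed/changed rows between {model}__base and {model}__head.

    Identifiers are interpolated directly, so they must come from trusted input.
    """
    base, head = f"{model}__base", f"{model}__head"
    # Null-safe inequality: SQLite spells it IS NOT; Postgres uses IS DISTINCT FROM.
    changed = " OR ".join(f"b.{c} IS NOT h.{c}" for c in columns) or "0"
    queries = {
        "added": f"SELECT COUNT(*) FROM {head} h LEFT JOIN {base} b "
                 f"ON h.{key} = b.{key} WHERE b.{key} IS NULL",
        "removed": f"SELECT COUNT(*) FROM {base} b LEFT JOIN {head} h "
                   f"ON h.{key} = b.{key} WHERE h.{key} IS NULL",
        "changed": f"SELECT COUNT(*) FROM {base} b JOIN {head} h "
                   f"ON h.{key} = b.{key} WHERE {changed}",
    }
    return {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}


def column_profile(conn, table, column):
    """Return null percentage and uniqueness (assumed here: distinct / total rows)."""
    total, nulls, distinct = conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM({column} IS NULL), 0), "
        f"COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()
    if total == 0:
        return {"null_pct": 0.0, "uniqueness": 0.0}
    return {"null_pct": 100.0 * nulls / total, "uniqueness": distinct / total}


# Made-up sample data: three customers on base, one extra on head.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customers__base (customer_id INTEGER, name TEXT);
CREATE TABLE dim_customers__head (customer_id INTEGER, name TEXT);
INSERT INTO dim_customers__base VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO dim_customers__head VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
""")
result = key_diff(conn, "dim_customers", "customer_id", ["name"])
profile = column_profile(conn, "dim_customers__head", "customer_id")
print(result, profile)
```

Keeping the comparison in SQL means only counts and aggregates leave the warehouse, which is the main appeal of a warehouse-native diff over pulling both tables into memory.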

Demo: usage & output formats

Example command and sample outputs. JSON is ideal for CI gating; markdown is PR-comment ready.

Command
dbt-model-diff diff dim_customers \
  --keys customer_id \
  --base main \
  --head feature/include-4 \
  --profiles-dir . \
  --project-dir . \
  --format rich
Output
Summary
────────────────────────
Base rowcount:    3
Head rowcount:    4
Added rows:       1
Removed rows:     0
Changed rows:     0

Schema Changes
────────────────────────
No column differences detected.

Column Profile (example)
────────────────────────
customer_id:
  % nulls (base): 0.0
  % nulls (head): 0.0
  uniqueness (base): 1.0
  uniqueness (head): 1.0
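
For CI gating on the json format, a small check along the following lines could turn a report into a pass/fail decision. The field names (`removed_rows`, `changed_rows`, `schema_changes`) are hypothetical, chosen to mirror the summary above; the real JSON keys should be taken from the tool's actual output.

```python
def gate(report, limits=None):
    """Return a list of human-readable problems; an empty list means the check passes.

    `report` is the parsed JSON report. All field names here are assumed, not
    taken from the tool's documented schema.
    """
    limits = limits or {"removed_rows": 0, "changed_rows": 0}
    problems = [
        f"{field}={report.get(field, 0)} exceeds limit {limit}"
        for field, limit in limits.items()
        if report.get(field, 0) > limit
    ]
    if report.get("schema_changes"):
        problems.append("schema changes detected")
    return problems


# Sample report mirroring the summary shown above (hypothetical shape).
sample = {
    "base_rowcount": 3,
    "head_rowcount": 4,
    "added_rows": 1,
    "removed_rows": 0,
    "changed_rows": 0,
    "schema_changes": [],
}
problems = gate(sample)
print("PASS" if not problems else problems)
```

In a pipeline, the report would be loaded with `json.load` and the job would exit non-zero when `gate` returns any problems. Added rows are allowed by default in this sketch, on the assumption that new records are usually an intended change.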

Tech stack

Python · dbt Core · Git worktrees · PostgreSQL · Redshift · Typer CLI · Docker E2E tests

dbt-model-diff helps teams shift left on data correctness by catching data regressions before merge. It’s designed to be warehouse-native, CI-friendly, and easy to extend with new adapters.

Explore dbt-model-diff on GitHub