LLM-Powered Automatic Documentation for dbt
dbt Power Tools is a CLI that auto-generates dbt model and column documentation using LLMs (Ollama or OpenAI). It parses your dbt project, understands model SQL and data, and writes clear documentation back into schema.yml.
Writing schema.yml descriptions was always “later”, and the documentation never caught up with reality.
The tool reads manifest.json, inspects model SQL and (optionally) warehouse data, prompts an LLM with rich context, and writes structured descriptions back into schema.yml using Jinja templates.
High-level architecture
The tool plugs into an existing dbt project and uses dbt's own artifacts as the contract: manifest.json for structure and schema.yml for documentation as code.
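As a rough illustration of treating manifest.json as the contract, the sketch below enumerates models from a parsed manifest. It is not the tool's own code: the list_models and load_manifest helpers and the sample data are hypothetical, and it assumes the usual manifest layout, with entries under a top-level "nodes" key carrying "resource_type", "depends_on", and "columns" fields.

```python
import json
from pathlib import Path


def load_manifest(path="target/manifest.json"):
    """Load a dbt manifest from disk (the default path is an assumption)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def list_models(manifest):
    """Return one summary dict per dbt model found in a parsed manifest."""
    models = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and other node types
        columns = node.get("columns", {})
        models.append({
            "unique_id": unique_id,
            "name": node["name"],
            # Newer dbt versions store the SQL as "raw_code", older ones as "raw_sql".
            "sql": node.get("raw_code") or node.get("raw_sql", ""),
            "depends_on": node.get("depends_on", {}).get("nodes", []),
            # Columns that still have no description are candidates for generation.
            "undocumented_columns": sorted(
                name for name, col in columns.items() if not col.get("description")
            ),
        })
    return sorted(models, key=lambda m: m["name"])


# Hypothetical, heavily trimmed manifest used only to exercise the helper.
SAMPLE_MANIFEST = {
    "nodes": {
        "model.shop.fct_orders": {
            "resource_type": "model",
            "name": "fct_orders",
            "raw_code": "select * from {{ ref('stg_orders') }}",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
            "columns": {
                "order_id": {"description": "Surrogate key for the order."},
                "customer_id": {"description": ""},
            },
        },
        "test.shop.not_null_fct_orders_order_id": {
            "resource_type": "test",
            "name": "not_null_fct_orders_order_id",
        },
    }
}

models = list_models(SAMPLE_MANIFEST)
```

In a real project the input would come from load_manifest() after a dbt run or compile has produced the target directory.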
Read manifest.json, existing schema.yml, and optional warehouse profile.
- manifest.json (models, refs, sources)
- Existing schema.yml (if present)
- Optional warehouse connection via profiles.yml
- Parses model SQL and dependency graph
- Samples data (row counts, missing %, example values) when enabled
- Prompts LLM (Ollama / OpenAI) with structured context per model and column
- Updates schema.yml in-place
- Model descriptions: purpose, grain, business meaning
- Column docs: semantics + derived stats
- Ready for dbt docs site / PR review
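The data-sampling step (row counts, missing %, example values) could look roughly like the following. This is a sketch under stated assumptions, not the tool's implementation: profile_column is a hypothetical helper, and an in-memory SQLite database stands in for the warehouse that would normally be reached through profiles.yml.

```python
import sqlite3


def quote_identifier(name):
    """Quote a SQL identifier so table and column names are not parsed as SQL."""
    return '"' + name.replace('"', '""') + '"'


def profile_column(conn, table, column, max_examples=3):
    """Return row count, percentage of NULLs, and a few distinct example values."""
    tbl = quote_identifier(table)
    col = quote_identifier(column)
    row_count, missing = conn.execute(
        f"SELECT COUNT(*), SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END) FROM {tbl}"
    ).fetchone()
    missing = missing or 0  # SUM yields NULL when the table is empty
    examples = [
        row[0]
        for row in conn.execute(
            f"SELECT DISTINCT {col} FROM {tbl} "
            f"WHERE {col} IS NOT NULL ORDER BY {col} LIMIT ?",
            (max_examples,),
        )
    ]
    missing_pct = round(100.0 * missing / row_count, 1) if row_count else 0.0
    return {"row_count": row_count, "missing_pct": missing_pct, "examples": examples}


# Stand-in "warehouse" with hypothetical data: four orders, one without a customer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fct_orders (order_id INTEGER, customer_id TEXT)")
conn.executemany(
    "INSERT INTO fct_orders VALUES (?, ?)",
    [(1, "c1"), (2, "c2"), (3, None), (4, "c1")],
)
stats = profile_column(conn, "fct_orders", "customer_id")
```

Keeping the profile to a few aggregate queries per column limits warehouse cost, which matters when the step runs for every model.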
How it works
The CLI is designed to be safe, repeatable, and friendly to CI. Documentation is generated from the project artifacts – not from ad-hoc inspection.
- Discover models. The CLI parses manifest.json to enumerate models, sources, and their dependencies.
- Collect context. For each model, it gathers SQL, upstream dependencies, tags, and existing docs. If configured, it queries the warehouse for simple profile stats.
- Prompt the LLM. It builds a structured prompt that explains the model's intent, joins, and transformations in plain language, and asks for concise documentation.
- Write docs as code. The resulting descriptions are written into schema.yml using a Jinja-based template so teams can tweak the style without changing Python.
- Run in CI. The CLI can be wired into a dbt or GitHub Actions workflow so new or changed models are documented automatically in pull requests.
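To make the "collect context" and "prompt the LLM" steps more concrete, here is a minimal sketch of assembling a structured prompt from a model's SQL, dependencies, and optional column stats. The build_prompt function, the prompt wording, and the sample inputs are all assumptions made for illustration; the actual prompt used by the tool is not shown in this page, and the call to Ollama or OpenAI is deliberately left out.

```python
from typing import Optional


def build_prompt(model: dict, column_stats: Optional[dict] = None) -> str:
    """Assemble a plain-text prompt describing one model and its columns."""
    column_stats = column_stats or {}
    deps = ", ".join(model.get("depends_on", [])) or "none"
    lines = [
        f"Model: {model['name']}",
        f"Upstream dependencies: {deps}",
        "SQL:",
        model["sql"].strip(),
        "Columns:",
    ]
    for col in model["columns"]:
        s = column_stats.get(col)
        if s:
            # Attach profile stats only for columns that were actually sampled.
            lines.append(
                f"- {col} (rows={s['row_count']}, missing={s['missing_pct']}%, "
                f"examples={s['examples']})"
            )
        else:
            lines.append(f"- {col}")
    lines.append(
        "Write a concise model description (purpose, grain, business meaning) "
        "and one sentence per column."
    )
    return "\n".join(lines)


# Hypothetical inputs, shaped like the context gathered in the earlier steps.
model = {
    "name": "fct_orders",
    "sql": "select order_id, customer_id from {{ ref('stg_orders') }}",
    "depends_on": ["model.shop.stg_orders"],
    "columns": ["order_id", "customer_id"],
}
stats = {"customer_id": {"row_count": 4, "missing_pct": 25.0, "examples": ["c1", "c2"]}}

prompt = build_prompt(model, stats)
```

Building the prompt as a pure function of project artifacts keeps runs repeatable: the same manifest and stats produce the same prompt text, which is easier to review in CI.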
Example generated documentation
An example of how a model and its columns look after running the tool. The actual text is controlled via templates so teams can match their documentation tone.
models:
  - name: fct_orders
    description: >
      Fact table representing one row per customer order, joined to core
      customer and product dimensions. Used for revenue, margin and volume
      reporting at the daily grain.
    columns:
      - name: order_id
        description: >
          Surrogate key for the order. Unique per order across all channels.
      - name: customer_id
        description: >
          Links to the dim_customers table to enrich with customer attributes.
      - name: order_date
        description: >
          Business date of the order, used as the primary reporting date.
      - name: gross_revenue
        description: >
          Pre-discount revenue in the transaction currency.
      - name: net_revenue
        description: >
          Revenue after discounts and returns. Used in margin calculations.
Tech stack
The goal of dbt Power Tools is to make documentation the easiest part of analytics engineering, not the guiltiest secret in the repo.
Read usage & installation on GitHub