dbt Integration: nodes, lineage, tests, SQL

The dbt integration reads artifact files generated by your dbt project and makes models, sources, seeds, and snapshots first-class objects in Datahub. Once connected, you can browse every dbt node, view its raw and compiled SQL, trace upstream and downstream lineage, and inspect test results — all without running dbt inside Datahub.

If your warehouse holds the transformed data and your dbt project describes how it was built, the dbt integration is what makes that description visible to the rest of your organisation.

When to choose this

Reach for the dbt integration when you want to:

Explore your dbt project. Browse all models, sources, seeds, and snapshots with their schema, materialization, and tags.
View SQL. See the raw SQL as written in the node's `.sql` file and the compiled SQL after Jinja rendering, per node.
Trace lineage. Visualise the upstream sources and models each model depends on, and the downstream models each source or model feeds.
Monitor test results. See whether each dbt test passed, failed, warned, or errored — per sync run.
Inspect column metadata. Column names, data types, and descriptions sourced from your manifest and catalog.

You do not need the dbt integration for:

Running dbt (Datahub does not execute dbt commands).
Managing dbt profiles or projects.
Scheduling dbt jobs (trigger syncs from your own CI/CD pipeline).
Importing dbt metadata into the Data Catalog. Promotion from dbt to the Catalog is not yet supported; the dbt integration is a read-only explorer.

What Datahub integrates from dbt

Category	What you get
Nodes	All models, sources, seeds, and snapshots — name, schema, database, resource type, materialization strategy, tags, and FQN path.
SQL	Raw SQL (from the node's `.sql` file, as recorded in `manifest.json`) and compiled SQL (from `dbt compile` / `dbt docs generate`) for each node.
Columns	Column names, data types, and descriptions. `manifest.json` provides the descriptions; `catalog.json` supplies the data types as they actually exist in the database.
Lineage	Directed dependency graph: upstream and downstream edges derived from `parent_map` in `manifest.json`.
Test results	Pass / fail / warn / error status per dbt test, with failure count, message, and execution time — one snapshot per sync run.
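
To make the Columns row concrete, here is a minimal Python sketch of how descriptions from `manifest.json` and types from `catalog.json` could be combined for one node. It is not Datahub's implementation: the function name `merge_columns` and the case-insensitive name matching are assumptions made for illustration. The keys it reads (`columns`, `description`, `data_type`, `type`) follow the dbt artifact layout.

```python
def merge_columns(manifest_node, catalog_node):
    """Combine manifest descriptions with catalog types for one dbt node.

    Hypothetical helper: matching column names case-insensitively is an
    assumption, made because warehouses may report names in another case.
    """
    manifest_cols = manifest_node.get("columns", {})
    catalog_cols = (catalog_node or {}).get("columns", {})

    merged = {}
    # Start from the manifest: it carries the human-written descriptions.
    for name, col in manifest_cols.items():
        merged[name.lower()] = {
            "name": name,
            "description": col.get("description", ""),
            "type": col.get("data_type"),
        }
    # Overlay the catalog: it carries the types reported by the database.
    for name, col in catalog_cols.items():
        entry = merged.setdefault(
            name.lower(), {"name": name, "description": "", "type": None}
        )
        if col.get("type"):
            entry["type"] = col["type"]
    return list(merged.values())
```

Columns that appear only in the catalog are kept with an empty description, so an undocumented column is still visible rather than silently dropped.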

How it works

1. Your dbt project generates artifact files (`manifest.json`, `catalog.json`, `run_results.json`) during a `dbt run` / `dbt test` / `dbt docs generate` cycle.
2. You, or your CI/CD pipeline, upload those files to an Azure Blob Storage container.
3. You trigger a sync in Datahub; the platform downloads the artifacts, parses them, and persists the results as a Sync Run.
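
Before uploading, a CI job may want to confirm that all three artifacts exist and parse as JSON. The following is a minimal sketch of such a pre-upload check, assuming the artifacts sit in dbt's default `target/` directory. The function name `check_artifacts` is hypothetical and the check is not part of Datahub.

```python
import json
from pathlib import Path

# The three artifact files named in the steps above.
ARTIFACTS = ("manifest.json", "catalog.json", "run_results.json")


def check_artifacts(target_dir):
    """Return {artifact name: problem} for files that are missing or not valid JSON.

    An empty dict means all three artifacts are present and parseable.
    """
    problems = {}
    for name in ARTIFACTS:
        path = Path(target_dir) / name
        if not path.is_file():
            problems[name] = "missing"
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems[name] = "invalid JSON"
    return problems
```

A pipeline could call `check_artifacts("target")` after the dbt commands finish and fail the job when the result is non-empty, so that an incomplete set of files is never uploaded.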

Each sync run is a versioned snapshot. You can compare nodes and test results across runs by selecting a previous run from the version picker.
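
The kind of comparison the version picker enables can be pictured with a small Python sketch that works directly on two `run_results.json` payloads. This only illustrates the idea and is not how Datahub stores or compares runs. The function name `diff_test_status` is hypothetical, and filtering on the `test.` prefix of `unique_id` is an assumption used to separate tests from other executed nodes.

```python
def _test_statuses(run_results):
    """Map test unique_id -> status for one parsed run_results.json payload."""
    return {
        r["unique_id"]: r["status"]
        for r in run_results.get("results", [])
        # Assumption: dbt test unique_ids start with "test."; other entries
        # (models, seeds, snapshots) are ignored here.
        if r["unique_id"].startswith("test.")
    }


def diff_test_status(previous, current):
    """Return {unique_id: (old_status, new_status)} for tests that changed.

    A status of None means the test did not appear in that run.
    """
    prev = _test_statuses(previous)
    curr = _test_statuses(current)
    changes = {}
    for uid in sorted(prev.keys() | curr.keys()):
        before, after = prev.get(uid), curr.get(uid)
        if before != after:
            changes[uid] = (before, after)
    return changes
```

Tests whose status is the same in both runs are omitted, which keeps the result focused on regressions, recoveries, and newly added or removed tests.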

What the dbt integration looks like

Surface	Route	What you see
dbt Connections	`/metadata-engine/dbt-connections`	Grid of configured dbt artifact connections with name, last sync timestamp, and node count.
Connection detail	`/metadata-engine/dbt-connections/{id}`	Node browser, lineage tab, SQL viewer, test results, and sync history for one connection.
Node list	Node tab on the connection detail	Filterable, searchable table of all dbt nodes for the selected sync run.
Node detail	Click a node	Schema tab (columns + types + descriptions), dbt tab (raw + compiled SQL), Lineage tab (dependency graph), Data Quality tab (test results for this node).
Sync history	Sync Runs section	List of past syncs with counts of nodes, columns, edges, and test results synced.

Concepts

Concept	What it is
dbt Connection	A registered link to an Azure Blob Storage container that holds your dbt artifacts. Credentials live in Azure Key Vault.
Artifact source	The storage backend (Azure Blob) from which artifacts are downloaded on sync.
Sync Run	A single execution of the sync operation. Produces a versioned snapshot of all nodes, columns, edges, and test results.
dbt Node	One model, source, seed, or snapshot from your dbt project, identified by its `unique_id`.
Lineage edge	A directed dependency between two nodes (upstream → downstream), derived from `parent_map` in `manifest.json`.
Test result	The outcome of one dbt test execution — status, failure count, message, and execution time — tied to a specific sync run.
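
As an illustration of the Lineage edge concept, the sketch below turns the `parent_map` of a parsed `manifest.json` into directed (upstream, downstream) pairs and walks them to find everything downstream of a node. The function names are hypothetical and this is not Datahub's parser. Note that `parent_map` in a real manifest also contains entries for tests; whether Datahub filters those out is not covered here.

```python
def lineage_edges(manifest):
    """Return sorted (upstream, downstream) pairs from manifest['parent_map'].

    parent_map maps each node's unique_id to the list of its parents,
    so every (parent, child) pair is one directed edge.
    """
    edges = []
    for child, parents in manifest.get("parent_map", {}).items():
        for parent in parents:
            edges.append((parent, child))
    return sorted(edges)


def downstream_of(node_id, edges):
    """Return the set of all nodes reachable downstream of node_id."""
    children = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)

    seen, stack = set(), [node_id]
    while stack:
        for nxt in children.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

Upstream traversal is the same walk with each edge reversed.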

Setup: what you need once

Prereq	Where	Why
dbt project generating artifacts	Your own infrastructure	Datahub reads pre-built artifacts; it does not run dbt itself.
Azure Blob Storage	Your Azure subscription	The artifact source. One container per dbt project is a common pattern.
App Registration	Microsoft Entra ID (formerly Azure Active Directory)	Datahub authenticates with a client ID + secret; the secret is stored in Key Vault.
Azure Key Vault	Configured on the Datahub platform	Connection secrets never touch the Datahub database.

See dbt on Azure Blob Storage for a step-by-step setup guide.