dbt Artifact Files

dbt generates a set of JSON files — artifacts — every time you run dbt run, dbt test, or dbt docs generate. Datahub reads these files to learn about your dbt project: which models exist, what SQL they run, what columns they produce, how they depend on each other, and whether their tests passed.

Datahub does not run dbt itself. You generate the artifacts in your own pipeline and upload them to Azure Blob Storage; Datahub downloads and parses them on demand.

Why artifacts and not a live connection?

dbt Core produces static output files that capture a complete snapshot of the project at one point in time. Reading these files is faster, more portable, and less invasive than executing dbt commands on a live environment. It also means the integration works regardless of where dbt runs — locally, in GitHub Actions, on Airflow, or anywhere else.

manifest.json: Required

manifest.json is the primary source of truth. It is generated by dbt run, dbt compile, or dbt docs generate and contains everything Datahub needs to build the initial picture of your project.

What it contains:

Data	Detail
Node definitions	Every model, source, seed, snapshot, and analysis with its `unique_id`, name, schema, database, and resource type
SQL	`raw_code` (pre-Jinja) and `compiled_code` (post-Jinja) for each node
Column metadata	Column names, declared data types, and descriptions from `schema.yml`
Descriptions	Table-level and column-level documentation strings
Tags	Table and column tags
Materialization	`table`, `view`, `incremental`, `ephemeral`, etc.
Lineage	`parent_map` — which nodes each node depends on
FQN path	Fully-qualified name list for hierarchical display

What Datahub uses it for: creating all dbt nodes, populating SQL, building the lineage graph, and pre-filling column metadata.

Without manifest.json the sync has nothing to process. It is the only required artifact.
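To make the table above concrete, here is a minimal Python sketch that reads a manifest and pulls out a few of the listed fields per node. It is not Datahub's parser; the function names `load_manifest` and `summarize_nodes` and the shape of the returned records are assumptions made for illustration. It relies only on keys that dbt writes to the manifest (`nodes`, `sources`, `parent_map`, `resource_type`, `config.materialized`, `columns`).

```python
import json
from pathlib import Path


def load_manifest(path):
    """Parse a manifest.json file into a plain dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_nodes(manifest):
    """Return one small record per node and source found in the manifest.

    The record layout is hypothetical; it only mirrors the fields
    described in the table above.
    """
    parent_map = manifest.get("parent_map", {})
    records = []
    # Models, seeds, snapshots, analyses and tests live under "nodes";
    # sources are stored under a separate top-level "sources" key.
    for section in ("nodes", "sources"):
        for unique_id, node in manifest.get(section, {}).items():
            config = node.get("config") or {}
            records.append({
                "unique_id": unique_id,
                "resource_type": node.get("resource_type"),
                "materialized": config.get("materialized"),
                "columns": sorted(node.get("columns") or {}),
                "parents": parent_map.get(unique_id, []),
            })
    return records
```

Calling `summarize_nodes(load_manifest("target/manifest.json"))` after a dbt run should give a quick view of what a sync would have to work with.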

catalog.json: Recommended

catalog.json is generated by dbt docs generate and describes the actual state of tables and columns in your database — the types and column ordering as the warehouse reports them, not just what schema.yml declares.

What it adds:

Data	Detail
Database-actual column types	More precise than schema.yml (e.g. `NUMERIC(18,4)` vs `numeric`)
Column ordering	The physical order columns appear in the table

What Datahub uses it for: enriching column types where manifest.json leaves them blank or uses generic declarations. If catalog.json is absent, Datahub uses types from the manifest only.
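The enrichment step can be sketched roughly as follows. This is an illustration under stated assumptions, not Datahub's actual merge logic: the function name `enrich_column_types` is hypothetical, the catalog type is assumed to win whenever it is present, and column names are matched case-insensitively because some warehouses report identifiers in a different case than `schema.yml` declares them. The catalog keys used (`columns`, `type`, `index`) are the ones dbt writes for each node in catalog.json.

```python
def enrich_column_types(manifest_node, catalog_node=None):
    """Merge declared (manifest) and actual (catalog) column info for one node.

    Returns a dict keyed by column name. Assumption: the catalog type is
    preferred when available, otherwise the declared data_type is kept.
    """
    declared = manifest_node.get("columns") or {}
    actual = (catalog_node or {}).get("columns") or {}
    # Index catalog columns by lower-cased name for case-insensitive matching.
    remaining = {name.lower(): (name, col) for name, col in actual.items()}

    merged = {}
    for name, col in declared.items():
        _, cat = remaining.pop(name.lower(), (name, {}))
        merged[name] = {
            "type": cat.get("type") or col.get("data_type"),
            "index": cat.get("index"),  # physical column position, if known
            "description": col.get("description") or "",
        }
    # Columns present in the warehouse but not documented in schema.yml.
    for name, cat in remaining.values():
        merged[name] = {
            "type": cat.get("type"),
            "index": cat.get("index"),
            "description": "",
        }
    return merged
```

With no catalog entry for a node, the function falls back to the declared types only, which matches the behavior described above for a missing catalog.json.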

run_results.json: Recommended

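run_results.json is written by dbt test and captures the result of every test that ran. dbt rewrites this file on each invocation instead of appending to it, so it only reflects the most recent command. Other commands, including dbt run and dbt docs generate, can write their own run_results.json to the same path; if your pipeline runs them after dbt test, copy the test results aside and upload that copy.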

What it contains:

Data	Detail
Test unique ID	Links the result back to a test node in `manifest.json`
Status	`pass`, `fail`, `warn`, or `error`
Failures	Number of failing rows
Message	Error message or description when status is not `pass`
Execution time	Seconds the test took to run
Run timestamp	When the test executed

What Datahub uses it for: populating the Data Quality tab on each node and the Test Results section on the connection detail page. If run_results.json is absent, both stay empty.
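As a rough illustration of the fields in the table above, the following Python sketch extracts test rows from a parsed run_results.json and counts them by status. The function name `summarize_test_results` and the row layout are assumptions for this example. Results are matched to test nodes through the manifest when one is supplied; otherwise the sketch falls back to the `test.` prefix that dbt uses in data test unique IDs. The run timestamp is taken from the `execute` entry of each result's `timing` list.

```python
from collections import Counter


def summarize_test_results(run_results, manifest=None):
    """Return (rows, counts) for the test results in a parsed run_results.json.

    rows   -- one dict per test result (hypothetical layout)
    counts -- Counter of statuses such as pass, fail, warn, error
    """
    test_ids = None
    if manifest is not None:
        test_ids = {
            uid for uid, node in manifest.get("nodes", {}).items()
            if node.get("resource_type") == "test"
        }

    rows = []
    for result in run_results.get("results", []):
        unique_id = result.get("unique_id", "")
        if test_ids is not None:
            if unique_id not in test_ids:
                continue
        elif not unique_id.startswith("test."):
            continue  # skip models, seeds, etc. recorded in the same file
        started_at = next(
            (t.get("started_at") for t in result.get("timing", [])
             if t.get("name") == "execute"),
            None,
        )
        rows.append({
            "unique_id": unique_id,
            "status": result.get("status"),
            "failures": result.get("failures"),
            "message": result.get("message"),
            "execution_time": result.get("execution_time"),
            "started_at": started_at,
        })
    return rows, Counter(row["status"] for row in rows)
```

A pipeline could use the returned counts to fail fast before uploading, for example when any status is `error`.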

Generating your artifacts

Run the following sequence in your dbt project to produce all three files:

dbt run              # compiles and executes models → writes manifest.json
dbt test             # runs data tests → writes run_results.json
dbt docs generate    # introspects the database → writes catalog.json

All output files land in your project's target/ directory:

target/
├── manifest.json
├── catalog.json
└── run_results.json
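Before uploading, a pipeline step could confirm that the expected files are actually present. The sketch below is one way to do that in Python; the function name `check_artifacts` and the split into required and recommended files are taken from the labels on this page, not from any Datahub tooling.

```python
from pathlib import Path

REQUIRED = ("manifest.json",)
RECOMMENDED = ("catalog.json", "run_results.json")


def check_artifacts(target_dir="target"):
    """Return (missing_required, missing_recommended) for a dbt target directory."""
    target = Path(target_dir)
    missing_required = [n for n in REQUIRED if not (target / n).is_file()]
    missing_recommended = [n for n in RECOMMENDED if not (target / n).is_file()]
    return missing_required, missing_recommended
```

A non-empty first list means there is nothing useful to upload yet; a non-empty second list means the sync would still work, with less column and test detail.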

dbt run alone writes a manifest.json, but it may lack compiled_code for some nodes. Running dbt docs generate guarantees that compiled_code is populated, which is required for lineage extraction. If your pipeline only runs dbt run, lineage for the affected nodes will be empty.
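To see which nodes would be affected, you can scan the manifest for entries without compiled SQL. The following Python sketch assumes that only models need checking by default (the `resource_types` parameter is a hypothetical knob) and also accepts the older `compiled_sql` key that dbt Core used before v1.3.

```python
def nodes_missing_compiled_code(manifest, resource_types=("model",)):
    """List unique_ids of nodes that have no compiled SQL in the manifest."""
    missing = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") not in resource_types:
            continue
        # dbt Core 1.3+ writes compiled_code; earlier versions wrote compiled_sql.
        compiled = node.get("compiled_code") or node.get("compiled_sql")
        if not compiled:
            missing.append(unique_id)
    return sorted(missing)
```

An empty result suggests the manifest is ready for lineage extraction; any returned IDs point at nodes that were parsed but never compiled.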

Supported dbt Core versions

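Datahub parses artifacts from dbt Core v1.2 through v1.9. The fields Datahub reads are largely stable across this range, with one notable rename: v1.2 manifests expose SQL as raw_sql and compiled_sql, which became raw_code and compiled_code in v1.3. Artifacts from earlier or later versions may parse with warnings or missing fields.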
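Every dbt artifact records the version that produced it under `metadata.dbt_version`, so a pipeline can check the range before uploading. The Python sketch below is an illustration only: `is_supported` and `sql_fields` are hypothetical helper names, and the version bounds simply restate the range given above. `sql_fields` also shows one way to tolerate the `raw_sql` / `compiled_sql` key names that dbt Core used before v1.3.

```python
SUPPORTED_MIN = (1, 2)  # dbt Core v1.2
SUPPORTED_MAX = (1, 9)  # dbt Core v1.9


def is_supported(artifact):
    """Return True if the artifact's dbt version falls inside the supported range."""
    version = artifact.get("metadata", {}).get("dbt_version")
    if not version:
        return False
    try:
        # Only major.minor matter here; patch and pre-release suffixes are ignored.
        major, minor = (int(part) for part in version.split(".")[:2])
    except ValueError:
        return False
    return SUPPORTED_MIN <= (major, minor) <= SUPPORTED_MAX


def sql_fields(node):
    """Return (raw, compiled) SQL for a manifest node, accepting old and new key names."""
    raw = node.get("raw_code", node.get("raw_sql"))
    compiled = node.get("compiled_code", node.get("compiled_sql"))
    return raw, compiled
```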

What Datahub stores from each file

Field	Source	Stored as
Node name, schema, database	`manifest.json`	dbt Node
Resource type (model / source / seed / snapshot)	`manifest.json`	dbt Node `resourcetype`
Raw SQL	`manifest.json` `raw_code`	dbt Node
Compiled SQL	`manifest.json` `compiled_code`	dbt Node
Materialization	`manifest.json` `config.materialized`	dbt Node
Tags	`manifest.json` `tags`	dbt Node + dbt Node Column
Column name + description	`manifest.json` `columns`	dbt Node Column
Column type (declared)	`manifest.json` `columns[].data_type`	dbt Node Column
Column type (actual)	`catalog.json` `nodes[].columns[].type`	dbt Node Column (enriches manifest value)
Lineage edges	`manifest.json` `parent_map`	dbt Lineage Edge (node-to-node only; macro edges excluded)
Test status, failures, message	`run_results.json` `results[]`	dbt Test Result
Test execution time	`run_results.json` `results[].execution_time`	dbt Test Result
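The "Lineage edges" row can be illustrated with a short Python sketch that turns `parent_map` into parent-to-child pairs. This is an assumption about the shape of the result, not Datahub's storage model: `lineage_edges` is a hypothetical name, and the `macro.` prefix filter is a defensive way to express the "macro edges excluded" rule.

```python
def lineage_edges(manifest):
    """Return sorted (parent_id, child_id) pairs derived from parent_map.

    Any unique_id starting with "macro." is skipped so that only
    node-to-node edges remain.
    """
    edges = set()
    for child, parents in manifest.get("parent_map", {}).items():
        if child.startswith("macro."):
            continue
        for parent in parents:
            if parent.startswith("macro."):
                continue
            edges.add((parent, child))
    return sorted(edges)
```

For a model `model.shop.orders` that selects from `source.shop.raw.orders`, the result contains the single pair `("source.shop.raw.orders", "model.shop.orders")`.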