
dbt Artifact Files

dbt generates a set of JSON files — artifacts — every time you run dbt run, dbt test, or dbt docs generate. Datahub reads these files to learn about your dbt project: which models exist, what SQL they run, what columns they produce, how they depend on each other, and whether their tests passed.

Datahub does not run dbt itself. You generate the artifacts in your own pipeline and upload them to Azure Blob Storage; Datahub downloads and parses them on demand.

Why artifacts and not a live connection?

dbt Core produces static output files that capture a complete snapshot of the project at one point in time. Reading these files is faster, more portable, and less invasive than executing dbt commands on a live environment. It also means the integration works regardless of where dbt runs — locally, in GitHub Actions, on Airflow, or anywhere else.

manifest.json — Required

manifest.json is the primary source of truth. It is generated by dbt run, dbt compile, or dbt docs generate and contains everything Datahub needs to build the initial picture of your project.

What it contains:

| Data | Detail |
| --- | --- |
| Node definitions | Every model, source, seed, snapshot, and analysis with its unique_id, name, schema, database, and resource type |
| SQL | raw_code (pre-Jinja) and compiled_code (post-Jinja) for each node |
| Column metadata | Column names, declared data types, and descriptions from schema.yml |
| Descriptions | Table-level and column-level documentation strings |
| Tags | Table and column tags |
| Materialization | table, view, incremental, ephemeral, etc. |
| Lineage | parent_map: which nodes each node depends on |
| FQN path | Fully-qualified name list for hierarchical display |

What Datahub uses it for: creating all dbt nodes, populating SQL, building the lineage graph, and pre-filling column metadata.

Without manifest.json the sync has nothing to process. It is the only required artifact.
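To make the manifest's role concrete, here is a minimal Python sketch of reading node definitions and lineage from it. It is not Datahub's actual parser. The sample data and the project name "shop" are hypothetical; the keys used (nodes, parent_map, resource_type, config.materialized) are the manifest fields described above.

```python
import json


def load_manifest(path):
    """Read a manifest.json file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def lineage_edges(manifest):
    """Return sorted (parent, child) pairs taken from parent_map."""
    edges = []
    for child, parents in manifest.get("parent_map", {}).items():
        for parent in parents:
            edges.append((parent, child))
    return sorted(edges)


def node_summary(manifest):
    """Map unique_id -> (resource_type, materialization) for entries under 'nodes'."""
    return {
        uid: (node.get("resource_type"), node.get("config", {}).get("materialized"))
        for uid, node in manifest.get("nodes", {}).items()
    }


# Hand-written stand-in for a real manifest (hypothetical project "shop").
sample_manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "config": {"materialized": "view"},
        },
        "model.shop.orders": {
            "resource_type": "model",
            "config": {"materialized": "table"},
        },
    },
    "parent_map": {
        "model.shop.stg_orders": ["source.shop.raw.orders"],
        "model.shop.orders": ["model.shop.stg_orders"],
    },
}

edges = lineage_edges(sample_manifest)
summary = node_summary(sample_manifest)
print(edges)
print(summary)
```

With a real project, load_manifest("target/manifest.json") would replace the inline sample.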

catalog.json is generated by dbt docs generate and describes the actual state of tables and columns in your database — the types and column ordering as the warehouse reports them, not just what schema.yml declares.

What it adds:

| Data | Detail |
| --- | --- |
| Database-actual column types | More precise than schema.yml (e.g. NUMERIC(18,4) vs numeric) |
| Column ordering | The physical order columns appear in the table |

What Datahub uses it for: enriching column types where manifest.json leaves them blank or uses generic declarations. If catalog.json is absent, Datahub uses types from the manifest only.
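The enrichment step could look roughly like the sketch below. It assumes a simple rule (use the catalog type when one exists, otherwise keep the manifest's declared type), which may differ from Datahub's real merge logic. The function name and sample nodes are hypothetical; columns[].data_type and columns[].type are the manifest and catalog fields.

```python
def enrich_column_types(manifest_node, catalog_node):
    """Return {column_name: type}, preferring the catalog type when present."""
    # Catalog column keys can differ in case from schema.yml (some warehouses
    # report upper-cased names), so match case-insensitively.
    catalog_types = {}
    if catalog_node:
        for name, col in catalog_node.get("columns", {}).items():
            catalog_types[name.lower()] = col.get("type")

    merged = {}
    for name, col in manifest_node.get("columns", {}).items():
        # Fall back to the declared type when the catalog has nothing.
        merged[name] = catalog_types.get(name.lower()) or col.get("data_type")
    return merged


# Hypothetical node fragments for illustration only.
manifest_node = {
    "columns": {
        "amount": {"data_type": "numeric"},
        "note": {"data_type": None},
    }
}
catalog_node = {
    "columns": {
        "AMOUNT": {"type": "NUMERIC(18,4)", "index": 1},
        "NOTE": {"type": "TEXT", "index": 2},
    }
}

with_catalog = enrich_column_types(manifest_node, catalog_node)
without_catalog = enrich_column_types(manifest_node, None)
print(with_catalog)
print(without_catalog)
```

Passing None for the catalog node mirrors the case where catalog.json is absent and only manifest types are available.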

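run_results.json is written by dbt test and captures the result of every test that ran. Each dbt command that executes nodes (dbt run, dbt test, dbt build) overwrites this file, so it reflects only the most recent invocation.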

What it contains:

| Data | Detail |
| --- | --- |
| Test unique ID | Links the result back to a test node in manifest.json |
| Status | pass, fail, warn, or error |
| Failures | Number of failing rows |
| Message | Error message or description when status is not pass |
| Execution time | Seconds the test took to run |
| Run timestamp | When the test executed |

What Datahub uses it for: populating the Data Quality tab on each node and the Test Results section on the connection detail page. If run_results.json is absent, the test results tab is empty.
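As an illustration of how these fields can be consumed, the sketch below tallies test statuses and lists the non-passing tests. The function name and sample payload are hypothetical. Filtering on the "test." prefix of unique_id is an assumption about how to separate test results from other result entries, not a description of Datahub's implementation.

```python
from collections import Counter


def summarize_tests(run_results):
    """Return (status counts, non-passing tests) for test results only."""
    tests = [
        r for r in run_results.get("results", [])
        if r.get("unique_id", "").startswith("test.")
    ]
    counts = Counter(r.get("status") for r in tests)
    failing = [
        (r["unique_id"], r.get("failures"), r.get("message"))
        for r in tests
        if r.get("status") != "pass"
    ]
    return counts, failing


# Hypothetical run_results payload for illustration only.
sample_run_results = {
    "results": [
        {"unique_id": "model.shop.orders", "status": "success",
         "execution_time": 1.2},
        {"unique_id": "test.shop.unique_orders_id", "status": "pass",
         "failures": 0, "message": None, "execution_time": 0.4},
        {"unique_id": "test.shop.not_null_orders_id", "status": "fail",
         "failures": 3, "message": "Got 3 results, configured to fail if != 0",
         "execution_time": 0.5},
    ]
}

counts, failing = summarize_tests(sample_run_results)
print(dict(counts))
print(failing)
```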

Generating your artifacts

Run the following sequence in your dbt project to produce all three files:

dbt run              # compiles and executes models → writes manifest.json
dbt test             # runs data tests → updates run_results.json
dbt docs generate    # introspects the database → writes catalog.json

All output files land in your project's target/ directory:

target/
├── manifest.json
├── catalog.json
└── run_results.json

dbt run alone writes a manifest.json but it may lack compiled_code for some nodes. Running dbt docs generate guarantees compiled_code is populated — which is required for lineage extraction. If your pipeline only runs dbt run, lineage on affected nodes will be empty.
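A pipeline can check for these conditions before uploading. The following sketch reports which artifacts are missing from a target directory and which models have no compiled_code. The helper names are hypothetical, and the demo writes a small fake manifest to a temporary directory rather than reading a real project.

```python
import json
import tempfile
from pathlib import Path

REQUIRED = ("manifest.json",)
OPTIONAL = ("catalog.json", "run_results.json")


def missing_artifacts(target_dir):
    """List artifact file names that are not present in target_dir."""
    target = Path(target_dir)
    return [name for name in REQUIRED + OPTIONAL if not (target / name).is_file()]


def models_without_compiled_code(manifest):
    """Return sorted unique_ids of models whose compiled_code is empty or absent."""
    return sorted(
        uid
        for uid, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "model" and not node.get("compiled_code")
    )


# Demo against a temporary directory standing in for target/.
with tempfile.TemporaryDirectory() as tmp:
    fake_manifest = {
        "nodes": {
            "model.shop.stg_orders": {
                "resource_type": "model",
                "compiled_code": "select * from raw.orders",
            },
            "model.shop.orders": {"resource_type": "model"},
        }
    }
    (Path(tmp) / "manifest.json").write_text(json.dumps(fake_manifest), encoding="utf-8")

    missing = missing_artifacts(tmp)
    loaded = json.loads((Path(tmp) / "manifest.json").read_text(encoding="utf-8"))
    uncompiled = models_without_compiled_code(loaded)

print(missing)
print(uncompiled)
```

A non-empty list from models_without_compiled_code is a signal that dbt docs generate (or dbt compile) should run before upload.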

Supported dbt Core versions

Datahub parses artifacts from dbt Core v1.2 through v1.9. The manifest schema is stable across this range; artifacts from earlier or later versions may parse with warnings or missing fields.
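The dbt version that produced a manifest is recorded under metadata.dbt_version, so a pre-upload check against the supported range can be sketched as follows. The function name and the decision to treat an unparseable version as unsupported are assumptions made for illustration.

```python
def dbt_version_supported(manifest, minimum=(1, 2), maximum=(1, 9)):
    """Return True if the manifest's dbt_version falls within [minimum, maximum]."""
    version = manifest.get("metadata", {}).get("dbt_version", "")
    parts = version.split(".")
    try:
        # Compare on (major, minor) only; patch and pre-release parts are ignored.
        major_minor = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        # Missing or malformed version string: treat as unsupported.
        return False
    return minimum <= major_minor <= maximum


ok = dbt_version_supported({"metadata": {"dbt_version": "1.7.4"}})
too_new = dbt_version_supported({"metadata": {"dbt_version": "1.10.0"}})
too_old = dbt_version_supported({"metadata": {"dbt_version": "1.1.0"}})
unknown = dbt_version_supported({})
print(ok, too_new, too_old, unknown)
```

Comparing integer tuples avoids the string-comparison pitfall where "1.10" would sort before "1.9".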

What Datahub stores from each file

| Field | Source | Stored as |
| --- | --- | --- |
| Node name, schema, database | manifest.json | dbt Node |
| Resource type (model / source / seed / snapshot) | manifest.json | dbt Node resourcetype |
| Raw SQL | manifest.json raw_code | dbt Node |
| Compiled SQL | manifest.json compiled_code | dbt Node |
| Materialization | manifest.json config.materialized | dbt Node |
| Tags | manifest.json tags | dbt Node + dbt Node Column |
| Column name + description | manifest.json columns | dbt Node Column |
| Column type (declared) | manifest.json columns[].data_type | dbt Node Column |
| Column type (actual) | catalog.json nodes[].columns[].type | dbt Node Column (enriches manifest value) |
| Lineage edges | manifest.json parent_map | dbt Lineage Edge (node-to-node only; macro edges excluded) |
| Test status, failures, message | run_results.json results[] | dbt Test Result |
| Test execution time | run_results.json results[].execution_time | dbt Test Result |

See also