dbt Artifact Files¶
dbt writes a set of JSON files, called artifacts, each time you execute dbt run, dbt test, or dbt docs generate. Datahub reads these files to learn about your dbt project: which models exist, what SQL they run, what columns they produce, how they depend on each other, and whether their tests passed.
Datahub does not run dbt itself. You generate the artifacts in your own pipeline and upload them to Azure Blob Storage; Datahub downloads and parses them on demand.
Why artifacts and not a live connection?¶
dbt Core produces static output files that capture a complete snapshot of the project at one point in time. Reading these files is faster, more portable, and less invasive than executing dbt commands on a live environment. It also means the integration works regardless of where dbt runs — locally, in GitHub Actions, on Airflow, or anywhere else.
manifest.json — Required¶
manifest.json is the primary source of truth. It is generated by dbt run, dbt compile, or dbt docs generate and contains everything Datahub needs to build the initial picture of your project.
What it contains:
| Data | Detail |
|---|---|
| Node definitions | Every model, source, seed, snapshot, and analysis with its unique_id, name, schema, database, and resource type |
| SQL | raw_code (pre-Jinja) and compiled_code (post-Jinja) for each node |
| Column metadata | Column names, declared data types, and descriptions from schema.yml |
| Descriptions | Table-level and column-level documentation strings |
| Tags | Table and column tags |
| Materialization | table, view, incremental, ephemeral, etc. |
| Lineage | parent_map — which nodes each node depends on |
| FQN path | Fully-qualified name list for hierarchical display |
What Datahub uses it for: creating all dbt nodes, populating SQL, building the lineage graph, and pre-filling column metadata.
Without manifest.json the sync has nothing to process. It is the only required artifact.
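To make the structure concrete, here is a minimal Python sketch that lists the fields described above for each node. It is not Datahub's parser: `load_artifact` and `summarize_nodes` are hypothetical helper names, and the sketch assumes the usual manifest layout with top-level `nodes` and `sources` mappings keyed by `unique_id`.

```python
import json
from pathlib import Path


def load_artifact(path):
    """Read one dbt artifact file and return the parsed JSON as a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_nodes(manifest):
    """Return one summary dict per entry under "nodes" and "sources"."""
    summaries = []
    # Models, seeds, snapshots, and analyses live under "nodes";
    # sources are kept in a separate top-level "sources" mapping.
    for section in ("nodes", "sources"):
        for unique_id, node in manifest.get(section, {}).items():
            config = node.get("config") or {}
            summaries.append({
                "unique_id": unique_id,
                "resource_type": node.get("resource_type"),
                "materialized": config.get("materialized"),
                "columns": sorted(node.get("columns", {})),
                "has_compiled_code": bool(node.get("compiled_code")),
            })
    return summaries
```

A typical call would be `summarize_nodes(load_artifact("target/manifest.json"))`, run from the root of the dbt project.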
catalog.json — Recommended¶
catalog.json is generated by dbt docs generate and describes the actual state of tables and columns in your database — the types and column ordering as the warehouse reports them, not just what schema.yml declares.
What it adds:
| Data | Detail |
|---|---|
| Database-actual column types | More precise than schema.yml (e.g. NUMERIC(18,4) vs numeric) |
| Column ordering | The physical order columns appear in the table |
What Datahub uses it for: enriching column types where manifest.json leaves them blank or uses generic declarations. If catalog.json is absent, Datahub uses types from the manifest only.
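The enrichment step can be pictured with the short Python sketch below. It is an illustration only, not Datahub's actual merge logic: `resolve_columns` is a hypothetical name, and the rule that a catalog type always wins over a declared type is an assumption made for the example. It relies on each catalog column carrying `type` and `index` keys.

```python
def resolve_columns(manifest_node, catalog_node=None):
    """Return (column name, type) pairs, preferring catalog types and order."""
    declared = {
        name: column.get("data_type")
        for name, column in manifest_node.get("columns", {}).items()
    }
    catalog_columns = (catalog_node or {}).get("columns", {})

    resolved = []
    seen = set()
    # Catalog columns come first, in the physical order the warehouse reports.
    for name, column in sorted(
        catalog_columns.items(), key=lambda item: item[1].get("index", 0)
    ):
        # Assumption: the warehouse-reported type overrides the declared one.
        resolved.append((name, column.get("type") or declared.get(name)))
        seen.add(name)
    # Columns documented in schema.yml but absent from the catalog keep
    # their declared type (which may be None).
    for name, data_type in declared.items():
        if name not in seen:
            resolved.append((name, data_type))
    return resolved
```

Column names are matched exactly here; some warehouses report names in upper case, so a real implementation would probably need case-insensitive matching.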
run_results.json — Recommended¶
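run_results.json is rewritten by each dbt command that executes nodes. The copy produced by dbt test captures the result of every test that ran, and that is the copy Datahub needs; a later dbt command may overwrite it.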
What it contains:
| Data | Detail |
|---|---|
| Test unique ID | Links the result back to a test node in manifest.json |
| Status | pass, fail, warn, or error |
| Failures | Number of failing rows |
| Message | Error message or description when status is not pass |
| Execution time | Seconds the test took to run |
| Run timestamp | When the test executed |
What Datahub uses it for: populating the Data Quality tab on each node and the Test Results section on the connection detail page. If run_results.json is absent, both are empty.
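As a rough illustration of how these fields can be consumed, the Python sketch below counts test outcomes by status and collects the tests that did not pass. `summarize_test_results` is a hypothetical helper, not part of Datahub or dbt, and it assumes test results can be recognized by a `unique_id` that starts with `test.`.

```python
from collections import Counter


def summarize_test_results(run_results):
    """Count test outcomes by status and collect the tests that did not pass."""
    # Keep only test results; entries for models, seeds, or snapshots
    # (which can appear when the file comes from another command) are skipped.
    tests = [
        result
        for result in run_results.get("results", [])
        if result.get("unique_id", "").startswith("test.")
    ]
    counts = Counter(result.get("status") for result in tests)
    not_passing = [
        {
            "unique_id": result["unique_id"],
            "status": result.get("status"),
            "failures": result.get("failures"),
            "message": result.get("message"),
        }
        for result in tests
        if result.get("status") != "pass"
    ]
    return {"counts": dict(counts), "not_passing": not_passing}
```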
Generating your artifacts¶
Run the following sequence in your dbt project to produce all three files:
dbt run # compiles and executes models → writes manifest.json
dbt test # runs data tests → updates run_results.json
dbt docs generate # introspects the database → writes catalog.json
All output files land in your project's target/ directory: target/manifest.json, target/catalog.json, and target/run_results.json.
dbt run alone writes a manifest.json, but it may lack compiled_code for some nodes. Running dbt docs generate guarantees that compiled_code is populated, which is required for lineage extraction. If your pipeline only runs dbt run, lineage on affected nodes will be empty.
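Before uploading, a pipeline could run a quick sanity check along these lines. This is a sketch under stated assumptions: `missing_artifacts` and `models_missing_compiled_code` are hypothetical helpers, and limiting the compiled_code check to nodes whose `resource_type` is `model` is a choice made for the example.

```python
from pathlib import Path

EXPECTED_ARTIFACTS = ("manifest.json", "catalog.json", "run_results.json")


def missing_artifacts(target_dir):
    """Return the names of expected artifact files absent from target_dir."""
    target = Path(target_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (target / name).is_file()]


def models_missing_compiled_code(manifest):
    """Return sorted unique IDs of models with no compiled_code in the manifest."""
    return sorted(
        unique_id
        for unique_id, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "model" and not node.get("compiled_code")
    )
```

If either function returns a non-empty list, the pipeline can fail early instead of uploading an incomplete set of files.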
Supported dbt Core versions¶
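Datahub parses artifacts from dbt Core v1.2 through v1.9. The fields Datahub reads are largely stable across this range, with one notable exception: dbt Core v1.3 renamed raw_sql and compiled_sql to raw_code and compiled_code, so v1.2 manifests carry the older names. Artifacts from earlier or later versions may parse with warnings or missing fields.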
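Every manifest records the dbt version that produced it under `metadata.dbt_version`, so the supported range can be checked before upload. The Python sketch below shows one way to do that; `dbt_version_tuple` and `is_supported` are hypothetical names, and the bounds are simply the range quoted above.

```python
def dbt_version_tuple(manifest):
    """Return (major, minor) from metadata.dbt_version, or None if unreadable."""
    version = manifest.get("metadata", {}).get("dbt_version", "")
    parts = version.split(".")
    try:
        return int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return None


def is_supported(manifest, lowest=(1, 2), highest=(1, 9)):
    """Check whether the manifest's dbt version falls inside the given range."""
    version = dbt_version_tuple(manifest)
    return version is not None and lowest <= version <= highest
```

Only the major and minor components are compared, so patch releases and pre-release suffixes do not affect the result.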
What Datahub stores from each file¶
| Field | Source | Stored as |
|---|---|---|
| Node name, schema, database | manifest.json | dbt Node |
| Resource type (model / source / seed / snapshot) | manifest.json | dbt Node resourcetype |
| Raw SQL | manifest.json raw_code | dbt Node |
| Compiled SQL | manifest.json compiled_code | dbt Node |
| Materialization | manifest.json config.materialized | dbt Node |
| Tags | manifest.json tags | dbt Node + dbt Node Column |
| Column name + description | manifest.json columns | dbt Node Column |
| Column type (declared) | manifest.json columns[].data_type | dbt Node Column |
| Column type (actual) | catalog.json nodes[].columns[].type | dbt Node Column (enriches manifest value) |
| Lineage edges | manifest.json parent_map | dbt Lineage Edge (node-to-node only; macro edges excluded) |
| Test status, failures, message | run_results.json results[] | dbt Test Result |
| Test execution time | run_results.json results[].execution_time | dbt Test Result |
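The lineage row can be illustrated with a small Python sketch that turns parent_map into (parent, child) pairs. `lineage_edges` is a hypothetical helper, not Datahub's implementation, and skipping any ID that starts with `macro.` is a defensive assumption meant to mirror the "macro edges excluded" note in the table.

```python
def lineage_edges(manifest):
    """Return sorted (parent, child) pairs derived from the manifest's parent_map."""
    edges = []
    for child, parents in manifest.get("parent_map", {}).items():
        for parent in parents:
            # Assumption: macro IDs, if present, are not lineage endpoints.
            if parent.startswith("macro.") or child.startswith("macro."):
                continue
            edges.append((parent, child))
    return sorted(edges)
```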
See also¶
- dbt Integration — overview of what Datahub integrates and how it works.
- dbt on Azure Blob Storage — step-by-step guide to uploading artifacts and connecting Datahub.