Skip to content

Metrics: defining and joining tables

DataHub's metric layer lets you define a single, governed source of truth for every business metric — once, and then reuse it across dashboards, alerts, and downstream consumers. This page covers two surfaces of the Metric builder modal that data stewards reach for most often:

  • Joining tables — building a metric whose semantic model spans more than one physical table, joined by a Unity Catalog foreign key.
  • System schemas — what to do when the foreign-key picker tells you Unity Catalog system schemas are disabled.

If you are looking for the read-only metric query surface (running an existing metric from a dashboard, applying filters, picking a granularity), that is documented separately and out of scope here.

When to choose this

Use the Joined tables flow in the metric builder when:

  • Your metric needs to slice a measure by a dimension that lives on a different table — for example sum(orders.amount) by customers.region. The amount lives on the orders table; the region lives on the customers table.
  • The two tables are already related by a foreign key declared in Unity Catalog (the referential_constraints view in system.information_schema). DataHub will not invent a join for you — UC is the source of truth.
  • Both tables live in the same Databricks catalog and schema. Cross-catalog and cross-schema joins are out of scope for v1.

Stick with the Single table tab (the default) when:

  • Your metric's measure and grouping columns all live on one physical table.
  • You want to ship the metric without involving an admin to declare new UC FKs.
  • You are still iterating on the measure or aggregation — picking columns is faster on a single-table surface.

You can always start in the Single table tab, ship a v1 metric, and come back later to add joins. The Joined tables tab does not lock you out of single-table flows — your existing metrics are unchanged.

Setup — what an admin needs to do once

The Joined tables tab depends on two things being in place in Databricks. Both are one-time admin steps; once they are set, every metric author benefits.

1. Declare the foreign key in Unity Catalog

DataHub reads foreign-key relationships from system.information_schema.referential_constraints and system.information_schema.key_column_usage. If the relationship is not declared in UC, the picker will show No declared foreign keys in Unity Catalog for that table — even if the column names match.

Declare the relationship on the child table (the one carrying the FK column):

sql ALTER TABLE main.raw.orders ADD CONSTRAINT orders_customer_id_fk FOREIGN KEY (customer_id) REFERENCES main.raw.customers (customer_id);

UC also requires a primary key on the parent table — DataHub picks this up automatically if it exists:

sql ALTER TABLE main.raw.customers ADD CONSTRAINT customers_pk PRIMARY KEY (customer_id);

Once both are declared, the next time a metric author opens the builder against orders, the Joined tables tab will list customer_id → main.raw.customers as an available join target.

2. Enable system schemas (Standard SKU only)

Premium SKU workspaces have system.information_schema enabled by default. Standard SKU workspaces require an explicit enablement — see § System schemas below.

What metric authors see

Open the metric builder from Metrics → Create metric (top-right of the metrics list). The modal opens on the Single table tab — exactly as it did before. The new behaviour lives behind the second tab:

Tab What it does
Single table The existing single-table builder, unchanged. Pick a connection, catalog, schema, table; pick a measure column, time dimension, and primary key; save. This is the default landing surface.
Joined tables Adds a foreign-key picker plus a list of staged joins. Each join is one hop: [base table].[FK column] → [joined table].[PK column].

The Joined tables tab

The tab's content depends on the state of the base-table pick:

State What you see
You have not picked a base table yet "Pick a base table first to discover joinable tables."
The base table has no declared UC foreign keys An info banner: "This table has no declared foreign keys in Unity Catalog. See the docs on joining tables to add a join manually." (links here)
The workspace has not enabled system.information_schema A warning banner: "Foreign-key auto-discovery requires Unity Catalog system.information_schema access. Ask your workspace admin to enable system schemas to use the joined-tables flow." See § System schemas.
The base table has at least one FK A dropdown labelled Join via foreign key, an Add join button, and a list of staged joins (one card per join, with a remove button).

When at least one join is staged:

  • The Measure column and Time column dropdowns gain a header per source table so you can clearly see which column comes from which table.
  • The Primary key field continues to refer to the base table's primary key — this is the entity the metric is anchored to.
  • The compiled YAML now contains one semantic_models: entry per source table (the base table plus each joined table). MetricFlow uses the entity declarations to wire the join automatically — no hand-rolled SQL.

Joining tables

You will arrive at this anchor from the Joined tables tab's info banner when the base table has no declared UC foreign keys, or from a teammate sharing the link.

The metric builder's Joined tables tab supports one-hop joins only in v1. That means a metric can span:

  • orders ⨝ customers
  • events ⨝ users

but not:

  • orders ⨝ customers ⨝ regions ❌ (two hops)

If your metric needs a two-hop chain, the v1 workaround is to materialise a single-hop view in dbt and bind the metric to that. We are tracking multi-hop joins as a v2 polish item.

How dependency tracking handles joined tables

When a metric joins multiple tables (for example, joining orders to customers to slice revenue by region), the metric depends on every contract whose table contributes a measure or a dimension. Linking only the headline table's contract is not enough — the catalog will refuse to save the metric until all contributing contracts are linked, surfacing a COLUMN_REF_NOT_IN_LINKED_CONTRACTS error that names the table and column it could not find.

Foreign-key entities used purely to wire up the join (no measures or dimensions pulled from the joined model) do not require their own contract link.

Concretely: if Total revenue lives on orders and you slice by customers.region, you must link both the orders contract AND the customers contract. If you only link orders, the save fails with a clear pointer at the customers.region reference.

Removing a staged join

Click the trash icon on a staged join's card. If the join already has measure / time / primary-key columns bound to it, DataHub asks you to confirm before removing — the bound columns will be cleared because they no longer have a home.

System schemas

You will arrive at this anchor from the Joined tables tab's warning banner when the workspace's system.information_schema is not enabled.

Unity Catalog ships with a system catalog that exposes metadata views — including referential_constraints (the source of truth for foreign keys) and key_column_usage (the column-pair mapping). Premium SKU workspaces have these enabled by default. Standard SKU workspaces require an account-admin to enable them once.

Without system.information_schema access, the FK picker has no way to discover joinable tables. We deliberately surface this as a clear admin-action banner — silently falling back to manual entry would erode the "UC is the source of truth" invariant the metric layer relies on.

How an admin enables system schemas

  1. Go to the Databricks Account Console.
  2. Under Catalog Explorer (or Data), open the system catalog.
  3. If the catalog does not exist, click Enable system catalog (Account admin role required).
  4. Inside system, ensure the information_schema schema is enabled. Some workspaces also need access and billing for separate features; only information_schema is required for DataHub's FK picker.

The official Databricks docs cover this in detail under Enable system schemas. Once enabled, refresh the metric builder; the FK picker should appear within a few seconds.

How it works

The metric builder reads UC metadata via DataHub's POST /insightsmodule/query/foreign-keys endpoint, which queries:

  • system.information_schema.referential_constraints (the FK declarations themselves), and
  • system.information_schema.key_column_usage (the column-pair mapping for each FK).

Both queries are parameterised — DataHub does not interpolate raw catalog / schema / table names into SQL. The endpoint is gated on the same insights.manage role as the rest of the catalog discovery surface.

When you save a metric with one or more joined tables, DataHub:

  1. Generates one semantic_models: block per source table — the base table plus each joined table.
  2. Adds the FK column to the base table's entities: block as a type: foreign entry, matching the joined table's type: primary entity by name. MetricFlow uses this entity-name match to wire the join at compile time.
  3. Compiles the spec via MetricFlow, persists a metric_contract_dependencies row per (contract, column), and only then accepts the save.

If MetricFlow rejects the spec — for example, because a contributing contract is missing — the save fails with a precise pointer at the unsatisfied reference. You can adjust contracts and re-save without losing your other inputs.

Limitations

Limit Why Workaround
One join hop only (A ⨝ B, not A ⨝ B ⨝ C) Keeps the FK-discovery query trivial and the UI flat for v1. Materialise a one-hop view in dbt and bind the metric to that.
Both tables must live in the same catalog and schema v1 scope decision. Move the joined table or define it as a view inside the base catalog/schema.
FKs must be declared in Unity Catalog UC is the source of truth — no manual FK entry in v1 to avoid typos and drift. Declare the FK in UC (see § Setup).
The headline measure stays on the base table A joined model carrying measures: requires its own agg_time_dimension, which the v1 emitter does not add. Either keep the measure on the base table, or model the joined surface as a separate metric.
Embedded dashboards do not honour per-user UC RLS Embed tokens use a different identity model — see Databricks per-user OAuth. This is a Databricks identity-model constraint, not a metric-layer one.

Audit & compliance

Every multi-table metric is auditable through the same surfaces as a single-table metric:

Question a CISO might ask Where to look
"Which contracts does this metric depend on?" The metric detail page lists every linked contract; backed by metric_contract_dependencies rows in Postgres.
"Did the user have permission to query both tables?" UC enforces this at query time — DataHub mints a per-user OAuth token (when configured) so the actual end-user identity flows to UC.
"Was the join definition tampered with after approval?" Each save creates an immutable metric_versions snapshot containing the full spec_yaml. The dependency rows are version-scoped (metric_version_id, not metric_definition_id).
"Why did this metric fail to save?" The error envelope names the unsatisfied reference (COLUMN_REF_NOT_IN_LINKED_CONTRACTS plus the (table, column) pair).

Troubleshooting

Symptom Likely cause Fix
Joined tables tab shows "No declared foreign keys in Unity Catalog" UC FK is not declared, or DataHub has not refreshed its UC cache yet. Verify with SHOW CONSTRAINTS ON main.raw.orders; in Databricks. If declared, give DataHub ~30s to refresh; otherwise declare the FK as shown in § Setup.
Joined tables tab shows "Foreign-key auto-discovery requires Unity Catalog system.information_schema access" Workspace is on Standard SKU and system.information_schema is disabled. See § System schemas.
Save fails with COLUMN_REF_NOT_IN_LINKED_CONTRACTS naming a column on a joined table You linked the headline table's contract but not the joined table's contract. Open the Contracts picker on the metric builder and link the contract that covers the joined table; re-save.
Save fails with dbt parse failed mentioning agg_time_dimension The joined model received a measure but no time dimension was declared on it. In v1, keep measures on the base table. If you must move the measure to a joined table, also pick a time column on that table.
FK picker is empty even though referential_constraints shows the FK in Databricks The customer querying user (the per-user OAuth identity) does not have USE CATALOG / USE SCHEMA privileges on system.information_schema. Grant USAGE on system.information_schema to the user or the role they assume.
The compiled YAML contains the joined table but the metric returns no rows when run The join key types do not match across the two tables (e.g. INT on one side, STRING on the other). Cast the column types to match in upstream dbt models — UC does not cast for you.

If something is broken that this page does not cover, ping your DataHub contact with the metric ID and the error text — both are visible on the metric detail page.