Databricks: per-user authentication (U2M OAuth)¶
DataHub can run Databricks queries as the calling user instead of as a single shared service identity. This unlocks Unity Catalog row-level security (RLS), per-user audit trails, and least-privilege access — without changing how your existing service-principal integrations behave.
When to choose oauth_u2m¶
Choose per-user OAuth (oauth_u2m) when:
- Your Unity Catalog uses row filters or column masks that depend on current_user(), group membership, or any per-user attribute.
- You need Databricks audit logs to attribute each query to the end-user running it (not to a shared service principal).
- You want users to only see data they have UC privileges for, even when querying through DataHub.
Stick with service principal when:
- You're embedding dashboards via mint_embed_token (per-user embed tokens use a different identity model; see Limitations).
- You're running batch / scheduled jobs where there is no end-user to attribute the query to.
- Your UC permissions are coarse-grained and don't depend on the calling user.
You can mix both: keep your existing service-principal integration for embeds and batch jobs, and add a parallel oauth_u2m integration for interactive querying.
Setup¶
One-time admin steps¶
1. Register a public OAuth app in the Databricks Account Console.
   - App type: public (no client secret; DataHub uses PKCE).
   - Redirect URI: https://{your-datahub-base}/usermanagementmodule/databricks/oauth/callback
   - Note the resulting Client ID.
2. Create the integration in DataHub.
   - Go to Admin → Integrations → New integration → Databricks.
   - Set Authentication method to User-to-Machine OAuth.
   - Paste the Host (your Databricks workspace URL) and the Client ID from step 1.
   - Save. The integration is now ready for users to consent.
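Because the app is public and relies on PKCE instead of a client secret, it can help to see what the consent redirect roughly contains. The following is a minimal sketch, not DataHub's actual implementation: the function names are hypothetical, and the `/oidc/v1/authorize` path and the `all-apis offline_access` scope are assumptions based on Databricks' documented workspace-level U2M flow, so check them against the current Databricks documentation.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def make_pkce_pair() -> tuple:
    """Return (code_verifier, code_challenge) using the S256 method of RFC 7636."""
    # 32 random bytes encode to a 43-character URL-safe verifier (RFC 7636 allows 43-128).
    verifier = secrets.token_urlsafe(32)
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # The challenge is the base64url-encoded SHA-256 digest with padding removed.
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def build_authorize_url(host: str, client_id: str, redirect_uri: str,
                        state: str, code_challenge: str) -> str:
    """Build the URL the user is sent to for consent (hypothetical helper)."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "state": state,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
        # Assumed scope; offline_access is what requests a refresh token.
        "scope": "all-apis offline_access",
    })
    return f"{host.rstrip('/')}/oidc/v1/authorize?{query}"
```

The verifier stays on the server side and is sent only when the authorization code is exchanged; the challenge is the only PKCE value that travels through the browser.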
Per-user steps (each user, once per integration)¶
- Open Admin → Integrations → {your-integration}.
- On the Authentication tab, find the Databricks consent card.
- Click Connect to Databricks. You'll be taken to Databricks to consent.
- After consenting, Databricks redirects you back to DataHub. The card now shows Re-consent and the date your consent was last rotated.
That's it — every Databricks query you trigger from now on runs as you.
What users see¶
| State | What you see | What it means |
|---|---|---|
| Before consenting | "Connect to Databricks" button | You haven't consented yet. Click it once. |
| After consenting | "Re-consent" button + "Last rotated …" timestamp | You're connected. Tokens rotate transparently — you don't need to re-consent unless you revoke or your session is wiped. |
| When a query needs consent | A toast appears: "Databricks consent required" | DataHub tried to query Databricks and your consent was missing or revoked. Open the integration page and click "Connect to Databricks" to restore access. |
The consent card also has a small revoke button (X icon). Revoking deletes your stored refresh token. Your next query will surface the consent toast again.
How rotation works¶
You consent once. DataHub stores an encrypted refresh token in your tenant's key vault and uses it to mint a short-lived access token for each Databricks request. The refresh token rotates transparently: you never see the rotation and do not need to re-consent when it happens.
You only need to re-consent when:
- You explicitly revoke (via the X button on the consent card).
- Databricks invalidates your refresh token (e.g. password reset, IdP session wipe).
- An admin removes the OAuth app from the Databricks Account Console.
In all three cases, the consent toast appears the next time you query — re-consenting takes one click.
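The rotation and re-consent behaviour described in this section can be sketched as a standard OAuth refresh-token exchange. This is an illustrative sketch only: `refresh_access_token`, `ConsentRequired`, `TokenResult`, and the injected `post_form` transport are hypothetical names, and the `/oidc/v1/token` path is an assumption based on Databricks' documented OAuth endpoints rather than anything stated about DataHub's backend.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Hypothetical transport: takes (url, form_fields) and returns the parsed JSON body.
PostForm = Callable[[str, Mapping[str, str]], Mapping[str, object]]


class ConsentRequired(Exception):
    """Raised when the stored refresh token is missing or no longer accepted."""


@dataclass
class TokenResult:
    access_token: str
    refresh_token: str  # the value to persist for the next request


def refresh_access_token(host: str, client_id: str,
                         refresh_token: Optional[str],
                         post_form: PostForm) -> TokenResult:
    if not refresh_token:
        # Nothing stored, for example after the user clicked revoke.
        raise ConsentRequired("no stored refresh token")
    response = post_form(
        f"{host.rstrip('/')}/oidc/v1/token",
        {
            "grant_type": "refresh_token",
            "client_id": client_id,  # public client: no secret is sent
            "refresh_token": refresh_token,
        },
    )
    if "access_token" not in response:
        # For example {"error": "invalid_grant"} once the token has been invalidated.
        raise ConsentRequired(str(response.get("error", "token refresh failed")))
    # Keep the rotated refresh token if one was issued, otherwise reuse the old one.
    new_refresh = str(response.get("refresh_token") or refresh_token)
    return TokenResult(str(response["access_token"]), new_refresh)
```

In this sketch a caller would catch `ConsentRequired` and surface the consent toast, and would persist `refresh_token` from the result after every successful call so that rotation stays invisible to the user.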
Limitations¶
Embedded dashboards still use the service-principal identity¶
When DataHub embeds a Databricks dashboard via the embed-token flow (mint_embed_token), the embed token itself attests the viewing user to Databricks under a different identity model. Per-user UC RLS does not apply to embedded dashboards. Plan for this:
- If a dashboard contains data with per-user row filters, embedding it will show data as the integration's service principal sees it, not as the end-user.
- For RLS-sensitive views, prefer the in-app query surfaces (Insights, Metrics, Dashboards' native query builders, Genie); these all honour oauth_u2m.
Admin-triggered list / discovery operations¶
Operations that list workspaces, notebooks, or other Databricks resources from an admin's perspective use the admin's own consent. If an admin tries a list operation and has not yet consented, they get the same consent toast and need to click Connect once.
One refresh token per (user, integration)¶
If your tenant has multiple Databricks workspaces under one integration, your single consent applies to all of them. The OAuth consent step in the UI anchors on the first workspace; the underlying refresh token is reused across workspaces of the same integration.
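One way to picture the "one refresh token per (user, integration)" rule is a store whose key deliberately omits the workspace. The sketch below is purely illustrative: `ConsentStore` and its methods are hypothetical, and an in-memory dict stands in for the key vault.

```python
from typing import Dict, Optional, Tuple


class ConsentStore:
    """Hypothetical store of refresh-token references keyed by (user, integration)."""

    def __init__(self) -> None:
        self._refs: Dict[Tuple[str, str], str] = {}

    def save(self, user_id: str, integration_id: str, token_ref: str) -> None:
        # A later consent for the same pair overwrites the earlier one.
        self._refs[(user_id, integration_id)] = token_ref

    def lookup(self, user_id: str, integration_id: str,
               workspace_url: str) -> Optional[str]:
        # workspace_url is accepted but is not part of the key, so every
        # workspace under the integration resolves to the same consent.
        return self._refs.get((user_id, integration_id))

    def revoke(self, user_id: str, integration_id: str) -> None:
        self._refs.pop((user_id, integration_id), None)
```

With this shape, consenting once makes `lookup` succeed for any workspace of that integration, while a different user or a different integration still returns `None`.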
Service principal not bypassed¶
oauth_u2m is additive. Existing PAT and service-principal integrations behave exactly as before. You can have one tenant with three integrations (PAT, SP, OAuth) — they coexist without interference.
Audit & compliance¶
| Concern | DataHub's behaviour |
|---|---|
| Where are refresh tokens stored? | Encrypted in your tenant's key vault. Only DataHub's backend can decrypt them, and only at query time. |
| Does DataHub leak the key-vault reference? | No. The reference (encrypted_refresh_token_ref) is never returned in any API response. This is enforced by an automated test that fails CI if the field appears in any endpoint response body. |
| Can a user impersonate another user via the API? | No. Every endpoint that touches per-user state derives the calling user from the JWT exclusively. Decoy user_id fields in request bodies, queries, or headers are silently ignored. A regression-test suite (*_no_spoof.py) asserts this on every relevant endpoint. |
| Does Databricks see the actual end-user in audit logs? | Yes. Each query runs as the calling user's UC identity — Databricks audit attributes the query to them, not to DataHub or to a service principal. |
| Can I revoke a user's access centrally? | Yes — via the Databricks Account Console (revoke the user from the OAuth app or remove their UC privileges). The next DataHub query for that user will fail and they will see the consent toast. |
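The no-impersonation and no-leak guarantees in the table can be illustrated with a small handler. This is a hedged sketch rather than DataHub's code: `consent_status`, the use of the `sub` claim as the user identifier, and the record layout are all assumptions made for illustration; only the field name encrypted_refresh_token_ref comes from the table above.

```python
from typing import Any, Dict, Mapping


def consent_status(jwt_claims: Mapping[str, Any],
                   request_body: Mapping[str, Any],
                   records: Mapping[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Return the caller's consent state (hypothetical endpoint logic)."""
    # The caller is taken from the verified token only. Any user_id in
    # request_body is never read, so it cannot redirect the lookup.
    user_id = str(jwt_claims["sub"])
    record = records.get(user_id)
    # Build the response from an explicit allow-list of fields so that
    # encrypted_refresh_token_ref can never be copied into it by accident.
    return {
        "user_id": user_id,
        "connected": record is not None,
        "last_rotated": record["last_rotated"] if record else None,
    }
```

A regression test in the spirit of the *_no_spoof.py suite would call this with a decoy `user_id` in the body and assert that the response still describes the token's subject and contains no key-vault reference.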
Troubleshooting¶
| Symptom | Likely cause | Fix |
|---|---|---|
| "Databricks consent required" toast on every query | Refresh token missing or revoked | Open the integration detail page → Connect to Databricks |
| Consent loop (consent → toast → consent again) | Redirect URI mismatch between the OAuth app and your DataHub base URL | Confirm the Databricks Account Console app's redirect URI is exactly https://{your-datahub-base}/usermanagementmodule/databricks/oauth/callback |
| Embedded dashboard shows different data than the same query in Insights | mint_embed_token uses SP identity, not your user identity | Expected. Use the in-app query surfaces for RLS-sensitive views. |
| Revoke button does nothing visible | Revoke is silent on success — the card flips back to "Connect to Databricks" | If it doesn't flip, refresh the page; the underlying token has already been deleted on the backend. |
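For the consent-loop row, a quick way to check the redirect URI is to build the expected value from your DataHub base URL and compare it character for character with what is registered in the Account Console. The helper names below are hypothetical; only the callback path is taken from this page.

```python
CALLBACK_PATH = "/usermanagementmodule/databricks/oauth/callback"


def expected_redirect_uri(datahub_base: str) -> str:
    """Build the callback URL from the DataHub base URL (hypothetical helper)."""
    return datahub_base.rstrip("/") + CALLBACK_PATH


def redirect_uri_matches(registered: str, datahub_base: str) -> bool:
    # Deliberately an exact string comparison: a trailing slash, a different
    # scheme, or a different host all count as a mismatch.
    return registered == expected_redirect_uri(datahub_base)
```

For example, a registered URI that ends in `/callback/` or uses `http://` instead of `https://` would fail this check even though it looks almost identical.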