Databricks: per-user authentication (U2M OAuth)¶
Datahub can run Databricks queries as the calling user instead of as a single shared service identity. This unlocks Unity Catalog row-level security (RLS), per-user audit trails, and least-privilege access — without changing how your existing service-principal integrations behave.
When to choose oauth_u2m¶
Choose per-user OAuth (oauth_u2m) when:
- Your Unity Catalog uses row filters or column masks that depend on current_user(), group membership, or any per-user attribute.
- You need Databricks audit logs to attribute each query to the end-user running it (not to a shared service principal).
- You want users to only see data they have UC privileges for, even when querying through Datahub.
Stick with service principal when:
- You're embedding dashboards via mint_embed_token (per-user embed tokens use a different identity model; see Limitations).
- You're running batch / scheduled jobs where there is no end-user to attribute the query to.
- Your UC permissions are coarse-grained and don't depend on the calling user.
You can mix both: keep your existing service-principal integration for embeds and batch jobs, and add a parallel oauth_u2m integration for interactive querying.
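The criteria above can be condensed into a small decision helper. This is an illustrative sketch only: `Workload` and `choose_auth_method` are hypothetical names made up for this example and are not part of Datahub's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of how one integration will be used."""
    embeds_dashboards: bool = False     # goes through mint_embed_token
    is_batch_job: bool = False          # no end-user to attribute queries to
    uses_per_user_rls: bool = False     # row filters / column masks on current_user()
    needs_per_user_audit: bool = False  # audit logs must name the end-user


def choose_auth_method(w: Workload) -> str:
    """Return 'service_principal' or 'oauth_u2m' following the guidance above."""
    # Embeds and batch jobs have no interactive end-user identity to carry.
    if w.embeds_dashboards or w.is_batch_job:
        return "service_principal"
    if w.uses_per_user_rls or w.needs_per_user_audit:
        return "oauth_u2m"
    # Coarse-grained permissions: the shared identity is sufficient.
    return "service_principal"
```

Evaluate the helper once per workload rather than once per tenant, since a tenant can run both integrations side by side.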
Setup¶
One-time admin steps¶
The form in Datahub walks you through the same four steps in order. The exact Redirect URI for your tenant is displayed in the form with a copy button — no need to type it by hand.
1. Open the integration form in Datahub.
   - Go to Admin → Integrations → New integration → Databricks.
   - Set Authentication method to User-to-Machine OAuth. The form now shows the redirect URI you'll paste into Databricks in step 2.
2. Register an OAuth app in the Databricks Account Console.
   - Open the Databricks Account Console → Settings → App connections (or User Management → OAuth apps, depending on your account UI).
   - Click Add connection.
   - Paste the Redirect URI that Datahub displays in the integration form into the Redirect URLs field. It must match character-for-character.
   - Choose Public client (uses PKCE) unless you have a specific reason to register a confidential client. Public is the recommended path: Datahub uses PKCE for the authorization-code exchange, so no client secret is needed.
3. Save the OAuth app and copy the resulting Client ID.
4. Finish the integration in Datahub.
   - Back in the Datahub integration form, paste the Host (your Databricks workspace URL) and the Client ID you copied in step 3.
   - Leave the Confidential client (has a client secret) toggle off for a public PKCE app. Turn it on only if you registered the OAuth app as a confidential client in Databricks; the form will then reveal an optional Client Secret field.
   - Save. The integration is now ready for users to consent.
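For readers curious why a public client needs no secret, here is a minimal sketch of the PKCE half of the flow (S256 method, RFC 7636). It is not Datahub's implementation. The `/oidc/v1/authorize` path and the `all-apis offline_access` scopes are assumptions about the Databricks workspace OAuth endpoint that you should check against the Databricks documentation, and the host, client ID, and redirect URI shown in the usage note are placeholders.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def code_challenge_s256(code_verifier: str) -> str:
    """BASE64URL(SHA256(verifier)) without '=' padding, per RFC 7636."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple[str, str]:
    """Return a fresh (code_verifier, code_challenge) pair."""
    # 64 random bytes encode to 86 URL-safe characters (allowed range: 43-128).
    verifier = secrets.token_urlsafe(64)
    return verifier, code_challenge_s256(verifier)


def build_authorize_url(host: str, client_id: str, redirect_uri: str,
                        code_challenge: str, state: str) -> str:
    """Build the URL the browser is sent to for consent."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,        # must equal the registered value exactly
        "scope": "all-apis offline_access",  # ASSUMPTION: verify the scopes you need
        "state": state,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
    }
    # ASSUMPTION: workspace-level authorize endpoint path.
    return f"{host.rstrip('/')}/oidc/v1/authorize?{urlencode(params)}"
```

A call might look like `build_authorize_url("https://example.cloud.databricks.com", "my-client-id", "https://datahub.example.com/callback", challenge, state)`. The verifier never leaves the server until the code is exchanged, which is what stands in for the client secret.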
Per-user steps (each user, once per integration)¶
End users do not need access to the admin Integrations module. The first time Datahub needs to query Databricks on your behalf, the platform creates a high-priority Connect to Databricks task for you and surfaces a toast on the page that triggered the query.
1. Open the Connect to Databricks task from your inbox (or the bell), or click the toast on a dashboard / metric / insight that needs Databricks.
2. You land on the dedicated consent page at /databricks/consent/{integration-id}. It shows the integration name, the workspace host you're about to authorise, and a single Connect to Databricks button.
3. Click Connect to Databricks. You're sent to Databricks to consent.
4. After consenting, Databricks redirects you back to the same consent page. The status flips to Connected, the task auto-completes, and any pending toast disappears.
Admins still have the same flow available on Admin → Integrations → {integration} (Authentication tab → Per-user OAuth section). Admins also see a global view of who has consented per integration; end users only ever see their own consent state.
That's it — every Databricks query you trigger from now on runs as you.
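The consent gate described in this section can be pictured roughly as follows. This is a simplified in-memory sketch, not Datahub's code: `ConsentGate`, `ConsentRequired`, and the returned string are hypothetical stand-ins for the real task, toast, and query machinery.

```python
class ConsentRequired(Exception):
    """Raised when the calling user has not yet connected to Databricks."""


class ConsentGate:
    def __init__(self) -> None:
        # Both sets are keyed by (user_id, integration_id).
        self._consented: set[tuple[str, str]] = set()
        self.open_tasks: set[tuple[str, str]] = set()

    def record_consent(self, user_id: str, integration_id: str) -> None:
        key = (user_id, integration_id)
        self._consented.add(key)
        self.open_tasks.discard(key)  # the task auto-completes on consent

    def run_query(self, user_id: str, integration_id: str, sql: str) -> str:
        key = (user_id, integration_id)
        if key not in self._consented:
            # Adding to a set is idempotent: at most one open task per key.
            self.open_tasks.add(key)
            raise ConsentRequired(f"/databricks/consent/{integration_id}")
        # Stand-in for executing the query under the user's own identity.
        return f"ran as {user_id}: {sql}"
```

Because the task set is keyed per user and integration, repeated background polling before consent does not pile up duplicate tasks in this model.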
What users see¶
| State | What you see | What it means |
|---|---|---|
| Before consenting (passive) | A Connect to Databricks task in your inbox, high priority | Datahub minted the task automatically the first time it needed your Databricks identity. The task's open action takes you straight to the consent page — you don't need admin permissions. |
| Before consenting (active) | A toast on a dashboard / metric / insight: "Connect your Databricks account to continue. We've added it to your tasks." | Background polling tripped the consent gate. The toast is rate-limited to once per 30 seconds per integration so it doesn't stack on auto-refreshing pages. |
| On the consent page | Integration name, workspace host, status pill (Connected / Not connected) and the single Connect to Databricks button | A focused, opinionated landing — no admin chrome, no other integration's settings. |
| After consenting | The status flips to Connected · last refreshed … and the task is marked done | You're connected. Tokens rotate transparently — you don't need to re-consent unless you revoke or your session is wiped. |
| When a query needs consent again later | The same task is re-opened (or a new one is created if the old one was completed/closed) and the toast returns | Datahub tried to query Databricks and your consent was missing or revoked. Open the task or the toast to restore access. |
Admins additionally see a small revoke button on the integration's Per-user OAuth card. End users can revoke from the same consent page (/databricks/consent/{id}) — the page renders the same revoke control once you're connected.
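The 30-second toast limit mentioned in the table can be modelled as a per-integration rate limiter. The sketch below is only an illustration of that rule under assumed names (`ToastLimiter`, `should_show`); the real limiter may live in the frontend and be implemented differently.

```python
import time
from typing import Callable


class ToastLimiter:
    """Allow at most one consent toast per integration per interval."""

    def __init__(self, min_interval_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock  # injectable so the behaviour can be tested
        self._last_shown: dict[str, float] = {}

    def should_show(self, integration_id: str) -> bool:
        now = self._clock()
        last = self._last_shown.get(integration_id)
        if last is not None and now - last < self._min_interval_s:
            return False  # suppressed: a toast was shown too recently
        self._last_shown[integration_id] = now
        return True
```

Tracking the timestamp per integration is what keeps an auto-refreshing page from stacking toasts while still letting a second integration raise its own.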
How rotation works¶
You consent once. Datahub stores an encrypted refresh token in your tenant's key vault and uses it to mint a short-lived access token for each Databricks request. The refresh token rotates transparently: you never see the rotation and do not need to re-consent when it happens.
You only need to re-consent when:
- You explicitly revoke (via the X button on the consent card).
- Databricks invalidates your refresh token (e.g. password reset, IdP session wipe).
- An admin removes the OAuth app from the Databricks Account Console.
In all three cases, the consent toast appears the next time you query — re-consenting takes one click.
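A rough model of the rotation and re-consent rules above, assuming the token endpoint returns a new refresh token on every exchange. All names here (`RefreshTokenStore`, `TokenResponse`, `ConsentRequired`) are hypothetical, the store is in-memory instead of a key vault, and `PermissionError` stands in for whatever error the real token endpoint returns for a rejected refresh token.

```python
from dataclasses import dataclass
from typing import Callable


class ConsentRequired(Exception):
    """The user must go through the consent page again."""


@dataclass
class TokenResponse:
    access_token: str
    refresh_token: str  # rotated value returned by the token endpoint
    expires_in: int


class RefreshTokenStore:
    def __init__(self, exchange: Callable[[str], TokenResponse]) -> None:
        self._exchange = exchange  # performs the refresh-token grant
        self._tokens: dict[tuple[str, str], str] = {}  # (user, integration) -> token

    def save_consent(self, user_id: str, integration_id: str, refresh_token: str) -> None:
        self._tokens[(user_id, integration_id)] = refresh_token

    def revoke(self, user_id: str, integration_id: str) -> None:
        self._tokens.pop((user_id, integration_id), None)

    def access_token_for(self, user_id: str, integration_id: str) -> str:
        key = (user_id, integration_id)
        current = self._tokens.get(key)
        if current is None:
            raise ConsentRequired("no refresh token stored")
        try:
            response = self._exchange(current)
        except PermissionError as exc:  # stand-in for a rejected refresh token
            self._tokens.pop(key, None)
            raise ConsentRequired("refresh token rejected") from exc
        self._tokens[key] = response.refresh_token  # rotate silently
        return response.access_token
```

Each successful call replaces the stored refresh token, so rotation never surfaces to the user; only a missing or rejected token leads back to the consent page.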
Limitations¶
Embedded dashboards still use the service-principal identity¶
When Datahub embeds a Databricks dashboard via the embed-token flow (mint_embed_token), the embed token itself attests the viewing user to Databricks under a different identity model. Per-user UC RLS does not apply to embedded dashboards. Plan for this:
- If a dashboard contains data with per-user row filters, embedding it will show data as the integration's service principal sees it, not as the end-user.
- For RLS-sensitive views, prefer the in-app query surfaces (Insights, Metrics, Dashboards' native query builders, Genie); these all honour oauth_u2m.
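To make the difference concrete, the toy simulation below applies one row filter under two identities. Everything in it is invented for illustration (the rows, the `sp-datahub` principal, and the rule that privileged identities bypass the filter); what a real service principal sees depends entirely on how your Unity Catalog row filter is written.

```python
ROWS = [
    {"owner": "alice@example.com", "region": "EMEA", "revenue": 120},
    {"owner": "bob@example.com", "region": "APAC", "revenue": 95},
]

# ASSUMPTION: in this toy setup the filter lets privileged identities see every row.
PRIVILEGED = {"sp-datahub"}


def visible_rows(current_user: str) -> list[dict]:
    """Mimic a row filter of the form: owner = current_user() OR is_privileged."""
    return [
        row for row in ROWS
        if row["owner"] == current_user or current_user in PRIVILEGED
    ]


# In-app query surface with oauth_u2m: the query runs as the end-user.
in_app = visible_rows("alice@example.com")

# Embedded dashboard: the query runs as the integration's service principal.
embedded = visible_rows("sp-datahub")
```

Here `in_app` contains only Alice's row while `embedded` contains both, which is the mismatch to plan for before embedding an RLS-sensitive dashboard.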
Admin-triggered list / discovery operations¶
Operations that list workspaces, notebooks, or other Databricks resources from an admin's perspective use the admin's own consent. If an admin tries a list operation and has not yet consented, they get the same consent toast and need to click Connect once.
One refresh token per (user, integration)¶
If your tenant has multiple Databricks workspaces under one integration, your single consent applies to all of them. The OAuth consent step in the UI anchors on the first workspace; the underlying refresh token is reused across workspaces of the same integration.
Service principal not bypassed¶
oauth_u2m is additive. Existing PAT and service-principal integrations behave exactly as before. You can have one tenant with three integrations (PAT, SP, OAuth) — they coexist without interference.
Audit & compliance¶
| Concern | Datahub's behaviour |
|---|---|
| Where are refresh tokens stored? | Encrypted in your tenant's key vault. Only Datahub's backend can decrypt them, and only at query time. |
| Does Datahub leak the key-vault reference? | No. The reference (encrypted_refresh_token_ref) is never returned in any API response. This is enforced by an automated test that fails CI if the field appears in any endpoint response body. |
| Can a user impersonate another user via the API? | No. Every endpoint that touches per-user state derives the calling user from the JWT exclusively. Decoy user_id fields in request bodies, queries, or headers are silently ignored. A regression-test suite (*_no_spoof.py) asserts this on every relevant endpoint. |
| Does Databricks see the actual end-user in audit logs? | Yes. Each query runs as the calling user's UC identity — Databricks audit attributes the query to them, not to Datahub or to a service principal. |
| Can I revoke a user's access centrally? | Yes — via the Databricks Account Console (revoke the user from the OAuth app or remove their UC privileges). The next Datahub query for that user will fail and they will see the consent toast. |
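The two guarantees in the table about spoofing and key-vault references can be expressed as small checks. This is a sketch in the spirit of those tests, not the actual suite: the function names are hypothetical, and using the `sub` claim as the user identifier is an assumption.

```python
from typing import Any

SENSITIVE_FIELD = "encrypted_refresh_token_ref"


def calling_user(jwt_claims: dict, request_body: dict) -> str:
    """Derive the user from verified JWT claims only.

    request_body is accepted but deliberately never read, so a decoy
    user_id in the body cannot influence the result.
    """
    return jwt_claims["sub"]  # ASSUMPTION: the user id lives in the 'sub' claim


def contains_field(payload: Any, field: str) -> bool:
    """Recursively search a JSON-like response body for a key."""
    if isinstance(payload, dict):
        return any(
            key == field or contains_field(value, field)
            for key, value in payload.items()
        )
    if isinstance(payload, (list, tuple)):
        return any(contains_field(item, field) for item in payload)
    return False
```

A regression test would call each endpoint, then assert that `contains_field(response_json, SENSITIVE_FIELD)` is false and that the acting user equals the JWT subject even when the request carries a different `user_id`.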
Troubleshooting¶
| Symptom | Likely cause | Fix |
|---|---|---|
| "Databricks consent required" toast on every query | Refresh token missing or revoked | Open the integration detail page → Connect to Databricks |
| Consent loop (consent → toast → consent again) | Redirect URI mismatch between the OAuth app and your Datahub base URL | Open the integration form in Datahub, copy the Redirect URI value shown there, and paste it into the Databricks OAuth app's Redirect URLs field — it must match character-for-character (scheme, host, path, no trailing slash). |
| invalid_request — redirect_uri 'http://127.0.0.1:8000/...' not registered | Build older than 2026-04-21: backend derived the redirect URI from request.base_url, which becomes the upstream backend bind host behind a reverse proxy / Vite dev proxy and never matches what you pasted into Databricks. | Pull the latest build. The frontend now sends its own window.location.origin-derived URI; the backend uses it verbatim (after path validation) so the URI Databricks sees matches the one shown in the integration form. No re-registration needed if you already pasted the correct URI in Databricks. |
| Not authenticated on Databricks callback after consenting | Build older than 2026-04-21: the datahub-token cookie was set with SameSite=Strict, so the browser stripped it on the cross-site redirect from databricks.net back into Datahub. The callback endpoint then saw no session and returned 401. | Pull the latest build. The auth cookie is now SameSite=Lax, which still blocks cross-site POST/PUT/DELETE and XHR/fetch (full CSRF protection for every mutating endpoint) but allows the cookie to travel on legitimate inbound top-level navigations like the OAuth callback. After deploying, users need to log in once more for the cookie to be rewritten with the new attribute. |
| API or Warehouse connection test reports 'credentials_secret_ref' is required but not set for an OAuth U2M integration | Build older than 2026-04-21: connection-test code branched on auth_type and treated anything that wasn't a service principal as PAT, which has a credentials_secret_ref. OAuth U2M has no integration-level credential by design (auth is per-user at query time). | Pull the latest build. The API connection test now verifies workspace host reachability and reports Workspace host reachable — … (per-user OAuth tested at consent). The Warehouse test reports a non-failing OAuth U2M — warehouse connectivity is verified per-user at query time after consent. End-to-end query authentication is exercised the moment any user runs a query. |
| OAUTH_CLIENT_ID_MISSING when clicking Connect to Databricks | Either (a) the integration predates per-user OAuth and never had a Client ID saved, or (b) the integration was saved on a build older than 2026-04-21 where Key Vault silently rejected the secret name (the underscore was incompatible with Azure KV). | Open the integration → Authentication tab → re-paste the OAuth app's Client ID (and Client Secret if you use a confidential client) and save. The current build writes a Key Vault-compatible name and fails loudly on storage errors, so the next save lands a working entry. |
| Embedded dashboard shows different data than the same query in Insights | mint_embed_token uses SP identity, not your user identity | Expected. Use the in-app query surfaces for RLS-sensitive views. |
| Revoke button does nothing visible | Revoke is silent on success — the card flips back to "Connect to Databricks" | If it doesn't flip, refresh the page; the underlying token has already been deleted on the backend. |
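When chasing a consent loop, it can help to compare the URI registered in Databricks with the one Datahub displays, component by component. The helper below is a hypothetical diagnostic sketch, not a Datahub tool, and the callback paths used with it are placeholders.

```python
from urllib.parse import urlsplit


def redirect_uri_mismatches(registered: str, shown: str) -> list[str]:
    """List the ways two redirect URIs differ; an empty list means an exact match."""
    if registered == shown:
        return []
    a, b = urlsplit(registered), urlsplit(shown)
    problems = []
    if a.scheme != b.scheme:
        problems.append(f"scheme: {a.scheme!r} != {b.scheme!r}")
    if a.netloc != b.netloc:
        problems.append(f"host: {a.netloc!r} != {b.netloc!r}")
    if a.path != b.path:
        if a.path.rstrip("/") == b.path.rstrip("/"):
            problems.append("path: trailing slash differs")
        else:
            problems.append(f"path: {a.path!r} != {b.path!r}")
    if not problems:
        # The strings differ but scheme, host and path agree.
        problems.append("other difference (query, fragment, or letter case)")
    return problems
```

For example, comparing a URI registered from a local backend address such as `http://127.0.0.1:8000/cb` with the public `https://datahub.example.com/cb` reports both a scheme and a host mismatch, which matches the redirect_uri symptom in the table.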