
Insights: how cached results work

When you open an Insights dashboard, every card runs a query against your Databricks SQL Warehouse. For most teams those underlying tables refresh only once a day. Even so, every viewer who landed on the dashboard, every filter change and every page-tab switch used to pay the warehouse cost again from scratch. That is wasteful for the warehouse bill and slow for the user.

As of April 2026, Datahub keeps a shared, organization-wide cache of card query results. The first viewer to land on a dashboard pays the warehouse cost; everyone else gets the cached payload back in milliseconds, until the cache expires or someone clicks the in-card Refresh button. This page covers what the cache does for viewers, how admins tune the freshness window, when the per-card override is the right tool, and what "shared across the organization" actually means for sensitive data.

If you are looking for the metric-cards equivalent, that lives at Insights: backing a card with a metric. Metric-backed cards have their own cache and are not affected by the settings on this page.

What viewers see

Every SQL-backed card on a dashboard now shows a small chip in its top-right corner:

Updated 3h ago • next refresh in 21h [refresh icon]

That is the freshness chip. It tells you two things:

  • Updated <time> ago: when the card was last fetched from Databricks. On an integration with an org-shared cache, two people on the same dashboard see the same value because both read the same cached row.
  • Next refresh in <time>: when the cache will automatically expire. After that point, the next viewer's load hits Databricks again and the cached row is replaced.
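
The two chip values come from the same two facts: when the row was fetched and how long its TTL is. The sketch below assumes the cache row stores a fetch timestamp and the resolved TTL in minutes. The function names and the rounding rules (whole minutes, hours or days) are hypothetical, not Datahub's actual formatting code.

```python
from datetime import datetime, timedelta, timezone


def _coarse(delta: timedelta) -> str:
    """Round a duration down to whole minutes, hours, or days (hypothetical rule)."""
    minutes = int(delta.total_seconds() // 60)
    if minutes < 60:
        return f"{minutes}m"
    if minutes < 24 * 60:
        return f"{minutes // 60}h"
    return f"{minutes // (24 * 60)}d"


def freshness_chip(fetched_at: datetime, ttl_minutes: int, now: datetime) -> str:
    """Build chip text from the row's fetch time and its resolved TTL."""
    expires_at = fetched_at + timedelta(minutes=ttl_minutes)
    age = now - fetched_at
    remaining = max(expires_at - now, timedelta(0))  # never show a negative countdown
    return f"Updated {_coarse(age)} ago • next refresh in {_coarse(remaining)}"


now = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
print(freshness_chip(now - timedelta(hours=3), ttl_minutes=1440, now=now))
# Updated 3h ago • next refresh in 21h
```

With the 24-hour fallback TTL, a row fetched three hours ago reproduces the example chip shown above.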

The clockwise arrow icon next to the chip is a manual refresh. Clicking it skips the cache for this card, fetches fresh data from Databricks and stores the new payload. On an org-shared integration everyone else benefits too: your refresh becomes their cached read on their next load.
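
The read path described above is a read-through cache with a bypass flag. The sketch below is a minimal in-memory model of it, assuming an org-shared cache keyed only by card id. `SharedCardCache`, `CachedRow` and `run_query` are hypothetical names. The real cache lives in Postgres and calls Databricks, not a lambda.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedRow:
    payload: dict
    fetched_at: float  # epoch seconds when the warehouse was last queried
    ttl_seconds: int

    def is_fresh(self, now: float) -> bool:
        return now < self.fetched_at + self.ttl_seconds


class SharedCardCache:
    """Hypothetical org-shared read-through cache, keyed by card id only."""

    def __init__(self, run_query: Callable[[str], dict],
                 clock: Callable[[], float] = time.time) -> None:
        self._run_query = run_query  # stand-in for the Databricks call
        self._clock = clock
        self._rows: Dict[str, CachedRow] = {}
        self.warehouse_calls = 0

    def execute(self, card_id: str, ttl_seconds: int,
                bypass_cache: bool = False) -> dict:
        now = self._clock()
        row = self._rows.get(card_id)
        if not bypass_cache and row is not None and row.is_fresh(now):
            return row.payload  # cache hit: no warehouse cost
        # Miss, expired row, or manual Refresh: query the warehouse once...
        self.warehouse_calls += 1
        payload = self._run_query(card_id)
        # ...and overwrite the row so every later viewer reads this payload.
        self._rows[card_id] = CachedRow(payload, now, ttl_seconds)
        return payload


cache = SharedCardCache(lambda card_id: {"card": card_id, "rows": []})
cache.execute("orders", ttl_seconds=86400)                     # first viewer pays
cache.execute("orders", ttl_seconds=86400)                     # second viewer: hit
cache.execute("orders", ttl_seconds=86400, bypass_cache=True)  # Refresh click
cache.execute("orders", ttl_seconds=86400)                     # reads refreshed row
print(cache.warehouse_calls)  # 2
```

The Refresh path does not delete anything. It simply skips the freshness check and overwrites the row, which is why the next viewer reads the refreshed payload.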

When does the chip not show?

  • On metric-backed cards. Metrics have their own cache (see metricmodule.md → "Cache and refresh"). The chip only appears on cards that read raw SQL.
  • During the very first paint of a card. Once the first response arrives, the chip appears with whatever freshness the server reported.

What admins do once — set a sensible default per warehouse

The cache duration is configurable at two levels: per-integration (the default for every card on that warehouse) and per-card (an override for one specific card that needs a tighter or looser SLA). Both are optional. If you set neither, the platform falls back to 24 hours (1440 minutes), which matches the once-a-day refresh most teams run on their Databricks tables.

Per-integration default

  1. Go to Administration → Integrations and open the Databricks integration powering your dashboards.
  2. Scroll to the SQL Warehouse card.
  3. Set Cache duration (minutes). The field accepts 1 to 1440 (1 minute to 24 hours).
  4. Save.

When you save, every card on every dashboard backed by this integration gets its cache invalidated immediately. The next page load on any dashboard re-queries Databricks once and starts the new TTL window. Existing cached rows under the old TTL are dropped.
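
The Cache duration field accepts the same range at both levels. The sketch below shows that range check, assuming the form submits the raw text of the field. `parse_cache_duration` is a hypothetical helper, and an empty field maps to `None`, meaning "no value set at this level".

```python
from typing import Optional

MIN_CACHE_MINUTES = 1
MAX_CACHE_MINUTES = 1440  # 24 hours


def parse_cache_duration(raw: str) -> Optional[int]:
    """Validate a Cache duration (minutes) field (hypothetical helper).

    Empty input returns None, so the next level of the cascade applies.
    Anything else must be a whole number from 1 to 1440.
    """
    text = raw.strip()
    if not text:
        return None
    try:
        minutes = int(text)
    except ValueError:
        raise ValueError("cache duration must be a whole number of minutes") from None
    if not MIN_CACHE_MINUTES <= minutes <= MAX_CACHE_MINUTES:
        raise ValueError(
            f"cache duration must be between {MIN_CACHE_MINUTES} "
            f"and {MAX_CACHE_MINUTES} minutes"
        )
    return minutes
```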

A few sensible starting points:

  • Daily ETL (the most common case): leave it empty or set 1440. The cache lasts a full day; the first morning viewer pays the cost; everyone else after that is fast.
  • Hourly batch refresh: set 60. Viewers within the same hour share one cached read; the next hour pays once.
  • Streaming-flavoured warehouses where users expect near-live data: set 5 or 1. The cache helps with rapid filter / page-tab clicks but not much else.
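
To compare these starting points, it helps to bound how often one card can go back to Databricks in a day. Under the org-shared cache, each automatic miss opens a new TTL window, and windows cannot overlap. A card therefore costs at most ceil(1440 / TTL) warehouse queries per day, however busy the dashboard is. The function below is illustrative arithmetic, not a Datahub API. Manual refreshes, card edits and purges add to the count. On per-user OAuth integrations the bound applies per viewer.

```python
import math

MINUTES_PER_DAY = 1440


def max_warehouse_queries_per_card_per_day(ttl_minutes: int) -> int:
    """Upper bound on automatic cache misses for one card in one day.

    Each miss starts a TTL window and the next miss cannot happen until
    that window ends, so at most ceil(1440 / ttl) windows start per day.
    """
    if ttl_minutes < 1:
        raise ValueError("ttl_minutes must be at least 1")
    return math.ceil(MINUTES_PER_DAY / ttl_minutes)


for ttl in (1440, 60, 5, 1):
    print(ttl, max_warehouse_queries_per_card_per_day(ttl))
# 1440 1
# 60 24
# 5 288
# 1 1440
```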

Per-card override

Sometimes one card needs a different cadence than the rest of its dashboard. A "live order count" tile on an otherwise daily-refresh dashboard is the canonical example.

  1. Open the dashboard, click the pencil icon on the card, and look for the Cache section in the right-side editor.
  2. Toggle or fill in Cache duration (minutes). The field accepts 1 to 1440, the same range as the integration default.
  3. Save.

Leaving the override empty makes the card inherit the integration default, or the 24h fallback if no integration default is set. An override can be shorter or longer than the integration default and affects only its own card. For example, a 5-minute override on one tile makes that card refresh more often while the rest of the dashboard stays on the longer cadence.
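
The inheritance rule is a three-step cascade: card override, then integration default, then the 1440-minute fallback. The engineering notes at the end of this page list it as display_config.cache_ttl_minutes_override, then integrations.config.cache_ttl_minutes, then 1440. The sketch below mirrors that order. `resolve_ttl_minutes` is a hypothetical name, and `None` stands for an empty field.

```python
from typing import Optional

FALLBACK_TTL_MINUTES = 1440  # 24 hours when nothing is configured


def resolve_ttl_minutes(card_override: Optional[int],
                        integration_default: Optional[int]) -> int:
    """Per-card override wins, then the integration default, then 24h."""
    if card_override is not None:
        return card_override
    if integration_default is not None:
        return integration_default
    return FALLBACK_TTL_MINUTES


print(resolve_ttl_minutes(5, 1440))      # live tile on a daily dashboard -> 5
print(resolve_ttl_minutes(None, 60))     # inherits the hourly default -> 60
print(resolve_ttl_minutes(None, None))   # nothing set -> 1440
```

An override is not clamped to the integration default, so a longer override (for example 1440 on a 60-minute integration) also wins for that card.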

"Shared across the organization" — what about row-level security?

For most Databricks integrations the cache is fully shared across your entire organization. The first viewer to load a dashboard fills the cache; every other viewer in your tenant reads that same cached payload. This is the whole point — it is what saves you the warehouse bill.

There is one exception: integrations that authenticate with per-user OAuth (the Databricks U2M OAuth flow). Per-user OAuth runs every Databricks query as the real human user viewing the dashboard, so Unity Catalog row-level security can scope what each person sees. Sharing one user's cached payload with another would defeat that. If Bob loaded the card first, Alice could see rows that only Bob is allowed to see.

For per-user OAuth integrations the cache is therefore per-user-per-card, not org-shared. Each viewer still benefits from caching across their own filter changes and page tabs (which is what the user-visible chip reflects), but two different users on the same dashboard each pay the warehouse cost on their first load.
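
One way to picture the difference is in how the cache key is built. The sketch below assumes the key covers the card and its current filter and page-tab parameters. The viewer's id is added only on per-user OAuth integrations. `cache_key` and the `params` shape are hypothetical, and Datahub's real key layout may differ.

```python
import hashlib
import json
from typing import Optional


def cache_key(card_id: str, params: dict, per_user_oauth: bool,
              user_id: Optional[str] = None) -> str:
    """Hypothetical cache key: org-shared by default, per-user under U2M OAuth."""
    scope = {"card": card_id, "params": params}
    if per_user_oauth:
        if user_id is None:
            raise ValueError("per-user OAuth integrations need the viewer's user id")
        scope["user"] = user_id  # isolates each viewer's RLS-filtered payload
    # sort_keys makes the key independent of dict ordering.
    canonical = json.dumps(scope, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


shared_alice = cache_key("c1", {"region": "EU"}, per_user_oauth=False, user_id="alice")
shared_bob = cache_key("c1", {"region": "EU"}, per_user_oauth=False, user_id="bob")
print(shared_alice == shared_bob)  # True: both read one cached row
```

With `per_user_oauth=True`, the same two calls produce different keys. Each viewer then pays the warehouse cost once before their own cache warms up.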

If you want the org-shared performance win and your data has no row-level restrictions per user, switch the integration's auth method to a service principal or PAT (see Databricks: per-user OAuth for the trade-offs).

When does the cache get invalidated?

You should rarely need to think about this — Datahub invalidates aggressively whenever something semantically changes — but for completeness:

  • A card is edited (its SQL, its filters, its query config, or its cache override). All cached rows for that card are dropped. The next viewer pays the warehouse cost once and refills the cache.
  • A dashboard is archived or deleted. All cached rows for the dashboard are dropped (the cleanup is automatic).
  • An integration's cache duration is changed, or its underlying warehouse is swapped. All cached rows for that integration are dropped.
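  • A viewer clicks the Refresh icon on a card. Just that card is re-fetched. On org-shared integrations the new payload serves every viewer. On per-user OAuth integrations it replaces only the clicking viewer's cached row.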
  • Hourly background sweep. A reaper task removes expired rows so the table doesn't grow unbounded.
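
Each invalidation event above drops rows matching a different column, and the hourly sweep drops rows by expiry. The sketch below models that with an in-memory dict. `CacheTable`, `CacheEntry` and the method names are hypothetical. Refresh is left out because it overwrites a row rather than deleting it.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CacheEntry:
    card_id: str
    dashboard_id: str
    integration_id: str
    expires_at: float  # epoch seconds


class CacheTable:
    """Hypothetical in-memory stand-in for the cache table."""

    def __init__(self) -> None:
        self.entries: Dict[str, CacheEntry] = {}

    def _drop(self, doomed: Callable[[CacheEntry], bool]) -> int:
        keys = [key for key, entry in self.entries.items() if doomed(entry)]
        for key in keys:
            del self.entries[key]
        return len(keys)  # number of rows removed

    def on_card_edited(self, card_id: str) -> int:
        return self._drop(lambda e: e.card_id == card_id)

    def on_dashboard_archived_or_deleted(self, dashboard_id: str) -> int:
        return self._drop(lambda e: e.dashboard_id == dashboard_id)

    def on_integration_changed(self, integration_id: str) -> int:
        # Cache duration edited or underlying warehouse swapped.
        return self._drop(lambda e: e.integration_id == integration_id)

    def reap_expired(self, now: float) -> int:
        # The hourly sweep: remove rows whose TTL has already passed.
        return self._drop(lambda e: e.expires_at <= now)


table = CacheTable()
table.entries = {
    "k1": CacheEntry("c1", "d1", "i1", 100.0),
    "k2": CacheEntry("c2", "d1", "i2", 200.0),
    "k3": CacheEntry("c3", "d2", "i1", 300.0),
}
print(table.on_card_edited("c1"), table.reap_expired(now=250.0), list(table.entries))
# 1 1 ['k3']
```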

Changes to the underlying tables in Databricks (a fresh ETL run, a schema patch) do not automatically invalidate the cache, because Databricks doesn't notify Datahub of those events. The cache refreshes on its own when the TTL expires. To surface fresh data sooner, click the Refresh icon on the affected card. For changes that touch many dashboards, use the admin purge described below.

Admin: clearing all cached results

Workspace administrators with the Insights manage role get an admin page at Administration → Insights cache. It shows:

  • Cached queries — total cached rows across the organization right now.
  • Dashboards with cache — how many distinct dashboards have at least one cached card.
  • Hit rate (24h) — how often viewers hit the cache instead of Databricks in the last 24 hours. Higher is better; expect 80–95% on a dashboard accessed by a team.
  • Total bytes (approx.) — how much storage the cache table currently uses.
  • Oldest entry / Newest entry — bookends of the current cache window.
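
Hit rate here is read as hits divided by total card loads (hits plus misses). The scenario in the sketch is hypothetical: a 10-card dashboard on an org-shared integration with a 24-hour TTL, loaded 20 times in a day with no edits or refreshes. Each card misses once and then hits for the rest of the day. That gives 190 hits out of 200 loads, or 95%, the top of the range quoted above. The function names are illustrative only.

```python
from typing import Optional


def hit_rate(hits: int, misses: int) -> Optional[float]:
    """Share of card loads served from the cache; None when there was no traffic."""
    total = hits + misses
    if total == 0:
        return None
    return hits / total


def single_fill_hit_rate(cards: int, page_loads: int) -> Optional[float]:
    """Hit rate when every card misses on the first load and hits afterwards."""
    if page_loads < 1:
        return None
    misses = cards
    hits = cards * (page_loads - 1)
    return hit_rate(hits, misses)


print(single_fill_hit_rate(cards=10, page_loads=20))  # 0.95
```

The card count cancels out: in this scenario the rate is (loads - 1) / loads. The number of loads per TTL window is what drives it. Per-user OAuth integrations score lower because each viewer misses once on their own.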

The Purge all cached queries button clears every cached row across the organization. Use it after:

  • A platform-wide warehouse migration (you swapped the workspace URL on every integration at once).
  • An emergency where you want every viewer to see fresh data from Databricks on their next page load, regardless of the configured TTLs.
  • A schema change in Databricks that you know affects many dashboards. Datahub's invalidation hooks fire only on changes made inside Datahub, so without a purge those cards keep serving cached results until their TTLs expire.

A confirmation dialog warns that the next page load on every dashboard will re-query Databricks. Use the per-card refresh first when only one card is stale; reach for purge-all only when something organization-wide changes.

Troubleshooting

"I clicked Refresh but the chip still says 'Updated 3h ago'."

The chip text updates after the new response comes back. Refreshes against a slow warehouse can take several seconds, so watch the spinner inside the refresh button. If the icon stops spinning and the chip still hasn't moved, the request failed silently. Check your browser's network panel for the POST /cards/{id}/execute call and report the error.
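
In the network panel, a Refresh click should show up as a POST to /cards/{id}/execute with `bypass_cache` set to true in the JSON body. To reproduce the call outside the browser, the sketch below only builds the request with Python's standard library and does not send it. The base URL and card id are placeholders. Authentication, and any other fields the real request body carries, are omitted and would need to be copied from the captured call.

```python
import json
import urllib.request


def build_refresh_request(base_url: str, card_id: str) -> urllib.request.Request:
    """Build (but do not send) a Refresh-style execute call; auth is omitted."""
    body = json.dumps({"bypass_cache": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/cards/{card_id}/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_refresh_request("https://datahub.example.com/api", "1234")
print(req.get_method(), req.full_url)
# POST https://datahub.example.com/api/cards/1234/execute
```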

"I set Cache duration on my integration to 60 minutes but my card still says 'next refresh in 23h'."

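Saving the integration drops every cached row for that integration, so the next load should start a fresh 60-minute window. If a card still shows the long countdown right after the change, check two things. First, the card may have its own per-card override, which takes precedence over the integration default. Second, the page may still be showing the chip it received before the save. Hit the in-card Refresh to fetch a new payload under whichever TTL now applies to that card.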

"I'm an admin but I don't see the Insights cache page in the sidebar."

You need the Insights manage role. Workspace owners and stewards have it by default; other roles have to be granted explicitly. Ask a workspace owner to check Administration → Roles.

"Two of us are on the same dashboard and we see different 'Updated X ago' values."

That means you are on a per-user OAuth integration (see § "Shared across the organization" above). The cache is per-user-per-card on those integrations by design — your freshness clock is independent of your colleague's. If you expect to share the cache, switch the integration's auth method.

"Sensitive numbers showed up after a viewer with broader access loaded the dashboard."

This should not happen if you chose per-user OAuth for sensitive data. Cross-user cache sharing is enabled only when the integration uses a service principal or PAT, both of which run every query as the integration's identity rather than the viewer's. If your dashboard relies on per-user row-level security, the integration must use per-user OAuth. If you find such a dashboard running on a service-principal or PAT integration, switch the auth method and clear the cache once, so no previously shared payloads remain.

Where this lives in the platform

For engineers and reviewers reading after the fact:

  • The cache lives in Postgres (table insights.insight_query_cache). It is not Redis — see ADR-013 for the decision.
  • The implementation mirrors the metric-cache pattern in MetricCacheService. New engineers learn one pattern, applied twice.
  • The dynamic TTL cascade is display_config.cache_ttl_minutes_override → integrations.config.cache_ttl_minutes → 1440-minute hardcoded fallback.
  • The Refresh button sends bypass_cache: true on the existing POST /cards/{id}/execute endpoint — no separate refresh endpoint, just a flag.
  • Tenant scoping is via the dashboard_id → tenant_id bridge, not a denormalised tenant_id column on the cache table.
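
Because the cache table has no tenant column, any tenant-scoped read has to join through the dashboard. The sketch below shows that shape with SQLite standing in for Postgres, so it runs anywhere. The `dashboards` table, all column names and the one-row-per-card primary key are hypothetical simplifications. The real table is insights.insight_query_cache.

```python
import sqlite3


def cached_cards_for_tenant(conn: sqlite3.Connection, tenant_id: str) -> list:
    """Return (card_id, payload) rows for one tenant via the dashboard join."""
    return conn.execute(
        """
        SELECT c.card_id, c.payload
        FROM insight_query_cache AS c
        JOIN dashboards AS d ON d.dashboard_id = c.dashboard_id
        WHERE d.tenant_id = ?
        ORDER BY c.card_id
        """,
        (tenant_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dashboards (
        dashboard_id TEXT PRIMARY KEY,
        tenant_id    TEXT NOT NULL
    );
    -- No tenant_id column here: ownership comes from the dashboard row.
    CREATE TABLE insight_query_cache (
        card_id      TEXT PRIMARY KEY,
        dashboard_id TEXT NOT NULL REFERENCES dashboards (dashboard_id),
        payload      TEXT NOT NULL
    );
    INSERT INTO dashboards VALUES ('d1', 'acme'), ('d2', 'globex');
    INSERT INTO insight_query_cache VALUES ('c1', 'd1', '{}'), ('c2', 'd2', '{}');
    """
)
print(cached_cards_for_tenant(conn, "acme"))  # [('c1', '{}')]
```

In this sketch, tenant ownership has a single source of truth in the dashboard row. The cost is an extra join on every tenant-scoped query against the cache.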