Datahub Documentation
Learn how Datahub helps you govern data, ship AI products, and run your business — module by module. Every guide is written for the people who use the platform, not the people who build it.
New to Datahub? Start with HERC — the home page and AI assistant — then follow the trail into whichever module your role centres on.
Where to start by role
| If you are a… | Start with |
|---|---|
| Business user asking the platform questions | HERC → Tasks → Insights |
| Data steward governing terms, metrics, contracts | Business glossary → Metrics → Data contracts |
| Metric author defining KPIs | Metrics → Defining metrics & joining tables → Insights |
| Dashboard builder | Insights → Backing a card with a metric → Dashboards |
| Data catalog steward | Data catalog → Metadata engine → Business glossary |
| Automation engineer | Flows → Logic engine → Processes |
| Platform admin | Administration → Workflows → AI platform |
| Knowledge / AI lead | Organisation DNA → HERC → AI platform |
Modules, by group
Top-level
- HERC — your AI assistant + the home page
- Tasks — central work inbox for approvals, hand-offs, acknowledgements
- Organisation DNA — the learned knowledge graph powering deep AI answers
Automation
- Processes (BPM) — visual, version-controlled documentation of organisational procedures
- Logic engine — alert rule builder over Databricks SQL
- Flows — n8n-style automation execution engine for actions, transforms and conditions
Insight
- Insights — live, governed dashboards on Databricks SQL warehouses
- Metrics — define organisational KPIs as governed, reusable objects
- Defining metrics & joining tables — the deep dive on the semantic layer
- Dashboards — registry of Databricks Lakeview dashboards (embedded with native rendering and Genie chat)
- Backing a card with a metric — live metric references on insight cards
- Matrix, heatmap & multi-metric cards — advanced card types
- Cached query results — how insights stay snappy without hammering the warehouse
Capture
- Data catalog — assets, lineage, documents, tagging
- Business glossary — terms, governance, lifecycle
- Metadata engine — connections, snapshots, freshness, scheduler
- Transcription — Whisper-backed speech to text
- Events — organisational event log to annotate trends and capture decisions
Ownership
- Data contracts — schemas, teams, linked products
- Data products — bundle metrics, terms, assets, processes, alert rules into a versioned product owned by a team
Cross-cutting
- Administration — modules, system tables, branding, defaults, users, roles, fiscal calendars
- Workflows — the approval engine behind every Submit for Review button
- AI platform — provider keys, model settings, guardrails, observability, experiments, Datahub Private AI
Integrations
- Databricks per-user OAuth — connect Databricks with per-user identity, Unity Catalog RLS and per-user audit
Conventions used in these docs
- When to choose this, at the top of every module page: a quick fit-check before you read on.
- What it looks like — surfaces, where they live in the UI, what's on each.
- Concepts — the language used in the rest of the page.
- Setup — what an admin needs to wire up once.
- Limitations + Audit & compliance + Troubleshooting: the long-tail detail you need under pressure.
These pages are written for business users, administrators, data stewards, and power users. They don't document UI layout shifts, refactors, or anything else that's only meaningful to engineers building Datahub itself. Internal architecture decision records live in the docs/adr/ directory of the source repository.