
Transcription: turn meetings into searchable knowledge

The Transcription module records spoken audio and turns it into searchable text the rest of the platform can use. Capture a meeting in the browser, upload an audio file, tag the result with a domain, and the transcript becomes part of your organisation's institutional memory — searchable by HERC, queryable across the platform, and (over time) linkable to the decisions and events that came out of it.

It's the bridge between what was said and what was decided.

When to choose this

Reach for Transcription when you want to:

  • Capture a meeting in real time. Open a recording session and hit Record; the browser uploads audio chunks every ~10 seconds and the platform transcribes them as you go.
  • Process a pre-recorded file. Upload an mp3, mp4, m4a, wav, webm, or ogg (≤25 MB) — one Whisper call, one transcript.
  • Make ad-hoc spoken decisions discoverable. A planning call, a customer interview, a steering committee — once tagged with a domain, HERC can answer "what decisions were made in Finance last quarter?".
  • Convert a transcript into platform events. From a completed transcript, the AI agent can propose Events module rows for you to review, capturing the what / when / why of a decision in a structured way.

You do not need this module for:

  • Compliance recordings (use a regulated recording vendor).
  • Voice-to-action automation (this is capture, not command).
  • Audio outside the supported formats (convert first; keep below 25 MB per file).

What Transcription looks like

| Surface | Where | What you see |
|---|---|---|
| Sessions list | /ai/transcription | Tabs: Active, Archived. Each row shows name, created, duration, last word count, and tags. A search box covers name and transcript text. |
| Session detail (recording) | /ai/transcription/{id} (during recording) | The live transcript as it builds, an audio level indicator, and Stop recording. |
| Session detail (editor) | /ai/transcription/{id} (after recording) | The full transcript with a word-level confidence heatmap: hover any word to see Whisper's confidence, and edit inline. A name and ≥1 domain tag are required before completing. |
| File upload view | /ai/transcription → Upload file | Pick an audio file, optionally set a language, and send it to Whisper for one-shot transcription. |

How it works

The platform supports two capture flows:

  1. Live recording (chunked). The browser records WebM audio in ~10 s chunks and POSTs each chunk to the API. Each chunk is transcribed by Whisper independently, with the previous chunk's tail passed as a context prompt so word boundaries stay consistent. The transcript appears in real time as chunks complete.
  2. File upload. A single audio file (≤25 MB) is sent to Whisper in one call. No chunking; one transcript returned at the end.
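The chunked flow can be sketched as a loop that prompts each chunk with the end of the previous one. This is a minimal sketch, not the platform's code: `transcribe` is a hypothetical callable standing in for the Whisper request (OpenAI's transcription endpoint does accept an optional `prompt` string for this kind of context), and the 200-character tail is an assumed value.

```python
from typing import Callable, Iterable, List

# Hypothetical signature: (audio_chunk_bytes, context_prompt) -> transcribed text.
Transcriber = Callable[[bytes, str], str]

TAIL_CHARS = 200  # assumed prompt size; the platform's real value isn't documented


def tail(text: str, max_chars: int = TAIL_CHARS) -> str:
    """Return the end of text, dropping the (possibly partial) first word of the cut."""
    if len(text) <= max_chars:
        return text
    cut = text[-max_chars:]
    space = cut.find(" ")
    return cut[space + 1:] if space != -1 else cut


def transcribe_live(chunks: Iterable[bytes], transcribe: Transcriber) -> List[str]:
    """Transcribe ~10 s chunks in order, prompting each with the previous chunk's tail."""
    pieces: List[str] = []
    previous = ""
    for chunk in chunks:
        text = transcribe(chunk, tail(previous))
        pieces.append(text)  # each piece can be shown as soon as it returns
        previous = text      # the next chunk is prompted with this chunk's ending
    return pieces
```

Passing only the previous chunk's tail keeps each request small while still giving Whisper the words that straddle the chunk boundary.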

Both flows produce a session in recording state with the full transcript populated. You then:

  • Name the session (mandatory).
  • Add at least one domain tag from the Data Catalog tags (mandatory: this is what makes the transcript discoverable across the org).
  • Optionally edit the transcript — fix proper nouns, jargon, names. The word-confidence heatmap highlights where Whisper was uncertain so editing is targeted.
  • Click Complete. The session moves out of recording and the transcript is searchable.
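The completion gate comes down to two checks. This is a minimal sketch that assumes a hypothetical `Session` record and a hypothetical `"completed"` state name; the document only says the session moves out of `recording`. The domain list is the set the platform seeds by default.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Domains the platform seeds by default (see the setup prerequisites).
SEEDED_DOMAINS: Set[str] = {
    "Finance", "HR", "Sales", "Marketing",
    "Legal", "Engineering", "Operations", "Product",
}


@dataclass
class Session:  # hypothetical shape, for illustration only
    name: str = ""
    tags: List[str] = field(default_factory=list)
    state: str = "recording"


def completion_errors(session: Session, domains: Set[str] = SEEDED_DOMAINS) -> List[str]:
    """List what is still missing before the session can be completed."""
    errors: List[str] = []
    if not session.name.strip():
        errors.append("name is required")
    if not any(tag in domains for tag in session.tags):
        errors.append("at least one domain tag is required")
    return errors


def complete(session: Session, domains: Set[str] = SEEDED_DOMAINS) -> None:
    errors = completion_errors(session, domains)
    if errors:
        raise ValueError("; ".join(errors))  # in the UI, Complete is greyed out instead
    session.state = "completed"  # hypothetical state name
```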

Setup — what an admin needs to do once

| Prereq | Where | Why |
|---|---|---|
| AI provider with a Whisper-compatible model | /admin/integrations → AI Provider Keys | An OpenAI key is required for Whisper. The same key powers HERC. |
| Domain tags seeded | /admin/system-tables → Tags (or /glossary Domains) | Sessions must be tagged with at least one domain. The platform seeds Finance, HR, Sales, Marketing, Legal, Engineering, Operations, and Product. |
| Roles | /rolegroups | Grant aimodule.transcription.run so users can create sessions and upload audio. |
| Browser permissions | Per user | Users must grant microphone access to use live recording (file upload doesn't need it). |

Tagging matters

The completion gate (must have name + ≥1 domain tag) is intentional. Untagged transcripts are noise — they exist but no one can find them. The domain tag puts each transcript into a discoverable bucket, so HERC can answer:

  • "What decisions were made in Finance last quarter?"
  • "Find recent transcripts mentioning churn."
  • "Show me the Strategy call from May."

Pick the domain (or domains) that match the call's subject. You can update tags later if you mis-tag.
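To see why the tag matters, consider how a question like "decisions in Finance last quarter" narrows down once every transcript carries a domain. The sketch below is a hypothetical in-memory filter over an assumed `Transcript` shape. It is not how HERC retrieves transcripts; it only illustrates the bucketing.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Sequence


@dataclass
class Transcript:  # assumed shape, for illustration only
    name: str
    created: date
    tags: Sequence[str]  # domain tags, e.g. ["Finance"]
    text: str


def find_transcripts(
    transcripts: Iterable[Transcript],
    domain: Optional[str] = None,
    since: Optional[date] = None,
    until: Optional[date] = None,
    contains: Optional[str] = None,
) -> List[Transcript]:
    """Keep transcripts that match every filter supplied."""
    results = []
    for t in transcripts:
        if domain is not None and domain not in t.tags:
            continue  # an untagged transcript can never match a domain query
        if since is not None and t.created < since:
            continue
        if until is not None and t.created > until:
            continue
        if contains is not None and contains.lower() not in f"{t.name} {t.text}".lower():
            continue
        results.append(t)
    return results
```

For example, `find_transcripts(items, domain="Finance", since=date(2024, 7, 1), until=date(2024, 9, 30))` answers the quarter question, while an untagged transcript only surfaces through free-text search.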

Word-confidence heatmap

Whisper returns per-word confidence values (in verbose_json mode). The editor renders these as underline shading:

  • Solid underline → high confidence.
  • Dashed / faded underline → uncertain word; edit if wrong.
  • No underline → confidence wasn't returned (legacy chunk).

Once you edit a word, the underline clears for that word and the surrounding context is marked as user-corrected. Editing doesn't re-run Whisper; it stores your text alongside the original.
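The heatmap rules and the edit behaviour can be modelled in a few lines. This is a sketch under stated assumptions: each word carries a confidence between 0 and 1 (or `None` for legacy chunks), the 0.8 cut-off between solid and dashed is invented for illustration, and the platform's marking of surrounding context as user-corrected is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional

HIGH_CONFIDENCE = 0.8  # assumed threshold; the platform's cut-off isn't documented


@dataclass
class Word:
    original: str                # what Whisper returned
    confidence: Optional[float]  # None when no value was returned (legacy chunk)
    edited: Optional[str] = None  # the user's correction, stored alongside the original

    @property
    def text(self) -> str:
        return self.edited if self.edited is not None else self.original


def underline_style(word: Word) -> str:
    """Map a word to the underline the editor would draw."""
    if word.edited is not None or word.confidence is None:
        return "none"  # edited words clear their underline; legacy words never had one
    return "solid" if word.confidence >= HIGH_CONFIDENCE else "dashed"


def edit_word(words: List[Word], index: int, new_text: str) -> None:
    """Record a correction without re-running Whisper; original text and confidence stay."""
    words[index].edited = new_text
```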

Archive and restore

Transcripts can be archived (soft-delete) — the row is hidden from default lists but the data stays. Reasons to archive:

  • The conversation was casual / off-topic.
  • The session was a duplicate.
  • You're cleaning up old sessions but want a 90-day undo window.

Archived sessions appear in the Archived tab with a Restore action. There is no permanent delete from the UI — sensitive transcripts should be deleted at the database level by an admin via the audit-supervised process.
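Archiving as described is a soft delete: a marker hides the row from default lists, and Restore clears it. This is a minimal sketch that assumes a hypothetical `archived_at` timestamp on each session row; the platform's real column names aren't documented here. There is deliberately no delete function, matching the UI.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class SessionRow:
    name: str
    archived_at: Optional[datetime] = None  # hypothetical soft-delete column


def archive(row: SessionRow) -> None:
    """Soft-delete: record when the row was archived; the data itself stays."""
    row.archived_at = datetime.now(timezone.utc)


def restore(row: SessionRow) -> None:
    row.archived_at = None


def active_tab(rows: List[SessionRow]) -> List[SessionRow]:
    return [r for r in rows if r.archived_at is None]


def archived_tab(rows: List[SessionRow]) -> List[SessionRow]:
    return [r for r in rows if r.archived_at is not None]
```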

How transcripts become events

This is the killer cross-module workflow. From a completed transcript:

  1. Open the session → Extract events.
  2. The platform's AI agent reads the transcript and proposes one or more Events (decisions, milestones, market observations) with title, type, impact level, and the relevant excerpt.
  3. Review each suggestion → accept, edit, or skip.
  4. Accepted suggestions become rows in the Events module — first-class, governed, searchable, taggable.

Result: spoken decisions become structured organisational memory, with a human review step before anything lands in Events.
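The review step can be pictured as applying your accept / edit / skip decisions to the agent's suggestions. The `EventSuggestion` fields mirror the ones listed above (title, type, impact level, excerpt). The class itself, the decision format, and the choice to treat undecided suggestions as skipped are all assumptions for illustration.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Union


@dataclass(frozen=True)
class EventSuggestion:  # hypothetical shape
    title: str
    event_type: str  # e.g. "decision", "milestone", "market observation"
    impact: str      # impact level; the allowed values are not documented here
    excerpt: str     # the transcript passage the suggestion is based on


# "accept", "skip", or a dict of field edits to apply before accepting
Decision = Union[str, Dict[str, str]]


def review(suggestions: List[EventSuggestion],
           decisions: Dict[int, Decision]) -> List[EventSuggestion]:
    """Return the suggestions that should become Events rows."""
    accepted: List[EventSuggestion] = []
    for i, suggestion in enumerate(suggestions):
        decision = decisions.get(i, "skip")  # assumption: undecided means skipped
        if decision == "accept":
            accepted.append(suggestion)
        elif isinstance(decision, dict):
            accepted.append(replace(suggestion, **decision))  # edit, then accept
        # anything else ("skip") is dropped
    return accepted
```

Because suggestions are frozen, an edit produces a new object and leaves the agent's original proposal untouched.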

Limitations

| Limit | Why | Workaround |
|---|---|---|
| Per-file upload cap is 25 MB. | Whisper's API limit. | Split long recordings, or use the live recording flow, which chunks for you. |
| Only OpenAI Whisper is supported as the speech-to-text engine. | Quality and cost. | Other engines may be added; for now, OpenAI is required. |
| Live recording uses WebM and requires a modern browser. | The chunked flow needs the MediaRecorder API. | Chrome, Edge, and Firefox are supported. Safari has partial support. |
| The word-confidence heatmap shows Whisper's confidence for the original transcription. | Editing doesn't re-run Whisper, so edited text has no confidence of its own. | Edits clear the highlight for the edited words. |
| No speaker diarisation. | Whisper doesn't return speakers natively. | Edit speaker labels in manually if needed. |
| No real-time translation. | The model transcribes in the spoken language. | Use the optional language field on file upload to force a specific transcription language; translate afterwards. |
| Transcripts are user-scoped: only the creator sees them in the list. | Privacy default. | Tag and complete a session, then promote it to events for org-wide visibility. |
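Before uploading, a file can be checked against the supported formats and the 25 MB cap, and an oversized recording can be planned into parts. This is a rough sketch that assumes the cap means 25 × 1024 × 1024 bytes, that the bitrate is roughly constant, and that a 10% safety margin is sensible. All three are assumptions. The actual cutting is left to an audio tool such as ffmpeg.

```python
import math
import os
from typing import Optional

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumed reading of "25 MB"
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".webm", ".ogg"}


def check_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return an error message, or None if the file can be uploaded as-is."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return f"Unsupported format {ext or '(none)'}: convert first"
    if size_bytes > MAX_UPLOAD_BYTES:
        return "File too large"
    return None


def parts_needed(size_bytes: int, headroom: float = 0.9) -> int:
    """Minimum number of equal-length parts so each stays under the cap, with margin."""
    return max(1, math.ceil(size_bytes / (MAX_UPLOAD_BYTES * headroom)))


def seconds_per_part(duration_s: float, size_bytes: int) -> float:
    """Part length in seconds, assuming a roughly constant bitrate."""
    return duration_s / parts_needed(size_bytes)
```

For example, a one-hour recording of about 60 MB would be planned as 3 parts of 20 minutes each.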

Audit & compliance

| Question a CISO might ask | Where to look |
|---|---|
| "Where does the audio physically go?" | The audio is sent to OpenAI's Whisper endpoint for transcription, then discarded. The transcript text and word confidences are stored in your tenant's PostgreSQL. Audio is not retained by Datahub. |
| "Are transcripts searchable across users?" | Only by their creator in the UI, but tagged transcripts feed HERC and the DNA graph (which respect role gates). Promote to events for explicit cross-org visibility. |
| "Can the AI access transcripts of meetings I wasn't part of?" | No. Agent calls are scoped to the user's JWT, and transcripts are user-scoped. The DNA graph indexes transcripts, but answers are filtered by the viewer's role. |
| "How do we delete a transcript permanently?" | Archive it, then have an admin delete it via the audit-supervised flow. Sensitive transcripts can also be deleted by an admin running a row-level deletion against aimodule.transcriptions. |
| "Did the AI process this audio anywhere we don't expect?" | Only OpenAI Whisper, via the endpoint configured in /admin/integrations. No other vendor is involved. |

Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| Microphone permission denied | The browser blocked the request. | Grant access in browser settings, then reload. |
| Live transcript is delayed | Slow upload bandwidth. | Wait; chunks are uploaded and transcribed asynchronously. |
| File upload says "File too large" | The file is larger than 25 MB. | Split or compress it. |
| Transcript is in the wrong language | Whisper's language auto-detection failed. | Re-upload with the language field set explicitly. |
| Complete button is greyed out | Missing name or no domain tag. | Add a name and pick at least one domain tag. |
| Many words show as low confidence | Background noise, accents, or domain-specific jargon. | Edit the flagged words; consider a quieter recording environment. |
| Domain tag picker is empty | The Tags system table has not been seeded. | Admin → /admin/system-tables → Tags → confirm domain tags exist. |

See also

  • Events — promote a transcript's decisions into governed events.
  • HERC — ask "find decisions about churn in Finance" to navigate transcripts conversationally.
  • Data Catalog — tags shared with the transcription tag picker.
  • AI platform — provider keys, model settings, observability for Whisper calls.