Usage Guide¶
Quick Start¶
- Launch Hermes — it appears as a winged helmet icon in your menu bar. There is no Dock icon.
- Click the menu bar icon to show the floating overlay panel.
- Select your microphone from the dropdown at the top of the overlay.
- Start your call in Zoom, Meet, Teams, FaceTime, or any app.
- Click the red record button (or press Cmd+Shift+R) to start transcribing.
- Watch the live transcript appear with Me and Them speaker labels.
- Stop when you're done. The session is saved automatically.
The Overlay¶
The overlay is a floating panel that stays on top of your call window.
Expanded Mode¶
The full overlay (340 × 480) shows:
- Mic picker — dropdown to select your input device (disabled during recording)
- Recording controls — record, pause, resume, stop
- Live transcript — scrolling text with speaker labels and timestamps
- History button — clock icon to browse past sessions
Collapsed Mode¶
Click the chevron to collapse the overlay into a tiny pill (48 × 64) showing just the Hermes icon and an expand button. The pill appears in the top-right corner and can be dragged.
Recording Controls¶
| State | Available Actions |
|---|---|
| Idle | Record — start a new session |
| Recording | Pause · Stop |
| Paused | Resume · Stop |
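The table above can be read as a small state machine. The following is a minimal sketch of that reading, not Hermes's actual code: the names `State`, `TRANSITIONS`, `available_actions`, and `next_state` are hypothetical, and the assumption that Stop returns to Idle is inferred from the table rather than stated by it.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    RECORDING = "Recording"
    PAUSED = "Paused"


# (current state, action) -> next state, mirroring the table rows.
# Assumption: Stop always returns to Idle, ready for a new session.
TRANSITIONS = {
    (State.IDLE, "record"): State.RECORDING,
    (State.RECORDING, "pause"): State.PAUSED,
    (State.RECORDING, "stop"): State.IDLE,
    (State.PAUSED, "resume"): State.RECORDING,
    (State.PAUSED, "stop"): State.IDLE,
}


def available_actions(state):
    """Return the actions the table lists for a given state."""
    return sorted(action for (s, action) in TRANSITIONS if s is state)


def next_state(state, action):
    """Apply an action, rejecting anything the table does not allow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not available while {state.value}")
```

For example, `available_actions(State.PAUSED)` returns `['resume', 'stop']`, and `next_state(State.IDLE, "pause")` raises `ValueError` because Pause is only offered while recording.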
Global Hotkey¶
Press Cmd+Shift+R from any application to toggle recording on/off. No need to switch to the overlay.
Speaker Labels¶
Hermes captures two separate audio streams:
- Me — your microphone input (your voice)
- Them — system audio output (everyone else on the call)
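Because each stream maps to exactly one label, Hermes attributes speech to a speaker without any ML-based speaker identification. Everyone on the other side of the call shares the single Them label. Consecutive segments from the same speaker are merged into a single growing line.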
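The merging of consecutive segments can be illustrated with a short sketch. This is an assumption about the general technique, not Hermes's implementation: the `Segment` type, its fields, and `merge_consecutive` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # "Me" (microphone) or "Them" (system audio)
    start: float  # seconds since the recording started
    text: str


def merge_consecutive(segments):
    """Join back-to-back segments from the same speaker into one line."""
    merged = []
    for seg in segments:
        if merged and merged[-1].speaker == seg.speaker:
            # Same speaker as the previous line: extend that line.
            merged[-1].text = f"{merged[-1].text} {seg.text}"
        else:
            # Speaker changed: start a new line, copying so the input
            # list is left untouched.
            merged.append(Segment(seg.speaker, seg.start, seg.text))
    return merged
```

Each merged line keeps the start time of its first segment, so two "Me" segments followed by one "Them" segment yield two lines rather than three.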
Session History¶
Click the clock icon in the overlay header to open the session history window.
From here you can:
- Browse all past sessions sorted by date
- View the full transcript of any session
- Export a session as Markdown
- Delete sessions you no longer need
Tips for Best Results¶
Use headphones
Without headphones, sound from your speakers bleeds into the microphone, so the same audio appears in both the "Me" and "Them" channels. Headphones keep call audio out of the microphone.
First recording is slower
The first recording after launch takes a few seconds to start while WhisperKit loads the transcription model into memory. Later recordings start immediately until you quit Hermes.
Transcription window
Hermes buffers 10 seconds of audio before transcribing each chunk, so transcript text appears roughly 10 seconds after the words are spoken. The 10-second window gives noticeably better accuracy than shorter intervals.
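The buffering behavior described above can be sketched as a fixed-size window over incoming audio. This is only an illustration under stated assumptions: `ChunkBuffer` is a hypothetical name, and the 16 kHz sample rate is an assumed value, not something this guide specifies.

```python
class ChunkBuffer:
    """Collects audio samples and releases them in fixed-length chunks."""

    def __init__(self, window_seconds=10.0, sample_rate=16_000):
        # Assumption: 16 kHz mono audio; the real rate is not documented here.
        self.window_samples = int(window_seconds * sample_rate)
        self.samples = []

    def push(self, frames):
        """Append new samples and return every chunk that is now complete."""
        self.samples.extend(frames)
        chunks = []
        while len(self.samples) >= self.window_samples:
            chunks.append(self.samples[: self.window_samples])
            self.samples = self.samples[self.window_samples :]
        return chunks
```

Nothing is handed to the transcriber until a full window has accumulated, which is why text trails speech by about the window length. A shorter window would cut the delay but give the model less context per chunk.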
Silence handling
When no one is speaking on the system audio channel, Hermes automatically suppresses transcription. This prevents hallucinated text: Whisper tends to output "Thank you" or similar phrases when it is given silence.
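This guide does not say how Hermes detects silence. One common approach, shown here purely as an assumption, is an energy gate: compute the root-mean-square (RMS) level of a chunk and skip transcription when it falls below a threshold. The function names and the `0.01` threshold are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of samples in the range -1.0 to 1.0."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_transcribe(samples, threshold=0.01):
    """Return False for chunks quiet enough to be treated as silence.

    The threshold is an illustrative value, not a Hermes setting.
    """
    return rms(samples) >= threshold
```

A chunk of all zeros is skipped, while a chunk alternating between `0.5` and `-0.5` has an RMS of `0.5` and is passed on for transcription.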
Data Storage¶
All data is stored locally at:
This includes the SwiftData/SQLite database with your meeting sessions and transcript segments. No data is synced anywhere.