StationOne's voice features cover dictation, transcription, and real-time voice conversations with AI models.
Audio Booth
A dedicated workspace for recording and transcribing audio.
How to Access
- Click Audio Booth in the sidebar.
- Or use the keyboard shortcut (configurable in Settings → Shortcuts).
Recording
- Click the Record button to start recording.
- Click Stop to end recording.
- Alternatively, upload an audio file (MP3, WAV) or drag and drop one into Audio Booth.
Output Options
After transcription:
- Summarize: AI-generated summary of the transcription.
- Translate: Translate to a specified language.
- Commands: Execute AI commands on the transcription.
- Copy (Cmd+C / Ctrl+C): Copy transcription to clipboard.
- Insert (Cmd+Enter / Ctrl+Enter): Paste into the active application.
- Clear (Cmd+X / Ctrl+X): Clear the transcription.
Settings (Sidebar)
- Engine: Local, API, or Custom STT provider.
- Model: Provider-specific model selection.
- Language: Auto-detect or specify spoken language.
- Auto-start: Begin recording when Audio Booth opens.
- Push-to-talk: Space bar controls recording.
Quick Dictation
A floating widget for rapid voice-to-text input.
How to Activate
Use the global keyboard shortcut (configurable in Settings → Shortcuts → Dictation).
Modes
- Tap: A single press of the shortcut starts recording.
- Double-tap: Two quick presses of the shortcut start recording.
- Hold: Press and hold for continuous recording.
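The three modes differ only in how key presses are interpreted. The following is a minimal sketch of that interpretation, not StationOne's actual implementation: the `KeyPress` type, the `classify` function, and both timing thresholds are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; StationOne's real timings are not documented here.
HOLD_MIN_SECONDS = 0.35
DOUBLE_TAP_MAX_GAP_SECONDS = 0.30


@dataclass
class KeyPress:
    down: float  # time the shortcut key went down, in seconds
    up: float    # time the shortcut key was released, in seconds


def classify(presses: List[KeyPress]) -> str:
    """Classify a short burst of presses as 'hold', 'double-tap', or 'tap'."""
    if not presses:
        return "none"
    first = presses[0]
    # A long first press counts as a hold, whatever follows it.
    if first.up - first.down >= HOLD_MIN_SECONDS:
        return "hold"
    # Two short presses count as a double-tap if the gap between them is small.
    if len(presses) >= 2 and presses[1].down - first.up <= DOUBLE_TAP_MAX_GAP_SECONDS:
        return "double-tap"
    return "tap"
```

For example, `classify([KeyPress(0.0, 0.8)])` yields `"hold"` under these assumed thresholds, because the key stayed down longer than `HOLD_MIN_SECONDS`.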
Behavior
- A floating widget appears (bottom, top, or notch style).
- Speak your text.
- Recording stops automatically when silence is detected.
- Transcription is automatically inserted into the focused application.
- Optionally, the transcription is copied to the clipboard instead of being inserted.
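Silence detection can be pictured as watching the loudness of short audio frames and stopping once enough quiet frames follow speech. The sketch below uses a simple RMS energy threshold. That is a common technique but not necessarily the one StationOne uses, and the function names, threshold, and frame count are all assumptions.

```python
import math
from typing import List, Optional


def rms(frame: List[float]) -> float:
    """Root-mean-square amplitude of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def stop_index(
    frames: List[List[float]],
    threshold: float = 0.01,       # assumed loudness cutoff for "silence"
    max_silent_frames: int = 3,    # assumed number of quiet frames before stopping
) -> Optional[int]:
    """Return the index of the frame at which recording would stop, or None."""
    silent = 0
    heard_speech = False
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            silent = 0  # any loud frame resets the silence counter
        elif heard_speech:
            # Only count silence after speech, so a quiet start does not stop early.
            silent += 1
            if silent >= max_silent_frames:
                return i
    return None
```

Counting silence only after speech has been heard keeps the widget from closing before the user starts talking. A real detector would likely also need a timeout for the case where nobody speaks at all.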
Configuration
Settings → Voice → Quick Dictation:
- Shortcut key (supports native shortcuts like Right Cmd, Right Alt, etc.).
- Activation mode (tap, double-tap, hold).
- Appearance (bottom, top, notch).
- Copy to clipboard toggle.
Voice Chat (Realtime)
Real-time voice conversations with AI models.
How to Access
- Click Voice Mode in the sidebar.
- Or use the keyboard shortcut (configurable in Settings → Shortcuts).
Setup
- Select Provider: OpenAI or Google.
- Select Model (e.g., gpt-4o-realtime-preview, Gemini 2.0 Live).
- Select Voice from available options.
- Optionally select tools for the AI to use during conversation.
Using Voice Chat
- Click the animated blob to start the conversation.
- Speak naturally; the AI responds by voice in real time.
- Transcript panel (right sidebar) shows the message history.
- Cost tracking displays input/output tokens and estimated cost.
- Click the blob again to end the conversation.
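The estimated cost shown by cost tracking is presumably derived from the token counts and the provider's per-token prices. The sketch below shows that arithmetic under stated assumptions: `estimate_cost` is a hypothetical helper, and the prices passed to it are placeholders, not real provider rates.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate session cost from token counts and per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder prices for illustration only; check the provider's pricing page.
cost = estimate_cost(
    input_tokens=1_000_000,
    output_tokens=500_000,
    input_price_per_million=4.0,
    output_price_per_million=8.0,
)
print(f"Estimated cost: {cost:.2f}")
```

Realtime voice models often price audio and text tokens differently, so an actual estimate may need more than two rates.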
Google Gemini Live
Voice Chat supports Google Gemini Live for real-time voice interactions with Gemini models.
Speech-to-Text Configuration
Settings → Voice → STT Configuration:
Supported Engines
- Apple SpeechAnalyzer (macOS 26+ required): On-device, free.
- Parakeet WebGPU: Local model, runs in the browser (free, no API key).
- Cloud providers: OpenAI Whisper, Google Speech, etc. (requires API key).
Settings
- Engine: Choose between local and cloud-based engines.
- Model: Select a specific model for the chosen engine.
- Language: Auto-detect or specify spoken language.
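The engine requirements above (macOS 26+ for Apple SpeechAnalyzer, an API key for cloud providers, no key for Parakeet WebGPU) can be expressed as a small validation check. This is an illustrative sketch only: the `SttConfig` type, the engine identifier strings, and the `validate` function are hypothetical and do not reflect StationOne's actual settings format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SttConfig:
    engine: str                    # hypothetical ids, see validate() below
    model: Optional[str] = None
    language: str = "auto"         # "auto" stands in for auto-detect
    api_key: Optional[str] = None


def validate(config: SttConfig, macos_major: Optional[int]) -> List[str]:
    """Return a list of problems with the configuration (empty if none)."""
    problems: List[str] = []
    if config.engine == "apple-speechanalyzer":
        # On-device engine: needs macOS 26 or later.
        if macos_major is None or macos_major < 26:
            problems.append("Apple SpeechAnalyzer requires macOS 26 or later")
    elif config.engine == "parakeet-webgpu":
        # Local model: free, no API key needed.
        pass
    elif config.engine == "cloud":
        if not config.api_key:
            problems.append("cloud engines require an API key")
    else:
        problems.append(f"unknown engine: {config.engine}")
    return problems
```

For instance, `validate(SttConfig(engine="cloud"), macos_major=None)` reports the missing API key, while the same call with `engine="parakeet-webgpu"` returns an empty list.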
Keyboard Shortcuts
Configurable in Settings → Shortcuts:
- Audio Booth shortcut.
- Quick Dictation shortcut.
- Voice Mode shortcut.
- Read Aloud shortcut (text-to-speech).