Voice & Audio

StationOne's voice features cover dictation, transcription, and real-time voice conversations with AI models.

Audio Booth

A dedicated workspace for recording and transcribing audio.

How to Access

Click Audio Booth in the sidebar.
Or use the keyboard shortcut (configurable in Settings → Shortcuts).

Recording

Click the Record button to start recording.
Click Stop to end recording.
Alternatively, upload an audio file (MP3, WAV) or drag and drop one into Audio Booth.

Output Options

After transcription:

Summarize: AI-generated summary of the transcription.
Translate: Translate to a specified language.
Commands: Execute AI commands on the transcription.
Copy (Cmd+C / Ctrl+C): Copy transcription to clipboard.
Insert (Cmd+Enter / Ctrl+Enter): Paste into the active application.
Clear (Cmd+X / Ctrl+X): Clear the transcription.

Settings (Sidebar)

Engine: Local, API, or Custom STT provider.
Model: Provider-specific model selection.
Language: Auto-detect or specify spoken language.
Auto-start: Begin recording when Audio Booth opens.
Push-to-talk: Space bar controls recording.

Quick Dictation

A floating widget for rapid voice-to-text input.

How to Activate

Use the global keyboard shortcut (configurable in Settings → Shortcuts → Dictation).

Modes

Tap: A single press of the shortcut starts recording.
Double-tap: Two quick presses of the shortcut start recording.
Hold: Press and hold for continuous recording.
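
The three modes differ only in how the shortcut presses are timed. As a rough illustration (not StationOne's actual implementation), the sketch below classifies a short burst of presses. The `Press` type, the `classify` function, and both thresholds are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

# Hypothetical thresholds; StationOne's real values are not documented here.
HOLD_MIN_SECONDS = 0.4     # a press at least this long counts as a hold
DOUBLE_TAP_MAX_GAP = 0.3   # max pause between release and the next press


@dataclass
class Press:
    down: float  # time the key went down, in seconds
    up: float    # time the key was released, in seconds


def classify(presses: list[Press]) -> str:
    """Classify a burst of shortcut presses as 'tap', 'double-tap', or 'hold'."""
    if not presses:
        raise ValueError("need at least one press")
    first = presses[0]
    # A long first press is a hold, regardless of what follows.
    if first.up - first.down >= HOLD_MIN_SECONDS:
        return "hold"
    # A second press that starts soon after the first release is a double-tap.
    if len(presses) >= 2 and presses[1].down - first.up <= DOUBLE_TAP_MAX_GAP:
        return "double-tap"
    return "tap"
```

For example, `classify([Press(0.0, 0.1), Press(0.25, 0.35)])` returns `"double-tap"`, while the same two presses a full second apart are treated as a single tap.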

Behavior

A floating widget appears (bottom, top, or notch style).
Speak your text.
Recording auto-stops on silence detection.
Transcription is automatically inserted into the focused application.
Optionally, copy the transcription to the clipboard instead of inserting it.
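
Auto-stop on silence is commonly built on a per-frame loudness check. The sketch below shows one simple way to do it, assuming audio arrives as frames of samples in the range [-1.0, 1.0]. The function names and both parameters are hypothetical, and StationOne's own detector may work differently.

```python
import math
from typing import Optional

# Hypothetical parameters, chosen only for illustration.
SILENCE_RMS = 0.02           # frames quieter than this count as silence
SILENCE_FRAMES_TO_STOP = 3   # consecutive silent frames that end the recording


def rms(frame: list[float]) -> float:
    """Root-mean-square level of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def stop_index(frames: list[list[float]]) -> Optional[int]:
    """Return the index of the frame at which recording would stop, or None.

    Silence is only counted after speech has been heard, so the recording
    does not end before the user starts talking.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= SILENCE_RMS:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= SILENCE_FRAMES_TO_STOP:
                return i
    return None
```

Requiring several silent frames in a row, rather than one, keeps short pauses between words from ending the recording early.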

Configuration

Settings → Voice → Quick Dictation:

Shortcut key (supports native shortcuts such as Right Cmd and Right Alt).
Activation mode (tap, double-tap, hold).
Appearance (bottom, top, notch).
Copy to clipboard toggle.

Voice Chat (Realtime)

Real-time voice conversations with AI models.

How to Access

Click Voice Mode in the sidebar.
Or use the keyboard shortcut (configurable in Settings → Shortcuts).

Setup

Select Provider: OpenAI or Google.
Select Model (e.g., gpt-4o-realtime-preview, Gemini 2.0 Live).
Select Voice from available options.
Optionally select tools for the AI to use during conversation.

Using Voice Chat

Click the animated blob to start the conversation.
Speak naturally; the AI responds by voice in real time.
The transcript panel (right sidebar) shows the message history.
Cost tracking displays input/output tokens and estimated cost.
Click the blob again to end the conversation.
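
An estimated cost of this kind is typically derived from the token counts and a per-token price. The sketch below shows the arithmetic, assuming prices are quoted per one million tokens. The `Rates` type, the `estimate_cost` function, and the example rates are placeholders, not real provider pricing or StationOne's internal calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Price per one million tokens, in US dollars (placeholder values)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    """Estimated cost in dollars for one conversation."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * rates.input_per_million
             + output_tokens * rates.output_per_million)
    return total / 1_000_000


# Placeholder rates, not real provider pricing.
example_rates = Rates(input_per_million=10.0, output_per_million=20.0)
print(f"${estimate_cost(12_000, 3_000, example_rates):.4f}")  # prints "$0.1800"
```

Realtime voice models may price audio and text tokens differently, so a fuller version would likely track those counts separately.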

Google Gemini Live

Voice Chat supports Google Gemini Live for real-time voice interactions with Gemini models.

Speech-to-Text Configuration

Settings → Voice → STT Configuration:

Supported Engines

Apple SpeechAnalyzer (macOS 26+ required): On-device, free.
Parakeet WebGPU: Local model, runs in browser (free, no API key).
Cloud providers: OpenAI Whisper, Google Speech, and others (each requires an API key).
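
The requirements above can be summarized as a small availability check. This is only an illustration of the listed constraints: the `available_engines` function is hypothetical, and it assumes Parakeet WebGPU is always usable, which may not hold on systems without WebGPU support.

```python
from typing import Optional


def available_engines(macos_major: Optional[int], has_api_key: bool) -> list[str]:
    """List the engines usable under the constraints described above.

    macos_major is the major macOS version, or None on other platforms.
    """
    engines = []
    # Apple SpeechAnalyzer needs macOS 26 or later.
    if macos_major is not None and macos_major >= 26:
        engines.append("Apple SpeechAnalyzer")
    # Local model with no API key (assumed always available in this sketch).
    engines.append("Parakeet WebGPU")
    # Cloud providers need an API key.
    if has_api_key:
        engines.append("Cloud provider")
    return engines
```

For example, `available_engines(15, False)` returns only `["Parakeet WebGPU"]`, since the system is older than macOS 26 and no API key is configured.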

Settings

Engine: Choose between local and cloud-based engines.
Model: Select specific model for the chosen engine.
Language: Auto-detect or specify spoken language.

Keyboard Shortcuts

Configurable in Settings → Shortcuts:

Audio Booth shortcut.
Quick Dictation shortcut.
Voice Mode shortcut.
Read Aloud shortcut (text-to-speech).

Updated on March 17, 2026
