Transcription

How Shout converts speech to text — models, modes, and system audio capture.

How it works

Shout runs OpenAI's Whisper models locally through WhisperKit, an open-source framework from Argmax optimized for Apple Silicon. Transcription happens entirely on your Mac; your audio is never sent to a server.

100% local processing. Your audio never leaves your device. No internet connection is required for transcription.

Whisper models

Choose a model in Settings to balance speed and accuracy for your hardware. English-only (.en) variants tend to be more accurate on English content, especially at the Tiny and Base sizes.

Model                 Size       Best for
Tiny / Tiny.en        ~40 MB     Fastest, lowest memory
Base / Base.en        ~75 MB     Quick tasks, light hardware
Small / Small.en      ~250 MB    Good daily driver
Medium / Medium.en    ~750 MB    High accuracy
Large v3              ~1.6 GB    Best accuracy, multilingual
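
If you want a rule of thumb for choosing, the sketch below picks the largest model that fits a size budget, using the download sizes from the table as a rough stand-in for how heavy each model is. The function name, the model labels, and the budget parameter are all hypothetical and are not part of Shout's settings or API.

```python
# Approximate sizes in MB, copied from the table above.
# The keys are illustrative labels, not identifiers used by Shout.
MODEL_SIZES_MB = {
    "tiny": 40,
    "base": 75,
    "small": 250,
    "medium": 750,
    "large-v3": 1600,
}

# Models that also come in an English-only (.en) variant, per the table.
HAS_ENGLISH_VARIANT = {"tiny", "base", "small", "medium"}


def pick_model(budget_mb: int, english_only: bool) -> str:
    """Return the largest model whose size fits within budget_mb."""
    fitting = [name for name, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    if not fitting:
        raise ValueError(f"no model fits in {budget_mb} MB")
    best = max(fitting, key=MODEL_SIZES_MB.get)
    # Large v3 has no .en variant, so it is returned unchanged.
    if english_only and best in HAS_ENGLISH_VARIANT:
        return best + ".en"
    return best
```

For example, a 300 MB budget with English-only content lands on the Small.en model, while a 2 GB budget selects Large v3 regardless of language.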

Transcription modes

Real-time Transcription

See your words appear as you speak. Shout streams partial results in real time and refines them when you stop recording.
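
One way to picture the partial-then-refined behavior is a transcript that keeps finalized text separate from the current, still-changing guess. This is a minimal sketch under that assumption; the class and method names are hypothetical, and Shout's internal design may differ.

```python
class LiveTranscript:
    """Holds finalized text plus the current, still-changing partial result."""

    def __init__(self) -> None:
        self.confirmed = []   # refined segments that will not change again
        self.partial = ""     # latest in-progress guess

    def update_partial(self, text: str) -> None:
        # Each partial replaces the previous one; earlier guesses are discarded.
        self.partial = text

    def finalize(self, text: str) -> None:
        # The refined result replaces the partial and becomes permanent.
        self.confirmed.append(text)
        self.partial = ""

    def display(self) -> str:
        parts = self.confirmed + ([self.partial] if self.partial else [])
        return " ".join(parts)
```

While you speak, only `update_partial` is called, so the displayed text can change from moment to moment. When recording stops, `finalize` swaps in the refined text.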

Retroactive Transcription

Go back in time using the timeline editor and transcribe any segment from the always-on buffer — even audio you didn't explicitly record.
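
An always-on buffer of this kind is commonly built as a fixed-size rolling buffer that discards the oldest audio as new audio arrives. The sketch below assumes that design and a 16 kHz sample rate (the rate Whisper models expect); the class name and its methods are hypothetical, not Shout's implementation.

```python
from collections import deque


class RollingAudioBuffer:
    """Keeps only the most recent `seconds` of audio samples."""

    def __init__(self, seconds: int, sample_rate: int = 16_000) -> None:
        self.sample_rate = sample_rate
        # A deque with maxlen silently drops the oldest samples when full.
        self.samples = deque(maxlen=seconds * sample_rate)

    def append(self, chunk) -> None:
        self.samples.extend(chunk)

    def segment(self, start_ago: float, end_ago: float) -> list:
        """Return audio from start_ago to end_ago seconds before now."""
        if not 0 <= end_ago < start_ago:
            raise ValueError("expected start_ago > end_ago >= 0")
        n = len(self.samples)
        # Clamp to the oldest sample still held in the buffer.
        first = max(0, n - round(start_ago * self.sample_rate))
        last = n - round(end_ago * self.sample_rate)
        return list(self.samples)[first:last]
```

Calling `segment(60, 30)` on such a buffer would return the half minute of audio that ended 30 seconds ago, provided the buffer is long enough to still hold it.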

The timeline editor shows a visual waveform with voice activity detection highlights. Drag the selection handles to pick exactly the segment you want to transcribe.
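
Voice activity highlights can be approximated with a simple energy threshold: split the audio into short frames and mark the frames that are loud enough to be speech. This is an illustrative sketch of that general technique, not the detector Shout uses; the function name, frame length, and threshold are assumptions.

```python
import math


def voiced_regions(samples, sample_rate, frame_ms=30, threshold=0.02):
    """Return (start_sec, end_sec) spans whose RMS energy meets the threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    regions = []
    start = None  # sample index where the current voiced span began
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold and start is None:
            start = i  # speech begins
        elif rms < threshold and start is not None:
            regions.append((start / sample_rate, i / sample_rate))
            start = None  # speech ends
    # Close a span that is still open when the audio runs out.
    if start is not None:
        regions.append((start / sample_rate, len(samples) / sample_rate))
    return regions
```

Each returned span corresponds to one highlighted stretch on a waveform. Production detectors are usually more robust to background noise than a fixed threshold.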

System audio capture

Capture from other apps

Record audio from meetings, calls, podcasts, and videos alongside your microphone input. Enable system audio in Settings.
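
If the two sources are combined into a single track (an assumption; Shout may keep them separate), the mixing step amounts to adding the samples together and clamping the result to the valid range. The function and gain parameters below are hypothetical.

```python
from itertools import zip_longest


def mix(mic, system, mic_gain=1.0, system_gain=1.0):
    """Sum two sample streams, padding the shorter with silence and clamping."""
    mixed = []
    for m, s in zip_longest(mic, system, fillvalue=0.0):
        value = m * mic_gain + s * system_gain
        # Keep the sum inside the [-1.0, 1.0] range used for float audio.
        mixed.append(max(-1.0, min(1.0, value)))
    return mixed
```

Lowering one of the gains is a simple way to keep a loud call from drowning out your own voice in the mixed result.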

System audio capture requires the Screen & System Audio Recording permission (labeled Screen Recording on older macOS versions). Grant it in System Settings > Privacy & Security.