VoiceInput vs Superwhisper

Superwhisper is the polished Whisper desktop wrapper for English-first users. VoiceInput is built around Chinese typing speed and turns every dictation into a searchable, persona-reviewable memory — not just text dropped at the cursor.

One-line verdict

If you're an English-first user who wants the cleanest Whisper desktop wrapper, Superwhisper is excellent. If you type Chinese (or mixed Chinese-English) all day, want under-1.5-second latency, and want everything you said to be searchable next month — VoiceInput is built for that.

Side-by-side comparison

| Dimension | VoiceInput | Superwhisper |
|---|---|---|
| End-to-end latency (Chinese) | ~1.4s | ~3-6s |
| End-to-end latency (English) | ~1.5s | ~1.5-3s |
| Mixed-language (CJK + English code/brand) | 200+ hotwords + pinyin disambiguation | Generic Whisper, frequent miscaps |
| Local / fully offline | Yes: SenseVoice / Paraformer / Apple | Yes: Whisper local models |
| Cloud option | Volcengine streaming (audio direct, no relay) | OpenAI Whisper / cloud Whisper variants |
| Memory layer (search past dictations) | Built-in. Every line archives with app, time, tags. Full-text search, export. | No. Text drops, then it's gone. |
| AI persona review | 7 built-in personas (Boss, Coach, Therapist, Editor…) re-read your week. Weekly Big5 sketch. | N/A |
| Local typography engine | CJK-Latin spacing, brand casing, unit spacing, all handled in <5ms, zero LLM call | N/A |
| Privacy boundary | Audio + text local; only recording-length seconds reported (toggleable) | Audio sent to cloud (Pro tier); local mode keeps everything offline |
| Free tier | 100% Local + BYOK forever, no card | 7-day trial, then paid |
| Paid tier | Cloud: $5/mo · $79/yr · $49 lifetime | $8.49/mo · $84/yr |
| Distribution | Direct DMG, Sparkle auto-update | Direct DMG, auto-update |
| Open API key (BYOK) | DeepSeek / Kimi / OpenAI / any OpenAI-compatible endpoint | Limited: Whisper-only routing |

Where Superwhisper wins

Where VoiceInput wins

Speed: where the 1.4 seconds come from

VoiceInput's end-to-end pipeline is built for one number: time from button release to text landing at your cursor. The path:

  1. Press right Option. Audio streams to the ASR service (Volcengine) over a persistent connection, so there is no handshake on each utterance.
  2. ASR returns partial transcripts in real time. The local typography engine fixes spacing, casing, and unit formatting in <5ms (no LLM call).
  3. If AI tidy is enabled, the cleaned text is sent to the LLM with a constrained prompt (three jobs only: homophones, fillers, punctuation). If the confidence of the rewrite is below 0.5, the original text is kept.
  4. Text is injected at the cursor via the Accessibility API. If the target field rejects the injection, a clipboard fallback is used.

Superwhisper's Whisper-based pipeline batches audio into chunks of 200ms-1s before processing. This is a fundamentally different architecture, optimized for English transcription accuracy rather than real-time latency.

Memory layer: the real product difference

Most dictation apps stop at "speech becomes text." VoiceInput treats each utterance as data worth keeping:

If you mostly use voice for quick text injection, you don't need the bottom two layers. If you talk through real decisions and want them retrievable later, no Whisper wrapper has built this.
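
To make "utterance as data" concrete, here is a minimal sketch of an archive that stores each dictation with its app, time, and tags, then searches it by substring. VoiceInput's real storage format is not documented here, so the table layout and the function names (`open_archive`, `archive`, `search`) are assumptions. A production version would more likely use a proper full-text index, such as SQLite's FTS5 extension, instead of the `instr` scan used below.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local archive. Schema is illustrative only."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS utterances (
               id        INTEGER PRIMARY KEY,
               spoken_at TEXT NOT NULL,   -- ISO 8601 timestamp
               app       TEXT NOT NULL,   -- app that had focus
               tags      TEXT NOT NULL,   -- comma-separated tags
               text      TEXT NOT NULL
           )"""
    )
    return db


def archive(db: sqlite3.Connection, text: str, app: str,
            tags: Iterable[str] = (),
            spoken_at: Optional[datetime] = None) -> int:
    """Store one dictation and return its row id."""
    ts = (spoken_at or datetime.now(timezone.utc)).isoformat()
    cur = db.execute(
        "INSERT INTO utterances (spoken_at, app, tags, text) VALUES (?, ?, ?, ?)",
        (ts, app, ",".join(tags), text),
    )
    db.commit()
    return cur.lastrowid


def search(db: sqlite3.Connection, query: str,
           app: Optional[str] = None) -> List[Tuple[str, str, str, str]]:
    """Case-sensitive substring search, newest first, optionally per app."""
    sql = ("SELECT spoken_at, app, tags, text FROM utterances "
           "WHERE instr(text, ?) > 0")
    params = [query]
    if app is not None:
        sql += " AND app = ?"
        params.append(app)
    sql += " ORDER BY spoken_at DESC"
    return db.execute(sql, params).fetchall()
```

Substring matching is used because it works on Chinese text without a word tokenizer. It trades speed on large archives for simplicity.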

Privacy

Both apps offer a local-only mode. The difference is what leaves your Mac when you opt into cloud:

Who should pick which

Pick VoiceInput if

You type Chinese or mixed Chinese-English daily, want sub-1.5-second latency, and want to be able to search what you said last month. Free forever covers most real usage.

Pick Superwhisper if

You only speak English, you want a polished Whisper wrapper, you don't need a memory layer, and $84/year fits your workflow.

FAQ

Is VoiceInput a Superwhisper alternative?

Yes. Both are macOS menu-bar dictation apps. VoiceInput differs in three ways: ~1.4s end-to-end on Chinese (vs 3-6s), every utterance auto-archives into a searchable local memory layer with AI personas, and the Local + BYOK tiers are free forever.

Which is more accurate for mixed Chinese-English?

VoiceInput. It uses Volcengine ASR tuned for Mandarin, plus 200+ brand hotwords (Cursor, Kimi, GitHub) and pinyin disambiguation injected into the LLM cleanup step. Superwhisper's Whisper backbone underperforms on real-time code-switching between CJK and Latin text.

Can VoiceInput run fully offline?

Yes. There are three local ASR engines: SenseVoice, Paraformer, and Apple. The local typography engine handles formatting in <5ms. Toggle cloud off and nothing leaves your Mac.

How much does each cost?

Superwhisper: $84/yr or $8.49/mo. VoiceInput: 100% Local + BYOK free forever. Optional Cloud tier $5/mo, $79/yr, or $49 lifetime.

Can I migrate my Superwhisper history into VoiceInput?

Not yet. Superwhisper doesn't preserve a queryable history — there's nothing structural to migrate. From the moment you start using VoiceInput, every dictation is captured.

Try VoiceInput free

Free forever (100% local). No account, no API key, no setup. macOS 14+.

Download v0.47.0 · 21 MB