VoiceInput vs Wispr Flow
Wispr Flow is a polished English-first dictation tool with strong AI rewrite features. VoiceInput targets Chinese and mixed-language typing speed, ships sub-1.5-second latency, and turns every utterance into a searchable memory you can revisit a month later.
One-line verdict
If you write English emails all day and want AI to clean up your tone on the fly, Wispr Flow's rewrite features are excellent. If you type Chinese or mixed Chinese-English, want under-1.5-second latency, and want everything you said to be retrievable later — VoiceInput is built for that.
Side-by-side comparison
| Dimension | VoiceInput | Wispr Flow |
|---|---|---|
| End-to-end latency (Chinese) | ~1.4s | ~3-10s |
| End-to-end latency (English) | ~1.5s | ~1.5-3s |
| Mixed-language (CJK + English) | 200+ brand hotwords + pinyin disambiguation | English-first; CJK is best-effort |
| Local / fully offline | Yes — SenseVoice / Paraformer / Apple | Cloud only |
| AI tone rewrite (formal / casual / shorter) | Constrained: fix homophones, drop fillers, add punctuation. Won't rewrite tone unless asked. | Strong rewrite presets (email, Slack, tweet) |
| Memory layer (search past dictations) | Built-in. Local archive, full-text search, app + time + tags, export. | No queryable history |
| AI persona review | 7 built-in personas re-read your week. Weekly Big5 sketch + 3-5 quotes. | N/A |
| Local typography engine | CJK-Latin spacing, brand casing, units — all in <5ms, zero LLM call | N/A |
| Privacy | Local audio + text. Only recording-length seconds reported (toggleable). | Audio sent to cloud. Standard SaaS retention. |
| Free tier | 100% Local + BYOK forever — no card, no rate limit on local tier | Rate-limited free tier |
| Paid tier | Cloud: $5/mo · $79/yr · $49 lifetime | $12/mo |
| BYOK (your own LLM key) | DeepSeek / Kimi / OpenAI / any OpenAI-compatible endpoint | No |
| Distribution | Direct DMG, Sparkle auto-update | Direct DMG |
Where Wispr Flow wins
- English tone rewrite. If your job is writing English emails, Slack messages, and tweets, Wispr Flow's rewrite presets are mature and useful. VoiceInput's AI tidy intentionally won't rewrite your tone — only fix homophones, drop fillers, and add punctuation.
- Brand polish. Wispr Flow has invested heavily in onboarding and visual design. The product feels finished from the first second.
- Native English ASR. Their cloud model is tuned for English and performs well on accented speech.
Where VoiceInput wins
- Speed on Chinese. Volcengine streaming ASR + local typography engine = sub-1.5s on Mandarin and mixed CJK-English. Wispr Flow is 3-10s on the same input — that's the difference between dictation feeling natural and feeling like you're waiting on it.
- Memory layer. Every voice line auto-archives, so you can search "what did I say about onboarding last month?" and get an answer. Wispr Flow has no queryable history.
- Persona reviews. Same line re-read by Boss / Coach / Therapist / Editor. Weekly Big5 sketch derived from how you actually talked. This is a different product category, not just dictation.
- Local-first option. 100% offline mode is real, not theoretical — three local ASR engines plus a local typography engine. Wispr Flow is cloud-only.
- Free forever (Local + BYOK). No card. No rate limit on the local tier. Wispr Flow's free tier caps usage and pushes Pro hard.
- BYOK. Plug in DeepSeek / Kimi / OpenAI / your own server. Pay model providers directly (~$0-2/month for typical use). Wispr Flow doesn't expose this.
The tone-rewrite question
Wispr Flow's rewrite features (make this email more formal, make this tweet shorter) are the product's signature. VoiceInput intentionally doesn't do this:
- The AI tidy prompt is constrained to three jobs: fix homophones, drop fillers, add punctuation. If the cleanup's confidence falls below 0.5, the original text is kept.
- Double-tap right Option to bypass AI cleanup entirely — get the raw ASR output.
- Tone shaping happens in the memory layer instead — 7 personas re-read what you said and offer different angles. You read it later, not at the moment of dictation.
This is a deliberate split. If you want voice → polished output instantly, Wispr Flow wins. If you want voice → fast clean text + retrievable memory, VoiceInput wins.
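The fallback rules above (the 0.5 confidence floor and the raw-output bypass) can be pictured as a small gate in front of text injection. This is a minimal sketch, not VoiceInput's actual code: `TidyResult`, `choose_output`, and the idea that the cleanup step returns a single 0-to-1 confidence score are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_FLOOR = 0.5  # threshold stated in the text above


@dataclass
class TidyResult:
    """Hypothetical shape of what the AI tidy step hands back."""
    text: str          # cleaned-up text proposed by the LLM
    confidence: float  # assumed 0..1 score for the proposal


def choose_output(raw: str, tidy: Optional[TidyResult], bypass: bool = False) -> str:
    """Pick the text to inject: the tidy version only when it is trusted."""
    if bypass or tidy is None:
        # Bypass requested (or no tidy result available): keep raw ASR output.
        return raw
    if tidy.confidence < CONFIDENCE_FLOOR:
        # Low-confidence cleanup: keep the original wording.
        return raw
    return tidy.text
```

With this shape, a low-confidence cleanup can never replace what you actually said; the worst case is that fillers and missing punctuation stay in.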
Speed: why 1.4 seconds matters
Roughly 1.5 seconds is the threshold: below it, dictation stops feeling like a tool and starts feeling like typing. Above 3 seconds, you wait, and the flow breaks.
VoiceInput's pipeline:
- Persistent ASR connection — no handshake per utterance.
- Streaming partial transcripts — text starts arriving before you stop talking.
- Local typography engine handles formatting in <5ms (no LLM call).
- Optional AI tidy fires only after pause detection — never blocks the first injection.
Wispr Flow's pipeline batches audio in chunks before sending — built for transcription accuracy on English, not real-time CJK throughput.
Memory layer: a different product category
Most dictation apps end at "speech became text." VoiceInput treats every utterance as data worth keeping:
- Tool (SPEAK). Hold to talk, text lands at the cursor. Same job Wispr Flow does.
- Data (RECALL). Every line archives locally with source app, time, tags. Full-text search across months. Export to Markdown / JSON / CSV.
- Memory (REFLECT). 7 personas re-read your week. AI picks 3-5 quotes worth echoing. Weekly Big5 snapshot derived from real talk — patterns you wouldn't see yourself.
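The RECALL layer described above boils down to keeping a small record per utterance and filtering over it. The sketch below is an assumption-heavy illustration: the `Utterance` fields, `search`, and `export_json` are hypothetical names, and a simple substring match stands in for whatever full-text index the app really uses.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Utterance:
    """Hypothetical archive record: one dictated line plus its context."""
    text: str
    app: str                      # source app the text was typed into
    spoken_at: datetime
    tags: List[str] = field(default_factory=list)


def search(archive: List[Utterance], query: str,
           app: Optional[str] = None,
           since: Optional[datetime] = None) -> List[Utterance]:
    """Case-insensitive substring search with optional app and time filters."""
    needle = query.casefold()
    return [
        u for u in archive
        if needle in u.text.casefold()
        and (app is None or u.app == app)
        and (since is None or u.spoken_at >= since)
    ]


def export_json(entries: List[Utterance]) -> str:
    """Serialize entries to JSON, writing timestamps as ISO 8601 strings."""
    rows = [{**asdict(u), "spoken_at": u.spoken_at.isoformat()} for u in entries]
    return json.dumps(rows, ensure_ascii=False, indent=2)
```

A query such as `search(archive, "onboarding", since=datetime(2025, 1, 1))` returns only the matching lines, and `export_json` turns the hits into a file you can keep outside the app.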
Privacy
- VoiceInput. Stored audio, text, and history all live on your Mac. When cloud ASR is on, audio streams directly to Volcengine with no relay server in between. The only usage metric reported is recording length in seconds, used for the global pulse counter (toggle it off in Settings). API keys live in macOS Keychain, never on our servers.
- Wispr Flow. Cloud-first SaaS. Standard retention and processing policies apply. No local-only mode at the time of writing.
Who should pick which
Pick VoiceInput if
You write Chinese (or mixed CJK-English) daily, want sub-1.5-second latency, value an offline option, and want every dictation to be searchable next month. The free-forever tier covers most real usage.
Pick Wispr Flow if
You write English all day, want strong AI rewrite presets (formal email, casual Slack, shorter tweet), and don't mind cloud-only at $12/mo.
FAQ
Is VoiceInput a Wispr Flow alternative?
Yes. Both are menu-bar dictation apps. VoiceInput is faster on Chinese (~1.4s vs 3-10s), works fully offline, and every dictation archives into a searchable memory layer with AI persona reviews. Wispr Flow focuses on English tone rewrite.
Which handles mixed Chinese-English better?
VoiceInput. It combines Volcengine ASR tuned for Mandarin, 200+ brand hotwords (Cursor, Kimi, GitHub), and pinyin disambiguation in the LLM cleanup prompt. Wispr Flow is English-first.
Does VoiceInput have an offline mode?
Yes. Three on-device ASR engines (SenseVoice, Paraformer, Apple) plus a local typography engine. Toggle off cloud and no audio or text leaves your Mac.
Can VoiceInput rewrite my emails the way Wispr Flow does?
No, and that is by design. AI tidy is constrained to homophones, fillers, and punctuation. Tone shaping happens in the memory layer (7 personas re-read your week), not at the moment of dictation. If you want voice → polished output instantly, pick Wispr Flow.
What does each cost?
Wispr Flow Pro is $12/mo. VoiceInput's Local and BYOK tiers are free forever. The optional Cloud tier is $5/mo, $79/yr, or $49 lifetime.
Try VoiceInput free
Free forever (100% local). No account, no API key, no setup. macOS 14+.
Download v0.47.0 · 21 MB