Wispr Flow: free tier is rate-limited; Pro is $12/month. VoiceInput: 100% Local and Bring-Your-Own-Key tiers are free forever. Optional Cloud tier is $9/month, $79/year, or $49 lifetime.

VoiceInput vs Wispr Flow

Q: Is VoiceInput a Wispr Flow alternative?

Yes. Both are menu-bar dictation apps that drop transcribed text into any text field. VoiceInput is built for Chinese and mixed Chinese-English speed (~1.4s end-to-end vs 3-10s), and every dictation auto-archives into a searchable local memory layer with AI persona reviews. Wispr Flow focuses on English rewrite quality and tone shaping.

Q: Which handles mixed Chinese-English better?

VoiceInput. Volcengine ASR tuned for Mandarin, 200+ built-in brand hotwords (Cursor, Kimi, GitHub, Apple products), and pinyin disambiguation injected into LLM cleanup prompts. Wispr Flow performs well on English but underperforms on real-time CJK code-switching workflows.

Q: Does VoiceInput have an offline mode?

Yes. Three on-device ASR engines (SenseVoice, Paraformer, Apple) plus a local typography engine handling spacing, casing, and unit formatting in under 5ms. Wispr Flow is cloud-first.

Wispr Flow is a polished English-first dictation tool with strong AI rewrite features. VoiceInput targets Chinese and mixed-language typing speed, ships sub-1.5-second latency, and turns every utterance into a searchable memory you can revisit a month later.

One-line verdict

If you write English emails all day and want AI to clean up your tone on the fly, Wispr Flow's rewrite features are excellent. If you type Chinese or mixed Chinese-English, want under-1.5-second latency, and want everything you said to be retrievable later — VoiceInput is built for that.

Side-by-side comparison

Dimension	VoiceInput	Wispr Flow
End-to-end latency (Chinese)	~1.4s	~3-10s
End-to-end latency (English)	~1.5s	~1.5-3s
Mixed-language (CJK + English)	200+ brand hotwords + pinyin disambiguation	English-first; CJK is best-effort
Local / fully offline	Yes — SenseVoice / Paraformer / Apple	Cloud only
AI tone rewrite (formal / casual / shorter)	Constrained: fix homophones, drop fillers, add punctuation. Won't rewrite tone unless asked.	Strong rewrite presets (email, Slack, tweet)
Memory layer (search past dictations)	Built-in. Local archive, full-text search, app + time + tags, export.	No queryable history
AI persona review	7 built-in personas re-read your week. Weekly Big5 sketch + 3-5 quotes.	N/A
Local typography engine	CJK-Latin spacing, brand casing, units — all in <5ms, zero LLM call	N/A
Privacy	Local audio + text. Only recording-length seconds reported (toggleable).	Audio sent to cloud. Standard SaaS retention.
Free tier	100% Local + BYOK forever — no card, no rate limit on local tier	Rate-limited free tier
Paid tier	Cloud: $9/mo · $79/yr · $49 lifetime	$12/mo
BYOK (your own LLM key)	DeepSeek / Kimi / OpenAI / any OpenAI-compatible endpoint	No
Distribution	Direct DMG, Sparkle auto-update	Direct DMG

Where Wispr Flow wins

English tone rewrite. If your job is writing English emails, Slack messages, and tweets, Wispr Flow's rewrite presets are mature and useful. VoiceInput's AI tidy intentionally won't rewrite your tone — only fix homophones, drop fillers, and add punctuation.
Brand polish. Wispr Flow has invested heavily in onboarding and visual design. The product feels finished from the first second.
Native English ASR. Their cloud model is tuned for English and performs well on accented speech.

Where VoiceInput wins

Speed on Chinese. Volcengine streaming ASR + local typography engine = sub-1.5s on Mandarin and mixed CJK-English. Wispr Flow is 3-10s on the same input — that's the difference between dictation feeling natural and feeling like you're waiting on it.
Memory layer. Every voice line auto-archives. Search "what did I say about onboarding last month?" — Wispr Flow drops the text and forgets.
Persona reviews. Same line re-read by Boss / Coach / Therapist / Editor. Weekly Big5 sketch derived from how you actually talked. This is a different product category, not just dictation.
Local-first option. 100% offline mode is real, not theoretical — three local ASR engines plus a local typography engine. Wispr Flow is cloud-only.
Free forever (Local + BYOK). No card. No rate limit on the local tier. Wispr Flow's free tier caps usage and pushes Pro hard.
BYOK. Plug in DeepSeek / Kimi / OpenAI / your own server. Pay model providers directly (~$0-2/month for typical use). Wispr Flow doesn't expose this.

The tone-rewrite question

Wispr Flow's rewrite features (make this email more formal, make this tweet shorter) are the product's signature. VoiceInput intentionally doesn't do this:

The AI tidy prompt is constrained to three jobs: fix homophones, drop fillers, add punctuation. Confidence below 0.5 keeps the original.
Double-tap right Option to bypass AI cleanup entirely — get the raw ASR output.
Tone shaping happens in the memory layer instead — 7 personas re-read what you said and offer different angles. You read it later, not at the moment of dictation.

This is a deliberate split. If you want voice → polished output instantly, Wispr Flow wins. If you want voice → fast clean text + retrievable memory, VoiceInput wins.

Speed: why 1.4 seconds matters

Latency under 1.5 seconds is the threshold where dictation stops feeling like a tool and starts feeling like typing. Above 3 seconds, you wait — and the flow breaks.

VoiceInput's pipeline:

Persistent ASR connection — no handshake per utterance.
Streaming partial transcripts — text starts arriving before you stop talking.
Local typography engine handles formatting in <5ms (no LLM call).
Optional AI tidy fires only after pause detection — never blocks the first injection.

Wispr Flow's pipeline batches audio in chunks before sending — built for transcription accuracy on English, not real-time CJK throughput.

Memory layer: a different product category

Most dictation apps end at "speech became text." VoiceInput treats every utterance as data worth keeping:

Tool (SPEAK). Hold to talk, text lands at the cursor. Same job Wispr Flow does.
Data (RECALL). Every line archives locally with source app, time, tags. Full-text search across months. Export to Markdown / JSON / CSV.
Memory (REFLECT). 7 personas re-read your week. AI picks 3-5 quotes worth echoing. Weekly Big5 snapshot derived from real talk — patterns you wouldn't see yourself.

Privacy

VoiceInput. Audio + text + history all live on your Mac. Cloud ASR streams audio directly to Volcengine — no relay server. Only metric leaving the device is recording length in seconds, for the global pulse counter (toggle off in Settings). API keys live in macOS Keychain, never on our servers.
Wispr Flow. Cloud-first SaaS. Standard retention and processing policies apply. No local-only mode at the time of writing.

Who should pick which

Pick VoiceInput if

You write Chinese (or mixed CJK-English) daily, want sub-1.5-second latency, value an offline option, and want every dictation to be searchable next month. Free forever covers most real usage.

Pick Wispr Flow if

You write English all day, want strong AI rewrite presets (formal email, casual Slack, tweet shorter), and don't mind cloud-only + $12/mo.

FAQ

Is VoiceInput a Wispr Flow alternative?

Yes. Both are menu-bar dictation apps. VoiceInput is faster on Chinese (~1.4s vs 3-10s), works fully offline, and every dictation archives into a searchable memory layer with AI persona reviews. Wispr Flow focuses on English tone rewrite.

Which handles mixed Chinese-English better?

VoiceInput. Volcengine ASR tuned for Mandarin, 200+ brand hotwords (Cursor, Kimi, GitHub), and pinyin disambiguation in the LLM cleanup prompt. Wispr Flow is English-first.

Does VoiceInput have an offline mode?

Yes. Three on-device ASR engines (SenseVoice, Paraformer, Apple) plus a local typography engine. Toggle off cloud — nothing leaves your Mac.

Can VoiceInput rewrite my emails the way Wispr Flow does?

Not by design. AI tidy is constrained to homophones, fillers, and punctuation. Tone shaping happens in the memory layer (7 personas re-read your week), not at the moment of dictation. If you want voice → polished output instantly, pick Wispr Flow.

What does each cost?

Wispr Flow Pro is $12/mo. VoiceInput Local + BYOK are free forever. Optional Cloud tier is $9/mo, $79/yr, or $49 lifetime.

Try VoiceInput free

Free forever (100% local). No account, no API key, no setup. macOS 14+.

Download v0.79.0 · 23.1 MB