VoiceInput vs Apple Dictation
Apple Dictation is free and built in, but it caps at 60 seconds, can't handle mixed Chinese-English with technical terms, gives you no AI cleanup, and forgets every word the moment it lands. VoiceInput is the dictation tool macOS should have shipped — fast, mixed-language-aware, with everything you say retrievable later.
One-line verdict
Apple Dictation is fine for "send a text in iMessage." For real work (drafting code review comments, writing meeting notes in mixed Chinese-English, capturing thoughts you'll want to revisit next month) it's not enough. VoiceInput removes the 60-second cap and adds what Apple Dictation lacks: AI cleanup, a searchable memory, and mixed-language handling.
Side-by-side comparison
| Dimension | VoiceInput | Apple Dictation |
|---|---|---|
| Per-session length cap | No cap | 60 seconds |
| Trigger | Hold right Option (configurable) | Globe key (configurable, but no hold-to-talk) |
| Mixed Chinese-English with technical terms | 200+ hotwords (Cursor / Kimi / GitHub / Apple products) + pinyin disambiguation | Frequent mis-recognition, no hotword list |
| AI cleanup (homophones, fillers, punctuation) | Yes — DeepSeek / Kimi / OpenAI / your BYOK endpoint | No |
| Local typography engine | CJK-Latin spacing, brand casing, units — <5ms, zero LLM call | N/A |
| Memory of what you said | Every line is archived locally with app, time, and tags. Search and export. | Nothing. The text is inserted, then it's gone. |
| AI persona review | 7 personas re-read your week + Big5 sketch + 3-5 quotes | N/A |
| End-to-end latency (Chinese) | ~1.4s | 2-4s |
| Offline mode | Three on-device engines | On-device for supported languages and short queries, cloud-required otherwise |
| Cost | Free (Local + BYOK forever) · Cloud $5/mo · $79/yr · $49 lifetime | Free (built into macOS) |
| Works in any text field | Yes (Accessibility + clipboard fallback) | Yes (system-level) |
| Customizable hotwords / vocabulary | 200+ defaults + your own list | No user dictionary for dictation |
Where Apple Dictation wins
- Free and built in. Zero install. Apple's privacy story is unmatched — on-device for supported languages.
- Works on iOS too. If you need the same dictation experience across Mac and iPhone, Apple Dictation is integrated.
- Good for short messages. If 90% of your dictation is "tell my wife I'll be home at 7," Apple Dictation is enough.
Where VoiceInput wins
- No 60-second cap. Talk for two minutes, five minutes, twenty minutes. The recording continues until you release the hotkey. Apple Dictation auto-stops at 60 seconds and cuts you off mid-thought.
- Mixed-language handling. "Push 这个 commit 到 staging 分支" — VoiceInput keeps "commit" and "staging" as English with correct casing. Apple Dictation either mis-recognizes or splits the language and breaks formatting.
- AI cleanup. Homophones, fillers ("uh," "um," "嗯"), missing punctuation — handled by an LLM with a constrained prompt. Apple Dictation outputs raw ASR with no post-processing.
- 200+ brand hotwords. Cursor stays "Cursor." GitHub stays "GitHub." Kimi stays "Kimi." Apple products stay correctly cased. Apple Dictation has no user vocabulary.
- Local typography engine. Half-width spacing between CJK and Latin. Spacing after English commas/periods. Unit spacing ("100 GB," not "100GB"). Every rule toggleable. Apple Dictation has none of this.
- Memory of what you said. Every dictation is archived locally with the source app, timestamp, and auto-tags. Search "what did I say about onboarding?" three weeks later. Apple Dictation inserts the text and keeps no log.
- AI persona review. 7 built-in personas (Boss, Coach, Therapist, Editor, Musk, Jobs, Friend) re-read your week. A weekly Big5 (Big Five personality) sketch drawn from what you actually said. 3-5 AI-picked quotes worth echoing. This is a different product category from "speech becomes text."
- Faster on Chinese. 1.4-second end-to-end vs Apple's 2-4 seconds. Above 1.5 seconds, dictation feels like waiting; below it, it feels like typing.
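The typography rules in the list above (CJK-Latin spacing, spacing after English punctuation, unit spacing, per-rule toggles) are simple enough to sketch with regular expressions. The Python below is an illustration only, not VoiceInput's implementation: the rule names, the `UNITS` list, and the restriction to the basic CJK block are all assumptions made for the example.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block is covered.
CJK = r"\u4e00-\u9fff"
# Hypothetical unit list; a real engine would carry a longer one.
UNITS = ("GB", "MB", "KB", "TB", "ms", "kg", "km")


def space_cjk_latin(text: str) -> str:
    """Insert a half-width space where CJK meets Latin letters or digits."""
    text = re.sub(rf"([{CJK}])([A-Za-z0-9])", r"\1 \2", text)
    return re.sub(rf"([A-Za-z0-9])([{CJK}])", r"\1 \2", text)


def space_after_punctuation(text: str) -> str:
    """Add a space after a glued English comma or sentence-ending period."""
    text = re.sub(r",(?=[A-Za-z])", ", ", text)
    # Only lowercase + "." + uppercase, so "example.com" and "e.g." are left alone.
    return re.sub(r"(?<=[a-z])\.(?=[A-Z])", ". ", text)


def space_units(text: str) -> str:
    """Separate a number from a known unit: "100GB" becomes "100 GB"."""
    unit_alternation = "|".join(UNITS)
    return re.sub(rf"(\d)({unit_alternation})\b", r"\1 \2", text)


# Order matters: CJK spacing runs first so a unit followed by a CJK
# character ends at a word boundary by the time unit spacing runs.
RULES = [
    ("cjk_latin_spacing", space_cjk_latin),
    ("punctuation_spacing", space_after_punctuation),
    ("unit_spacing", space_units),
]


def apply_typography(text: str, disabled=frozenset()) -> str:
    """Run every rule whose name is not in `disabled`."""
    for name, rule in RULES:
        if name not in disabled:
            text = rule(text)
    return text
```

Under these assumptions, `apply_typography("Push这个commit到staging分支")` returns `Push 这个 commit 到 staging 分支`, and passing `disabled={"unit_spacing"}` leaves `100GB` untouched. Because every rule is a pure string transformation, no network or model call is involved.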
The 60-second cap is the dealbreaker
Most reviews of Apple Dictation gloss over this, but it's the limit you actually hit:
- Drafting a multi-paragraph code review — gets cut off.
- Talking through a product decision out loud — gets cut off.
- Capturing a meeting takeaway — gets cut off.
- Anything longer than "send a text" — gets cut off.
VoiceInput's hold-to-talk has no length limit. Speak for as long as you need and release when you're done. The pipeline streams ASR in real time, so latency stays under 1.5s regardless of how long you talked.
Why mixed-language matters
If your work touches code, design, AI, finance, or any technical field, you say things like:
- "merge 一下 main 分支再开 PR" — VoiceInput: "merge 一下 main 分支再开 PR." Apple Dictation: typically breaks one of the words.
- "把 Cursor 的 model 切到 Claude Sonnet" — VoiceInput keeps "Cursor" and "Claude Sonnet" cased correctly. Apple Dictation often outputs "cursor" or "claude" lowercase.
- "今天 GitHub Actions CI 又挂了" — VoiceInput: brand-cased correctly. Apple Dictation: "github actions ci" or worse.
The difference compounds. Five mis-recognized brand names per page means five interruptions to fix the casing — you stop trusting dictation and go back to typing.
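To make the casing problem concrete, here is a minimal sketch of hotword-based casing restoration. It is not VoiceInput's actual mechanism, which this page does not describe beyond "hotwords"; the `HOTWORDS` subset and the function names are hypothetical, and the matching is plain case-insensitive lookup.

```python
import re

# Hypothetical subset of a hotword list, chosen to match the examples above.
HOTWORDS = ["Cursor", "GitHub", "GitHub Actions", "Kimi", "Claude Sonnet", "CI"]


def build_pattern(hotwords):
    """Compile one case-insensitive pattern covering every hotword."""
    # Longest first, so "GitHub Actions" wins over "GitHub".
    ordered = sorted(hotwords, key=len, reverse=True)
    alternation = "|".join(re.escape(word) for word in ordered)
    # ASCII-only lookarounds: an adjacent CJK character counts as a
    # boundary, while "precursor" does not match "cursor".
    return re.compile(
        rf"(?<![A-Za-z0-9])(?:{alternation})(?![A-Za-z0-9])",
        re.IGNORECASE,
    )


def restore_casing(text, hotwords=HOTWORDS):
    """Replace any casing of a hotword with its canonical form."""
    canonical = {word.lower(): word for word in hotwords}
    pattern = build_pattern(hotwords)
    return pattern.sub(lambda m: canonical[m.group(0).lower()], text)
```

With this list, `restore_casing("今天 github actions ci 又挂了")` returns `今天 GitHub Actions CI 又挂了`. A plain lookup like this cannot tell the product "Cursor" from the ordinary word "cursor", which is one reason a real system would likely bias the recognizer itself rather than only patch its output.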
Privacy
Both options have a strong story:
- Apple Dictation. On-device for supported languages and short queries. Cloud for longer queries and unsupported languages. Apple's standard privacy practices.
- VoiceInput. Three local ASR engines + local typography engine = zero network when toggled offline. The Cloud tier streams audio directly to Volcengine, with no proxy server and no audio storage. The only metric leaving the device is recording length in seconds (toggleable). API keys are stored in the macOS Keychain.
Who should pick which
Pick VoiceInput if
You hit the 60-second cap regularly, you dictate Chinese or mixed Chinese-English, you want technical terms cased correctly, or you want every dictation searchable later. The free tiers cover most real usage.
Stay with Apple Dictation if
Most of what you dictate is short, single-language messages, and you're already happy with the built-in flow. There's no point installing a third-party tool for that workflow.
FAQ
Why use VoiceInput when macOS has Dictation built in?
Apple Dictation has four limits: 60-second cap, no AI cleanup, no memory layer, no mixed-language hotword handling. VoiceInput removes all four and adds a searchable archive of everything you've ever dictated.
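For readers wondering what a local, searchable archive amounts to, here is a rough sketch using SQLite from Python's standard library. The table layout, column names, and function names are assumptions for illustration; this page does not document VoiceInput's storage format.

```python
import sqlite3
from datetime import datetime, timezone


def open_archive(path=":memory:"):
    """Open (or create) a local archive; ':memory:' keeps the demo off disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS dictations (
               id INTEGER PRIMARY KEY,
               created_at TEXT NOT NULL,  -- ISO 8601 timestamp
               app TEXT NOT NULL,         -- app that received the text
               tags TEXT NOT NULL,        -- comma-separated tags
               text TEXT NOT NULL
           )"""
    )
    return conn


def save(conn, text, app, tags=(), created_at=None):
    """Store one dictation with its source app, timestamp, and tags."""
    created_at = created_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO dictations (created_at, app, tags, text) VALUES (?, ?, ?, ?)",
        (created_at, app, ",".join(tags), text),
    )
    conn.commit()


def search(conn, query, app=None):
    """Substring search, newest first; optionally limited to one app."""
    # Note: '%' and '_' inside `query` act as LIKE wildcards in this sketch.
    sql = "SELECT created_at, app, tags, text FROM dictations WHERE text LIKE ?"
    params = [f"%{query}%"]
    if app is not None:
        sql += " AND app = ?"
        params.append(app)
    sql += " ORDER BY created_at DESC"
    return conn.execute(sql, params).fetchall()
```

Calling `search(conn, "onboarding")` then returns every saved line containing that word along with the app and time it was dictated in. Everything stays in a single local file, which is consistent with the "archives locally" claim but is not a statement about how the app actually stores data.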
Is VoiceInput private like on-device Apple Dictation?
Yes. Three local ASR engines (SenseVoice, Paraformer, Apple) plus a local typography engine. Toggle off cloud and nothing leaves your Mac. The Cloud tier streams audio directly to Volcengine without a relay; only the recording length in seconds is reported (toggleable).
Does VoiceInput work with the Globe key?
No. VoiceInput uses hold-to-talk on right Option (configurable). Hold-to-talk is intentional — it scopes recording to exactly what you meant to say. No auto-stop on silence, no leftover transcription.
Can VoiceInput handle technical terms?
Yes. 200+ brand hotwords built in (Cursor, Kimi, GitHub, AWS, all major Apple products). Pinyin disambiguation for Chinese homophones. Apple Dictation has no equivalent.
What does VoiceInput cost?
The 100% Local and Bring-Your-Own-Key tiers are free forever. The optional Cloud tier is $5/mo, $79/yr, or $49 lifetime; it is only needed if you want zero configuration with cloud ASR and cloud AI cleanup.
Try VoiceInput free
Free forever (100% local). No account, no API key, no setup. macOS 14+.
Download v0.47.0 · 21 MB