VoiceInput vs Apple Dictation

Apple Dictation is free and built in, but it caps at 60 seconds, can't handle mixed Chinese-English with technical terms, gives you no AI cleanup, and forgets every word the moment it lands. VoiceInput is the dictation tool macOS should have shipped — fast, mixed-language-aware, with everything you say retrievable later.

One-line verdict

Apple Dictation is fine for "send a text in iMessage." For real work (drafting code review comments, writing meeting notes in mixed Chinese-English, capturing thoughts you'll want to revisit next month) it's not enough. VoiceInput removes the 60-second cap and adds AI cleanup, a searchable memory, and mixed-language handling.

Side-by-side comparison

Dimension	VoiceInput	Apple Dictation
Per-session length cap	No cap	60 seconds
Trigger	Hold right Option (configurable)	Globe key (configurable, but no hold-to-talk)
Mixed Chinese-English with technical terms	200+ hotwords (Cursor / Kimi / GitHub / Apple products) + pinyin disambiguation	Frequent mis-recognition, no hotword list
AI cleanup (homophones, fillers, punctuation)	Yes — DeepSeek / Kimi / OpenAI / your BYOK endpoint	No
Local typography engine	CJK-Latin spacing, brand casing, units — <5ms, zero LLM call	N/A
Memory of what you said	Every line archives locally with app, time, tags. Search and export.	Nothing. Text drops, then it's gone.
AI persona review	7 personas re-read your week + Big5 sketch + 3-5 quotes	N/A
End-to-end latency (Chinese)	~1.4s	2-4s
Offline mode	Three on-device engines	On-device for supported languages and short queries, cloud-required for others
Cost	Free (Local + BYOK forever) · Cloud $5/mo · $79/yr · $49 lifetime	Free (built into macOS)
Works in any text field	Yes (Accessibility + clipboard fallback)	Yes (system-level)
Customizable hotwords / vocabulary	200+ defaults + your own list	No user dictionary for dictation

Where Apple Dictation wins

Free and built in. Zero install. Apple's privacy story is unmatched — on-device for supported languages.
Works on iOS too. If you need the same dictation experience across Mac and iPhone, Apple Dictation is integrated.
Good for short messages. If 90% of your dictation is "tell my wife I'll be home at 7," Apple Dictation is enough.

Where VoiceInput wins

No 60-second cap. Talk for two minutes, five minutes, twenty minutes. The recording continues until you release the hotkey. Apple Dictation auto-stops at 60 seconds and cuts you off mid-thought.
Mixed-language handling. "Push 这个 commit 到 staging 分支" — VoiceInput keeps "commit" and "staging" as English with correct casing. Apple Dictation either mis-recognizes or splits the language and breaks formatting.
AI cleanup. Homophones, fillers ("uh," "um," "嗯"), missing punctuation — handled by an LLM with a constrained prompt. Apple Dictation outputs raw ASR with no post-processing.
200+ brand hotwords. Cursor stays "Cursor." GitHub stays "GitHub." Kimi stays "Kimi." Apple products stay correctly cased. Apple Dictation has no user vocabulary.
Local typography engine. Half-width spacing between CJK and Latin. Spacing after English commas/periods. Unit spacing ("100 GB," not "100GB"). Every rule toggleable. Apple Dictation has none of this.
Memory of what you said. Every dictation archives locally with the source app, timestamp, and auto-tags. Search "what did I say about onboarding?" three weeks later. Apple Dictation drops the text and forgets — there's no log.
AI persona review. 7 built-in personas (Boss, Coach, Therapist, Editor, Musk, Jobs, Friend) re-read your week. You get a weekly Big Five personality sketch based on what you actually said, plus 3-5 AI-picked quotes worth revisiting. This is a different product category from "speech becomes text."
Faster on Chinese. 1.4-second end-to-end vs Apple's 2-4 seconds. Above 1.5 seconds, dictation feels like waiting; below it, it feels like typing.
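
The typography rules listed above (CJK-Latin spacing, spacing after English commas and periods, unit spacing, each one toggleable) can be approximated with a handful of regular expressions. The following is a minimal sketch under stated assumptions, not VoiceInput's actual engine: the rule names, the `tidy_typography` function, the CJK range, and the unit list are all hypothetical and chosen only for illustration.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block is covered here.
CJK = r"\u4e00-\u9fff"

# Hypothetical rule names, each mapped to (pattern, replacement) pairs.
RULES = {
    "cjk_latin_spacing": [
        # Insert a half-width space where CJK meets Latin letters or digits.
        (rf"([{CJK}])([A-Za-z0-9])", r"\1 \2"),
        (rf"([A-Za-z0-9])([{CJK}])", r"\1 \2"),
    ],
    "punctuation_spacing": [
        # Space after a comma that is directly followed by a letter.
        (r",(?=[A-Za-z])", ", "),
        # Space after a period only between a lowercase and an uppercase
        # letter, so "GitHub.com" and "e.g." are left alone.
        (r"([a-z])\.(?=[A-Z])", r"\1. "),
    ],
    "unit_spacing": [
        # "100GB" -> "100 GB" (illustrative unit list, not exhaustive).
        (r"(\d)(KB|MB|GB|TB)(?![A-Za-z])", r"\1 \2"),
    ],
}


def tidy_typography(text, enabled=None):
    """Apply every rule, or only those named in `enabled`."""
    for name, pairs in RULES.items():
        if enabled is not None and name not in enabled:
            continue  # this rule is toggled off
        for pattern, replacement in pairs:
            text = re.sub(pattern, replacement, text)
    return text


print(tidy_typography("我用Cursor写了100GB的日志"))
```

Because each rule is a plain pattern substitution, a pass like this needs no model call, which is consistent with the claim that typography runs locally. Passing `enabled={"cjk_latin_spacing"}` would leave `100GB` untouched, mirroring a per-rule toggle.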

The 60-second cap is the dealbreaker

Most reviews of Apple Dictation gloss over this, but it's the limit you actually hit:

Drafting a multi-paragraph code review — gets cut off.
Talking through a product decision out loud — gets cut off.
Capturing a meeting takeaway — gets cut off.
Anything longer than "send a text" — gets cut off.

VoiceInput's hold-to-talk has no length limit. Speak for as long as you need and release the key when you're done. The pipeline streams ASR in real time, so latency stays under 1.5 seconds regardless of total length.
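
To make the hold-to-talk model concrete, here is a minimal sketch of a recording session whose boundaries are set only by key-down and key-up, with no timer and no silence detection. It is an illustration under assumptions, not VoiceInput's code: the `HoldToTalkSession` class, its method names, and the `on_chunk` streaming callback are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class HoldToTalkSession:
    # Hypothetical callback that would forward each chunk to a streaming ASR engine.
    on_chunk: Optional[Callable[[bytes], None]] = None
    recording: bool = False
    chunks: List[bytes] = field(default_factory=list)

    def key_down(self) -> None:
        """Hotkey pressed: start a fresh recording."""
        self.recording = True
        self.chunks = []

    def feed(self, chunk: bytes) -> None:
        """Audio callback: keep a chunk only while the key is held."""
        if not self.recording:
            return  # audio outside the hold window is dropped
        self.chunks.append(chunk)
        if self.on_chunk is not None:
            self.on_chunk(chunk)  # stream as it arrives, no length cap

    def key_up(self) -> bytes:
        """Hotkey released: stop and return everything captured."""
        self.recording = False
        return b"".join(self.chunks)


session = HoldToTalkSession()
session.feed(b"ignored")      # key not held yet
session.key_down()
session.feed(b"ab")
session.feed(b"cd")
audio = session.key_up()
session.feed(b"late")         # after release, also ignored
print(audio)
```

The only stop condition in this sketch is the key release, which is why the captured audio matches exactly what was said while the key was held.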

Why mixed-language matters

If your work touches code, design, AI, finance, or any technical field, you say things like:

"merge 一下 main 分支再开 PR" — VoiceInput: "merge 一下 main 分支再开 PR." Apple Dictation: typically breaks one of the words.
"把 Cursor 的 model 切到 Claude Sonnet" — VoiceInput keeps "Cursor" and "Claude Sonnet" cased correctly. Apple Dictation often outputs "cursor" or "claude" lowercase.
"今天 GitHub Actions CI 又挂了" — VoiceInput: brand-cased correctly. Apple Dictation: "github actions ci" or worse.

The difference compounds. Five mis-recognized brand names per page means five interruptions to fix the casing — you stop trusting dictation and go back to typing.
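
Brand casing of the kind shown in the examples above can be sketched as a case-insensitive lookup against a hotword list. This is a simplified illustration, not VoiceInput's recognizer: the `HOTWORDS` list is a small hypothetical subset, `apply_hotword_casing` is an invented name, and a real system would presumably bias recognition itself and use context rather than patch the text afterwards.

```python
import re

# Hypothetical subset of a hotword list, in canonical casing.
HOTWORDS = ["Cursor", "Kimi", "GitHub", "GitHub Actions", "AWS", "Claude Sonnet", "CI"]


def apply_hotword_casing(text, hotwords=HOTWORDS):
    """Rewrite any case variant of a hotword to its canonical casing."""
    # Longest entries first, so "GitHub Actions" is handled before "GitHub".
    for word in sorted(hotwords, key=len, reverse=True):
        # ASCII-only boundaries: a hotword glued to CJK text still matches,
        # but one embedded in a longer Latin word (e.g. "precursor") does not.
        pattern = re.compile(
            rf"(?<![A-Za-z0-9]){re.escape(word)}(?![A-Za-z0-9])",
            re.IGNORECASE,
        )
        text = pattern.sub(lambda match, canonical=word: canonical, text)
    return text


print(apply_hotword_casing("今天 github actions ci 又挂了"))
print(apply_hotword_casing("把 cursor 的 model 切到 claude sonnet"))
```

A pure lookup like this would also capitalize the ordinary noun "cursor", so it should be read as a picture of the desired output rather than a complete method.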

Privacy

Both options have a strong story:

Apple Dictation. On-device for supported languages and short queries. Cloud for longer queries and unsupported languages. Apple's standard privacy practices.
VoiceInput. Three local ASR engines + local typography engine = zero network when toggled offline. Cloud tier streams audio directly to Volcengine — no proxy server, no audio storage. Only metric leaving the device is recording length in seconds (toggleable). API keys in macOS Keychain.

Who should pick which

Pick VoiceInput if

You hit the 60-second cap regularly, you dictate in Chinese or mixed Chinese-English, you want technical terms cased correctly, or you want every dictation searchable later. The free tier covers most real usage.

Stay with Apple Dictation if

Most of what you dictate is short, single-language messages, and you're already happy with the built-in flow. There's no point installing a third-party tool for that workflow.

FAQ

Why use VoiceInput when macOS has Dictation built in?

Apple Dictation has four limits: 60-second cap, no AI cleanup, no memory layer, no mixed-language hotword handling. VoiceInput removes all four and adds a searchable archive of everything you've ever dictated.

Is VoiceInput private like on-device Apple Dictation?

Yes. Three local ASR engines (SenseVoice, Paraformer, Apple) plus a local typography engine. Toggle off cloud — nothing leaves your Mac. Cloud tier streams audio directly to Volcengine without a relay; only recording-length seconds get reported (toggleable).

Does VoiceInput work with the Globe key?

No. VoiceInput uses hold-to-talk on right Option (configurable). Hold-to-talk is intentional — it scopes recording to exactly what you meant to say. No auto-stop on silence, no leftover transcription.

Can VoiceInput handle technical terms?

Yes. 200+ brand hotwords built in (Cursor, Kimi, GitHub, AWS, all major Apple products). Pinyin disambiguation for Chinese homophones. Apple Dictation has no equivalent.

What does VoiceInput cost?

100% Local and Bring-Your-Own-Key tiers are free forever. Optional Cloud tier is $5/mo, $79/yr, or $49 lifetime — only needed if you want zero configuration with cloud ASR + cloud AI tidy.

Try VoiceInput free

Free forever (100% local). No account, no API key, no setup. macOS 14+.

Download v0.47.0 · 21 MB