How do I make VoiceInput more accurate?
VoiceInput uses an offline local engine by default. It is free, but its accuracy has a ceiling. For better results, switch to the cloud engine, or add hotwords so that jargon is recognized consistently.
Local vs. cloud engine
| Dimension | Local (Free) | Cloud (Pro) |
|---|---|---|
| Short sentences | Adequate | Noticeably better |
| Long sentences / jargon | Acceptable | Better, by a wider margin |
| Mixed CN/EN | Partially supported | More stable |
| Punctuation | None | Added automatically |
| Privacy | Fully offline | Audio is sent to Volcano |
| Quota | Unlimited | Pro: unlimited; Free: 60 min/month |
How to switch
Settings → Engine → choose "Cloud" or "Local"
Switching takes effect immediately: your next hotkey press uses the new engine. You can switch back and forth at any time.
Hotword dictionary
Some words are often misrecognized, such as names, brands, jargon, and abbreviations. Add them to your hotword dictionary so that future recognition gives them priority.
Settings → Hotwords → Add
Enter one word per line. Start with the 5–10 words that are misrecognized most often.
Good hotword candidates
- Proper nouns: company names, product names, people's names (e.g. "Lark", "Anthropic", "Zihan Zhang")
- Abbreviations / English jargon: often turned into similar-sounding Chinese characters (e.g. "API" → "A 派", "token" → "头肯")
- Set phrases: specific word combinations you use often at work
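If you keep candidate words in a file or spreadsheet, a small script can turn them into the one-word-per-line list that Settings → Hotwords expects. The sketch below is a hypothetical helper, not part of VoiceInput: it assumes your list is ordered from most to least frequently misrecognized, and the cleanup rules (trim whitespace, drop blank lines, drop case-insensitive duplicates, keep the first 10 entries to match the 5–10 starting set suggested above) are illustrative choices.

```python
# Hypothetical helper, not part of VoiceInput. The only documented format
# rule is "one word per line"; the cleanup rules below are assumptions.


def build_hotword_list(raw_words, limit=10):
    """Return a newline-separated hotword list: trimmed, de-duplicated
    (case-insensitively, keeping the first spelling seen), in original
    order, and capped at `limit` entries."""
    seen = set()
    cleaned = []
    for word in raw_words:
        w = word.strip()
        if not w:
            continue  # skip blank lines
        key = w.casefold()
        if key in seen:
            continue  # skip duplicates such as "API" vs "api"
        seen.add(key)
        cleaned.append(w)
        if len(cleaned) == limit:
            break  # keep only the most frequently misrecognized words
    return "\n".join(cleaned)


if __name__ == "__main__":
    candidates = ["Lark", "Anthropic", "  API ", "api", "", "token", "Zihan Zhang"]
    # Paste the printed lines into Settings → Hotwords → Add.
    print(build_hotword_list(candidates))
```

Case-insensitive de-duplication is a judgment call: if you want both spellings of a word as separate hotwords, compare `w` instead of `w.casefold()`.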
Punctuation
How each engine handles punctuation:
- Local engine: no automatic punctuation. Output is one continuous run of text; add punctuation manually, or enable AI Tidy to fill it in.
- Cloud engine: automatically adds periods, commas, question marks, and exclamation marks. Manual fixes are usually unnecessary.
Pausing deliberately between sentences also prompts both engines to insert sentence breaks, which helps separate clauses in a long utterance.
Mixed CN/EN speech
VoiceInput supports mixed Chinese-English speech, such as a Chinese sentence containing English terms like "API endpoint". Both engines handle it, but the cloud engine switches between the two languages more smoothly.
- Common English terms stay in English instead of being forced into Chinese homophones
- Pure-English sentences also work (e.g. "I'll send you the doc")
- Chinese is the primary language, however: long pure-English passages won't be as accurate as with a dedicated English ASR engine
Accents / Dialects
VoiceInput's recognition is built for Mandarin Chinese. If you have a strong regional accent:
- Use the cloud engine: it is more tolerant, because its broader training data covers accented Mandarin
- Slow down a little: accuracy drops as speech rate rises, so speaking slightly slower helps noticeably
- Add error-prone words to hotwords: if specific words always come out wrong, add them to the hotword dictionary
Dialects such as Cantonese, Hokkien, and Shanghainese aren't directly supported. For now, wait for a future release, or enable AI Tidy so it can infer the intended text from context; this recovers some cases.
Long sentences
There is no length limit, but we recommend:
- Under 30 seconds: both local and cloud are stable.
- 30–90 seconds: switch to the cloud engine and enable AI Tidy; longer utterances benefit more from punctuation and structure.
- Over 90 seconds: consider splitting the recording into chunks. There's no hard limit, but very long recordings may time out during AI Tidy (recognition itself is unaffected).
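The length recommendations above can be summarized as a small decision function. This is a hypothetical sketch, not VoiceInput code: the `recommend` function, the `Recommendation` fields, and two rules for recordings over 90 seconds (keep using cloud + AI Tidy, and split into chunks of at most 90 seconds each) are assumptions for illustration; the document itself only says to consider chunking.

```python
import math
from dataclasses import dataclass


@dataclass
class Recommendation:
    engine: str                # "local or cloud" or "cloud"
    ai_tidy_recommended: bool  # whether enabling AI Tidy is advised
    chunks: int                # suggested number of separate recordings


def recommend(duration_s: float, max_chunk_s: float = 90.0) -> Recommendation:
    """Map a planned recording length (seconds) to the recommendations above.

    Hypothetical helper: the 90 s chunk size and keeping cloud + AI Tidy
    for chunked recordings are assumptions, not documented VoiceInput behavior.
    """
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    if duration_s < 30:
        # Under 30 s: both engines are stable; AI Tidy is optional.
        return Recommendation("local or cloud", ai_tidy_recommended=False, chunks=1)
    if duration_s <= 90:
        # 30-90 s: cloud engine plus AI Tidy.
        return Recommendation("cloud", ai_tidy_recommended=True, chunks=1)
    # Over 90 s: split so that no chunk exceeds max_chunk_s, reducing the
    # risk of an AI Tidy timeout on one very long recording.
    return Recommendation(
        "cloud",
        ai_tidy_recommended=True,
        chunks=math.ceil(duration_s / max_chunk_s),
    )


if __name__ == "__main__":
    for seconds in (20, 60, 150):
        print(seconds, recommend(seconds))
```

Read `ai_tidy_recommended=False` for short recordings as "not required" rather than "avoid": AI Tidy still cleans up conversational speech at any length.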
Pairing with AI Tidy
The recognition engine only converts speech to text. Turning spoken language into written prose is AI Tidy's job. If you speak conversationally ("um... so... like, the thing is..."), enabling AI Tidy gives much cleaner output.
For detailed setup, see the AI Tidy docs.