RECOGNITION

How do I make VoiceInput more accurate?

VoiceInput defaults to an offline local engine — free but with an accuracy ceiling. Switch to the cloud engine for better results, or add hotwords for stable recognition of jargon.

Local vs Cloud Engine

Dimension               | Local (Free)  | Cloud (Pro)
Short sentences         | Adequate      | Noticeably better
Long sentences / jargon | OK            | Larger advantage
Mixed CN/EN             | Partial       | More stable
Punctuation             | None          | Auto
Privacy                 | Fully offline | Audio sent to Volcano
Quota                   | Unlimited     | Pro: unlimited / Free: 60 min/month

How to switch

PATH

Settings → Engine → choose "Cloud" or "Local"

Switching takes effect immediately — your next hotkey press uses the new engine. The two are seamlessly interchangeable.

💡 Recommended: use Local for short everyday inputs (sufficient + private); use Cloud for professional emails or long docs (more accurate + auto punctuation). Cloud + AI Tidy together gives the cleanest output.

Hotword dictionary

Some words get recognized incorrectly (names, brands, jargon, abbreviations). Add them to your hotword dictionary so future recognition matches them first.

PATH

Settings → Hotwords → Add

One word per line. Start with the 5–10 you misrecognize most.

Good hotword candidates

  • Proper nouns: company names, product names, people (e.g. "Lark", "Anthropic", "Zihan Zhang")
  • Abbreviations / English jargon: often misrecognized as similar-sounding Chinese (e.g. "API" → "A 派", "token" → "头肯")
  • Fixed phrases: specific combinations you say often at work
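
If you keep your candidate words in a notes file or spreadsheet, a small script can tidy them into the one-word-per-line form before you paste them into Settings → Hotwords → Add. This is only a sketch: the function name `build_hotword_list` is hypothetical, and treating "API" and "api" as the same entry is an assumption, not documented VoiceInput behavior.

```python
def build_hotword_list(candidates):
    """Return the candidates as text with one hotword per line.

    Trims stray whitespace, skips empty entries, and drops duplicates.
    Case-insensitive de-duplication is an assumption for illustration.
    """
    seen = set()
    lines = []
    for word in candidates:
        cleaned = " ".join(word.split())  # collapse inner/outer whitespace
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            lines.append(cleaned)
    return "\n".join(lines)


hotwords_text = build_hotword_list(
    ["Lark", "Anthropic", " Zihan  Zhang ", "API", "api", "token", ""]
)
print(hotwords_text)
```

The printed text can be pasted as-is; the first spelling of each word is the one that is kept.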

Punctuation

Behavior:

  • Local engine: no auto punctuation. Output is continuous — you add punctuation manually or enable AI Tidy to fill it in.
  • Cloud engine: auto-adds periods, commas, question marks, exclamation marks. Usually no manual fix needed.

Speaking with deliberate pauses also gets both engines to insert sentence breaks — useful for separating clauses in a long utterance.

Mixed CN/EN speech

VoiceInput supports mixed Chinese-English (e.g. "What's the API endpoint of this service"). Both engines handle it, but the cloud engine switches between languages more smoothly.

  • Common English terms stay English (no forced homophone substitution to Chinese)
  • Pure-English sentences also work (e.g. "I'll send you the doc")
  • But Chinese is the primary language: long pure-English passages won't match the accuracy of a dedicated English ASR engine

Accents / Dialects

VoiceInput uses a Mandarin-Chinese recognition engine. With a strong regional accent:

  • Cloud is more tolerant: broader training data covers accented Mandarin
  • Slow down a notch: accuracy correlates inversely with speech rate; speaking slightly slower noticeably helps
  • Add error-prone words to hotwords: if specific words always come out wrong, add them

Dialects (Cantonese, Hokkien, Shanghainese, etc.) aren't directly supported. Wait for a future release, or enable AI Tidy so it can guess from context, which rescues some cases.

Long sentences

There's no length limit, but these guidelines help:

  • Under 30 seconds: both local and cloud are stable.
  • 30–90 seconds: switch to cloud + enable AI Tidy — long sentences benefit more from punctuation and structure.
  • Over 90 seconds: consider chunking. There's no hard limit, but very long recordings may time out at the AI Tidy stage (recognition itself is fine).
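
The guidelines above can be written down as a small lookup, which may help if you want to remind yourself (or a teammate) what to pick before a long dictation. This is a sketch under stated assumptions: `Recommendation` and `recommend` are hypothetical names, not part of VoiceInput, and carrying "cloud + AI Tidy" over to recordings longer than 90 seconds is an assumption, since the guideline for that range only mentions chunking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    engine: str    # "either" or "cloud"
    ai_tidy: bool  # True when this section suggests enabling AI Tidy
    chunk: bool    # True when splitting the recording is worth considering


def recommend(duration_s: float) -> Recommendation:
    """Map an expected recording length (seconds) to the guidelines above."""
    if duration_s < 30:
        # Under 30 seconds: both engines are stable.
        return Recommendation(engine="either", ai_tidy=False, chunk=False)
    if duration_s <= 90:
        # 30-90 seconds: cloud plus AI Tidy.
        return Recommendation(engine="cloud", ai_tidy=True, chunk=False)
    # Over 90 seconds: consider chunking (engine choice here is an assumption).
    return Recommendation(engine="cloud", ai_tidy=True, chunk=True)


for seconds in (10, 45, 120):
    print(seconds, recommend(seconds))
```

The 30 and 90 second boundaries are treated as part of the middle range here; the guidelines do not say which side they fall on.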

Pairing with AI Tidy

The recognition engine only handles "voice → text." Turning spoken style into written prose is AI Tidy's job. If your speaking style is conversational ("um... so... like, the thing is..."), enabling AI Tidy gives much cleaner output.

For detailed AI Tidy setup, see the AI Tidy docs.