Speech to Text — dictate any language in your browser
Press Start, speak, and watch your words appear live. The tool uses the Web Speech API to reach the recognition engine built into Chrome, Edge, and Safari: no signup, no upload to this site, and support for English, Sinhala, Tamil, and 30+ more languages. Export the transcript as plain text, SubRip (.srt), or WebVTT (.vtt) subtitles.
How it works
The tool is a thin React layer over the SpeechRecognition interface from the WICG Web Speech API draft. No transcription model ships with this page. When you press Start, a recognition object is created (via window.SpeechRecognition or window.webkitSpeechRecognition) and configured with the language tag (lang) and the continuous and interimResults flags, and the browser starts streaming microphone audio to whichever recognition backend it ships with. Chrome and Edge use cloud engines (Google and Microsoft respectively); Safari since iOS 14.5 and macOS Sonoma runs recognition on-device for installed language packs.
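The setup step above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the helper name createRecognizer and its scope parameter are hypothetical, introduced so the function can be exercised outside a browser. The lang, continuous, and interimResults properties are those of the SpeechRecognition interface.

```javascript
// Hypothetical helper: returns a configured recognition object, or null
// when the host exposes neither constructor name.
function createRecognizer(scope, lang) {
  // Some browsers expose only the webkit-prefixed constructor, so both
  // names are checked, unprefixed first.
  const Ctor = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  if (!Ctor) return null;

  const recognition = new Ctor();
  recognition.lang = lang;            // BCP 47 tag, e.g. "si-LK" or "ta-IN"
  recognition.continuous = true;      // keep listening across pauses
  recognition.interimResults = true;  // deliver non-final hypotheses too
  return recognition;
}
```

In a browser, this might be called as createRecognizer(window, "en-US"), with recognition.start() invoked from the Start button's click handler so that the user-gesture requirement is met. A null return is the cue to show an "unsupported browser" notice instead of the Start button.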
Four pieces sit between the microphone and the transcript area:
- Permission and start. Calling recognition.start() triggers the browser's microphone permission prompt the first time you visit the page. The user gesture (the Start button click) is required by all engines; programmatic auto-start would be rejected with a not-allowed error.
- Result handling. The engine fires result events with a SpeechRecognitionResultList. Each entry has isFinal true (committed) or false (interim). The tool walks results from resultIndex forward, since already-finalized entries don't need to be re-processed, and appends every final chunk to the segments array with a timestamp measured against the Start moment.
- Stat counting. Word and sentence counts use Unicode property classes (\p{L}\p{M}\p{N}), so Sinhala syllable clusters like ආයුබෝවන් and Tamil clusters like வணக்கம் count as one word each. The combining marks (\p{M}) keep the cluster together; without them, Sinhala vowel signs would split each base letter into its own word. WPM is straight division: words ÷ (durationMs ÷ 1000) × 60.
- Subtitle export. SubRip (HH:MM:SS,mmm) and WebVTT (HH:MM:SS.mmm) timestamps are built from the per-segment millisecond offsets. The two formats differ by exactly one character, the decimal separator, and the "Verified · methodology checked" badge confirms that SRT(t).replace(",", ".") === VTT(t) for every sampled timestamp on every page load.
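The result handling, stat counting, and subtitle timestamp steps above can be sketched as a handful of small functions. This is a hedged illustration under stated assumptions, not the tool's actual code: the function names (collectFinals, countWords, wordsPerMinute, formatTimestamp, toSrt, toVtt), the segment shape { text, offsetMs }, and the exact regular expression are all hypothetical. Only resultIndex, results, isFinal, and transcript come from the Web Speech API.

```javascript
// Walk one `result` event from resultIndex forward. Final chunks are
// appended to `segments`; interim text is returned for live display.
// `offsetMs` (assumed) is the elapsed time since Start.
function collectFinals(event, segments, offsetMs) {
  let interim = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const text = result[0].transcript; // top-ranked alternative
    if (result.isFinal) {
      segments.push({ text: text.trim(), offsetMs });
    } else {
      interim += text;
    }
  }
  return interim;
}

// A word is a run of letters, combining marks, and digits. Including
// \p{M} keeps vowel signs attached to their base letters.
function countWords(text) {
  const matches = text.match(/[\p{L}\p{M}\p{N}]+/gu);
  return matches ? matches.length : 0;
}

// words ÷ (durationMs ÷ 1000) × 60, guarded against a zero duration.
function wordsPerMinute(words, durationMs) {
  return durationMs > 0 ? (words / (durationMs / 1000)) * 60 : 0;
}

// Format a millisecond offset as HH:MM:SS<separator>mmm.
function formatTimestamp(ms, separator) {
  const total = Math.max(0, Math.round(ms));
  const pad = (n, width) => String(n).padStart(width, "0");
  const h = Math.floor(total / 3600000);
  const m = Math.floor((total % 3600000) / 60000);
  const s = Math.floor((total % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${separator}${pad(total % 1000, 3)}`;
}

const toSrt = (ms) => formatTimestamp(ms, ","); // SubRip uses a comma
const toVtt = (ms) => formatTimestamp(ms, "."); // WebVTT uses a period
```

Routing both exporters through one formatter means the two outputs can only differ in the separator, which is the property the badge check relies on.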
Privacy lives at the boundary between the browser and the recognition backend. The page never touches a server itself — every state transition (Start, Stop, Clear, Copy, Download) is local JavaScript. But the audio leaving the browser, when Chrome or Edge is the host, travels to Google or Microsoft for transcription. That trade-off is fundamental to the way Chrome implements the API, and there is no page-level switch to flip. Safari with an installed language pack is the only fully on-device path; the tool labels the engine in the startup notice so you can choose accordingly.
The transport row shows a pulsing dot while the engine is listening and a monotonic elapsed timer keyed off performance.now() (immune to system clock changes). Interim results render in italic muted grey; final segments switch to body weight. When you press Stop, the engine fires one last result batch with isFinal=true for whatever it had buffered, then end fires and the tool tears down the listeners.
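The monotonic timer described above could look something like the sketch below. The names createElapsedTimer and formatElapsed are hypothetical, and the MM:SS display format is an assumption. The injectable now parameter exists only so the logic can be exercised with a fake clock; by default it reads performance.now().

```javascript
// Hypothetical helper: measures elapsed time against a monotonic clock.
// performance.now() does not jump when the system clock is adjusted.
function createElapsedTimer(now = () => performance.now()) {
  let startedAt = null;
  return {
    start() {
      startedAt = now();
    },
    // Milliseconds since start(), or 0 if the timer has not been started.
    elapsedMs() {
      return startedAt === null ? 0 : now() - startedAt;
    },
  };
}

// Render elapsed milliseconds as MM:SS (assumed display format).
function formatElapsed(ms) {
  const totalSeconds = Math.floor(ms / 1000);
  const mm = String(Math.floor(totalSeconds / 60)).padStart(2, "0");
  const ss = String(totalSeconds % 60).padStart(2, "0");
  return `${mm}:${ss}`;
}
```

The same elapsedMs() reading could also supply the per-segment offset recorded when a final result arrives, so the on-screen timer and the subtitle timestamps share one clock.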
Sources & references
- WICG — Web Speech API (Recognition section)
- MDN Web Docs — SpeechRecognition interface
- MDN Web Docs — SpeechRecognitionEvent
- MDN Web Docs — SpeechRecognitionErrorEvent (error codes)
- Can I use — speech-recognition support matrix
- IETF — BCP 47: Tags for Identifying Languages
- W3C — WebVTT: The Web Video Text Tracks Format
- Wikipedia — SubRip (.srt) format reference
The language list, error mapping, and subtitle timestamp helpers on this page were last cross-checked against the upstream specs on 2026-05-11. The page is reviewed whenever a Chromium recognition regression lands or the WICG Speech API draft ships a new revision. If you spot engine behaviour that disagrees with the methodology above, email me below.
Hit an engine error, a missing language, or want a different export format?
Email me at [email protected] — most fixes ship within 24 hours.