Question 1

Which browsers support speech to text?

Accepted Answer

Chrome and Edge on desktop, Android, and ChromeOS expose the Web Speech recognition API out of the box. Safari supports it on iOS 14.5 or later and macOS Sonoma or later; turn on Dictation in System Settings first. Firefox does not implement the recognition half of the API, so the tool shows a fallback notice there instead. Headless browsers (such as those on CI runners) and embedded WebViews generally have the API stubbed out, so recognition will not work in them.
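
A page can tell which case applies by looking up the recognizer constructor before starting. The sketch below assumes the standard `SpeechRecognition` name or the `webkitSpeechRecognition` alias; the helper name `findRecognitionCtor` is hypothetical, and the global scope is passed in as a plain object so the logic can be exercised outside a browser.

```typescript
// Minimal stand-in for the recognizer constructor type; the real DOM type is richer.
type RecognitionCtor = new () => unknown;

// Hypothetical helper: return the recognition constructor exposed on `scope`,
// preferring the unprefixed standard name over the webkit-prefixed alias.
// Returns null when neither is a function (e.g. in Firefox), so the caller can
// show a fallback notice instead of throwing on `new`.
function findRecognitionCtor(scope: Record<string, unknown>): RecognitionCtor | null {
  const candidate = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof candidate === "function" ? (candidate as RecognitionCtor) : null;
}

// In a browser (assumption): findRecognitionCtor(window as unknown as Record<string, unknown>)
```

Detection only proves that a constructor exists; a stubbed implementation in a headless browser or WebView may still pass this check, so the recognizer's `error` event remains the place to catch those failures.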

Question 2

Is the audio sent to a server?

Accepted Answer

Chrome streams audio to Google's cloud speech-recognition service; Edge streams it to Microsoft Cognitive Services. The browser handles that transport, and the page never sees the audio. Safari runs recognition on-device starting with iOS 14.5 and macOS Sonoma. If you need a strict on-device guarantee, use Safari and pick a language for which your OS has a local model installed.

Question 3

How do I get Sinhala or Tamil to work?

Accepted Answer

Pick si-LK, ta-LK, or ta-IN from the language dropdown. Chrome's cloud engine has trained models for all three (accuracy is best for ta-IN and si-LK; ta-LK falls back to ta-IN on some Android builds). In Safari, first add the language at the OS level under Settings → General → Language & Region → Add Language; once the language is installed, recognition runs on-device.

Question 4

Why does the recognition stop after a few seconds?

Accepted Answer

By default, the Web Speech API ends recognition after a short pause in speech. To keep listening, leave the Continuous toggle on: it sets recognition.continuous = true, so the engine keeps the stream open until you press Stop. Continuous mode can still time out at the engine's discretion (Chrome aborts after roughly a minute of silence); the tool resets cleanly when that happens.
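
The Continuous toggle's behaviour can be sketched as a small state holder. This assumes a structural `RecognitionLike` type covering only the members used (`continuous`, `onend`, `start()`, `stop()`); `DictationSession` and its `lastEndReason` field are hypothetical names, not the tool's actual code. Because `onend` fires both after `stop()` and when the engine ends the session on its own, the session records which one happened before returning to idle.

```typescript
// Structural subset of SpeechRecognition used here (assumption; the DOM type has more).
interface RecognitionLike {
  continuous: boolean;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

type EndReason = "user-stop" | "engine-ended";

// Hypothetical session wrapper: listen continuously until Stop, reset cleanly on timeout.
class DictationSession {
  listening = false;
  lastEndReason: EndReason | null = null;
  private stopRequested = false;
  private readonly rec: RecognitionLike;

  constructor(rec: RecognitionLike) {
    this.rec = rec;
    rec.continuous = true; // keep the stream open across pauses
    rec.onend = () => {
      // Distinguish a user-initiated Stop from the engine ending the session itself.
      this.lastEndReason = this.stopRequested ? "user-stop" : "engine-ended";
      this.listening = false;
      this.stopRequested = false;
    };
  }

  start(): void {
    if (this.listening) return; // start() on an already-started recognizer throws
    this.listening = true;
    this.lastEndReason = null;
    this.rec.start();
  }

  stop(): void {
    if (!this.listening) return;
    this.stopRequested = true; // set before stop() so onend sees it
    this.rec.stop();
  }
}
```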

Question 5

Can I edit the transcript while it's running?

Accepted Answer

The live transcript area is read-only by design: interim and final results stream in faster than you could keep a cursor in place, and any edit would be overwritten by the next result event. When you are finished, use the Copy button and paste into a regular text editor for cleanup, or download a .txt file for offline editing.

Question 6

Why does the engine say words I didn't say?

Accepted Answer

Web Speech recognition is statistical, not deterministic: background noise, accent, microphone quality, and language-model coverage all affect accuracy. Three quick fixes: pick the language tag that matches your accent (en-IN for Indian English, en-GB for British, en-AU for Australian), move closer to the microphone, and reduce background noise. The Words-per-minute counter helps you spot speed-related errors; the engine performs best under 150 wpm.
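
The words-per-minute figure is plain arithmetic: words spoken divided by elapsed minutes. The sketch below assumes words are separated by whitespace, which holds for English, Sinhala, and Tamil but not for scripts written without spaces; `countWords` and `wordsPerMinute` are illustrative names, not the tool's code.

```typescript
// Count whitespace-separated words. Assumption: the language separates words with
// spaces (true for English, Sinhala, Tamil; not for Chinese, Japanese, or Thai).
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Words per minute = word count / elapsed minutes. elapsedMs is wall-clock time
// since Start; returns 0 before any time has passed to avoid dividing by zero.
function wordsPerMinute(text: string, elapsedMs: number): number {
  if (elapsedMs <= 0) return 0;
  return countWords(text) / (elapsedMs / 60000);
}
```

For example, 75 words over 30 seconds gives 75 / 0.5 = 150 wpm, right at the 150 wpm mark.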

Question 7

How do I download timestamped subtitles?

Accepted Answer

After pressing Stop, use the SubRip (.srt) or WebVTT (.vtt) button in the Download row. Both formats embed each cue's start and end times relative to when you pressed Start. Reference the .vtt file from a <track> element inside an HTML5 <video>, or import the .srt into VLC, Premiere, Resolve, or YouTube Studio. The plain-text (.txt) export omits the timestamps, for pasting into a word processor.
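
Both formats carry the same cue data and differ mainly in framing: SubRip numbers each cue and writes the milliseconds after a comma, while WebVTT starts with a `WEBVTT` header and uses a period. A minimal sketch, assuming each finalized phrase has been stored as a hypothetical `Cue` with millisecond offsets from Start:

```typescript
// One finalized phrase; offsets are milliseconds since Start was pressed (assumption).
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// HH:MM:SS followed by the separator and three-digit milliseconds.
// SubRip uses "," before the milliseconds; WebVTT uses ".".
function formatTimestamp(ms: number, sep: "," | "."): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}${sep}${pad(millis, 3)}`;
}

// SubRip: numbered cues separated by blank lines.
function toSrt(cues: Cue[]): string {
  return cues
    .map((c, i) =>
      `${i + 1}\n${formatTimestamp(c.startMs, ",")} --> ${formatTimestamp(c.endMs, ",")}\n${c.text}\n`)
    .join("\n");
}

// WebVTT: mandatory "WEBVTT" header line, then cues (cue identifiers are optional).
function toVtt(cues: Cue[]): string {
  const body = cues
    .map((c) => `${formatTimestamp(c.startMs, ".")} --> ${formatTimestamp(c.endMs, ".")}\n${c.text}\n`)
    .join("\n");
  return `WEBVTT\n\n${body}`;
}
```

A cue's text should not itself contain a blank line, since both formats treat an empty line as the end of a cue.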

Question 8

Will it work offline?

Accepted Answer

Only in Safari with an OS-installed language pack, where recognition runs on-device. In Chrome or Edge, the engine needs a network connection to reach Google or Microsoft. The page itself is statically served, so once it has loaded, the methodology, FAQ, and exports all keep working offline, but no audio will be transcribed without a connection.
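
When the cloud engine cannot be reached, the recognizer reports it through its `error` event, whose `error` field carries a code such as `"network"`. A sketch of mapping those codes to user-facing notices; the codes come from the Web Speech API specification, while the helper name `noticeForError` and the message wording are hypothetical:

```typescript
// Map a SpeechRecognitionErrorEvent.error code to a user-facing notice.
// The codes are from the Web Speech API spec; the wording is illustrative.
function noticeForError(code: string): string {
  switch (code) {
    case "network":
      // Typical when Chrome or Edge cannot reach its cloud recognition service.
      return "Transcription needs a network connection in this browser.";
    case "not-allowed":
    case "service-not-allowed":
      return "Microphone or speech-service access was blocked.";
    case "audio-capture":
      return "No microphone could be opened.";
    case "language-not-supported":
      return "This language is not available for recognition here.";
    case "no-speech":
      return "No speech was detected.";
    default:
      return `Recognition stopped (${code}).`;
  }
}

// In a browser (assumption; showNotice is a hypothetical UI function):
// recognition.onerror = (e) => showNotice(noticeForError(e.error));
```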

Question 9

Does the tool record my voice anywhere?

Accepted Answer

No. The page never handles the microphone stream itself: it only calls start() on window.SpeechRecognition, and the browser opens the microphone. The page does not call MediaRecorder, keeps no MediaStream reference, and never writes audio to disk. The only network requests the page itself makes are for its own static assets. Routing audio to Google or Microsoft is the browser's own behaviour, governed by the browser permissions you grant.

Question 10

What's the difference between interim and final results?

Accepted Answer

Interim results are the engine's best guess at what you are saying while you are still speaking: they update word by word and may change as more context arrives. Final results are committed when the engine decides the utterance is complete (after a pause or at the end of a sentence). The tool shows interim text in muted grey italics and final text in normal weight, so you can tell at a glance which text is still provisional.
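
In code, the distinction is the `isFinal` flag on each entry of the `results` list delivered with every `result` event (interim entries only arrive when `interimResults` is set to `true`). A sketch that rebuilds both strings from that list, assuming it holds every result of the current session, as the spec describes; the structural types and `splitTranscript` are illustrative stand-ins for the DOM types, not the tool's code:

```typescript
// Structural stand-ins for SpeechRecognitionResultList entries (assumed shape: an
// array-like of results, each with an isFinal flag and ranked alternatives).
interface AlternativeLike {
  transcript: string;
}
interface ResultLike {
  readonly isFinal: boolean;
  readonly [index: number]: AlternativeLike;
}

// Split the session's results into committed (final) and provisional (interim) text.
function splitTranscript(results: ArrayLike<ResultLike>): { finalText: string; interimText: string } {
  let finalText = "";
  let interimText = "";
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    const best = result && result[0]; // alternative 0 is the engine's top guess
    if (!best) continue; // skip malformed entries
    if (result.isFinal) finalText += best.transcript;
    else interimText += best.transcript;
  }
  return { finalText, interimText };
}
```

Rebuilding from index 0 on every event keeps the logic simple; a long session could instead start from `event.resultIndex`, the first entry that changed.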

Speech to Text — dictate any language in your browser
