Question 1

Which browsers support text-to-speech?

Accepted Answer

Chrome and Edge (on Windows, macOS, Android, and iOS) and Safari (on macOS and iOS) all expose the Web Speech API and ship a usable English voice out of the box. Firefox supports the API but speaks only when an OS-level voice is installed: Windows 11 includes one, but Linux often needs espeak-ng installed. Headless browsers (CI runners, etc.) do not speak.
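A page can check for the API before offering a Speak button. The sketch below is a minimal feature test; the function name supportsSpeech is hypothetical, and it takes the global scope as a parameter so it can also be exercised outside a browser. Note that it only detects the API: a browser can expose it and still have no installed voice.

```typescript
// Minimal feature test (hypothetical helper, not part of any library).
// Returns true only if both entry points of the Web Speech synthesis API
// are present on the given global scope.
function supportsSpeech(scope: object): boolean {
  return "speechSynthesis" in scope && "SpeechSynthesisUtterance" in scope;
}
```

In a browser this would be called as supportsSpeech(window).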

Question 2

Does the text get sent to a server?

Accepted Answer

No. Synthesis runs locally: the tool calls window.speechSynthesis.speak(...) and the operating system's voice engine renders the audio on your device. To check, open DevTools → Network and confirm that no requests appear while a voice speaks. Some platforms (Chrome on ChromeOS, Edge with Microsoft Online voices) can optionally fetch a high-quality cloud voice; the engine reports localService=false for such a voice, so pick a voice with localService=true if you need a guarantee of on-device synthesis.
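Filtering for on-device voices could look like the sketch below. VoiceLike and localVoices are hypothetical names; VoiceLike lists only the three fields the filter reads, which the browser's SpeechSynthesisVoice objects also carry. The underscore handling is a precaution, on the assumption that some engines report tags such as en_US.

```typescript
// Minimal structural type: just the SpeechSynthesisVoice fields used here.
interface VoiceLike {
  name: string;
  lang: string;
  localService: boolean;
}

// Keep only voices the engine reports as on-device, optionally narrowed
// by a language prefix such as "en" or "ta-IN".
function localVoices<V extends VoiceLike>(
  voices: readonly V[],
  langPrefix: string = "",
): V[] {
  const prefix = langPrefix.toLowerCase();
  return voices.filter((v) => {
    // Normalise "en_US" to "en-us" before comparing (assumed quirk).
    const lang = v.lang.replace(/_/g, "-").toLowerCase();
    return v.localService && lang.indexOf(prefix) === 0;
  });
}
```

In a browser, the input would be the array returned by speechSynthesis.getVoices().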

Question 3

How do I get a Sinhala or Tamil voice?

Accepted Answer

Sinhala (si-LK) is available on Windows 11 as an add-on voice: install it under Settings → Time & language → Speech → Add voices. Tamil (ta-IN / ta-LK) ships with Windows, recent Android versions, and macOS Sequoia. On iOS, ta-IN is listed under Settings → Accessibility → Spoken Content → Voices. Once the OS-level voice is installed, the voice picker on this page lists it on the next page load.

Question 4

Why are the rate, pitch, and volume ranges different from other tools?

Accepted Answer

They follow the W3C Web Speech API spec exactly: rate is 0.1 to 10 (default 1), pitch is 0 to 2 (default 1), volume is 0 to 1 (default 1). Other websites that show 0–100% or −50 to +50 are remapping behind the scenes. The values shown on this page are what reaches the engine, so they are reproducible: if you tell a colleague to use rate 1.2, they will hear the same speed on the same voice.
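The ranges above can be written down as a small clamp, alongside the kind of remap a percentage slider would need. This is a sketch only: the names LIMITS, clampSettings, and percentToVolume are hypothetical, and the 0 to 100 mapping is one plausible remap, not a description of any particular website.

```typescript
interface SpeechSettings {
  rate: number;
  pitch: number;
  volume: number;
}

// Ranges as quoted in the answer above.
const LIMITS = {
  rate: { min: 0.1, max: 10 },
  pitch: { min: 0, max: 2 },
  volume: { min: 0, max: 1 },
};

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Force each setting into its allowed range before it reaches the engine.
function clampSettings(s: SpeechSettings): SpeechSettings {
  return {
    rate: clamp(s.rate, LIMITS.rate.min, LIMITS.rate.max),
    pitch: clamp(s.pitch, LIMITS.pitch.min, LIMITS.pitch.max),
    volume: clamp(s.volume, LIMITS.volume.min, LIMITS.volume.max),
  };
}

// Hypothetical remap: a 0 to 100 percent slider onto the 0 to 1 volume range.
function percentToVolume(percent: number): number {
  return clamp(percent / 100, LIMITS.volume.min, LIMITS.volume.max);
}
```

The clamped values would then be assigned to the rate, pitch, and volume properties of a SpeechSynthesisUtterance.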

Question 5

Can I download the audio as MP3 or WAV?

Accepted Answer

Not directly. The Web Speech API does not expose a recording handle, which is by design (security and DRM concerns with voice models). The reliable way to capture audio is OS-level loopback: QuickTime Screen Recording on macOS, Stereo Mix or VB-Cable on Windows, or OBS Studio cross-platform. The transport panel on this page links to those tools.

Question 6

Why does the voice cut out after about 15 seconds in Chrome?

Accepted Answer

Chrome has a long-standing watchdog that halts a single utterance after roughly 15 seconds — see Chromium issue 679437. The tool here splits text into sub-utterances of up to 200 characters at sentence boundaries and queues them, which keeps each utterance comfortably under the watchdog and removes the cut-out.
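The splitting step might look roughly like the sketch below. It is not the tool's actual code: chunkText is a hypothetical name, the sentence regex is a simplification, and the word-boundary fallback for a single sentence longer than the limit is an assumption about how such cases could be handled.

```typescript
// Split text into chunks of at most maxLen characters, preferring
// sentence boundaries (., !, ?) and falling back to word boundaries.
function chunkText(text: string, maxLen: number = 200): string[] {
  const normalized = text.replace(/\s+/g, " ").trim();
  // Each piece is one sentence including its trailing space, so
  // concatenating pieces reproduces the normalized text exactly.
  const pieces = normalized.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) || [];
  const chunks: string[] = [];
  let current = "";

  for (const piece of pieces) {
    // Flush the current chunk if this sentence would overflow it.
    if (current !== "" && current.length + piece.length > maxLen) {
      chunks.push(current.trim());
      current = "";
    }
    // A sentence longer than maxLen is cut at the last space that fits,
    // or hard-cut at maxLen if it contains no space at all.
    let rest = piece;
    while (rest.length > maxLen) {
      const cut = rest.lastIndexOf(" ", maxLen);
      const end = cut > 0 ? cut : maxLen;
      chunks.push(rest.slice(0, end).trim());
      rest = rest.slice(end).replace(/^\s+/, "");
    }
    current += rest;
  }
  if (current.trim() !== "") chunks.push(current.trim());
  return chunks;
}
```

In a browser, each chunk would be wrapped in a new SpeechSynthesisUtterance and passed to speechSynthesis.speak(), which appends utterances to its queue and plays them in order.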

Question 7

What is the maximum text length?

Accepted Answer

32,000 characters in the textarea — roughly 6,000 English words, or about 35 minutes at default rate. The tool chunks the text at sentence boundaries before sending it to the engine; the cap exists so the page stays responsive on a mid-range phone, not because the engine slows down.
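The arithmetic behind those figures can be made explicit. In the sketch below, the two ratios are back-derived from the numbers quoted above (32,000 characters, 6,000 words, 35 minutes) and are estimates, not measurements; estimateReading is a hypothetical helper, and the assumption that duration scales inversely with rate is a simplification.

```typescript
// Ratios back-derived from the figures in the answer above (estimates).
const CHARS_PER_WORD = 32000 / 6000; // about 5.3, spaces included
const WORDS_PER_MINUTE = 6000 / 35; // about 171 at rate 1

// Rough word count and speaking time for a given character count.
// Assumes speaking time scales inversely with the rate setting.
function estimateReading(
  charCount: number,
  rate: number = 1,
): { words: number; minutes: number } {
  const words = charCount / CHARS_PER_WORD;
  const minutes = words / (WORDS_PER_MINUTE * rate);
  return { words: Math.round(words), minutes: Math.round(minutes) };
}
```

Real durations vary with the voice, the punctuation, and the language of the text.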

Question 8

Why does the voice list start empty and then appear?

Accepted Answer

Chrome populates the voice list asynchronously on first load: getVoices() returns an empty array at first, and the browser then fires a voiceschanged event on speechSynthesis when the list is ready. The tool listens for that event and refreshes the picker automatically. If the list still looks empty after a few seconds, you are likely on Linux without espeak-ng, or in an embedded browser (Instagram WebView, etc.) where the API is stubbed.
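The listen-and-refresh pattern can be sketched as follows. SynthLike and whenVoicesReady are hypothetical names; SynthLike lists only the three members used, on the assumption that the browser's speechSynthesis object provides them. The timeout is an added safeguard for the empty-list cases described above, and its 3-second default is arbitrary.

```typescript
// Minimal structural type: the parts of speechSynthesis used below.
interface SynthLike<V> {
  getVoices(): V[];
  addEventListener(type: "voiceschanged", listener: () => void): void;
  removeEventListener(type: "voiceschanged", listener: () => void): void;
}

// Call onReady once: immediately if voices are already loaded, otherwise
// on the first voiceschanged event or after timeoutMs, whichever is first.
function whenVoicesReady<V>(
  synth: SynthLike<V>,
  onReady: (voices: V[]) => void,
  timeoutMs: number = 3000,
): void {
  const initial = synth.getVoices();
  if (initial.length > 0) {
    onReady(initial);
    return;
  }
  let done = false;
  const finish = (): void => {
    if (done) return;
    done = true;
    clearTimeout(timer);
    synth.removeEventListener("voiceschanged", finish);
    // May still be empty if the timeout fired and no voice is installed.
    onReady(synth.getVoices());
  };
  const timer = setTimeout(finish, timeoutMs);
  synth.addEventListener("voiceschanged", finish);
}
```

In a browser this would be called with window.speechSynthesis and a callback that fills the voice picker; an empty array after the timeout is the cue to show a "no voices installed" hint.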

Question 9

Can I pause and resume mid-text?

Accepted Answer

Yes. Pause calls speechSynthesis.pause() and Resume calls speechSynthesis.resume(), which preserve the position within the current utterance. Some engines (older Android Chrome, Safari iOS) treat pause as a full stop; in that case use Stop instead, then Speak to restart from the beginning. The progress bar tracks per-chunk completion, so resuming long text will continue at the next chunk boundary if your engine cannot resume mid-utterance.

Question 10

Why does a particular voice sound robotic compared to YouTube or AI tools?

Accepted Answer

Browser-installed voices are typically concatenative or formant synthesizers: they are tiny (a few megabytes) and run instantly with no network. Modern neural voices like ElevenLabs, Google Cloud WaveNet, or OpenAI TTS are hundreds of megabytes and run on a server. You trade natural cadence for privacy and speed: this tool is the right pick for accessibility, draft narration, or reading proofs aloud, but not for finished voiceover production.

Text to Speech — read any text aloud in your browser
