Text to Speech — read any text aloud in your browser
Paste text, pick a voice, hit Speak. The Web Speech API drives your operating system's built-in voices, so nothing is uploaded when you use a local voice, and there is no signup. Adjust rate, pitch, and volume within the W3C spec ranges; English, Sinhala, Tamil, and dozens of other voices are supported when your OS has them installed.
How it works
The tool is a thin React layer over the SpeechSynthesis interface that every modern browser exposes. There is no audio model shipped with this page — when you click Speak, a SpeechSynthesisUtterance is constructed and handed to window.speechSynthesis.speak(...), and your operating system's voice engine renders the audio. Privacy follows from the architecture: the browser never sends the text to a remote server unless you have picked a voice that reports localService = false, and the voice picker labels every such voice as "cloud" for transparency.
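As a rough illustration of the flow described above, the sketch below labels a voice from its localService flag and hands an utterance to the engine. The helper names (voiceLabel, speakIfAvailable), the VoiceLike type, and the exact label format are assumptions made for this example, not the tool's actual source.

```typescript
// Only the fields of SpeechSynthesisVoice that this sketch reads.
interface VoiceLike {
  name: string;
  lang: string;
  localService: boolean;
}

// Hypothetical helper: text shown in the voice picker.
// Voices that report localService === false get a "cloud" marker.
function voiceLabel(voice: VoiceLike): string {
  const base = `${voice.name} (${voice.lang})`;
  return voice.localService ? base : `${base} · cloud`;
}

// Hypothetical helper: builds an utterance and passes it to the engine.
// Returns false where the Web Speech API is missing (for example in Node).
// In a browser, `voice` should be an object obtained from getVoices().
function speakIfAvailable(text: string, voice: VoiceLike | null): boolean {
  const g = globalThis as any; // avoids depending on DOM typings in this sketch
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) return false;
  const utterance = new g.SpeechSynthesisUtterance(text);
  if (voice) utterance.voice = voice;
  g.speechSynthesis.speak(utterance);
  return true;
}
```

The label is derived purely from the flag the browser reports, so the picker cannot claim a voice is local when the engine says otherwise.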
Three transformations sit between the textarea and the engine:
- Sanitisation. The input is normalised: bidi override marks and zero-width joiners are stripped (they make some engines skip the surrounding word), newlines are replaced with ". " so the engine pauses at line breaks, and any double-space run is collapsed.
- Parameter clamping. The W3C spec defines the valid ranges for rate (0.1–10), pitch (0–2), and volume (0–1). Anything outside the range is clamped before it reaches the engine; anything non-numeric falls back to the default of 1. The "Verified · clamped to spec" badge in the tool header is computed live from a self-check that the clamp functions satisfy clampX(clampX(v)) === clampX(v) for every sample input.
- Chunking. Chrome ships a watchdog (Chromium bug 679437) that halts a single utterance after roughly fifteen seconds. The tool side-steps the watchdog by splitting the input on sentence terminators (.!?) into sub-utterances of at most 200 characters, queueing them via onend so the audio sounds continuous to the listener.
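The three transformations can be sketched as pure functions. This is a minimal sketch under stated assumptions, not the tool's code: the function names, the exact set of stripped code points (U+202D, U+202E, U+200D), the greedy packing of sentences into a chunk, and the word-boundary split for an oversized sentence are all choices made for illustration.

```typescript
// Bidi overrides (U+202D, U+202E) and the zero-width joiner (U+200D).
const STRIPPED = /[\u202D\u202E\u200D]/g;

function sanitise(input: string): string {
  return input
    .replace(STRIPPED, "")
    .replace(/(\r?\n)+/g, ". ") // line breaks become a spoken pause
    .replace(/ {2,}/g, " ")     // collapse runs of spaces
    .trim();
}

// Clamp to [min, max]; non-numeric input falls back to the default of 1.
function clamp(value: unknown, min: number, max: number, fallback = 1): number {
  const n = typeof value === "number" ? value : Number.parseFloat(String(value));
  if (!Number.isFinite(n)) return fallback;
  return Math.min(max, Math.max(min, n));
}
const clampRate = (v: unknown): number => clamp(v, 0.1, 10);
const clampPitch = (v: unknown): number => clamp(v, 0, 2);
const clampVolume = (v: unknown): number => clamp(v, 0, 1);

// Split on sentence terminators, then pack sentences into chunks <= maxLen.
function chunk(text: string, maxLen = 200): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*|[.!?]+/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (!sentence) continue;
    const joined = current ? `${current} ${sentence}` : sentence;
    if (joined.length <= maxLen) {
      current = joined;
      continue;
    }
    if (current) chunks.push(current);
    // Assumption: a sentence longer than maxLen is cut at the last space that fits.
    let rest = sentence;
    while (rest.length > maxLen) {
      let cut = rest.lastIndexOf(" ", maxLen);
      if (cut <= 0) cut = maxLen;
      chunks.push(rest.slice(0, cut).trim());
      rest = rest.slice(cut).trim();
    }
    current = rest;
  }
  if (current) chunks.push(current);
  return chunks;
}
```

In the browser, each chunk would become its own SpeechSynthesisUtterance, with the next one started from the previous utterance's onend handler. The idempotence check behind the badge amounts to asserting clampRate(clampRate(v)) === clampRate(v) over a list of sample values.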
Voices are enumerated via window.speechSynthesis.getVoices(). The list is populated asynchronously by Chromium-based browsers, so the picker subscribes to the voiceschanged event and refreshes when the engine reports new entries. Voices are grouped by BCP-47 language tag (parsed per RFC 5646), with English (UK/US/AU/IN), Sinhala (LK), and Tamil (LK/IN) lifted to the top of the list to suit the induwara.lk audience.
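The grouping step might look like the sketch below. It keys voices on the primary language subtag only and sorts the named priority languages first; the names VoiceEntry, PRIORITY, primaryLanguage, and groupVoices are hypothetical, and a full RFC 5646 parser would also handle script, region, and variant subtags.

```typescript
// Only the fields of SpeechSynthesisVoice that grouping needs.
interface VoiceEntry {
  name: string;
  lang: string;
}

// Assumed priority order, based on the languages the text names.
const PRIORITY = ["en", "si", "ta"];

// Primary language subtag, lower-cased: "en-GB" -> "en".
// Some engines report "en_GB", so underscores are accepted too.
function primaryLanguage(tag: string): string {
  return tag.split(/[-_]/)[0].toLowerCase();
}

function groupVoices(voices: VoiceEntry[]): Map<string, VoiceEntry[]> {
  const groups = new Map<string, VoiceEntry[]>();
  for (const voice of voices) {
    const key = primaryLanguage(voice.lang);
    const list = groups.get(key) ?? [];
    list.push(voice);
    groups.set(key, list);
  }
  // Priority languages first, everything else alphabetically.
  const rank = (key: string): number => {
    const i = PRIORITY.indexOf(key);
    return i === -1 ? PRIORITY.length : i;
  };
  const keys = Array.from(groups.keys()).sort(
    (a, b) => rank(a) - rank(b) || a.localeCompare(b)
  );
  return new Map(keys.map((k): [string, VoiceEntry[]] => [k, groups.get(k)!]));
}
```

On the page, the input would come from window.speechSynthesis.getVoices(), and the function would be re-run from a voiceschanged listener so that late-arriving voices appear in the picker.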
A pre-flight duration estimate is shown below the textarea. It comes from the heuristic minutes = characters ÷ 5.1 ÷ 165 ÷ rate, where 5.1 is the assumed number of characters per English word (Mayzner 1965, re-validated by Peter Norvig's Google Books corpus analysis) and 165 is the speaking rate in words per minute (midpoint of the MDN-documented platform default range of 150–180 wpm). The estimate lands within roughly ±10% on Chrome and Safari at rate=1.
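The heuristic is simple enough to write out directly. The sketch below assumes the rate has already been clamped to a positive value and returns seconds rather than minutes; the function and constant names are illustrative.

```typescript
const CHARS_PER_WORD = 5.1;   // characters per English word, as cited above
const WORDS_PER_MINUTE = 165; // midpoint of the 150–180 wpm range

// Estimated speaking time in seconds for `text` at the given rate.
// Assumes `rate` is already clamped to the valid range (0.1–10).
function estimateSeconds(text: string, rate = 1): number {
  const words = text.length / CHARS_PER_WORD;
  const minutes = words / WORDS_PER_MINUTE / rate;
  return minutes * 60;
}
```

For example, 1,683 characters is 1683 ÷ 5.1 = 330 words, which is 330 ÷ 165 = 2 minutes at rate 1 and about 1 minute at rate 2.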
Sources & references
- W3C — Web Speech API (Synthesis section)
- MDN Web Docs — SpeechSynthesis interface
- MDN Web Docs — SpeechSynthesisUtterance (rate / pitch / volume)
- Can I use — speech-synthesis API support matrix
- IETF — BCP 47: Tags for Identifying Languages
- Chromium issue 679437 — speak() halts after ~15 seconds
- Peter Norvig — letter frequency & mean English word length
The clamp ranges, chunking thresholds, and language-tag handling on this page were last cross-checked against the upstream specs on 2026-05-11. The page is reviewed whenever a Chromium voice-list regression lands or the WICG Speech API spec ships a new draft. If you spot an engine behaviour that disagrees with the methodology above, please email me below.
Comments & feedback
Spotted a bug or want an improvement? Tell us — our team reviews every comment, and good ideas get built. Comments are public and anonymous.
Hit a voice that won't load, an engine error, or want a different clamp behaviour?
Email me at [email protected] — most fixes ship within 24 hours.