induwara.lk

Text to Speech — read any text aloud in your browser

Paste text, pick a voice, hit Speak. The Web Speech API drives your operating system's built-in voices, so nothing is uploaded and there is no signup. Adjust rate, pitch, and volume within the W3C spec ranges. English, Sinhala, Tamil, and dozens of other voices are supported when your OS has them installed.

By Induwara Ashinsana · Updated May 11, 2026
Runs entirely in your browser. Nothing is uploaded, logged, or stored.

Voices come from your operating system — Sinhala and Tamil voices are bundled on Windows 11 and most Android phones. macOS & iOS ship a wide English set out of the box.

Microphone is never used. This tool plays synthesized audio out — it does not record. To save the audio, use your system's screen-record or audio-loopback (QuickTime · OBS · Audacity).

Sources: voice list comes from window.speechSynthesis.getVoices(); parameter ranges follow the W3C Web Speech API spec. Long text is split at sentence boundaries (max 200 chars per utterance) to dodge the Chrome 15-second watchdog. Full citations in the Sources section below.

How it works

The tool is a thin React layer over the SpeechSynthesis interface that every modern browser exposes. There is no audio model shipped with this page — when you click Speak, a SpeechSynthesisUtterance is constructed and handed to window.speechSynthesis.speak(...), and your operating system's voice engine renders the audio. Privacy follows from the architecture: the browser never sends the text to a remote server unless you have picked a voice that reports localService = false, and the voice picker labels every such voice as "cloud" for transparency.
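A minimal sketch of the "cloud" labelling described above. The names PickerVoice, voiceOrigin, and pickerLabel are hypothetical and not taken from the tool's source; the only real API surface assumed is the name, lang, and localService fields that SpeechSynthesisVoice exposes.

```typescript
// Structural subset of SpeechSynthesisVoice, so the sketch type-checks
// without DOM typings. Hypothetical name.
interface PickerVoice {
  name: string;
  lang: string;
  localService: boolean;
}

// A voice that reports localService = false may send text to a remote service.
function voiceOrigin(voice: PickerVoice): "local" | "cloud" {
  return voice.localService ? "local" : "cloud";
}

// Hypothetical display text for one picker entry; only cloud voices get a suffix.
function pickerLabel(voice: PickerVoice): string {
  const suffix = voiceOrigin(voice) === "cloud" ? " (cloud)" : "";
  return `${voice.name} [${voice.lang}]${suffix}`;
}
```

In a browser, the entries returned by window.speechSynthesis.getVoices() already have these three fields, so they could be passed to pickerLabel directly.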

Three transformations sit between the textarea and the engine:

  1. Sanitisation. The input is normalised: bidi override marks and zero-width joiners are stripped (they make some engines skip the surrounding word), newlines are replaced with ". " so the engine pauses at line breaks, and any double-space run is collapsed.
  2. Parameter clamping. The W3C spec defines the valid ranges for rate (0.1–10), pitch (0–2), and volume (0–1). Anything outside the range is clamped before it reaches the engine; anything non-numeric falls back to the default of 1. The "Verified · clamped to spec" badge in the tool header is computed live from a self-check that the clamp functions satisfy clampX(clampX(v)) === clampX(v) for every sample input.
  3. Chunking. Chrome ships a watchdog (Chromium bug 679437) that halts a single utterance after roughly fifteen seconds. The tool side-steps the watchdog by splitting the input on sentence terminators (.!?) into sub-utterances of at most 200 characters, queueing them via onend so the audio sounds continuous to the listener.
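The sanitisation step can be sketched as three replacements. This is an illustration, not the tool's actual code: the function name sanitise and the exact set of code points treated as invisible (U+200B to U+200D, U+FEFF, and the bidi controls U+202A to U+202E and U+2066 to U+2069) are assumptions.

```typescript
// Assumed set of invisible characters: zero-width space/non-joiner/joiner,
// the byte-order mark, and the bidi embedding/override/isolate controls.
const INVISIBLES = /[\u200B-\u200D\uFEFF\u202A-\u202E\u2066-\u2069]/g;

function sanitise(input: string): string {
  return input
    .replace(INVISIBLES, "")       // strip invisibles
    .replace(/(?:\r?\n)+/g, ". ")  // a run of line breaks becomes one spoken pause
    .replace(/ {2,}/g, " ");       // collapse double-space runs
}
```

A line that already ends in a full stop would pick up a second one in this sketch; a production version would probably guard against that.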

Voices are enumerated via window.speechSynthesis.getVoices(). The list is populated asynchronously by Chromium-based browsers, so the picker subscribes to the voiceschanged event and refreshes when the engine reports new entries. Voices are grouped by BCP-47 language tag (parsed per RFC 5646), with English (UK/US/AU/IN), Sinhala (LK), and Tamil (LK/IN) lifted to the top of the list to suit the induwara.lk audience.
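The grouping and ordering could look roughly like the sketch below. It is a simplification: normaliseTag only extracts the primary language and region subtags rather than implementing all of RFC 5646, and the names ListedVoice, PRIORITY_LANGUAGES, normaliseTag, and groupVoices are hypothetical.

```typescript
// Structural subset of SpeechSynthesisVoice (hypothetical name).
interface ListedVoice {
  name: string;
  lang: string;
}

// Languages lifted to the top of the picker, in display order.
const PRIORITY_LANGUAGES = ["en", "si", "ta"];

// Simplified BCP-47 handling: keep language + region, accept "_" as a separator.
function normaliseTag(tag: string): string {
  const parts = tag.split(/[-_]/);
  const language = (parts[0] ?? "").toLowerCase();
  const region = parts.slice(1).find((p) => /^([A-Za-z]{2}|\d{3})$/.test(p));
  return region ? `${language}-${region.toUpperCase()}` : language;
}

function groupVoices(voices: ListedVoice[]): Map<string, ListedVoice[]> {
  const groups = new Map<string, ListedVoice[]>();
  for (const voice of voices) {
    const key = normaliseTag(voice.lang);
    const bucket = groups.get(key) ?? [];
    bucket.push(voice);
    groups.set(key, bucket);
  }
  // Priority languages first (in list order), everything else alphabetical.
  const rank = (key: string): number => {
    const index = PRIORITY_LANGUAGES.indexOf(key.split("-")[0] ?? "");
    return index === -1 ? PRIORITY_LANGUAGES.length : index;
  };
  const sortedKeys = Array.from(groups.keys()).sort(
    (a, b) => rank(a) - rank(b) || a.localeCompare(b),
  );
  const ordered = new Map<string, ListedVoice[]>();
  for (const key of sortedKeys) ordered.set(key, groups.get(key) ?? []);
  return ordered;
}
```

Because the voice list can arrive late, a function like this would be called once on load and again from the voiceschanged handler.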

A pre-flight duration estimate is shown below the textarea. It comes from the heuristic characters ÷ 5.1 ÷ 165 wpm × 60 s ÷ rate, plus the 0.08 s lead-in used in the worked examples. The constants are 5.1 characters per English word (Mayzner 1965, re-validated by Peter Norvig's Google Books corpus analysis) and 165 words per minute (midpoint of the MDN-documented platform default range of 150–180 wpm). The estimate lands within roughly ±10% on Chrome and Safari at rate=1.
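As a sketch, the heuristic fits in one small function. The constants 5.1 and 165 come from the paragraph above and the 0.08 s lead-in from the worked examples on this page; the name estimateSeconds is hypothetical, and leaving the lead-in unscaled by rate is an assumption.

```typescript
const CHARS_PER_WORD = 5.1;     // average English word length used by the heuristic
const WORDS_PER_MINUTE = 165;   // assumed speaking rate at rate = 1
const LEAD_IN_SECONDS = 0.08;   // lead-in taken from the worked examples

function estimateSeconds(charCount: number, rate = 1): number {
  const words = charCount / CHARS_PER_WORD;
  const minutes = words / WORDS_PER_MINUTE / rate;
  // Assumption: the lead-in is a fixed cost and is not divided by rate.
  return minutes * 60 + LEAD_IN_SECONDS;
}
```

For the 11-character greeting in the worked examples this gives about 0.86 s at rate 1.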

Worked examples

Short English greeting

"Hello world"

  1. Sanitise → 'Hello world' (11 chars, no newlines, no invisibles)
  2. Chunk → 1 utterance of 11 chars (well under the 200-char limit)
  3. Estimate → 11 / 5.1 / 165 wpm × 60 s + 0.08 lead-in ≈ 0.86 s at rate 1
  4. Clamp → rate=1 → 1, pitch=1 → 1, volume=1 → 1 (all within spec)
  5. Speak → 1 SpeechSynthesisUtterance handed to the engine

Three sentences with terminators

"Welcome to Sri Lanka. Today is hot. Stay hydrated."

  1. Sanitise → preserved exactly (length 50, counting the two spaces between sentences)
  2. Chunk → splits on .!? → ['Welcome to Sri Lanka.', 'Today is hot.', 'Stay hydrated.']
  3. Pack → pieces total 48 chars (21 + 13 + 14), well under 200 → 1 chunk
  4. Estimate → 48 / 5.1 / 165 wpm × 60 + 0.08 ≈ 3.5 s at rate 1
  5. Engine progress events fire after every utterance for the progress bar
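The onend queueing and the per-utterance progress updates can be sketched together. To keep the example runnable outside a browser it uses two stand-in interfaces, UtteranceLike and EngineLike, in place of SpeechSynthesisUtterance and window.speechSynthesis; those names, speakChunks, and the onProgress callback are all hypothetical.

```typescript
// Stand-ins for SpeechSynthesisUtterance and window.speechSynthesis.
interface UtteranceLike {
  text: string;
  onend: (() => void) | null;
}
interface EngineLike {
  speak(utterance: UtteranceLike): void;
}

function speakChunks(
  chunks: string[],
  engine: EngineLike,
  onProgress: (done: number, total: number) => void,
): void {
  const speakAt = (index: number): void => {
    if (index >= chunks.length) return; // queue drained
    const utterance: UtteranceLike = { text: chunks[index] ?? "", onend: null };
    // When this chunk finishes, report progress and start the next one.
    utterance.onend = () => {
      onProgress(index + 1, chunks.length);
      speakAt(index + 1);
    };
    engine.speak(utterance);
  };
  speakAt(0);
}
```

Only one utterance is ever handed to the engine at a time, so each one starts its own timer rather than sharing a single long one.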

Long input crossing the watchdog

"A 1,000-character paragraph with no terminators (worst case)"

  1. Sanitise → length 1,000, no sentence boundaries
  2. Chunk → falls back to hard split at 200 chars → 5 chunks of 200
  3. Speak → 5 utterances queued; each is estimated at about 14.3 s at rate 1 (200 / 5.1 / 165 wpm × 60 + 0.08), just under the 15 s watchdog
  4. Audio → seamless because each onend immediately speaks the next chunk
  5. Without chunking → Chrome would cut out after ~15 seconds mid-sentence
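A rough sketch of the split, pack, and hard-split steps used in these examples. It assumes a sentence ends at a terminator followed by whitespace or end of input, and that packed sentences are rejoined with a single space; the names MAX_CHUNK, splitSentences, hardSplit, and chunkText are hypothetical.

```typescript
const MAX_CHUNK = 200; // per-utterance limit stated on this page

// Split after .!? when followed by whitespace or end of input.
function splitSentences(text: string): string[] {
  const pieces = text.match(/[\s\S]+?(?:[.!?]+(?=\s|$)|$)/g) ?? [];
  return pieces.map((p) => p.trim()).filter((p) => p.length > 0);
}

// Fallback for a single piece longer than the limit (slices UTF-16 code units).
function hardSplit(piece: string, max: number): string[] {
  const parts: string[] = [];
  for (let i = 0; i < piece.length; i += max) parts.push(piece.slice(i, i + max));
  return parts;
}

// Greedily pack sentences into chunks of at most `max` characters.
function chunkText(text: string, max = MAX_CHUNK): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const sentence of splitSentences(text)) {
    for (const piece of hardSplit(sentence, max)) {
      const candidate = current ? `${current} ${piece}` : piece;
      if (candidate.length <= max) {
        current = candidate;
      } else {
        if (current) chunks.push(current);
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Greedy packing keeps the number of utterances low, which matters because every utterance boundary is a potential audible gap.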

Extreme parameter values

rate=12, pitch=-1, volume=2 (all outside the spec)

  1. clampRate(12) → 10 (capped at spec maximum)
  2. clampPitch(-1) → 0 (floored at spec minimum)
  3. clampVolume(2) → 1 (capped at spec maximum)
  4. Result: utterance plays at the fastest legal rate, at the lowest pitch, at full volume
  5. Without clamping → some engines report an 'invalid-argument' error and skip the audio
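The clamp functions and the idempotence self-check can be sketched as follows. clampRate, clampPitch, and clampVolume are the names used above; the shared clamp helper, isIdempotent, and the choice to treat only finite-or-infinite JavaScript numbers (not numeric strings) as numeric are assumptions.

```typescript
// Shared helper (hypothetical): clamp to [min, max], fall back on non-numbers.
function clamp(value: unknown, min: number, max: number, fallback: number): number {
  if (typeof value !== "number" || Number.isNaN(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

// Ranges from the W3C Web Speech API spec; default of 1 for each.
const clampRate = (v: unknown): number => clamp(v, 0.1, 10, 1);
const clampPitch = (v: unknown): number => clamp(v, 0, 2, 1);
const clampVolume = (v: unknown): number => clamp(v, 0, 1, 1);

// Self-check: clamping an already clamped value must not change it.
function isIdempotent(fn: (v: unknown) => number, samples: unknown[]): boolean {
  return samples.every((s) => fn(fn(s)) === fn(s));
}
```

A check such as isIdempotent(clampRate, [12, -1, 0.5, NaN, "fast"]) is the kind of test that could back a "clamped to spec" badge.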

Sources & references

The clamp ranges, chunking thresholds, and language-tag handling on this page were last cross-checked against the upstream specs on 2026-05-11. The page is reviewed whenever a Chromium voice-list regression lands or the WICG Speech API spec ships a new draft. If you spot an engine behaviour that disagrees with the methodology above, please email me below.

Hit a voice that won't load, an engine error, or want a different clamp behaviour?

Email me at [email protected] — most fixes ship within 24 hours.