Question 1

How much does a text-to-speech API cost per character?

Accepted Answer

Hosted TTS costs from about $4 per 1,000,000 characters (Google Standard, Amazon Polly Standard), which is $0.000004 per character, up to roughly $90 per 1M ($0.00009 per character) for premium neural voices such as ElevenLabs Multilingual v2. Most mid-range options cost $15–$30 per 1M: OpenAI tts-1 ($15), Google Neural2 and Amazon Polly Neural ($16), and HD/generative tiers at around $30. Enter your monthly volume in the tool above to see the projected bill for each provider.
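To answer the per-character question directly, the per-1M prices can simply be divided down. This short Python sketch uses only the rates quoted in this answer (treat them as snapshot figures, not live prices); the function names are illustrative and not part of the tool.

```python
# Per-1M-character rates quoted in this answer (USD). Snapshot figures only;
# confirm current prices on each provider's pricing page.
RATES_PER_MILLION = {
    "Google Standard": 4.00,
    "Amazon Polly Standard": 4.00,
    "OpenAI tts-1": 15.00,
    "Google Neural2": 16.00,
    "Amazon Polly Neural": 16.00,
    "HD/generative tier (typical)": 30.00,
    "ElevenLabs Multilingual v2 (eff)": 90.00,
}


def per_thousand(rate_per_million: float) -> float:
    """USD per 1,000 characters."""
    return rate_per_million / 1_000


def per_character(rate_per_million: float) -> float:
    """USD per single character."""
    return rate_per_million / 1_000_000


# Rates are listed cheapest first, so print them in insertion order.
for name, rate in RATES_PER_MILLION.items():
    print(f"{name:34} ${per_thousand(rate):.3f} per 1k   ${per_character(rate):.6f} per char")
```

For example, $15 per 1M is $0.015 per 1,000 characters, or $0.000015 per character.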

Question 2

Which AI voice generator sounds the most natural?

Accepted Answer

On community naturalness leaderboards such as TTS-Arena, ElevenLabs, Google's Chirp 3 HD, Cartesia Sonic and OpenAI's gpt-4o-mini-tts consistently rank near the top, with Amazon Polly Generative and Azure HD close behind. Naturalness is listener- and language-dependent, though: a model that shines in English may be weaker in your target language. Shortlist two or three here, then generate a sample of your own script with each before committing.

Question 3

Is ElevenLabs cheaper than OpenAI TTS?

Accepted Answer

No. OpenAI tts-1 costs $15 per 1M characters; ElevenLabs Multilingual v2 works out to roughly $90 per 1M on a representative mid tier, about six times as much. ElevenLabs Flash v2.5 is cheaper at ~$45 per 1M, but that is still three times the tts-1 rate. People pay the premium because ElevenLabs leads on naturalness and offers instant and professional voice cloning, which OpenAI's TTS API does not.

Question 4

Which text-to-speech APIs support voice cloning?

Accepted Answer

ElevenLabs (instant and professional), PlayHT and Cartesia (instant) offer voice cloning through their APIs, and Microsoft Azure offers professional Custom Neural Voice behind a gated-access application. OpenAI, Google Cloud, Amazon Polly, Murf's API and Deepgram Aura do not clone voices via their standard TTS endpoints. Toggle the “Voice cloning” filter in the tool above to grey out providers that lack it.
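The “Voice cloning” filter amounts to partitioning providers into those that offer cloning and those that get greyed out. Here is a hedged Python sketch, assuming a hypothetical data shape (a set of cloning kinds per provider) populated from this answer; the page's real data module may be structured differently, and `split_by_cloning` is an illustrative name.

```python
# Cloning support as described in this answer. Hypothetical data shape:
# each provider maps to the set of cloning kinds its API offers.
CLONING = {
    "ElevenLabs": {"instant", "professional"},
    "PlayHT": {"instant"},
    "Cartesia": {"instant"},
    "Microsoft Azure": {"professional"},  # Custom Neural Voice, gated access
    "OpenAI": set(),
    "Google Cloud": set(),
    "Amazon Polly": set(),
    "Murf (API)": set(),
    "Deepgram Aura": set(),
}


def split_by_cloning(cloning, kind=None):
    """Return (enabled, greyed_out) provider lists.

    kind=None keeps any provider with some form of cloning;
    kind="instant" or kind="professional" requires that specific kind.
    """
    enabled, greyed_out = [], []
    for name, kinds in cloning.items():
        ok = bool(kinds) if kind is None else kind in kinds
        (enabled if ok else greyed_out).append(name)
    return enabled, greyed_out


enabled, greyed_out = split_by_cloning(CLONING)
print("Offers cloning:", ", ".join(enabled))
print("Greyed out:", ", ".join(greyed_out))
```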

Question 5

Which TTS API has the lowest streaming latency?

Accepted Answer

Among the providers that publish a figure, ElevenLabs Flash v2.5 (~75 ms time-to-first-byte, or TTFB) and Cartesia Sonic (~90 ms) advertise the lowest streaming latency, followed by Deepgram Aura (~200 ms) and PlayHT (~300 ms). Low latency matters most for live voice agents and IVR, not for pre-rendered narration. The big-three clouds (Google Cloud, Amazon Polly, Microsoft Azure) support streaming but don't prominently publish a TTFB number, so they show a dash in the latency column.
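One plausible way to order providers by latency is sketched below in Python, assuming that providers without a published figure are listed last (the FAQ does not say how the tool sorts them). `None` stands in for the dash shown in the table; the dictionary and `latency_order` are illustrative names only.

```python
from typing import Optional

# Advertised TTFB figures quoted in this answer, in milliseconds.
# None = no prominently published figure (rendered as a dash in the table).
PUBLISHED_TTFB_MS: dict[str, Optional[int]] = {
    "ElevenLabs Flash v2.5": 75,
    "Cartesia Sonic": 90,
    "Deepgram Aura": 200,
    "PlayHT": 300,
    "Google Cloud": None,
    "Amazon Polly": None,
    "Microsoft Azure": None,
}


def latency_order(ttfb: dict[str, Optional[int]]) -> list[str]:
    """Fastest published TTFB first; unknown figures last.

    The key sorts on (is_unknown, ms). Python's sort is stable, so providers
    with no figure keep their original relative order at the end.
    """
    return sorted(ttfb, key=lambda name: (ttfb[name] is None, ttfb[name] or 0))


print(latency_order(PUBLISHED_TTFB_MS))
```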

Question 6

Does this tool generate audio or read text aloud?

Accepted Answer

No. This is a pricing-and-feature comparison only — it sends nothing to any provider and generates no speech. To actually create audio in the browser, use the AI Voice Generator. To model one provider's cost in depth, use the AI Text-to-Speech Cost Calculator. This page is for choosing which provider to wire up before you write any code or pick a paid plan.

Question 7

How is the monthly cost calculated?

Accepted Answer

Your volume is normalised to characters (words × 6, or minutes × 900), the provider's documented standing monthly free characters are subtracted, and the remainder is multiplied by that provider's per-character rate. So cost = max(0, characters − free tier) ÷ 1,000,000 × rate-per-million. The data module also cross-checks every figure via the per-1,000-character rate, so the two routes must agree before the page builds.
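The formula and the per-1,000-character cross-check can be written out as a small Python sketch. The 6-characters-per-word and 900-characters-per-minute factors come from this answer; the function names, the example volume and the 1M-character free tier in the second example are hypothetical, not any provider's documented allowance.

```python
import math

# Normalisation factors stated in this answer.
CHARS_PER_WORD = 6
CHARS_PER_MINUTE = 900


def to_characters(amount: float, unit: str) -> float:
    """Convert a monthly volume in characters, words or minutes to characters."""
    factors = {"characters": 1, "words": CHARS_PER_WORD, "minutes": CHARS_PER_MINUTE}
    if unit not in factors:
        raise ValueError(f"unknown unit: {unit!r}")
    return amount * factors[unit]


def monthly_cost(characters: float, free_chars: float, rate_per_million: float) -> float:
    """cost = max(0, characters - free tier) / 1,000,000 * rate-per-million."""
    billable = max(0, characters - free_chars)
    return billable / 1_000_000 * rate_per_million


def routes_agree(characters: float, free_chars: float, rate_per_million: float) -> bool:
    """Recompute the same bill from the per-1,000-character rate and compare."""
    rate_per_thousand = rate_per_million / 1_000
    via_thousand = max(0, characters - free_chars) / 1_000 * rate_per_thousand
    via_million = monthly_cost(characters, free_chars, rate_per_million)
    # The two routes can differ in the last floating-point bits, so compare loosely.
    return math.isclose(via_million, via_thousand, rel_tol=1e-9, abs_tol=1e-9)


if __name__ == "__main__":
    chars = to_characters(500_000, "words")        # 3,000,000 characters
    print(monthly_cost(chars, 0, 15.00))            # tts-1, assuming no free tier: 45.0
    print(monthly_cost(chars, 1_000_000, 16.00))    # hypothetical 1M free chars: 32.0
    print(routes_agree(chars, 1_000_000, 16.00))    # True
```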

Question 8

Why do credit-priced providers show an “effective” rate?

Accepted Answer

ElevenLabs, PlayHT, Murf and Cartesia sell credits rather than billing per character directly, and the cost per character depends on which plan you're on. The rate shown is an effective per-million-character figure for a representative tier, marked “eff”. Treat it as a planning estimate and confirm against your actual plan on the linked pricing page — high-volume plans usually lower the effective rate.
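An effective rate is the plan price divided by the number of characters the plan's credits cover, scaled to 1M. This Python sketch assumes every credit is spent on speech; the plan prices, credit counts and the 0.5-credit-per-character multiplier are hypothetical values picked to reproduce the ~$90 and ~$45 figures quoted on this page, not any provider's actual plans.

```python
def effective_rate_per_million(plan_price_usd: float, plan_credits: float,
                               credits_per_character: float = 1.0) -> float:
    """USD per 1M characters if every credit in the plan is spent on speech."""
    if plan_credits <= 0 or credits_per_character <= 0:
        raise ValueError("plan_credits and credits_per_character must be positive")
    characters_covered = plan_credits / credits_per_character
    return plan_price_usd / characters_covered * 1_000_000


# Hypothetical plans, chosen only to illustrate the arithmetic.
print(f"{effective_rate_per_million(99.0, 1_100_000):.2f}")       # 1 credit/char: 90.00
print(f"{effective_rate_per_million(99.0, 1_100_000, 0.5):.2f}")  # 0.5 credit/char: 45.00
print(f"{effective_rate_per_million(330.0, 4_000_000):.2f}")      # larger plan: 82.50
```

The third line shows why high-volume plans usually lower the effective rate: more credits per dollar means fewer dollars per character.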

Question 9

Can I pay for these APIs from Sri Lanka?

Accepted Answer

Yes. All of these providers bill in USD and accept international cards. Budget for the FX margin (typically 1.5–3% over the CBSL indicative rate) plus 2–3% card processing on most LKR cards. The optional LKR column uses a single dated indicative rate of Rs 300 per USD to give a rough sense of scale; for the exact local cost, use the Freelancer USD–LKR calculator with your bank's real rate.
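For a rough LKR budget, the two surcharges can be layered on top of the indicative rate. This Python sketch assumes the FX margin and card fee compound multiplicatively (your bank may apply them differently) and uses the page's Rs 300 per USD figure; the function name and the $45 example bill are illustrative.

```python
INDICATIVE_LKR_PER_USD = 300.0  # the page's dated indicative rate; use your bank's real rate


def lkr_range(usd: float, lkr_per_usd: float = INDICATIVE_LKR_PER_USD) -> tuple[float, float]:
    """Low/high LKR estimate: FX margin 1.5-3% and card processing 2-3%, compounded."""
    base = usd * lkr_per_usd
    low = base * (1 + 0.015) * (1 + 0.02)
    high = base * (1 + 0.03) * (1 + 0.03)
    return low, high


low, high = lkr_range(45.0)
print(f"Rs {low:,.0f} to Rs {high:,.0f}")  # Rs 13,977 to Rs 14,322
```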

Question 10

When was this pricing last verified?

Accepted Answer

Every rate and capability flag was last cross-checked against the providers' official pricing and documentation pages on 2026-06-21. TTS pricing, voices and models change often, so confirm the current figure on the linked pricing page before committing. Spotted a stale number? Email me and I'll fix it within 24 hours.

Provider	$ per 1M chars	Cloning	Languages	Latency (TTFB)	Quality (Elo)
Google Neural2	$16.00	No	50	—	1120
OpenAI tts-1	$15.00	No	50	—	1150
Amazon Polly Neural	$16.00	No	34	—	1100
ElevenLabs Multilingual v2	$90.00 (eff)	Professional	29	—	1290

AI Text-to-Speech (TTS) API Comparison

How it works

1. The cost formula

2. Per-character, credit and per-minute pricing

3. Free tiers

4. Quality (Elo) is benchmark-dependent

5. Best-for badges

Worked examples

Frequently asked questions

Sources & references

Related tools

AI TTS Cost Calculator

Speech-to-Text Compare

AI Image Generator Compare