Question 1

How much does a speech-to-text API cost per minute?

Accepted Answer

Hosted transcription ranges from about $0.0007/min (Groq Whisper Large v3 Turbo, ~$0.04/hr) to $0.024/min (Amazon Transcribe tier 1). The popular middle is $0.0043–$0.006/min — Deepgram Nova-3, AssemblyAI Universal, and OpenAI Whisper all sit there. Enter your monthly minutes in the tool above to see the exact projected bill per provider.

Question 2

Which speech-to-text API is the most accurate?

Accepted Answer

On published English benchmarks, ElevenLabs Scribe, AssemblyAI Universal, OpenAI gpt-4o-transcribe and Deepgram Nova-3 report the lowest word error rates (roughly 6.2%–6.9%). But WER is benchmark-dependent: accent, audio quality, domain jargon and background noise can shift it substantially. Test two or three on your own audio before deciding; a one-percentage-point benchmark gap rarely survives real recordings.

Question 3

Is OpenAI Whisper cheaper than Deepgram?

Accepted Answer

No. OpenAI's whisper-1 API is $0.006/min; Deepgram Nova-3 is $0.0043/min for pre-recorded audio, about 28% cheaper, and Nova-3 includes speaker diarization, which Whisper's hosted API does not. Word timestamps are available from both, though whisper-1 returns them only when requested. If you self-host the open Whisper weights, your only cost is your own GPU, which can beat both at very high volume.

Question 4

Which speech-to-text APIs support real-time streaming?

Accepted Answer

Deepgram, AssemblyAI, Google Cloud Speech-to-Text, Azure AI Speech, Amazon Transcribe and OpenAI's gpt-4o transcription models all offer real-time streaming. Groq Whisper, ElevenLabs Scribe, Rev AI's machine endpoint and OpenAI's whisper-1 are batch-only. Switch the tool above to “Real-time” mode and batch-only providers are marked accordingly so they never win a cost ranking they can't serve.

Question 5

What is a good word error rate (WER) for transcription?

Accepted Answer

WER is the number of word errors (substitutions, insertions and deletions) divided by the number of words in a human reference transcript. On clean English audio, under 10% is usable, under 7% is strong, and the best models report ~6%. Noisy, accented, or jargon-heavy audio pushes every model higher. Treat the WER column here as an indicative, benchmark-published figure, not a promise for your recordings.
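As a rough illustration of how that figure is computed, here is a minimal sketch that counts word-level edits against a reference. It assumes simple lowercasing and whitespace tokenisation; the function name `word_error_rate` is hypothetical, and published benchmarks typically apply heavier text normalisation, so numbers from this sketch will not match theirs exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only one previous row in memory.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "please send the invoice to the finance team by friday"
hypothesis = "please send the voice to the finance team by friday"
wer = word_error_rate(reference, hypothesis)  # one substitution in ten words
```

One wrong word in a ten-word reference gives a WER of 10%, which is why short test clips produce noisy scores; use several minutes of representative audio when comparing providers.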

Question 6

Does this tool transcribe my audio?

Accepted Answer

No. This is a pricing-and-feature comparison only — it sends nothing to any provider and uploads no audio. To actually transcribe a file in the browser, use the AI Audio Transcriber. To model the cost of one provider in depth, use the AI Transcription Cost Calculator. This page is for choosing which provider to wire up before you write any code.

Question 7

How is the monthly cost calculated?

Accepted Answer

Your volume is normalised to minutes (hours × 60), the provider's documented monthly free minutes are subtracted, and the remainder is multiplied by that provider's per-minute rate for the selected mode. So cost = max(0, minutes − free tier) × rate. Hourly-priced vendors (AssemblyAI, Groq, Azure, ElevenLabs) are converted to a per-minute rate by dividing by 60.
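The formula can be sketched in a few lines. This is an illustration of the arithmetic described above, not the tool's actual source; the function names are hypothetical, and the zero free tier in the example is a placeholder rather than any provider's documented allowance.

```python
def per_minute_rate(hourly_rate_usd: float) -> float:
    """Convert an hourly price to a per-minute price."""
    return hourly_rate_usd / 60


def monthly_cost(minutes: float, free_minutes: float, rate_per_min: float) -> float:
    """cost = max(0, minutes - free tier) * rate"""
    billable = max(0.0, minutes - free_minutes)
    return billable * rate_per_min


# 100 hours a month of batch audio on a plan priced at $0.27/hr.
minutes = 100 * 60  # normalise hours to minutes
cost = monthly_cost(minutes, free_minutes=0, rate_per_min=per_minute_rate(0.27))
print(f"${cost:.2f}")  # prints $27.00
```

The `max(0, ...)` clamp matters for small volumes: if your usage is below the free tier, the bill is zero rather than negative.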

Question 8

Can I pay for these APIs from Sri Lanka?

Accepted Answer

Yes. All of these providers bill in USD and accept international cards. Budget for the FX margin (typically 1.5–3% over the CBSL indicative rate) plus a 2–3% card-processing fee on most LKR cards. The optional LKR column here uses one dated indicative rate of Rs 300 per USD for a rough sense of scale; for exact local cost use the Freelancer USD–LKR calculator with your bank's real rate.
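A rough sketch of that budgeting arithmetic follows. It assumes the FX margin and the card-processing charge compound on top of the indicative rate, which is a simplification (your bank may apply them differently). The function name `lkr_estimate` and the $27 bill are illustrative only.

```python
def lkr_estimate(
    usd: float,
    indicative_rate: float = 300.0,  # Rs per USD, the dated indicative rate used here
    fx_margin: float = 0.03,         # bank margin over the indicative rate
    card_fee: float = 0.03,          # card-processing charge
) -> float:
    """Rough LKR cost of a USD bill with FX margin and card fee applied on top."""
    return usd * indicative_rate * (1 + fx_margin) * (1 + card_fee)


bill_usd = 27.0  # hypothetical monthly bill
low = lkr_estimate(bill_usd, fx_margin=0.015, card_fee=0.02)
high = lkr_estimate(bill_usd, fx_margin=0.03, card_fee=0.03)
print(f"Rs {low:,.0f} to Rs {high:,.0f}")
```

Under these assumptions the overhead adds roughly 3.5% to 6.1% over the bare Rs 300 conversion, so a bill that looks like Rs 8,100 at the indicative rate lands somewhat higher on the card statement.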

Question 9

Why does the same provider show two different prices?

Accepted Answer

Batch (pre-recorded) and real-time (streaming) are billed differently. Deepgram Nova-3 is $0.0043/min batch but $0.0077/min streaming; AssemblyAI is cheaper for streaming ($0.15/hr) than for batch ($0.27/hr). The mode toggle switches which rate the ranking uses, because picking the wrong one can flip which provider is cheapest for you.
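To make the flip concrete, here is a minimal sketch that ranks four providers by mode using the per-minute rates listed on this page. It ignores free tiers for brevity, and the `RATES` structure and `rank_by_cost` name are hypothetical, not the tool's real code.

```python
from typing import Optional

# Per-minute USD rates as listed on this page; None marks batch-only providers.
RATES: dict[str, dict[str, Optional[float]]] = {
    "Groq Whisper Large v3 Turbo": {"batch": 0.0007, "realtime": None},
    "Deepgram Nova-3": {"batch": 0.0043, "realtime": 0.0077},
    "AssemblyAI Universal": {"batch": 0.0045, "realtime": 0.0025},
    "OpenAI Whisper": {"batch": 0.0060, "realtime": None},
}


def rank_by_cost(minutes: float, mode: str) -> list[tuple[str, float]]:
    """Return (provider, monthly cost) pairs for one mode, cheapest first."""
    costs = [
        (name, minutes * rates[mode])
        for name, rates in RATES.items()
        if rates[mode] is not None  # skip providers that cannot serve this mode
    ]
    return sorted(costs, key=lambda pair: pair[1])


batch_ranking = rank_by_cost(10_000, "batch")
realtime_ranking = rank_by_cost(10_000, "realtime")
```

In batch mode Deepgram Nova-3 ranks ahead of AssemblyAI Universal; in real-time mode the order reverses and the two batch-only entries drop out of the ranking entirely.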

Question 10

When was this pricing last verified?

Accepted Answer

Every rate and capability flag was last cross-checked against the providers' official pricing and documentation pages on 2026-06-20. STT pricing changes often, so confirm the current figure on the linked pricing page before committing. Spotted a stale number? Email me and I'll fix it within 24 hours.

Provider / model	Batch $/min	Real-time $/min	Languages	Max input	WER
Groq Whisper Large v3 Turbo	$0.0007	—	99	100 MB / request	8.4%
Deepgram Nova-3	$0.0043	$0.0077	36	2 GB / request	6.8%
AssemblyAI Universal	$0.0045	$0.0025	99	≤ 10 h / 5 GB	6.6%
OpenAI Whisper	$0.0060	—	99	25 MB / request	8.1%

AI Speech-to-Text API Comparison

Providers to compare

Projected monthly cost (batch, cheapest first)

Feature matrix

Per-provider notes

How it works

1. The cost formula

2. Batch versus real-time

3. Free tiers and tiered pricing

4. Accuracy (WER) is benchmark-dependent

5. Best-for badges

Worked examples

Frequently asked questions

Sources & references
