How much does an AI voice agent cost per minute?

For a typical cascaded stack (GPT-4o mini for the brain, Deepgram for transcription and ElevenLabs Flash for the voice), about $0.15 per minute of live conversation, before telephony and platform fees. Cheaper voices drop that to roughly $0.04 (Cartesia Sonic) or $0.02 (OpenAI TTS). Add about $0.014/min for Twilio and $0.05/min for a Vapi platform fee if you use them.

Why is text-to-speech the most expensive part of a voice agent?

Because voicing text costs far more than generating it. One output token (≈4 characters) costs GPT-4o mini $0.0000006 to generate, but at ElevenLabs' roughly $0.15 per 1,000 characters, voicing those four characters costs about $0.0006. In the default stack TTS is roughly 97% of the per-minute cost; the LLM is almost a rounding error against it.
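A quick way to see the gap is to price a single output token both ways. This is a minimal sketch using the list prices quoted on this page (GPT-4o mini output at $0.60 per 1M tokens, ElevenLabs Flash at $0.15 per 1,000 characters) and the 1 token ≈ 4 characters approximation; the constant and function names are illustrative only.

```python
# List prices quoted on this page (USD). The 4-characters-per-token ratio
# is the page's approximation for English text.
LLM_OUTPUT_PER_M_TOKENS = 0.60  # GPT-4o mini output, per 1M tokens
TTS_PER_1K_CHARS = 0.15         # ElevenLabs Flash, per 1,000 characters
CHARS_PER_TOKEN = 4


def llm_cost_per_token() -> float:
    """USD to generate one output token."""
    return LLM_OUTPUT_PER_M_TOKENS / 1_000_000


def tts_cost_per_token() -> float:
    """USD to voice the ~4 characters that one token represents."""
    return CHARS_PER_TOKEN / 1_000 * TTS_PER_1K_CHARS


if __name__ == "__main__":
    gen, voice = llm_cost_per_token(), tts_cost_per_token()
    print(f"generate: ${gen:.7f}, voice: ${voice:.4f}, ratio: {voice / gen:,.0f}x")
```

With these inputs, voicing a token costs about 1,000 times more than generating it, which is why the TTS line dominates.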

Is ElevenLabs or Cartesia cheaper for a voice agent?

Cartesia Sonic is far cheaper: about $0.035 per 1,000 characters against roughly $0.15 for ElevenLabs Flash, a little under a quarter of the per-character cost. For the default workload that cuts a $2,675/month bill to about $687. ElevenLabs buys more natural, expressive voices; Cartesia and OpenAI TTS trade some fidelity for a much lower bill. Use the 'Swap the voice' table to price your exact workload on each.

What is the cheapest LLM for a voice agent?

GPT-4o mini ($0.15 input / $0.60 output per 1M tokens) is the cheapest sensible brain here, with Gemini 2.5 Flash and Claude Haiku 4.5 close behind. Because voice agents send short turns, the LLM is rarely the dominant cost, so paying for a stronger model usually moves the monthly total only slightly. Spend your optimisation effort on the voice (TTS), not the brain.

How much does a Vapi or Retell voice agent cost per month?

Vapi and Retell charge a per-minute platform fee on top of the model costs: about $0.05/min for Vapi and $0.07/min for Retell. At 60,000 minutes a month that fee alone is $3,000–$4,200. On an ElevenLabs stack it is usually second only to the voice; on a cheap voice such as Cartesia Sonic it can be the single largest line, as in the phone support example. Self-hosting the pipeline removes the fee entirely, but you then maintain the orchestration yourself.
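The platform line is a flat per-minute fee, so it scales linearly with traffic. A minimal sketch, assuming the Vapi and Retell rates quoted above and treating self-hosting as a zero fee; the dictionary and function names are hypothetical.

```python
# Per-minute orchestration fees quoted on this page (USD).
# Self-hosting is modelled as a zero fee; you run the orchestration yourself.
PLATFORM_FEE_PER_MIN = {"Vapi": 0.05, "Retell": 0.07, "Self-hosted": 0.0}


def monthly_platform_fee(platform: str, monthly_minutes: int) -> float:
    """Flat per-minute fee scaled to a month of conversation minutes."""
    return PLATFORM_FEE_PER_MIN[platform] * monthly_minutes


if __name__ == "__main__":
    minutes = 500 * 4 * 30  # 500 calls/day x 4 min x 30 days = 60,000 min
    for name in PLATFORM_FEE_PER_MIN:
        print(f"{name:12s} ${monthly_platform_fee(name, minutes):,.2f}/month")
```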

What is the difference between a cascaded and a realtime voice agent?

A cascaded agent chains separate speech-to-text, LLM and text-to-speech services — this calculator prices that. A realtime agent uses a native speech-to-speech model (OpenAI Realtime, Gemini Live) billed on audio tokens. They cost differently, so for the second kind use our Realtime Voice API Cost Calculator instead, linked in the Related tools below.

Does this calculator make a live API call?

No. Every price is a pinned constant transcribed from the vendor's pricing page and stamped with a last-verified date, and all maths runs in your browser. Nothing leaves the page: no inputs, no keys. Because AI prices move, the rates are re-checked each quarter; the last pass was on 2026-06-09.

Why does the tool ask for an STT billed fraction?

A half-duplex agent only streams the caller's audio to the transcriber while they speak, not for the whole minute. The STT billed fraction (0–1) is the share of the minute the caller is actually talking — 0.5 is a fair default for a balanced back-and-forth. Set it to 1 if you transcribe the whole call, or lower for agent-heavy scripts.
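In code, the billed fraction is just a multiplier on the per-minute audio rate. This sketch assumes the Deepgram Nova-3 rate of $0.0077/min used in this page's worked examples; the function name and the range check are illustrative choices, not part of any vendor SDK.

```python
DEEPGRAM_NOVA3_PER_MIN = 0.0077  # USD per audio minute, as used in the worked examples


def stt_cost_per_minute(stt_fraction: float,
                        rate_per_min: float = DEEPGRAM_NOVA3_PER_MIN) -> float:
    """Bill only the share of each conversation minute the caller is speaking."""
    if not 0.0 <= stt_fraction <= 1.0:
        raise ValueError("stt_fraction must be between 0 and 1")
    return stt_fraction * rate_per_min


if __name__ == "__main__":
    for fraction in (0.3, 0.5, 1.0):
        print(f"fraction {fraction}: ${stt_cost_per_minute(fraction):.6f} per minute")
```

At the 0.5 default this gives $0.00385 per conversation minute, the STT figure in the worked examples; transcribing the whole call (1.0) doubles it.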

Are the monthly totals an upper or lower bound?

Treat them as an upper bound. The tool uses standard pay-as-you-go list prices and ignores committed-use discounts, free tiers and volume pricing, which most providers offer above a few million characters or tokens. Real bills with negotiated rates usually come in lower. The figure is for budgeting and client quotes, not an invoice.

AI Voice Agent Cost Calculator

Work out what a cascaded voice AI agent really costs per minute and per month. Chain a speech-to-text engine, an LLM, a text-to-speech voice, telephony and a platform fee, and see which layer dominates — in USD and rupees, no signup.

By Induwara Ashinsana, Executive Director, Ryzera Technologies · Updated Jun 9, 2026

Cascaded voice agent cost

Speech-to-text (STT): transcribes the caller
LLM brain: input / output price per 1M tokens
Text-to-speech (TTS): the agent's voice, $/1k chars
Telephony: PSTN carriage, per minute
Orchestration platform: Vapi / Retell fee, per minute
Currency
Turns / minute: agent replies per minute (1–20)
Input tokens / turn: system prompt + history re-sent each turn
Output tokens / turn: what the agent says each turn
STT billed fraction: share of the minute the caller speaks
Calls / day: conversations handled per day
Avg call length (min): average minutes per conversation
Days / month: operating days (1–31)

Cost per minute: $0.1486 (4 turns/min)
Cost per call: $0.4458 (3 min average)
Monthly minutes: 18,000 (200 calls/day × 3 min × 30 days)
Monthly cost: $2,675 (Rs 802,408)

TTS (ElevenLabs Flash) is 96.91% of your per-minute cost, the single biggest layer to optimise.

Component breakdown

Component	Per minute	Share	Per month
STT — Deepgram Nova-3	$0.0039	2.59%	$69.30
LLM input — GPT-4o mini	$0.0006	0.4%	$10.80
LLM output — GPT-4o mini	$0.0001	0.1%	$2.59
TTS — ElevenLabs Flash	$0.1440	96.91%	$2,592
Telephony — None (web / SIP)	$0.0000	0%	$0.00
Platform — Self-hosted (none)	$0.0000	0%	$0.00
Blended total	$0.1486	100%	$2,675

TTS characters/minute = 960 (4 turns × 60 output tokens × 4 characters per token). Monthly minutes = 18,000.
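The breakdown is each layer's per-minute cost, its share of the blended total, and that cost scaled to monthly minutes. A minimal sketch that rebuilds it from the default inputs (4 turns/min, 1,000 input and 60 output tokens per turn, stt_fraction 0.5, 18,000 minutes); the data layout and the `breakdown` helper are hypothetical, not the tool's source.

```python
# Default-stack inputs from the worked example.
T, S, O = 4, 1_000, 60          # turns/min, input and output tokens per turn
STT_FRACTION = 0.5
MONTHLY_MINUTES = 200 * 3 * 30  # 18,000

# Per-minute cost of each layer in USD, using the "How it works" formulas.
COMPONENTS = {
    "STT - Deepgram Nova-3": STT_FRACTION * 0.0077,
    "LLM input - GPT-4o mini": T * S / 1e6 * 0.15,
    "LLM output - GPT-4o mini": T * O / 1e6 * 0.60,
    "TTS - ElevenLabs Flash": T * O * 4 / 1_000 * 0.15,
    "Telephony - None (web / SIP)": 0.0,
    "Platform - Self-hosted (none)": 0.0,
}


def breakdown(items: dict[str, float], minutes: int) -> list[tuple[str, float, float, float]]:
    """Return (name, per-minute USD, share %, per-month USD) for every layer."""
    total = sum(items.values())
    return [(name, cost, 100 * cost / total, cost * minutes)
            for name, cost in items.items()]


if __name__ == "__main__":
    for name, per_min, share, per_month in breakdown(COMPONENTS, MONTHLY_MINUTES):
        print(f"{name:30s} ${per_min:.4f}  {share:6.2f}%  ${per_month:,.2f}")
```

Rounded to two decimals, the TTS share comes out at 96.91% and STT at 2.59%, matching the table.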

Swap the voice (TTS)

TTS is usually the cost driver. Same workload, every voice — cheapest first.

Option	Per minute	Monthly	vs cheapest
OpenAI TTS	$0.0190	$341.89	—
Cartesia Sonic	$0.0382	$687.49	+$345.60
ElevenLabs Flash (selected)	$0.1486	$2,675	+$2,333
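The voice comparison re-prices only the TTS layer and holds everything else fixed. A minimal sketch, assuming the $0.035 (Cartesia Sonic) and $0.15 (ElevenLabs Flash) rates quoted on this page; the $0.015 per 1,000 characters used for OpenAI TTS is not stated here and is an assumption back-solved from the table's $0.0190/min figure. Names are illustrative.

```python
# USD per 1,000 characters. Cartesia Sonic and ElevenLabs Flash are the rates quoted
# on this page; the OpenAI TTS rate is an ASSUMPTION back-solved from the table.
TTS_RATES = {"ElevenLabs Flash": 0.15, "Cartesia Sonic": 0.035, "OpenAI TTS": 0.015}

NON_TTS_PER_MIN = 0.000744 + 0.00385  # GPT-4o mini (in + out) + Deepgram STT, held fixed
TTS_CHARS_PER_MIN = 960               # 4 turns x 60 tokens x 4 chars
MONTHLY_MINUTES = 18_000


def swap_voice(rates: dict[str, float]) -> list[tuple[str, float, float, float]]:
    """Re-price the same workload on every voice: (name, per-min, monthly, vs cheapest)."""
    rows = []
    for name, rate in rates.items():
        per_min = NON_TTS_PER_MIN + TTS_CHARS_PER_MIN / 1_000 * rate
        rows.append((name, per_min, per_min * MONTHLY_MINUTES))
    rows.sort(key=lambda row: row[2])  # cheapest first
    cheapest = rows[0][2]
    return [(name, per_min, monthly, monthly - cheapest)
            for name, per_min, monthly in rows]


if __name__ == "__main__":
    for name, per_min, monthly, delta in swap_voice(TTS_RATES):
        print(f"{name:18s} ${per_min:.4f}/min  ${monthly:,.2f}/mo  +${delta:,.2f}")
```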

Swap the brain (LLM)

Same workload re-priced across every LLM. The LLM is rarely the dominant cost.

Option	Per minute	Monthly	vs cheapest
GPT-4o mini (selected)	$0.1486	$2,675	—
Gemini 2.5 Flash	$0.1496	$2,694	+$19.01
Claude Haiku 4.5	$0.1531	$2,755	+$80.21
GPT-4o	$0.1602	$2,884	+$209.81
Claude Sonnet 4.5	$0.1634	$2,942	+$267.41
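The brain comparison does the same for the LLM layer: only P_in and P_out change. A minimal sketch under stated assumptions: GPT-4o mini ($0.15/$0.60) and GPT-4o ($2.50/$10) are the rates quoted on this page, while the Gemini 2.5 Flash, Claude Haiku 4.5 and Claude Sonnet 4.5 rates below are assumed list prices that happen to reproduce the table's deltas; check the vendor pages before relying on them.

```python
# (input, output) USD per 1M tokens. GPT-4o mini and GPT-4o are the rates quoted on this
# page; the other three are ASSUMED list prices that reproduce the table's deltas.
LLM_PRICES = {
    "GPT-4o mini": (0.15, 0.60),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
}
T, S, O = 4, 1_000, 60             # turns/min, input and output tokens per turn
NON_LLM_PER_MIN = 0.144 + 0.00385  # ElevenLabs Flash TTS + Deepgram STT, held fixed
MONTHLY_MINUTES = 18_000


def llm_per_min(p_in: float, p_out: float) -> float:
    """LLM cost per minute: (T*S/1e6)*P_in + (T*O/1e6)*P_out."""
    return T * S / 1e6 * p_in + T * O / 1e6 * p_out


def swap_brain(prices: dict[str, tuple[float, float]]) -> list[tuple[str, float, float]]:
    """Monthly cost on every LLM, cheapest first: (name, monthly, vs cheapest)."""
    monthly = sorted(
        ((name, (NON_LLM_PER_MIN + llm_per_min(p_in, p_out)) * MONTHLY_MINUTES)
         for name, (p_in, p_out) in prices.items()),
        key=lambda row: row[1],
    )
    cheapest = monthly[0][1]
    return [(name, cost, cost - cheapest) for name, cost in monthly]


if __name__ == "__main__":
    for name, cost, delta in swap_brain(LLM_PRICES):
        print(f"{name:18s} ${cost:,.2f}/mo  +${delta:,.2f}")
```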

Sources cited: OpenAI, Anthropic, Google Gemini, Deepgram, ElevenLabs, Cartesia, Twilio, Vapi and Retell pricing pages (transcribed and dated 2026-06-09), the OpenAI Help Center token-to-character ratio, and the CBSL indicative USD→LKR rate. Full links are in the Sources & references section below. List prices only — committed-use and volume discounts are not modelled, so treat the totals as an upper bound.

How it works

A cascaded (pipeline) voice agent is built by chaining four services per turn: a speech-to-text model transcribes the caller, an LLM decides what to say, a text-to-speech voice speaks the reply, and optional telephony carries the call over the phone network. An orchestration platform such as Vapi or Retell can glue them together for a per-minute fee. This calculator prices each layer for one minute of live conversation, then scales to your monthly volume. All maths is deterministic and the prices are pinned, vendor-published constants — no live API calls.

The per-minute model uses your turns per minute (T), input tokens per turn (S) and output tokens per turn (O):

LLM: (T·S / 1e6)·P_in + (T·O / 1e6)·P_out, where P_in and P_out are the model's input and output prices per 1M tokens. Output is billed higher than input, which is why the two are shown separately.
TTS: the agent voices what the LLM generates, so characters per minute = T·O × 4 (1 token ≈ 4 English characters, per the OpenAI Help Center). Cost = (chars / 1000) × P_tts.
STT: stt_fraction × P_stt, where P_stt is the dollars-per-minute audio rate and stt_fraction is the share of the minute the caller is actually speaking. A half-duplex agent only streams user audio, so 0.5 is a sensible default.
Telephony and platform: flat per-minute fees (Twilio for carriage, Vapi/Retell for orchestration), each zero when not used.
Per-minute total is the sum of all five layers. Multiply by calls/day × avg minutes × days/month for the monthly cost, then by the USD→LKR rate for the rupee figure.
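The five-layer model translates directly into a small function. This is a minimal sketch, not the calculator's own code: the `Stack` dataclass and the function names are hypothetical, prices are passed in using the units above, and the USD→LKR rate is left as an optional parameter because this page does not state the value it uses.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CHARS_PER_TOKEN = 4  # 1 token ≈ 4 English characters


@dataclass
class Stack:
    """Hypothetical container for one cascaded stack's list prices (USD)."""
    p_in: float             # LLM input, per 1M tokens
    p_out: float            # LLM output, per 1M tokens
    p_tts: float            # TTS, per 1,000 characters
    p_stt: float            # STT, per audio minute
    telephony: float = 0.0  # per minute, 0 when not used
    platform: float = 0.0   # per minute, 0 when self-hosted


def cost_per_minute(stack: Stack, T: float, S: float, O: float, stt_fraction: float) -> float:
    """Sum of the five layers for one minute of conversation."""
    llm = (T * S / 1e6) * stack.p_in + (T * O / 1e6) * stack.p_out
    tts = (T * O * CHARS_PER_TOKEN / 1_000) * stack.p_tts
    stt = stt_fraction * stack.p_stt
    return llm + tts + stt + stack.telephony + stack.platform


def monthly_cost(per_minute: float, calls_per_day: float, avg_minutes: float,
                 days_per_month: int,
                 usd_to_lkr: Optional[float] = None) -> Tuple[float, Optional[float]]:
    """Scale to monthly USD, plus rupees when a USD->LKR rate is supplied."""
    usd = per_minute * calls_per_day * avg_minutes * days_per_month
    return usd, (usd * usd_to_lkr if usd_to_lkr is not None else None)


if __name__ == "__main__":
    bot = Stack(p_in=0.15, p_out=0.60, p_tts=0.15, p_stt=0.0077)
    per_min = cost_per_minute(bot, T=4, S=1_000, O=60, stt_fraction=0.5)
    usd, _ = monthly_cost(per_min, calls_per_day=200, avg_minutes=3, days_per_month=30)
    print(f"${per_min:.6f}/min, ${usd:,.2f}/month")
```

Fed the web reservation bot inputs, it gives $0.148594 per minute and about $2,674.69 for 18,000 minutes; the phone support stack (with telephony 0.014 and platform 0.05) gives $0.13535 per minute.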

The result is cross-checked two ways: the monthly total computed as per-minute × monthly minutes must equal per-call × calls-per-month to the cent, and the two worked examples below are re-derived in code at build time and fail the build on any mismatch. For native speech-to-speech agents billed on audio tokens, use the Realtime Voice API Cost Calculator instead — it prices a fundamentally different billing model.
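One way to implement that guard is with exact decimal arithmetic, so rounding happens only once, at the cent. A minimal sketch using Python's standard `decimal` module; `cross_check`, `verify_examples` and the example table are hypothetical names, not the site's build script.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(amount: Decimal) -> Decimal:
    """Round once, at the end, to whole cents."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def cross_check(per_min: Decimal, calls_per_day: int, avg_minutes: Decimal, days: int) -> Decimal:
    """Monthly total two ways; raise if they disagree to the cent."""
    via_minutes = per_min * (calls_per_day * avg_minutes * days)  # per-minute x monthly minutes
    via_calls = (per_min * avg_minutes) * (calls_per_day * days)  # per-call x calls per month
    if to_cents(via_minutes) != to_cents(via_calls):
        raise AssertionError(f"monthly mismatch: {via_minutes} vs {via_calls}")
    return to_cents(via_minutes)


# Hypothetical table of the worked examples: inputs and the published monthly total.
WORKED_EXAMPLES = {
    "Web reservation bot": (Decimal("0.148594"), 200, Decimal("3"), 30, Decimal("2674.69")),
    "Phone support agent": (Decimal("0.135350"), 500, Decimal("4"), 30, Decimal("8121.00")),
}


def verify_examples() -> None:
    """Fail loudly (e.g. in a build step) if any worked example drifts."""
    for name, (per_min, calls, minutes, days, expected) in WORKED_EXAMPLES.items():
        got = cross_check(per_min, calls, minutes, days)
        if got != expected:
            raise AssertionError(f"{name}: expected {expected}, got {got}")


if __name__ == "__main__":
    verify_examples()
    print("worked examples OK")
```

Because every intermediate value stays exact, the two paths can only disagree if something rounds early. For instance, multiplying the displayed per-call figure of $0.4458 by 6,000 calls would give $2,674.80, eleven cents off; that is the kind of drift such a check catches.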

Worked examples

Web reservation bot

GPT-4o mini + Deepgram + ElevenLabs Flash · self-hosted

Inputs: T=4 turns/min, S=1,000, O=60, stt_fraction=0.5
LLM in: 4×1,000 = 4,000 tok → 4,000/1e6 × $0.15 = $0.000600
LLM out: 4×60 = 240 tok → 240/1e6 × $0.60 = $0.000144
TTS: 240×4 = 960 chars → 960/1,000 × $0.15 = $0.144000
STT: 0.5 × $0.0077 = $0.003850
Per minute = $0.148594 (TTS is 96.9% of it)
Monthly: 200 calls × 3 min × 30 days = 18,000 min
18,000 × $0.148594 = $2,674.69/mo ≈ Rs 802,408

Phone support agent

GPT-4o + Deepgram + Cartesia Sonic + Twilio + Vapi

Inputs: T=5, S=1,200, O=70, stt_fraction=0.5
LLM in: 5×1,200 = 6,000 tok → 6,000/1e6 × $2.50 = $0.015000
LLM out: 5×70 = 350 tok → 350/1e6 × $10 = $0.003500
TTS: 350×4 = 1,400 chars → 1,400/1,000 × $0.035 = $0.049000
STT: 0.5 × $0.0077 = $0.003850
Telephony $0.014000 + Platform $0.050000
Per minute = $0.135350 (telephony + platform = 47.3%)
Monthly: 500 × 4 × 30 = 60,000 min × $0.135350 = $8,121.00/mo

Edge case — swapping the voice

Same bot as the web reservation example, but with Cartesia Sonic instead of ElevenLabs Flash

TTS rate falls from $0.15 to $0.035 per 1,000 chars
TTS: 960/1,000 × $0.035 = $0.033600 (was $0.144000)
Per minute = $0.000744 + $0.033600 + $0.003850 = $0.038194
Monthly: 18,000 min × $0.038194 = $687.49/mo
Swapping the voice alone cuts the bill by ~74%.
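The edge-case percentage can be checked the same way by changing only the TTS rate. A minimal sketch that hard-codes the other per-minute layers from the web reservation bot; the helper names are hypothetical.

```python
MINUTES = 18_000
LLM_PER_MIN = 0.000600 + 0.000144  # GPT-4o mini input + output
STT_PER_MIN = 0.5 * 0.0077         # Deepgram, caller speaks half the minute
TTS_CHARS_PER_MIN = 960


def monthly_cost(tts_rate_per_1k: float) -> float:
    """Monthly USD for the reservation bot with only the TTS rate varied."""
    tts = TTS_CHARS_PER_MIN / 1_000 * tts_rate_per_1k
    return (LLM_PER_MIN + STT_PER_MIN + tts) * MINUTES


def savings_pct(old_rate: float, new_rate: float) -> float:
    """Percentage drop in the monthly bill when the TTS rate changes."""
    old, new = monthly_cost(old_rate), monthly_cost(new_rate)
    return 100 * (old - new) / old


if __name__ == "__main__":
    print(f"${monthly_cost(0.035):,.2f}/mo, saving {savings_pct(0.15, 0.035):.1f}%")
```

Going from $0.15 to $0.035 per 1,000 characters gives a saving of about 74.3%.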

Sources & references

Every per-unit price was last cross-checked against the vendor pages above on 2026-06-09 and is re-verified quarterly.

Related tools

AI TTS Cost Calculator

Compare AI text-to-speech costs in USD and LKR across OpenAI TTS, ElevenLabs, Google Cloud TTS, Azure Neural, Amazon Polly, Deepgram Aura, Cartesia Sonic, and PlayHT. Enter characters or audio minutes per month and a quality tier; the cheapest supported provider is highlighted, free-tier allowances applied, and ElevenLabs auto-upgrades to the cheapest plan that covers your volume.


Realtime Voice API Cost

Estimate the per-session, monthly and annual cost of a speech-to-speech voice agent on the OpenAI Realtime API or Gemini Live API. Models all four billed token classes — audio in, cached audio, audio out and text — in USD and LKR. Runs in your browser, no signup.


AI Video Cost Calculator

Compare the cost of generating AI video across Sora 2, Veo 3, Runway Gen-4, Kling, Luma, and Pika — per-second, per-clip, and monthly totals in USD and LKR, every price sourced.


Found a price that has moved, a bug, or an edge case?

Email me at [email protected] — most fixes ship within 24 hours.