AI Voice Agent Cost Calculator

Work out what a cascaded voice AI agent really costs per minute and per month. Chain a speech-to-text engine, an LLM, a text-to-speech voice, telephony and a platform fee, and see which layer dominates — in USD and rupees, no signup.

By Induwara Ashinsana · Updated Jun 9, 2026
Cascaded voice agent cost

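Speech-to-text (STT): transcribes the caller
LLM: input / output price per 1M tokens
Text-to-speech (TTS): the agent's voice, $/1k chars
Telephony: PSTN carriage, per minute
Platform: Vapi / Retell fee, per minute
Currency: USD or rupees
Turns per minute: agent replies per minute (1–20)
Input tokens per turn: system + history re-sent each turn
Output tokens per turn: what the agent says each turn
STT fraction: share of the minute the caller speaks
Calls per day: conversations handled per day
Minutes per call: average minutes per conversation
Days per month: operating days (1–31)
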
Cost per minute: $0.1486 (4 turns/min)
Cost per call: $0.4458 (3 min average)
Monthly minutes: 18,000 (200 calls/day × 3 min × 30 days)
Monthly cost: $2,675 (Rs 802,408)

TTS — ElevenLabs Flash is 96.91% of your per-minute cost — the single biggest layer to optimise.

Component breakdown

| Component | Per minute | Share | Per month |
|---|---|---|---|
| STT: Deepgram Nova-3 | $0.0039 | 2.59% | $69.30 |
| LLM input: GPT-4o mini | $0.0006 | 0.4% | $10.80 |
| LLM output: GPT-4o mini | $0.0001 | 0.1% | $2.59 |
| TTS: ElevenLabs Flash | $0.1440 | 96.91% | $2,592 |
| Telephony: None (web / SIP) | $0.0000 | 0% | $0.00 |
| Platform: Self-hosted (none) | $0.0000 | 0% | $0.00 |
| Blended total | $0.1486 | 100% | $2,675 |

TTS characters/minute = 960 (output tokens × 4). Monthly minutes = 18,000.

Swap the voice (TTS)

TTS is usually the cost driver. Same workload, every voice — cheapest first.

| Option | Per minute | Monthly | vs cheapest |
|---|---|---|---|
| OpenAI TTS | $0.0190 | $341.89 | |
| Cartesia Sonic | $0.0382 | $687.49 | +$345.60 |
| ElevenLabs Flash (selected) | $0.1486 | $2,675 | +$2,333 |

Swap the brain (LLM)

Same workload re-priced across every LLM. The LLM is rarely the dominant cost.

| Option | Per minute | Monthly | vs cheapest |
|---|---|---|---|
| GPT-4o mini (selected) | $0.1486 | $2,675 | |
| Gemini 2.5 Flash | $0.1496 | $2,694 | +$19.01 |
| Claude Haiku 4.5 | $0.1531 | $2,755 | +$80.21 |
| GPT-4o | $0.1602 | $2,884 | +$209.81 |
| Claude Sonnet 4.5 | $0.1634 | $2,942 | +$267.41 |

Sources cited: OpenAI, Anthropic, Google Gemini, Deepgram, ElevenLabs, Cartesia, Twilio, Vapi and Retell pricing pages (transcribed and dated 2026-06-09), the OpenAI Help Center token-to-character ratio, and the CBSL indicative USD→LKR rate. Full links are in the Sources & references section below. List prices only — committed-use and volume discounts are not modelled, so treat the totals as an upper bound.

How it works

A cascaded (pipeline) voice agent is built by chaining four services per turn: a speech-to-text model transcribes the caller, an LLM decides what to say, a text-to-speech voice speaks the reply, and optional telephony carries the call over the phone network. An orchestration platform such as Vapi or Retell can glue them together for a per-minute fee. This calculator prices each layer for one minute of live conversation, then scales to your monthly volume. All maths is deterministic and the prices are pinned, vendor-published constants — no live API calls.

The per-minute model uses your turns per minute (T), input tokens per turn (S) and output tokens per turn (O):

  1. LLM: (T·S / 1e6)·P_in + (T·O / 1e6)·P_out, where P_in and P_out are the model's input and output prices per 1M tokens. Output is billed higher than input, which is why the two are shown separately.
  2. TTS: the agent voices what the LLM generates, so characters per minute = T·O × 4 (1 token ≈ 4 English characters, per the OpenAI Help Center). Cost = (chars / 1000) × P_tts.
  3. STT: stt_fraction × P_stt, where P_stt is the per-minute audio rate in dollars and stt_fraction is the share of the minute the caller is actually speaking. A half-duplex agent sends only the caller's audio to the STT engine, and caller and agent take turns within the minute, so 0.5 is a sensible default.
  4. Telephony and platform: flat per-minute fees (Twilio for carriage, Vapi/Retell for orchestration), each zero when not used.
  5. Per-minute total is the sum of all five layers (LLM, TTS, STT, telephony and platform). Multiply by calls/day × avg minutes × days/month for the monthly cost, then by the USD→LKR rate for the rupee figure.
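
The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the `Stack` class, the function names and the field names are hypothetical, and the 300 LKR per USD rate is an assumption inferred from the totals shown on this page rather than a quoted CBSL figure.

```python
from dataclasses import dataclass

CHARS_PER_TOKEN = 4  # 1 token ~ 4 English characters (the ratio used above)


@dataclass(frozen=True)
class Stack:
    """List prices for one stack; the class and field names are illustrative."""
    llm_in_per_1m: float            # P_in, USD per 1M input tokens
    llm_out_per_1m: float           # P_out, USD per 1M output tokens
    tts_per_1k_chars: float         # P_tts, USD per 1,000 characters
    stt_per_min: float              # P_stt, USD per audio minute
    telephony_per_min: float = 0.0  # zero when the call is web / SIP
    platform_per_min: float = 0.0   # zero when self-hosted


def per_minute_cost(stack, turns, in_tokens, out_tokens, stt_fraction=0.5):
    """Return per-minute USD cost for each layer plus the blended total."""
    layers = {
        "llm_in": turns * in_tokens / 1e6 * stack.llm_in_per_1m,
        "llm_out": turns * out_tokens / 1e6 * stack.llm_out_per_1m,
        "tts": turns * out_tokens * CHARS_PER_TOKEN / 1000 * stack.tts_per_1k_chars,
        "stt": stt_fraction * stack.stt_per_min,
        "telephony": stack.telephony_per_min,
        "platform": stack.platform_per_min,
    }
    layers["total"] = sum(layers.values())
    return layers


def monthly_cost(per_minute_total, calls_per_day, avg_minutes, days, usd_to_lkr):
    """Scale a per-minute total to (monthly minutes, USD, LKR)."""
    minutes = calls_per_day * avg_minutes * days
    usd = per_minute_total * minutes
    return minutes, usd, usd * usd_to_lkr


# Web reservation bot: GPT-4o mini + Deepgram + ElevenLabs Flash, self-hosted.
web_bot = Stack(llm_in_per_1m=0.15, llm_out_per_1m=0.60,
                tts_per_1k_chars=0.15, stt_per_min=0.0077)
costs = per_minute_cost(web_bot, turns=4, in_tokens=1000, out_tokens=60)

# 300 LKR per USD is an assumed rate, inferred from the page's own totals.
minutes, usd, lkr = monthly_cost(costs["total"], 200, 3, 30, usd_to_lkr=300)
print(f"${costs['total']:.6f}/min, {minutes:,} min, ${usd:,.2f}/mo, Rs {lkr:,.0f}")
```

Keeping each layer as a separate dictionary entry makes it easy to see which one dominates before summing; with the inputs shown, the sketch should land on the same per-minute and monthly figures as the web reservation bot example.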

The result is cross-checked two ways: the monthly total computed as per-minute × monthly minutes must equal per-call × calls-per-month to the cent, and the two worked examples below are re-derived in code at build time and fail the build on any mismatch. For native speech-to-speech agents billed on audio tokens, use the Realtime Voice API Cost Calculator instead — it prices a fundamentally different billing model.
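
The first of those two cross-checks could look roughly like the sketch below. It is an assumption about how such a check might be written, not the site's build code: the function name is hypothetical, and `Decimal` is chosen here so that the comparison to the cent is exact rather than subject to floating-point noise.

```python
from decimal import Decimal

CENT = Decimal("0.01")


def monthly_two_ways(per_minute, calls_per_day, avg_minutes, days):
    """Compute the monthly total by two routes and round each to the cent."""
    # Route 1: per-minute cost x monthly minutes.
    monthly_minutes = calls_per_day * avg_minutes * days
    via_minutes = per_minute * monthly_minutes

    # Route 2: per-call cost x calls per month.
    per_call = per_minute * avg_minutes
    calls_per_month = calls_per_day * days
    via_calls = per_call * calls_per_month

    return via_minutes.quantize(CENT), via_calls.quantize(CENT)


# Per-minute figure taken from the web reservation bot example.
via_minutes, via_calls = monthly_two_ways(Decimal("0.148594"), 200, 3, 30)

if via_minutes != via_calls:
    # A build script would stop here instead of shipping inconsistent totals.
    raise SystemExit(f"monthly mismatch: {via_minutes} vs {via_calls}")

print(f"both routes agree: ${via_minutes}")
```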

Worked examples

Web reservation bot

GPT-4o mini + Deepgram + ElevenLabs Flash · self-hosted

  1. Inputs: T=4 turns/min, S=1,000, O=60, stt_fraction=0.5
  2. LLM in: 4×1,000 = 4,000 tok → 4,000/1e6 × $0.15 = $0.000600
  3. LLM out: 4×60 = 240 tok → 240/1e6 × $0.60 = $0.000144
  4. TTS: 240×4 = 960 chars → 960/1,000 × $0.15 = $0.144000
  5. STT: 0.5 × $0.0077 = $0.003850
  6. Per minute = $0.148594 (TTS is 96.9% of it)
  7. Monthly: 200 calls/day × 3 min × 30 days = 18,000 min
  8. 18,000 × $0.148594 = $2,674.69/mo ≈ Rs 802,408

Phone support agent

GPT-4o + Deepgram + Cartesia Sonic + Twilio + Vapi

  1. Inputs: T=5, S=1,200, O=70, stt_fraction=0.5
  2. LLM in: 5×1,200 = 6,000 → 6,000/1e6 × $2.50 = $0.015000
  3. LLM out: 5×70 = 350 → 350/1e6 × $10 = $0.003500
  4. TTS: 350×4 = 1,400 chars → 1.4 × $0.035 = $0.049000
  5. STT: 0.5 × $0.0077 = $0.003850
  6. Telephony $0.014000 + Platform $0.050000
  7. Per minute = $0.135350 (telephony + platform = 47.3%)
  8. Monthly: 500 calls/day × 4 min × 30 days = 60,000 min × $0.135350 = $8,121.00/mo
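
To see how the layers rank in this phone support example, the same arithmetic can be laid out as a small share table. The sketch below only re-derives the numbers listed in the steps above; the variable names are illustrative.

```python
# Per-minute cost of each layer, using the inputs and prices from the example.
layers = {
    "LLM input": 5 * 1200 / 1e6 * 2.50,
    "LLM output": 5 * 70 / 1e6 * 10.00,
    "TTS": 5 * 70 * 4 / 1000 * 0.035,  # 350 tokens x 4 chars per token
    "STT": 0.5 * 0.0077,
    "Telephony": 0.014,
    "Platform": 0.050,
}

total = sum(layers.values())
shares = {name: cost / total * 100 for name, cost in layers.items()}

# Print the layers from most to least expensive.
for name, cost in sorted(layers.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<11} ${cost:.6f}  {shares[name]:5.2f}%")

fixed_share = shares["Telephony"] + shares["Platform"]
monthly = total * 500 * 4 * 30  # calls/day x minutes/call x days

print(f"Total ${total:.6f}/min, fixed fees {fixed_share:.1f}%, ${monthly:,.2f}/mo")
```

Unlike the web bot, the flat per-minute fees are the largest slice here, so trimming tokens or switching voices moves this bill less than renegotiating carriage or orchestration would.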

Edge case — swapping the voice

Same bot as the web reservation bot example, but Cartesia Sonic instead of ElevenLabs Flash

  1. TTS rate falls from $0.15 to $0.035 per 1,000 chars
  2. TTS: 960/1,000 × $0.035 = $0.033600 (was $0.144000)
  3. Per minute = $0.000744 + $0.033600 + $0.003850 = $0.038194
  4. Monthly: 18,000 min × $0.038194 = $687.49/mo
  5. Swapping the voice alone cuts the bill by ~74%.
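
The swap can be expressed as re-pricing one layer while holding the rest of the workload fixed. The following is a rough sketch using only the two TTS rates quoted in this example; the constant and function names are hypothetical.

```python
# Everything except TTS stays the same as the web reservation bot.
BASE_PER_MIN = 0.000744 + 0.003850  # LLM (input + output) + STT, USD per minute
TTS_CHARS_PER_MIN = 960             # 4 turns x 60 tokens x 4 chars per token
MONTHLY_MINUTES = 18_000            # 200 calls/day x 3 min x 30 days

# USD per 1,000 characters, as quoted in the steps above.
TTS_RATES = {
    "ElevenLabs Flash": 0.15,
    "Cartesia Sonic": 0.035,
}


def monthly_for(tts_rate):
    """Monthly USD cost of the same workload at a given TTS rate."""
    per_minute = BASE_PER_MIN + TTS_CHARS_PER_MIN / 1000 * tts_rate
    return per_minute * MONTHLY_MINUTES


monthly = {voice: monthly_for(rate) for voice, rate in TTS_RATES.items()}
ranked = sorted(monthly.items(), key=lambda kv: kv[1])  # cheapest first

saving_pct = (1 - monthly["Cartesia Sonic"] / monthly["ElevenLabs Flash"]) * 100

for voice, cost in ranked:
    print(f"{voice:<17} ${cost:,.2f}/mo")
print(f"Saving from the swap: about {saving_pct:.0f}%")
```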

Sources & references

Found a price that has moved, a bug, or an edge case?

Email me at [email protected] — most fixes ship within 24 hours.