AI Voice Agent Cost Calculator
Work out what a cascaded voice AI agent really costs per minute and per month. Chain a speech-to-text engine, an LLM, a text-to-speech voice, telephony and a platform fee, and see which layer dominates — in USD and rupees, no signup.
How it works
A cascaded (pipeline) voice agent chains up to four services: on each turn a speech-to-text model transcribes the caller, an LLM decides what to say, and a text-to-speech voice speaks the reply, while optional telephony carries the call over the phone network. An orchestration platform such as Vapi or Retell can glue them together for a per-minute fee. This calculator prices each of these five layers for one minute of live conversation, then scales to your monthly volume. All maths is deterministic and the prices are pinned, vendor-published constants, with no live API calls.
The per-minute model uses your turns per minute (T), input tokens per turn (S) and output tokens per turn (O):
- LLM: (T·S / 1e6)·P_in + (T·O / 1e6)·P_out, where P_in and P_out are the model's input and output prices per 1M tokens. Output is billed higher than input, which is why the two are shown separately.
- TTS: the agent voices what the LLM generates, so characters per minute = T·O × 4 (1 token ≈ 4 English characters, per the OpenAI Help Center). Cost = (chars / 1000) × P_tts.
- STT: stt_fraction × P_stt, where P_stt is the dollars-per-minute audio rate and stt_fraction is the share of the minute the caller is actually speaking. A half-duplex agent only streams user audio, so 0.5 is a sensible default.
- Telephony and platform: flat per-minute fees (Twilio for carriage, Vapi/Retell for orchestration), each zero when not used.
- Per-minute total is the sum of all five layers. Multiply by calls/day × avg minutes × days/month for the monthly cost, then by the USD→LKR rate for the rupee figure.
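The per-minute model above can be sketched in Python as follows. This is a minimal illustration, not the calculator's actual source: the `Prices` class, the function names and every field name are hypothetical, and any prices you pass in are your own inputs rather than vendor rates. It assumes P_tts is quoted per 1,000 characters, as the (chars / 1000) term implies.

```python
from dataclasses import dataclass

CHARS_PER_TOKEN = 4  # rule of thumb from the text: 1 token ≈ 4 English characters


@dataclass(frozen=True)
class Prices:
    """Per-unit prices in USD. Hypothetical container; supply your own figures."""
    llm_in_per_mtok: float           # P_in: USD per 1M input tokens
    llm_out_per_mtok: float          # P_out: USD per 1M output tokens
    tts_per_kchar: float             # P_tts: USD per 1,000 characters (assumed unit)
    stt_per_min: float               # P_stt: USD per minute of audio
    telephony_per_min: float = 0.0   # zero when telephony is not used
    platform_per_min: float = 0.0    # zero when no orchestration platform is used


def per_minute_cost(p, turns, in_tok, out_tok, stt_fraction=0.5):
    """Return the cost of each layer, plus the total, for one minute of conversation."""
    llm = (turns * in_tok / 1e6) * p.llm_in_per_mtok \
        + (turns * out_tok / 1e6) * p.llm_out_per_mtok
    chars = turns * out_tok * CHARS_PER_TOKEN      # characters voiced per minute
    tts = (chars / 1000) * p.tts_per_kchar
    stt = stt_fraction * p.stt_per_min             # only the caller's share is transcribed
    layers = {
        "stt": stt,
        "llm": llm,
        "tts": tts,
        "telephony": p.telephony_per_min,
        "platform": p.platform_per_min,
    }
    layers["total"] = sum(layers.values())
    return layers


def monthly_cost(per_minute_usd, calls_per_day, avg_minutes, days_per_month, usd_to_lkr):
    """Scale a per-minute cost to a month; returns (USD, LKR)."""
    monthly_minutes = calls_per_day * avg_minutes * days_per_month
    usd = per_minute_usd * monthly_minutes
    return usd, usd * usd_to_lkr
```

To see which layer dominates, drop the `"total"` key from the returned dictionary and take the key with the largest value.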
The result is cross-checked two ways: the monthly total computed as per-minute × monthly minutes must equal per-call × calls-per-month to the cent, and the two worked examples below are re-derived in code at build time and fail the build on any mismatch. For native speech-to-speech agents billed on audio tokens, use the Realtime Voice API Cost Calculator instead — it prices a fundamentally different billing model.
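The first cross-check might look like the sketch below. The source does not show the calculator's own build-time code, so this is only one plausible way to do it: the function names are hypothetical, and the use of `Decimal` with half-up rounding to the cent is an assumption chosen to keep binary floating-point error out of the comparison.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def monthly_two_ways(per_minute_usd, calls_per_day, avg_minutes, days_per_month):
    """Compute the monthly USD total along two routes, each rounded to the cent."""
    per_minute = Decimal(str(per_minute_usd))
    avg = Decimal(str(avg_minutes))

    # Route 1: per-minute cost × total minutes in the month
    monthly_minutes = calls_per_day * avg * days_per_month
    via_minutes = per_minute * monthly_minutes

    # Route 2: per-call cost × number of calls in the month
    per_call = per_minute * avg
    calls_per_month = calls_per_day * days_per_month
    via_calls = per_call * calls_per_month

    return (
        via_minutes.quantize(CENT, rounding=ROUND_HALF_UP),
        via_calls.quantize(CENT, rounding=ROUND_HALF_UP),
    )


def check_monthly_total(per_minute_usd, calls_per_day, avg_minutes, days_per_month):
    """Raise if the two routes disagree; otherwise return the agreed total."""
    a, b = monthly_two_ways(per_minute_usd, calls_per_day, avg_minutes, days_per_month)
    if a != b:
        raise ValueError(f"monthly totals disagree: {a} vs {b}")
    return a
```

A build script could call `check_monthly_total` for each worked example and let the raised `ValueError` fail the build.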
Worked examples
Sources & references
- OpenAI — API pricing (LLM, Whisper STT, TTS)
- Anthropic — Claude API pricing
- Google — Gemini API pricing
- Deepgram — speech-to-text pricing
- ElevenLabs — text-to-speech pricing
- Cartesia — Sonic text-to-speech pricing
- Twilio — Programmable Voice pricing
- Vapi — orchestration platform pricing
- Retell AI — platform pricing
- OpenAI Help — tokens (1 token ≈ 4 characters)
- Central Bank of Sri Lanka — daily indicative USD→LKR rate
Every per-unit price was last cross-checked against the vendor pages above on 2026-06-09. Pricing is re-verified quarterly. List prices only — committed-use and volume discounts are not modelled, so treat the totals as an upper bound for budgeting.
Found a price that has moved, a bug, or an edge case?
Email me at [email protected] — most fixes ship within 24 hours.