induwara.lk

Realtime Voice API Cost Calculator — OpenAI & Gemini Live

Price a speech-to-speech voice agent on the OpenAI Realtime API or Gemini Live API. Models the four billed token classes — audio in, cached audio, audio out, text — and projects per-session, monthly and annual cost in USD and LKR. No signup, rates cited.

By Induwara Ashinsana · Updated Jun 7, 2026
Realtime voice API cost

  • Audio input (minutes): how long the user speaks per session.
  • Audio output (minutes): how long the assistant speaks back per session.
  • Cached audio share (0% by default): the fraction of audio input the provider serves from cache, billed at the much cheaper cached rate (on gpt-realtime, $0.40 instead of $32.00 per 1M tokens, roughly 99% off). Gemini Live publishes no cached-audio rate, so that share is billed at the full audio-input rate and flagged in the breakdown.
  • Text input (tokens): system prompt and instruction tokens.
  • Sessions per month: calls or conversations per month.
  • USD→LKR rate (Rs): CBSL indicative; edit to match your bank.

Per session: $0.2152 · Rs 65
Monthly (1,000 sessions): $215.20 · Rs 64,560
Annual projection: $2,582.40 · Rs 774,720

Per-session cost breakdown

Token class           Tokens     $/1M   Line cost
Audio input            1,800   $32.00     $0.0576
Cached audio input         0    $0.40     $0.0000
Audio output           2,400   $64.00     $0.1536
Text input             1,000    $4.00     $0.0040
Per-session total                         $0.2152

Same workload, every model — monthly USD

  • Gemini 2.5 Flash (Live API, native audio): $50.00/mo · Rs 15,000 (cheapest)
  • OpenAI gpt-4o-mini-realtime: $66.60/mo · Rs 19,980
  • OpenAI gpt-realtime: $215.20/mo · Rs 64,560
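The comparison prices one workload against each rate card. The sketch below reproduces the two OpenAI rows, assuming the default workload implied by the breakdown above (3 min in, 2 min out, 1,000 text tokens, 1,000 sessions, no cache) and the per-token rates quoted in the worked examples. Gemini 2.5 Flash is left out because its per-token rates are not listed on this page. `RATE_CARDS` and `monthly_cost` are hypothetical names, not the calculator's own code.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# USD per 1M tokens: (audio in, audio out, text in), as quoted on this page.
RATE_CARDS = {
    "gpt-realtime": (Decimal("32.00"), Decimal("64.00"), Decimal("4.00")),
    "gpt-4o-mini-realtime": (Decimal("10.00"), Decimal("20.00"), Decimal("0.60")),
}
# OpenAI tokenisation: 600 tokens/min of input audio, 1,200 tokens/min of output audio.
TOK_PER_MIN_IN, TOK_PER_MIN_OUT = 600, 1_200


def monthly_cost(model: str, min_in: int, min_out: int, text_in: int, sessions: int) -> Decimal:
    """Price one uncached workload on one rate card, then scale by sessions per month."""
    rate_in, rate_out, rate_text = RATE_CARDS[model]
    per_session = (
        min_in * TOK_PER_MIN_IN / MILLION * rate_in
        + min_out * TOK_PER_MIN_OUT / MILLION * rate_out
        + text_in / MILLION * rate_text
    )
    return per_session * sessions


# Cheapest first, mirroring the ordering of the comparison list.
for model in sorted(RATE_CARDS, key=lambda m: monthly_cost(m, 3, 2, 1_000, 1_000)):
    print(f"{model}: ${monthly_cost(model, 3, 2, 1_000, 1_000):.2f}/mo")
```

Because only the rate card changes between rows, the same minutes always map to the same token counts within one provider.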
All math runs in your browser. No usage data or API key leaves the page.

Sources cited

Audio tokenisation: OpenAI bills 1 token per 100 ms of input audio and 1 token per 50 ms of output audio; Gemini bills 25 tokens per second. Minutes-mode figures use these documented constants and are estimates. For cent-exact costs, switch to token mode and enter the token counts from your usage dashboard.

How it works

A normal LLM API bill has two lines: input tokens and output tokens. A speech-to-speech agent on the OpenAI Realtime API or Gemini Live API is different: the provider meters four token classes, each on its own price tier (Gemini Live publishes no separate cached-audio rate):

  • Audio input — the user's speech, tokenised.
  • Cached audio input — re-sent context served from cache at a steep discount.
  • Audio output — the model's spoken reply, the most expensive class.
  • Text input — system prompt and instruction tokens.

The total per session is the sum of each class priced per 1,000,000 tokens:

cost = audioIn/1e6·rateIn + cachedIn/1e6·rateCached + audioOut/1e6·rateOut + textIn/1e6·rateText
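The per-session formula translates directly into code. This is a minimal sketch, not the calculator's own implementation: the `Rates` dataclass and `session_cost` function are hypothetical names, `Decimal` is used so the four-decimal line costs come out exact, and, as in the support-call example, `audio_in` is assumed to exclude the cached tokens. The gpt-realtime rates are the ones quoted on this page.

```python
from dataclasses import dataclass
from decimal import Decimal

MILLION = Decimal(1_000_000)


@dataclass(frozen=True)
class Rates:
    """USD per 1,000,000 tokens for each billed class (hypothetical structure)."""
    audio_in: Decimal
    cached_in: Decimal
    audio_out: Decimal
    text_in: Decimal


# gpt-realtime rates as quoted on this page (last verified 2026-06-07).
GPT_REALTIME = Rates(Decimal("32.00"), Decimal("0.40"), Decimal("64.00"), Decimal("4.00"))


def session_cost(audio_in: int, cached_in: int, audio_out: int, text_in: int, r: Rates) -> Decimal:
    """Sum each token class priced per 1M tokens; audio_in is the uncached input."""
    return (
        audio_in / MILLION * r.audio_in
        + cached_in / MILLION * r.cached_in
        + audio_out / MILLION * r.audio_out
        + text_in / MILLION * r.text_in
    )


# The support-call example: 50,000 in, 10,000 cached, 40,000 out, 2,000 text.
cost = session_cost(50_000, 10_000, 40_000, 2_000, GPT_REALTIME)
print(f"${cost:.4f}")  # $4.1720
```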

When you enter minutes, audio tokens are derived first using each provider's documented tokenisation rate. OpenAI bills 1 token per 100 ms of input audio (600 tokens/minute) and 1 token per 50 ms of output audio (1,200 tokens/minute); Gemini bills 25 tokens per second (1,500 tokens/minute). The cached-share slider then carves cached audio out of the derived input: cachedIn = total audio-input tokens × share, and audioIn in the formula above is the remainder, total × (1 − share), billed at the full input rate.
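Minutes mode can be sketched as two small steps: convert minutes to tokens, then split the cached share off the input. The constants are the tokenisation rates quoted above; `minutes_to_tokens` and `split_cached` are hypothetical helpers, and rounding fractional results to whole tokens is an assumption about how the page handles non-integer inputs.

```python
# Tokens per minute of audio, from the documented tokenisation rates:
# OpenAI: 1 token / 100 ms in (600/min), 1 token / 50 ms out (1,200/min).
# Gemini: 25 tokens / s (1,500/min), applied to input and output alike here.
TOKENS_PER_MIN = {
    "openai": {"in": 600, "out": 1_200},
    "gemini": {"in": 1_500, "out": 1_500},
}


def minutes_to_tokens(provider: str, minutes_in: float, minutes_out: float) -> tuple[int, int]:
    """Derive (audio-input, audio-output) tokens from spoken minutes."""
    rates = TOKENS_PER_MIN[provider]
    # Rounding to whole tokens is an assumption for fractional minutes.
    return round(minutes_in * rates["in"]), round(minutes_out * rates["out"])


def split_cached(total_in: int, share: float) -> tuple[int, int]:
    """Carve the cached share out of audio input, returning (uncached audioIn, cachedIn)."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("cached share must be between 0 and 1")
    cached = round(total_in * share)
    return total_in - cached, cached


audio_in_total, audio_out = minutes_to_tokens("openai", 3, 2)
audio_in, cached_in = split_cached(audio_in_total, 0.25)
print(audio_in, cached_in, audio_out)  # 1350 450 2400
```

Returning the uncached remainder alongside the cached count keeps the pair consistent with the cost formula, where each token is billed exactly once.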

Monthly and annual figures are linear: monthly = perSession × sessions and annual = monthly × 12. LKR amounts multiply the USD result by your editable USD→LKR rate. Every per-token rate is pinned from the official OpenAI and Google pricing pages and carries a last-verified date (2026-06-07); nothing is fetched at runtime. To check the math, the page derives audio cost a second way — straight from the per-minute unit rate — and confirms it matches the token pipeline to the cent.
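The linear projection and LKR conversion reduce to a few multiplications. A minimal sketch, assuming a hypothetical `project` helper and the Rs 300/USD rate used in the worked examples; values are left unrounded because the page's display rounding (whole rupees in some places, cents in others) is not specified.

```python
from decimal import Decimal


def project(per_session_usd: Decimal, sessions_per_month: int, usd_to_lkr: Decimal) -> dict[str, Decimal]:
    """Linear projection: monthly = perSession × sessions, annual = monthly × 12."""
    monthly = per_session_usd * sessions_per_month
    annual = monthly * 12
    return {
        "monthly_usd": monthly,
        "annual_usd": annual,
        # LKR figures multiply the USD result by the user-editable rate.
        "monthly_lkr": monthly * usd_to_lkr,
        "annual_lkr": annual * usd_to_lkr,
    }


# Default workload on gpt-realtime: $0.2152 per session, 1,000 sessions, Rs 300/USD.
for key, value in project(Decimal("0.2152"), 1_000, Decimal("300")).items():
    print(key, value)
```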

Worked examples

gpt-realtime · one support call (token mode)

Audio in 50,000 · cached 10,000 · audio out 40,000 · text 2,000

  1. Audio in: 50,000 / 1e6 × $32.00 = $1.6000
  2. Cached in: 10,000 / 1e6 × $0.40 = $0.0040
  3. Audio out: 40,000 / 1e6 × $64.00 = $2.5600
  4. Text in: 2,000 / 1e6 × $4.00 = $0.0080
  5. Per session = $4.1720 → at Rs 300/USD = Rs 1,251.60

gpt-4o-mini-realtime · 1,000 short queries / month

Per query: audio in 1,500 · audio out 1,200 · text 500 · no cache

  1. Audio in: 1,500 / 1e6 × $10.00 = $0.01500
  2. Audio out: 1,200 / 1e6 × $20.00 = $0.02400
  3. Text in: 500 / 1e6 × $0.60 = $0.00030
  4. Per session = $0.03930 → × 1,000 = $39.30/month
  5. At Rs 300/USD: Rs 11,790/month · annual $471.60 / Rs 141,480

gpt-realtime · minutes mode (cross-check)

3 min audio in + 2 min audio out, no cache, no text

  1. Audio in: 3 × 600 = 1,800 tok → 1,800 / 1e6 × $32 = $0.0576
  2. Audio out: 2 × 1,200 = 2,400 tok → 2,400 / 1e6 × $64 = $0.1536
  3. Per session = $0.2112
  4. Unit-rate check: $0.0192/min × 3 + $0.0768/min × 2 = $0.2112 ✓
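The unit-rate check can be sketched as two independent pipelines that must agree to the cent. Here the per-minute rate is derived as tokens-per-minute × ($/1M ÷ 1,000,000), which is how $0.0192 and $0.0768 arise from 600 × $32 and 1,200 × $64. The names `via_tokens`, `via_unit_rate` and `to_cents` are hypothetical, and only the gpt-realtime audio rates are covered.

```python
from decimal import Decimal, ROUND_HALF_UP

MILLION = Decimal(1_000_000)
# gpt-realtime audio rates (USD per 1M tokens) and OpenAI tokens per minute.
RATE_IN, RATE_OUT = Decimal("32"), Decimal("64")
TOK_PER_MIN_IN, TOK_PER_MIN_OUT = 600, 1_200


def via_tokens(min_in: int, min_out: int) -> Decimal:
    """Token pipeline: minutes -> tokens -> tokens / 1M × rate."""
    tokens_in = min_in * TOK_PER_MIN_IN
    tokens_out = min_out * TOK_PER_MIN_OUT
    return tokens_in / MILLION * RATE_IN + tokens_out / MILLION * RATE_OUT


def via_unit_rate(min_in: int, min_out: int) -> Decimal:
    """Independent path: per-minute unit rates × minutes."""
    per_min_in = TOK_PER_MIN_IN * RATE_IN / MILLION  # $0.0192/min
    per_min_out = TOK_PER_MIN_OUT * RATE_OUT / MILLION  # $0.0768/min
    return per_min_in * min_in + per_min_out * min_out


def to_cents(x: Decimal) -> Decimal:
    """Round to whole cents for the 'matches to the cent' comparison."""
    return x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


a, b = via_tokens(3, 2), via_unit_rate(3, 2)
assert to_cents(a) == to_cents(b), "pipelines disagree"
print(a, b)
```

If a rate or tokenisation constant were mistyped in only one path, the two totals would diverge, which is the point of computing the same cost twice.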


Sources & references

Per-token rates were last cross-checked against the official OpenAI and Google pricing pages on 2026-06-07. AI prices change often; the rates are reviewed quarterly and after any provider pricing update.
