induwara.lk

AI Audio Token & Cost Calculator

Enter the length of your audio to see how many audio tokens GPT-4o-audio and Gemini bill for it, along with the cost per request and per month in USD and LKR, side by side. Gemini's 32-tokens-per-second rule is taken straight from Google's docs. Everything runs in your browser.

By Induwara Ashinsana · Updated Jun 30, 2026
Audio token & cost

Inputs

  • Audio duration: length of the clip you'll send to the model (max 10 hours).
  • Text prompt tokens: your instruction text, e.g. “summarise this”.
  • Output tokens: expected text the model writes back.
  • Requests per month: how many clips you'll send monthly.
  • USD to LKR exchange rate (Rs): CBSL indicative rate. Edit to match your bank.

Cheapest cost: $0.007025 (Gemini 2.5 Flash · Rs 2)
Audio tokens / request: 5,760 (on Gemini 2.5 Flash)
Requests / month: 1 (single request)
Model                        Provider · audio input rate   Audio tokens  Per request  Monthly    Monthly LKR
Gemini 2.5 Flash (cheapest)  Google · $1.00/1M audio in    5,760         $0.007025    $0.007025  Rs 2
Gemini 2.5 Pro               Google · $1.25/1M audio in    5,760         $0.0123      $0.0123    Rs 4
GPT-4o mini audio            OpenAI · $10.00/1M audio in   4,500         $0.0453      $0.0453    Rs 14
GPT-4o audio                 OpenAI · $40.00/1M audio in   4,500         $0.1851      $0.1851    Rs 56
Per request = audio input + text prompt + output, each priced at the model's own per-1M rate. Sorted cheapest first.

Rows marked ≈ use an estimated audio-tokens-per-second rate for OpenAI, which doesn't publish a fixed figure. For precise OpenAI costs, switch to the “By known token count” mode and paste the exact audio_tokens value from your usage response.

All math runs in your browser. Nothing is uploaded — duration and token counts are just numbers.

How it works

A multimodal model doesn't bill audio by the megabyte — it converts the sound into tokens, the same unit it charges for text, and prices them at its audio input rate. The token count depends only on the clip's duration, not its bitrate, sample rate, or file format, so a 3-minute voice note costs the same whether it's a 16 kHz WhatsApp recording or a studio WAV.

Gemini: the documented anchor. Google's token-counting docs state that audio is counted at a fixed 32 tokens per second. So audioTokens = durationSeconds × 32. One minute is 1,920 tokens; a 3-minute clip is 5,760. This rule is exact and does not change when pricing changes, which is why it's the backbone of this calculator.
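The Gemini rule can be sketched in a few lines of Python. The function name is illustrative, and rounding fractional seconds up is an assumption made here, not something the rule above specifies.

```python
import math

GEMINI_AUDIO_TOKENS_PER_SECOND = 32  # fixed rate quoted in the text above


def gemini_audio_tokens(duration_seconds: float) -> int:
    """Return the Gemini audio token count for a clip of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    # Assumption: partial seconds are rounded up to a whole token count.
    return math.ceil(duration_seconds * GEMINI_AUDIO_TOKENS_PER_SECOND)


print(gemini_audio_tokens(60))   # 1920
print(gemini_audio_tokens(180))  # 5760
```

Because the count depends only on duration, the same function applies to any file format or bitrate.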

OpenAI: an estimate, plus an exact path. OpenAI bills audio as distinct audio tokens but does not publish a fixed tokens-per-second figure. The tool uses an estimate of about 25 tokens per second for the GPT-4o audio models, derived from OpenAI's per-minute audio pricing, and marks those rows with a ≈. For an exact figure, OpenAI returns input_token_details.audio_tokens in the usage object of every response; paste that into the “By known token count” mode for precise costs.
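A minimal sketch of the two OpenAI paths, assuming the usage object has already been parsed into a plain dict. The function name is hypothetical, and the field path simply follows the text above; the exact shape of the usage object may differ between endpoints, so check your own response before relying on it.

```python
from typing import Any, Mapping, Optional, Tuple

OPENAI_EST_TOKENS_PER_SECOND = 25  # this page's estimate, not a published figure


def openai_audio_tokens(
    duration_seconds: float,
    usage: Optional[Mapping[str, Any]] = None,
) -> Tuple[int, bool]:
    """Return (audio_tokens, is_exact).

    Prefers the exact count from a usage dict when one is supplied,
    otherwise falls back to the duration-based estimate.
    """
    if usage is not None:
        # Assumed layout: {"input_token_details": {"audio_tokens": N}}
        details = usage.get("input_token_details") or {}
        exact = details.get("audio_tokens")
        if exact is not None:
            return int(exact), True
    return round(duration_seconds * OPENAI_EST_TOKENS_PER_SECOND), False


print(openai_audio_tokens(180))  # (4500, False)
```

The boolean plays the role of the ≈ marker: False means the count is an estimate.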

Cost. Each request has up to three billed parts, each priced at the model's own per-1M rate:

  • audioInput = audioTokens / 1,000,000 × audioInputRate
  • textInput = textPromptTokens / 1,000,000 × textInputRate
  • output = outputTokens / 1,000,000 × outputRate

The per-request total is the sum of those three; the monthly figure multiplies it by your requests per month, and the LKR column applies your exchange rate. As a cross-check, the Gemini audio line can also be computed per minute of audio (32 × 60 / 1,000,000 × audioInputRate), and the two derivations agree to the cent, which the build verifies on every deploy. Output here means generated text (a summary or answer); spoken audio replies use a separate rate and are out of scope.
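The three-part formula and the per-minute cross-check can be sketched as follows. The class and function names are illustrative, not the tool's actual code; the rates are the Gemini 2.5 Flash figures used in this page's worked examples.

```python
import math
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens for one model."""
    audio_in: float
    text_in: float
    output: float


def request_cost(audio_tokens: int, prompt_tokens: int,
                 output_tokens: int, rates: Rates) -> float:
    """Sum the three billed parts of a single request, in USD."""
    audio_input = audio_tokens / TOKENS_PER_MILLION * rates.audio_in
    text_input = prompt_tokens / TOKENS_PER_MILLION * rates.text_in
    output = output_tokens / TOKENS_PER_MILLION * rates.output
    return audio_input + text_input + output


def gemini_audio_cost_per_minute(audio_in_rate: float) -> float:
    """Per-minute reading of the audio line: 32 tokens/s x 60 s."""
    return 32 * 60 / TOKENS_PER_MILLION * audio_in_rate


# Gemini 2.5 Flash rates as used in this page's worked examples.
flash = Rates(audio_in=1.00, text_in=0.30, output=2.50)

total = request_cost(5_760, 50, 500, flash)
print(f"per request: ${total:.6f}")

# Cross-check: 3 minutes priced per minute vs. 5,760 tokens priced per token.
per_token_route = 5_760 / TOKENS_PER_MILLION * flash.audio_in
per_minute_route = 3 * gemini_audio_cost_per_minute(flash.audio_in)
assert math.isclose(per_token_route, per_minute_route)
```

Floats are adequate at this scale; comparisons use math.isclose rather than exact equality to sidestep binary rounding.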

Worked examples

3-minute WhatsApp voice note → Gemini 2.5 Flash summary

  1. Duration: 3 min = 180 s → audioTokens = 180 × 32 = 5,760
  2. Audio input: 5,760 / 1e6 × $1.00 = $0.005760
  3. Text prompt (50 tokens): 50 / 1e6 × $0.30 = $0.000015
  4. Output (500 tokens): 500 / 1e6 × $2.50 = $0.001250
  5. Per request = $0.007025 (≈ Rs 2.14 @305)
  6. At 500 notes/month: 0.007025 × 500 = $3.51 (≈ Rs 1,071)
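The monthly and currency steps of this example can be reproduced with Python's decimal module, which avoids binary floating-point drift when rounding to cents. This is a sketch: the function name is hypothetical, and the Rs 305 rate is the one used in the example above.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def monthly_rollup(per_request_usd: Decimal, requests_per_month: int,
                   lkr_per_usd: Decimal) -> dict:
    """Scale a per-request cost to a month and convert one request to LKR."""
    monthly_usd = per_request_usd * requests_per_month
    per_request_lkr = per_request_usd * lkr_per_usd
    return {
        "per_request_lkr": per_request_lkr.quantize(CENT, rounding=ROUND_HALF_UP),
        "monthly_usd": monthly_usd.quantize(CENT, rounding=ROUND_HALF_UP),
    }


summary = monthly_rollup(Decimal("0.007025"), 500, Decimal("305"))
print(summary["per_request_lkr"])  # 2.14
print(summary["monthly_usd"])      # 3.51
```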

10-minute meeting recording, one request — Gemini 2.5 Flash

  1. Duration: 10 min = 600 s → audioTokens = 600 × 32 = 19,200
  2. Audio input: 19,200 / 1e6 × $1.00 = $0.019200
  3. Output (800 tokens, no text prompt): 800 / 1e6 × $2.50 = $0.002000
  4. Per request = $0.021200 (≈ Rs 6.47 @305)
  5. Reconcile: 19,200 ÷ 32 = 600 s = 10 min ✓ — the audio line is linear in length

Edge case — a 12-hour archive clamps to the 10-hour cap

  1. Requested duration: 12 h = 43,200 s, above the 36,000 s (10 h) limit
  2. Clamped to 36,000 s → audioTokens = 36,000 × 32 = 1,152,000 (Gemini)
  3. Audio input on Gemini 2.5 Flash: 1,152,000 / 1e6 × $1.00 = $1.152
  4. Split anything longer into chunks so no single request exceeds the cap.
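The clamp and the chunking advice above might look like this in Python. The helper names are hypothetical, and splitting into equal-cap pieces with a shorter final piece is just one reasonable strategy.

```python
from typing import List

MAX_SECONDS = 10 * 60 * 60  # 36,000 s cap stated above
GEMINI_AUDIO_TOKENS_PER_SECOND = 32


def clamp_duration(seconds: int) -> int:
    """Clamp a requested duration into the calculator's 0..10 h range."""
    return min(max(seconds, 0), MAX_SECONDS)


def split_into_chunks(total_seconds: int,
                      max_chunk_seconds: int = MAX_SECONDS) -> List[int]:
    """Split a long recording so no single request exceeds the cap."""
    chunks = []
    remaining = total_seconds
    while remaining > 0:
        piece = min(remaining, max_chunk_seconds)
        chunks.append(piece)
        remaining -= piece
    return chunks


twelve_hours = 12 * 60 * 60
print(clamp_duration(twelve_hours))     # 36000
print(split_into_chunks(twelve_hours))  # [36000, 7200]
print([s * GEMINI_AUDIO_TOKENS_PER_SECOND for s in split_into_chunks(twelve_hours)])
# [1152000, 230400]
```

Chunking preserves the total token count, since the Gemini count is linear in duration: 1,152,000 + 230,400 = 43,200 × 32.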

Sources & references

The 32 tokens/second rule and the per-1M rates were last cross-checked against these sources on 2026-06-30. The count rule is stable; pricing is revised periodically, so confirm dollar figures against your latest invoice.
