How many tokens is one minute of audio?

On Gemini, exactly 1,920 audio tokens: Google counts audio at a fixed 32 tokens per second, so one minute is 32 × 60 = 1,920 tokens regardless of bitrate or format. OpenAI does not publish a fixed rate; its audio models return the exact count in the API usage object, and this tool estimates about 25 tokens per second, or roughly 1,500 tokens per minute.

How much does GPT-4o audio cost per minute?

GPT-4o-audio-preview bills audio input at about $40 per 1M tokens. At roughly 1,500 audio tokens per minute that is near $0.06 per minute for the audio alone, before the text prompt and the generated output. GPT-4o-mini-audio is about a quarter of that at $10 per 1M audio input tokens.
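The per-minute figure can be reproduced with a few lines of Python. This is a minimal sketch: the function name `audio_cost_per_minute` is illustrative, and the 25 tokens-per-second value is this tool's estimate for OpenAI rather than a published rate.

```python
def audio_cost_per_minute(tokens_per_second: float, usd_per_million: float) -> float:
    """USD cost of one minute of audio input (audio tokens only)."""
    tokens_per_minute = tokens_per_second * 60
    return tokens_per_minute / 1_000_000 * usd_per_million


# 25 tokens/s is an estimate, so both results are approximate.
gpt4o = audio_cost_per_minute(25, 40.00)
gpt4o_mini = audio_cost_per_minute(25, 10.00)

print(f"GPT-4o audio:      ${gpt4o:.4f} per minute")
print(f"GPT-4o mini audio: ${gpt4o_mini:.4f} per minute")
```

Text prompt and output tokens are not included here; they are billed on top of the audio line.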

How does Gemini charge for audio input?

Gemini converts audio to tokens at a fixed 32 tokens per second, then bills those at the model's audio input rate — $1.00 per 1M on Gemini 2.5 Flash. So a 3-minute clip is 180 × 32 = 5,760 tokens, costing 5,760 ÷ 1,000,000 × $1.00 ≈ $0.0058 for the audio. The text prompt and output are billed separately.

Is sending audio to an LLM cheaper than transcribing first with Whisper?

It depends on length and how often you reuse the transcript. Direct-to-LLM audio avoids a separate transcription step but bills dense audio tokens every request. If you ask several questions about the same recording, transcribing once and sending cheap text tokens usually wins. Our transcription cost calculator covers the Whisper side of that comparison.
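A rough way to see where the break-even falls is to model both paths. The sketch below is illustrative only: the function names are made up for this example, and the transcription price (`0.006` USD per minute) and transcript density (`200` text tokens per minute of speech) are hypothetical placeholders, not figures from this page. The Gemini 2.5 Flash rates ($1.00 per 1M audio, $0.30 per 1M text) are the ones used elsewhere in this tool. Prompt and output tokens are left out because they are the same on both paths.

```python
def direct_audio_cost(minutes, questions, audio_tokens_per_min, audio_rate):
    """Audio tokens are billed again on every request about the same clip."""
    return questions * minutes * audio_tokens_per_min / 1_000_000 * audio_rate


def transcribe_first_cost(minutes, questions, transcribe_usd_per_min,
                          transcript_tokens_per_min, text_rate):
    """Pay for transcription once, then send the transcript as text each time."""
    one_off = minutes * transcribe_usd_per_min
    per_question = minutes * transcript_tokens_per_min / 1_000_000 * text_rate
    return one_off + questions * per_question


MINUTES = 10
for questions in (1, 3, 4, 10):
    direct = direct_audio_cost(MINUTES, questions, 32 * 60, 1.00)
    # 0.006 USD/min and 200 tokens/min are hypothetical placeholders.
    staged = transcribe_first_cost(MINUTES, questions, 0.006, 200, 0.30)
    winner = "direct audio" if direct < staged else "transcribe first"
    print(f"{questions:>2} question(s): direct ${direct:.4f} vs staged ${staged:.4f} -> {winner}")
```

With these placeholder values the crossover sits between three and four questions; different transcription prices or audio rates will move it.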

How many audio tokens does a 10-minute meeting recording use?

On Gemini, 600 seconds × 32 = 19,200 audio tokens. At Gemini 2.5 Flash's $1.00 per 1M audio input rate that is about $0.0192 for the audio, plus output tokens for the summary. Doubling the meeting length doubles the audio-token line, because the count is linear in duration.

Does this include audio output (the model speaking back)?

No. This tool prices audio as input plus a text output (a summary or answer). Spoken audio replies are billed at a separate, higher audio output rate and are covered by our text-to-speech cost calculator. Here the Output tokens field is treated as text output.

Which exchange rate should I use for the LKR figures?

The calculator defaults to 305 LKR per USD, near the Central Bank of Sri Lanka indicative band in mid-2026, and the field is editable. For real billing, use the rate your card issuer applies to foreign API charges, which usually carries a small margin over the indicative rate.

Are these the current provider prices?

The audio, text, and output rates were last cross-checked against the OpenAI and Google pricing pages on 2026-06-30. Providers revise rates periodically — treat the dollar figures as a close estimate and confirm against your latest invoice. The 32 tokens/second count rule is stable and independent of pricing.


AI Audio Token & Cost Calculator

Enter how long your audio is and see how many audio tokens GPT-4o-audio and Gemini bill for it, then the cost per request and per month in USD and LKR, side by side. Gemini's 32-tokens-per-second rule is taken straight from Google's docs. Everything runs in your browser.

By Induwara Ashinsana, Executive Director, Ryzera Technologies · Updated Jun 30, 2026

Audio token & cost

Audio duration: length of the clip you'll send to the model (max 10 hours).

Unit

Quick presets

Text prompt tokens: your instruction text, e.g. “summarise this”.

Output tokens: expected text the model writes back.

Requests / month: how many clips you'll send monthly.

USD → LKR rate: CBSL indicative rate. Edit to match your bank.

Models to compare

Cheapest cost: $0.007025 (Gemini 2.5 Flash · Rs 2)

Audio tokens / request: 5,760 (on Gemini 2.5 Flash)

Requests / month: single request

Model	Provider · audio input rate	Audio tokens	Per request	Monthly	Monthly LKR
Gemini 2.5 Flash (cheapest)	Google · $1.00/1M	5,760	$0.007025	$0.007025	Rs 2
Gemini 2.5 Pro	Google · $1.25/1M	5,760	$0.0123	$0.0123	Rs 4
GPT-4o mini audio (≈)	OpenAI · $10.00/1M	4,500	$0.0453	$0.0453	Rs 14
GPT-4o audio (≈)	OpenAI · $40.00/1M	4,500	$0.1851	$0.1851	Rs 56

Per request = audio input + text prompt + output, each priced at the model's own per-1M rate. Sorted cheapest first.

Rows marked ≈ use an estimated audio-tokens-per-second rate for OpenAI, which doesn't publish a fixed figure. Switch to By known token count and paste the exact audio_tokens from your usage response for precise OpenAI costs.
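The comparison table can be approximated with a short Python sketch. The audio input rates and tokens-per-second figures come from this page. The text input and output rates for Gemini 2.5 Pro and the two OpenAI models are assumptions, chosen because they reproduce the per-request figures in the table; confirm them against the providers' pricing pages. The names `Model`, `per_request`, and `compare` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    audio_tps: float    # audio tokens per second
    estimated: bool     # True when audio_tps is an estimate (the "≈" rows)
    audio_rate: float   # USD per 1M audio input tokens
    text_rate: float    # USD per 1M text input tokens
    output_rate: float  # USD per 1M output tokens


MODELS = [
    Model("Gemini 2.5 Flash", 32, False, 1.00, 0.30, 2.50),
    # Text and output rates below are assumptions, not taken from this page.
    Model("Gemini 2.5 Pro", 32, False, 1.25, 1.25, 10.00),
    Model("GPT-4o mini audio", 25, True, 10.00, 0.15, 0.60),
    Model("GPT-4o audio", 25, True, 40.00, 2.50, 10.00),
]


def per_request(m: Model, seconds: float, prompt_tokens: int, output_tokens: int) -> float:
    """Audio input + text prompt + output, each at the model's own per-1M rate."""
    audio_tokens = seconds * m.audio_tps
    micro_usd = (audio_tokens * m.audio_rate
                 + prompt_tokens * m.text_rate
                 + output_tokens * m.output_rate)
    return micro_usd / 1_000_000


def compare(seconds, prompt_tokens, output_tokens, lkr_per_usd=305.0):
    """Return one row per model, sorted cheapest first."""
    rows = []
    for m in MODELS:
        usd = per_request(m, seconds, prompt_tokens, output_tokens)
        rows.append({"model": m.name + (" ≈" if m.estimated else ""),
                     "usd": usd, "lkr": usd * lkr_per_usd})
    return sorted(rows, key=lambda r: r["usd"])


for row in compare(seconds=180, prompt_tokens=50, output_tokens=500):
    print(f'{row["model"]:<22} ${row["usd"]:.6f}  Rs {row["lkr"]:.0f}')
```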

All math runs in your browser. Nothing is uploaded — duration and token counts are just numbers.

How it works

A multimodal model doesn't bill audio by the megabyte — it converts the sound into tokens, the same unit it charges for text, and prices them at its audio input rate. The token count depends only on the clip's duration, not its bitrate, sample rate, or file format, so a 3-minute voice note costs the same whether it's a 16 kHz WhatsApp recording or a studio WAV.

Gemini: the documented anchor. Google's token-counting docs state that audio is counted at a fixed 32 tokens per second. So audioTokens = durationSeconds × 32. One minute is 1,920 tokens; a 3-minute clip is 5,760. This rule is exact and does not change when pricing changes, which is why it's the backbone of this calculator.
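The Gemini rule translates directly into code. A minimal Python sketch, assuming whole-second durations (how fractional seconds are counted is not covered here); the function name `gemini_audio_tokens` is illustrative:

```python
GEMINI_AUDIO_TOKENS_PER_SECOND = 32


def gemini_audio_tokens(duration_seconds: int) -> int:
    """audioTokens = durationSeconds x 32, for whole-second durations."""
    if duration_seconds < 0:
        raise ValueError("duration must not be negative")
    return duration_seconds * GEMINI_AUDIO_TOKENS_PER_SECOND


for label, seconds in [("1 minute", 60), ("3 minutes", 180), ("10 minutes", 600)]:
    print(f"{label}: {gemini_audio_tokens(seconds):,} audio tokens")
```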

OpenAI: an estimate, plus an exact path. OpenAI bills audio as distinct audio tokens but does not publish a fixed tokens-per-second figure. The tool uses an estimate of about 25 tokens per second for the GPT-4o audio models, derived from OpenAI's per-minute audio pricing, and marks those rows with a ≈. For an exact figure, OpenAI returns input_token_details.audio_tokens in the usage object of every response; paste that into the By known token count mode for precise costs.
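The two paths can be combined in one helper: use the exact count when a usage object is available, and fall back to the estimate otherwise. This is a sketch under assumptions: the usage object is treated as a plain dictionary shaped as described above, the real nesting may differ by endpoint or SDK version, and the function name and the sample numbers are hypothetical.

```python
OPENAI_ESTIMATED_TOKENS_PER_SECOND = 25  # the tool's estimate, not a published rate


def openai_audio_tokens(duration_seconds, usage=None):
    """Return (audio_tokens, is_exact).

    `usage` is assumed to be a dict with an `input_token_details` entry
    that carries `audio_tokens`; any other shape falls back to the estimate.
    """
    if usage is not None:
        details = usage.get("input_token_details") or {}
        exact = details.get("audio_tokens")
        if exact is not None:
            return int(exact), True
    estimate = round(duration_seconds * OPENAI_ESTIMATED_TOKENS_PER_SECOND)
    return estimate, False


# Hypothetical usage payload for a 3-minute clip (numbers are made up).
sample_usage = {"input_token_details": {"audio_tokens": 4312, "text_tokens": 50}}

print(openai_audio_tokens(180))                # estimate path, marked inexact
print(openai_audio_tokens(180, sample_usage))  # exact path from the usage dict
```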

Cost. Each request has up to three billed parts, each priced at the model's own per-1M rate:

audioInput = audioTokens / 1,000,000 × audioInputRate
textInput = textPromptTokens / 1,000,000 × textInputRate
output = outputTokens / 1,000,000 × outputRate

The per-request total is the sum of those three; the monthly figure multiplies it by your requests per month, and the LKR column applies your exchange rate. As a cross-check, the audio line can also be read per minute — 32 × 60 / 1,000,000 × audioInputRate — and the two derivations agree to the cent, which the build verifies on every deploy. Output here means generated text (a summary or answer); spoken audio replies use a separate rate and are out of scope.
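The three formulas and the monthly and LKR steps can be sketched in Python as follows. The function name `request_cost` and the variable names are illustrative; the rates are the Gemini 2.5 Flash figures used in the worked examples, and 500 requests per month is just a sample volume.

```python
def request_cost(audio_tokens, prompt_tokens, output_tokens,
                 audio_rate, text_rate, output_rate):
    """Return the three billed parts and their sum in USD (rates are per 1M tokens)."""
    parts = {
        "audio_input": audio_tokens / 1_000_000 * audio_rate,
        "text_input": prompt_tokens / 1_000_000 * text_rate,
        "output": output_tokens / 1_000_000 * output_rate,
    }
    parts["total"] = sum(parts.values())
    return parts


# 3-minute clip on Gemini 2.5 Flash: 180 s x 32 tokens/s = 5,760 audio tokens.
cost = request_cost(5_760, 50, 500, audio_rate=1.00, text_rate=0.30, output_rate=2.50)

requests_per_month = 500
lkr_per_usd = 305.0
monthly_usd = cost["total"] * requests_per_month
monthly_lkr = monthly_usd * lkr_per_usd

# Cross-check: price the audio line per minute instead of per token.
per_minute = 32 * 60 / 1_000_000 * 1.00
assert abs(per_minute * 3 - cost["audio_input"]) < 1e-12

print(f'per request: ${cost["total"]:.6f}')
print(f"monthly:     ${monthly_usd:.2f} (Rs {monthly_lkr:,.0f})")
```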

Worked examples

3-minute WhatsApp voice note → Gemini 2.5 Flash summary

Duration: 3 min = 180 s → audioTokens = 180 × 32 = 5,760
Audio input: 5,760 / 1e6 × $1.00 = $0.005760
Text prompt (50 tokens): 50 / 1e6 × $0.30 = $0.000015
Output (500 tokens): 500 / 1e6 × $2.50 = $0.001250
Per request = $0.007025 (≈ Rs 2.14 @305)
At 500 notes/month: 0.007025 × 500 = $3.51 (≈ Rs 1,071)

10-minute meeting recording, one request — Gemini 2.5 Flash

Duration: 10 min = 600 s → audioTokens = 600 × 32 = 19,200
Audio input: 19,200 / 1e6 × $1.00 = $0.019200
Output (800 tokens, no text prompt): 800 / 1e6 × $2.50 = $0.002000
Per request = $0.021200 (≈ Rs 6.47 @305)
Reconcile: 19,200 ÷ 32 = 600 s = 10 min ✓ — the audio line is linear in length

Edge case — a 12-hour archive clamps to the 10-hour cap

Requested duration: 12 h = 43,200 s, above the 36,000 s (10 h) limit
Clamped to 36,000 s → audioTokens = 36,000 × 32 = 1,152,000 (Gemini)
Audio input on Gemini 2.5 Flash: 1,152,000 / 1e6 × $1.00 = $1.152
Split anything longer into chunks so no single request exceeds the cap.
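Clamping and chunking can be sketched in Python as below. The 10-hour cap is this calculator's input limit; a given model may accept less audio per request, so treat `MAX_SECONDS` as an adjustable assumption. The function names are illustrative.

```python
MAX_SECONDS = 10 * 3600  # the calculator's 10-hour cap per request
GEMINI_AUDIO_TOKENS_PER_SECOND = 32


def clamp_duration(seconds: int) -> int:
    """Limit a requested duration to the range 0..MAX_SECONDS."""
    return max(0, min(seconds, MAX_SECONDS))


def split_into_chunks(total_seconds: int, max_chunk_seconds: int = MAX_SECONDS) -> list:
    """Split a long recording into chunk lengths that each fit under the cap."""
    chunks = []
    remaining = total_seconds
    while remaining > 0:
        take = min(remaining, max_chunk_seconds)
        chunks.append(take)
        remaining -= take
    return chunks


archive = 12 * 3600  # the 12-hour archive from the edge case
print(clamp_duration(archive) * GEMINI_AUDIO_TOKENS_PER_SECOND)  # tokens after clamping

for seconds in split_into_chunks(archive):
    tokens = seconds * GEMINI_AUDIO_TOKENS_PER_SECOND
    print(f"chunk of {seconds} s -> {tokens:,} audio tokens")
```

Because the count is linear in duration, the chunk totals add up to the same number of audio tokens as the unclamped recording; only the per-request size changes.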

Frequently asked questions

Sources & references

The 32 tokens/second rule and the per-1M rates were last cross-checked against these sources on 2026-06-30. The count rule is stable; pricing is revised periodically, so confirm dollar figures against your latest invoice.

