
RAG Cost Calculator

Price a complete Retrieval-Augmented Generation pipeline — document indexing, vector storage, per-query retrieval, and LLM answer generation — in USD and LKR. Plug in your knowledge-base size, query volume, and chosen models to see the one-time and monthly cost, and exactly which line dominates the bill.

By Induwara Ashinsana · Updated Jun 9, 2026
RAG pipeline cost
Knowledge base (one-time indexing)

  • documents: whole number, 1 or more.
  • avg_tokens_per_doc: ≈ 750 words per 1,000 tokens. PDF page ≈ 500–600 tokens.
  • chunk_size: tokens per stored chunk. 50–8,000.
  • overlap: overlap between chunks. Must be below chunk size.
  • Embedding model (embed_price_per_M): per-million-token prices from the vendor pricing page.
  • reindex_per_month: 0 = index once, never re-embed. 1 = monthly refresh.

Query workload (recurring)

  • queries/mo: total questions answered per month.
  • query_tokens: typical short user question is 20–80 tokens.
  • top_k: chunks fed to the model per query. 1–50.
  • system_prompt: instructions prepended to every generation.
  • Generation model (gen_in_price, gen_out_price): per-million-token prices from the vendor pricing page.
  • output_tokens: length of each generated answer. 1–8,000.
  • storage_price_per_GB_month ($): default is Pinecone serverless ($0.33/GB-month). Edit for Qdrant, Weaviate, or pgvector.
  • usd_to_lkr_rate (Rs): CBSL daily indicative rate. Edit to match your bank or Wise.

  • Monthly cost: $5.8882 (Rs 1,766)
  • First month (incl. indexing): $5.9082 (Rs 1,772)
  • One-time indexing: $0.02 (2,223 chunks)
  • Cost per query: $0.0006 (2,720 input tokens)

What drives the monthly bill

  • LLM generation: $5.88 (99.86%)
  • Vector storage: $0.0042 (0.07%)
  • Query embedding: $0.004 (0.07%)
  • Re-indexing: $0.00 (0%)

Derived stats

  • Total tokens indexed: 1,000,000
  • Chunks stored: 2,223
  • Vector storage: 0.0127 GB
  • Retrieved context / query: 2,500 tok
  • Generation input / query: 2,720 tok
  • Monthly storage cost: $0.0042
All math runs in your browser. No documents, queries, or API keys leave the page.

How it works

A RAG pipeline has four cost centres, and this calculator prices each one separately so you can see where your money actually goes. Most single-purpose calculators price only indexing or only storage; the recurring per-query generation cost — the line that dominates a real bill — gets left out. The math is intentionally plain, and every per-token and per-GB rate comes from the vendor pricing pages cited at the bottom of this page, hand-verified on 2026-06-09.

total_tokens     = documents × avg_tokens_per_doc
step             = max(1, chunk_size − overlap)
chunks           = ceil(total_tokens / step)

index_cost       = total_tokens / 1e6 × embed_price_per_M      (one-time)
storage_bytes    = chunks × dimensions × 4                     (float32)
storage_gb       = storage_bytes / 1024³
monthly_storage  = storage_gb × storage_price_per_GB_month

context_tokens   = top_k × chunk_size
gen_input        = system_prompt + query_tokens + context_tokens
gen_cost/query   = gen_input / 1e6 × gen_in_price
                 + output_tokens / 1e6 × gen_out_price
query_embed/query= query_tokens / 1e6 × embed_price_per_M
per_query        = query_embed/query + gen_cost/query

monthly_total    = queries/mo × per_query
                 + monthly_storage
                 + index_cost × reindex_per_month
first_month      = monthly_total + (reindex_per_month == 0 ? index_cost : 0)
lkr              = usd × usd_to_lkr_rate
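
The formulas above translate almost line for line into code. What follows is a minimal Python sketch, not the page's own implementation: the `RagInputs` dataclass and `rag_cost` function are hypothetical names, and the preset values are the ones listed in the first worked example on this page (chunk 500, overlap 0, top-k 5, Rs 300/USD).

```python
from dataclasses import dataclass


@dataclass
class RagInputs:
    # Hypothetical container; field names mirror the formula block.
    documents: int
    avg_tokens_per_doc: int
    chunk_size: int
    overlap: int
    dimensions: int
    embed_price_per_m: float          # USD per 1M embedding tokens
    reindex_per_month: float
    queries_per_month: int
    query_tokens: int
    top_k: int
    system_prompt: int
    output_tokens: int
    gen_in_price: float               # USD per 1M input tokens
    gen_out_price: float              # USD per 1M output tokens
    storage_price_per_gb_month: float
    usd_to_lkr_rate: float


def rag_cost(p: RagInputs) -> dict:
    total_tokens = p.documents * p.avg_tokens_per_doc
    step = max(1, p.chunk_size - p.overlap)
    chunks = -(-total_tokens // step)            # integer ceil

    index_cost = total_tokens / 1e6 * p.embed_price_per_m
    storage_gb = chunks * p.dimensions * 4 / 1024**3   # float32, binary GiB
    monthly_storage = storage_gb * p.storage_price_per_gb_month

    gen_input = p.system_prompt + p.query_tokens + p.top_k * p.chunk_size
    gen_cost = (gen_input / 1e6 * p.gen_in_price
                + p.output_tokens / 1e6 * p.gen_out_price)
    query_embed = p.query_tokens / 1e6 * p.embed_price_per_m
    per_query = query_embed + gen_cost

    monthly_total = (p.queries_per_month * per_query
                     + monthly_storage
                     + index_cost * p.reindex_per_month)
    # The one-time index is added only when it is not already a recurring line.
    first_month = monthly_total + (index_cost if p.reindex_per_month == 0 else 0.0)
    return {
        "chunks": chunks,
        "index_cost": index_cost,
        "per_query": per_query,
        "monthly_total": monthly_total,
        "first_month": first_month,
        "monthly_total_lkr": monthly_total * p.usd_to_lkr_rate,
    }


# Preset A from the worked examples: text-embedding-3-small + GPT-4o-mini.
preset_a = RagInputs(
    documents=1_000, avg_tokens_per_doc=1_000, chunk_size=500, overlap=0,
    dimensions=1_536, embed_price_per_m=0.02, reindex_per_month=0,
    queries_per_month=10_000, query_tokens=20, top_k=5, system_prompt=200,
    output_tokens=300, gen_in_price=0.15, gen_out_price=0.60,
    storage_price_per_gb_month=0.33, usd_to_lkr_rate=300.0,
)
a = rag_cost(preset_a)
print(f"${a['monthly_total']:.2f} / month, ${a['first_month']:.2f} first month")
# $5.89 / month, $5.91 first month
```

Keeping every intermediate value in the returned dictionary makes it easy to compare each line against the breakdown the calculator shows.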

The one-time indexing cost embeds your whole corpus once. Storage is computed from the real vector size (chunk count × the embedding model's dimensions × 4 bytes for float32), so a 3,072-dimension model like text-embedding-3-large costs twice the storage of a 1,536-dimension model. Storage GB uses binary GiB (1024³ bytes), which comes out about 7% lower than the decimal GB (10⁹ bytes) some clouds bill, so the storage line slightly underestimates a decimal-GB invoice.
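
To make the storage arithmetic concrete, here is a small Python sketch. The helper name `vector_storage_bytes` is hypothetical, and the 2,000-chunk figure is borrowed from the first worked example on this page. It counts raw float32 vector bytes only; any index or metadata overhead a vector database adds is outside this formula.

```python
def vector_storage_bytes(chunks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw vector payload: one float32 (4 bytes) per dimension per chunk."""
    return chunks * dimensions * bytes_per_value


small = vector_storage_bytes(2_000, 1_536)   # text-embedding-3-small
large = vector_storage_bytes(2_000, 3_072)   # text-embedding-3-large

gib = small / 1024**3   # binary divisor, as used by the calculator
gb = small / 1e9        # decimal divisor

print(large / small)                  # 2.0
print(round(gib, 5), round(gb, 5))    # 0.01144 0.01229
```

The same byte count reads as a different number of "GB" depending on the divisor, which is worth checking against how a given vendor meters storage.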

The recurring generation line is where RAG bills live. Every query sends the system prompt, the question, and all top_k retrieved chunks to the LLM as input tokens, then bills the answer as output tokens. Because chat-model rates run many times the embedding rate and apply on every single query, generation routinely exceeds 95% of the monthly total — which is why raising top-k or the chunk size is the fastest way to push the bill up. The calculator's breakdown bar shows the exact split.
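
As an illustration of that split, the sketch below turns the four monthly lines into percentage shares. The function name `cost_shares` is hypothetical, and the dollar figures are the ones shown in the calculator's default breakdown on this page.

```python
def cost_shares(lines: dict[str, float]) -> dict[str, float]:
    """Return each cost line as a percentage of the total, rounded to 2 dp."""
    total = sum(lines.values())
    if total == 0:
        return {name: 0.0 for name in lines}
    return {name: round(100 * value / total, 2) for name, value in lines.items()}


shares = cost_shares({
    "LLM generation": 5.88,
    "Vector storage": 0.0042,
    "Query embedding": 0.004,
    "Re-indexing": 0.0,
})
print(shares["LLM generation"])   # 99.86
```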

Chunk counting uses ceil(total_tokens / step), which counts every sliding window across the corpus. A real text splitter, such as the ones in LangChain, produces an equal or slightly smaller count once overlap is large, so this figure is a conservative upper bound on stored chunks. The page cross-checks the two formulas and confirms they agree exactly at zero overlap: chunk-count cross-check passes.
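
The upper-bound claim can be checked with a short Python sketch. Both function names are hypothetical, and `chunks_sliding_window` models an assumed simple fixed-size splitter that stops as soon as a window reaches the end of the text. It is not LangChain's actual algorithm, and it treats the corpus as one token stream, as the formula does.

```python
def chunks_upper_bound(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """The calculator's formula: ceil(total_tokens / step)."""
    step = max(1, chunk_size - overlap)
    return -(-total_tokens // step)   # integer ceil


def chunks_sliding_window(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """Assumed splitter: emit windows until one covers the last token."""
    if total_tokens <= 0:
        return 0
    step = max(1, chunk_size - overlap)
    count, start = 0, 0
    while True:
        count += 1
        if start + chunk_size >= total_tokens:
            break
        start += step
    return count


for overlap in (0, 50, 400):
    print(overlap,
          chunks_upper_bound(1_000_000, 500, overlap),
          chunks_sliding_window(1_000_000, 500, overlap))
# 0 2000 2000
# 50 2223 2223
# 400 10000 9996
```

Under this assumption the two counts match at zero and small overlap and drift apart only when the overlap approaches the chunk size.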

Worked-example self-test (computed live on this page) — each line reconciles the formula above with the hand-derived numbers in the code header:

  • A · Small KB · 3-small · GPT-4o-mini · monthly total → expected $5.89, got $5.89
  • A · Small KB · first month incl. one-time index → expected $5.91, got $5.91
  • B · Larger KB · 3-large · Claude Haiku 4.5 · monthly total → expected $276.77, got $276.77
  • C · Zero docs, zero queries · first month → expected $0.00, got $0.00
  • E · 1e9 tokens · no queries · first month incl. $20 index → expected $21.89, got $21.89

Worked examples

Chat-with-PDFs bot · 1,000 docs × 1,000 tokens · 10,000 queries/mo

OpenAI text-embedding-3-small + GPT-4o-mini. Chunk 500, overlap 0, top-k 5. Rs 300/USD.

  1. Total tokens: 1,000 × 1,000 = 1,000,000
  2. Chunks: ceil(1,000,000 / 500) = 2,000
  3. Indexing: (1,000,000/1e6) × $0.02 = $0.02 one-time
  4. Storage: 2,000 × 1,536 × 4 = 12.29 MB → 0.01144 GB × $0.33 = $0.0038/mo
  5. Gen input/query: 200 + 20 + (5 × 500) = 2,720 tokens
  6. Gen cost/query: 2,720/1e6 × $0.15 + 300/1e6 × $0.60 = $0.000588
  7. Monthly queries: 10,000 × $0.000588 = $5.88
  8. Monthly total: $5.88 + $0.004 query embedding (10,000 × 20/1e6 × $0.02) + $0.0038 storage ≈ $5.89 (≈ Rs 1,766), plus $0.02 one-time

Support bot · 10,000 docs × 800 tokens · 50,000 queries/mo

text-embedding-3-large + Claude Haiku 4.5. Chunk 400, overlap 0, top-k 8. Rs 300/USD.

  1. Total tokens: 10,000 × 800 = 8,000,000
  2. Chunks: ceil(8,000,000 / 400) = 20,000
  3. Indexing: (8,000,000/1e6) × $0.13 = $1.04 one-time
  4. Storage: 20,000 × 3,072 × 4 = 245.76 MB → 0.2289 GB × $0.33 = $0.0755/mo
  5. Gen input/query: 300 + 30 + (8 × 400) = 3,530 tokens
  6. Gen cost/query: 3,530/1e6 × $1.00 + 400/1e6 × $5.00 = $0.00553
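  7. Monthly queries: 50,000 × $0.005534 = $276.70, where $0.005534 is the $0.00553 generation cost plus the query embedding (30/1e6 × $0.13 ≈ $0.000004)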
  8. Monthly total: $276.70 + $0.0755 ≈ $276.77 (≈ Rs 83,031) + $1.04 one-time

Edge case · top-k turned up from 5 to 10

Same chat-with-PDFs bot, but retrieving 10 chunks instead of 5. Shows why retrieval depth is the cost lever.

  1. Context tokens jump: 5 × 500 = 2,500 → 10 × 500 = 5,000
  2. Gen input/query: 200 + 20 + 5,000 = 5,220 tokens (was 2,720)
  3. Gen cost/query: 5,220/1e6 × $0.15 + 300/1e6 × $0.60 = $0.000963
  4. Monthly queries: 10,000 × $0.000963 = $9.63 (was $5.88)
  5. Lesson: doubling top-k raised the monthly bill ~64%, all from input tokens.
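
The same comparison can be run as a sweep. This Python sketch uses a hypothetical helper, `monthly_generation_cost`, with defaults taken from the chat-with-PDFs example (chunk 500, 200-token system prompt, 20-token query, 300-token answer, GPT-4o-mini rates as listed above). Like the steps above, it prices generation only and leaves out storage and query embedding.

```python
def monthly_generation_cost(
    top_k: int,
    chunk_size: int = 500,
    system_prompt: int = 200,
    query_tokens: int = 20,
    output_tokens: int = 300,
    gen_in_price: float = 0.15,    # USD per 1M input tokens
    gen_out_price: float = 0.60,   # USD per 1M output tokens
    queries_per_month: int = 10_000,
) -> float:
    gen_input = system_prompt + query_tokens + top_k * chunk_size
    per_query = (gen_input / 1e6 * gen_in_price
                 + output_tokens / 1e6 * gen_out_price)
    return queries_per_month * per_query


base = monthly_generation_cost(top_k=5)
doubled = monthly_generation_cost(top_k=10)
increase_pct = 100 * (doubled - base) / base

print(f"${base:.2f} -> ${doubled:.2f} (+{increase_pct:.0f}%)")
# $5.88 -> $9.63 (+64%)
```

The increase stays below 100% because the system prompt, the question, and the output tokens do not grow with top-k; only the retrieved context does.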

Frequently asked questions

Sources & references


Found a price that has moved, or an edge case the calculator doesn't cover?

Email me at [email protected] — most fixes ship within 24 hours.