RAG Cost Calculator
Price a complete Retrieval-Augmented Generation pipeline — document indexing, vector storage, per-query retrieval, and LLM answer generation — in USD and LKR. Plug in your knowledge-base size, query volume, and chosen models to see the one-time and monthly cost, and exactly which line dominates the bill.
How it works
A RAG pipeline has four cost centres, and this calculator prices each one separately so you can see where your money actually goes. Most single-purpose calculators price only indexing or only storage; the recurring per-query generation cost — the line that dominates a real bill — gets left out. The math is intentionally plain, and every per-token and per-GB rate comes from the vendor pricing pages cited at the bottom of this page, hand-verified on 2026-06-09.
total_tokens = documents × avg_tokens_per_doc
step = max(1, chunk_size − overlap)
chunks = ceil(total_tokens / step)
index_cost = total_tokens / 1e6 × embed_price_per_M (one-time)
storage_bytes = chunks × dimensions × 4 (float32)
storage_gb = storage_bytes / 1024³
monthly_storage = storage_gb × storage_price_per_GB_month
context_tokens = top_k × chunk_size
gen_input = system_prompt + query_tokens + context_tokens
gen_cost/query = gen_input / 1e6 × gen_in_price
+ output_tokens / 1e6 × gen_out_price
query_embed/query= query_tokens / 1e6 × embed_price_per_M
per_query = query_embed/query + gen_cost/query
monthly_total = queries/mo × per_query
+ monthly_storage
+ index_cost × reindex_per_month
first_month = monthly_total + (reindex = 0 ? index_cost : 0)
lkr = usd × usd_to_lkr_rateThe one-time indexing cost embeds your whole corpus once. Storageis computed from the real vector size — chunk count × the embedding model's dimensions × 4 bytes for float32 — so a 3,072-dimension model like text-embedding-3-large costs twice the storage of a 1,536-dimension model. Storage GB uses binary GiB (1024³ bytes), which slightly overestimates against the decimal GB some clouds bill, erring toward a safer number.
The recurring generation line is where RAG bills live. Every query sends the system prompt, the question, and all top_k retrieved chunks to the LLM as input tokens, then bills the answer as output tokens. Because chat-model rates run many times the embedding rate and apply on every single query, generation routinely exceeds 95% of the monthly total — which is why raising top-k or the chunk size is the fastest way to push the bill up. The calculator's breakdown bar shows the exact split.
Chunk counting uses ceil(total_tokens / step), which counts every sliding window across the corpus. A real text splitter such as LangChain produces an equal or slightly smaller count once overlap is large, so this figure is a conservative upper bound on stored chunks. The page cross-checks the two formulas and confirms they agree exactly at zero overlap: chunk-count cross-check passes.
Worked-example self-test (computed live on this page) — each line reconciles the formula above with the hand-derived numbers in the code header:
- A · Small KB · 3-small · GPT-4o-mini · monthly total → expected $5.89, got $5.89
- A · Small KB · first month incl. one-time index → expected $5.91, got $5.91
- B · Larger KB · 3-large · Claude Haiku 4.5 · monthly total → expected $276.77, got $276.77
- C · Zero docs, zero queries · first month → expected $0.00, got $0.00
- E · 1e9 tokens · no queries · first month incl. $20 index → expected $21.89, got $21.89
Worked examples
Frequently asked questions
Sources & references
- OpenAI — API Pricing (text-embedding-3-small / 3-large, GPT-4o / 4o-mini)
- Anthropic — Claude API Pricing (Claude Haiku 4.5, Sonnet 4.5)
- Google — Gemini API Pricing (Gemini 2.5 / 2.0 Flash, gemini-embedding-001)
- Cohere — Pricing (embed-v3.0 English / multilingual)
- Pinecone — Pricing (serverless storage $/GB-month)
- Central Bank of Sri Lanka — Daily indicative exchange rates (USD→LKR)
Every price was last cross-checked against its vendor pricing page on 2026-06-09. The default storage rate is Pinecone serverless ($0.33/GB-month). Spotted a price that has moved? Email the address below — most fixes ship within 24 hours.
Related tools
Comments & feedback
Spotted a bug or want an improvement? Tell us — our team reviews every comment, and good ideas get built. Comments are public and anonymous.
Found a price that has moved, or an edge case the calculator doesn't cover?
Email me at [email protected] — most fixes ship within 24 hours.