
AI Question Answering — Free Extractive QA Over Any Passage

Paste a passage, ask a question, get the exact span that answers it — with a confidence score. A DistilBERT model fine-tuned on SQuAD runs server-side; a deterministic keyword baseline runs alongside as a transparent cross-check. No signup, no API key.

By Induwara Ashinsana · Updated May 12, 2026
Inference runs server-side. Text is sent once for scoring and not stored.
Focused questions of 5–25 words work best. Ending with a question mark is optional.
Two answers in one click: a DistilBERT span extractor (server-side) and a deterministic keyword baseline as a second opinion.

What this does

Paste an English passage and ask a question. The tool returns the exact sentence (and a narrower span) that answers it, with a confidence score. It works best on factual questions where the answer is stated outright in the passage.

Methodology: the DistilBERT-SQuAD pipeline returns a span and a softmax confidence in [0, 1]. The lexical baseline scores each context sentence by keyword overlap with the question. Both numbers are shown so you can verify the model independently. Model F1 on SQuAD-dev: 86.9.

How it works

The tool runs two analyses on every question and shows them side-by-side, so you can sanity-check the model against a transparent baseline. Inference happens on the server — no model weights are downloaded into your browser, and the page works on any device that can hit our API.

Neural answer. The passage and question are sent to the Hugging Face Inference API for distilbert/distilbert-base-cased-distilled-squad, a DistilBERT (cased) checkpoint fine-tuned on SQuAD 1.1. The model attends to the passage and the question together, then emits two score vectors with one entry per token position (up to 512): start logits and end logits. The library scores every valid (start, end) pair where start ≤ end and the span length is within the configured cap, then returns the highest-scoring span and its softmax-normalised probability as the "confidence".

Long passages. The DistilBERT context window is 512 sub-word tokens. When your passage exceeds that, the pipeline applies a sliding window of 384 context tokens with a stride of 128 tokens, runs inference on every chunk, then deduplicates overlapping spans and keeps the best one. The result panel shows the chunk count so you can see when this kicks in.
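To make the chunk count concrete, here is a rough sketch of the windowing arithmetic. It assumes that "stride" means the overlap between consecutive windows, as the `doc_stride` parameter of the transformers question-answering pipeline does; if the tool instead treats 128 as the step size, the counts would differ. `chunk_spans` is a hypothetical helper, not part of any library.

```python
def chunk_spans(n_tokens, window=384, overlap=128):
    """Return (start, end) token offsets for each sliding-window chunk.

    Assumption: consecutive windows share `overlap` tokens, so each
    window starts `window - overlap` tokens after the previous one.
    """
    if overlap >= window:
        raise ValueError("overlap must be smaller than the window")
    step = window - overlap
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += step
    return spans


# A 1,000-token passage needs four overlapping chunks under these assumptions.
spans = chunk_spans(1000)
```

A passage that already fits in one window yields a single chunk, which matches the case where the result panel would show a chunk count of 1.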

Lexical baseline. Independently of the model, a tiny keyword-overlap scorer runs over your passage. For each sentence it computes the fraction of your question's content words (everything except stop words) that the sentence contains. A small bonus is added when the sentence matches the question's wh-pattern: numbers and durations for "how long / how many", calendar terms for "when", and capitalised noun phrases for "who / where". The top sentence is returned with a 0–1 score.
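A keyword-overlap scorer of this kind might look roughly like the sketch below. The stop-word list, the bonus size of 0.1, the regular expressions, and the function names are all assumptions made for illustration; the tool's actual values are not given on this page. Only the "how long" duration bonus is sketched, and the other wh-patterns are left out.

```python
import re

# Hypothetical stop-word list; the tool's real list is not published here.
STOP_WORDS = {
    "a", "an", "the", "is", "are", "was", "were", "of", "in", "on", "for",
    "to", "and", "by", "what", "who", "when", "where", "how", "which",
}
NUMBER = r"\b(?:\d+|one|two|three|four|five|six|seven|eight|nine|ten)"
DURATION = re.compile(
    NUMBER + r"\s+(?:second|minute|hour|day|week|month|year)s?\b", re.I
)
WH_BONUS = 0.1  # assumed size of the "small bonus"


def content_words(text):
    """Lowercase, drop possessive 's, tokenise, and remove stop words."""
    text = re.sub(r"['’]s\b", "", text.lower())
    return {w for w in re.findall(r"[a-z0-9]+", text) if w not in STOP_WORDS}


def score_sentence(question, sentence):
    """Fraction of question content words found in the sentence, plus bonus."""
    q_words = content_words(question)
    if not q_words:
        return 0.0
    score = len(q_words & content_words(sentence)) / len(q_words)
    # Bonus when a "how long" question meets a sentence containing a duration.
    if re.match(r"\s*how\s+long\b", question, re.I) and DURATION.search(sentence):
        score += WH_BONUS
    return min(1.0, score)


def best_sentence(question, passage):
    """Split on sentence-ending punctuation and return (score, sentence)."""
    parts = re.split(r"(?<=[.!?])\s+", passage)
    sentences = [s.strip() for s in parts if s.strip()]
    return max((score_sentence(question, s), s) for s in sentences)
```

With the question "How long is the President of Sri Lanka's term?", `content_words` yields long, president, sri, lanka, and term. A sentence containing "President" and "term" plus a duration such as "five years" would score 2/5 plus the assumed bonus.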

Cross-check. The two answer spans are compared with the official SQuAD word-overlap F1 metric (the same one the SQuAD leaderboard reports). High F1 means the two methods picked essentially the same words; low F1 means the model disagreed with the baseline and the answer is worth a second look. Both spans, both confidences, and the F1 are shown in the result tiles.
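For reference, the word-overlap F1 can be sketched as follows. The normalisation steps (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace) follow the SQuAD v1.1 evaluation script as commonly reproduced; the function names here are our own, and the tool's exact implementation may differ.

```python
import collections
import re
import string


def normalize_answer(text):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def squad_f1(span_a, span_b):
    """Token-level F1 between two answer spans after normalisation."""
    a_tokens = normalize_answer(span_a).split()
    b_tokens = normalize_answer(span_b).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    common = collections.Counter(a_tokens) & collections.Counter(b_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(a_tokens)
    recall = num_same / len(b_tokens)
    return 2 * precision * recall / (precision + recall)


neural = "three months"
lexical = ("Either party may terminate this lease by giving the other party "
           "not less than three months written notice.")
f1 = squad_f1(neural, lexical)
```

Here the short span shares both of its tokens with the 17-token normalised sentence, so F1 = 4/19 ≈ 0.21. Two correct answers can still score low when one is much longer than the other.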

Confidence bands. The reported model confidence is calibrated to the model card's dev-set behaviour: ≥ 80% is high (usually correct on clean factoid questions), 50–80% is medium (treat as a candidate), and below 50% is low (the passage probably does not state the answer outright). Both the F1 cross-check and a low-confidence warning render automatically when the score drops into the low band.
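The banding amounts to a simple threshold lookup. The sketch below assumes the boundaries are inclusive at the lower end (exactly 0.80 counts as high, exactly 0.50 as medium), which the page does not spell out; `confidence_band` is a hypothetical name.

```python
def confidence_band(score):
    """Map a 0-1 confidence score to the band names used on this page."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= 0.80:
        return "high"
    if score >= 0.50:
        return "medium"
    return "low"


bands = [confidence_band(x) for x in (0.92, 0.65, 0.33)]
```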

Privacy. Text is sent once for scoring and not stored. No tracking, no ads, no signup. If you need to keep the passage local, the same model can be run via transformers in Python on your own machine; see the sources below.
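For a local run, something along these lines should work with the transformers library installed. It is a sketch rather than the tool's own code: `answer_locally` is a hypothetical wrapper, the `max_seq_len` and `doc_stride` values are set to match the windowing described above on the assumption that the hosted pipeline uses them, and the model weights are downloaded on first use.

```python
def answer_locally(question, context):
    """Run the same checkpoint on your own machine.

    Returns a dict with "answer", "score", "start", and "end" keys.
    """
    # Imported here so the heavy dependency is only loaded when needed.
    from transformers import pipeline

    qa = pipeline(
        "question-answering",
        model="distilbert/distilbert-base-cased-distilled-squad",
    )
    # Assumed to mirror the hosted setup: 384-token windows, 128-token overlap.
    return qa(
        question=question,
        context=context,
        max_seq_len=384,
        doc_stride=128,
    )
```

Calling `answer_locally("How long is the term?", passage)` keeps the passage on your machine. For repeated questions, build the pipeline once and reuse it rather than recreating it on every call.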

Worked examples

Factoid

How long is the President of Sri Lanka's term?

  1. Passage: 2-sentence excerpt from Article 30 of the Constitution.
  2. Question keywords (content words only): {long, president, sri, lanka, term}.
  3. Best sentence by keyword overlap: "The President holds office for a term of five years and is elected directly by the people in a national vote."
  4. Lexical span (wh-pattern "how long" → duration regex): five years
  5. Neural span (DistilBERT-SQuAD): five years · confidence ~0.92
  6. Cross-check F1 ≈ 1.00 → answers agree.

Legal clause

What is the notice period in clause 7?

  1. Passage: short lease clause titled "Notice and termination".
  2. Question keywords: {notice, period, clause, 7}.
  3. Best sentence: "Either party may terminate this lease by giving the other party not less than three months written notice."
  4. Lexical span (no clean wh-pattern for "what is the period") → returns the full sentence.
  5. Neural span: three months · confidence ~0.86
  6. Cross-check F1 < 0.5 because the lexical baseline keeps the whole sentence while the model returns the narrower phrase — both are correct, but the model wins on precision.

Out-of-context

What is the capital of Sri Lanka?

  1. Passage: a recipe for kiribath (milk rice). No mention of capital cities.
  2. Question keywords: {capital, sri, lanka}.
  3. Best sentence by overlap: the opening recipe line (only "Sri Lankan" matches).
  4. Lexical confidence = 1 / 3 ≈ 0.33 → LOW band.
  5. Neural confidence < 0.30 on this input → LOW band.
  6. Cross-check F1 ≈ 0 → big disagreement, both LOW.
  7. Page surfaces the "likely no answer in passage" warning rather than presenting either span as authoritative.

Sources & references

Cross-checked against the official model card and SQuAD reference scorer on 2026-05-12. The page is reviewed when the upstream checkpoint changes or when a clearly better extractive QA backbone (XLM-RoBERTa-SQuAD2 for multilingual, DeBERTa-v3-SQuAD2 for English with no-answer support) becomes available on the free Inference API tier.
