AI Text Summarizer — Free, No Signup, Sources Cited

Paste an article, paper, or meeting notes and get a TL;DR paragraph plus 3–5 key-point bullets. A DistilBART model rewrites the TL;DR on the server while a deterministic TextRank pass picks the most information-dense sentences. No model download, no signup, no upload to a third-party UI.

By Induwara Ashinsana · Updated Jun 4, 2026

What this does

Reads any English passage and returns a TL;DR paragraph plus 3-5 key-point bullets. DistilBART rewrites the text in its own words for the TL;DR; TextRank scores every sentence by shared content and surfaces the highest-information ones as the bullets — in original document order so the summary reads naturally.

Methodology: DistilBART (sshleifer/distilbart-cnn-12-6) for the abstractive TL;DR and TextRank (Mihalcea & Tarau, 2004) for the extractive bullets. The HF Inference API is stateless — your text is sent once for scoring and not stored. Sources linked under “Sources” below.

How it works

The page runs two complementary summarization methods on the same input and shows both. They are good at different things, and reading them side by side is the fastest way to judge whether the summary can be trusted on a particular passage.

1. DistilBART abstractive (server-side)

The TL;DR paragraph comes from sshleifer/distilbart-cnn-12-6, a distillation of BART with 12 encoder layers and 6 decoder layers, fine-tuned on the CNN/DailyMail summarization corpus (Shleifer & Rush, 2020). We call the Hugging Face Inference API (api-inference.huggingface.co) from a server-only Next.js route handler with do_sample=false for deterministic output. The browser never downloads model weights: only your text is sent, and only a small JSON response comes back. Generation length is bounded by the length preset you pick:

  • Short: min 40 tokens, max 80 tokens, 3 extractive bullets.
  • Medium: min 70 tokens, max 140 tokens, 4 extractive bullets.
  • Long: min 110 tokens, max 220 tokens, 5 extractive bullets.
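As a rough sketch, the presets above could map onto an Inference API request like this. The preset object, the function names, and the way the token is passed in are assumptions for illustration, not the page's actual route handler. The parameter names (min_length, max_length, do_sample) and the summary_text response field follow the API's summarization task format.

```typescript
// Length presets as listed above; the object shape and names are illustrative.
type Preset = "short" | "medium" | "long";

interface PresetConfig {
  minTokens: number;
  maxTokens: number;
  bullets: number;
}

const PRESETS: Record<Preset, PresetConfig> = {
  short: { minTokens: 40, maxTokens: 80, bullets: 3 },
  medium: { minTokens: 70, maxTokens: 140, bullets: 4 },
  long: { minTokens: 110, maxTokens: 220, bullets: 5 },
};

const MODEL_URL =
  "https://api-inference.huggingface.co/models/sshleifer/distilbart-cnn-12-6";

// Build the JSON body for one summarization call.
function buildSummarizationBody(text: string, preset: Preset) {
  const { minTokens, maxTokens } = PRESETS[preset];
  return {
    inputs: text,
    parameters: {
      min_length: minTokens,
      max_length: maxTokens,
      do_sample: false, // no sampling, so the same input gives the same output
    },
  };
}

// Server-side call (hypothetical helper). The token must never reach the browser.
async function requestTldr(text: string, preset: Preset, token: string): Promise<string> {
  const res = await fetch(MODEL_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildSummarizationBody(text, preset)),
  });
  if (!res.ok) throw new Error(`Inference API returned ${res.status}`);
  const data = (await res.json()) as Array<{ summary_text: string }>;
  return data[0].summary_text;
}
```

Keeping the body builder separate from the network call makes the preset mapping easy to unit-test without hitting the API.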

2. TextRank extractive (always on)

The bullets come from TextRank (Mihalcea & Tarau, EMNLP 2004) — a PageRank-style algorithm run over a sentence-similarity graph. Text is split into sentences on terminal punctuation. Each sentence becomes a node; the weight of the edge between two sentences is the Mihalcea & Tarau similarity:

$$\mathrm{sim}(S_i, S_j) = \frac{|S_i \cap S_j|}{\log|S_i| + \log|S_j|}$$

where S_i is the set of content tokens in sentence i (after Unicode normalisation, lowercasing, and removing the 126-word Snowball English stop-word list). PageRank then iterates with damping factor 0.85 until the scores change by less than a tolerance of 1e-4 or 30 iterations have run, whichever comes first. The top-scoring sentences are returned in original document order, which keeps the bullets reading naturally instead of being shuffled by importance.
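The ranking described above can be sketched in a few dozen lines. This is a minimal illustration, not the page's code: the stop-word set is a tiny stand-in for the Snowball list, the tokenizer regex is an assumption, the teleport term is normalised by the sentence count (one common PageRank variant), and all function names are hypothetical.

```typescript
// Tiny illustrative stop-word set; a stand-in for the Snowball English list.
const STOP_WORDS = new Set(["the", "a", "an", "of", "and", "in", "to", "is", "on", "by", "its"]);
const WORD = new RegExp("[\\p{L}\\p{N}]+", "gu"); // assumed Unicode-aware tokenizer

// Content tokens of a sentence as a set: normalise, lowercase, drop stop words.
function contentTokens(sentence: string): Set<string> {
  const words: string[] = sentence.normalize("NFKC").toLowerCase().match(WORD) || [];
  return new Set(words.filter((w) => !STOP_WORDS.has(w)));
}

// Mihalcea & Tarau similarity: shared tokens over the sum of log set sizes.
function similarity(a: Set<string>, b: Set<string>): number {
  const denom = Math.log(a.size) + Math.log(b.size);
  if (a.size === 0 || b.size === 0 || denom <= 0) return 0; // guards empty and one-token sets
  let shared = 0;
  a.forEach((t) => { if (b.has(t)) shared += 1; });
  return shared / denom;
}

// Weighted PageRank over the sentence-similarity graph.
function textRank(sentences: string[], damping = 0.85, tol = 1e-4, maxIter = 30): number[] {
  const n = sentences.length;
  const sets = sentences.map(contentTokens);
  const w = sets.map((a, i) => sets.map((b, j) => (i === j ? 0 : similarity(a, b))));
  const outSum = w.map((row) => row.reduce((s, x) => s + x, 0));
  let scores: number[] = new Array(n).fill(1 / n);
  for (let iter = 0; iter < maxIter; iter++) {
    const next = scores.map((_, i) => {
      let incoming = 0;
      for (let j = 0; j < n; j++) {
        if (outSum[j] > 0) incoming += (w[j][i] / outSum[j]) * scores[j];
      }
      return (1 - damping) / n + damping * incoming;
    });
    const delta = next.reduce((m, x, i) => Math.max(m, Math.abs(x - scores[i])), 0);
    scores = next;
    if (delta < tol) break; // converged before the iteration cap
  }
  return scores;
}

// Indices of the k best sentences, re-sorted into document order.
function topSentences(sentences: string[], k: number): number[] {
  return textRank(sentences)
    .map((score, index) => ({ score, index }))
    .sort((x, y) => y.score - x.score || x.index - y.index)
    .slice(0, k)
    .map((x) => x.index)
    .sort((x, y) => x - y);
}
```

Ties are broken by sentence position so the result stays deterministic for identical input.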

TextRank is pure JavaScript with no API call, so it always runs even when neural inference is not configured on this build. It is also fully deterministic — the same text always produces the same bullets — which is useful when you need a summary you can reproduce offline or when comparing two versions of a document.

3. Non-English handling

DistilBART-CNN was trained on English news and the stop-word list is English-only, so the page flags input it cannot handle well rather than summarizing it silently. Sentences whose letters are more than 70% non-Latin script are flagged in the breakdown and excluded from the bullet ranking. For Sinhala or Tamil text we recommend translating to English first with our AI Translator and then summarizing. The TL;DR still runs through the model on mixed content, but output quality drops.

4. Compression and reading-time stats

Alongside the summary the page reports four numbers so you can judge the result at a glance. Key points is the bullet count over the total sentence count. Compression is 1 − (summary words ÷ original words) — a 1,000-word article cut to a 120-word summary reads as 88% compression. Read time saved converts both word counts to minutes at 220 words per minute, the same reading speed our reading-time estimator uses, then shows the difference. Word counts come from the same Unicode-aware token regex that powers the word counter, so the figures line up if you check them independently.
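The arithmetic behind these figures is simple enough to check by hand. The sketch below reproduces it under stated assumptions: word and sentence counts are passed in rather than computed, the key-points figure is expressed as a ratio, and the function and field names are hypothetical.

```typescript
const WORDS_PER_MINUTE = 220; // reading speed stated above

interface SummaryStats {
  keyPointRatio: number; // bullets / total sentences
  compression: number;   // 1 - summary words / original words
  minutesSaved: number;  // original read time minus summary read time
}

function summaryStats(
  originalWords: number,
  summaryWords: number,
  bullets: number,
  totalSentences: number,
): SummaryStats {
  // Avoid dividing by zero on empty input.
  if (originalWords <= 0 || totalSentences <= 0) {
    return { keyPointRatio: 0, compression: 0, minutesSaved: 0 };
  }
  return {
    keyPointRatio: bullets / totalSentences,
    compression: 1 - summaryWords / originalWords,
    minutesSaved: (originalWords - summaryWords) / WORDS_PER_MINUTE,
  };
}

// The 1,000-word article cut to 120 words from the text above.
const demo = summaryStats(1000, 120, 4, 40);
console.log(`${Math.round(demo.compression * 100)}% compression`); // prints "88% compression"
console.log(`${demo.minutesSaved.toFixed(1)} min saved`); // prints "4.0 min saved"
```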

The summarizer is one half of a workflow. To pull the standalone terms a passage keeps returning to, run the bullets through the AI keyword extractor; to check that a draft summary fits inside a model’s context window before you paste it into a chatbot, the AI token counter gives the exact token count. DistilBART itself reads at most 1,024 tokens, which is why very long inputs are truncated rather than rejected — see the worked example below.

Both passes are deterministic given the same input and length preset. The HF Inference API is stateless: every call is independent and no identifiers are sent. If the API is rate-limited or unreachable, the page still returns the extractive bullets and uses the top-ranked sentence as a fallback TL;DR, so it never shows a dead state.
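One way to picture the fallback is as a pure function that assembles the response once the inference call has either returned text or failed. The result shape and names below are assumptions for illustration. A null model output stands for any failure (rate limit, timeout, or inference not configured).

```typescript
interface SummaryResult {
  tldr: string;
  bullets: string[];
  source: "model" | "extractive-fallback";
}

// modelTldr is null when the inference call failed or was skipped.
function assembleSummary(
  modelTldr: string | null,
  rankedBullets: string[],
  topSentence: string,
): SummaryResult {
  if (modelTldr !== null && modelTldr.trim().length > 0) {
    return { tldr: modelTldr.trim(), bullets: rankedBullets, source: "model" };
  }
  // No usable model output: reuse the best extractive sentence as the TL;DR.
  return { tldr: topSentence, bullets: rankedBullets, source: "extractive-fallback" };
}
```

Because the extractive pass needs no network access, the bullets are available on both paths.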

Worked examples

Long news article

The Central Bank of Sri Lanka cut its standing deposit and lending facility rates by 50 basis points each on Wednesday, marking the seventh consecutive easing decision … (about 200 words).

  1. Splits into 8 sentences after period-boundary detection.
  2. Stop-word filter leaves a content-token graph of 8 nodes, average 12 tokens per node.
  3. TextRank converges in ~12 iterations. Top 4 by score: rate cut, inflation context, market reaction, forward guidance.
  4. Bullets returned in original document order so the timeline reads naturally.
  5. DistilBART rewrites a 2-sentence TL;DR that names the rates, the direction, and the headline reason.

Short paragraph (edge case)

The meeting is at 3pm in the boardroom. Bring the Q3 deck.

  1. Only 2 sentences — below MIN_SENTENCES_FOR_RANKING.
  2. TextRank degenerates to ordering by length; both sentences are returned as bullets.
  3. Short-input warning is shown; the user can still summarize.
  4. Model TL;DR collapses to a single sentence ("3pm boardroom meeting; bring Q3 deck") or returns near-verbatim.

Very long input (token-window edge case)

A feature article of about 3,000 words pasted in full (under the 20,000-character limit, but far past DistilBART's 1,024-token encoder window).

  1. Input passes validation — it is under the 20,000-character cap.
  2. TextRank runs over the whole article (up to the 200-sentence scoring cap) so the extractive bullets still cover the full piece.
  3. DistilBART only sees the first ~1,024 tokens; the server truncates before the inference call, so the abstractive TL;DR reflects the opening, not the tail.
  4. Takeaway: trust the bullets for coverage of a long piece. For a faithful TL;DR of the whole thing, summarize section by section, then summarize those summaries.
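The section-by-section approach in the takeaway can be sketched as a two-level pass. Everything here is an assumption for illustration: the chunker splits on a fixed word budget as a crude proxy for the 1,024-token window (700 words is a guess, not a measured value), it may cut mid-sentence, and the summarizer is injected so any backend can be plugged in.

```typescript
// Split text into consecutive chunks of at most maxWords words.
function chunkByWords(text: string, maxWords: number): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}

// Summarize each chunk, then summarize the concatenated partial summaries.
function summarizeLong(
  text: string,
  summarize: (chunk: string) => string,
  maxWords = 700, // assumed word budget that stays under the token window
): string {
  const chunks = chunkByWords(text, maxWords);
  if (chunks.length <= 1) return summarize(text); // short input: one pass is enough
  const partials = chunks.map((chunk) => summarize(chunk));
  return summarize(partials.join(" "));
}
```

This sketch does a single reduce step. If the joined partial summaries are themselves longer than the window, the final call is truncated in the same way, so very long inputs would need another level.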

Mixed-language passage

"The Vesak full-moon poya is observed across the country. ආගමික වැඩසටහන් කොළඹ දී පැවැත්වේ. Public holidays follow."

  1. splitSentences returns 3 sentences.
  2. isMostlyNonLatin flags the middle sentence as non-Latin (>70% Sinhala script).
  3. That sentence is excluded from the PageRank graph but still listed in the breakdown.
  4. Bullets are picked from the two English sentences only.
  5. Stats tile reports '1 non-Latin sentence skipped'.
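The flagging step in this example can be approximated with Unicode property escapes. The names splitSentences and isMostlyNonLatin come from the walkthrough above, but the bodies below are guesses at how they might work, not the page's code. The splitter is deliberately naive and breaks on abbreviations.

```typescript
const LETTER = new RegExp("\\p{L}", "gu");
const LATIN = new RegExp("^\\p{Script=Latin}$", "u");
const SENTENCE = new RegExp("[^.!?]+(?:[.!?]+|$)", "g");

// Naive split on terminal punctuation; trailing text without punctuation is kept.
function splitSentences(text: string): string[] {
  const parts: string[] = text.match(SENTENCE) || [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// True when more than `threshold` of the sentence's letters are outside the Latin script.
function isMostlyNonLatin(sentence: string, threshold = 0.7): boolean {
  const letters: string[] = sentence.match(LETTER) || [];
  if (letters.length === 0) return false; // digits or punctuation only: nothing to flag
  const nonLatin = letters.filter((ch) => !LATIN.test(ch)).length;
  return nonLatin / letters.length > threshold;
}

const passage =
  "The Vesak full-moon poya is observed across the country. ආගමික වැඩසටහන් කොළඹ දී පැවැත්වේ. Public holidays follow.";
const sentences = splitSentences(passage);
const rankable = sentences.filter((s) => !isMostlyNonLatin(s));
console.log(`${sentences.length - rankable.length} non-Latin sentence skipped`);
```

Only letters are counted, so digits, spaces, punctuation, and combining marks do not shift the ratio.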

Sources & references
