AI Keyword Extractor — Free, In-Browser, No Upload
Paste an article, abstract, or blog draft and get its most important keywords and key phrases, ranked. Runs two deterministic extractors — YAKE and RAKE — side by side, entirely in your browser. No signup, no model download, sources cited.
How it works
The page runs two independent extractors over the same input and shows both, side by side. They use completely different methods, and watching them agree (or disagree) is the fastest way to judge how trustworthy a keyword set is for your text.
1. Shared pre-processing
The input is NFKC-normalized, split into sentences on terminal punctuation, then tokenized on Unicode letter boundaries. Tokens shorter than two characters and pure-numeric runs are dropped. From the surviving tokens the page builds candidate n-grams up to the user-selected length, admitting an n-gram only if neither its first nor its last token is a Snowball English stopword (179 terms).
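A minimal TypeScript sketch of the pre-processing above. Several details are assumptions made for illustration: the sentence-split and token regexes, lower-casing of candidates, n-grams never crossing a sentence boundary, and the ten-word `STOPWORDS_SAMPLE` set. The page itself uses the full Snowball list.

```typescript
// Sketch of the shared pre-processing step (illustrative, not the page's code).
// ASSUMPTIONS: the regexes below, lower-cased candidates, no n-gram crossing a
// sentence boundary, and this ten-word sample in place of the 179-term list.
const STOPWORDS_SAMPLE = new Set<string>([
  "the", "of", "and", "a", "in", "is", "to", "for", "on", "with",
]);

// NFKC-normalize, then split after terminal punctuation followed by whitespace.
function splitSentences(text: string): string[] {
  return text
    .normalize("NFKC")
    .split(/(?<=[.!?])\s+/u)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Runs of Unicode letters/digits; drop 1-character tokens and pure numbers.
// Original case is kept because YAKE's TCase feature needs it.
function tokenize(sentence: string): string[] {
  const raw: string[] = sentence.match(/[\p{L}\p{N}]+/gu) ?? [];
  return raw.filter((t) => t.length >= 2 && !/^\p{N}+$/u.test(t));
}

// Build n-grams of up to maxN tokens whose first and last tokens are not stopwords.
function candidateNgrams(
  text: string,
  maxN: number,
  stop: Set<string> = STOPWORDS_SAMPLE,
): string[] {
  const out: string[] = [];
  for (const sentence of splitSentences(text)) {
    const toks = tokenize(sentence).map((t) => t.toLowerCase());
    for (let i = 0; i < toks.length; i++) {
      for (let n = 1; n <= maxN && i + n <= toks.length; n++) {
        const gram = toks.slice(i, i + n);
        if (stop.has(gram[0]) || stop.has(gram[n - 1])) continue; // stopword edge
        out.push(gram.join(" "));
      }
    }
  }
  return out;
}
```

Stopwords may still appear inside a longer n-gram (with maxN = 3, "state of art" would pass), because only the edges are checked.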
2. YAKE (statistical extractor)
YAKE (Campos et al. 2020, Information Sciences 509) scores every single token using five local features:
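TCase = max(TF_UPPER, TF_ACRONYM) / log2(1 + TF)
TPos = log(log(3 + medianSentenceIndex))
TFNorm = TF / (meanTF + stdTF)
TRel = 1 + (DL + DR) * TF / maxTF
TSentence = sentenceFreq / totalSentences
S(token) = (TPos * TRel) / (TCase + (TFNorm + TSentence) / TRel)
Here TF is the token's frequency; TF_UPPER and TF_ACRONYM count its capitalized and all-caps occurrences; medianSentenceIndex is the median index of the sentences it occurs in; meanTF, stdTF, and maxTF are document-level frequency statistics; DL and DR measure how many distinct words appear to its left and right relative to its total co-occurrences; and sentenceFreq is the number of sentences containing it.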
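Lower S(token) means more keyword-like. For an n-gram of tokens (t₁ … tₙ) with raw frequency KF, this page scores the candidate as S(kw) = mean(S(tᵢ)) / (1 + KF), so lower is again better. Campos et al. combine token scores differently, as a product: S(kw) = ∏ S(tᵢ) / (KF · (1 + Σ S(tᵢ))). The extractor is fully deterministic: the same input always yields the same ranking.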
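The YAKE formulas translate almost line for line into code. This sketch takes the feature counts as pre-computed inputs; counting them from text is left out to keep it short. The `YakeFeatures` interface and both function names are hypothetical, and the aggregation follows the mean-based candidate rule stated on this page.

```typescript
// Hypothetical container for one token's pre-computed YAKE inputs.
interface YakeFeatures {
  tf: number;                  // raw frequency of the token (>= 1)
  tfUpper: number;             // capitalized occurrences
  tfAcronym: number;           // all-caps occurrences
  medianSentenceIndex: number; // median index of sentences containing it
  meanTF: number;              // document-level frequency statistics
  stdTF: number;
  maxTF: number;
  dl: number;                  // left-context spread (DL)
  dr: number;                  // right-context spread (DR)
  sentenceFreq: number;        // sentences containing the token
  totalSentences: number;
}

// S(token): lower means more keyword-like.
function yakeTokenScore(f: YakeFeatures): number {
  const tCase = Math.max(f.tfUpper, f.tfAcronym) / Math.log2(1 + f.tf);
  const tPos = Math.log(Math.log(3 + f.medianSentenceIndex));
  const tfNorm = f.tf / (f.meanTF + f.stdTF);
  const tRel = 1 + ((f.dl + f.dr) * f.tf) / f.maxTF;
  const tSentence = f.sentenceFreq / f.totalSentences;
  return (tPos * tRel) / (tCase + (tfNorm + tSentence) / tRel);
}

// S(kw) as stated on this page: mean token score divided by (1 + KF).
function yakeCandidateScore(tokenScores: number[], kf: number): number {
  const mean = tokenScores.reduce((a, b) => a + b, 0) / tokenScores.length;
  return mean / (1 + kf);
}
```

Raising `tfUpper` raises TCase in the denominator, so a token that is often capitalized (a likely proper noun) gets a lower, better score.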
3. RAKE (co-occurrence extractor)
RAKE (Rose et al. 2010) takes a different angle on the same text. It splits the document into candidate phrases at every stopword and every punctuation boundary, giving a list of maximal runs of content words. For each token w it then counts:
freq(w) = total occurrences of w across all phrases
deg(w) = sum of phrase lengths over phrases containing w
wordScore(w) = deg(w) / freq(w)
S(phrase) = sum of wordScore(w) over the phrase's tokens
Higher S(phrase) means more keyword-like. RAKE is particularly effective on technical text where the same content words co-occur in long compound phrases (“reduced graphene oxide”, “impedance spectroscopy plot”) — the degree count rewards exactly those compounds.
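The RAKE steps can be sketched in the same style. This version makes several assumptions for illustration: it uses a small hypothetical stopword sample, treats any character that is not a letter, digit, or whitespace as a phrase boundary, and lower-cases everything. Each occurrence of a word adds the length of its phrase to `deg`, so a phrase that repeats is counted each time.

```typescript
// ASSUMPTION: a small illustrative stopword sample, not the page's list.
const RAKE_STOP_SAMPLE = new Set<string>([
  "the", "of", "and", "a", "an", "in", "is", "to", "for", "on", "with", "by", "from",
]);

// Split at punctuation, then at stopwords, keeping maximal runs of content words.
function rakePhrases(text: string, stop: Set<string>): string[][] {
  const phrases: string[][] = [];
  for (const fragment of text.normalize("NFKC").toLowerCase().split(/[^\p{L}\p{N}\s]+/u)) {
    let current: string[] = [];
    for (const word of fragment.split(/\s+/u).filter((w) => w.length > 0)) {
      if (stop.has(word)) {
        if (current.length > 0) phrases.push(current); // stopword closes a phrase
        current = [];
      } else {
        current.push(word);
      }
    }
    if (current.length > 0) phrases.push(current); // punctuation closes a phrase
  }
  return phrases;
}

// Score every candidate phrase as the sum of deg(w) / freq(w) over its words.
function rakeScores(text: string, stop: Set<string> = RAKE_STOP_SAMPLE): Map<string, number> {
  const phrases = rakePhrases(text, stop);
  const freq = new Map<string, number>();
  const deg = new Map<string, number>();
  for (const phrase of phrases) {
    for (const w of phrase) {
      freq.set(w, (freq.get(w) ?? 0) + 1);
      deg.set(w, (deg.get(w) ?? 0) + phrase.length);
    }
  }
  const scores = new Map<string, number>();
  for (const phrase of phrases) {
    const s = phrase.reduce((acc, w) => acc + deg.get(w)! / freq.get(w)!, 0);
    scores.set(phrase.join(" "), s);
  }
  return scores;
}
```

On "Reduced graphene oxide and graphene films." this gives 8.5 for "reduced graphene oxide" and 4.5 for "graphene films": the three-word compound gives each of its words a degree of 3, which is the effect described above.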
4. Method agreement
The page reports the percentage of phrases that appear in both YAKE's top-K and RAKE's top-K. High agreement (≥ 60%) means both extractors converge on the same vocabulary, which is a strong credibility signal: when a statistical extractor and a co-occurrence extractor independently pick the same phrases, those phrases are very likely central to the document. Low agreement is informative too. It usually flags input where word frequency alone is misleading, such as lists, code, or text in which the same function words keep appearing next to content words.
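One plausible reading of the overlap figure is the number of phrases shared by the two top-K lists divided by K. The page does not say how it matches phrases, so this sketch assumes exact matching after lower-casing and collapsing whitespace; the function name is hypothetical.

```typescript
// Hypothetical: overlap = |topK(YAKE) ∩ topK(RAKE)| / K, as a percentage.
// ASSUMPTION: exact match after lower-casing and whitespace normalization.
function agreementPercent(yakeTop: string[], rakeTop: string[]): number {
  const norm = (p: string) => p.toLowerCase().trim().replace(/\s+/g, " ");
  const k = Math.max(yakeTop.length, rakeTop.length);
  if (k === 0) return 0;
  const rakeSet = new Set(rakeTop.map(norm));
  // A Set keeps duplicates in the YAKE list from being counted twice.
  const shared = new Set(yakeTop.map(norm).filter((p) => rakeSet.has(p)));
  return (100 * shared.size) / k;
}
```

With three of five phrases shared (for example "graphene oxide", "films", "plot"), this returns 60, exactly at the high-agreement threshold.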
5. Why not KeyBERT / a neural extractor?
KeyBERT (Grootendorst 2020) and similar embedding-based extractors require shipping a 20–90 MB sentence-transformer model to the browser, or routing your text through a paid inference API. Neither fits the privacy and zero-friction goals of this site. Campos et al. 2020 benchmark YAKE within a few percentage points of supervised neural extractors on standard datasets (SemEval-2010, Inspec, DUC-2001), with no training and no inference cost — so the trade-off is worthwhile.
6. Validation and limits
Inputs shorter than 50 characters are rejected: a fragment that short does not contain enough context for either method to rank candidates meaningfully. Inputs longer than 20,000 characters are rejected to keep extraction well under 100 ms even on older hardware. For book-length material, run the extractor on each section separately. The number of returned keywords is bounded between 1 and 30; the n-gram length is bounded between 1 and 3 tokens.
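These bounds amount to a small guard function. The names and result type in this sketch are hypothetical. It counts characters as Unicode code points via `Array.from`; the page does not say whether it counts code points or UTF-16 code units.

```typescript
// Limits taken from this section; names are hypothetical.
const MIN_CHARS = 50;
const MAX_CHARS = 20000;

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validateRequest(text: string, topK: number, maxNgram: number): ValidationResult {
  const chars = Array.from(text).length; // code points, not UTF-16 units
  if (chars < MIN_CHARS) return { ok: false, reason: `text must be at least ${MIN_CHARS} characters` };
  if (chars > MAX_CHARS) return { ok: false, reason: `text must be at most ${MAX_CHARS} characters` };
  if (!Number.isInteger(topK) || topK < 1 || topK > 30) {
    return { ok: false, reason: "keyword count must be an integer from 1 to 30" };
  }
  if (!Number.isInteger(maxNgram) || maxNgram < 1 || maxNgram > 3) {
    return { ok: false, reason: "n-gram length must be an integer from 1 to 3" };
  }
  return { ok: true };
}
```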
7. Using it in a content workflow
Keyword extraction is most useful as one step in a larger editing pass. A common sequence: draft the piece, run it through the AI Text Summarizer to confirm the lede actually states the main point, then run it through this extractor to check that your target phrase sits in the top five. If the summary and the keyword list both circle the same idea, the draft is on-topic; if they diverge, the body is burying the lede. For reviews, testimonials, or support replies, pair the keyword list with the AI Sentiment Analyzer so you can see both what a passage is about and the tone it carries. And because the stopword list here is English-only, run mixed-language drafts through the AI Language Detector first: if a passage is mostly Sinhala or Tamil, expect some function words to leak into the ranking until a validated stopword set for those languages ships.
Everything runs in your browser. Neither extractor ever touches the network — the page is fully usable offline once loaded.
Sources & references
- Campos et al., 2020: YAKE! Keyword extraction from single documents using multiple local features (Information Sciences 509)
- LIAAD/yake: reference Python implementation of YAKE
- Rose et al., 2010: Automatic Keyword Extraction from Individual Documents (RAKE), in Text Mining: Applications and Theory
- Grootendorst, 2020: KeyBERT (cited as related literature; not used here)
- Snowball project (Martin Porter): English stopword list
Reference papers, formulas, and the stopword list were last cross-checked on 2026-05-12. The page is reviewed whenever a worked example regresses or one of the reference papers publishes a revised formula.
Comments & feedback
Found a keyword that misfired, an edge case, or want a different stopword set?
Email me at [email protected] — most fixes ship within 24 hours.