Question 1

How do you extract keywords from a text automatically?

Accepted Answer

Two algorithms run in parallel on the same input. YAKE (Campos et al. 2020) scores each candidate phrase using five local features — case, position, frequency, neighbour diversity, and sentence spread. RAKE (Rose et al. 2010) splits the document at stopword and punctuation boundaries, then ranks each phrase by the sum of its tokens' degree-over-frequency scores. A check mark appears next to phrases that both methods agree on — those are the highest-confidence terms.
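The RAKE half of this answer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the page's own JavaScript: the tiny stopword set, the tokenising regex, and the function names candidate_phrases and rake_scores are all made up for the example.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword set (an assumption); a real list is far longer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "is", "are", "to", "with"}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopword boundaries."""
    phrases = []
    # Punctuation always ends a phrase, so cut the text into fragments first.
    for fragment in re.split(r'[.,;:!?()\[\]"\n]+', text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9'-]+", fragment):
            if word in STOPWORDS:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Score each phrase as the sum of degree/frequency over its words."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences inside the phrase, the word itself included.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    return {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}


scores = rake_scores(
    "Graph neural networks are popular. Training graph neural networks is slow."
)
for phrase, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{score:5.1f}  {phrase}")
```

In this sample, "training graph neural networks" scores 14.5 and "graph neural networks" scores 10.5, while the single words "popular" and "slow" score 1.0 each. That shows how summing degree-over-frequency favours longer recurring compounds.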

Question 2

What is the best free keyword extractor?

Accepted Answer

For SEO drafting, a deterministic statistical extractor beats a black-box embedding model because the score is auditable: you can trace exactly why a phrase ranked where it did. This page runs YAKE and RAKE — the two most-cited deterministic algorithms in the literature — side by side, marks the phrases they agree on, and lets you copy the result as CSV, JSON, line-separated, or #hashtag form. No paid API, no signup, no model download.
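The four copy formats mentioned above could be produced along these lines. This is a hedged sketch only: the function name export_keywords, the format labels, the column names, and the CamelCase hashtag rule are assumptions for illustration, not the page's actual output.

```python
import csv
import io
import json


def export_keywords(keywords, fmt):
    """Serialise a ranked list of (phrase, score) pairs in one of four formats."""
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["phrase", "score"])  # header row (assumed column names)
        writer.writerows(keywords)
        return buffer.getvalue()
    if fmt == "json":
        return json.dumps([{"phrase": p, "score": s} for p, s in keywords])
    if fmt == "lines":
        return "\n".join(p for p, _ in keywords)
    if fmt == "hashtags":
        # Collapse each phrase into one CamelCase token, dropping non-alphanumerics.
        tags = []
        for phrase, _ in keywords:
            words = ["".join(ch for ch in w if ch.isalnum()) for w in phrase.split()]
            tags.append("#" + "".join(w[:1].upper() + w[1:] for w in words if w))
        return " ".join(tags)
    raise ValueError(f"unknown format: {fmt}")


sample = [("keyword extraction", 0.12), ("seo", 0.3)]
print(export_keywords(sample, "hashtags"))
```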

Question 3

What algorithm is used to extract keywords from text?

Accepted Answer

YAKE assigns each candidate n-gram a score equal to the mean of its tokens' single-term scores divided by (1 + KF), where KF is the n-gram's raw frequency. Single-term scores combine TCase (proper-noun signal), TPos (position bias), TFNorm (frequency relative to document), TRel (neighbour diversity), and TSentence (sentence spread). RAKE computes a word score = degree / frequency over the document, then sums word scores into a phrase score. Both formulas are documented under "How it works" on this page.
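The n-gram aggregation described above can be sketched as follows. The sketch follows the wording of this answer rather than any reference implementation, and it treats the single-term scores as given inputs. The function names raw_frequency and ngram_score and the sample values are assumptions for illustration.

```python
from statistics import mean


def raw_frequency(tokens, ngram):
    """Count how often the n-gram occurs as a contiguous run in the token list (KF)."""
    n = len(ngram)
    target = tuple(ngram)
    return sum(
        1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == target
    )


def ngram_score(term_scores, kf):
    """Mean of the tokens' single-term scores divided by (1 + KF)."""
    if not term_scores:
        raise ValueError("an n-gram needs at least one token")
    return mean(term_scores) / (1 + kf)


# Hypothetical single-term scores for a two-token phrase.
tokens = ["sri", "lanka", "tea", "exports", "sri", "lanka"]
kf = raw_frequency(tokens, ("sri", "lanka"))
score = ngram_score([0.25, 0.75], kf)
```

Here the phrase occurs twice, so KF is 2 and the mean single-term score of 0.5 is divided by 3.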

Question 4

Can AI find SEO keywords in an article?

Accepted Answer

Yes — for on-page SEO drafting. Paste your article and the extractor returns the noun phrases your text actually emphasises, ranked by importance. Treat the list as a starting point for headings, image alt text, and meta-description tokens. It does not give you search volume or competition (that requires a paid index like Ahrefs or Semrush) but it does tell you whether your draft is about what you think it is about.

Question 5

What is the difference between YAKE and RAKE?

Accepted Answer

YAKE scores individual tokens on five local signals (case, position, frequency, neighbour diversity, and sentence spread), then aggregates them per phrase, so it favours topic-bearing terms and named entities that appear early. RAKE ignores position entirely: it scores each word by degree-over-frequency and sums those scores across a phrase, which rewards recurring multi-word technical compounds. Running both and trusting the phrases that appear in both lists is more reliable than relying on either method alone.

Question 6

How many keywords should I extract from one article?

Accepted Answer

For a single blog post or page, eight to twelve phrases usually cover the main topic and two or three sub-topics without diluting focus. Use the slider to set the count. Extracting thirty from a short article tends to drag in incidental phrases; extracting three from a long one misses sub-topics. Start at ten and adjust based on how tightly the list clusters around one subject.

Question 7

How do you pick keywords for a blog post in Sri Lanka?

Accepted Answer

Write the post first, then run it through this extractor. Look at the top 10 phrases and check that your target search term is among them. If your post about Trincomalee beaches surfaces "Uppuveli sand" and "Pigeon Island" but not "Trincomalee beach", mention the target term more often and more prominently. For specifically Sri Lankan terms (place names, brand names), YAKE's proper-noun signal (TCase) usually surfaces them automatically. Once the page is live, cross-check with Google Search Console to confirm that the queries you actually rank for match the keywords the extractor flagged.
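The top-10 check can be automated with a small helper. This is a sketch under assumptions: the function name missing_targets is hypothetical, and it uses case-insensitive substring matching, which is only one possible way to decide that a target term is "among" the extracted phrases.

```python
def missing_targets(ranked_phrases, targets, top_n=10):
    """Return the target terms that no phrase in the top_n list contains."""
    top = [phrase.lower() for phrase in ranked_phrases[:top_n]]
    return [t for t in targets if not any(t.lower() in phrase for phrase in top)]


# Phrases taken from the example in the answer above.
extracted = ["Uppuveli sand", "Pigeon Island"]
gaps = missing_targets(extracted, ["Trincomalee beach"])
```

A non-empty result, as in this case, means the draft needs stronger mentions of the listed terms.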

Question 8

Can I extract keywords from a PDF, a URL, or a YouTube video?

Accepted Answer

Not directly — the tool reads text you paste into the box, not files or links. Copy the relevant text out of your PDF, web page, or video transcript and paste it in. That keeps everything on your device with no fetching, no upload, and no third-party request. For very long documents, paste one section at a time; both algorithms are single-document by design and sharper on focused passages.

Question 9

Is anything uploaded? Where does my text go?

Accepted Answer

Nothing leaves your device. Both extractors are pure JavaScript. There is no API call, no model download, no logging, no telemetry. The page works offline once loaded.

Question 10

Why don't you use a BERT or embedding-based extractor?

Accepted Answer

KeyBERT and similar embedding-based extractors need a sentence-transformer model (typically 20–90 MB) downloaded into the browser. That is a large cost for a tool that should answer in milliseconds, and Sri Lankan mobile connections are not always generous. YAKE and RAKE are deterministic, instant, and produce competitive rankings for the single-document case — Campos et al. 2020 benchmark YAKE within 2–5% of supervised neural extractors on SemEval and Inspec, with no training and no inference cost.

Question 11

Does it work with Sinhala or Tamil text?

Accepted Answer

Partially. Both YAKE and RAKE are largely language-agnostic: they rely on positional, frequency, and co-occurrence features rather than language-specific vocabulary, so they will return reasonable results on Sinhala or Tamil prose. The main language-specific dependency is the stopword list (179 English terms from Snowball/Porter). Until validated Sinhala and Tamil stopword sets are added, common function words from those languages may appear in the top results. YAKE's capitalisation signal (TCase) also contributes nothing here, because neither script has letter case. A multilingual stopword option is on the roadmap.

Question 12

Why do YAKE and RAKE sometimes disagree?

Accepted Answer

YAKE rewards phrases whose component tokens appear early, in varied syntactic contexts, and in capitalized form — strong signals for topic-bearing terms like named entities. RAKE rewards phrases whose tokens have a high co-occurrence degree across the document — better for long technical compounds ("reduced graphene oxide electrodes"). When they disagree, look at the disagreement itself: a phrase only YAKE surfaces is usually a proper noun; a phrase only RAKE surfaces is usually a recurring technical compound. Both are legitimate keywords; the agreement column flags the safest bets.
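The agreement and disagreement groups can be computed by partitioning the two result lists. This is a minimal sketch: the function name compare_methods and the normalisation rule (lower-casing and collapsing whitespace) are assumptions, not the page's actual matching logic.

```python
def compare_methods(yake_phrases, rake_phrases):
    """Partition two ranked phrase lists into agreed and method-only groups."""

    def norm(phrase):
        # Compare case-insensitively and ignore repeated whitespace.
        return " ".join(phrase.lower().split())

    yake = {norm(p): p for p in yake_phrases}
    rake = {norm(p): p for p in rake_phrases}
    return {
        # Agreed phrases keep YAKE's ordering and spelling.
        "both": [p for key, p in yake.items() if key in rake],
        "yake_only": [p for key, p in yake.items() if key not in rake],
        "rake_only": [p for key, p in rake.items() if key not in yake],
    }


groups = compare_methods(
    ["Pigeon Island", "keyword extraction"],
    ["Keyword  Extraction", "reduced graphene oxide electrodes"],
)
```

In this made-up input, the proper noun lands in the YAKE-only group and the long technical compound in the RAKE-only group, mirroring the pattern described above.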

Question 13

What is the input character limit?

Accepted Answer

20,000 characters per call, with a minimum of 50. That covers an article, paper abstract, or product description. For a chapter or longer document, run the extractor on each section separately — both algorithms are single-document by design, and short topical passages produce sharper keyword sets than long dilute ones.
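Checking the limits and splitting a longer document might look like this. It is a sketch only: it assumes the limits apply to the raw character count, and the names within_limits and split_sections, along with the rule of splitting on blank lines, are hypothetical.

```python
MIN_CHARS = 50
MAX_CHARS = 20_000


def within_limits(text):
    """True when the text length falls inside the stated 50 to 20,000 range."""
    return MIN_CHARS <= len(text) <= MAX_CHARS


def split_sections(text, max_chars=MAX_CHARS):
    """Greedily pack blank-line-separated paragraphs into chunks of at most max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph is hard-split as a last resort.
        while len(para) > max_chars:
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        current = para
    if current:
        chunks.append(current)
    return chunks


demo = split_sections("a" * 30 + "\n\n" + "b" * 30 + "\n\n" + "c" * 30, max_chars=70)
```

A final chunk can still fall below the 50-character minimum, so each chunk should be passed through within_limits before extraction.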

Question 14

When were the algorithms and sources last verified?

Accepted Answer

Reference papers, formulas, and stopword list were last cross-checked on 2026-05-12. The implementation is reviewed whenever a worked example regresses or a YAKE/RAKE reference release ships a relevant fix.

AI Keyword Extractor — Free, In-Browser, No Upload

How it works

1. Shared pre-processing

2. YAKE (statistical extractor)

3. RAKE (co-occurrence extractor)

4. Method agreement

5. Why not KeyBERT / a neural extractor?

6. Validation and limits

7. Using it in a content workflow

Worked examples

Frequently asked questions

Sources & references
