induwara.lk

Named Entity Recognition Online — Free, Server-Side, No Signup

Find every person, organisation, location, and proper noun in any English passage. A BERT NER classifier runs server-side through the Hugging Face Inference API — no model download to your browser, no third-party UI to sign in to. Export the entity table as CSV or JSON when you are done.

By Induwara Ashinsana · Updated May 12, 2026
Inference runs server-side. Text is sent once for scoring and not stored.
Confidence threshold (default 75%): spans below this score are dropped. Slide up for precision, down for recall.

Aggregation strategy: in simple mode, adjacent tokens that share a base label are merged. The merge is greedy, so long ORG spans can swallow embedded LOCs.

Aggregation strategy, threshold, and entity-type toggles update instantly without a re-run.

What this does

Reads any English passage and highlights every person, organisation, location, and other proper noun it finds. The classifier runs server-side — no model download to your browser, no signup, no upload to a third-party UI. Export the entity list as CSV or JSON when you are done.

Methodology: BERT-base fine-tuned on CoNLL-2003 (Tjong Kim Sang & De Meulder, 2003) with the BIO tagging scheme. Sub-word tokens are merged client-side, and the strategy and threshold controls above let you trade recall for precision. Sources are linked under “Sources” below.

How it works

The recognizer is a three-stage pipeline. The model itself is a single network — BERT-base fine-tuned on CoNLL-2003 — but most of what makes the output useful happens on either side of that inference call.

1. Tokenisation and inference

Your text is sent once to api-inference.huggingface.co and passed to dslim/bert-base-NER — the BERT-base checkpoint fine-tuned on the CoNLL-2003 English NER dataset (Tjong Kim Sang & De Meulder, 2003). The Inference API tokenises the input with BERT's WordPiece tokeniser, runs the model, and returns one prediction row per sub-word token. We ask for aggregation_strategy="none", so each row carries a BIO label (B-PER, I-LOC, O, etc.) plus a softmax confidence and the source-text offsets. No model weights leave the server — only the small JSON payload.
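The request described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual server code: the URL path is assumed from the host and checkpoint named in the text, the bearer token is a placeholder, and the function names `build_request` and `fetch_token_rows` are hypothetical. Each returned row is assumed to look like `{"entity": "B-PER", "score": 0.99, "word": "Steve", "start": 0, "end": 5}`.

```python
import json
import urllib.request

# Assumed endpoint layout for the hosted checkpoint named in the text.
API_URL = "https://api-inference.huggingface.co/models/dslim/bert-base-NER"


def build_request(text: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a token-classification request."""
    payload = {
        "inputs": text,
        # Ask for raw per-token rows; merging happens later, client-side.
        "parameters": {"aggregation_strategy": "none"},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # placeholder credential
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_token_rows(text: str, token: str) -> list:
    """Send the request and return the decoded list of per-token rows."""
    with urllib.request.urlopen(build_request(text, token), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping the request builder separate from the network call makes the payload easy to inspect without contacting the API.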

2. Span aggregation

The browser merges the per-token predictions into entity spans. Two strategies are exposed: simple, which extends a span as long as adjacent tokens share the same base type, and first, which starts a fresh span on every B-tag. Sub-word continuations marked with ## always inherit the parent span's label so a word like “Wickremesinghe” survives intact. The per-token confidences are averaged across the merged span. You can change the strategy without re-running the model — only the merge step replays.
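A rough sketch of the merge step, assuming per-token rows with `entity`, `score`, `word`, `start`, and `end` keys. The page's real merge runs in the browser; this Python version only illustrates the two strategies as described, and the names `Span` and `merge_tokens` are hypothetical. One assumption beyond the text: a `##` continuation contributes its own score to the span average even when its predicted label differs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    label: str  # base type without the B-/I- prefix, e.g. "PER"
    start: int
    end: int
    scores: list[float] = field(default_factory=list)


def merge_tokens(rows: list[dict], text: str, strategy: str = "simple") -> list[dict]:
    """Merge per-token BIO predictions into entity spans."""
    spans: list[Span] = []
    current: Span | None = None
    for row in rows:
        if row["word"].startswith("##") and current is not None:
            # Sub-word continuation: always inherits the open span's label.
            current.end = row["end"]
            current.scores.append(row["score"])
            continue
        if row["entity"] == "O":
            current = None  # an O token closes any open span
            continue
        prefix, _, base = row["entity"].partition("-")
        starts_new = (
            current is None
            or base != current.label
            or (strategy == "first" and prefix == "B")
        )
        if starts_new:
            current = Span(base, row["start"], row["end"], [row["score"]])
            spans.append(current)
        else:
            current.end = row["end"]
            current.scores.append(row["score"])
    return [
        {
            "label": s.label,
            "text": text[s.start:s.end],
            "score": sum(s.scores) / len(s.scores),  # mean token confidence
            "start": s.start,
        }
        for s in spans
    ]
```

Because the function only reads the cached rows, switching `strategy` replays the merge without another inference call.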

3. Threshold and dedupe

Each merged span's averaged confidence is compared against the slider value (default 75%). Spans below the threshold are dropped; the count is reported in the “dropped” tally. What remains is grouped by the tuple (entity type, lowercased trimmed surface form), so the same name mentioned three times collapses into one row with count: 3 and the average confidence across all three. The entity table is sorted by count descending, then by first occurrence — most-discussed entities surface first.
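The filter-and-group step might look like the sketch below. It assumes spans arrive in document order as dictionaries with `label`, `text`, and `score` keys; the function name `filter_and_group` and the output field names are hypothetical, chosen only for illustration.

```python
from __future__ import annotations


def filter_and_group(spans: list[dict], threshold: float = 0.75) -> tuple[list[dict], int]:
    """Drop low-confidence spans, then collapse repeats into counted rows."""
    kept = [s for s in spans if s["score"] >= threshold]
    dropped = len(spans) - len(kept)

    groups: dict[tuple[str, str], dict] = {}
    for order, span in enumerate(kept):
        surface = span["text"].strip()
        key = (span["label"], surface.lower())
        group = groups.get(key)
        if group is None:
            groups[key] = {
                "label": span["label"],
                "text": surface,
                "count": 1,
                "total": span["score"],
                "first": order,  # position of first occurrence
            }
        else:
            group["count"] += 1
            group["total"] += span["score"]

    rows = [
        {
            "label": g["label"],
            "text": g["text"],
            "count": g["count"],
            "score": g["total"] / g["count"],  # mean over all mentions
            "first": g["first"],
        }
        for g in groups.values()
    ]
    # Most-mentioned first; ties broken by earliest occurrence.
    rows.sort(key=lambda r: (-r["count"], r["first"]))
    return rows, dropped
```

Returning the dropped count alongside the rows mirrors the "dropped" tally the page reports.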

The pipeline is deterministic given the same text and settings. The HF Inference API is stateless: every call is independent and we send no identifiers along with your input. The four-class taxonomy (Person, Organisation, Location, Misc.) comes straight from the CoNLL-2003 shared task and is the label set shared by most off-the-shelf English NER checkpoints trained on that dataset.

Worked examples

Sri Lankan election news

Anura Kumara Dissanayake of the Janatha Vimukthi Peramuna won the September 2024 presidential election, defeating Sajith Premadasa and Ranil Wickremesinghe. The result was confirmed by the Election Commission of Sri Lanka in Colombo.

  1. Token output: 100+ sub-word rows; PER and ORG B-/I- tags clustered around the names and party
  2. Aggregation (simple): merges sub-words → 6 entity spans
  3. Result: 3 PER (the three politicians, all ≈ 0.99), 2 ORG (the party and the Election Commission), 1 LOC (Colombo)
  4. Note: in `simple` mode, `Sri Lanka` is absorbed inside the ORG span — switch to `first` to surface it separately
  5. Verdict: complete coverage; date `September 2024` is correctly NOT flagged — DATE is outside CoNLL's four classes

Tech history one-liner

Steve Jobs co-founded Apple Inc. in Cupertino in 1976 with Steve Wozniak and Ronald Wayne.

  1. Aggregation (simple): 5 entity spans
  2. Result: 3 PER (Steve Jobs, Steve Wozniak, Ronald Wayne, all ≈ 0.99), 1 ORG (Apple Inc., ≈ 0.98), 1 LOC (Cupertino, ≈ 0.99)
  3. `1976` is correctly NOT flagged — DATE is outside the CoNLL taxonomy
  4. Title-case cross-check: 5 capitalised proper-noun runs match the model's 5 spans, a quick heuristic sanity check rather than proof
  5. Verdict: textbook NER output; an SEO content writer auditing entity coverage gets exactly what they need
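The title-case cross-check in step 4 can be approximated with a short regular expression. This is an assumed heuristic, not necessarily how the page computes it: it counts runs of capitalised words, so a sentence-initial common word would be counted too. The names `CAP_RUN` and `capitalised_runs` are hypothetical.

```python
import re

# A run is one or more capitalised words separated by whitespace.
# Punctuation such as the dot in "Inc." ends the run.
CAP_RUN = re.compile(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def capitalised_runs(text: str) -> list:
    """Return every run of consecutive capitalised words in the text."""
    return CAP_RUN.findall(text)


sentence = (
    "Steve Jobs co-founded Apple Inc. in Cupertino in 1976 "
    "with Steve Wozniak and Ronald Wayne."
)
runs = capitalised_runs(sentence)
print(len(runs), runs)
```

Comparing `len(runs)` with the number of model spans gives a cheap signal that nothing obvious was missed.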

No proper nouns (edge case)

the cat sat on the mat while the kettle boiled and the rain kept falling on the roof.

  1. Tokens classified: ~20, all labelled `O`
  2. Aggregation: empty array — no B- tags survive
  3. Threshold filter: nothing to drop
  4. Verdict: zero entities, empty state shown. Demonstrates that the recogniser refuses to hallucinate proper nouns from common-noun text — the right answer is sometimes 'none'

Sources & references
