induwara.lk

AI Content Moderation Checker — Free Toxicity Checker

Paste any comment, review, or message and check it for toxicity, profanity, threats, insults, and hate speech across the six Jigsaw categories. Offending words are highlighted, sources are cited, and nothing is stored. No signup.

By Induwara Ashinsana · Updated Jun 22, 2026

What this does

Reads any English text and returns a single moderation verdict — Clean, Flagged, or Strongly flagged — backed by two checks: a transparent profanity-word scan that highlights every offending word, and the six Jigsaw toxicity scores (toxic, severe_toxic, obscene, threat, insult, identity_hate) from a server-side model. Pick a threshold and press Check.

Methodology: deterministic LDNOOBW profanity scan + BERT-base · 110M parameters · multi-label sigmoid head. Six independent sigmoid scores; a label flags at the chosen threshold (inclusive ≥). Sources linked under “Sources” below.

How it works

The checker runs two independent layers and combines them into one verdict — the same pattern as a profanity filter sitting next to a machine-learning classifier. Each layer is transparent, and either can flag a message on its own.

Layer 1 — deterministic profanity scan. The text is lowercased, split into word tokens, and each token is checked for an exact match against a curated 72-term subset of the LDNOOBW list (“List of Dirty, Naughty, Obscene and Otherwise Bad Words”), the open profanity list used by Shutterstock. This layer always runs in your browser, needs no network, and highlights every matched word. It also computes a transparent density figure:

severity = min(1, (matches ÷ words) × 5)

The 5× multiplier means a profanity density of 20% or more saturates to 100%, so a single bad word in a short message still registers while one in a long, otherwise clean paragraph scores low. This is a stated heuristic, not a vendor figure.
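As a rough illustration, Layer 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the tool's own browser code: the one-word PROFANITY set is a hypothetical stand-in for the 72-term subset, and the regex tokenizer is an assumed reading of "split into word tokens".

```python
import re

# Hypothetical stand-in for the tool's curated 72-term LDNOOBW subset.
PROFANITY = frozenset({"crap"})


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, then keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def profanity_scan(text: str) -> tuple[list[str], float]:
    """Return (matched words, severity), with severity in the range 0..1."""
    tokens = tokenize(text)
    matches = [t for t in tokens if t in PROFANITY]  # exact-match lookup only
    if not tokens:
        return matches, 0.0  # avoid dividing by zero on empty input
    # severity = min(1, (matches / words) * 5)
    severity = min(1.0, len(matches) / len(tokens) * 5)
    return matches, severity
```

Because the lookup is an exact match on whole tokens, a sketch like this will not catch obfuscated spellings or profanity embedded inside longer words.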

Layer 2 — toxicity model. When configured, the text is sent once to the unitary/toxic-bert classifier through the Hugging Face Inference API on the server — no model weights are ever downloaded to your browser. It returns an independent sigmoid probability between 0 and 1 for each of the six categories. Because the head is multi-label rather than softmax, the six scores do not sum to 1; a message can be high on several categories at once. A category counts as flagged when its score is at or above your chosen threshold (Strict 0.3, Balanced 0.5, Lenient 0.7).
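The difference between independent sigmoid scores and a softmax can be seen in a short sketch. The logits below are made-up illustrative values, not output from unitary/toxic-bert, and the helper names are hypothetical; only the six label names and the three threshold presets come from the description above.

```python
import math

LABELS = ("toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate")
THRESHOLDS = {"strict": 0.3, "balanced": 0.5, "lenient": 0.7}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def flagged_labels(scores: dict[str, float], preset: str = "balanced") -> list[str]:
    # A label is flagged when its score is at or above the threshold (inclusive).
    threshold = THRESHOLDS[preset]
    return [label for label in LABELS if scores.get(label, 0.0) >= threshold]


# Illustrative logits only; each label is squashed on its own, so the
# resulting probabilities are independent and need not sum to 1.
logits = {"toxic": 2.0, "severe_toxic": -3.0, "obscene": 1.0,
          "threat": -4.0, "insult": 0.0, "identity_hate": -5.0}
scores = {label: sigmoid(value) for label, value in logits.items()}

print(round(sum(scores.values()), 3))     # greater than 1 for these logits
print(flagged_labels(scores, "balanced"))
print(flagged_labels(scores, "lenient"))
```

With these sample values the insult score is exactly 0.5, so it is flagged under Balanced (inclusive comparison) but not under Lenient.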

Combined verdict. The text is flagged when any model category reaches the threshold or any profanity word is matched. It escalates to strongly flagged when the model's top score reaches 0.85, when a high-harm category (severe_toxic, threat, or identity_hate) is flagged, or when the profanity severity reaches 60% (equivalent to a raw match density of 12% or more, given the 5× multiplier). The verdict maps to a plain action: Clean → “Likely safe to publish”, Flagged → “Review before publishing”, Strongly flagged → “Recommend removing”. No score is invented: model probabilities are shown verbatim, and the only computed numbers are the profanity ratio and the threshold comparisons.
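The combination rule might look roughly like the sketch below. It assumes the 60% cutoff applies to the severity figure (as the worked examples do) and that an empty score dictionary stands for the case where the model layer is not configured; the function and constant names are hypothetical.

```python
HIGH_HARM = {"severe_toxic", "threat", "identity_hate"}

ACTIONS = {
    "Clean": "Likely safe to publish",
    "Flagged": "Review before publishing",
    "Strongly flagged": "Recommend removing",
}


def combined_verdict(scores: dict[str, float], profanity_matches: int,
                     severity: float, threshold: float = 0.5) -> str:
    """Merge the model layer and the profanity layer into one verdict."""
    flagged = {label for label, score in scores.items() if score >= threshold}
    top_score = max(scores.values(), default=0.0)  # 0.0 when no model scores

    # Escalation: very high top score, a flagged high-harm label,
    # or profanity severity of 60% or more.
    if top_score >= 0.85 or flagged & HIGH_HARM or severity >= 0.60:
        return "Strongly flagged"
    # Either layer can flag on its own.
    if flagged or profanity_matches > 0:
        return "Flagged"
    return "Clean"


verdict = combined_verdict({"toxic": 0.62, "insult": 0.41}, 0, 0.0)
print(verdict, "->", ACTIONS[verdict])
```

Checking the escalation conditions first keeps the function a simple cascade: anything that is not strongly flagged falls through to the weaker tests.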

The six categories

Toxic

Rude, disrespectful, or unreasonable language likely to make someone leave a discussion.

Severe toxic

Very hateful, aggressive, or disrespectful content — toxicity at its most extreme.

Obscene

Vulgar, sexually explicit, or profane language.

Threat

A statement of intent to inflict physical or other harm on a person or group.

Insult

An inflammatory or negative comment directed at a person (a personal attack).

Identity hate

Hateful content targeting a person's race, religion, gender, sexual orientation, disability, or other identity.

Worked examples

The profanity layer is fully hand-checkable. These three reconcile exactly with the formula above and with the tool's built-in verifyWorkedExamples() check. (The neural scores are not hand-computable, so only the deterministic numbers are shown.)

Happy customer — Clean

Thank you so much for the fast delivery, the product is great

  1. Tokens (words): 12
  2. Profanity matches: 0
  3. severity = min(1, 0 ÷ 12 × 5) = 0%
  4. No word flagged → Verdict: Clean → Likely safe to publish

Short angry message — Strongly flagged

this is crap

  1. Tokens (words): 3
  2. Profanity matches: 1 ("crap")
  3. severity = min(1, 1 ÷ 3 × 5) = min(1, 1.667) = 100%
  4. severity ≥ 60% → Verdict: Strongly flagged → Recommend removing

Same word, longer text — Flagged

the food was good but the service was crap honestly

  1. Tokens (words): 10
  2. Profanity matches: 1 ("crap")
  3. severity = min(1, 1 ÷ 10 × 5) = 50%
  4. Word flagged but severity below 60% → Verdict: Flagged → Review before publishing
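The three worked examples can also be checked mechanically. The table-driven sketch below is a hypothetical re-creation, not the tool's actual verifyWorkedExamples() function; it assumes a simple regex tokenizer and uses a one-word stand-in for the profanity list, and it covers only the deterministic layer.

```python
import re

PROFANITY = frozenset({"crap"})  # hypothetical stand-in for the real word list

# (text, expected tokens, expected matches, expected severity, expected verdict)
EXAMPLES = [
    ("Thank you so much for the fast delivery, the product is great",
     12, 0, 0.0, "Clean"),
    ("this is crap", 3, 1, 1.0, "Strongly flagged"),
    ("the food was good but the service was crap honestly",
     10, 1, 0.5, "Flagged"),
]


def check_example(text: str) -> tuple[int, int, float, str]:
    tokens = re.findall(r"[a-z0-9']+", text.lower())  # assumed tokenizer
    matches = sum(token in PROFANITY for token in tokens)
    severity = min(1.0, matches / len(tokens) * 5) if tokens else 0.0
    if severity >= 0.60:
        verdict = "Strongly flagged"
    elif matches:
        verdict = "Flagged"
    else:
        verdict = "Clean"
    return len(tokens), matches, severity, verdict


def verify_worked_examples() -> bool:
    for text, n_tokens, n_matches, severity, verdict in EXAMPLES:
        got = check_example(text)
        if got[:2] != (n_tokens, n_matches) or got[3] != verdict:
            return False
        if abs(got[2] - severity) > 1e-9:  # tolerate float rounding
            return False
    return True


print(verify_worked_examples())
```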

Sources & references

The taxonomy, model, and profanity list were last cross-checked on 2026-06-22. This v1 is English-only; image, audio, and Sinhala/Tamil moderation are out of scope. A high score is a prompt for human review, not a final ruling.
