induwara.lk

AI Alt-Text Generator — WCAG-friendly captions for any image

Drop an image and get a concise, WCAG 2.1-friendly alt-text string plus two longer caption variants. BLIP captioning runs server-side through the Hugging Face Inference API; a built-in WebAIM/W3C linter catches common alt-text mistakes. No signup, no ads.

By Induwara Ashinsana · Updated May 12, 2026

Higher values explore more wording but take longer.

BLIP base produces higher-quality captions; ViT-GPT2 is lighter.


A few words the model can't see in the image — proper nouns, brand names, location.

If your image already has a caption underneath it, the linter will warn against duplicating it in alt text.

Max 8.0 MB · JPG, PNG, WebP, GIF, AVIF
Informative. Per the W3C decision tree, the alt text should convey the information the image carries. The model generates a short visual description.

Captions are generated by the BLIP model on the Hugging Face Inference API. Image bytes are sent once for inference and not stored. AI output is a draft — review it before publishing per W3C guidance.

How it works

The page wraps three independent layers: the W3C decision tree (drives the role switch), the BLIP vision-language model (writes the caption), and the WebAIM/W3C linter (scores the caption against accessibility heuristics). Each layer is documented below so you can see why a given result was chosen — and override it when the model gets it wrong.

1. W3C decision tree — what role is this image?

Before the model runs, the page asks the same question the W3C WAI alt-text decision tree asks: is this image informative (adds information beyond the surrounding text), decorative (a divider, ornament, or visual filler with no meaning), functional (used inside a link or a button), or complex (a chart, diagram, or infographic that needs a long description). Each branch has a different rule. Decorative images take an empty alt="" — the page skips the model entirely. Functional images get described by their purpose (Search, Open menu, Buy now), not their picture. Informative and complex images get a caption from the model.
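The role switch described above can be sketched as a small dispatch function. This is a minimal sketch, not the page's actual code: `ImageRole`, `AltPlan`, `planAltText`, and the `purpose` argument are hypothetical names chosen for illustration.

```typescript
// Hypothetical names throughout; the four roles come from the W3C decision tree.
type ImageRole = "informative" | "decorative" | "functional" | "complex";

type AltPlan =
  | { kind: "empty"; alt: "" } // decorative: the model is never called
  | { kind: "purpose"; alt: string } // functional: describe the action, not the picture
  | { kind: "caption"; needsLongDescription: boolean }; // the model writes the caption

function planAltText(role: ImageRole, purpose?: string): AltPlan {
  switch (role) {
    case "decorative":
      return { kind: "empty", alt: "" };
    case "functional":
      // Assumption: the caller supplies the purpose text (for example "Search").
      return { kind: "purpose", alt: (purpose ?? "").trim() };
    case "informative":
      return { kind: "caption", needsLongDescription: false };
    case "complex":
      // Complex images still get a short caption, plus a long description elsewhere.
      return { kind: "caption", needsLongDescription: true };
  }
}
```

Returning a plan instead of calling the model directly keeps the decorative and functional branches free of any network dependency.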

2. BLIP captioning — what does the image show?

The primary captioner is Salesforce/blip-image-captioning-base: a ViT-B/16 vision encoder paired with a BERT-base text decoder, trained on COCO + Conceptual Captions 3M + 12M + SBU + filtered LAION 115M (Li, Li, Xiong & Hoi, ICML 2022). The model card reports CIDEr 136.7 / SPICE 26.0 on the COCO Karpathy test split. We call it via the Hugging Face Inference API from a server-only Next.js route handler, so your image bytes travel from your browser to this server, on to Hugging Face, and back. No other third-party service is involved, and no model is downloaded to your device.
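The server-side call might be assembled roughly as follows. This is a sketch under stated assumptions, not the site's route handler: the endpoint URL and the `[{ generated_text }]` response shape follow the classic serverless Inference API convention and should be checked against current Hugging Face documentation, and `buildCaptionRequest` and `parseCaptions` are hypothetical helper names.

```typescript
// Assumption: classic serverless endpoint shape; verify against current Hugging Face docs.
const HF_BASE = "https://api-inference.huggingface.co/models/";
const MODEL_ID = "Salesforce/blip-image-captioning-base";

interface CaptionRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: Uint8Array;
}

// Build the request without sending it, so the token stays in server-only code.
function buildCaptionRequest(image: Uint8Array, token: string, mime: string): CaptionRequest {
  return {
    url: HF_BASE + MODEL_ID,
    method: "POST",
    headers: { Authorization: "Bearer " + token, "Content-Type": mime },
    body: image,
  };
}

// Assumption: image-to-text responses look like [{ generated_text: "..." }].
// Anything else (for example an error object) yields an empty list.
function parseCaptions(json: unknown): string[] {
  if (!Array.isArray(json)) return [];
  const captions: string[] = [];
  for (const item of json) {
    if (typeof item === "object" && item !== null && "generated_text" in item) {
      const text = (item as { generated_text: unknown }).generated_text;
      if (typeof text === "string" && text.trim() !== "") captions.push(text.trim());
    }
  }
  return captions;
}
```

In a route handler, the fields of the returned object would be handed to `fetch`, and the parsed JSON body passed through `parseCaptions`.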

Beam search is enabled by default with width 15 (slider above). Wider beams explore more wording at the cost of one or two extra seconds of latency. The token budget is set from your length preset — 20 tokens for Brief, 30 for Standard, 60 for Detailed — which keeps the model close to the WebAIM-recommended ~125-character cap without an over-aggressive trim.
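The preset-to-budget mapping can be written as a lookup table. The token counts below are the ones stated above; the helper name `generationParams` is hypothetical, and the field names `num_beams` and `max_new_tokens` are borrowed from Hugging Face generation parameters as an assumption about what the payload looks like.

```typescript
type LengthPreset = "brief" | "standard" | "detailed";

// Token budgets as stated in the text: 20 / 30 / 60.
const TOKEN_BUDGET: Record<LengthPreset, number> = {
  brief: 20,
  standard: 30,
  detailed: 60,
};

// Hypothetical helper; the field names are an assumption about the request payload.
function generationParams(preset: LengthPreset, beamWidth: number) {
  // A width below 1 is meaningless; width 1 degenerates to greedy decoding.
  const beams = Math.max(1, Math.round(beamWidth));
  return { num_beams: beams, max_new_tokens: TOKEN_BUDGET[preset] };
}
```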

The optional context hint field is wired to BLIP's conditional-captioning entry point (paper §3.3). Anything you type is prepended to the decoder input as "a photo of <hint>" so the model continues from there. This is useful for proper nouns the model cannot guess: place names, brand names, named dishes.
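A minimal sketch of how the hint could become the decoder prefix, assuming whitespace is normalized first and an empty hint means no prefix at all. `buildPrompt` is a hypothetical name, not a function from the page's source.

```typescript
// Hypothetical helper: turns the optional hint into the prefix described above.
function buildPrompt(hint: string | undefined): string | undefined {
  // Collapse runs of whitespace so pasted text does not leak odd spacing into the prompt.
  const cleaned = (hint ?? "").replace(/\s+/g, " ").trim();
  // No hint means unconditional captioning: send no prefix.
  if (cleaned === "") return undefined;
  return "a photo of " + cleaned;
}
```

For example, a hint of "Galle Fort" would produce the prefix "a photo of Galle Fort", which the decoder then continues.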

3. Post-processing — clip, lint, score, rank

Each candidate is run through the same pipeline:

  1. Length cap. A candidate that already fits is returned verbatim. A longer one is cut at the last word boundary that falls within 60–100 % of the cap, and an ellipsis (U+2026) is appended so the truncation is visible.
  2. WebAIM/W3C linter. Five heuristics run against the clipped string: flag "image of / picture of / photo of" lead phrases and offer a rewrite without them; warn on filename suffixes (.jpg, .png, …); flag over-limit lengths; warn on "click here" in functional roles; warn when the alt text duplicates the visible page caption.
  3. Quality score. Each candidate starts at 100, loses 20 per warning and 5 per info notice, and gains 5 when its length sits in WebAIM's 25–125-character sweet spot; the result is clamped to 100. The cross-check function alternateScoreCaption() reproduces the same number from a separate formulation, so you can verify the ranking is deterministic and not a black-box re-rank.
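The three steps above can be sketched together. This is an illustrative reading of the description, not the page's implementation. Only the `LEAD_PHRASE` code appears in the source; the other finding codes, the choice to treat every heuristic as a warning, the hard cut when no word boundary falls in the window, and the lower clamp at 0 are all assumptions.

```typescript
type Severity = "warning" | "info";
interface Finding { code: string; severity: Severity }
interface LintOptions {
  cap: number;
  role: "informative" | "decorative" | "functional" | "complex";
  pageCaption?: string;
}

// Step 1: cut at the last space inside [60 %, 100 %] of the cap, then add U+2026.
function clipToCap(text: string, cap: number): string {
  if (text.length <= cap) return text; // already fits: returned verbatim
  const maxCut = cap - 1; // leave one character for the ellipsis
  const minCut = Math.ceil(cap * 0.6);
  const space = text.lastIndexOf(" ", maxCut);
  const cut = space >= minCut ? space : maxCut; // assumption: hard cut if no boundary fits
  return text.slice(0, cut).trimEnd() + "\u2026";
}

// Step 2: the five heuristics. Severities other than LEAD_PHRASE are assumed.
function lint(alt: string, opts: LintOptions): Finding[] {
  const findings: Finding[] = [];
  const lower = alt.trim().toLowerCase();
  if (/^(an? )?(image|picture|photo) of\b/.test(lower)) {
    findings.push({ code: "LEAD_PHRASE", severity: "warning" });
  }
  if (/\.(jpe?g|png|gif|webp|avif)\b/.test(lower)) {
    findings.push({ code: "FILENAME_SUFFIX", severity: "warning" });
  }
  if (alt.length > opts.cap) {
    findings.push({ code: "OVER_LIMIT", severity: "warning" });
  }
  if (opts.role === "functional" && lower.indexOf("click here") !== -1) {
    findings.push({ code: "CLICK_HERE", severity: "warning" });
  }
  if (opts.pageCaption !== undefined && lower === opts.pageCaption.trim().toLowerCase()) {
    findings.push({ code: "DUPLICATES_CAPTION", severity: "warning" });
  }
  return findings;
}

// Step 3: 100, minus 20 per warning, minus 5 per info, plus 5 in the 25..125 band.
function scoreCaption(alt: string, findings: Finding[]): number {
  let score = 100;
  for (const f of findings) score -= f.severity === "warning" ? 20 : 5;
  if (alt.length >= 25 && alt.length <= 125) score += 5;
  return Math.min(100, Math.max(0, score)); // upper clamp per the text; lower clamp assumed
}
```

Because every step is a pure function of the string and its options, the same input always produces the same score, which is what makes an independent cross-check meaningful.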

The highest-scoring candidate becomes the primary alt text shown first. The next two are surfaced as alternates so you can compare phrasings. The model's own beam-search order is preserved in the API response — the score-based re-rank only swaps a candidate forward when the linter says the underlying beam-top has a clearly fixable problem.
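One way to get this behaviour is a stable sort by score, sketched below. It is an interpretation of the paragraph above rather than the page's code: with a stable sort, the beam-top candidate keeps first place unless a later candidate scores strictly higher. The `Scored` type and `rankCandidates` name are hypothetical.

```typescript
interface Scored { text: string; score: number }

// Input is assumed to be in the model's beam-search order, best beam first.
function rankCandidates(beamOrdered: Scored[]): { primary: Scored | undefined; alternates: Scored[] } {
  // Array.prototype.sort is stable in ES2019+ engines, so equal scores keep beam order.
  // slice() copies first, leaving the original beam order intact for the API response.
  const ranked = beamOrdered.slice().sort((a, b) => b.score - a.score);
  return { primary: ranked[0], alternates: ranked.slice(1, 3) };
}
```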

Hard limits

Images larger than 8.0 MB or 4096×4096 px are rejected with a specific error rather than silently re-scaled on the server. The captioner only ever sees a 384×384 view of your image after Hugging Face's preprocessing, so larger files just waste bandwidth.
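A pre-flight check along these lines could enforce the limits before any bytes are forwarded. This is a sketch with assumptions labeled: "8.0 MB" is read as 8 × 1024 × 1024 bytes, the MIME list mirrors the five formats the upload control accepts, and `validateUpload`, `UploadInfo`, and the error strings are hypothetical.

```typescript
const MAX_BYTES = 8 * 1024 * 1024; // assumption: "8.0 MB" means 8 MiB
const MAX_SIDE = 4096; // pixels, applied to width and height separately
const ALLOWED_MIME = ["image/jpeg", "image/png", "image/webp", "image/gif", "image/avif"];

interface UploadInfo { bytes: number; width: number; height: number; mime: string }

// Returns null when the upload is acceptable, otherwise a specific error message.
function validateUpload(u: UploadInfo): string | null {
  if (ALLOWED_MIME.indexOf(u.mime) === -1) {
    return "Unsupported format: " + u.mime;
  }
  if (u.bytes > MAX_BYTES) {
    return "File is larger than 8.0 MB";
  }
  if (u.width > MAX_SIDE || u.height > MAX_SIDE) {
    return "Image exceeds 4096x4096 px";
  }
  return null;
}
```

Rejecting early with a named reason matches the stated policy of never re-scaling silently on the server.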

Worked examples

Informative photo, Standard length

A stock photo of a barista pouring espresso into a small white ceramic cup at a wooden café counter.

Role: informative
Length preset: standard
Raw model output: a barista pouring espresso into a small white cup at a café counter
Final alt: a barista pouring espresso into a small white cup at a café counter
Quality score: 100 / 100

67 characters, fits the 125-char Standard cap. Linter clears all rules: no lead phrase, no filename suffix, no over-limit, role is Informative so the click-here rule does not apply. Score: 100 + 5 (in 25..125 sweet spot) = 105 → clamped to 100.

Decorative divider, Brief length

A thin horizontal divider line PNG used between two blog sections.

Role: decorative
Length preset: brief
Raw model output: not run
Final alt: ""
Quality score: 100 / 100

The API is not called. Per the W3C decision tree, decorative images take alt="" so screen readers skip them entirely.

Linter catches the 'image of' lead phrase

A user-pasted caption that starts with a redundant 'image of' prefix.

Role: informative
Length preset: brief
Raw model output: image of a sunset over the beach
Final alt: image of a sunset over the beach
Quality score: 85 / 100
Linter rewrite: a sunset over the beach

Per WebAIM, screen readers already announce that an image is present. The linter flags LEAD_PHRASE and offers the rewrite shown. Score = 100 − 20 + 5 (32 chars in band) = 85, below the 100 a clean candidate scores.


Spot a caption the linter should be catching, or an image the model keeps mis-reading?

Email me at [email protected] — most fixes ship within 24 hours.