Question 1

How do you write alt text for an image automatically?

Accepted Answer

Drop the image into the tool above and pick a role (Informative / Decorative / Functional / Complex). The page sends the image to a BLIP vision-language model on the Hugging Face Inference API, which returns up to three caption candidates. A built-in WebAIM/W3C linter scores each one, caps it to your chosen length (80 / 125 / 250 chars), and shows the cleanest result with one-click copy in plain text, HTML alt attribute, and Markdown.
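The three copy formats mentioned above can be sketched as follows. This is an illustration only: the function name `format_alt` and the `src` parameter are hypothetical, and the page's own formatting code is not shown in the text.

```python
import html


def format_alt(alt: str, src: str) -> dict[str, str]:
    """Return the alt text as plain text, an HTML <img> tag, and Markdown."""
    # Escape &, <, > and quotes so the text is safe inside an HTML attribute.
    html_alt = html.escape(alt, quote=True)
    # Square brackets delimit the alt text in Markdown, so escape them.
    md_alt = alt.replace("[", "\\[").replace("]", "\\]")
    return {
        "plain": alt,
        "html": f'<img src="{html.escape(src, quote=True)}" alt="{html_alt}">',
        "markdown": f"![{md_alt}]({src})",
    }


if __name__ == "__main__":
    for name, value in format_alt('A "kottu roti" stall & grill', "stall.jpg").items():
        print(f"{name}: {value}")
```

Escaping matters most for the HTML form: an unescaped double quote in a caption would end the alt attribute early.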

Question 2

What is the best free alt-text generator?

Accepted Answer

Many free generators send your image to a private API and may store it. This page runs the same BLIP model used by paid services, but the only network call is from this site's server to Hugging Face Inference: no third-party SaaS, no signup, no ads. The linter on top applies the same WebAIM and W3C guidance that power users otherwise wire up by hand; here it runs by default on every result.

Question 3

How long should alt text be for accessibility (WCAG)?

Accepted Answer

WCAG 2.1 SC 1.1.1 does not give a length, but WebAIM and most screen-reader vendors recommend keeping it under ~125 characters. Very long alt text is tiring to listen to. This page offers three presets: Brief (<= 80), Standard (<= 125), and Detailed (<= 250) — the linter flags any output that exceeds your chosen cap and walks back to the previous word boundary if the model overshoots.
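A minimal sketch of the walk-back behaviour, assuming the cap is counted in characters and a word boundary means a space. The names `PRESETS` and `clip_to_cap` are hypothetical; only the three cap values come from the answer above.

```python
# Length presets named in the answer above.
PRESETS = {"brief": 80, "standard": 125, "detailed": 250}


def clip_to_cap(text: str, cap: int) -> str:
    """Trim text to at most `cap` characters without cutting a word in half."""
    text = text.strip()
    if len(text) <= cap:
        return text
    head = text[:cap]
    # If the cut lands in the middle of a word, walk back to the last space.
    if not text[cap].isspace():
        cut = head.rfind(" ")
        if cut > 0:
            head = head[:cut]
    # Drop punctuation or spaces left dangling at the new end.
    return head.rstrip(" ,;:-")


if __name__ == "__main__":
    print(clip_to_cap("A brown dog runs across a sandy beach", 20))
```

A single word longer than the cap has no boundary to walk back to, so this sketch falls back to a hard cut in that case.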

Question 4

Can AI describe an image without uploading it to a server?

Accepted Answer

Browser-only captioning is technically possible but requires the user to download a ~150–250 MB model on first run. On a typical Sri Lankan home connection that's a 30-second wait the user did not ask for, and on mobile it eats a couple of hundred megabytes of data. This page sends the image bytes once to a Next.js route on this server, which forwards them to the Hugging Face Inference API and discards them. Nothing is stored or logged.

Question 5

What model generates alt text for a blog image?

Accepted Answer

The default backbone is Salesforce/blip-image-captioning-base (Li et al., ICML 2022): a ViT-B/16 vision encoder paired with a BERT-base text decoder, pre-trained on COCO, Visual Genome, Conceptual Captions, SBU, and a filtered 115M-image LAION subset. The BLIP paper's headline 136.7 CIDEr on the COCO Karpathy test split was measured with its larger ViT-L configuration, so expect the base checkpoint to score somewhat lower. The lighter alternative, nlpconnect/vit-gpt2-image-captioning, scores 110.1 CIDEr and is offered for when the free Inference tier is rate-limited.
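One way to wire the lighter model in as a fallback is sketched below. The text does not say whether the page switches models automatically or leaves the choice to the user, so the automatic retry here is an assumption. `RateLimited` and the injected `call_model` callable are hypothetical stand-ins for whatever the server uses to reach the inference endpoint.

```python
from typing import Callable

PRIMARY = "Salesforce/blip-image-captioning-base"
FALLBACK = "nlpconnect/vit-gpt2-image-captioning"


class RateLimited(Exception):
    """Hypothetical error raised when the endpoint reports a rate limit."""


def caption_with_fallback(
    image: bytes,
    call_model: Callable[[str, bytes], list[str]],
) -> tuple[str, list[str]]:
    """Try the default model first and retry once on the lighter one."""
    try:
        return PRIMARY, call_model(PRIMARY, image)
    except RateLimited:
        # Assumption: a rate-limit error is the only trigger for the fallback.
        return FALLBACK, call_model(FALLBACK, image)
```

Returning the model id alongside the candidates lets the caller show which model produced the caption.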

Question 6

What is the W3C alt-text decision tree?

Accepted Answer

The W3C WAI Tutorial on Images publishes a decision tree that asks four questions about every image: is it functional (a link or button), is it informative, is it decorative, or is it complex (chart, diagram). Each branch has a different alt-text rule — for example, functional images get described by their purpose, not their picture, and decorative images take an empty alt attribute so screen readers skip them. This page bakes the decision tree into the role switch above the input.
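The role switch can be thought of as a lookup from role to rule. The sketch below is a rough illustration: the rule wording paraphrases the W3C tutorial, and the names `RULES` and `alt_for_role` are hypothetical rather than taken from the page.

```python
# One rule per role, paraphrased from the W3C WAI images tutorial.
RULES = {
    "functional": "Describe the purpose of the link or button, not the picture.",
    "informative": "Briefly describe the information the image conveys.",
    "decorative": "Use an empty alt attribute so screen readers skip the image.",
    "complex": "Give a short alt and put the full description elsewhere on the page.",
}


def alt_for_role(role: str, caption: str) -> str:
    """Return the alt text to use for a caption, given the image's role."""
    role = role.lower()
    if role not in RULES:
        raise ValueError(f"unknown role: {role!r}")
    if role == "decorative":
        # Decorative images take alt="" regardless of what the model says.
        return ""
    return caption.strip()
```

For a decorative image the model's caption is discarded entirely, which is why the role has to be chosen before the caption is used.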

Question 7

Should I rely on AI alt text for accessibility audits?

Accepted Answer

AI captions are a starting draft, not a substitute for human review. The model can misidentify objects, miss culturally specific context (e.g. a kottu roti described as 'food in a pan'), and fail on diagrams and charts. The page surfaces the model's beam-search candidates and an editable text field so a sighted author can sharpen the alt text before publishing. WCAG SC 1.1.1 is a Level A requirement, so judge each image against the W3C decision tree, not the model's confidence.

Question 8

How does the WebAIM linter work?

Accepted Answer

The linter runs five checks: (1) does the caption start with 'image of / picture of / photo of' — WebAIM recommends dropping it because screen readers already announce that an image is present; (2) does it end with a filename or file extension; (3) is it over the chosen length cap; (4) for functional images, does it contain 'click here'; and (5) does it duplicate the visible page caption (per W3C decision tree). The linter offers a rewrite where it can.
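The five checks could look roughly like this. The regular expressions, the extension list, the issue labels, and the exact-match test for duplicate captions are all assumptions made for illustration; the page's real rules may be stricter or looser.

```python
import re

# Check 1: leading "image of" / "picture of" / "photo of", optionally with an article.
PREFIX = re.compile(r"^\s*(?:an?\s+|the\s+)?(?:image|picture|photo)\s+of\s+", re.IGNORECASE)
# Check 2: caption ends with something that looks like a file name (assumed extensions).
FILENAME = re.compile(r"\S+\.(?:jpe?g|png|gif|webp|svg)\s*$", re.IGNORECASE)


def lint(caption: str, cap: int, role: str = "informative", page_caption: str = "") -> list[str]:
    """Return the labels of the checks that the caption fails, in check order."""
    issues = []
    if PREFIX.match(caption):
        issues.append("redundant-prefix")
    if FILENAME.search(caption):
        issues.append("filename")
    if len(caption) > cap:
        issues.append("too-long")
    if role == "functional" and "click here" in caption.lower():
        issues.append("click-here")
    if page_caption and caption.strip().lower() == page_caption.strip().lower():
        issues.append("duplicates-caption")
    return issues


def strip_prefix(caption: str) -> str:
    """Suggested rewrite for check 1: drop the prefix and re-capitalise."""
    rest = PREFIX.sub("", caption, count=1)
    return rest[:1].upper() + rest[1:]
```

For example, `lint("Photo of a dog on a beach IMG_0042.jpg", 20)` reports the prefix, the filename, and the length, while `strip_prefix("image of a dog")` yields `"A dog"`.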

Question 9

Are my images stored anywhere?

Accepted Answer

No. The /api/tools/caption-image route forwards bytes to Hugging Face Inference and returns the caption — there is no database write, no logging of image contents, no analytics on the file. Hugging Face's privacy policy applies to their endpoint; we cannot speak for them, but their commercial terms state that inference inputs are not used for training. For sensitive imagery (medical, identification documents), keep the image off the page and write alt text manually.
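For readers who want to see what forwarding the bytes amounts to, here is a sketch of building the outbound request in memory. The URL pattern is an assumption based on the classic serverless Inference API; the endpoint the route actually calls is not given in the text and may differ. `build_caption_request` and the token value are hypothetical.

```python
import urllib.request

# Assumption: classic serverless Inference API URL pattern, not confirmed by the page.
HF_BASE = "https://api-inference.huggingface.co/models/"


def build_caption_request(image: bytes, model_id: str, token: str) -> urllib.request.Request:
    """Build a POST request that carries the raw image bytes as its body."""
    return urllib.request.Request(
        HF_BASE + model_id,
        data=image,  # the bytes live only in this request object
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_caption_request(b"\x89PNG", "Salesforce/blip-image-captioning-base", "hf_example")
    print(req.get_method(), req.full_url)
```

Nothing in this sketch touches disk or a database. Sending the request (for instance with `urllib.request.urlopen`) and letting the object go out of scope is all the forward-and-discard step would need.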

Question 10

When were the sources last verified?

Accepted Answer

Model cards, W3C decision tree, WCAG 2.1 SC 1.1.1, and WebAIM guidance were last cross-checked on 2026-05-12. The Hugging Face Inference API endpoint and the BLIP model upload are independently versioned — when an upstream patch lands, the server response picks it up on the next call.

AI Alt-Text Generator — WCAG-friendly captions for any image

How it works

1. W3C decision tree — what role is this image?

2. BLIP captioning — what does the image show?

3. Post-processing — clip, lint, score, rank

Hard limits

Worked examples

Frequently asked questions

Sources & references
