Image to Text (OCR) — Free, In-Browser, No Signup
Drop a photo, screenshot, or scan and get plain text out. The Tesseract LSTM model runs entirely in your browser, so your image never touches a server. Supports English, Sinhala, and Tamil. The first run downloads the language data (about 4 MB for English, more for Sinhala or Tamil). Later runs start in about a second.
How it works
The tool runs Tesseract's LSTM optical character recogniser, the neural-network engine added to the OCR project that HP open-sourced in 2005 and Google developed afterwards. It runs through tesseract.js, a JavaScript library that loads a WebAssembly build of the engine and its trained data into a Web Worker inside your browser. The worker fetches the language .traineddata files once from the jsDelivr tessdata CDN and caches them in your browser. After that, recognition needs no network access, and the bytes of your photo are never uploaded.
A run on one image goes through five deterministic steps:
- Validate. The file must be JPG, PNG, WebP, BMP, or TIFF, under 25 MB, and at most 4,096 × 4,096 px. A rejected file shows the specific reason on screen instead of failing silently.
- Decode. Your browser's native image decoder reads the bytes into an ImageBitmap. No third-party decoder runs on your image.
- Pre-process. Tesseract.js converts the bitmap to greyscale and hands the buffer to the WebAssembly engine running in a Web Worker. When the image carries no resolution metadata, the engine falls back to 70 DPI (its default).
- Recognise. Tesseract's LSTM recogniser walks each layout block, line, and word. It picks the most likely characters given the selected page-segmentation mode and the loaded language model. It returns the text plus confidence scores per character, per line, and for the whole image.
- Post-process. When "Tidy paragraphs" is on, words hyphenated across line breaks are rejoined (“informa-\ntion” → “information”). Hard-wrapped lines are then collapsed so each paragraph becomes a single line. The raw output is also available for download, which is useful when the source layout matters more than readability.
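The first and last steps are plain logic that can be reproduced outside the engine. Below is a minimal TypeScript sketch, not the page's own code. `validate`, `Candidate`, and `tidyParagraphs` are hypothetical names. The sketch makes several assumptions:

- The five formats are recognised by MIME type.
- "25 MB" is read as 25 MiB.
- The pixel size is already known. In a browser, the pixel size is only available after the image has been decoded.
- The de-hyphenation rule joins a letter, a hyphen, a line break, and a letter. This is one plausible reading of what "Tidy paragraphs" does.

```typescript
// Hypothetical sketch of the "Validate" and "Post-process" steps.
// Assumptions: MIME types identify the five formats; "25 MB" means 25 MiB.
const ACCEPTED_TYPES = new Set<string>([
  "image/jpeg",
  "image/png",
  "image/webp",
  "image/bmp",
  "image/tiff",
]);
const MAX_BYTES = 25 * 1024 * 1024;
const MAX_SIDE_PX = 4096;

interface Candidate {
  type: string; // MIME type, e.g. from File.type
  bytes: number; // file size in bytes
  width: number; // pixel width (known only after decoding)
  height: number; // pixel height (known only after decoding)
}

// Returns null when the file passes, otherwise the reason to show on screen.
function validate(c: Candidate): string | null {
  if (!ACCEPTED_TYPES.has(c.type)) {
    return `Unsupported format: ${c.type || "unknown"}`;
  }
  if (c.bytes >= MAX_BYTES) {
    return "File must be under 25 MB";
  }
  if (c.width > MAX_SIDE_PX || c.height > MAX_SIDE_PX) {
    return `Image must be at most ${MAX_SIDE_PX} × ${MAX_SIDE_PX} px`;
  }
  return null;
}

// Rejoins words hyphenated across a line break, then collapses the single
// line breaks inside each paragraph. Blank lines separate paragraphs.
function tidyParagraphs(raw: string): string {
  const text = raw
    .replace(/\r\n?/g, "\n") // normalise Windows and old Mac line endings
    .replace(/([\p{L}\p{M}])-\n(\p{L})/gu, "$1$2"); // letter-hyphen-break-letter
  return text
    .split(/\n\s*\n/) // paragraphs are separated by blank lines
    .map((para) =>
      para
        .split("\n")
        .map((line) => line.trim())
        .filter((line) => line.length > 0)
        .join(" "),
    )
    .filter((para) => para.length > 0)
    .join("\n\n");
}
```

This rule has one trade-off. A genuine compound split at a line break, such as "well-\nknown", becomes "wellknown". The raw download covers those cases.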
Processing time per image scales roughly linearly with megapixels. On a 2022 MacBook Air running the WASM engine, a 1 MP English screenshot finishes in about 6 seconds, and a 3 MP page takes about 14 seconds. Sinhala and Tamil take roughly 50% longer per megapixel because their scripts have more glyph classes. The page exposes two estimators so you can sanity-check the wait before clicking:
- The closed-form throughput estimator computes seconds = megapixels × secondsPerMP + 2 s, where secondsPerMP is 4 for English and 6 for Sinhala or Tamil.
- The lookup estimator interpolates a piecewise table calibrated against MacBook Air (M2) and Pixel 7 runs.
For a 3 MP English job both estimators predict 14.0 s. In general the two agree to within about 10%.
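Both estimators fit in a few lines. In this sketch, `closedFormSeconds` and `interpolate` are hypothetical names, not taken from the page. The closed form follows the formula above. The page's real lookup table is not published, so `ENG_MACBOOK_AIR` is an illustrative two-point table. It is built only from the 1 MP and 3 MP MacBook Air timings quoted above. Extending the end segments beyond the table is also an assumption.

```typescript
type Lang = "eng" | "sin" | "tam";

// Closed-form estimator: seconds = megapixels × secondsPerMP + 2.
function closedFormSeconds(megapixels: number, lang: Lang): number {
  const secondsPerMP = lang === "eng" ? 4 : 6;
  return megapixels * secondsPerMP + 2;
}

// Piecewise-linear interpolation over (megapixels, seconds) points sorted by
// megapixels. Outside the table, the nearest segment is extended (assumption).
function interpolate(table: ReadonlyArray<[number, number]>, x: number): number {
  if (table.length < 2) throw new Error("need at least two points");
  let i = 1;
  // Advance to the segment whose right end is at or beyond x.
  while (i < table.length - 1 && x > table[i][0]) i++;
  const [x0, y0] = table[i - 1];
  const [x1, y1] = table[i];
  return y0 + ((x - x0) * (y1 - y0)) / (x1 - x0);
}

// Illustrative table: only the two MacBook Air timings quoted on this page.
const ENG_MACBOOK_AIR: ReadonlyArray<[number, number]> = [
  [1, 6],
  [3, 14],
];
```

With only these two points, the lookup line has slope 4 and intercept 2, so it coincides with the English closed form. At 2 MP, both give 10 s. A calibration table with more points is what lets the two estimators differ.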
Page-segmentation mode controls how Tesseract carves the image before recognition. PSM 3 (Auto) is the default and the right call for most screenshots and scans. PSM 6 (Single block) treats the whole image as one paragraph — best for a tightly cropped quote. PSM 7 (Single line) is for a single line of text such as a name tag. PSM 11 (Sparse text) finds isolated words in no particular order, which is what charts, menus, and signage need.
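In code, the four modes are just numbers. This minimal sketch assumes a hypothetical `PSM_BY_CHOICE` map and `psmParams` helper. The helper produces a `tessedit_pageseg_mode` parameter, the Tesseract setting that selects the mode. The mode number is written as a string, the form in which tesseract.js's `PSM` constants hold these values.

```typescript
// Hypothetical map from the page's four choices to Tesseract PSM numbers.
const PSM_BY_CHOICE = {
  auto: "3", // fully automatic segmentation (default)
  singleBlock: "6", // one uniform block of text
  singleLine: "7", // one line of text
  sparseText: "11", // scattered words, no particular order
} as const;

type PsmChoice = keyof typeof PSM_BY_CHOICE;

// Shape of the parameter object that selects the segmentation mode.
interface PsmParams {
  tessedit_pageseg_mode: string;
}

function psmParams(choice: PsmChoice = "auto"): PsmParams {
  return { tessedit_pageseg_mode: PSM_BY_CHOICE[choice] };
}
```

In a page using tesseract.js, a parameter like this is applied with `worker.setParameters()` before `worker.recognize()`. With the library's own type definitions, its `PSM` constants (for example `PSM.SPARSE_TEXT`) are the typed way to express the same values.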
Languages & page-segmentation modes
Three language packs are available on this page. Multi-language jobs are supported: the codes are joined with "+" (for example eng+sin) and passed to Tesseract in the order you picked them.
| Language | Code | First-load size | Notes |
|---|---|---|---|
| English · English | eng | 4 MB | Most accurate on screenshots, printed documents, and signage. |
| Sinhala · සිංහල | sin | 10 MB | Trained on Sinhala print. Cursive and handwriting accuracy is lower. |
| Tamil · தமிழ் | tam | 9 MB | Trained on Tamil print. Cursive and handwriting accuracy is lower. |
| PSM | Mode | Best for |
|---|---|---|
| 3 | Auto (default) | Fully automatic page segmentation. Best for screenshots, scans, and most photos. |
| 6 | Single block | Assume a uniform block of text. Best for a tightly cropped paragraph. |
| 7 | Single line | Treat the image as one line. Best for a name tag, slogan, or single sentence. |
| 11 | Sparse text | Find as much text as possible in no particular order. Best for menus, charts, signs. |
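Building the language string and estimating the cold-cache download are both short functions over the language table above. In this sketch, `langString` and `firstLoadMB` are hypothetical helpers. The sizes are the first-load figures from the table. Dropping repeated codes is an assumption about how duplicate picks would be handled. tesseract.js accepts a "+"-joined string such as `eng+sin` as its language argument.

```typescript
// First-load sizes (MB) copied from the language table above.
const PACK_SIZE_MB = { eng: 4, sin: 10, tam: 9 } as const;
type LangCode = keyof typeof PACK_SIZE_MB;

// Keeps the order the user picked and drops repeated codes (an assumption).
function uniqueInOrder(picked: readonly LangCode[]): LangCode[] {
  return Array.from(new Set(picked));
}

// Tesseract's multi-language convention: codes joined with "+".
function langString(picked: readonly LangCode[]): string {
  const codes = uniqueInOrder(picked);
  if (codes.length === 0) throw new Error("pick at least one language");
  return codes.join("+");
}

// Cold-cache download: every picked pack is fetched once.
function firstLoadMB(picked: readonly LangCode[]): number {
  return uniqueInOrder(picked).reduce((sum, code) => sum + PACK_SIZE_MB[code], 0);
}
```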
Confidence grading
For the whole image, Tesseract reports one overall confidence number between 0 and 100, averaged over the recognised text. The tool grades it using these thresholds:
- 95 · High — Likely usable as-is. Skim for any unusual characters.
- 78 · Medium — Mostly correct. Expect a handful of edits per page.
- 60 · Low — Substantial editing needed. Try a sharper image or a different PSM.
- 35 · Very low — Re-shoot the image with better lighting, focus, and contrast.
Confidence is a heuristic. A High-confidence result can still mis-read look-alike characters (0/O, 1/l, rn/m). Always proofread.
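The grading is a threshold lookup. This sketch assumes each number in the list is the lower bound of its band. The list does not say how scores below 35 are labelled, so the hypothetical `grade` function also labels those Very low.

```typescript
type Grade = "High" | "Medium" | "Low" | "Very low";

// Bands from the list above, each number read as a lower bound (assumption).
const BANDS: ReadonlyArray<readonly [number, Grade]> = [
  [95, "High"],
  [78, "Medium"],
  [60, "Low"],
  [35, "Very low"],
];

// Returns the first band whose lower bound the score reaches.
function grade(confidence: number): Grade {
  for (const [min, label] of BANDS) {
    if (confidence >= min) return label;
  }
  // The list gives no band below 35; this sketch labels those Very low too.
  return "Very low";
}
```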
Sources & references
- tesseract.js — WebAssembly wrapper around Tesseract OCR (Apache 2.0)
- tesseract.js — official documentation and live demo
- Tesseract OCR — upstream documentation (tessdoc)
- tessdata_fast — the small, fast .traineddata files used here
- Smith, An Overview of the Tesseract OCR Engine (Google, 2007) — original architecture paper
The tesseract.js version, trained-data sources, and PSM table were last cross-checked on 2026-05-11. The page is reviewed whenever tesseract.js ships a major release or the upstream tessdata files change.