
AI Object Detector — labelled bounding boxes for any image

Drop an image and get a labelled bounding box around every object the COCO-trained DETR or YOLOS transformer can find — class name, confidence, pixel coordinates, and exports in PNG, COCO JSON, and YOLO TXT. Server-side inference via the Hugging Face Inference API; no signup.

By Induwara Ashinsana · Updated May 12, 2026

Backbone: Quality, balanced, or fast. Higher AP costs more inference latency.

Threshold (default 50%): Detections below this score are hidden. Lower it for tough images, raise it for sure shots.

Max detections (default 30): Cap on the number of boxes drawn. Top-confidence first.

Class filter: Optional. Tick the COCO labels you want kept; leave empty for all 80 classes.

Palette: Per-class hues are most readable on neutral backgrounds; pick high-contrast for screenshots.

Upload limits: Max 8.0 MB · JPG, PNG, WebP, GIF, AVIF

What this does

Drop an image — a Daraz product photo, a street scene, a screenshot — and the tool returns labelled bounding boxes for every COCO-80 object the model can recognise. Tune the confidence threshold and class filter on the right; the annotated canvas and export files update without another network round-trip.

Detections come from DETR or YOLOS served via the Hugging Face Inference API. Image bytes are sent once for inference and not stored. The 80-class COCO label space does not cover tuk-tuks, sarongs, or local dishes — review the result before publishing it as feature data.

How it works

Object detection answers a different question than image classification or captioning. Where a classifier outputs one label for the whole picture, and a captioner writes one sentence, a detector returns a list of (class, confidence, bounding box) tuples — one per recognised object. This page wraps three layers: file validation + preprocessing in the browser, transformer inference on a Hugging Face endpoint, and a deterministic post-processing pipeline that re-applies threshold, class filter, and (for YOLOS) non-maximum suppression on the client without a second network call.

1. Validate and preprocess the image

The browser rejects any file that is not JPG, PNG, WebP, GIF, or AVIF, that is larger than 8.0 MB on disk, or that exceeds 4096 px on either side. Files that pass the gate are decoded once via createImageBitmap (which applies EXIF orientation in modern browsers) and posted as multipart form data to /api/tools/detect-objects. The route forwards the raw bytes to the Hugging Face Inference endpoint for the chosen backbone (DETR or YOLOS) and returns the parsed detections plus the inference time in milliseconds.
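The gate described above can be sketched roughly as follows. This is a minimal illustration, not the page's actual source: the names `validateUpload`, `UploadInfo`, and `GateResult` are hypothetical, the MIME strings are the usual ones for the listed formats, and "8.0 MB" is assumed to mean 8 × 1024 × 1024 bytes. The width and height would come from the decoded bitmap.

```typescript
// Hypothetical sketch of the upload gate; only the limits come from the text.
const ALLOWED_TYPES: string[] = [
  "image/jpeg",
  "image/png",
  "image/webp",
  "image/gif",
  "image/avif",
];
const MAX_BYTES = 8 * 1024 * 1024; // assumption: "8.0 MB" is read as 8 MiB
const MAX_SIDE_PX = 4096;

interface UploadInfo {
  mimeType: string;
  sizeBytes: number;
  width: number;  // pixels, taken from the decoded image
  height: number; // pixels, taken from the decoded image
}

type GateResult = { ok: true } | { ok: false; reason: string };

function validateUpload(info: UploadInfo): GateResult {
  // Check the cheap properties first: type, then size, then dimensions.
  if (ALLOWED_TYPES.indexOf(info.mimeType) === -1) {
    return { ok: false, reason: "unsupported format" };
  }
  if (info.sizeBytes > MAX_BYTES) {
    return { ok: false, reason: "file larger than 8.0 MB" };
  }
  if (info.width > MAX_SIDE_PX || info.height > MAX_SIDE_PX) {
    return { ok: false, reason: "side longer than 4096 px" };
  }
  return { ok: true };
}
```

Returning a reason string rather than a bare boolean lets the UI say which limit was hit.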

2. Transformer detection (DETR or YOLOS)

The Quality backbone is facebook/detr-resnet-50. The ResNet-50 backbone produces a 2048-channel feature map at stride 32, a 1×1 conv squeezes it to 256 channels, a 6-layer transformer encoder attends across spatial positions, and a 6-layer decoder cross-attends against 100 learned object queries. Each query output goes through two heads: an 81-way classifier (80 COCO classes + a "no object" class) and a 4-value bbox regressor predicting centre/size form (c_x, c_y, w, h) in normalised image coordinates. To draw the box in pixels we convert with x_min = (c_x − w/2) × W and y_min = (c_y − h/2) × H, where W and H are the image width and height; the box then spans w × W by h × H pixels. DETR's bipartite-matching training loss assigns each ground-truth object to exactly one of the 100 queries, which discourages duplicate predictions, so no NMS pass is needed (paper §3).
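The centre/size to pixel conversion can be written as a small helper. This is a sketch under the definitions in the paragraph above; `NormBox`, `PixelBox`, and `toPixelBox` are illustrative names, not identifiers taken from the page's code.

```typescript
// Normalised centre/size box as predicted per query (all values in 0..1).
interface NormBox {
  cx: number;
  cy: number;
  w: number;
  h: number;
}

// Pixel-space box with a top-left origin, ready for canvas drawing.
interface PixelBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

function toPixelBox(b: NormBox, W: number, H: number): PixelBox {
  return {
    x: (b.cx - b.w / 2) * W, // x_min
    y: (b.cy - b.h / 2) * H, // y_min
    width: b.w * W,
    height: b.h * H,
  };
}
```

For a box centred in a 1920 × 1080 image that covers half of each dimension, `toPixelBox({ cx: 0.5, cy: 0.5, w: 0.5, h: 0.5 }, 1920, 1080)` gives x = 480, y = 270, width = 960, height = 540.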

The Fast and Balanced backbones are hustvl/yolos-tiny (6.5 M params, COCO AP 28.7) and hustvl/yolos-small (30.0 M params, COCO AP 36.1). YOLOS treats the image as a sequence of 16×16 patches plus 100 learned [DET] tokens; each token output is classified and regressed exactly like DETR. YOLOS is trained with a DETR-style bipartite-matching loss as well, but the page still applies a per-class greedy non-maximum suppression at IoU 0.50 on the client to remove any near-duplicate boxes in its output.
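A per-class greedy NMS of the kind described above might look like the sketch below. It assumes pixel boxes in (x, y, w, h) form with a top-left origin and treats a box as a duplicate when its IoU with an already kept box of the same class is strictly above the threshold; the names and that strict comparison are assumptions, not the page's actual code.

```typescript
interface Detection {
  label: string;
  score: number;
  box: { x: number; y: number; w: number; h: number }; // pixels, top-left origin
}

// Intersection over union of two axis-aligned boxes.
function iou(a: Detection["box"], b: Detection["box"]): number {
  const iw = Math.max(0, Math.min(a.x + a.w, b.x + b.w) - Math.max(a.x, b.x));
  const ih = Math.max(0, Math.min(a.y + a.h, b.y + b.h) - Math.max(a.y, b.y));
  const inter = iw * ih;
  const union = a.w * a.h + b.w * b.h - inter;
  return union > 0 ? inter / union : 0;
}

function nmsPerClass(dets: Detection[], iouThreshold = 0.5): Detection[] {
  // Visit boxes from highest to lowest score; copy first so the input is untouched.
  const sorted = dets.slice().sort((p, q) => q.score - p.score);
  const kept: Detection[] = [];
  for (const d of sorted) {
    // Only boxes with the same label can suppress each other.
    const duplicate = kept.some(
      (k) => k.label === d.label && iou(k.box, d.box) > iouThreshold
    );
    if (!duplicate) kept.push(d);
  }
  return kept;
}
```

Because suppression is restricted to matching labels, a cup sitting inside a laptop's box is never removed by the laptop detection.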

The standard IoU formula inter / (|A| + |B| − inter) is implemented twice in lib/data/ai-object-detector.ts: iou() uses Math.max and Math.min, and iouCrossCheck() walks corners explicitly. Both should produce identical values to within floating-point ε, so a divergence between them would be a clear bug signal. The cross-check is the safety net behind every detection you see on this page.
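To illustrate the idea of two independent IoU implementations, here is a hedged sketch. The function names mirror the ones mentioned above, but the bodies are a reconstruction from the description, not the contents of lib/data/ai-object-detector.ts; `Box`, `iouAgrees`, and the tolerance of 1e-9 are assumptions.

```typescript
interface Box {
  x: number; // left edge, pixels
  y: number; // top edge, pixels
  w: number;
  h: number;
}

// Variant 1: clamp the overlap extents with Math.max / Math.min.
function iou(a: Box, b: Box): number {
  const iw = Math.max(0, Math.min(a.x + a.w, b.x + b.w) - Math.max(a.x, b.x));
  const ih = Math.max(0, Math.min(a.y + a.h, b.y + b.h) - Math.max(a.y, b.y));
  const inter = iw * ih;
  const union = a.w * a.h + b.w * b.h - inter;
  return union > 0 ? inter / union : 0;
}

// Variant 2: compare the corners explicitly and bail out when the boxes are disjoint.
function iouCrossCheck(a: Box, b: Box): number {
  const left = a.x > b.x ? a.x : b.x;
  const top = a.y > b.y ? a.y : b.y;
  const aRight = a.x + a.w;
  const bRight = b.x + b.w;
  const aBottom = a.y + a.h;
  const bBottom = b.y + b.h;
  const right = aRight < bRight ? aRight : bRight;
  const bottom = aBottom < bBottom ? aBottom : bBottom;
  if (right <= left || bottom <= top) return 0;
  const inter = (right - left) * (bottom - top);
  return inter / (a.w * a.h + b.w * b.h - inter);
}

// True when both variants agree to within a small tolerance.
function iouAgrees(a: Box, b: Box, eps = 1e-9): boolean {
  return Math.abs(iou(a, b) - iouCrossCheck(a, b)) <= eps;
}
```

For two 100 × 100 boxes offset horizontally by 50 px, the intersection is 5000 px² and the union is 15000 px², so both variants should return one third.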

3. Threshold, filter, and draw

The slider above defaults to 50%. Anything below it is hidden from the canvas and table, but the raw detection list is kept in memory so dragging the slider re-renders instantly. The class filter accepts any subset of the 80 COCO categories and is applied before the max-detections cap, so a request for "only person and dog" still surfaces up to N person/dog boxes rather than wasting the cap on a low-scoring background detection. The per-class palette spaces hues by hueForClass(idx) = (idx × 137) mod 360, a rotation close to the golden angle (about 137.5°) that maps adjacent COCO IDs to widely separated hues. Label text colour is picked by a YIQ luminance test so it stays legible at WCAG AA on any background hue.
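The ordering described above (threshold, then class filter, then cap) and the palette rule can be sketched as below. `hueForClass` follows the formula in the text; `selectVisible`, `ViewSettings`, and `labelTextColour` are hypothetical names, and the YIQ cut-off of 128 is the commonly used value, assumed here rather than confirmed by the page.

```typescript
interface Detection {
  label: string;
  score: number; // 0..1
}

interface ViewSettings {
  threshold: number;       // 0..1, e.g. 0.5 for the default slider position
  allowedLabels: string[]; // empty array means "all 80 classes"
  maxDetections: number;
}

// Re-runs entirely on the in-memory list, so no network call is involved.
function selectVisible(raw: Detection[], s: ViewSettings): Detection[] {
  return raw
    .filter((d) => d.score >= s.threshold)
    .filter(
      (d) => s.allowedLabels.length === 0 || s.allowedLabels.indexOf(d.label) !== -1
    )
    .sort((a, b) => b.score - a.score) // top-confidence first
    .slice(0, s.maxDetections);        // cap applied last
}

// Hue in degrees for a class index, as given in the text.
function hueForClass(idx: number): number {
  return (idx * 137) % 360;
}

// YIQ luminance test on an RGB background (0..255 per channel).
function labelTextColour(r: number, g: number, b: number): "#000000" | "#ffffff" {
  const yiq = (r * 299 + g * 587 + b * 114) / 1000;
  return yiq >= 128 ? "#000000" : "#ffffff";
}
```

Class indices 0, 1, 2, 3 map to hues 0, 137, 274, and 51, so neighbouring IDs land far apart on the colour wheel.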

Hard limits and privacy

The free Hugging Face Inference tier rejects payloads above ~10 MB, so the client caps file uploads at 8.0 MB. The 4096×4096 px dimension cap protects low-RAM devices during canvas rendering and keeps the export PNG under a sensible disk size. Image bytes leave the device exactly once — the single POST to this server's route. The route does not write to disk, log the file, or persist the response; nothing is kept after the response is sent back to your browser. The COCO 80-class space is fixed; tuk-tuks, sarongs, kottu, king coconut, and other locally relevant items fall outside it. The FAQ explains this honestly rather than mis-labelling them.

Worked examples

Example 1 — Galle Road junction, DETR-ResNet-50

A 1920 × 1080 photo of a Colombo Galle Road junction with three pedestrians, a tuk-tuk, two cars, a bus, and a stop sign. The Quality backbone (DETR-ResNet-50) at the default 0.50 threshold returns the top-scoring boxes below.

Backbone: DETR-ResNet-50 (Quality)
Image size: 1920 × 1080 px
Threshold: 50%
Max detections: 30
Class filter: all 80 classes
Detections kept: 8

Class | Confidence | bbox (x, y, w, h)
person | 97.0% | (412, 510, 88, 220)
person | 95.0% | (520, 506, 92, 224)
car | 93.0% | (820, 540, 240, 180)
person | 91.0% | (628, 514, 86, 218)
car | 88.0% | (1080, 558, 220, 170)
bus | 86.0% | (1280, 460, 380, 280)
stop sign | 83.0% | (1710, 280, 70, 90)
motorcycle | 71.0% | (160, 580, 120, 140)

Tuk-tuks are not in the COCO 80-class set, so the model returns 'motorcycle' for the closest visual match. The tool surfaces this in the FAQ rather than mis-labelling silently.
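Since the tool also exports YOLO TXT, it may help to see how one of the pixel boxes above maps to that format, which stores a class index followed by the normalised centre and size. The sketch below is illustrative only: `toYoloLine` is a hypothetical helper, six decimal places is an assumed precision, and class index 0 for person assumes the usual COCO-80 ordering rather than anything confirmed by the page.

```typescript
interface PixelBox {
  x: number; // left edge, pixels
  y: number; // top edge, pixels
  w: number;
  h: number;
}

// Builds one YOLO TXT line: "<class> <cx> <cy> <w> <h>", all box values in 0..1.
function toYoloLine(classIndex: number, b: PixelBox, imgW: number, imgH: number): string {
  const cx = (b.x + b.w / 2) / imgW;
  const cy = (b.y + b.h / 2) / imgH;
  const nw = b.w / imgW;
  const nh = b.h / imgH;
  return [
    String(classIndex),
    cx.toFixed(6),
    cy.toFixed(6),
    nw.toFixed(6),
    nh.toFixed(6),
  ].join(" ");
}
```

For the first person box, `toYoloLine(0, { x: 412, y: 510, w: 88, h: 220 }, 1920, 1080)` yields `0 0.237500 0.574074 0.045833 0.203704`.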

Example 2 — Daraz product listing, YOLOS-tiny, class filter

A 1080 × 1080 product photo: a laptop on a white background with a coffee cup beside it. YOLOS-tiny is selected for speed; the class filter restricts output to laptop, cup, keyboard, mouse.

Backbone: YOLOS-tiny (Fast)
Image size: 1080 × 1080 px
Threshold: 30%
Max detections: 5
Class filter: laptop, cup, keyboard, mouse
Detections kept: 3

Class | Confidence | bbox (x, y, w, h)
laptop | 92.0% | (180, 240, 720, 540)
cup | 84.0% | (880, 600, 160, 220)
keyboard | 41.0% | (230, 600, 540, 80)
keyboard41.0%(230, 600, 540, 80)

The keyboard box clears the lowered 0.30 threshold and is labelled 'low confidence' in the table because it falls below the default 0.50. The class filter removes any non-listed detection before the max-detections cap.
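The labelling rule in this example can be expressed as a small function. This is a sketch of the behaviour as described, with hypothetical names (`confidenceTag`, `DEFAULT_THRESHOLD`); how the page actually implements the tag is not shown in the text.

```typescript
const DEFAULT_THRESHOLD = 0.5; // the slider's default, per the text

type ConfidenceTag = "hidden" | "low confidence" | "normal";

// Classifies a score against the active slider value and the default.
function confidenceTag(score: number, activeThreshold: number): ConfidenceTag {
  if (score < activeThreshold) return "hidden";
  if (score < DEFAULT_THRESHOLD) return "low confidence";
  return "normal";
}
```

With the threshold lowered to 0.30, the keyboard at 0.41 is tagged "low confidence", while the laptop at 0.92 and the cup at 0.84 are tagged "normal".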

Example 3 — Cat under sofa, DETR, threshold 0.80 (empty state)

A low-light photo of a partially-occluded cat under a sofa. The DETR backbone runs successfully but no detection clears the 0.80 confidence threshold.

Backbone: DETR-ResNet-50 (Quality)
Image size: 1600 × 1200 px
Threshold: 80%
Max detections: 30
Class filter: all 80 classes
Detections kept: 0

The page degrades helpfully: empty annotated canvas, a one-line explanation suggesting a lower threshold or the alt-text generator, and the stats footer still shows the inference timing.

Frequently asked questions
