Question 1

How can I detect objects in an image online for free?

Accepted Answer

Drop your image into the tool above, leave the defaults or pick a model, and click Detect. The page sends the image once to a Next.js route on this server, which forwards the bytes to the Hugging Face Inference API for DETR or YOLOS. You get back labelled boxes, per-class summary chips, a sortable table, and downloads in PNG, COCO JSON, and YOLO TXT. No signup, no watermark, no model download to your device.
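As a rough illustration of how a detection response could become the table rows and per-class summary chips described above, here is a minimal Python sketch. It assumes the response shape documented for the Hugging Face object-detection task (a list of items with `score`, `label`, and a `box` holding `xmin`, `ymin`, `xmax`, `ymax`). The helper names `to_rows` and `class_counts` and the sample values are hypothetical; this is not the tool's actual source.

```python
from collections import Counter


def to_rows(detections):
    """Turn corner-style boxes into (label, score, (x, y, w, h)) rows, highest score first."""
    rows = []
    for det in detections:
        box = det["box"]
        x, y = box["xmin"], box["ymin"]
        w = box["xmax"] - box["xmin"]
        h = box["ymax"] - box["ymin"]
        rows.append((det["label"], det["score"], (x, y, w, h)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def class_counts(rows):
    """Count detections per class, e.g. to drive per-class summary chips."""
    return Counter(label for label, _, _ in rows)


# Hypothetical sample in the assumed response shape.
sample = [
    {"score": 0.93, "label": "car",
     "box": {"xmin": 820, "ymin": 540, "xmax": 1060, "ymax": 720}},
    {"score": 0.97, "label": "person",
     "box": {"xmin": 412, "ymin": 510, "xmax": 500, "ymax": 730}},
    {"score": 0.95, "label": "person",
     "box": {"xmin": 520, "ymin": 506, "xmax": 612, "ymax": 730}},
]

rows = to_rows(sample)
counts = class_counts(rows)
```

Sorting by score here mirrors one possible default ordering of the table; the real page lets you sort by any column.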

Question 2

What is the best free object detection model that runs without a key?

Accepted Answer

For accuracy, facebook/detr-resnet-50 is hard to beat on the COCO 80-class set — 42 box AP and no NMS pass needed thanks to its bipartite-matching loss. For speed, hustvl/yolos-tiny returns in roughly a second on the free inference tier with 28.7 box AP. Both are hosted on Hugging Face's public inference endpoints, so this tool can call them without an API key from the user.

Question 3

How accurate is YOLOS or DETR compared to YOLOv8?

Accepted Answer

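On COCO val 2017, hustvl/yolos-tiny reports 28.7 box AP, hustvl/yolos-small reports 36.1, and facebook/detr-resnet-50 reports 42.0. YOLOv8-n (3.2 M params) reports 37.3 box AP on the same split and YOLOv8-x reports 53.9, so the smallest YOLOv8 model is roughly on par with YOLOS-small and the largest is well ahead of DETR. YOLOv8 is distributed under AGPL-3.0, a copyleft licence that many closed-source commercial uses can only satisfy by buying a separate Ultralytics enterprise licence; DETR and YOLOS ship under permissive Apache 2.0 / MIT terms and are good enough for product audits, dataset prep, and accessibility work.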

Question 4

Can I run object detection without uploading my image anywhere?

Accepted Answer

Not with this tool. Browser-only object detection is technically possible, but it requires downloading 25–165 MB of ONNX weights to every visitor's device on first run. On a typical Sri Lankan home connection that is roughly a 30-second wait the user did not ask for, and on mobile it can use up to 165 MB of data. Instead, this tool POSTs the image once to a server route on induwara.lk, which proxies the bytes to Hugging Face Inference and discards them after the response. Nothing is stored or logged.

Question 5

How do I get bounding box coordinates from a photo?

Accepted Answer

After clicking Detect, scroll down to the table: each row gives the class, the confidence, and the bounding box as (x, y, width, height) in pixels of the original image. The Download buttons export the same data as COCO JSON ({ image_id, annotations: [{ category_id, category_name, bbox, score }, …] }) or YOLO TXT (one line per box: class_id cx cy w h, normalised to [0, 1]). Both formats drop into Label Studio, Roboflow, Ultralytics, or any standard CV training pipeline.
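To make the two export formats concrete, the sketch below converts one pixel-space box into a YOLO TXT line and into one entry of the `annotations` array. The image size (1920 × 1080), the class id `0` for `person`, and the helper names are assumptions for illustration; the tool's own class-id mapping and image dimensions may differ.

```python
def to_yolo_line(class_id, bbox, img_w, img_h):
    """Format a pixel (x, y, w, h) box as 'class_id cx cy w h', normalised to [0, 1]."""
    x, y, w, h = bbox
    cx = (x + w / 2) / img_w   # box centre, as a fraction of image width
    cy = (y + h / 2) / img_h   # box centre, as a fraction of image height
    return f"{class_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"


def to_coco_annotation(category_id, category_name, bbox, score):
    """Build one annotation entry with the fields named in the answer above."""
    return {
        "category_id": category_id,
        "category_name": category_name,
        "bbox": list(bbox),  # COCO keeps pixel-space [x, y, width, height]
        "score": score,
    }


# Assumed: a 1920x1080 source image and class id 0 for 'person'.
bbox = (412, 510, 88, 220)
line = to_yolo_line(0, bbox, img_w=1920, img_h=1080)
annotation = to_coco_annotation(0, "person", bbox, 0.97)
```

Note the difference between the two: YOLO stores the box centre and normalises by the image size, while the COCO-style entry keeps the top-left corner in pixels.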

Question 6

Why is my tuk-tuk labelled motorcycle?

Accepted Answer

COCO 2017 has 80 fixed classes and tuk-tuk is not one of them. The model returns the closest visual neighbour, which is usually 'motorcycle' for a three-wheeler. The same applies to sarongs (labelled 'tie' or 'person'), king coconut (labelled 'apple' or 'orange'), and kottu (labelled 'pizza' or 'sandwich'). The 80-class list is fixed by the model card; we state this limitation openly rather than hiding it. See the COCO class list link in the Sources section.

Question 7

How do I lower the confidence threshold for tougher images?

Accepted Answer

The Threshold slider above the canvas re-filters the already-returned detections on the client, so you can drag it lower (down to 0.05, i.e. 5%) without paying for another round-trip to the server. Hard-to-see subjects in low light or under partial occlusion typically score 0.30–0.45. Raise the threshold to 0.70 for clean product shots; drop it to 0.20 when you want everything the model can find.
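The re-filtering step is simple enough to sketch. The Python below only illustrates the logic (the page itself runs in the browser), and it assumes a detection is kept when its score is greater than or equal to the threshold; whether the tool uses `>=` or `>` is not stated. The function name and sample scores are hypothetical.

```python
def filter_by_threshold(detections, threshold):
    """Keep detections scoring at or above the threshold; no new server request is needed."""
    return [d for d in detections if d["score"] >= threshold]


# Hypothetical detections already returned by a single request.
detections = [
    {"label": "laptop", "score": 0.92},
    {"label": "cup", "score": 0.84},
    {"label": "keyboard", "score": 0.41},
]

clean = [d["label"] for d in filter_by_threshold(detections, 0.70)]
everything = [d["label"] for d in filter_by_threshold(detections, 0.20)]
```

With these sample scores, a threshold of 0.70 keeps the laptop and the cup, while 0.20 also keeps the low-confidence keyboard.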

Question 8

Does this tool do face recognition?

Accepted Answer

No. DETR and YOLOS return the COCO class 'person' with a bounding box around the whole body — there is no face detection, no age estimation, no identity inference, no biometric matching. We deliberately keep face-level inference off this tool: induwara.lk does not run any model that could fingerprint or identify a specific person from a photo.

Question 9

What is the difference between DETR and YOLOS?

Accepted Answer

DETR (Carion et al., 2020) pairs a ResNet-50 backbone with a transformer encoder-decoder that emits 100 object queries; the bipartite-matching training loss assigns each ground-truth object to exactly one query, so duplicate boxes are discouraged and no NMS post-processing is needed. YOLOS (Fang et al., 2021) is a pure vision transformer (DeiT backbone) that appends 100 learnable detection tokens to the patch sequence inside the encoder itself, with no separate decoder. YOLOS-tiny is roughly six times smaller than DETR but scores about 13 box-AP points lower (28.7 vs 42). Both share the 80-class COCO label space.
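A toy example may help show why bipartite matching removes the need for NMS. The sketch below is a deliberate simplification: it brute-forces the cheapest one-to-one assignment using only an L1 box distance, whereas DETR uses the Hungarian algorithm with a cost that also includes class probability and generalised IoU. The function names and box values are hypothetical.

```python
from itertools import permutations


def l1_cost(box_a, box_b):
    """Sum of absolute coordinate differences between two (x, y, w, h) boxes."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def match(predictions, targets):
    """Return a tuple m where m[i] is the prediction index assigned to target i.

    Brute force over all one-to-one assignments; fine for a toy example only.
    """
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(predictions)), len(targets)):
        cost = sum(l1_cost(predictions[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best


# Two ground-truth boxes, three predictions; predictions 1 and 2 both cover target 0.
targets = [(412, 510, 88, 220), (820, 540, 240, 180)]
predictions = [(818, 542, 238, 182), (410, 512, 90, 218), (415, 505, 85, 225)]

assignment = match(predictions, targets)
unmatched = sorted(set(range(len(predictions))) - set(assignment))
```

Here target 0 is matched to prediction 1 and target 1 to prediction 0, leaving the near-duplicate prediction 2 unmatched. During training an unmatched query is pushed towards the "no object" class, which is how duplicates are discouraged without a separate NMS pass.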

Question 10

Are my images stored anywhere?

Accepted Answer

No. The /api/tools/detect-objects route forwards bytes to Hugging Face Inference and returns the detections — there is no database write, no logging of image contents, no analytics on the file. Hugging Face's privacy policy applies to their endpoint; we cannot speak for them, but their commercial terms state that inference inputs are not used for training. For sensitive imagery (medical, identification documents), prefer the worked examples and avoid uploading the original. Last source verification: 2026-05-12.

Class	Confidence	bbox (x, y, w, h)
person	97.0%	(412, 510, 88, 220)
person	95.0%	(520, 506, 92, 224)
car	93.0%	(820, 540, 240, 180)
person	91.0%	(628, 514, 86, 218)
car	88.0%	(1080, 558, 220, 170)
bus	86.0%	(1280, 460, 380, 280)
stop sign	83.0%	(1710, 280, 70, 90)
motorcycle	71.0%	(160, 580, 120, 140)

Class	Confidence	bbox (x, y, w, h)
laptop	92.0%	(180, 240, 720, 540)
cup	84.0%	(880, 600, 160, 220)
keyboard	41.0%	(230, 600, 540, 80)

AI Object Detector — labelled bounding boxes for any image

How it works

1. Validate and preprocess the image

2. Transformer detection (DETR or YOLOS)

3. Threshold, filter, and draw

Hard limits and privacy

Worked examples

Frequently asked questions

Sources & references
