induwara.lk

URL Encoder & Decoder

Percent-encode text for safe inclusion in URLs, or decode an encoded URL back to plain text. Supports component, whole-URI, and form-urlencoded profiles. UTF-8 round-trip verified, runs entirely in your browser.

By Induwara Ashinsana · Updated May 11, 2026
Anything you paste is converted to UTF-8 bytes first, so Sinhala (ශ්‍රී ලංකා), Tamil, and emoji round-trip without corruption.

Sources: percent-encoding follows RFC 3986 §2 (unreserved, reserved, percent-encoding). UTF-8 byte conversion uses the WHATWG Encoding TextEncoder. URL parsing follows the WHATWG URL standard. Everything runs in your browser — no uploads.

How it works

A URL has a fixed structure (scheme, host, path, query, fragment) and a fixed set of characters that carry structural meaning. Any character that is neither structural nor in the unreserved set (spaces, most other punctuation, and every non-ASCII character) must be escaped before it can appear inside a URL, and a structural character must be escaped whenever it is meant as data rather than as a delimiter. The escape format is the one defined by RFC 3986 §2.1: a literal % followed by two hex digits giving the byte's value. The three profiles on this page differ only in which characters they escape:

  1. UTF-8 bytes from text. Any non-ASCII input is first converted to bytes by the standard TextEncoder. That is why the Sinhala letter ශ (U+0DC1) becomes three bytes E0 B7 81 in the encoded output rather than one: UTF-8 expresses code points above U+007F using two, three, or four bytes, and each byte is escaped independently.
  2. The unreserved set. RFC 3986 §2.3 names 66 characters that always survive untouched: A-Z, a-z, 0-9, and the four marks - . _ ~. Every encoder on this page leaves them alone.
  3. The reserved set (gen-delims and sub-delims). : / ? # [ ] @ split a URI into its parts. ! $ & ' ( ) * + , ; = separate fields inside one part. The Component profile escapes every reserved character because the value is going inside one segment. The Whole URI profile preserves them because the input is already structured. The Form profile is the Component profile with one extra step: a literal space is emitted as + instead of %20, matching the WHATWG application/x-www-form-urlencoded serializer used by HTML forms and URLSearchParams.
  4. Decoder = the inverse, with sanity checks. The decoder scans for malformed escapes first — every % must be followed by two hex digits, or a clear error fires with the offending position. If the syntax is sound, the bytes are gathered and run through a fatal-mode UTF-8 decoder. Form mode additionally replaces every literal + with a space before the percent-decoding step.
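
The four steps above can be sketched as a small byte-level encoder and strict decoder. This is a minimal illustration, not the page's actual source: the names `encode`, `decode`, `UNRESERVED`, and `RESERVED` are hypothetical, the profile keys `"component"`, `"uri"`, and `"form"` are assumptions, and the sketch follows the description above literally (Whole URI keeps every reserved character, Form is Component plus the space rule). It also checks escapes and gathers bytes in a single pass rather than two.

```javascript
// Hypothetical sketch of the three profiles; not the page's actual implementation.
const UNRESERVED = /[A-Za-z0-9\-._~]/;   // RFC 3986 §2.3
const RESERVED = ":/?#[]@!$&'()*+,;=";   // gen-delims + sub-delims

function encode(text, profile = "component") {
  let out = "";
  // Step 1: text -> UTF-8 bytes; each byte is then handled independently.
  for (const byte of new TextEncoder().encode(text)) {
    const ch = String.fromCharCode(byte);
    const isAscii = byte < 0x80;
    if (isAscii && UNRESERVED.test(ch)) {
      out += ch;                         // Step 2: unreserved characters survive
    } else if (isAscii && profile === "uri" && RESERVED.includes(ch)) {
      out += ch;                         // Step 3: Whole URI keeps delimiters
    } else if (byte === 0x20 && profile === "form") {
      out += "+";                        // Form: a space becomes +
    } else {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

function decode(text, profile = "component") {
  // Step 4: Form mode turns + into a space before percent-decoding.
  const input = profile === "form" ? text.replace(/\+/g, " ") : text;
  const encoder = new TextEncoder();
  const bytes = [];
  for (let i = 0; i < input.length; i++) {
    if (input[i] === "%") {
      const hex = input.slice(i + 1, i + 3);
      if (!/^[0-9A-Fa-f]{2}$/.test(hex)) {
        throw new Error(`Malformed escape at position ${i}`);
      }
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else {
      // Literal characters pass through as their own UTF-8 bytes.
      const ch = String.fromCodePoint(input.codePointAt(i));
      bytes.push(...encoder.encode(ch));
      i += ch.length - 1;
    }
  }
  // Fatal mode rejects byte sequences that are not valid UTF-8.
  return new TextDecoder("utf-8", { fatal: true }).decode(new Uint8Array(bytes));
}
```

For example, `encode("hello world", "form")` yields `hello+world`, and `decode("%zz")` throws with the position of the bad escape.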

The implementation is built on the platform: encoding uses encodeURIComponent and encodeURI as defined in ECMA-262; URL parsing uses the WHATWG URL constructor; UTF-8 conversion uses the WHATWG Encoding TextEncoder. The page wraps these with deterministic error messages and a round-trip verifier that runs once per load over a probe containing ASCII reserved characters, Sinhala, and emoji. The green “Verified · round-trip” badge in the tool header confirms all three profiles passed.
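
A round-trip check of the kind described above might look like the following sketch. The probe string, the names `PROBE`, `profiles`, and `verifyRoundTrip`, and the use of `URLSearchParams` for the Form profile are assumptions made for illustration; the page's actual verifier may differ.

```javascript
// Hypothetical round-trip verifier built only on platform functions.
// Probe: ASCII reserved characters, Sinhala (with ZWJ), and an emoji.
const PROBE = "a&b=c?d#e \u0DC1\u0DCA\u200D\u0DBB\u0DD3 \u0DBD\u0D82\u0D9A\u0DCF \u{1F600}";

const profiles = {
  component: { enc: encodeURIComponent, dec: decodeURIComponent },
  uri: { enc: encodeURI, dec: decodeURI },
  form: {
    // URLSearchParams implements the WHATWG form-urlencoded serializer and parser.
    // The key "v" is a throwaway name; slice(2) drops the leading "v=".
    enc: (s) => new URLSearchParams({ v: s }).toString().slice(2),
    dec: (s) => new URLSearchParams("v=" + s).get("v"),
  },
};

// True only if every profile decodes its own output back to the probe.
function verifyRoundTrip(probe = PROBE) {
  return Object.values(profiles).every((p) => p.dec(p.enc(probe)) === probe);
}

console.log(verifyRoundTrip()); // true
```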

Worked examples

Reserved characters (Component vs Whole URI)

"a&b=c?d#e"

  1. Component profile escapes every reserved character:
  2. '&' (0x26) → %26; '=' (0x3D) → %3D
  3. '?' (0x3F) → %3F; '#' (0x23) → %23
  4. Encoded (Component) → 'a%26b%3Dc%3Fd%23e'
  5. Whole-URI profile preserves them all → 'a&b=c?d#e'
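
The same comparison can be reproduced with the two ECMA-262 built-ins the page says it relies on. This is a sketch that calls them directly; the variable names are illustrative, and the page may post-process their output.

```javascript
const input = "a&b=c?d#e";

// encodeURIComponent treats the input as a single value and escapes & = ? #.
const asComponent = encodeURIComponent(input);

// encodeURI assumes the input is already a structured URI and leaves them in place.
const asWholeUri = encodeURI(input);

console.log(asComponent); // a%26b%3Dc%3Fd%23e
console.log(asWholeUri);  // a&b=c?d#e
```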

A space, three ways

"hello world"

  1. Bytes: 68 65 6C 6C 6F 20 77 6F 72 6C 64
  2. 0x20 is the space — outside the unreserved set.
  3. Component → 'hello%20world'
  4. Whole URI → 'hello%20world' (same — space is illegal everywhere)
  5. Form → 'hello+world' (WHATWG form serializer; RFC 3986 itself does not define + as a space)
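
A short sketch of the three results using platform functions. The query key `q` is a made-up name for illustration, and `URLSearchParams` stands in here for the Form profile on the assumption that it matches the WHATWG serializer the page refers to.

```javascript
const text = "hello world";

const component = encodeURIComponent(text);  // hello%20world
const wholeUri = encodeURI(text);            // hello%20world

// URLSearchParams serializes with the form-urlencoded rules, so the space becomes +.
const form = new URLSearchParams({ q: text }).toString();  // q=hello+world

// Parsing the form string turns + back into a space.
const parsed = new URLSearchParams(form).get("q");         // hello world
```

Note that `decodeURIComponent("hello+world")` would leave the + in place, which is why form-encoded data needs the form-aware decoder.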

UTF-8 multi-byte (Latin Extended)

"é" → utf-8 bytes [0xC3, 0xA9]

  1. Code point: U+00E9 — outside the ASCII range.
  2. TextEncoder emits two UTF-8 bytes: 0xC3 0xA9.
  3. Each byte is escaped independently:
  4. 0xC3 → %C3; 0xA9 → %A9
  5. Encoded → '%C3%A9'
  6. Decoded → bytes [0xC3, 0xA9] → UTF-8 'é' (round-trip)
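
The steps above can be traced by hand with `TextEncoder` and `TextDecoder`. This is a minimal sketch with illustrative variable names; the decode leg assumes the input consists only of %HH escapes.

```javascript
// "\u00E9" is the precomposed letter é.
const bytes = new TextEncoder().encode("\u00E9");  // Uint8Array [0xC3, 0xA9]

// Escape each byte independently as % plus two uppercase hex digits.
const encoded = Array.from(
  bytes,
  (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
).join("");
console.log(encoded); // %C3%A9

// Reverse: read the hex pairs back into bytes, then decode as UTF-8 in fatal mode.
const decodedBytes = Uint8Array.from(
  encoded.match(/%[0-9A-F]{2}/g),
  (pair) => parseInt(pair.slice(1), 16)
);
const decoded = new TextDecoder("utf-8", { fatal: true }).decode(decodedBytes);
console.log(decoded === "\u00E9"); // true
```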

Sinhala 'ශ්‍රී ලංකා' — multi-codepoint, with ZWJ

"ශ්‍රී ලංකා" (15 + 1 + 12 = 28 utf-8 bytes)

  1. ශ → E0 B7 81; ් → E0 B7 8A; ZWJ → E2 80 8D
  2. ර → E0 B6 BB; ී → E0 B7 93; space → 20
  3. ල → E0 B6 BD; ං → E0 B6 82; ක → E0 B6 9A; ා → E0 B7 8F
  4. Each non-unreserved byte → %HH (the space goes to %20):
  5. '%E0%B7%81%E0%B7%8A%E2%80%8D%E0%B6%BB%E0%B7%93%20…'
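  6. Decoding reverses both legs (percent-escapes back to bytes, bytes back to UTF-8) and applies no Unicode normalization such as NFC, so the original code points come back unchanged.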
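
The byte count and the encoded prefix above can be checked with a few lines. This is a sketch; the variable names are illustrative, and the string is written with escapes so the invisible ZWJ (U+200D) is explicit.

```javascript
// Code points taken from the breakdown above.
const sriLanka =
  "\u0DC1\u0DCA\u200D\u0DBB\u0DD3" + // first word: 5 code points, 3 bytes each
  " " +                              // 1 byte
  "\u0DBD\u0D82\u0D9A\u0DCF";        // second word: 4 code points, 3 bytes each

const utf8 = new TextEncoder().encode(sriLanka);
console.log(utf8.length); // 28

const encoded = encodeURIComponent(sriLanka);
const prefix = "%E0%B7%81%E0%B7%8A%E2%80%8D%E0%B6%BB%E0%B7%93%20";
console.log(encoded.startsWith(prefix)); // true

// decodeURIComponent does not normalize, so the code points come back unchanged.
console.log(decodeURIComponent(encoded) === sriLanka); // true
```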

Already-encoded input (double-encoding pitfall)

"Hello%20World"

  1. The encoder is not idempotent — '%' itself must be escaped.
  2. Component → 'Hello%2520World' (each '%' becomes '%25')
  3. Decoding 'Hello%2520World' once → 'Hello%20World'
  4. Decoding again → 'Hello World'
  5. The page surfaces a yellow hint when an encode input matches %HH.
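
The pitfall is easy to reproduce with the built-ins. In this sketch the `looksEncoded` check is a guess at how such a hint could be triggered, not the page's actual rule.

```javascript
const alreadyEncoded = "Hello%20World";

// Hypothetical hint check: does the input already contain a %HH escape?
const looksEncoded = /%[0-9A-Fa-f]{2}/.test(alreadyEncoded); // true

// Encoding again escapes the % itself, producing %25.
const twice = encodeURIComponent(alreadyEncoded);  // Hello%2520World

// Each decode pass removes exactly one layer.
const once = decodeURIComponent(twice);            // Hello%20World
const plain = decodeURIComponent(once);            // Hello World
```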

Sources & references

The encoder, decoder, and parser on this page were last cross-checked against ECMA-262 reference behaviour, the WHATWG URL parser, and Python's urllib.parse on 2026-05-11. Any algorithmic change bumps that date. If you spot a mismatch with another implementation, email me below.

Spotted a decoding edge case, a profile mismatch, or want to suggest a new feature?

Email me at [email protected] — most fixes ship within 24 hours.