induwara.lk

Base64 Encoder & Decoder

Encode text or any file to Base64, or decode Base64 back to text or a downloadable file. Standard and URL-safe alphabets, optional ‘=’ padding, UTF-8 round-trip verified. Runs entirely in your browser.

By Induwara Ashinsana · Updated May 11, 2026
UTF-8 is used for the byte conversion. Any character — including emoji and Sinhala / Tamil — round-trips cleanly.

Sources: encoding follows RFC 4648 §4 (standard) and §5 (URL-safe). UTF-8 byte conversion uses WHATWG Encoding TextEncoder / TextDecoder. Everything runs in your browser — no uploads, no logs.

How it works

Base64 is the standard way to carry arbitrary bytes through a text-only channel. The encoder takes three input bytes (24 bits) and slices them into four 6-bit groups; each group indexes a 64-character alphabet. When the input is not a multiple of 3 bytes, the last group is partial and one or two padding = characters fill it out — that's why standard Base64 strings always have a length that is a multiple of 4. The encoder on this page implements both alphabets defined by RFC 4648: §4 standard (ending in +/) and §5 URL-safe (ending in -_), with optional padding for the URL-safe form as §3.2 allows.

  1. Bytes from text. When you paste text, it is first converted to UTF-8 bytes by the standard TextEncoder. Sinhala, Tamil, emoji and any other Unicode survive the round-trip. The alternative (feeding the string's UTF-16 code units straight to a byte-oriented encoder such as btoa) fails for any character above U+00FF, not only for those outside the BMP.
  2. 3-byte → 4-char chunking. For each group of 3 input bytes, the encoder packs them into a 24-bit value v = (a << 16) | (b << 8) | c and emits four characters by shifting and masking 6 bits at a time: alpha[(v >> 18) & 63] then alpha[(v >> 12) & 63] and so on. The same code path handles both alphabets — only the 64-character lookup table changes.
  3. Tail and padding. If the last group has 1 or 2 bytes, the encoder shifts the missing positions to zero, emits 2 or 3 data characters, then appends two or one = characters when padding is on. URL-safe outputs default to no padding (the JWT / OAuth PKCE convention).
  4. Decoder = the inverse. Whitespace and trailing = are stripped, then a single 256-entry lookup translates each character to its 6-bit value (the table accepts both alphabets simultaneously, so you don't have to pick a variant). Every 4 input characters produce 3 output bytes, with the tail logic mirrored. A final TextDecoder("utf-8", { fatal: true }) tries to read the bytes as text — if that fails, the page hands them to a download button instead.
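
The four steps above can be sketched in a few dozen lines of JavaScript. This is a minimal illustration, not the page's actual source: every name here (STANDARD, URL_SAFE, encodeBytes, decodeToBytes, encodeText, decodeText) is hypothetical, and the error handling is an assumption about reasonable behavior rather than something the page documents.

```javascript
// Hypothetical names throughout; a sketch of the steps above, not the page's real code.
const STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const URL_SAFE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Steps 2 and 3: pack up to 3 bytes into a 24-bit value, emit 6 bits at a time.
function encodeBytes(bytes, alpha = STANDARD, pad = true) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = Math.min(3, bytes.length - i);
    const a = bytes[i];
    const b = remaining > 1 ? bytes[i + 1] : 0; // missing positions become zero
    const c = remaining > 2 ? bytes[i + 2] : 0;
    const v = (a << 16) | (b << 8) | c;
    out += alpha[(v >> 18) & 63] + alpha[(v >> 12) & 63];
    if (remaining > 1) out += alpha[(v >> 6) & 63];
    else if (pad) out += "=";
    if (remaining > 2) out += alpha[v & 63];
    else if (pad) out += "=";
  }
  return out;
}

// Step 4: one 256-entry table that accepts both alphabets at once.
const LOOKUP = new Int16Array(256).fill(-1);
for (let i = 0; i < 64; i++) {
  LOOKUP[STANDARD.charCodeAt(i)] = i;
  LOOKUP[URL_SAFE.charCodeAt(i)] = i;
}

function decodeToBytes(text) {
  // Strip whitespace and trailing '=' before decoding.
  const clean = text.replace(/\s+/g, "").replace(/=+$/, "");
  if (clean.length % 4 === 1) throw new Error("invalid Base64 length");
  const out = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let o = 0;
  for (let i = 0; i < clean.length; i += 4) {
    const n = Math.min(4, clean.length - i); // characters in this group (2 to 4)
    let v = 0;
    for (let j = 0; j < 4; j++) {
      let six = 0; // absent tail characters count as zero bits
      if (j < n) {
        const code = clean.charCodeAt(i + j);
        six = code < 256 ? LOOKUP[code] : -1;
        if (six < 0) throw new Error("invalid Base64 character");
      }
      v = (v << 6) | six;
    }
    out[o++] = (v >> 16) & 255;
    if (n > 2) out[o++] = (v >> 8) & 255;
    if (n > 3) out[o++] = v & 255;
  }
  return out;
}

// Step 1 and the final text check: UTF-8 in, strict UTF-8 out.
const encodeText = (s, alpha, pad) => encodeBytes(new TextEncoder().encode(s), alpha, pad);
const decodeText = (s) => new TextDecoder("utf-8", { fatal: true }).decode(decodeToBytes(s));
```

With these definitions, encodeText("Ab") gives 'QWI=' and encodeBytes([0xFB, 0xFF], URL_SAFE, false) gives '-_8', matching the worked examples further down. decodeText throws when the bytes are not valid UTF-8, which is the point where a page like this one would offer a download instead.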

Two independent invariants are checked on every page load to keep the implementation honest. First, the encoded length is computed by a separate formula, 4 × ceil(bytes / 3) (with padding), and asserted against the actual encoder output. Second, a probe string containing ASCII punctuation, Sinhala (ශ්‍රී ලංකා), and an emoji is round-tripped encode → decode for both alphabets, and the result must match the original. Both checks are surfaced as the green “Verified · round-trip” badge in the tool header. If you ever see the badge flip to red on a real input, please email me; that is a real regression.
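
The two invariants can be approximated with the browser built-ins alone. The sketch below is an assumption-laden stand-in: it uses btoa / atob in place of the page's own encoder, checks only the standard alphabet, and the helper names (utf8ToBase64, base64ToUtf8, predictedLength, selfCheck) and the probe string are hypothetical, since the page's real probe is not shown.

```javascript
// Hypothetical helpers; btoa/atob stand in for the page's own encoder here.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte); // one char per byte
  return { byteCount: bytes.length, encoded: btoa(binary) };
}

function base64ToUtf8(encoded) {
  const binary = atob(encoded);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

// Invariant 1: padded length is 4 * ceil(bytes / 3), computed without encoding.
const predictedLength = (byteCount) => 4 * Math.ceil(byteCount / 3);

// Invariant 2: encode then decode must reproduce the probe exactly.
function selfCheck(probe) {
  const { byteCount, encoded } = utf8ToBase64(probe);
  const lengthOk = encoded.length === predictedLength(byteCount);
  const roundTripOk = base64ToUtf8(encoded) === probe;
  return lengthOk && roundTripOk;
}

// Assumed probe: ASCII punctuation, Sinhala, and an emoji.
const PROBE = "~!@# ශ්‍රී ලංකා 🙂";
console.log(selfCheck(PROBE) ? "Verified · round-trip" : "FAILED");
```

Keeping the length formula separate from the encoder is the useful part of the design: a bug in the tail logic would change the output length without changing the prediction.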

Worked examples

Three-byte ASCII (no padding)

"Abc" → bytes [0x41, 0x62, 0x63]

  1. Bits: 0100_0001 0110_0010 0110_0011
  2. 6-bit: 010000 010110 001001 100011
  3. Decimal: 16 22 9 35
  4. Lookup: Q W J j
  5. Encoded: 'QWJj' — exactly 4 chars, no padding
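
The bit-slicing in this example can be reproduced mechanically. The following sketch works on a string of binary digits purely for readability (a real encoder would use shifts and masks); sextets, lookupStandard, and ALPHABET are hypothetical names.

```javascript
// Hypothetical helper: split bytes into 6-bit groups, zero-filling the last one.
function sextets(bytes) {
  let bits = "";
  for (const byte of bytes) bits += byte.toString(2).padStart(8, "0");
  while (bits.length % 6 !== 0) bits += "0"; // zero-fill a partial final group
  const groups = [];
  for (let i = 0; i < bits.length; i += 6) {
    groups.push(parseInt(bits.slice(i, i + 6), 2));
  }
  return groups;
}

const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const lookupStandard = (groups) => groups.map((g) => ALPHABET[g]).join("");

console.log(sextets([0x41, 0x62, 0x63]));                 // [ 16, 22, 9, 35 ]
console.log(lookupStandard(sextets([0x41, 0x62, 0x63]))); // QWJj
```

The same helper reproduces the decimal rows of the remaining examples, since it zero-fills whatever bits the final group is missing; it does not append '=' padding.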

Two-byte input (one '=' padding)

"Ab" → bytes [0x41, 0x62]

  1. Bits: 0100_0001 0110_0010 (16 bits; the missing third byte is treated as zero)
  2. 6-bit: 010000 010110 001000 (pad)
  3. Decimal: 16 22 8
  4. Lookup: Q W I
  5. Encoded standard: 'QWI='
  6. Encoded URL-safe: 'QWI' (padding optional per §3.2)
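
How many data characters and how many '=' characters a given input needs follows from the byte count alone. A small sketch, with encodedShape as a hypothetical name and no encoding actually performed:

```javascript
// Hypothetical helper: sizes only, derived from the input byte count.
function encodedShape(byteCount) {
  const dataChars = Math.ceil((byteCount * 4) / 3); // characters that carry bits
  const padChars = (3 - (byteCount % 3)) % 3;       // '=' needed to reach a multiple of 4
  return { dataChars, padChars, paddedLength: dataChars + padChars };
}

console.log(encodedShape(2)); // { dataChars: 3, padChars: 1, paddedLength: 4 }
```

For the two-byte input above this gives three data characters and one '=', which is 'QWI' unpadded and 'QWI=' padded.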

Bytes that hit the alphabet boundary (+/ vs -_)

bytes [0xFB, 0xFF]

  1. Bits: 1111_1011 1111_1111
  2. 6-bit: 111110 111111 111100 (pad)
  3. Decimal: 62 63 60
  4. Standard alphabet [62, 63] = '+', '/' → '+/8='
  5. URL-safe alphabet [62, 63] = '-', '_' → '-_8'
  6. Same bytes, two safe representations — pick by destination
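
Because the two alphabets differ only at indices 62 and 63, an already-encoded string can be converted between them without decoding. A minimal sketch, assuming the input is well-formed Base64; toUrlSafe and toStandard are hypothetical names, and dropping the padding in toUrlSafe follows the unpadded convention mentioned earlier rather than any requirement.

```javascript
// Hypothetical helpers: convert between the RFC 4648 section 4 and section 5 forms.
function toUrlSafe(standard) {
  return standard
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, ""); // drop padding (optional for the URL-safe form)
}

function toStandard(urlSafe) {
  const swapped = urlSafe.replace(/-/g, "+").replace(/_/g, "/");
  const padChars = (4 - (swapped.length % 4)) % 4; // restore '=' to a multiple of 4
  return swapped + "=".repeat(padChars);
}

console.log(toUrlSafe("+/8=")); // -_8
console.log(toStandard("-_8")); // +/8=
```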

UTF-8 multi-byte (Unicode round-trip)

"š" → utf-8 bytes [0xC5, 0xA1]

  1. Bits: 1100_0101 1010_0001
  2. 6-bit: 110001 011010 000100 (pad)
  3. Decimal: 49 26 4
  4. Lookup: x a E
  5. Encoded: 'xaE='
  6. Decode → bytes [0xC5, 0xA1] → UTF-8 'š' (round-trip)
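
The last step of this example, and the fallback when decoded bytes are not text, can be sketched with the WHATWG TextEncoder / TextDecoder pair. The function name bytesToTextOrNull is hypothetical, and returning null is only a stand-in for whatever the page does when it switches to a download.

```javascript
// Hypothetical helper: strict UTF-8 decode, null when the bytes are not valid text.
function bytesToTextOrNull(bytes) {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return null; // invalid UTF-8: treat the payload as binary
  }
}

const utf8 = new TextEncoder().encode("š");
console.log(Array.from(utf8, (b) => b.toString(16)));       // [ 'c5', 'a1' ]
console.log(bytesToTextOrNull(utf8));                        // š
console.log(bytesToTextOrNull(Uint8Array.of(0xfb, 0xff)));   // null
```

Without fatal: true the decoder would silently substitute U+FFFD for bad sequences, so binary payloads would appear as garbled text instead of being detected.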

Sources & references

The encoder, decoder, and length predictor on this page were last cross-checked against RFC 4648 reference vectors and the browser's built-in btoa / atob on 2026-05-11. Any algorithmic change bumps that date. If you spot a mismatch with another implementation, email me below.

Spotted a decoding edge case, an alphabet mismatch, or want to suggest a new feature?

Email me at [email protected] — most fixes ship within 24 hours.