induwara.lk

AI Chat Template Generator — ChatML, Llama 3, Mistral & more

Wrap a system prompt and a conversation in the exact special tokens your open model expects, then copy the prompt for raw completion endpoints like Ollama, llama.cpp and vLLM. Ten formats, fully in-browser, no signup.

By Induwara Ashinsana · Updated Jun 30, 2026
Build a chat-template prompt

Format: ChatML (OpenAI-style / generic)

Used by Qwen, Yi, Nous-Hermes, and many fine-tunes. The de-facto generic format.

System prompt: this family has a dedicated system block.

Generation prompt: when enabled, the empty assistant header is appended so the model knows to reply.

BOS token: this family uses no BOS token, so the toggle has no effect.

Characters: 120
Special tokens: 5
Lines: 6
System role: native. The family has a dedicated system block with its own special tokens.
Formatted prompt — copy & paste
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is 2+2?<|im_end|>
<|im_start|>assistant

Special-token legend

Token         Meaning                                                        Count
<|im_start|>  Start of a message; followed by the role name and a newline.  3
<|im_end|>    End of a message.                                              2
Where to paste this: Ollama raw /api/generate, llama.cpp --prompt, vLLM /v1/completions, text-generation-webui. Reminder: a /chat/completions endpoint applies the template itself, so use this string only with raw completion endpoints.

Each family's tokens are taken verbatim from its official source (listed below the tool).

How it works

When you call a chat endpoint (OpenAI's /chat/completions, or Ollama's /api/chat), the server reads your messages as structured JSON and applies the model's template for you. But when you hit a raw completion endpoint — Ollama's /api/generate, llama.cpp's --prompt, vLLM's /v1/completions, or text-generation-webui — you must hand-wrap the conversation in the model's special tokens yourself. One wrong token and the model rambles, repeats, or ignores the system prompt.
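As a rough illustration of the raw path, the sketch below builds (but does not send) a request for Ollama's /api/generate. It assumes a local Ollama server on the default port and relies on the `raw` field of that endpoint, which, as far as the Ollama API documentation describes it, disables the server-side template. The helper name `ollama_raw_request` and the model name `qwen2.5` are placeholders chosen for this example, not part of the tool.

```python
import json
import urllib.request


def ollama_raw_request(prompt: str, model: str,
                       host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a POST request that passes an already-templated prompt to Ollama.

    Hypothetical helper: host and model are assumptions for illustration.
    """
    payload = {
        "model": model,
        "prompt": prompt,   # the hand-wrapped string, special tokens included
        "raw": True,        # ask the server not to apply its own template
        "stream": False,    # return one JSON object instead of a stream
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = ollama_raw_request(
    "<|im_start|>user\nWhat is 2+2?<|im_end|>\n<|im_start|>assistant\n",
    model="qwen2.5",  # placeholder model name
)
body = json.loads(req.data)
print(req.full_url)
print(body["raw"])
# Sending it would be urllib.request.urlopen(req), which needs a running server.
```

Building the request separately from sending it keeps the example runnable without a server and makes it easy to inspect exactly which string reaches the model.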

This tool is pure, deterministic string assembly — no model, no network. For the chosen family it concatenates, in order:

  1. BOS token, if the family uses one and the toggle is on: <|begin_of_text|> (Llama 3), <s> (Llama 2 / Mistral), <bos> (Gemma). ChatML, Phi-3, Alpaca and Vicuna emit none.
  2. System segment, per family rule. Families with a native system role (Llama 3, ChatML, Phi-3) give it a dedicated block. Families that fold it (Mistral, Gemma, Llama 2) prepend it inside the first user turn. Families that use a prefix (Vicuna, Alpaca, DeepSeek) write it as a plain line. An empty system prompt is omitted entirely.
  3. Each conversation turn, wrapped in that family's user/assistant markers — for example ChatML's <|im_start|>{role}\n{content}<|im_end|>, or Gemma's <start_of_turn> blocks where the assistant role is written as model.
  4. Generation prompt, if the toggle is on: the empty assistant header that tells the model to start replying, such as <|start_header_id|>assistant<|end_header_id|> for Llama 3 or an open [/INST] for Mistral and Llama 2.
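The four steps above can be sketched for the ChatML family as follows. This is a minimal illustration, not the tool's actual source: the function name `build_chatml` and the `(role, content)` tuple shape are assumptions made for the example, and the newline placement follows the ChatML prompt shown on this page.

```python
from typing import List, Tuple

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml(system: str, turns: List[Tuple[str, str]],
                 add_generation_prompt: bool = True) -> str:
    """Concatenate a ChatML prompt in the order: BOS, system, turns, generation prompt."""
    parts: List[str] = []
    # Step 1: BOS token. ChatML emits none, so nothing is added.
    # Step 2: native system block, omitted entirely when the prompt is empty.
    if system:
        parts.append(f"{IM_START}system\n{system}{IM_END}\n")
    # Step 3: every conversation turn wrapped in the family's markers.
    for role, content in turns:
        parts.append(f"{IM_START}{role}\n{content}{IM_END}\n")
    # Step 4: open assistant header so the model starts replying.
    if add_generation_prompt:
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = build_chatml("You are a helpful assistant.", [("user", "What is 2+2?")])
print(prompt)
print(len(prompt))  # 120
```

Because the function is pure string concatenation, the same inputs always give the same prompt, which is what makes the character and token counts reproducible.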

The special-token set and ordering for each of the 10 families is taken verbatim from the official source cited in the references below: Meta's Llama prompt formats, Mistral's tokenization guide, Google's Gemma prompt structure, Microsoft's Phi-3 card, Qwen's docs, DeepSeek's template, and Hugging Face's chat-templating reference. The character count is exact; the “special tokens” count is a literal string match of each family's markers, not a BPE token count (use the AI Token Counter for that).
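The literal string match can be pictured roughly like this. The sketch assumes the count is a plain non-overlapping substring count per marker, which is consistent with the numbers shown on this page; `count_markers` is a hypothetical name, and Python's `len` counts code points, which equals the character count here because the prompt is ASCII.

```python
from typing import Dict, List


def count_markers(prompt: str, markers: List[str]) -> Dict[str, int]:
    """Count literal occurrences of each marker string (no tokenizer involved)."""
    return {marker: prompt.count(marker) for marker in markers}


chatml_prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is 2+2?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

counts = count_markers(chatml_prompt, ["<|im_start|>", "<|im_end|>"])
print(counts)                # {'<|im_start|>': 3, '<|im_end|>': 2}
print(sum(counts.values()))  # 5
print(len(chatml_prompt))    # 120
```

A real tokenizer may split ordinary text into many more tokens than this, which is why the figure is reported separately from a BPE token count.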

Worked examples

ChatML — system + one user turn

Generic ChatML (used by Qwen and many fine-tunes). System gets a native block; the generation prompt opens an empty assistant message. No BOS token.

  • Family: ChatML
  • System: You are a helpful assistant.
  • User: What is 2+2?
  • Generation prompt: on · BOS: off
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is 2+2?<|im_end|>
<|im_start|>assistant

5 special tokens — <|im_start|>×3, <|im_end|>×2. System handling: native.

Llama 3.x — system + one user turn + BOS

Meta Llama 3 format. Every message is wrapped in header-id tokens and closed with <|eot_id|>; the final assistant header is left open for the model to continue. Matches Meta's published format exactly.

  • Family: Llama 3.x
  • System: You are a pirate.
  • User: Hello
  • Generation prompt: on · BOS: on
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a pirate.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>

9 special tokens — <|begin_of_text|>×1, <|start_header_id|>×3, <|end_header_id|>×3, <|eot_id|>×2.
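A minimal sketch of the same assembly for Llama 3.x, assuming each header is followed by a blank line (two newlines) as in the prompt above, including after the final open assistant header. The function name `build_llama3` and its parameters are illustrative, not the tool's code.

```python
from typing import List, Tuple


def build_llama3(system: str, turns: List[Tuple[str, str]],
                 add_generation_prompt: bool = True, add_bos: bool = True) -> str:
    """Wrap each message in header-id tokens and close it with <|eot_id|>."""

    def block(role: str, content: str) -> str:
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

    out = "<|begin_of_text|>" if add_bos else ""
    if system:  # an empty system prompt is omitted entirely
        out += block("system", system)
    for role, content in turns:
        out += block(role, content)
    if add_generation_prompt:
        # Header left open (no <|eot_id|>) for the model to continue.
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out


llama_prompt = build_llama3("You are a pirate.", [("user", "Hello")])
markers = ["<|begin_of_text|>", "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"]
print(llama_prompt)
print(sum(llama_prompt.count(m) for m in markers))  # 9
```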

Mistral — multi-turn, system folded into first instruction

Mistral Instruct has no system role, so the system text is prepended inside the first [INST] block. The BOS token appears once at the very start; the completed round ends with </s> and the final [/INST] is left open to generate.

  • Family: Mistral / Mixtral
  • System: Answer in one sentence.
  • User: Capital of Sri Lanka?
  • Assistant: Sri Jayawardenepura Kotte.
  • User: And the largest city?
  • Generation prompt: on · BOS: on
<s>[INST] Answer in one sentence.

Capital of Sri Lanka? [/INST]Sri Jayawardenepura Kotte.</s>[INST] And the largest city? [/INST]

6 special tokens. One <s> at the start, one </s> after the completed round, [INST]/[/INST] around each user turn.
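The folding rule can be sketched as below. It assumes the system text is joined to the first user message with a blank line, as in the prompt above, and it does not model a separate generation-prompt toggle because the open [/INST] of the last user turn already plays that role. `build_mistral` is a hypothetical name used only for this example.

```python
from typing import List, Tuple


def build_mistral(system: str, turns: List[Tuple[str, str]], add_bos: bool = True) -> str:
    """Fold the system text into the first [INST] block; close assistant turns with </s>."""
    out = "<s>" if add_bos else ""
    pending_system = system
    for role, content in turns:
        if role == "user":
            if pending_system:
                # No system role: prepend the system text inside the first user turn.
                content = f"{pending_system}\n\n{content}"
                pending_system = ""
            out += f"[INST] {content} [/INST]"
        else:
            # A completed assistant reply ends the round.
            out += f"{content}</s>"
    return out


mistral_prompt = build_mistral(
    "Answer in one sentence.",
    [
        ("user", "Capital of Sri Lanka?"),
        ("assistant", "Sri Jayawardenepura Kotte."),
        ("user", "And the largest city?"),
    ],
)
print(mistral_prompt)
print(sum(mistral_prompt.count(m) for m in ["<s>", "</s>", "[INST]", "[/INST]"]))  # 6
```

Ending the conversation on a user turn leaves the final [/INST] open, so the model's next tokens are the assistant reply.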

Sources & references
