AI Chat Template Generator — ChatML, Llama 3, Mistral & more
Wrap a system prompt and a conversation in the exact special tokens your open model expects, then copy the prompt for raw completion endpoints like Ollama, llama.cpp and vLLM. Ten formats, fully in-browser, no signup.
How it works
When you call a chat endpoint (OpenAI's /chat/completions, or Ollama's /api/chat), the server reads your messages as structured JSON and applies the model's template for you. But when you hit a raw completion endpoint (Ollama's /api/generate with raw set to true, llama.cpp's --prompt, vLLM's /v1/completions, or text-generation-webui) you must hand-wrap the conversation in the model's special tokens yourself. One wrong token and the model rambles, repeats, or ignores the system prompt.
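As a rough illustration of the raw path, the sketch below builds the JSON body for Ollama's /api/generate with a hand-wrapped ChatML prompt. The helper name build_generate_payload and the model name "qwen2.5" are placeholders invented for this example. Setting "raw" to true asks Ollama to skip its own templating, which is what you want when the prompt already carries the special tokens. The request itself is not sent here.

```python
import json


def build_generate_payload(model: str, prompt: str) -> dict:
    """Build a request body for Ollama's /api/generate (hypothetical helper)."""
    return {
        "model": model,
        "prompt": prompt,
        "raw": True,      # send the prompt as-is, without server-side templating
        "stream": False,  # ask for a single JSON response instead of a stream
    }


# A hand-wrapped ChatML conversation ending in an open assistant header.
prompt = (
    "<|im_start|>system\nYou are terse.<|im_end|>\n"
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\n"
)

payload = build_generate_payload("qwen2.5", prompt)  # placeholder model name
body = json.dumps(payload)  # POST this body to /api/generate on your Ollama host
```

If "raw" were left out, the server would typically apply the model's template to the prompt, which would wrap the special tokens a second time.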
This tool is pure, deterministic string assembly — no model, no network. For the chosen family it concatenates, in order:
- BOS token, if the family uses one and the toggle is on: <|begin_of_text|> (Llama 3), <s> (Llama 2 / Mistral), <bos> (Gemma). ChatML, Phi-3, Alpaca and Vicuna emit none.
- System segment, per family rule. Families with a native system role (Llama 3, ChatML, Phi-3) give it a dedicated block. Families that fold it (Mistral, Gemma, Llama 2) prepend it inside the first user turn. Families that use a prefix (Vicuna, Alpaca, DeepSeek) write it as a plain line. An empty system prompt is omitted entirely.
- Each conversation turn, wrapped in that family's user/assistant markers, for example ChatML's <|im_start|>{role}\n{content}<|im_end|>, or Gemma's <start_of_turn> blocks where the assistant role is written as model.
- Generation prompt, if the toggle is on: the empty assistant header that tells the model to start replying, such as <|start_header_id|>assistant<|end_header_id|> for Llama 3 or an open [/INST] for Mistral and Llama 2.
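The four steps above can be sketched for three of the families. This is a minimal illustration, not the tool's actual code: the function names are invented here, and the whitespace between markers (the newline after <|im_end|>, the blank line after a Llama 3 header, and the blank line used to fold a system prompt into Gemma's first user turn) is an assumption to verify against the official references.

```python
from typing import Dict, List

Message = Dict[str, str]  # {"role": "user" | "assistant", "content": "..."}


def render_chatml(system: str, messages: List[Message],
                  add_generation_prompt: bool = True) -> str:
    """ChatML: no BOS, native system block."""
    parts = []
    if system:  # an empty system prompt is omitted entirely
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def render_llama3(system: str, messages: List[Message], add_bos: bool = True,
                  add_generation_prompt: bool = True) -> str:
    """Llama 3: BOS token, native system block."""
    def header(role: str) -> str:
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n"

    parts = ["<|begin_of_text|>"] if add_bos else []
    if system:
        parts.append(header("system") + system + "<|eot_id|>")
    for m in messages:
        parts.append(header(m["role"]) + m["content"] + "<|eot_id|>")
    if add_generation_prompt:
        parts.append(header("assistant"))
    return "".join(parts)


def render_gemma(system: str, messages: List[Message], add_bos: bool = True,
                 add_generation_prompt: bool = True) -> str:
    """Gemma: BOS token, system prompt folded into the first user turn."""
    parts = ["<bos>"] if add_bos else []
    pending_system = system
    for m in messages:
        role = "model" if m["role"] == "assistant" else "user"
        content = m["content"]
        if pending_system and role == "user":
            # Assumed separator: a blank line between system text and user text.
            content = f"{pending_system}\n\n{content}"
            pending_system = ""
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")
    return "".join(parts)
```

Keeping one small function per family, rather than one parameterised template, makes each family's system-prompt rule easy to read and to check against its published format.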
The special-token set and ordering for each of the 10 families is taken verbatim from the official source cited in the references below: Meta's Llama prompt formats, Mistral's tokenization guide, Google's Gemma prompt structure, Microsoft's Phi-3 card, Qwen's docs, DeepSeek's template, and HuggingFace's chat-templating reference. The character count is exact; the “special tokens” count is a literal string match of each family's markers, not a BPE token count (use the AI Token Counter for that).
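The difference between the two counts can be sketched as follows. The helper name count_special_tokens and the marker list are illustrative assumptions, not the tool's own code. Note that len() counts Unicode code points in Python, which may differ from a browser's count for characters outside the Basic Multilingual Plane.

```python
from typing import List


def count_special_tokens(prompt: str, markers: List[str]) -> int:
    """Count literal occurrences of each marker string (not BPE tokens)."""
    return sum(prompt.count(marker) for marker in markers)


# Assumed marker set for ChatML; other families would supply their own list.
CHATML_MARKERS = ["<|im_start|>", "<|im_end|>"]

prompt = "<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\n"

chars = len(prompt)                                       # exact character count
specials = count_special_tokens(prompt, CHATML_MARKERS)  # literal marker matches
```

A real tokenizer may split or merge text differently, so neither number should be read as the number of tokens the model will consume.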
Sources & references
- ChatML (OpenAI-style / generic) — HuggingFace — Chat Templating
- Qwen / Qwen2.5 (ChatML) — Qwen — Concepts (ChatML)
- Llama 3.x (3 / 3.1 / 3.2 / 3.3) — Meta — Llama 3 prompt format
- Llama 2 / CodeLlama — Meta — Llama 2 prompting guide
- Mistral / Mixtral (Instruct) — Mistral AI — Tokenization
- Gemma / Gemma 2 / Gemma 3 — Google — Gemma prompt structure
- Phi-3 / Phi-3.5 — Microsoft — Phi-3 chat format
- DeepSeek V2 / V3 — DeepSeek-V3 — chat template
- Alpaca (legacy instruction) — Stanford Alpaca
- Vicuna v1.1 (legacy) — LMSYS FastChat (Vicuna)
Token sets were last cross-checked against these official sources on 2026-06-30. The tool ships five worked examples that reconcile byte-for-byte against the published formats; if a format changes, the “verified” badge count drops and the page is updated.