What is the Llama 3 prompt format?

Llama 3, 3.1, 3.2 and 3.3 wrap every message in header tokens: <|start_header_id|>{role}<|end_header_id|> followed by two newlines, the content, then <|eot_id|>. The whole prompt starts with the BOS token <|begin_of_text|>, and you append an empty assistant header to make the model reply.

What is ChatML format?

ChatML wraps each message as <|im_start|>{role}\n{content}<|im_end|>. Roles are system, user and assistant. It uses no BOS token. Qwen, Yi and many fine-tunes use ChatML, which is why it is the most common generic format for local models.

How do I format a system prompt for Mistral?

Mistral Instruct has no system role. The system text is prepended inside the first [INST] block, before the first user message, separated by two newlines. The whole prompt opens with a single BOS token (<s>), and each completed round ends with </s>.

What special tokens does Gemma use?

Gemma uses <bos>, then <start_of_turn>{role}\n for each turn and <end_of_turn> to close it. Gemma has no system role, so any system prompt is folded into the first user turn, and the assistant role is written as model rather than assistant.

How do I build a raw prompt for llama.cpp or Ollama?

Pick your model family, paste your system prompt and conversation, then copy the output. Send it to a raw completion endpoint — Ollama's /api/generate with "raw": true, llama.cpp's --prompt, or vLLM's /v1/completions. The /chat endpoints apply the template for you, so only use this string with /completion-style endpoints.
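As a rough illustration of the raw-endpoint call described above, the sketch below builds (but does not send) a request to Ollama's /api/generate with "raw": true. The host and port are Ollama's defaults, the model tag is a hypothetical placeholder, and the helper name build_ollama_raw_request is invented for this example.

```python
import json
import urllib.request


def build_ollama_raw_request(
    prompt: str,
    model: str,
    host: str = "http://localhost:11434",  # assumption: default local Ollama address
) -> urllib.request.Request:
    """Build a POST to Ollama's /api/generate with raw mode on (nothing is sent here)."""
    payload = {
        "model": model,    # placeholder tag; use whichever model you have pulled
        "prompt": prompt,  # the already-templated string from the tool
        "raw": True,       # tell Ollama not to apply its own template
        "stream": False,   # ask for one JSON response instead of a stream
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


prompt = "<|im_start|>user\nWhat is 2+2?<|im_end|>\n<|im_start|>assistant\n"
req = build_ollama_raw_request(prompt, model="qwen2.5")

# To actually send it (requires a running Ollama server):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Building the request separately from sending it makes it easy to inspect the exact payload before it reaches the server.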

What does the "add generation prompt" toggle do?

It appends the empty assistant header (for example <|im_start|>assistant for ChatML or <|start_header_id|>assistant<|end_header_id|> for Llama 3) so the model continues as the assistant. Turn it off when you are building a fine-tuning string that already contains the assistant reply.

Should I include the BOS token?

For most raw completion calls, yes: the BOS token (such as <|begin_of_text|>, <s> or <bos>) marks the start of the sequence. Turn it off if your runtime adds the BOS automatically (many do); otherwise you get a duplicate BOS, which can subtly degrade output. Families like ChatML and Phi-3 use no BOS at all.
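If you are unsure whether a prompt already carries a BOS marker, a small check like the one below can help. It is only a sketch: the helper names are made up for this example, and it looks at the three BOS strings this page mentions, as literal text rather than tokenizer IDs.

```python
# BOS markers named on this page: Llama 3, Llama 2 / Mistral, Gemma.
BOS_TOKENS = ("<|begin_of_text|>", "<s>", "<bos>")


def strip_leading_bos(prompt: str) -> str:
    """Remove one leading BOS marker, for runtimes that add their own."""
    for bos in BOS_TOKENS:
        if prompt.startswith(bos):
            return prompt[len(bos):]
    return prompt


def has_duplicate_bos(prompt: str) -> bool:
    """Return True when the prompt opens with the same BOS marker twice."""
    return any(prompt.startswith(bos + bos) for bos in BOS_TOKENS)


cleaned = strip_leading_bos("<s>[INST] Hi [/INST]")
```

Here cleaned is "[INST] Hi [/INST]"; a ChatML prompt, which has no BOS, passes through unchanged.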

Why does my model ramble or ignore the system prompt?

Usually a token mismatch: a missing <|eot_id|> or <|im_end|>, the wrong role name, a duplicated BOS, or a system prompt placed where that family has no system role. This tool emits the exact tokens each family expects, in order, so you can paste a known-good prompt and rule that out.

Does this send my prompt anywhere?

No. The whole tool is plain string assembly that runs in your browser. Nothing is uploaded, no model is called, and no network request fires when you generate. You can disconnect from the internet and it still works.

When were these templates last verified?

The token sets were last cross-checked against the official Meta, Mistral, Google, Microsoft, Qwen and DeepSeek references on 2026-06-30. Five worked examples reconcile byte-for-byte inside the tool, shown as the "verified" badge.


AI Chat Template Generator — ChatML, Llama 3, Mistral & more

Wrap a system prompt and a conversation in the exact special tokens your open model expects, then copy the prompt for raw completion endpoints like Ollama, llama.cpp and vLLM. Ten formats, fully in-browser, no signup.

By Induwara Ashinsana, Executive Director, Ryzera Technologies · Updated Jun 30, 2026

Build a chat-template prompt

Model family

ChatML (OpenAI-style / generic)

Used by Qwen, Yi, Nous-Hermes, and many fine-tunes. The de-facto generic format.

System prompt (optional)

This family has a dedicated system block.

Conversation

User (Turn 1)

Add generation prompt

Append the empty assistant header so the model knows to reply.

Include BOS token

This family uses no BOS token, so this has no effect.

Characters

120

Special tokens

Lines

System role

Native system role

The family has a dedicated system block with its own special tokens.

Formatted prompt — copy & paste

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is 2+2?<|im_end|>
<|im_start|>assistant

Special-token legend

Token	Meaning	Count
<|im_start|>	Start of a message; followed by the role name and a newline.	3
<|im_end|>	End of a message.	2

Where to paste this: Ollama raw /api/generate, llama.cpp --prompt, vLLM /v1/completions, text-generation-webui. Reminder: a /chat/completions endpoint applies the template itself — use this string only with raw /completion endpoints.

Each family's tokens are taken verbatim from its official source (listed below the tool).

How it works

When you call a chat endpoint (OpenAI's /chat/completions, or Ollama's /api/chat), the server reads your messages as structured JSON and applies the model's template for you. But when you hit a raw completion endpoint — Ollama's /api/generate, llama.cpp's --prompt, vLLM's /v1/completions, or text-generation-webui — you must hand-wrap the conversation in the model's special tokens yourself. One wrong token and the model rambles, repeats, or ignores the system prompt.

This tool is pure, deterministic string assembly — no model, no network. For the chosen family it concatenates, in order:

BOS token, if the family uses one and the toggle is on: <|begin_of_text|> (Llama 3), <s> (Llama 2 / Mistral), <bos> (Gemma). ChatML, Phi-3, Alpaca and Vicuna emit none.
System segment, per family rule. Families with a native system role (Llama 3, ChatML, Phi-3) give it a dedicated block. Families that fold it (Mistral, Gemma, Llama 2) prepend it inside the first user turn. Families that use a prefix (Vicuna, Alpaca, DeepSeek) write it as a plain line. An empty system prompt is omitted entirely.
Each conversation turn, wrapped in that family's user/assistant markers — for example ChatML's <|im_start|>{role}\n{content}<|im_end|>, or Gemma's <start_of_turn> blocks where the assistant role is written as model.
Generation prompt, if the toggle is on: the empty assistant header that tells the model to start replying, such as <|start_header_id|>assistant<|end_header_id|> for Llama 3 or an open [/INST] for Mistral and Llama 2.
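The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: it covers only two families with a native system role (ChatML and Llama 3), and the Family dataclass and build_prompt function are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    bos: str       # BOS marker, or "" when the family uses none
    open_fmt: str  # header written before the content; contains {role}
    close: str     # marker(s) written after the content


CHATML = Family(bos="", open_fmt="<|im_start|>{role}\n", close="<|im_end|>\n")
LLAMA3 = Family(
    bos="<|begin_of_text|>",
    open_fmt="<|start_header_id|>{role}<|end_header_id|>\n\n",
    close="<|eot_id|>",
)


def build_prompt(
    family: Family,
    system: str,
    turns: list[tuple[str, str]],
    add_generation_prompt: bool = True,
    include_bos: bool = True,
) -> str:
    parts: list[str] = []
    # 1. BOS token, if the family has one and the toggle is on.
    if include_bos and family.bos:
        parts.append(family.bos)
    # 2. System segment; an empty system prompt is omitted entirely.
    if system:
        parts.append(family.open_fmt.format(role="system") + system + family.close)
    # 3. Each conversation turn, wrapped in the family's markers.
    for role, content in turns:
        parts.append(family.open_fmt.format(role=role) + content + family.close)
    # 4. Generation prompt: an assistant header left open for the model.
    if add_generation_prompt:
        parts.append(family.open_fmt.format(role="assistant"))
    return "".join(parts)


chatml = build_prompt(CHATML, "You are a helpful assistant.", [("user", "What is 2+2?")])
llama3 = build_prompt(LLAMA3, "You are a pirate.", [("user", "Hello")])
```

Families that fold the system prompt into the first user turn (Mistral, Gemma, Llama 2) or write it as a prefix line would need their own rule in step 2, which this sketch leaves out.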

The special-token set and ordering for each of the 10 families is taken verbatim from the official sources cited in the references below: Meta's Llama prompt formats, Mistral's tokenization guide, Google's Gemma prompt structure, Microsoft's Phi-3 card, Qwen's docs, DeepSeek's template, and Hugging Face's chat-templating reference. The character count is exact; the “special tokens” count is a literal string match of each family's markers, not a BPE token count (use the AI Token Counter for that).
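To make the "literal string match" point concrete, here is a small sketch of how such a count could be computed. The function name count_special_tokens is hypothetical, and the marker lists are simply the ones that appear in this page's examples.

```python
def count_special_tokens(prompt: str, markers: list[str]) -> dict[str, int]:
    """Count non-overlapping literal occurrences of each marker string."""
    return {marker: prompt.count(marker) for marker in markers}


chatml_prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is 2+2?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
chatml_counts = count_special_tokens(chatml_prompt, ["<|im_start|>", "<|im_end|>"])

mistral_prompt = "<s>[INST] Hi [/INST]Hello.</s>[INST] Bye [/INST]"
mistral_counts = count_special_tokens(mistral_prompt, ["<s>", "</s>", "[INST]", "[/INST]"])
```

For the ChatML prompt this gives 3 and 2, matching the legend above. Because the match is on plain text, a marker that a user types inside their own message would be counted too, which is one more reason this number is not a tokenizer count.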

Worked examples

ChatML — system + one user turn

Generic ChatML (used by Qwen and many fine-tunes). System gets a native block; the generation prompt opens an empty assistant message. No BOS token.

Family: ChatML
System: You are a helpful assistant.
User: What is 2+2?
Generation prompt: on · BOS: off

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is 2+2?<|im_end|>
<|im_start|>assistant

5 special tokens — <|im_start|>×3, <|im_end|>×2. System handling: native.

Llama 3.x — system + one user turn + BOS

Meta Llama 3 format. Every message is wrapped in header-id tokens and closed with <|eot_id|>; the final assistant header is left open for the model to continue. Matches Meta's published format exactly.

Family: Llama 3.x
System: You are a pirate.
User: Hello
Generation prompt: on · BOS: on

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a pirate.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>

9 special tokens — <|begin_of_text|>×1, <|start_header_id|>×3, <|end_header_id|>×3, <|eot_id|>×2.

Mistral — multi-turn, system folded into first instruction

Mistral Instruct has no system role, so the system text is prepended inside the first [INST] block. The BOS token appears once at the very start; the completed round ends with </s> and the final [/INST] is left open to generate.

Family: Mistral / Mixtral
System: Answer in one sentence.
User: Capital of Sri Lanka?
Assistant: Sri Jayawardenepura Kotte.
User: And the largest city?
Generation prompt: on · BOS: on

<s>[INST] Answer in one sentence.

Capital of Sri Lanka? [/INST]Sri Jayawardenepura Kotte.</s>[INST] And the largest city? [/INST]

6 special tokens. One <s> at the start, one </s> after the completed round, [INST]/[/INST] around each user turn.
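The folding rule in this example can be written out as a short function. This is a sketch under the assumptions stated in the example itself (system text joined to the first user message by two newlines, one BOS at the start, </s> after each assistant reply); build_mistral_prompt is a name made up for illustration, not part of any Mistral library.

```python
def build_mistral_prompt(
    system: str,
    turns: list[tuple[str, str]],
    include_bos: bool = True,
) -> str:
    """Assemble a Mistral Instruct prompt; turns are (role, content) pairs."""
    parts: list[str] = ["<s>"] if include_bos else []
    first_user = True
    for role, content in turns:
        if role == "user":
            # No system role: fold the system text into the first [INST] block.
            if first_user and system:
                content = f"{system}\n\n{content}"
            first_user = False
            parts.append(f"[INST] {content} [/INST]")
        elif role == "assistant":
            # A completed round is closed with the end-of-sequence marker.
            parts.append(f"{content}</s>")
        else:
            raise ValueError(f"unsupported role: {role!r}")
    return "".join(parts)


mistral = build_mistral_prompt(
    "Answer in one sentence.",
    [
        ("user", "Capital of Sri Lanka?"),
        ("assistant", "Sri Jayawardenepura Kotte."),
        ("user", "And the largest city?"),
    ],
)
```

When the conversation ends on a user turn, the trailing [/INST] already acts as the generation prompt, so no separate toggle is needed in this sketch.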

Frequently asked questions

Sources & references

Token sets were last cross-checked against these official sources on 2026-06-30. The tool ships five worked examples that reconcile byte-for-byte against the published formats; if a format changes, the “verified” badge count drops and the page is updated.

