AI Data Privacy Comparison: Does ChatGPT, Claude or Gemini Train on Your Data?
Before you paste a client's code, a contract or a CV into an AI, check whether it becomes training data. Compare every major provider across the consumer app, the API, and enterprise tiers — training, retention, human review, opt-out, and Zero-Data-Retention — each cell cited to the official policy. No signup, sources below.
How it works
This is a curated reference, not a calculator. Each row is one provider on one access tier, and every classification is copied from that provider's own published privacy policy or terms of service — never our opinion. The single question it answers is the one most people actually Google: if I type this in, does it train the model?
The key insight the table makes obvious is that the tier matters more than the brand. The same company can train on your data in its free consumer app yet contractually promise never to train on it through its paid API. So each service is classified on three things:
- Trains on your data? One of no (safe with zero configuration), opt-out (on by default, but a documented toggle stops it), or yes (trains with no general opt-out).
- Retention, human review, opt-out and Zero-Data-Retention (ZDR): the supporting facts that decide whether an opt-out is enough for your risk level. ZDR means inputs are never stored, which is the strongest guarantee.
- Certifications: SOC 2, ISO 27001, GDPR and HIPAA-eligible flags, where the provider publishes them for that tier.
Filtering and the verdict are pure, deterministic derivations over that data — no network call, no scoring weights. “Most private only” keeps a row when trainsOnData = no OR (a documented opt-out AND a ZDR path). The verdict card simply partitions the visible rows into no training, opt-out, and trains by default, and names the safe-by-default services.
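The filter and verdict rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the page's actual code: the Row fields has_opt_out and has_zdr_path, the function names, and the ExampleCo / SampleAI rows are all hypothetical, and "a documented opt-out" is assumed to be a simple boolean flag on the row.

```python
from dataclasses import dataclass
from typing import Literal

Trains = Literal["no", "opt-out", "yes"]


@dataclass(frozen=True)
class Row:
    """One provider on one access tier (field names are assumptions)."""
    provider: str
    tier: str
    trains_on_data: Trains
    has_opt_out: bool    # a documented opt-out path exists
    has_zdr_path: bool   # a Zero-Data-Retention option exists


def most_private_only(rows: list[Row]) -> list[Row]:
    """Keep a row when trainsOnData = no OR (opt-out AND ZDR path)."""
    return [
        r for r in rows
        if r.trains_on_data == "no" or (r.has_opt_out and r.has_zdr_path)
    ]


def verdict(rows: list[Row]) -> dict[str, list[Row]]:
    """Partition the visible rows into the three verdict buckets."""
    buckets: dict[str, list[Row]] = {"no": [], "opt-out": [], "yes": []}
    for r in rows:
        buckets[r.trains_on_data].append(r)
    return buckets


def safe_by_default(rows: list[Row]) -> list[str]:
    """Name the services that need no configuration to avoid training."""
    return [f"{r.provider} ({r.tier})" for r in verdict(rows)["no"]]


# Made-up rows for illustration only; they describe no real provider.
ROWS = [
    Row("ExampleCo", "consumer app", "opt-out", True, False),
    Row("ExampleCo", "API", "no", False, True),
    Row("SampleAI", "consumer app", "yes", False, False),
    Row("SampleAI", "enterprise", "opt-out", True, True),
]

if __name__ == "__main__":
    for row in most_private_only(ROWS):
        print(row.provider, row.tier, row.trains_on_data)
    print(safe_by_default(ROWS))
```

Both functions are pure: they read the rows and return new lists, so the same input always gives the same verdict. The two ExampleCo rows also show the tier effect in miniature, since the same hypothetical brand lands in different buckets depending on the tier.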
Every classification is double-checked before the page renders: an internal integrity test confirms that no service is marked both “trains” and “zero-data-retention,” that each “opt-out” row really has an opt-out path, and that two independent derivations of “safe by default” agree. Because the data is a dated snapshot (2026-06-22), each row links its live policy so you can confirm the current terms — policies change without notice.
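A minimal sketch of what such an integrity test might look like, under stated assumptions: the Row shape, the field names and the error messages are hypothetical, "trains" is read as a trains_on_data value of "yes", and "safe by default" is taken to mean a value of "no". The real test may differ.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """Hypothetical row shape; the real data model may differ."""
    name: str
    trains_on_data: str  # expected: "no", "opt-out" or "yes"
    has_opt_out: bool
    zdr: bool


def integrity_errors(rows: list[Row]) -> list[str]:
    """Return a list of human-readable problems; empty means the data passes."""
    errors: list[str] = []
    for r in rows:
        # Check 1: a row cannot both train by default and claim ZDR.
        if r.trains_on_data == "yes" and r.zdr:
            errors.append(f"{r.name}: marked both trains and zero-data-retention")
        # Check 2: an opt-out classification needs a real opt-out path.
        if r.trains_on_data == "opt-out" and not r.has_opt_out:
            errors.append(f"{r.name}: opt-out row has no opt-out path")

    # Check 3: derive "safe by default" two ways and compare.
    # Derivation A: select rows classified "no" directly.
    by_selection = {r.name for r in rows if r.trains_on_data == "no"}
    # Derivation B: start from every row and remove the other two classes.
    by_elimination = {r.name for r in rows} - {
        r.name for r in rows if r.trains_on_data in ("opt-out", "yes")
    }
    if by_selection != by_elimination:
        errors.append("safe-by-default derivations disagree")
    return errors


if __name__ == "__main__":
    sample = [
        Row("ExampleCo API", "no", False, True),
        Row("ExampleCo app", "opt-out", True, False),
    ]
    print(integrity_errors(sample))
```

In this sketch the two derivations disagree only when a row carries an unexpected label or when two rows share a name with different labels, so the third check doubles as a guard against typos in the data.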
Worked examples
Frequently asked questions
Sources & references
Every row links its own policy in the tool above. The primary sources, by provider:
- OpenAI — How your data is used to improve model performance
- OpenAI — Enterprise privacy & API data usage
- Anthropic — Privacy Center
- Anthropic — Commercial Terms of Service
- Google — Gemini Apps & your data
- Google — Gemini API additional terms
- Microsoft — Microsoft 365 Copilot privacy
- Meta — Generative AI & your information
- xAI — Privacy policy
- DeepSeek — Privacy policy
- Mistral — Privacy policy
All 19 classifications were last cross-checked against these official policies on 2026-06-22. The data is a dated snapshot, refreshed as policies change. This page is informational; it is not legal advice and does not replace a Data Processing Agreement. Self-hosted open-weight models (run locally with tools such as Ollama or LM Studio) send nothing off your machine, which is a different trust model and is not covered here. Provider and product names are trademarks of their respective owners.