You publish an article. It gets indexed. It ranks on Google. Then one day you wonder: did an AI model just... learn from it? Is your content sitting inside ChatGPT's training data, helping it answer questions, without you knowing?
That question led Spawning AI to build Have I Been Trained, a tool that lets you check if your work ended up in AI training datasets. It's a useful tool. But it solves a very specific problem, and it's not the problem most brands think they have.
What Have I Been Trained actually does
Have I Been Trained searches the LAION-5B dataset, the massive open-source image collection used to train models like Stable Diffusion. You upload an image (or search by text), and the tool shows you matching or similar images from the dataset.
If your photos, illustrations, or artwork appear in the results, it means your visual content was included in the training data that powered a generation of image AI models.
The tool was built by Spawning AI, a company focused on creator rights in the age of generative AI. Their thesis is simple: creators should know when their work is used, and they should be able to say no.
Spawning's opt-out ecosystem
Beyond the search tool, Spawning has built a broader set of mechanisms for controlling how AI uses your content:
ai.txt works like robots.txt, but specifically for AI crawlers. You place it on your website to signal which content AI companies can and cannot use for training. Several AI companies have agreed to respect it.
The Do Not Train registry lets creators register their work and declare it off-limits for AI training. Think of it as a global opt-out list.
Platform integrations bring these controls directly into creative platforms. DeviantArt, for example, integrated Spawning's technology so artists can flag their work as not available for AI training.
These tools address a real concern. If you're a photographer, illustrator, or content creator, knowing whether your work was scraped into training data, and having mechanisms to prevent it, matters.
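Spawning's ai.txt uses robots.txt-style directives, but scoped to AI training rather than crawling in general. The fragment below is an illustrative sketch only; the exact directives and wildcards are defined by Spawning's ai.txt generator, so generate the real file there rather than hand-writing one from this example:

```
# ai.txt — illustrative sketch, not the canonical spec.
# Intended effect: opt all content on this site out of AI training.
User-Agent: *
Disallow: *
```

The file is served from the site root (e.g. example.com/ai.txt), next to robots.txt.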
The gap most brands don't see
Here's where it gets interesting. Most businesses searching for "have I been trained" or "is my content in AI" aren't actually worried about training data rights. They're worried about something else entirely: whether AI platforms recommend them.
These are fundamentally different questions.
Training data presence means your content was ingested when the model was built. It's historical. It happened (or didn't) months or years ago. For text-based models like GPT-4 or Claude, there's no public tool equivalent to Have I Been Trained: those training corpora are undisclosed, so there is nothing for you to search.
AI visibility means whether ChatGPT, Perplexity, Gemini, Copilot, or Google AI Mode actively mentions your brand when someone asks a relevant question right now. This is dynamic. It changes as models update, as your online presence evolves, and as competitors shift.
A brand can exist in training data but never get recommended. And a brand can get recommended based entirely on signals the model retrieves from the live web at query time (which is how Perplexity and Google AI Mode work by design).
Why visibility tracking matters more for most brands
If you run a SaaS, a consultancy, an agency, or any business where customers discover you through search, the question that actually affects your revenue is not "was my website in the training data?" It's "when someone asks ChatGPT for a tool like mine, do they hear about me?"
That's the question AI visibility tracking answers. Tools like Mentionable monitor what AI platforms say about your brand across real prompts that your potential customers use. Not once, but daily, across five major AI platforms. You see trends, spot when competitors gain mentions, and catch it when you disappear from a conversation you used to own.
This is closer to how you already think about SEO rankings. You don't just care whether Google crawled your site. You care whether you rank for the queries that matter. Same logic applies to AI.
Both layers matter, for different reasons
For creators and rights holders, Have I Been Trained and Spawning's opt-out tools address legitimate intellectual property concerns. If your images were used to train Stable Diffusion without consent, you deserve to know that and have recourse.
For brands focused on growth, AI visibility tracking addresses the business impact question. When 30% of product research starts with an AI chatbot (and that number keeps climbing), knowing whether you show up in those conversations is no longer optional.
The two layers aren't in conflict. They're complementary. One governs what goes into the models. The other monitors what comes out.
What to do next
If you're a creator concerned about training data: start with Have I Been Trained for images, and review Spawning's opt-out mechanisms. For text content, use robots.txt to disallow the AI crawlers you want to exclude (GPTBot, Google-Extended, CCBot, anthropic-ai). Keep in mind robots.txt is honored voluntarily; it signals your preference rather than technically blocking access.
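For the robots.txt route, the four user-agent strings above are real, documented crawler tokens. A minimal robots.txt that disallows all of them from the entire site looks like this (crawler names do change over time, so verify each vendor's current documentation):

```
# Disallow common AI training crawlers site-wide.
# Compliance is voluntary: well-behaved crawlers honor these rules.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
```

Note that Google-Extended is a training opt-out token, not a crawler itself; listing it tells Google not to use your content for AI training without affecting normal search indexing.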
If you're a brand concerned about AI recommendations: start by checking what AI actually says about you. Run a free visibility check on your domain, or manually test 10-15 prompts your customers might ask across ChatGPT, Perplexity, and Gemini.
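The manual prompt test above can be semi-automated for a single platform. The sketch below assumes the official OpenAI Python SDK and an OPENAI_API_KEY in the environment; the prompts, the "Acme" brand name, and the mention check are illustrative placeholders, not how Mentionable or any other product works:

```python
# Hedged sketch: ask one AI platform a batch of customer-style prompts
# and check whether each answer mentions your brand.
import re

def brand_mentioned(answer: str, brand: str) -> bool:
    """Case-insensitive whole-word match, so 'Acme' won't match 'acmeify'."""
    return re.search(rf"\b{re.escape(brand)}\b", answer, re.IGNORECASE) is not None

def check_visibility(prompts, brand, model="gpt-4o"):
    """Return {prompt: True/False} for whether the model's answer names the brand.

    Requires `pip install openai` and OPENAI_API_KEY set in the environment.
    """
    from openai import OpenAI  # imported here so the helper above stays standalone
    client = OpenAI()
    results = {}
    for prompt in prompts:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        results[prompt] = brand_mentioned(resp.choices[0].message.content, brand)
    return results
```

Run the same prompt list on a schedule and diff the results, and you have a crude version of the trend tracking described above; the same loop structure ports to other providers' chat APIs.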
Either way, the worst position is not knowing. The AI landscape moves fast, and the brands that track their presence, whether in training data or in live recommendations, are the ones that can actually do something about it.
