You Google your brand one day and everything looks fine. Then you ask ChatGPT about it and get a response that's partly wrong, partly outdated, and partly based on content you never consented to share. Now you want it gone. Or at least corrected.
Welcome to one of the messiest intersections in tech right now: your data, AI models, and the laws that are supposed to protect you.
What GDPR actually says about AI training data
GDPR's Article 17, the "right to erasure" or "right to be forgotten," gives EU residents the right to request deletion of their personal data. Article 16 covers the right to rectification of inaccurate data. Both apply to AI companies processing personal data of EU residents.
The catch is practical, not legal. When your data is used to train a large language model, it doesn't sit in a database row you can delete. It's embedded in billions of model parameters. "Deleting" it from a trained model is technically different from deleting a record from a traditional database, and the industry is still working out what compliance actually looks like.
That said, the legal obligation exists. AI companies must respond to GDPR requests, and regulators across Europe have made it clear they take this seriously. Italy's data protection authority, the Garante, temporarily banned ChatGPT in 2023 over GDPR concerns. France's CNIL has issued guidance specifically on AI and personal data.
How to request removal, company by company
Each major AI company has its own process. Here's the current state.
OpenAI (ChatGPT, DALL-E, GPT API)
Submit requests through privacy.openai.com. You can request access to your data, deletion of your account data, and correction of inaccurate outputs. OpenAI also lets users opt out of having their conversations used for training via account settings. For website owners, blocking GPTBot in robots.txt prevents future crawling.
Google (Gemini, formerly Bard)
Google's privacy tools at myaccount.google.com handle deletion requests for Gemini conversation data. For preventing your website from being used in AI training, block the Google-Extended user agent in robots.txt. Note that this is separate from Googlebot, which handles regular search indexing.
Meta (LLaMA models)
Meta accepts data removal requests through its standard privacy portal. For content on Facebook and Instagram, you can object to your data being used for AI training through the platform settings. Meta released its GDPR compliance documentation for LLaMA training in 2024 after pressure from European regulators.
Stability AI (Stable Diffusion)
For image content, use haveibeentrained.com to check if your images appear in the LAION-5B dataset used for Stable Diffusion training. Spawning AI's Do Not Train registry lets you formally opt out. Stability AI has committed to respecting these opt-outs in future model versions.
Proactive opt-out mechanisms
Requesting removal after the fact is one approach. Preventing inclusion in the first place is another.
robots.txt remains the most universal mechanism. Add specific directives for AI crawlers:
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
This blocks future crawling but doesn't affect data already collected.
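If you want to verify that your directives behave as intended before deploying, Python's built-in robots.txt parser applies the same matching logic most well-behaved crawlers follow. This is a quick sanity check, not a guarantee of how any particular vendor's crawler interprets your file:

```python
from urllib.robotparser import RobotFileParser

# The directives from above, as they would appear in robots.txt
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The listed AI crawlers are blocked site-wide
for agent in ["GPTBot", "Google-Extended", "CCBot", "anthropic-ai"]:
    print(agent, parser.can_fetch(agent, "https://example.com/page"))  # False

# Unlisted agents, like regular Googlebot, are unaffected
print("Googlebot", parser.can_fetch("Googlebot", "https://example.com/page"))  # True
```

Because there is no `User-agent: *` group, only the named crawlers are blocked; search indexing continues as normal.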
ai.txt is a protocol by Spawning AI that provides more granular control than robots.txt. It lets you specify different permissions for different types of AI use (training, inference, indexing).
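The syntax mirrors robots.txt but keys permissions to media types rather than paths. A hypothetical /ai.txt that withholds images from training while leaving text open might look like the sketch below; the directives are illustrative only, so generate the real file with Spawning's own tool:

```text
# ai.txt - illustrative example; use Spawning's generator for the real file
User-Agent: *
Disallow: *.jpg
Disallow: *.png
Disallow: *.webp
Allow: *.txt
Allow: *.html
```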
Platform-level controls on sites like DeviantArt, ArtStation, and others let creators flag individual works as not available for AI training. These vary by platform.
The TDM (Text and Data Mining) reservation under EU copyright law lets rights holders explicitly reserve their rights against text and data mining, including AI training. This is legally binding in the EU since the DSM Directive.
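One machine-readable way to express that reservation is the TDM Reservation Protocol (TDMRep), a W3C Community Group specification that publishes a JSON file at a well-known location on your domain (served as /.well-known/tdmrep.json). A minimal example, with a placeholder policy URL:

```json
[
  {
    "location": "/",
    "tdm-reservation": 1,
    "tdm-policy": "https://example.com/tdm-policy.json"
  }
]
```

Here `"tdm-reservation": 1` reserves TDM rights for everything under the site root; the optional `tdm-policy` URL can point to licensing terms for miners who want to negotiate access.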
The EU AI Act adds a new layer
The EU AI Act, with provisions rolling out through 2025-2026, adds obligations for providers of general-purpose AI models:
- Publish sufficiently detailed summaries of training data
- Comply with EU copyright law, including TDM opt-outs
- Implement technical measures to respect robots.txt and similar signals
This doesn't give you a magic "delete my data" button, but it increases transparency and strengthens the legal basis for opt-out mechanisms.
Data removal vs. visibility monitoring: two different problems
Here's where most brands get confused. They conflate two separate concerns.
Data removal is about intellectual property and privacy. Did an AI company use your content to train their model without permission? This matters for creators, publishers, and anyone who produces original content.
AI visibility is about business outcomes. When a potential customer asks ChatGPT "what's the best tool for [your category]," does your brand appear in the answer? This matters for any business that acquires customers through search.
These two concerns can even pull in opposite directions. Blocking all AI crawlers protects your content from being scraped, but it might also reduce the signals that help AI platforms recommend you. A brand that aggressively opts out of everything might protect its IP while becoming invisible in AI-driven discovery.
The pragmatic approach for most businesses: use robots.txt selectively (block training crawlers, not inference crawlers where possible), monitor your AI visibility actively, and file data correction requests when AI platforms present inaccurate information about you.
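In robots.txt terms, that selective stance could look like the following. The agent names reflect publicly documented crawlers at the time of writing (GPTBot for OpenAI training, OAI-SearchBot and ChatGPT-User for retrieval and live browsing); verify current names against each vendor's documentation before deploying:

```text
# Block crawlers that collect training data
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Allow agents that fetch pages to answer live user queries
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
```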
What actually moves the needle
If your goal is accurate AI representation and strong AI visibility, data removal requests are a backstop, not a strategy. The strategy is:
- Make accurate information easy to find. Clear About pages, structured data, up-to-date product information.
- Build third-party signals. Reviews on G2 and Trustpilot, mentions on Reddit, coverage in industry publications.
- Monitor continuously. Track what AI platforms say about you daily, not once a quarter. Tools like Mentionable automate this across ChatGPT, Perplexity, Gemini, Copilot, and Google AI Mode.
- Correct when needed. File correction requests for factual errors. Optimize your content to provide better signals.
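For the structured-data piece of the first point, a minimal JSON-LD Organization block in your site's head gives crawlers an unambiguous, machine-readable description of who you are. The names and URLs below are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "description": "Example Co makes project-tracking software for small teams.",
  "sameAs": [
    "https://www.linkedin.com/company/example-co",
    "https://www.g2.com/products/example-co"
  ]
}
</script>
```

The `sameAs` links tie your official site to the third-party profiles mentioned above, which helps AI platforms reconcile mentions of your brand across sources.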
GDPR gives you rights. Use them when you need to. But don't mistake defensive data removal for a growth strategy. The brands winning in AI visibility are the ones actively shaping how AI represents them, not just trying to pull their data out of the machine.
