Can I request my data be removed from ChatGPT's training data under GDPR?

Yes. OpenAI accepts data subject access requests and deletion requests through its privacy portal at privacy.openai.com. Under GDPR Article 17, EU residents can request erasure of personal data. However, removing data from a trained model is technically different from deleting a database record, and OpenAI may not be able to fully remove specific data from model weights.

How do I block AI crawlers from scraping my website?

Add directives to your robots.txt file to block known AI crawlers. Key user agents to block include GPTBot (OpenAI), Google-Extended (Google AI training), CCBot (Common Crawl, used by many AI companies), anthropic-ai (Anthropic), and Bytespider (ByteDance). You can also implement the ai.txt protocol from Spawning AI for more granular control.

Does the EU AI Act change anything about data removal from AI models?

The EU AI Act entered into force in August 2024 and applies in stages, with the obligations for providers of general-purpose AI models applying from August 2025. Those providers must publish summaries of their training data and comply with EU copyright law. The Act strengthens the obligation to respect opt-out mechanisms and increases transparency requirements, but the practical enforcement of data removal from trained models remains an evolving area.

What is the difference between removing data from AI training and AI visibility?

Data removal addresses whether your content was used to build the model, which is an intellectual property and privacy concern. AI visibility addresses whether the model recommends your brand when users ask relevant questions, which is a business growth concern. You can have your data removed from training sets and still appear in AI recommendations (via web browsing), or be in the training data and never get recommended.

Can I use GDPR to remove false information about my brand from ChatGPT?

Only partly. GDPR's right to rectification (Article 16) gives you the right to correct inaccurate personal data, and GDPR protects individuals rather than companies. If ChatGPT generates false statements about you as a person, or about identifiable people in your business, you can submit a correction request to OpenAI. For false claims about the brand itself, the more effective approach is to improve your web presence so AI models have accurate information to reference.

AI Data Removal and GDPR: How to Request Deletion from OpenAI, Google, and Others

You Google your brand one day and everything looks fine. Then you ask ChatGPT about it and get a response that's partly wrong, partly outdated, and partly based on content you never consented to share. Now you want it gone. Or at least corrected.

Welcome to one of the messiest intersections in tech right now: your data, AI models, and the laws that are supposed to protect you.

What GDPR actually says about AI training data

GDPR's Article 17, the "right to erasure" or "right to be forgotten," gives EU residents the right to request deletion of their personal data. Article 16 covers the right to rectification of inaccurate data. Both apply to AI companies processing personal data of EU residents.

The catch is practical, not legal. When your data is used to train a large language model, it doesn't sit in a database row you can delete. It's embedded in billions of model parameters. "Deleting" it from a trained model is technically different from deleting a record from a traditional database, and the industry is still working out what compliance actually looks like.

That said, the legal obligation exists. AI companies must respond to GDPR requests, and regulators across Europe have made it clear they take this seriously. Italy temporarily banned ChatGPT in 2023 over GDPR concerns. France's CNIL has issued guidance specifically on AI and personal data.

How to request removal, company by company

Each major AI company has its own process. Here's the current state.

OpenAI (ChatGPT, DALL-E, GPT API)

Submit requests through privacy.openai.com. You can request access to your data, deletion of your account data, and correction of inaccurate outputs. OpenAI also lets users opt out of having their conversations used for training via account settings. For website owners, blocking GPTBot in robots.txt prevents future crawling.

Google (Gemini, formerly Bard)

Google's privacy tools at myaccount.google.com handle deletion requests for Gemini conversation data. For preventing your website from being used in AI training, block the Google-Extended user agent in robots.txt. Note that this is separate from Googlebot, which handles regular search indexing.

Meta (LLaMA models)

Meta accepts data removal requests through its standard privacy portal. For content on Facebook and Instagram, you can object to your data being used for AI training through the platform settings. Meta released its GDPR compliance documentation for LLaMA training in 2024 after pressure from European regulators.

Stability AI (Stable Diffusion)

For image content, use haveibeentrained.com to check if your images appear in the LAION-5B dataset used for Stable Diffusion training. Spawning AI's Do Not Train registry lets you formally opt out. Stability AI has committed to respecting these opt-outs in future model versions.

Proactive opt-out mechanisms

Requesting removal after the fact is one approach. Preventing inclusion in the first place is another.

robots.txt remains the most universal mechanism. Add specific directives for AI crawlers:

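# OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Google's AI-training control (separate from Googlebot)
User-agent: Google-Extended
Disallow: /

# Common Crawl, whose corpus many AI companies use
User-agent: CCBot
Disallow: /

# Anthropic
User-agent: anthropic-ai
Disallow: /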

This blocks future crawling by crawlers that honor robots.txt, but it doesn't affect data already collected.
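
It's worth checking that the file says what you think it says before you rely on it. Here's a minimal sketch using Python's standard-library urllib.robotparser. The example.com URL and the blocked_agents helper are placeholders for illustration, and the check only shows how a compliant parser reads the rules, not whether a given crawler actually obeys them.

```python
from urllib.robotparser import RobotFileParser

# Sample rules mirroring the directives above; swap in your own file's contents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
"""

AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot", "anthropic-ai"]


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Map each user agent to True if the rules disallow it from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


result = blocked_agents(ROBOTS_TXT, AI_CRAWLERS)
for agent, blocked in result.items():
    print(f"{agent}: {'blocked' if blocked else 'allowed'}")

# Regular search indexing should be unaffected by these rules.
googlebot = blocked_agents(ROBOTS_TXT, ["Googlebot"])
print(googlebot)
```

To test a live site, read your deployed robots.txt into a string and pass it in place of ROBOTS_TXT.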

ai.txt is a protocol by Spawning AI that provides more granular control than robots.txt. It lets you specify different permissions for different types of AI use (training, inference, indexing).

Platform-level controls on sites like DeviantArt, ArtStation, and others let creators flag individual works as not available for AI training. These vary by platform.

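The TDM (Text and Data Mining) reservation under EU copyright law lets rights holders explicitly reserve their rights against text and data mining, including AI training. It comes from Article 4 of the DSM Directive (Directive (EU) 2019/790), which requires the reservation to be expressed in an appropriate manner, such as machine-readable means for content published online.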
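
One machine-readable way to express that reservation is the TDM Reservation Protocol (TDMRep), a W3C community group proposal rather than part of the directive itself. As a rough sketch, and assuming the field names below match the published report (verify against it before deploying), a site can serve a JSON file at /.well-known/tdmrep.json listing which paths are reserved. The build_tdmrep helper, the paths, and the policy URL are all hypothetical placeholders.

```python
import json


def build_tdmrep(paths, policy_url=None):
    """Return TDMRep-style rules that reserve TDM rights for each path pattern."""
    rules = []
    for path in paths:
        # Assumed field names; 1 is taken to mean "rights reserved".
        rule = {"location": path, "tdm-reservation": 1}
        if policy_url:
            # Optional pointer to a policy describing licensing terms.
            rule["tdm-policy"] = policy_url
        rules.append(rule)
    return rules


rules = build_tdmrep(
    ["/blog/*", "/images/*"],
    policy_url="https://example.com/tdm-policy.json",
)
tdmrep_json = json.dumps(rules, indent=2)
print(tdmrep_json)
```

The output is what you would publish as the JSON file. Whether a particular AI crawler reads it is a separate question, so treat it as a complement to robots.txt rather than a replacement.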

The EU AI Act adds a new layer

The EU AI Act entered into force in August 2024 and applies in stages. Its obligations for providers of general-purpose AI models, which apply from August 2025, include:

Publish sufficiently detailed summaries of training data
Comply with EU copyright law, including TDM opt-outs
Implement technical measures to respect robots.txt and similar signals

This doesn't give you a magic "delete my data" button, but it increases transparency and strengthens the legal basis for opt-out mechanisms.

Data removal vs. visibility monitoring: two different problems

Here's where most brands get confused. They conflate two separate concerns.

Data removal is about intellectual property and privacy. Did an AI company use your content to train their model without permission? This matters for creators, publishers, and anyone who produces original content.

AI visibility is about business outcomes. When a potential customer asks ChatGPT "what's the best tool for [your category]," does your brand appear in the answer? This matters for any business that acquires customers through search.

These two concerns can even pull in opposite directions. Blocking all AI crawlers protects your content from being scraped, but it might also reduce the signals that help AI platforms recommend you. A brand that aggressively opts out of everything might protect its IP while becoming invisible in AI-driven discovery.

The pragmatic approach for most businesses: use robots.txt selectively (block training crawlers, not inference crawlers where possible), monitor your AI visibility actively, and file data correction requests when AI platforms present inaccurate information about you.
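
One way to keep a selective policy maintainable is to generate the file from two lists. The sketch below is an illustration under assumptions: the split between "training" and "answer-time" user agents reflects how vendors have described their crawlers, but those lists change, so confirm each name against the vendor's own documentation. build_robots_txt is a hypothetical helper, not part of any library.

```python
from urllib.robotparser import RobotFileParser

# Assumed categorization; check each vendor's documentation before relying on it.
TRAINING_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot", "anthropic-ai"]
ANSWER_CRAWLERS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]


def build_robots_txt(blocked, allowed):
    """Build robots.txt text that disallows some user agents and allows others."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked]
    groups += [f"User-agent: {agent}\nAllow: /" for agent in allowed]
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(TRAINING_CRAWLERS, ANSWER_CRAWLERS)
print(robots_txt)

# Sanity check: parse the generated file the way a compliant crawler would.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
still_blocked = [a for a in TRAINING_CRAWLERS if not parser.can_fetch(a, "/")]
still_allowed = [a for a in ANSWER_CRAWLERS if parser.can_fetch(a, "/")]
```

Keeping the lists in one place makes it easier to revisit the trade-off as crawlers are renamed or new ones appear.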

What actually moves the needle

If your goal is accurate AI representation and strong AI visibility, data removal requests are a backstop, not a strategy. The strategy is:

Make accurate information easy to find. Clear About pages, structured data, up-to-date product information.
Build third-party signals. Reviews on G2 and Trustpilot, mentions on Reddit, coverage in industry publications.
Monitor continuously. Track what AI platforms say about you daily, not once a quarter. Tools like Mentionable automate this across ChatGPT, Perplexity, Gemini, Copilot, and Google AI Mode.
Correct when needed. File correction requests for factual errors. Optimize your content to provide better signals.

GDPR gives you rights. Use them when you need to. But don't mistake defensive data removal for a growth strategy. The brands winning in AI visibility are the ones actively shaping how AI represents them, not just trying to pull their data out of the machine.
