The short answer to "can I use the ChatGPT API and stay GDPR compliant?" is yes, and I covered it in a prior post. What that piece doesn't cover, and what I get asked every week, is the actual setup. The clicks. The code. The DPIA written out, not bullet-pointed.
This is the walkthrough I do with clients on day one of an OpenAI integration. Real click paths, actual sanitisation patterns, the EU AI Act overlay that lands on August 2, 2026. Practitioner-grade rather than overview-grade.
Get articles like this every Tuesday. Compliance Engineering, practical AI compliance for engineers and founders. Free, weekly, written by a CIPP/E certified practitioner who actually builds these systems.
Need the same setup for your stack? A £500 scoping review covers your provider configuration, DPA status, data flows, DPIA, and AI Act classification. Written report in one week.
What you'll actually do, in order
There are six things that need to be true before your ChatGPT API call hits production. I'll walk through each one.
- The OpenAI DPA is executed and on file
- Zero retention is enabled and verified
- Your application has a sanitisation layer before the API call
- The DPIA is written and signed off
- Your privacy notice tells users about the AI processing
- Your EU AI Act classification is documented
The order matters. Skip the DPA and everything downstream is built on sand. Skip the sanitisation layer and you are sending more personal data to a US processor than your DPIA covers, which is a separate breach in its own right.
Step 1: Sign the OpenAI DPA (the actual click path)
You need a business OpenAI account. Personal accounts cannot execute the Data Processing Addendum. If you are still using your personal email at platform.openai.com for company work, fix that first.
The path:
- Sign in at platform.openai.com
- Click Settings in the left sidebar
- Click Organization
- Look for Compliance APIs or Data Processing Addendum depending on the account type
- Open openai.com/policies/data-processing-addendum
- Click Execute Data Processing Agreement
- Fill in: legal entity name, signatory name and title, signatory email, billing entity (if different)
- Submit
You receive the countersigned PDF by email within minutes. Save it somewhere your DPO and legal can find it. I've seen breach notifications stall by hours because nobody could locate the executed DPA.
What the DPA covers:
- The processing OpenAI performs as your processor under Article 28 GDPR
- Sub-processors (the list of services OpenAI relies on to deliver the service)
- Data location, retention, and deletion obligations
- Notification obligations on a security incident (72 hours)
- Audit rights
- Standard Contractual Clauses for international transfers (the SCCs are part of the same instrument)
If the DPA is not executed, you are processing personal data through OpenAI without the contract Article 28 requires. Every API call is a separate breach. The fix is signing. It takes ten minutes. There is no excuse for not having it.
Step 2: Enable zero retention and verify it
By default, OpenAI retains API inputs and outputs for up to 30 days for abuse monitoring. They do not use API data for model training. The 30-day retention is the residual risk.
For most GDPR contexts, especially anything involving customer-facing chatbots, support workflows, or document processing, you want zero retention. The path:
- From platform.openai.com, open Settings then Organization then Compliance APIs
- Submit the zero-retention request, or email api-compliance@openai.com directly
- Subject: "Zero Data Retention Request"
- Body: state the legal basis for the request (GDPR Article 5(1)(c) data minimisation), name the API endpoints affected (e.g. /v1/chat/completions, /v1/embeddings), confirm your DPA is executed, give your organisation ID
OpenAI replies within 2-5 business days. Once approved, the flag is set at the organisation level. Verify in your dashboard that "Data Retention: 0 days" is shown on the affected endpoints. If it isn't shown, the configuration is not live, regardless of what the email said.
I've had two clients believe zero retention was enabled and discover, on a routine audit, that the request had stalled in OpenAI's queue. Verify before you brief your DPO that the configuration is in place.
Step 3: Build the sanitisation layer before the API call
This is the code most teams skip, and it is where most LLM compliance audits later find problems. Before any prompt is constructed, run the input through a sanitisation pipeline.
What that pipeline does, at minimum:
import re

# Regex patterns for direct identifiers. Extend per jurisdiction and data model.
# Note the (?:\+44|\b0) group: \b cannot sit before "+", which is not a word
# character, so the boundary anchors the 0-prefixed branch only.
DIRECT_IDENTIFIER_PATTERNS = {
    "EMAIL": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "UK_PHONE": r"(?:\+44|\b0)\s?\d{2,4}\s?\d{3,4}\s?\d{3,4}\b",
    "UK_NI": r"\b[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]\b",
    "ACCOUNT_REF": r"\b[A-Z]{2,4}-?\d{6,10}\b",
}

def sanitise(text: str) -> tuple[str, dict[str, str]]:
    """Replace direct identifiers with placeholder tokens.

    Returns the sanitised text and a token-to-original mapping that
    never leaves your server.
    """
    redactions: dict[str, str] = {}
    for label, pattern in DIRECT_IDENTIFIER_PATTERNS.items():
        # Snapshot the matches before substituting, so we never iterate
        # over a string we are mutating.
        for i, match in enumerate(list(re.finditer(pattern, text))):
            token = f"<{label}_{i}>"
            redactions[token] = match.group()
            text = text.replace(match.group(), token, 1)
    return text, redactions
Then, when constructing the prompt, you pass the sanitised text. The redactions dictionary stays on your server. The prompt OpenAI sees is the placeholder version. When the response comes back, you re-inflate the placeholders before showing the user.
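The other half of that flow, as a minimal sketch. It reuses sanitise from above and assumes the current openai Python SDK (v1); gpt-4o is illustrative, use whichever model your DPIA names.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def reinflate(text: str, redactions: dict[str, str]) -> str:
    # Swap placeholders back for the originals. Runs only on your server.
    for token, original in redactions.items():
        text = text.replace(token, original)
    return text

def answer(user_input: str) -> str:
    clean, redactions = sanitise(user_input)
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; use the model your DPIA names
        messages=[{"role": "user", "content": clean}],
    )
    # OpenAI only ever saw placeholders; re-identification happens here,
    # inside your trust boundary.
    return reinflate(response.choices[0].message.content or "", redactions)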
For more demanding contexts (health data, financial information, special category data), layer in named-entity recognition (spaCy, Presidio, or a vertical-specific tool) on top of the regex pass. For high-risk integrations, run the sanitiser output through a second-pass review by your DPO before the integration goes live.
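For the NER pass, a minimal sketch with Microsoft Presidio (presidio-analyzer and presidio-anonymizer, which in turn need a spaCy English model installed). The entity list here is an illustrative starting point, not a complete one.

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def ner_pass(text: str) -> str:
    # Catch what the regex layer cannot, e.g. bare person names.
    results = analyzer.analyze(
        text=text,
        entities=["PERSON", "LOCATION", "EMAIL_ADDRESS", "PHONE_NUMBER"],
        language="en",
    )
    # Presidio's default operator replaces each span with its entity
    # type, e.g. "<PERSON>".
    return anonymizer.anonymize(text=text, analyzer_results=results).text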
The compliance benefit is massive. Your DPIA can credibly document that personal data is minimised before it leaves your server, that the LLM provider does not see direct identifiers, and that re-identification on the response side is performed inside your trust boundary. Without this layer, the DPIA cannot make those claims, and your data minimisation argument under Article 5(1)(c) is weaker.
Step 4: Write the DPIA properly
Most DPIAs I see for chatbot integrations are five-page documents that say very little. The structure regulators actually expect is in the toolkit's chatbot DPIA template, CC BY licensed. Use it.
The sections that matter most for an OpenAI integration:
System summary. Name the chatbot, name the deployment surface, name the LLM provider and the specific model (e.g. GPT-4o, GPT-4 Turbo). Name your data sources.
Roles. You are the controller. OpenAI is the processor. List the sub-processors (in OpenAI's case, hosting infrastructure, content moderation services, billing). The sub-processor list is in the DPA you executed in step 1.
Lawful basis. For most customer support chatbots, legitimate interest under Article 6(1)(f) is the typical basis. Document the legitimate interest balancing test. For transactional support that is part of a contract you have with the customer, contractual necessity under Article 6(1)(b) is cleaner. Pick one and write the analysis.
Personal data inventory. What goes into the prompt after sanitisation. What comes back. What you log. What you retain. What goes to OpenAI. What does not. This is where the sanitisation layer pays off; you can say with specificity what categories of data leave your server and what does not.
Risk assessment. The risks specific to LLM integrations are: model output exposing third-party data (unlikely for the API, since inputs are not used for training), prompt injection causing unintended disclosures, conversation logs becoming a high-value attack surface, and data subject rights handling (the right to erasure must reach the LLM provider's logs, if any exist). The chatbot DPIA template walks through each.
Controls and mitigations. Sign the DPA, enable zero retention, run the sanitisation layer, log conversations on your side with encryption and access controls, define retention, build a deletion process that reaches both your logs and OpenAI's residual logs (under zero retention there shouldn't be any, but the deletion process is a control regardless).
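For the logging control, a minimal sketch using the cryptography library. The 90-day retention period and the in-memory key are assumptions; load the key from your KMS and set retention to whatever your DPIA documents.

import json
from datetime import datetime, timedelta, timezone
from cryptography.fernet import Fernet

fernet = Fernet(Fernet.generate_key())  # assumption: replace with a KMS-managed key

def encrypt_turn(conversation_id: str, role: str, content: str) -> bytes:
    # Encrypt each conversation turn at rest and stamp it with a deletion
    # deadline, so retention is enforceable rather than just documented.
    record = {
        "conversation_id": conversation_id,
        "role": role,
        "content": content,
        "delete_after": (datetime.now(timezone.utc) + timedelta(days=90)).isoformat(),
    }
    return fernet.encrypt(json.dumps(record).encode("utf-8"))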
Residual risk and sign-off. Low risk after controls is the typical landing for a properly configured OpenAI chatbot. DPO sign-off, dated. Senior management approval, dated. Annual review.
The DPIA is not a form. It is the document that proves you thought about the risks before you went live. The first thing a regulator asks for in any AI investigation is the DPIA. Make sure yours is one a regulator would accept.
Step 5: Update the privacy notice
Your users need to know you are using an AI processor. The standard pattern, in plain language:
"We use AI to help respond to support enquiries faster. The AI is provided by OpenAI under a Data Processing Agreement that meets our Article 28 GDPR obligations. We do not retain your conversation data with OpenAI beyond the length of the request, and OpenAI does not use your data to train its models. You can opt out of AI-assisted support by replying STOP or contacting our DPO at [email]."
Three things that paragraph does:
- Tells the user what AI does and why
- Names the processor and the contractual basis
- Provides an opt-out and a contact
Add it to your privacy notice and, if the AI is in a chatbot, surface a shorter version at the start of every conversation. The EU AI Act Article 50 transparency obligation requires this disclosure separately, even for limited-risk systems. Doing it under GDPR sets you up for AI Act compliance with no extra work.
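One way to surface the in-chat version, as a minimal sketch; the wording and the STOP keyword are illustrative and should match your privacy notice.

AI_DISCLOSURE = (
    "You're chatting with an AI assistant, provided by OpenAI under a "
    "data processing agreement. Reply STOP at any time to reach a human."
)

def new_conversation() -> list[dict[str, str]]:
    # Seed every session with the transparency line so it is shown before
    # the first AI turn and captured in your own logs.
    return [{"role": "assistant", "content": AI_DISCLOSURE}]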
Step 6: Document the EU AI Act classification
From August 2, 2026, the EU AI Act high-risk obligations are live. Most chatbots are limited-risk; the main obligation is the transparency disclosure you already added in step 5. But if your ChatGPT integration falls into Annex III, you have substantially more work.
Annex III high-risk categories include AI used for:
- Creditworthiness assessment or credit scoring
- Employment-related decisions (recruitment, termination, performance evaluation)
- Insurance pricing for life or health insurance
- Eligibility for essential public services or benefits
- Law enforcement, migration, or border control
If your chatbot helps a loan officer make credit decisions, screens job candidates, or determines benefit eligibility, it is high-risk under Annex III. The compliance package that triggers includes:
- Conformity assessment before deployment
- Technical documentation under Annex IV
- Risk management system under Article 9
- Data governance under Article 10
- Human oversight under Article 14
- Accuracy, robustness, and cybersecurity under Article 15
- Registration in the EU AI database
- Post-market monitoring
If you are limited-risk, the transparency disclosure plus the GDPR setup above is enough. Document the classification, the reasoning, and the lawful basis in a short addendum to your DPIA. If you are high-risk, the work is real, and it is what we do. Start with a scoping review.
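One way to keep that addendum reviewable is a structured record stored next to the DPIA. A sketch with illustrative values:

AI_ACT_CLASSIFICATION = {
    "system": "support-chatbot",  # illustrative name
    "classification": "limited-risk",
    "annex_iii_review": {
        "credit_scoring": False,
        "employment_decisions": False,
        "insurance_pricing": False,
        "essential_services_eligibility": False,
        "law_enforcement_migration_border": False,
    },
    "reasoning": "Customer support only; no decisions affecting rights, "
                 "finances, or access to services.",
    "lawful_basis": "Article 6(1)(f) legitimate interest",
    "reviewed": "2026-01-15",  # illustrative date
    "reviewer": "DPO",
}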
Common failure modes I see in real audits
Five patterns. Avoid them.
The unsigned DPA. Team built the integration first, never executed the DPA. Every call is an Article 28 violation. The fix is ten minutes; the lookback exposure is months.
The missing sanitisation layer. Whole customer records pasted into the prompt context. The DPIA says "data is minimised" but the architecture says otherwise. The DPIA is wrong. Fix the architecture, not the document.
The zero-retention assumption. Team believed zero retention was enabled because someone submitted the request. Never verified. Production traffic ran for weeks under default 30-day retention with the DPIA claiming zero. Verify, screenshot, and re-verify on each contract renewal.
The "we don't process personal data" claim. Customer free-text inputs almost always contain personal data even if you are not asking for it. Names, email addresses, phone numbers, location signals. Treat the input field as personal data by default. Sanitise.
The forgotten erasure path. When a data subject exercises their right to erasure, the deletion has to reach all systems where their data lives. Conversation logs on your side, plus a deletion request to the LLM provider for any residual logs. Build the path before you need it.
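A sketch of what that path can look like. The store and ticketing objects are hypothetical stand-ins for your own data layer and privacy queue.

def erase_data_subject(subject_id: str, store, ticketing) -> None:
    """Handle an Article 17 erasure request end to end."""
    # 1. Delete conversation logs on your side.
    store.delete_conversations(subject_id)  # assumption: your data layer
    # 2. Raise a deletion request covering any residual provider-side logs.
    #    Under verified zero retention there should be none, but running
    #    and recording the step is a control either way.
    ticketing.open(
        queue="privacy",
        summary=f"Erasure {subject_id}: confirm no residual logs at LLM provider",
    )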
Quick verification checklist
Before you push the integration to production:
- OpenAI DPA executed under the correct legal entity, PDF stored where DPO can find it
- Zero retention requested, approved, and verified in the dashboard
- Sanitisation layer in place; tested against known direct identifiers (a test sketch follows this list)
- DPIA written with the chatbot template structure, DPO sign-off recorded
- Privacy notice updated with the AI processing disclosure and the AI Act transparency line
- EU AI Act classification documented (limited-risk or high-risk, with reasoning)
- Conversation logs on your side encrypted, access-controlled, with retention defined
- Erasure process tested end to end
- Breach response plan includes the AI processor scenario
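A minimal pytest sketch for the sanitisation check, reusing sanitise from step 3; the module name sanitiser and the test values are assumptions.

from sanitiser import sanitise  # assumption: wherever step 3's code lives

KNOWN_IDENTIFIERS = [
    "jane.doe@example.com",
    "+44 20 7946 0958",
    "AB123456C",
    "ORD-1234567",
]

def test_no_direct_identifiers_survive():
    text = "Customer " + " ".join(KNOWN_IDENTIFIERS) + " raised a complaint."
    clean, redactions = sanitise(text)
    for identifier in KNOWN_IDENTIFIERS:
        assert identifier not in clean
    # Every stripped value must be recoverable server-side.
    assert set(redactions.values()) == set(KNOWN_IDENTIFIERS)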
What this looks like for other providers
The pattern transfers. The click paths differ.
Anthropic (Claude API). The DPA is incorporated automatically into Anthropic's Commercial Terms of Service when you accept them for the API or Claude for Work. No separate signing step. Confirm you are on commercial terms, not consumer. Anthropic does not use API data for training by default, but the standard commercial terms include retention for trust and safety monitoring (typically 30 days), similar to OpenAI's default. Zero-retention configurations are available for enterprise customers; confirm the retention period that applies to your contract before treating it as zero. The sanitisation layer, DPIA, privacy notice, and AI Act classification work the same.
Google (Gemini API). Covered by your Google Cloud or Workspace DPA, which flows from the Cloud project terms. Configure data residency in Cloud Console. Document the EU multi-region setup if relevant.
Self-hosted (Llama, Mistral, Mixtral). No third-party DPA needed, because there is no external processor; the processing stays inside your organisation as controller. The DPIA still applies; the privacy notice still applies; the AI Act classification still applies. The data minimisation argument is structurally easier (data never leaves your infrastructure), but document the rationale.
Closing
The fine for getting this wrong runs to millions. The setup, done correctly, takes a working day. There is no compelling reason to skip it.
If the team you work with already operates this way, this article is the checklist that confirms what you've built. If you are looking at a working integration that hasn't been documented properly, we do this end to end. The scoping review is the right first step: a written assessment in a week, with the £500 deducted from any full engagement that follows.
For the high-level summary of why each step exists, the original article is shorter and Q&A-shaped. For DPIA structure, the chatbot DPIA template is open source. For breach response, the Nigeria-specific template is in the same toolkit, with UK and EU-specific versions landing in the next quarter.
See you next Tuesday.
Michael
Michael K. Onyekwere is a CIPP/E certified data protection professional and the founder of Janus Compliance. He writes Compliance Engineering, the weekly newsletter on practical AI compliance for engineers and founders. If you need a real answer on your provider setup, transfers, DPIA, retention, and AI Act classification, start with a £500 scoping review. If you also need the build done, see the AI Chatbot + Compliance Package.
Frequently Asked Questions
Where exactly do I sign the OpenAI DPA?
Sign in to platform.openai.com from a business account (not a personal one), open Settings, then Organization, then click the Compliance APIs section. The Data Processing Addendum is at openai.com/policies/data-processing-addendum. Click 'Execute Data Processing Agreement.' You enter the entity name, signatory details, and a contact email. Once executed, OpenAI emails the countersigned PDF. If you only see consumer settings, you're signed in to the wrong account. There is no consumer DPA.
How do I enable zero retention on the OpenAI API?
Zero retention is not a self-serve toggle for most accounts. You request it via the Compliance APIs page in platform.openai.com or by emailing api-compliance@openai.com. State that you require zero data retention for GDPR compliance, name the relevant API endpoints, and confirm your DPA is executed. Approval typically takes 2-5 business days. Once approved, OpenAI sets the flag at the organisation level, and your prompts and completions are not retained beyond the request lifecycle. Verify in your account dashboard that 'Data Retention: 0 days' is shown on the affected endpoints before treating the configuration as live.
What does data minimisation actually look like in code?
Build a sanitisation layer between your application and the API call. At minimum, strip direct identifiers (names, email addresses, phone numbers, account numbers, government IDs) before constructing the prompt. Replace them with placeholder tokens if context is needed (e.g. CUSTOMER_NAME, ORDER_REF), and re-inflate the response on your server before showing the user. For higher-risk categories (health, financial, special category data), add a named-entity recognition pass and a regex allowlist of fields permitted in the prompt context. Document the filter in your DPIA. The code is small. The compliance benefit is large.
Does the EU AI Act change my ChatGPT API setup?
Yes. From August 2, 2026, AI systems used in the EU are classified into prohibited, high-risk, limited-risk, and minimal-risk categories. Most customer-facing chatbots are limited-risk and need transparency disclosures (telling users they are interacting with AI). If your ChatGPT integration makes decisions that affect rights, finances, or access to services (credit scoring, recruitment, insurance pricing, eligibility), it is high-risk and needs conformity assessment, technical documentation, and human oversight mechanisms. Audit your use case against Annex III. The GDPR setup is necessary but not sufficient if you fall in the high-risk class.
What if I'm using Claude or Gemini instead of OpenAI?
The pattern is the same; the click paths differ. Anthropic incorporates the DPA automatically when you accept Commercial Terms for the API or Claude for Work; no separate signing step. Confirm you are on commercial, not consumer, terms. The Google Gemini API is covered under your Google Cloud or Workspace DPA, which flows from your Cloud project terms. Self-hosted models (Llama, Mistral, Mixtral) do not need a third-party DPA because there is no external processor; the processing stays in-house. Pick the provider on technical fit, then configure for compliance.
Start with a £500 scoping review
If you need GDPR documentation, AI Act work, or a compliant AI build, the first step is a written scoping review. You get a real report, not a generic discovery call.
Related Articles
DPIA Ireland: Do You Need One for Your AI System?
If you deploy AI in Ireland, you almost certainly need a DPIA under GDPR. What the DPC expects, what triggers the requirement, and how to do one that actually holds up.
GDPR Compliance Ireland: What AI Businesses Need to Know
GDPR compliance for Irish businesses using AI. What the DPC expects, how GDPR interacts with the EU AI Act, and practical steps for SMEs deploying chatbots, automation, and data processing.
ChatGPT API and GDPR: Yes, It's Compliant — If You Do These 6 Things
The API is GDPR-safe. The free chat isn't. Sign the DPA, enable zero-retention, minimise data, write the DPIA. Here's the exact setup, step by step.