
Building a Deepfake Detection System Under GDPR

How we built an AI system that analyses faces, voices, and video for signs of AI manipulation — while meeting GDPR Article 9 requirements for biometric data.

  • System Type: Media Analysis AI
  • Data Category: Biometric (Article 9)
  • Compliance: GDPR, UK DPA 2018
  • Deliverables: System + Full Docs

The Brief

Build a system that can analyse uploaded images, videos, and audio to detect whether media has been generated or manipulated by AI. The use case: helping individuals and organisations verify whether content is authentic — particularly in contexts where deepfakes are used for fraud, impersonation, or harassment.

The system would need to process facial data, voice patterns, and video frames. Under GDPR, this is biometric data — Article 9 special category data, the most heavily regulated type of personal data there is. Building the system was the easy part. Building it compliantly was the real challenge.

The Compliance Challenge

Processing biometric data triggers some of the strictest requirements in GDPR:

  • Article 9 applies. Biometric data processed for identification purposes is special category data. You need one of the Article 9(2) conditions to process it at all — in this case, explicit consent from the user uploading the media.
  • A DPIA is mandatory. Any new technology processing special category data requires a Data Protection Impact Assessment before you go live. Not optional. Not "nice to have." The ICO fined MediaLab.AI £247,590 in February 2026 specifically for processing children's data without conducting a DPIA.
  • Data minimisation is non-negotiable. You can't store biometric data indefinitely "just in case." Every piece of data needs a retention period justified by the processing purpose.
  • Third-party processors need governance. The system uses Claude API for visual analysis. That means personal data leaves our infrastructure and enters Anthropic's systems. We need a Data Processing Agreement, international transfer safeguards, and documentation of every data flow.

What We Built

The detection system uses a three-stage pipeline. This isn't a black-box AI model — it's an architecture designed to be auditable and explainable, which matters for both accuracy and compliance.

Stage 1: Signal-Level Forensics

Before any AI model sees the media, we run mathematical analysis on the raw pixels and audio data. This includes:

  • Frequency domain analysis to detect GAN artefacts invisible to the human eye
  • Noise pattern analysis — AI-generated images have unnaturally uniform noise compared to real camera sensor noise
  • Temporal consistency checks for video — frame-to-frame variations in texture and noise that AI generators struggle to maintain
  • Content credential verification (C2PA) — checking for cryptographic signatures from AI generation tools

This stage processes data locally. No personal data leaves the system. The output is a set of numerical scores — not biometric data, just statistical features of the media file.
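Checks of this kind are compact to express. Below is an illustrative, simplified sketch of the noise-uniformity idea described above — not the production code, and the function name and block size are hypothetical. It splits a grayscale image into blocks, computes per-block pixel variance, and measures how uniform that variance is across the image; suspiciously uniform noise is a weak generation signal.

```python
import statistics

def noise_uniformity_score(pixels, block=8):
    """Measure how uniform the noise variance is across an image.

    pixels: grayscale image as a list of rows of 0-255 ints.
    Returns the coefficient of variation of per-block variances:
    a LOW value means unnaturally uniform noise (a weak AI signal).
    """
    h, w = len(pixels), len(pixels[0])
    variances = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            vals = [pixels[y][x]
                    for y in range(by, by + block)
                    for x in range(bx, bx + block)]
            variances.append(statistics.pvariance(vals))
    if len(variances) < 2:
        return 0.0
    mean = statistics.mean(variances) or 1.0  # guard against all-flat input
    return statistics.pstdev(variances) / mean
```

A real camera frame mixes flat and textured regions, so its per-block variances spread widely; fully synthetic noise tends to score closer to zero on this metric.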

Stage 2: Contextual AI Analysis

The forensic scores, along with the media file, are sent to a vision-capable AI model for contextual analysis. The model evaluates visual and audio characteristics that require human-level understanding — facial asymmetry, lighting consistency, lip-sync accuracy, text rendering quality.

This is where the biometric processing happens, and where the compliance architecture matters most. The AI processor (Anthropic) operates under a signed DPA with zero-retention configuration — media data passes through for analysis but is not stored or used for model training.
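Concretely, the request pairs the media with the Stage 1 scores so the model weighs both. A minimal sketch of how such a payload might be assembled — the helper and field names are illustrative, following the general shape of vision-model message APIs; the exact schema and any zero-retention configuration come from the provider's documentation and the DPA, not from this sketch:

```python
import base64
import json

def build_analysis_request(image_bytes, media_type, forensic_scores):
    """Assemble a vision-model request pairing the media with the
    Stage 1 forensic scores, so the contextual analysis can be
    weighed against the mathematical signals. (Illustrative schema.)"""
    prompt = (
        "Assess whether this media shows signs of AI generation or "
        "manipulation. Stage 1 forensic scores (0 = genuine, "
        f"1 = AI-generated): {json.dumps(forensic_scores)}. "
        "Report specific visual evidence for your conclusion."
    )
    return {
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": media_type,
                            "data": base64.b64encode(image_bytes).decode("ascii")}},
                {"type": "text", "text": prompt},
            ],
        }],
    }
```

Keeping payload construction in one place also makes the ROPA's "what leaves our infrastructure" entry easy to verify against the code.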

Stage 3: Forensic Arbitration

When the mathematical signals and the AI analysis disagree, an arbitration layer resolves the conflict using validated thresholds. Strong forensic signals (validated at 71-83% accuracy across 35 test cases) can override the AI model's assessment.

This three-stage design means no single component makes the final decision. It's auditable — you can trace exactly why the system reached its conclusion. That traceability matters when a regulator asks "how does your AI system make decisions?"
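The arbitration rule itself is simple to express. A minimal sketch, assuming a single combined forensic score in [0, 1] and a verdict from the AI stage — the threshold here is illustrative, not the validated production value:

```python
def arbitrate(forensic_score, ai_verdict, override_threshold=0.85):
    """Resolve disagreement between Stage 1 forensics and Stage 2 AI analysis.

    forensic_score: 0.0 (clearly genuine) .. 1.0 (clearly AI-generated).
    ai_verdict: "ai-generated" or "genuine".
    Returns (final_verdict, reason) so every decision is traceable.
    """
    forensic_verdict = "ai-generated" if forensic_score >= 0.5 else "genuine"
    if forensic_verdict == ai_verdict:
        return ai_verdict, "forensics and AI analysis agree"
    # Disagreement: a sufficiently strong forensic signal wins.
    signal_strength = max(forensic_score, 1.0 - forensic_score)
    if signal_strength >= override_threshold:
        return forensic_verdict, "strong forensic signal overrides AI verdict"
    return ai_verdict, "forensic signal inconclusive; deferring to AI verdict"
```

Returning the reason alongside the verdict is what makes the audit trail possible: the reason string is logged with every classification.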

The Compliance Framework

The working detection system was one deliverable. The compliance documentation was the other. Both were built in parallel — not bolted on afterwards. Here's what was delivered:

Documentation Package

  1. Data Protection Impact Assessment — Full DPIA covering biometric data processing, risk assessment, and mitigation measures. Addresses Article 9 lawful basis (explicit consent), data flows through third-party processors, and automated decision-making considerations under Article 22.
  2. Data Minimisation Architecture — 30-day automatic deletion of all uploaded media and scan results. No permanent storage of biometric data. Forensic scores (non-personal statistical data) retained separately from source media.
  3. Privacy Notice — Clear, plain-language notice explaining what data is processed, why, who processes it, how long it's retained, and users' rights. Presented before any media upload.
  4. Consent Flow — Explicit consent mechanism for Article 9 biometric data processing. Granular — users consent to the specific processing, not a blanket agreement. Withdrawal mechanism built into the user interface.
  5. Data Processing Agreement — DPA with AI model provider (Anthropic) covering Article 28 requirements, sub-processor transparency, international transfer safeguards (Standard Contractual Clauses), and zero-retention configuration.
  6. Breach Response Plan — Incident response procedure covering the detection system and all third-party processors. 72-hour ICO notification workflow. Includes scenarios specific to biometric data breaches.
  7. Records of Processing Activities — Article 30 ROPA entry documenting all processing activities, lawful bases, data categories, recipients, retention periods, and international transfers.

Design Decisions That Made Compliance Possible

Compliance wasn't a layer added on top. It shaped the architecture from the start. Three decisions made the difference:

1. Local-first forensic processing

Stage 1 (signal-level forensics) runs entirely within our infrastructure. Pixel analysis, frequency domain transforms, noise pattern checks — none of this requires sending biometric data to a third party. By the time data reaches the external AI processor, the forensic analysis is already complete. If the forensic signals are conclusive enough, the system can classify media without external processing at all.

2. Separation of biometric data from analysis results

The uploaded media (biometric data) and the analysis results (classification, confidence score, forensic metrics) are stored separately with different retention periods. Results persist for the user to access. Source media is deleted after 30 days. The forensic scores — statistical features of the file, not biometric data — can be retained for system accuracy improvement without retaining any personal data.
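A retention job enforcing this split can be sketched as follows — the record shape and field names are illustrative; the 30-day window is the one described above:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)

def purge_expired_media(records, now=None):
    """Delete source media past its retention period while keeping the
    non-personal analysis results. Each record holds the upload time,
    a reference to the stored media, and the derived results."""
    now = now or datetime.now(timezone.utc)
    purged = 0
    for rec in records:
        if rec["media"] is not None and now - rec["uploaded_at"] > RETENTION:
            rec["media"] = None   # biometric data deleted
            # rec["results"] (classification, forensic scores) is retained
            purged += 1
    return purged
```

Because deletion touches only the `media` field, users keep access to their past results, and the retained data is statistical rather than biometric.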

3. Explainable output

The system doesn't just say "AI-generated" or "genuine." It provides specific evidence: which forensic signals triggered, what the AI model observed, why the arbitration layer reached its conclusion. This isn't just better UX — it's an Article 22 requirement. When automated processing produces decisions that affect individuals, they have the right to meaningful information about the logic involved.
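The explanation itself can be a structured record assembled from each stage's output rather than free text. A hypothetical sketch of what such a record and its plain-language rendering might look like:

```python
from dataclasses import dataclass, field

@dataclass
class DetectionReport:
    """Structured, auditable explanation of one detection decision."""
    verdict: str                  # "ai-generated" or "genuine"
    confidence: float             # 0.0 .. 1.0
    forensic_signals: dict        # Stage 1: which signals triggered, with scores
    ai_observations: list = field(default_factory=list)  # Stage 2 evidence
    arbitration_reason: str = ""  # Stage 3: why this verdict won

def render_explanation(report):
    """Plain-language summary for the user, in the spirit of Article 22's
    'meaningful information about the logic involved'."""
    lines = [f"Verdict: {report.verdict} (confidence {report.confidence:.0%})"]
    for name, score in report.forensic_signals.items():
        lines.append(f"- Forensic signal '{name}' scored {score:.2f}")
    for obs in report.ai_observations:
        lines.append(f"- Model observation: {obs}")
    lines.append(f"Decision basis: {report.arbitration_reason}")
    return "\n".join(lines)
```

The same structured record serves two audiences: rendered as text for the user, and stored as-is for the audit trail a regulator would ask to see.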

The Outcome

One project. Two deliverables:

  1. A working deepfake detection system with validated accuracy across image, video, and audio analysis
  2. A complete compliance documentation package that would satisfy an ICO audit

Neither was an afterthought. The system was designed for compliance. The documentation was written alongside the code, not after it. When the architecture changed, the DPIA was updated. When new data flows were added, the ROPA was updated. They moved together.

This is how every engagement at Janus Compliance works. You don't get a working system and then scramble to document it. You get both, built in parallel, from day one.

This Is How We Work

Every system we build — chatbots, workflow automation, document processing — comes with the compliance documentation as standard. DPIA, privacy notices, DPAs, breach response plans. You don't get one without the other.

If you're building an AI system and need it done right from the start, talk to us.

M.K. Onyekwere is a CIPP/E certified data protection professional and the founder of Janus Compliance. We build AI systems that are compliant from day one — chatbots, workflow automation, document processing — delivered with DPIA, privacy notices, and full documentation.