CIPHER: Counterfeit Image Pattern High-level Examination via Representation

Authors: Kyeonghun Kim, Youngung Han, Seoyoung Ju, Yeonju Jean, YooHyun Kim, Minseo Choi, SuYeon Lim, Kyungtae Park, Seungwoo Baek, Sieun Hyeon, Nam-Joon Kim, Hyuk-Jae Lee

Published: 2026-03-31 07:27:58+00:00

Comment: 6 pages, 2 figures. Accepted at IEEE-Asia 2025

AI Summary

The CIPHER framework addresses the challenge of generalizable deepfake detection by systematically reusing and fine-tuning discriminators from generative models. It integrates scale-adaptive features from ProGAN discriminators and temporal-consistency features from diffusion models to capture generation-agnostic artifacts. This approach achieves superior cross-model detection performance across various state-of-the-art generative models, making it robust against evolving generative technologies.

Abstract

The rapid progress of generative adversarial networks (GANs) and diffusion models has enabled the creation of synthetic faces that are increasingly difficult to distinguish from real images. This progress, however, has also amplified the risks of misinformation, fraud, and identity abuse, underscoring the urgent need for detectors that remain robust across diverse generative models. In this work, we introduce Counterfeit Image Pattern High-level Examination via Representation (CIPHER), a deepfake detection framework that systematically reuses and fine-tunes discriminators originally trained for image generation. By extracting scale-adaptive features from ProGAN discriminators and temporal-consistency features from diffusion models, CIPHER captures generation-agnostic artifacts that conventional detectors often overlook. Through extensive experiments across nine state-of-the-art generative models, CIPHER demonstrates superior cross-model detection performance, achieving up to 74.33% F1-score and outperforming existing ViT-based detectors by over 30% in F1-score on average. Notably, our approach maintains robust performance on challenging datasets where baseline methods fail, with up to 88% F1-score on CIFAKE compared to near-zero performance from conventional detectors. These results validate the effectiveness of discriminator reuse and cross-model fine-tuning, establishing CIPHER as a promising approach toward building more generalizable and robust deepfake detection systems in an era of rapidly evolving generative technologies.


Key findings
CIPHER demonstrates superior cross-model detection performance, achieving up to 74.33% F1-score and outperforming existing ViT-based detectors by over 30% in F1-score on average. The approach maintains robust performance on challenging datasets like CIFAKE, reaching up to 88% F1-score where conventional detectors showed near-zero performance. These results validate the effectiveness of discriminator reuse and cross-model fine-tuning for building generalizable deepfake detection systems.
Approach
CIPHER extracts scale-adaptive features from pre-trained ProGAN discriminators and temporal-consistency features from diffusion models (DDPM/DDIM U-Net). These features are combined and fine-tuned using a cross-model strategy to identify generation-agnostic artifacts. This enables robust deepfake detection across diverse generative models, rather than overfitting to specific generator characteristics.
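The fusion idea described above can be sketched in PyTorch. This is a hypothetical illustration only: the real CIPHER framework reuses an actual pretrained ProGAN discriminator and a DDPM/DDIM U-Net, whereas the small stand-in encoders below (`ProGANDiscriminatorStub`, `DiffusionUNetStub`, `CIPHERHead`) are invented names meant to show how two feature streams could be concatenated and fed to a shared real/fake classifier head.

```python
import torch
import torch.nn as nn

class ProGANDiscriminatorStub(nn.Module):
    """Stand-in for a pretrained ProGAN discriminator reused as a
    scale-adaptive feature extractor (its final logits head removed)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),  # pool to a fixed-size descriptor
        )

    def forward(self, x):
        return self.features(x).flatten(1)  # (B, feat_dim)

class DiffusionUNetStub(nn.Module):
    """Stand-in for a DDPM/DDIM U-Net encoder providing
    temporal-consistency features."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x):
        return self.features(x).flatten(1)  # (B, feat_dim)

class CIPHERHead(nn.Module):
    """Concatenate GAN-discriminator and diffusion features, then
    classify real vs. fake; both encoders would be fine-tuned
    cross-model in the actual framework."""
    def __init__(self, gan_dim=128, diff_dim=64):
        super().__init__()
        self.gan_enc = ProGANDiscriminatorStub(gan_dim)
        self.diff_enc = DiffusionUNetStub(diff_dim)
        self.classifier = nn.Linear(gan_dim + diff_dim, 1)

    def forward(self, x):
        fused = torch.cat([self.gan_enc(x), self.diff_enc(x)], dim=1)
        return self.classifier(fused)  # real/fake logit per image

model = CIPHERHead()
logit = model(torch.randn(2, 3, 64, 64))  # a batch of 2 RGB images
print(logit.shape)  # torch.Size([2, 1])
```

The key design point is that neither encoder is trained from scratch: each reuses weights from a generative model, so the classifier head learns from artifacts both families of generators leave behind rather than overfitting to one generator's fingerprint.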
Datasets
CelebA-HQ, FFHQ (Flickr-Faces-HQ), UADFV, StarGAN, StarGANv2, StyleCLIP, OpenForensics, Inpainting, Insight, CIFAKE, DALL-E3, IMDB-WIKI, Real Person
Model(s)
ProGAN Discriminator, DDPM/DDIM U-Net
Author countries
South Korea