Estimating Visual Attribute Effects in Advertising from Observational Data: A Deepfake-Informed Double Machine Learning Approach

Authors: Yizhi Liu, Balaji Padmanabhan, Siva Viswanathan

Published: 2026-03-02 20:00:38+00:00

AI Summary

This paper introduces DICE-DML (Deepfake-Informed Control Encoder for Double Machine Learning), a framework addressing the challenge of causally estimating visual attribute effects from observational data. It overcomes visual treatment leakage, where standard vision encoders entangle treatment information with confounders, leading to biased estimates in Double Machine Learning (DML). DICE-DML leverages generative AI to disentangle treatment from confounding variables, enabling robust causal inference with image data.

Abstract

Digital advertising increasingly relies on visual content, yet marketers lack rigorous methods for understanding how specific visual attributes causally affect consumer engagement. This paper addresses a fundamental methodological challenge: estimating causal effects when the treatment, such as a model's skin tone, is an attribute embedded within the image itself. Standard approaches like Double Machine Learning (DML) fail in this setting because vision encoders entangle treatment information with confounding variables, producing severely biased estimates. We develop DICE-DML (Deepfake-Informed Control Encoder for Double Machine Learning), a framework that leverages generative AI to disentangle treatment from confounders. The approach combines three mechanisms: (1) deepfake-generated image pairs that isolate treatment variation; (2) DICE-Diff adversarial learning on paired difference vectors, where background signals cancel to reveal pure treatment fingerprints; and (3) orthogonal projection that geometrically removes treatment-axis components. In simulations with known ground truth, DICE-DML reduces root mean squared error by 73-97% compared to standard DML, with the strongest improvement (97.5%) at the null effect point, demonstrating robust Type I error control. Applying DICE-DML to 232,089 Instagram influencer posts, we estimate the causal effect of skin tone on engagement. Standard DML produces diagnostically invalid results (negative outcome R^2), while DICE-DML achieves valid confounding control (R^2 = 0.63) and estimates a marginally significant negative effect of darker skin tone (-522 likes; p = 0.062), substantially smaller than the biased standard estimate. Our framework provides a principled approach for causal inference with visual data when treatments and confounders coexist within images.


Key findings
In simulations with known ground truth, DICE-DML reduced root mean squared error by 73-97% relative to standard DML. The largest improvement (97.5%) occurred at the null effect point, indicating robust Type I error control. On the Instagram influencer posts, DICE-DML achieved valid confounding control (outcome R^2 = 0.63) where standard DML failed (negative outcome R^2). It estimated a marginally significant negative effect of darker skin tone (-522 likes; p = 0.062), substantially smaller than the biased standard DML estimate.
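A negative outcome R^2 may look impossible at first glance. The short sketch below shows why it is a valid diagnostic signal: under the usual definition R^2 = 1 - SS_res / SS_tot, a model that predicts worse than the sample mean scores below zero. The function name and the toy numbers are illustrative and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Toy outcomes (illustrative only).
y = [1.0, 2.0, 3.0]

# Always predicting the mean gives R^2 = 0.
mean_only = [2.0, 2.0, 2.0]

# Predictions that run against the data do worse than the mean.
reversed_pred = [3.0, 2.0, 1.0]

print(r_squared(y, mean_only))      # 0.0
print(r_squared(y, reversed_pred))  # -3.0
```

In a DML setting, a negative out-of-sample R^2 for the outcome nuisance model means the controls predict the outcome worse than a constant, so the residualization step cannot be trusted.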
Approach
DICE-DML uses deepfake-generated image pairs to isolate treatment variation. It then applies DICE-Diff adversarial learning to paired difference vectors, in which background signals cancel and leave only the treatment fingerprint, so that the encoder learns treatment-invariant representations. Finally, orthogonal projection geometrically removes any remaining treatment-axis components from the representations before standard Double Machine Learning is applied to estimate the causal effect.
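The geometric idea behind the paired differences and the orthogonal projection can be sketched in a few lines of plain Python. This is a minimal illustration under strong assumptions, not the authors' implementation: the treatment axis is estimated here as the normalized mean of the paired difference vectors, whereas the paper learns the encoder adversarially. All names (`treatment_axis`, `project_out`) and the synthetic data are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def treatment_axis(pairs):
    """Unit vector along the mean paired difference (treated - control).

    Assumption for this sketch: each pair shares the same background,
    so the background cancels in the difference.
    """
    dim = len(pairs[0][0])
    mean_diff = [0.0] * dim
    for treated, control in pairs:
        for i in range(dim):
            mean_diff[i] += (treated[i] - control[i]) / len(pairs)
    norm = math.sqrt(dot(mean_diff, mean_diff))
    return [v / norm for v in mean_diff]


def project_out(z, axis):
    """Remove the component of z that lies along the unit vector axis."""
    coef = dot(z, axis)
    return [zi - coef * ui for zi, ui in zip(z, axis)]


# Synthetic embeddings: background plus a shift along a known direction.
rng = random.Random(0)
true_axis = [0.6, 0.8, 0.0]  # unit vector, chosen for illustration
pairs = []
for _ in range(50):
    background = [rng.gauss(0.0, 1.0) for _ in range(3)]
    shift = 2.0 + rng.gauss(0.0, 0.1)  # treatment intensity
    treated = [b + shift * a for b, a in zip(background, true_axis)]
    pairs.append((treated, background))

axis = treatment_axis(pairs)

# After projection, both members of a pair map to the same control vector.
treated, control = pairs[0]
z_treated = project_out(treated, axis)
z_control = project_out(control, axis)
leak = abs(dot(z_treated, axis))
gap = max(abs(a - b) for a, b in zip(z_treated, z_control))
print(axis, leak, gap)
```

In this toy setup the projected vectors carry no component along the treatment axis, so they could serve as controls in a downstream DML estimator without leaking the treatment. With real image embeddings the treatment signal is unlikely to lie along a single linear direction, which is presumably why the projection is the last of three mechanisms rather than the only one.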
Datasets
Instagram influencer posts (232,089 images), ImageNet (for ResNet pretraining).
Model(s)
ResNet-50 (pretrained on ImageNet); Multilayer Perceptron (MLP) as the DICE encoder; MLPs and linear layers as discriminators; Random Forest for the DML nuisance models (Gradient Boosting Machine and Ridge regression were also tested); RetinaFace for face detection.
Author countries
USA