An adversarial attack approach for eXplainable AI evaluation on deepfake detection models

Authors: Balachandar Gowrisankar, Vrizlynn L. L. Thing

Published: 2023-12-08 15:19:08+00:00

AI Summary

This paper proposes an adversarial attack approach to evaluate the faithfulness of eXplainable AI (XAI) tools on deepfake detection models. The authors demonstrate that generic XAI evaluation methods, such as pixel removal/insertion, are unsuitable for deepfake detection tasks. They introduce an evaluation method that uses an XAI tool to identify salient visual concepts in a real image and then perturbs those same concepts in the corresponding fake image to generate adversarial samples.

Abstract

With the rising concern over model interpretability, the application of eXplainable AI (XAI) tools to deepfake detection models has recently become a topic of interest. In image classification tasks, XAI tools highlight the pixels that influence a model's decision. This helps in troubleshooting the model and determining areas that may require further parameter tuning. With a wide range of tools available on the market, choosing the right tool for a model becomes necessary, as each one may highlight a different set of pixels for a given image. There is thus a need to evaluate different tools and identify the best-performing ones. Generic XAI evaluation methods, such as insertion or removal of salient pixels/segments, are applicable to general image classification tasks but may produce less meaningful results when applied to deepfake detection models because of how those models operate. In this paper, we perform experiments to show that generic removal/insertion XAI evaluation methods are not suitable for deepfake detection models. We also propose and implement an XAI evaluation approach specifically suited to deepfake detection models.
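The generic "removal" evaluation mentioned above can be sketched as follows. This is a minimal illustration, not the paper's code: `model` and `saliency` are assumed inputs (a scalar scoring function and a per-pixel attribution map), and the faithfulness score is approximated by the mean model score as pixels are progressively deleted — a faithful explanation should make the score drop quickly, giving a low value.

```python
import numpy as np

def deletion_score(model, image, saliency, steps=10, baseline=0.0):
    """Generic 'removal' XAI metric: zero out pixels in decreasing
    saliency order and track how fast the model's score drops.
    Returns the mean score over the deletion curve (lower = more
    faithful explanation). A toy sketch, not the paper's method."""
    order = np.argsort(saliency.ravel())[::-1]  # most salient first
    n = order.size
    scores = [model(image)]
    for i in range(1, steps + 1):
        k = int(n * i / steps)                  # delete top-k pixels
        perturbed = image.copy()
        perturbed.ravel()[order[:k]] = baseline
        scores.append(model(perturbed))
    return float(np.mean(scores))
```

With a toy model that simply sums pixel values, using the image itself as the saliency map (a "perfect" explanation) yields a lower deletion score than using its negation (a maximally wrong one), which is the ranking behavior the metric relies on.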


Key findings

The study found that generic removal/insertion XAI evaluation methods are ineffective for deepfake detection models, often yielding unexpected or misleading results. Under the proposed adversarial attack evaluation, GradCAM was consistently identified as the most faithful XAI tool for XceptionNet across the deepfake manipulation methods, while the most faithful tool for MesoNet varied by manipulation type (XRAI for Deepfakes, LIME for Face2Face, RISE for FaceSwap).
Approach

The proposed approach evaluates an XAI tool by using it to identify salient visual concepts in a real image, then perturbing those same concepts in the corresponding fake image to generate an adversarial fake image. XAI tools are ranked by how much their adversarial images reduce the deepfake detection model's accuracy. The perturbation leverages a black-box adversarial attack strategy based on Natural Evolution Strategies (NES) together with image segmentation.
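The attack step can be sketched as below. This is a simplified illustration under assumed interfaces, not the authors' implementation: `score_fn` is the detector's scalar "fake" score, `mask` marks the salient region returned by the XAI tool, and the NES estimator (with antithetic sampling, as in standard black-box attack formulations) approximates the gradient using only score queries; the perturbation is kept inside an L-infinity ball.

```python
import numpy as np

def nes_gradient(score_fn, x, mask, sigma=0.1, n_samples=50, seed=0):
    """Estimate the gradient of score_fn at x with NES antithetic
    sampling, restricted to the salient region given by mask."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape) * mask  # perturb salient pixels only
        grad += u * (score_fn(x + sigma * u) - score_fn(x - sigma * u))
    return grad / (2 * sigma * n_samples)

def attack(score_fn, fake_img, mask, steps=50, lr=0.05, eps=0.1):
    """Lower the detector's 'fake' score via sign-gradient descent on
    the NES estimate, clipping to an L-inf ball of radius eps."""
    x = fake_img.copy()
    for _ in range(steps):
        g = nes_gradient(score_fn, x, mask)
        x = x - lr * np.sign(g)                          # descend fake score
        x = np.clip(x, fake_img - eps, fake_img + eps)   # stay near original
        x = np.clip(x, 0.0, 1.0)                         # valid pixel range
    return x
```

An XAI tool whose mask covers the pixels the detector truly relies on will let this attack flip the prediction with a small perturbation, which is what the ranking in the paper exploits.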
Datasets

FaceForensics++, Celeb-DF

Model(s)

MesoNet, XceptionNet

Author countries

Singapore