Examples

The following tables present a few examples of synthetic audio (table 1), authentic audio (table 2), and video deepfakes that use a voice actor (table 3). For each detection model (columns), the model score for each YouTube video (rows) is shown as a color bar. Low scores (close to 0, green) indicate that the model considers the audio authentic, while high scores (close to 1, yellow and red) indicate that the model considers the audio fake.

A perfect audio spoof detection model would assign high scores to the videos in the first table, shown as red bars, and low scores to the videos in the second table, shown as predominantly green bars.
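The color coding described above can be sketched as a simple mapping from score to color. This is only an illustration: the function name `score_to_color` and the cut-off values 0.33 and 0.66 are assumptions made for this sketch, not the thresholds used to render the tables.

```python
def score_to_color(score: float, low: float = 0.33, high: float = 0.66) -> str:
    """Map a detection score in [0, 1] to a color-bar label.

    The cut-offs `low` and `high` are hypothetical values chosen for
    illustration; the page does not state the actual ones.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score < low:
        return "green"   # model considers the audio authentic
    if score < high:
        return "yellow"  # model leans towards fake
    return "red"         # model considers the audio fake


if __name__ == "__main__":
    for s in (0.05, 0.5, 0.95):
        print(s, score_to_color(s))
```

Under these assumed cut-offs, a score near 0 maps to green and a score near 1 maps to red, matching the reading of the tables given above.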

Table 1: Synthetic Audio

The following are videos whose audio track was generated by an AI. These videos can be considered 'audio deepfakes'. Ideally, the scores are high (i.e. red color bar).

YouTube Video (click for details) SSL-W2V2

Table 2: Authentic Audio

The following are authentic videos, i.e. neither audio nor video deepfakes. Ideally, the corresponding scores are low (i.e. green color bar).

YouTube Video (click for details) SSL-W2V2

Table 3: Video Deepfake with Voice Actor

The following videos are deepfakes, but use a voice actor. Thus the audio is not a deepfake, since it comes from a real human. An audio-only deepfake detection system is expected to yield low scores here.

YouTube Video (click for details) SSL-W2V2
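The expected behavior across the three tables can be summarized in a short sketch. The category names, the function `agrees_with_expectation`, and the decision threshold of 0.5 are hypothetical and introduced only for illustration. The sketch encodes the fact that only the first table contains audio an audio-only detector should flag.

```python
# Whether an audio-only detector is expected to flag each table's audio as fake.
# The category keys are hypothetical labels for the three tables.
EXPECTED_FAKE = {
    "synthetic_audio": True,              # table 1: AI-generated audio
    "authentic_audio": False,             # table 2: real audio, real video
    "video_deepfake_voice_actor": False,  # table 3: fake video, human voice
}


def agrees_with_expectation(category: str, score: float, threshold: float = 0.5) -> bool:
    """Return True if a score falls on the expected side of the threshold.

    The threshold of 0.5 is an assumption made for this sketch.
    """
    predicted_fake = score >= threshold
    return predicted_fake == EXPECTED_FAKE[category]


if __name__ == "__main__":
    print(agrees_with_expectation("video_deepfake_voice_actor", 0.1))
    print(agrees_with_expectation("synthetic_audio", 0.1))
```

A low score on a table 3 video counts as correct here, because the audio comes from a human voice actor even though the video is manipulated.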