On the 'LCNN' model
The LCNN model, presented in https://arxiv.org/pdf/2103.11326.pdf, is a light convolutional neural network introduced in 2020. It takes as input CQT features and produces an output score between 0 and 1.This model is trained only shortly (sees 1.8 million samples once)
Interpreting the results
The LCNN model predicted a deepfake score of
.
This means that the soundtrack in this video is a deepfake with a
probability of
percent (out of 100%).
Caution: These results are of limited value at the moment. This is
among
other things due to the fact that deepfake detection models do not (yet)
generalise well, new fakes appear daily,
and our model only evaluates the audio track, but not the video
track. If
the voice in the audio comes from a voice actor,
i.e. an actor who imitates the speech, then this is not considered a synthetically generated voice and is therefore not recognised as such.