ShortCheck: Checkworthiness Detection of Multilingual Short-Form Videos

Authors: Henrik Vatndal, Vinay Setty

Published: 2025-09-24 18:37:45+00:00

AI Summary

ShortCheck is a modular, inference-only pipeline designed to detect checkworthy content in multilingual short-form videos like TikTok, thereby assisting human fact-checkers. The system integrates various multimodal components including speech transcription, OCR, visual deepfake detection, video-to-text summarization, and claim verification. Evaluated on two manually annotated datasets of TikTok videos, ShortCheck achieves promising results with F1-weighted scores exceeding 70%.

Abstract

Short-form video platforms like TikTok present unique challenges for misinformation detection due to their multimodal, dynamic, and noisy content. We present ShortCheck, a modular, inference-only pipeline with a user-friendly interface that automatically identifies checkworthy short-form videos to help human fact-checkers. The system integrates speech transcription, OCR, object and deepfake detection, video-to-text summarization, and claim verification. ShortCheck is validated by evaluating it on two manually annotated datasets with TikTok videos in a multilingual setting. The pipeline achieves promising results, with an F1-weighted score above 70%.
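The abstract describes the system as a set of independent, inference-only modules whose outputs are kept for inspection. The sketch below shows one way such a modular pipeline could be wired together, assuming each module is a plain function that takes a video record and returns a dict of named outputs. The names Pipeline, transcribe, run_ocr and detect_deepfake, and the record fields audio_text, frame_texts and fake_prob, are hypothetical stand-ins, not ShortCheck's actual code; real modules would call ASR, OCR and deepfake models instead of reading precomputed fields.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A module maps a video record (plus earlier outputs) to named intermediate outputs.
Module = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Pipeline:
    """Runs inference-only modules in order and keeps every intermediate output."""
    modules: List[Module] = field(default_factory=list)

    def run(self, video: Dict[str, Any]) -> Dict[str, Any]:
        outputs: Dict[str, Any] = {}
        for module in self.modules:
            # Each module sees the raw record plus earlier outputs, so a later
            # stage (e.g. claim verification) could reuse the transcript.
            outputs.update(module({**video, **outputs}))
        return outputs


# Hypothetical stubs standing in for real speech, OCR and deepfake models.
def transcribe(video: Dict[str, Any]) -> Dict[str, Any]:
    return {"transcript": video.get("audio_text", "")}


def run_ocr(video: Dict[str, Any]) -> Dict[str, Any]:
    return {"ocr_text": " ".join(video.get("frame_texts", []))}


def detect_deepfake(video: Dict[str, Any]) -> Dict[str, Any]:
    return {"deepfake_score": video.get("fake_prob", 0.0)}


pipeline = Pipeline([transcribe, run_ocr, detect_deepfake])
result = pipeline.run(
    {"audio_text": "Drinking this cures the flu", "frame_texts": ["BREAKING", "share now"], "fake_prob": 0.2}
)
print(result)
```

Because every module only adds keys to a shared dict, a module can be dropped or swapped without touching the others, which is also what makes per-module ablations straightforward.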


Key findings
The ShortCheck pipeline achieved F1-weighted scores of 0.72 for Norwegian and 0.74 for English TikTok videos. Ablation studies showed that the textual modules, specifically speech transcription and ideological buzzword detection, contributed most to checkworthiness detection. Visual deepfake detection, while integrated, offered limited standalone utility; among the visual deepfake detectors tested, EfficientNet performed best (0.612 accuracy, 0.992 precision, 0.573 recall).
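The headline numbers are F1-weighted scores. As a reminder of what that metric computes, here is a minimal from-scratch sketch: per-class F1 = 2TP / (2TP + FP + FN), averaged with each class weighted by its support (its count in the ground truth). It should agree with scikit-learn's f1_score(..., average="weighted"); the toy labels "C" (checkworthy) and "N" (not checkworthy) below are made up for illustration and are not from the paper's data.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1 averaged with weights equal to each class's support in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        # n_true >= 1, so the denominator is always positive.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += f1 * n_true / total
    return score


# Toy example: class C has F1 0.8 (support 3), class N has F1 2/3 (support 1).
y_true = ["C", "C", "C", "N"]
y_pred = ["C", "C", "N", "N"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.7667
```

Weighting by support means the score leans toward the majority class, so on an imbalanced dataset a high F1-weighted value can coexist with weaker performance on the rarer class.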
Approach
ShortCheck utilizes a modular, inference-only pipeline to process short-form videos, extracting features from speech (transcription), on-screen text (OCR), and visuals (deepfake detection, video summarization). These multimodal features, along with ideological language detection and text-based claim verification, are aggregated by a rule-based engine. The system then classifies videos as 'Checkworthy' or 'Not Checkworthy' and provides interpretable intermediate outputs.
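The summary does not spell out the rules the aggregation engine uses, so the following is only a minimal sketch of what a rule-based, interpretable aggregator could look like. It assumes an OR-style policy (any fired rule makes a video checkworthy), a hypothetical phrase list for ideological language, a hypothetical claim_verdict field from the claim verifier, and a hypothetical deepfake threshold of 0.5; none of these values come from the paper. Returning the fired rules alongside the label mirrors the idea of exposing intermediate outputs to fact-checkers.

```python
from typing import Dict, List, Tuple

# Hypothetical phrase list and threshold; the paper's actual rules are not given here.
BUZZWORDS = {"wake up", "mainstream media", "they don't want you to know"}
DEEPFAKE_THRESHOLD = 0.5


def count_buzzwords(text: str) -> int:
    """Count listed phrases occurring in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return sum(1 for phrase in BUZZWORDS if phrase in lowered)


def classify(features: Dict[str, object]) -> Tuple[str, List[str]]:
    """Return a label plus the rules that fired, so the decision stays interpretable."""
    reasons: List[str] = []
    text = f"{features.get('transcript', '')} {features.get('ocr_text', '')}"
    if count_buzzwords(text) > 0:
        reasons.append("ideological buzzwords")
    verdict = features.get("claim_verdict")
    if verdict in {"refuted", "unverified"}:
        reasons.append(f"claim verification: {verdict}")
    if float(features.get("deepfake_score", 0.0)) >= DEEPFAKE_THRESHOLD:
        reasons.append("possible visual deepfake")
    # OR-style rule: any fired rule flags the video for human review.
    label = "Checkworthy" if reasons else "Not Checkworthy"
    return label, reasons


print(classify({"transcript": "Wake up, the mainstream media hides this", "claim_verdict": "refuted", "deepfake_score": 0.1}))
print(classify({"transcript": "My morning routine", "deepfake_score": 0.1}))
```

An OR rule favours recall, which suits a triage tool for human fact-checkers; a stricter engine could instead require agreement between several signals, trading recall for precision.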
Datasets
Norwegian influencer data, TikTok Videos from Fact-Checking Websites
Model(s)
UNKNOWN
Author countries
Norway