gaetanbrison/deepfake-detector-resnet50-t2v-sora-veo2

Frame-level deepfake detector fine-tuned on the FakeParts benchmark (sources: sora, veo2, and ytb, i.e. real YouTube footage). It scores video clips as real vs. AI-generated, targeting the text-to-video generators Sora and Veo2.

  • Backbone: resnet50
  • Train sources: sora, veo2, ytb
  • Best val AP: 0.9913

Held-out test metrics

metric              value
AP                  0.993
accuracy            0.973
AUROC               0.988
sora acc (n=50)     0.980
veo2 acc (n=50)     1.000
ytb acc (n=50)      0.940
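
As a quick consistency check, the overall accuracy can be recovered from the three per-source rows. The sketch below assumes the overall figure is computed over the same 150 held-out clips (50 per source); the card does not state this explicitly, and the variable names are illustrative only.

```python
# Per-source accuracy and clip counts, copied from the table above.
per_source = {
    "sora": (0.980, 50),
    "veo2": (1.000, 50),
    "ytb": (0.940, 50),
}

# Convert each accuracy back to a count of correctly classified clips.
correct = {name: round(acc * n) for name, (acc, n) in per_source.items()}
total_clips = sum(n for _, n in per_source.values())

# Assumption: overall accuracy is pooled over all held-out clips.
overall_accuracy = sum(correct.values()) / total_clips
print(correct, total_clips, round(overall_accuracy, 3))
```

Under that assumption the pooled value agrees with the reported accuracy of 0.973. With only 50 clips per source, each misclassified clip shifts a per-source accuracy by 0.02.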

Quickstart

Single video (mp4)

pip install git+https://github.com/gaetanbrison/deepfake-detector
fpd-predict --ckpt ckpt_best.pt --video path/to/clip.mp4 --num-frames 8

Streamlit demo

streamlit run app.py -- --ckpt ckpt_best.pt

Programmatic (single frame)

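import torch
from deepfake_detector.models import build_model

# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust.
state = torch.load("ckpt_best.pt", map_location="cpu", weights_only=False)
cfg = state["config"]  # training config stored alongside the weights
model = build_model(cfg["model"]["name"], **cfg["model"].get("kwargs", {}))
model.load_state_dict(state["model"])
model.eval()  # inference mode: batch norm uses running statistics, dropout is off

# x: (B, 3, 224, 224) float tensor of ImageNet-normalised RGB frames (supplied by you)
with torch.no_grad():
    # class index 1 is read as the "fake" class
    prob_fake = torch.softmax(model(x), dim=-1)[:, 1]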
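
The snippet above expects `x` to be ImageNet-normalised already. A minimal sketch of that step follows, assuming the frame has been decoded and resized to 224×224 elsewhere. The function name `preprocess_frame` is hypothetical, and the resize or crop policy used during training is not stated in this card. The mean and standard deviation are the standard ImageNet channel statistics.

```python
import numpy as np

# Standard ImageNet per-channel statistics (RGB order, for pixels scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_frame(frame_rgb: np.ndarray) -> np.ndarray:
    """Turn one 224x224 RGB uint8 frame (H, W, 3) into a (1, 3, 224, 224) float32 batch.

    Hypothetical helper: resizing to 224x224 is assumed to happen before this call.
    """
    if frame_rgb.shape != (224, 224, 3):
        raise ValueError(f"expected (224, 224, 3), got {frame_rgb.shape}")
    scaled = frame_rgb.astype(np.float32) / 255.0         # [0, 255] -> [0, 1]
    normalised = (scaled - IMAGENET_MEAN) / IMAGENET_STD  # broadcast over the channel axis
    return normalised.transpose(2, 0, 1)[None, ...]       # HWC -> CHW, then add batch dim


# Example with a dummy all-black frame.
x_np = preprocess_frame(np.zeros((224, 224, 3), dtype=np.uint8))
print(x_np.shape, x_np.dtype)
```

The result can be handed to the model with `torch.from_numpy(x_np)`. Frames decoded with OpenCV arrive in BGR order and would need their channels reversed first.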

For a video, sample 4-8 uniformly-spaced frames and average the per-frame fake-probabilities.
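
The sampling and averaging described above can be sketched in plain Python. Both helper names are hypothetical, and taking the centre of each equal-length segment is just one reasonable reading of "uniformly-spaced"; the exact strategy used by `fpd-predict` is not documented here.

```python
from typing import List, Sequence


def uniform_frame_indices(total_frames: int, num_frames: int = 8) -> List[int]:
    """Return evenly spread frame indices: the centre of each of num_frames equal segments."""
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    num_frames = min(num_frames, total_frames)  # short clips: use every frame at most once
    step = total_frames / num_frames
    return [int(step * i + step / 2) for i in range(num_frames)]


def video_fake_probability(frame_probs: Sequence[float]) -> float:
    """Average per-frame fake probabilities into a single clip-level score."""
    if len(frame_probs) == 0:
        raise ValueError("need at least one frame probability")
    return sum(frame_probs) / len(frame_probs)


# Example: an 80-frame clip sampled at 8 frames, with made-up per-frame scores.
indices = uniform_frame_indices(80, 8)
score = video_fake_probability([0.2, 0.4, 0.9])
print(indices, score)
```

In practice the per-frame scores would be the `prob_fake` values from the programmatic snippet, computed on the frames at `indices`.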

Limits

  • Closed-source generators only. This checkpoint was trained on Sora and Veo2 fakes. It has not seen any other T2V generator (Pika, Latte, OpenSora, …), and detectors in this field generally transfer poorly to unseen generators, so expect degraded accuracy on them.
  • Image-level. Per-frame scores are averaged for a video; the model doesn't reason about temporal artifacts.
  • Real-class is YouTube only. False positives on real video that differs from that distribution, such as heavily filtered footage, are likely.
  • Not a forensic-grade tool. Research / defensive use only.