gaetanbrison/deepfake-detector-resnet50-t2v-sora-veo2

Frame-level deepfake detector fine-tuned on the FakeParts benchmark (sources: sora, veo2, ytb). It scores video clips as real or AI-generated, and targets the text-to-video generators Sora and Veo2.

Backbone: resnet50
Train sources: sora, veo2, ytb
Best val AP: 0.9913

Held-out test metrics

metric	value
AP	0.993
accuracy	0.973
AUROC	0.988
sora acc (n=50)	0.980
veo2 acc (n=50)	1.000
ytb acc (n=50)	0.940
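
As a quick sanity check on the table, the overall accuracy can be recomputed from the per-source rows. The sketch below assumes the three 50-clip subsets together make up the whole held-out test set (150 clips), which the card does not state explicitly; `pooled_accuracy` is a hypothetical helper name.

```python
# Per-source accuracies and clip counts copied from the table above.
per_source = {"sora": (0.980, 50), "veo2": (1.000, 50), "ytb": (0.940, 50)}


def pooled_accuracy(groups):
    """Clip-weighted mean accuracy across sources."""
    correct = sum(acc * n for acc, n in groups.values())
    total = sum(n for _, n in groups.values())
    return correct / total


overall = pooled_accuracy(per_source)
print(f"{overall:.3f}")  # 146 correct out of 150 clips
```

Under that assumption the result rounds to the reported accuracy of 0.973, with the YouTube (real) subset contributing most of the errors.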

Quickstart

Single video (mp4)

pip install git+https://github.com/gaetanbrison/deepfake-detector
fpd-predict --ckpt ckpt_best.pt --video path/to/clip.mp4 --num-frames 8

Streamlit demo

streamlit run app.py -- --ckpt ckpt_best.pt

Programmatic (single frame)

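import torch
from deepfake_detector.models import build_model

# weights_only=False unpickles arbitrary Python objects: load only checkpoints you trust.
state = torch.load("ckpt_best.pt", map_location="cpu", weights_only=False)
cfg = state["config"]
model = build_model(cfg["model"]["name"], **cfg["model"].get("kwargs", {}))
model.load_state_dict(state["model"])
model.eval()

# x: (B, 3, 224, 224) ImageNet-normalised RGB
# Placeholder batch so the snippet runs; replace with real preprocessed frames.
x = torch.zeros(1, 3, 224, 224)
with torch.no_grad():
    # Index 1 of the softmax output is read as the "fake" class.
    prob_fake = torch.softmax(model(x), dim=-1)[:, 1]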
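
The input comment above asks for ImageNet-normalised RGB. A minimal sketch of that arithmetic for a single pixel follows, assuming the checkpoint expects the standard ImageNet channel statistics used with torchvision's pretrained ResNet weights; the card does not confirm the exact values, and `normalise_pixel` is a hypothetical helper, not part of the `deepfake_detector` package.

```python
# Standard ImageNet channel statistics in RGB order.
# Assumption: this checkpoint was trained with the same values.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalise_pixel(rgb_uint8):
    """Map one (R, G, B) pixel in 0..255 to ImageNet-normalised floats."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb_uint8, IMAGENET_MEAN, IMAGENET_STD)
    )


mid_grey = normalise_pixel((128, 128, 128))
print(mid_grey)
```

In a tensor pipeline, `torchvision.transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD)` applies the same per-channel operation after the frame has been resized to 224×224 and scaled to the range 0 to 1.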

For a video, sample 4 to 8 uniformly spaced frames and average the per-frame fake probabilities.
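
The video procedure can be sketched in plain Python as follows. The function names are hypothetical, and taking the centre of each equal-length segment is just one reasonable reading of "uniformly spaced"; the `fpd-predict` command may sample frames differently.

```python
def uniform_frame_indices(total_frames, num_frames=8):
    """Return up to num_frames indices spread evenly over a clip.

    Each index is the centre of one of num_frames equal-length segments,
    so the first and last frames are not over-represented.
    """
    if total_frames <= 0 or num_frames <= 0:
        return []
    num_frames = min(num_frames, total_frames)
    step = total_frames / num_frames
    return [int(step * (i + 0.5)) for i in range(num_frames)]


def video_fake_probability(frame_probs):
    """Average per-frame fake probabilities into one clip-level score."""
    if not frame_probs:
        raise ValueError("need at least one frame probability")
    return sum(frame_probs) / len(frame_probs)


# Example: a 240-frame clip sampled at 8 frames.
indices = uniform_frame_indices(total_frames=240, num_frames=8)
print(indices)  # [15, 45, 75, 105, 135, 165, 195, 225]

# Example with made-up per-frame scores, for illustration only.
score = video_fake_probability([0.9, 0.8, 1.0, 0.7])
```

Decode the frames at `indices`, preprocess them into a single batch `x`, run the model as in the snippet above, and pass `prob_fake.tolist()` to `video_fake_probability`.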

Limits

Closed-source generators only. This checkpoint was trained on Sora and Veo2 fakes. It hasn't seen open-source T2V models (Pika, Latte, OpenSora, …); cross-generator transfer is poor across the field.
Image-level. Per-frame scores are averaged for a video; the model does not reason about temporal artifacts.
The real class is YouTube only. False positives on heavily filtered real video are likely.
Not a forensic-grade tool. Research / defensive use only.
