Saracasm committed
Commit 9599d31 · 1 Parent(s): 3957e44

Phase 5c: WaveFake eval — 26.33% EER, reveals ASVspoof-specific overfitting

results/metrics/stage2_eval_wavefake_results.json ADDED
@@ -0,0 +1,73 @@
+ {
+   "phase": "Phase 5c \u2014 Supplementary Evaluation on WaveFake",
+   "completed_at": "2026-05-03T02:30:38.193603",
+   "model_checkpoint": "/content/drive/MyDrive/deepfake_audio/checkpoints/stage2_best.pt",
+   "model_dev_eer": 0.0069,
+   "evaluation_dataset": {
+     "name": "WaveFake (Frank et al., 2021) \u2014 sampled subset",
+     "kaggle_source_spoof": "walimuhammadahmad/fakeaudio",
+     "kaggle_source_bonafide": "mathurinache/the-lj-speech-dataset",
+     "sampling_strategy": "Random sample of 1,500 LJSpeech bonafide + 1,000 spoof per vocoder \u00d7 9 vocoders",
+     "utterances_total": 10500,
+     "windows": 27483,
+     "bonafide_count": 1500,
+     "spoof_count": 9000,
+     "vocoders": [
+       "ljspeech_melgan",
+       "ljspeech_melgan_large",
+       "ljspeech_multi_band_melgan",
+       "ljspeech_full_band_melgan",
+       "ljspeech_parallel_wavegan",
+       "ljspeech_waveglow",
+       "ljspeech_hifiGAN",
+       "jsut_multi_band_melgan",
+       "jsut_parallel_wavegan"
+     ]
+   },
+   "inference": {
+     "batch_size": 16,
+     "mixed_precision": true,
+     "wall_clock_minutes": 7.9,
+     "windows_per_second": 58,
+     "note": "Slower windows/sec than ASVspoof because of resampling 22050/24000 \u2192 16000"
+   },
+   "overall_results": {
+     "eer": 0.2633,
+     "auc": 0.825,
+     "accuracy": 0.7368,
+     "threshold": 0.0
+   },
+   "cross_dataset_comparison": {
+     "stage2_dev_2019_seen_attacks": 0.0069,
+     "stage2_eval_2019_unseen_attacks": 0.0555,
+     "stage2_eval_2021_unseen_attacks_plus_codecs": 0.0909,
+     "stage2_eval_wavefake_novel_vocoders": 0.2633,
+     "interpretation": "Largest cross-dataset gap. Model trained on ASVspoof attacks generalizes only weakly to standalone neural vocoder pipelines."
+   },
+   "per_vocoder_eer": {
+     "ljspeech_melgan": 0.3112,
+     "ljspeech_melgan_large": 0.3385,
+     "ljspeech_multi_band_melgan": 0.2192,
+     "ljspeech_full_band_melgan": 0.306,
+     "ljspeech_parallel_wavegan": 0.2612,
+     "ljspeech_waveglow": 0.296,
+     "ljspeech_hifiGAN": 0.3323,
+     "jsut_multi_band_melgan": 0.0113,
+     "jsut_parallel_wavegan": 0.0083
+   },
+   "methodological_caveats": [
+     "JSUT vocoder EERs (~1%) are likely inflated by domain shortcuts: bonafide is English LJSpeech, JSUT spoofs are Japanese audio at different sample rate (24 kHz vs 22 kHz). Model may be classifying language/speaker rather than detecting spoofing.",
+     "The LJSpeech-based vocoder EERs (22-34%) are the methodologically meaningful results: same speaker, same content, same recording quality as bonafide; only the synthesis differs.",
+     "High EERs on LJSpeech vocoders (mean 29.4%) reveal that ASVspoof-trained models generalize poorly to clean neural vocoder pipelines. This matches the original WaveFake paper's observations.",
+     "Model has not been adapted to WaveFake \u2014 pure cross-dataset evaluation."
+   ],
+   "key_findings": [
+     "Cross-dataset robustness varies substantially by distribution shift type:",
+     " - Unseen attack types in same dataset: +4.86 pp (0.69% \u2192 5.55%)",
+     " - Real-world codec degradation: +3.54 pp (5.55% \u2192 9.09%)",
+     " - Novel vocoder pipelines on different domain: +17.24 pp (9.09% \u2192 26.33%)",
+     "Model has learned to detect ASVspoof-specific synthesis artifacts but not pure vocoder artifacts.",
+     "Future work direction: include vocoder-only spoofing data during training to improve cross-dataset generalization."
+   ],
+   "raw_scores_path": "/content/deepfake-audio-detection/results/scores/stage2_eval_wavefake.npz"
+ }
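Review note: the percentage-point gaps quoted in "key_findings" can be re-derived from the four EERs in "cross_dataset_comparison". The sketch below is only a sanity check. The dictionary values are copied from the JSON above, while the helper name `step_gaps_pp` is hypothetical and not part of the repository.

```python
# EER values copied from the "cross_dataset_comparison" block of the results JSON.
eer = {
    "stage2_dev_2019_seen_attacks": 0.0069,
    "stage2_eval_2019_unseen_attacks": 0.0555,
    "stage2_eval_2021_unseen_attacks_plus_codecs": 0.0909,
    "stage2_eval_wavefake_novel_vocoders": 0.2633,
}


def step_gaps_pp(values):
    """Return the EER increase between consecutive entries, in percentage points.

    Hypothetical helper for illustration; relies on dict insertion order.
    """
    names = list(values)
    return {
        f"{a} -> {b}": round((values[b] - values[a]) * 100, 2)
        for a, b in zip(names, names[1:])
    }


gaps = step_gaps_pp(eer)
for step, gap in gaps.items():
    print(f"{step}: +{gap:.2f} pp")
```

Each gap is measured against the previous evaluation set rather than against the dev set, which is how the three bullet points in "key_findings" are phrased.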
results/scores/stage2_eval_wavefake.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:98c7e31ef0e91863012230a3828181ca5962079158e92a569f78f69af12ed497
+ size 2983028
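Review note: the scores file is what the EER figures are computed from, so a minimal sketch of an EER calculation may help readers who want to reproduce them. Several things here are assumptions rather than facts from this commit: that a higher score means "more likely spoof", that labels use 1 for spoof and 0 for bonafide, and the function name `compute_eer`. The array names stored inside the `.npz` are not shown in the diff, so the sketch runs on synthetic scores. On the real file, `np.load(path).files` would list the available keys.

```python
import numpy as np


def compute_eer(scores, labels):
    """Return (eer, threshold) for binary spoof detection.

    Assumptions (not confirmed by the commit): higher score = more spoof-like,
    label 1 = spoof, label 0 = bonafide. A sample is flagged when score >= threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    bona = np.sort(scores[labels == 0])
    spoof = np.sort(scores[labels == 1])

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "flag nothing" operating point is included.
    thresholds = np.unique(scores)
    thresholds = np.concatenate([thresholds, [thresholds[-1] + 1.0]])

    # False positive rate: fraction of bonafide with score >= threshold.
    fpr = 1.0 - np.searchsorted(bona, thresholds, side="left") / bona.size
    # False negative rate: fraction of spoof with score < threshold.
    fnr = np.searchsorted(spoof, thresholds, side="left") / spoof.size

    # The EER is taken where the two error rates are closest to each other.
    idx = int(np.argmin(np.abs(fpr - fnr)))
    return float((fpr[idx] + fnr[idx]) / 2.0), float(thresholds[idx])


# Synthetic demo with the same class balance as the evaluation (1,500 vs 9,000).
rng = np.random.default_rng(0)
demo_scores = np.concatenate([rng.normal(-1.0, 1.0, 1500), rng.normal(1.0, 1.0, 9000)])
demo_labels = np.concatenate([np.zeros(1500, dtype=int), np.ones(9000, dtype=int)])
demo_eer, demo_threshold = compute_eer(demo_scores, demo_labels)
print(f"synthetic EER: {demo_eer:.4f} at threshold {demo_threshold:.3f}")
```

Because EER balances the two error rates, it is insensitive to the 1:6 class imbalance in this subset, unlike the accuracy figure reported at a fixed threshold.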