Saracasm committed on
Commit 86bce6d · 1 Parent(s): 739f31c

Phase 6: UI polish - Stages 1-3 (foundation, hero, detector)

Files changed (1)
  1. app/app.py +886 -89
app/app.py CHANGED
@@ -203,36 +203,93 @@ def make_wavefake_plot():
203
 
204
  def predict_audio(audio_path):
205
  if audio_path is None:
206
- return ("Please upload an audio file or select an example.", None, None, None)
207
 
208
  start = time.time()
209
  try:
210
  result = detector.predict(audio_path, return_per_window=True)
211
  except Exception as e:
212
- return (f"Error: {type(e).__name__}: {e}", None, None, None)
213
  elapsed_ms = (time.time() - start) * 1000
214
 
215
  pred = result["prediction"]
216
  confidence = result["confidence"] * 100
 
 
217
 
218
  if pred == "spoof":
219
- badge = (f"<div style='padding:1rem;border-radius:0.5rem;"
220
- f"background:#fee2e2;border-left:4px solid {COLOR_SPOOF};'>"
221
- f"<h3 style='margin:0;color:{COLOR_SPOOF};'>SPOOF detected</h3>"
222
- f"<p style='margin:0.5rem 0 0 0;font-size:1.1rem;'><b>Confidence: {confidence:.1f}%</b></p>"
223
- f"</div>")
224
  else:
225
- badge = (f"<div style='padding:1rem;border-radius:0.5rem;"
226
- f"background:#dcfce7;border-left:4px solid {COLOR_BONAFIDE};'>"
227
- f"<h3 style='margin:0;color:{COLOR_BONAFIDE};'>BONAFIDE (likely real)</h3>"
228
- f"<p style='margin:0.5rem 0 0 0;font-size:1.1rem;'><b>Confidence: {confidence:.1f}%</b></p>"
229
- f"</div>")
230
 
231
  details = (f"**Spoof probability:** {result['spoof_probability']:.4f}\n\n"
232
  f"**Bonafide probability:** {result['bonafide_probability']:.4f}\n\n"
233
  f"**Audio duration:** {result['utterance_duration_sec']:.2f} seconds\n\n"
234
  f"**Windows analyzed:** {result['n_windows']}\n\n"
235
- f"**Inference time:** {elapsed_ms:.0f} ms (CPU)")
 
236
 
237
  fig = make_per_window_plot(result["window_scores"], threshold=result["threshold_used"])
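Review note: the handler above measures inference time with `time.time()`, which follows the wall clock and can jump if the system clock is adjusted mid-call. The following is a minimal sketch of the same measurement using the monotonic `time.perf_counter()`. The names `timed_ms` and `fake_predict` are hypothetical helpers for illustration and are not part of `app.py`.

```python
import time
from typing import Any, Callable, Tuple


def timed_ms(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms


def fake_predict(path: str) -> dict:
    # Hypothetical stand-in for detector.predict(); sleeps to simulate work.
    time.sleep(0.01)
    return {"prediction": "bonafide", "path": path}


result, elapsed_ms = timed_ms(fake_predict, "clip.wav")
print(f"{result['prediction']} in {elapsed_ms:.0f} ms")
```

In the real handler the `try`/`except` around `detector.predict` would stay as it is; only the two clock reads would change.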
238
 
@@ -255,56 +312,741 @@ def predict_audio(audio_path):
255
  # ============================================================
256
 
257
  CUSTOM_CSS = """
258
  .gradio-container {
259
- font-family: ui-sans-serif, system-ui, -apple-system, sans-serif;
260
- max-width: 1200px !important;
261
  }
262
  .tab-nav button {
263
- font-size: 1rem !important;
264
  font-weight: 600 !important;
265
  }
 
 
266
  .metric-card {
267
- background: linear-gradient(135deg, #f3f4f6 0%, #e5e7eb 100%);
268
- padding: 1.5rem;
269
- border-radius: 0.75rem;
 
270
  text-align: center;
271
- border: 1px solid #d1d5db;
272
  }
273
  .metric-value {
274
- font-size: 2.5rem;
275
- font-weight: 700;
276
- color: #111827;
277
- line-height: 1.2;
 
 
 
 
278
  }
279
  .metric-label {
280
- font-size: 0.875rem;
281
- color: #6b7280;
282
  margin-top: 0.5rem;
 
 
 
 
283
  }
 
 
284
  .context-card {
285
- background: white;
286
- padding: 1.25rem;
287
- border-radius: 0.5rem;
288
- border: 1px solid #e5e7eb;
 
289
  margin-bottom: 1rem;
290
  }
291
  .context-card h4 {
292
- color: #7c3aed;
293
  margin: 0 0 0.5rem 0;
294
  font-size: 1.05rem;
295
  }
296
  .context-card p {
297
  margin: 0;
298
- color: #4b5563;
299
- line-height: 1.6;
300
  }
301
  .cta-section {
302
  text-align: center;
303
- padding: 2rem 1rem;
304
- background: linear-gradient(135deg, #ede9fe 0%, #ddd6fe 100%);
305
  border-radius: 1rem;
306
- margin: 2rem 0;
307
  }
308
  """
309
 
310
 
@@ -318,6 +1060,20 @@ with gr.Blocks(
318
  css=CUSTOM_CSS,
319
  ) as demo:
320
 
321
  gr.Markdown("""
322
  # Deepfake Audio Detection
323
  *Wav2Vec 2.0 fine-tuned on ASVspoof 2019 LA • Cross-dataset evaluated on ASVspoof 2021 LA & WaveFake*
@@ -329,55 +1085,77 @@ with gr.Blocks(
329
  # TAB 1: WELCOME
330
  # ============================================================
331
  with gr.Tab("Welcome", id=0):
332
- gr.Markdown("""
333
- ## Is this voice real?
334
- ### Modern AI can clone any voice from just a few seconds of audio.
335
-
336
- Voice deepfakes have become a serious concern. AI systems can now generate speech that sounds almost
337
- indistinguishable from a real person — and they can do it from very short samples. This creates real
338
- problems for security, journalism, and trust in digital media. Detecting AI-generated speech
339
- reliably is an active research area, and this demo shows one approach.
 
 
 
340
  """)
341
 
342
- gr.Markdown("### Why this matters")
343
 
344
  with gr.Row():
345
  with gr.Column():
346
  gr.HTML("""
347
- <div class='context-card'>
348
- <h4>Phone scams</h4>
349
- <p>Voice clones are increasingly used to impersonate family members in
350
- "emergency call" scams, asking for money or sensitive information. Reported cases
351
- have surged since 2022.</p>
 
352
  </div>
353
  """)
354
  with gr.Column():
355
  gr.HTML("""
356
- <div class='context-card'>
357
- <h4>Misinformation</h4>
358
- <p>Fabricated political speeches, fake celebrity endorsements, and false
359
- statements attributed to public figures have circulated widely on social media.</p>
 
 
360
  </div>
361
  """)
362
  with gr.Column():
363
  gr.HTML("""
364
- <div class='context-card'>
365
- <h4>Trust in evidence</h4>
366
- <p>Courts now have to grapple with whether audio recordings are authentic.
367
- The same is true for journalism and historical archives.</p>
 
 
368
  </div>
369
  """)
370
 
371
- gr.Markdown("## Try the detector")
372
- gr.Markdown("Upload your own audio, record from your microphone, or click an example.")
373
- cta_btn = gr.Button("Open the detector", variant="primary", size="lg")
374
-
375
- gr.Markdown("""
376
- ---
377
- **Built by:** Sara Iqbal & Areeba Arif • FAST-NUCES Spring 2026 Deep Learning Project
 
 
 
378
 
379
- **Source code:** [github.com/Saracasm/deepfake-audio-detection](https://github.com/Saracasm/deepfake-audio-detection)
380
- **Model weights:** [Sara1708/deepfake-audio-wav2vec2](https://huggingface.co/Sara1708/deepfake-audio-wav2vec2)
 
 
 
 
 
381
  """)
382
 
383
 
@@ -385,32 +1163,51 @@ with gr.Blocks(
385
  # TAB 2: DETECTOR
386
  # ============================================================
387
  with gr.Tab("Detector", id=1):
388
- gr.Markdown("""
389
- ### Audio analysis
390
- Upload audio, record yourself, or click an example below. The detector returns a prediction with confidence,
391
- plus per-window analysis showing how the model integrates evidence over time.
 
 
 
 
 
392
  """)
393
 
394
- with gr.Row():
395
  with gr.Column(scale=1):
 
396
  audio_input = gr.Audio(
397
  sources=["upload", "microphone"],
398
  type="filepath",
399
- label="Audio input",
 
400
  )
401
- analyze_btn = gr.Button("Analyze", variant="primary", size="lg")
402
 
 
 
 
 
403
  gr.Examples(
404
  examples=EXAMPLE_FILES,
405
  inputs=audio_input,
406
- label="Example clips (click to load)",
407
  )
408
 
409
  with gr.Column(scale=1):
410
- badge_output = gr.HTML(label=None)
411
- details_output = gr.Markdown(label="Details")
412
 
413
- plot_output = gr.Plot(label="Per-window analysis")
 
414
 
415
  with gr.Accordion("Raw output (JSON)", open=False):
416
  raw_output = gr.JSON(label=None)
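Review note: the Detector tab text says the per-window plot shows how the model integrates evidence over time, but this diff does not show how `detector.predict` turns window scores into one verdict. The sketch below assumes simple mean pooling followed by a threshold comparison. Both the pooling rule and the `confidence` definition are assumptions, and `pool_windows` is a hypothetical name; only the dictionary keys are taken from the handler code in this commit.

```python
from statistics import mean
from typing import Dict, List


def pool_windows(window_scores: List[float], threshold: float) -> Dict[str, object]:
    """Mean-pool per-window spoof scores and compare against a threshold."""
    if not window_scores:
        raise ValueError("window_scores must not be empty")
    # Assumption: the utterance-level spoof probability is the window mean.
    spoof_probability = mean(window_scores)
    is_spoof = spoof_probability >= threshold
    return {
        "prediction": "spoof" if is_spoof else "bonafide",
        "spoof_probability": spoof_probability,
        "bonafide_probability": 1.0 - spoof_probability,
        # Assumption: confidence is the probability of the predicted class.
        "confidence": spoof_probability if is_spoof else 1.0 - spoof_probability,
        "n_windows": len(window_scores),
        "threshold_used": threshold,
    }


pooled = pool_windows([0.9, 0.7, 0.8], threshold=0.5)
print(pooled["prediction"], pooled["n_windows"])
```

Other pooling rules (max, median, or a learned aggregator) would give different verdicts on clips where only a few windows look synthetic, so the real detector may behave differently.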
@@ -519,21 +1316,21 @@ with gr.Blocks(
519
 
520
  with gr.Row():
521
  gr.HTML("""
522
- <div class='context-card'>
523
- <h4>Stage 1: frozen backbone, head only</h4>
524
  <p>Train only the linear classification head, keeping all 95M Wav2Vec parameters frozen.
525
  This proves that pretrained Wav2Vec representations already carry strong anti-spoofing signal.</p>
526
- <p style='margin-top:1rem;'><b>Result:</b> <span style='color:#7c3aed;font-size:1.2rem;font-weight:700;'>10.09% dev EER</span><br>
527
  with just <b>1,538</b> trainable parameters.</p>
528
  </div>
529
  """)
530
  gr.HTML("""
531
- <div class='context-card'>
532
- <h4>Stage 2: top 2 layers unfrozen</h4>
533
  <p>Unfreeze top 2 transformer layers + final LayerNorm. Lower LR from 1e-3 to 1e-5
534
  with 10% warmup + linear decay. Enable mixed precision (fp16) for speed.</p>
535
- <p style='margin-top:1rem;'><b>Result:</b> <span style='color:#16a34a;font-size:1.2rem;font-weight:700;'>0.69% dev EER</span><br>
536
- a <b>93% relative error reduction</b> with 14.18M trainable params (15% of model).</p>
537
  </div>
538
  """)
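Review note: the figures quoted in the two stage cards can be cross-checked with a few lines of arithmetic. The sketch assumes the Wav2Vec 2.0 base configuration (768-dimensional hidden states) and a single linear head with two output classes; neither is shown in this diff. The `lr_at` function is a hypothetical illustration of "10% warmup + linear decay" with a peak of 1e-5, not the project's actual scheduler.

```python
# Assumption: Wav2Vec 2.0 base hidden size and a 2-class linear head.
HIDDEN_SIZE = 768
NUM_CLASSES = 2

# Linear layer parameters: one weight per (input, class) pair plus one bias per class.
head_params = HIDDEN_SIZE * NUM_CLASSES + NUM_CLASSES
print(head_params)

# Dev EER values in percent, copied from the stage cards.
stage1_eer, stage2_eer = 10.09, 0.69
relative_reduction = (stage1_eer - stage2_eer) / stage1_eer
print(f"{relative_reduction:.1%}")

# Trainable vs. total parameters in millions, copied from the stage cards.
trainable_fraction = 14.18 / 95
print(f"{trainable_fraction:.1%}")


def lr_at(step: int, total_steps: int, peak_lr: float = 1e-5,
          warmup_frac: float = 0.10) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    warmup_steps = max(1, round(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / remaining)


print(lr_at(50, 1000), lr_at(100, 1000), lr_at(1000, 1000))
```

Under these assumptions the head has 1,538 parameters, the relative EER reduction is about 93.2%, and the trainable share is about 14.9%, which agrees with the rounded numbers in the cards.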
539
 
@@ -549,20 +1346,20 @@ with gr.Blocks(
549
  gr.Markdown("## Limitations (honest disclosure)")
550
 
551
  gr.HTML("""
552
- <div style='background:#fef3c7;border-left:4px solid #f59e0b;padding:1rem 1.5rem;border-radius:0.5rem;margin:1rem 0;'>
553
  <p><b>WaveFake out-of-domain generalization is poor</b> (~29% EER on LJSpeech vocoders).
554
  The model learned ASVspoof-specific synthesis artifacts, not universal vocoder detection.
555
  Future work: train on a mixed corpus including pure vocoder samples.</p>
556
  </div>
557
- <div style='background:#fef3c7;border-left:4px solid #f59e0b;padding:1rem 1.5rem;border-radius:0.5rem;margin:1rem 0;'>
558
  <p><b>Codec sensitivity:</b> GSM and PSTN telephone codecs degrade EER by ~6 percentage points.
559
  Codec augmentation during training would likely close this gap.</p>
560
  </div>
561
- <div style='background:#fef3c7;border-left:4px solid #f59e0b;padding:1rem 1.5rem;border-radius:0.5rem;margin:1rem 0;'>
562
  <p><b>A10 attack family is consistently challenging</b> (15.54% EER on this attack alone).
563
  This is a stable model weakness across both 2019 and 2021 evaluations.</p>
564
  </div>
565
- <div style='background:#fee2e2;border-left:4px solid #dc2626;padding:1rem 1.5rem;border-radius:0.5rem;margin:1rem 0;'>
566
  <p><b>Not a production deepfake detector.</b> Real-world deepfakes use synthesis methods this
567
  model has never seen. Use this as a research demonstration, not for security-critical decisions.</p>
568
  </div>
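Review note: every limitation above is stated as an equal error rate (EER). For readers unfamiliar with the metric, here is a minimal sketch assuming the usual definition: the operating point where the miss rate (spoof accepted as real) equals the false-alarm rate (real flagged as spoof). How the project computed its EER values is not shown in this diff, and `equal_error_rate` is a hypothetical helper that approximates the crossing point on a finite score set.

```python
from typing import Sequence, Tuple


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (eer, threshold). Higher score = more spoof-like; label 1 = spoof."""
    spoof = [s for s, y in zip(scores, labels) if y == 1]
    bona = [s for s, y in zip(scores, labels) if y == 0]
    if not spoof or not bona:
        raise ValueError("need at least one spoof and one bonafide score")

    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    for thr in sorted(set(scores)):
        # Miss: a spoof clip scored below the threshold (accepted as real).
        miss_rate = sum(s < thr for s in spoof) / len(spoof)
        # False alarm: a bonafide clip scored at or above the threshold.
        false_alarm_rate = sum(s >= thr for s in bona) / len(bona)
        gap = abs(miss_rate - false_alarm_rate)
        if gap < best_gap:
            best_gap = gap
            best_eer = (miss_rate + false_alarm_rate) / 2
            best_thr = thr
    return best_eer, best_thr


eer_clean, thr_clean = equal_error_rate([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])
eer_mixed, _ = equal_error_rate([0.6, 0.4, 0.6, 0.4], [1, 1, 0, 0])
print(eer_clean, thr_clean, eer_mixed)
```

A perfectly separable score set gives an EER of 0, while fully overlapping scores give 0.5, which is chance level for a two-class detector.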
 
203
 
204
  def predict_audio(audio_path):
205
  if audio_path is None:
206
+ empty_badge = """
207
+ <div class='result-placeholder'>
208
+ <div class='result-placeholder-icon'>⚠️</div>
209
+ <div class='result-placeholder-text'>Please upload an audio file or select an example first.</div>
210
+ </div>
211
+ """
212
+ return (empty_badge, None, None, None)
213
 
214
  start = time.time()
215
  try:
216
  result = detector.predict(audio_path, return_per_window=True)
217
  except Exception as e:
218
+ error_badge = f"""
219
+ <div class='result-error'>
220
+ <div class='result-placeholder-icon'>❌</div>
221
+ <div class='result-placeholder-text'><b>Error:</b> {type(e).__name__}: {e}</div>
222
+ </div>
223
+ """
224
+ return (error_badge, None, None, None)
225
  elapsed_ms = (time.time() - start) * 1000
226
 
227
  pred = result["prediction"]
228
  confidence = result["confidence"] * 100
229
+ spoof_pct = result["spoof_probability"] * 100
230
+ bona_pct = result["bonafide_probability"] * 100
231
 
232
  if pred == "spoof":
233
+ badge_class = "result-card-spoof"
234
+ icon = "⚠"
235
+ verdict = "Likely synthetic"
236
+ verdict_sub = "This audio shows characteristics of AI-generated speech."
 
237
  else:
238
+ badge_class = "result-card-bonafide"
239
+ icon = "✓"
240
+ verdict = "Likely authentic"
241
+ verdict_sub = "This audio shows characteristics of natural human speech."
242
+
243
+ badge = f"""
244
+ <div class='{badge_class}'>
245
+ <div class='result-card-header'>
246
+ <div class='result-card-icon'>{icon}</div>
247
+ <div class='result-card-text'>
248
+ <div class='result-card-verdict'>{verdict}</div>
249
+ <div class='result-card-verdict-sub'>{verdict_sub}</div>
250
+ </div>
251
+ </div>
252
+ <div class='result-card-confidence'>
253
+ <div class='confidence-label'>
254
+ <span>Confidence</span>
255
+ <span class='confidence-value'>{confidence:.1f}%</span>
256
+ </div>
257
+ <div class='confidence-bar-track'>
258
+ <div class='confidence-bar-fill' style='width: {confidence:.1f}%;'></div>
259
+ </div>
260
+ </div>
261
+ <div class='result-card-probs'>
262
+ <div class='prob-row'>
263
+ <span class='prob-label'>Synthetic</span>
264
+ <div class='prob-bar-track'>
265
+ <div class='prob-bar-fill prob-bar-spoof' style='width: {spoof_pct:.1f}%;'></div>
266
+ </div>
267
+ <span class='prob-pct'>{spoof_pct:.1f}%</span>
268
+ </div>
269
+ <div class='prob-row'>
270
+ <span class='prob-label'>Authentic</span>
271
+ <div class='prob-bar-track'>
272
+ <div class='prob-bar-fill prob-bar-bonafide' style='width: {bona_pct:.1f}%;'></div>
273
+ </div>
274
+ <span class='prob-pct'>{bona_pct:.1f}%</span>
275
+ </div>
276
+ </div>
277
+ <div class='result-card-meta'>
278
+ <span>{result['utterance_duration_sec']:.2f}s audio</span>
279
+ <span class='meta-dot'>·</span>
280
+ <span>{result['n_windows']} windows</span>
281
+ <span class='meta-dot'>·</span>
282
+ <span>{elapsed_ms:.0f}ms on CPU</span>
283
+ </div>
284
+ </div>
285
+ """
286
 
287
  details = (f"**Spoof probability:** {result['spoof_probability']:.4f}\n\n"
288
  f"**Bonafide probability:** {result['bonafide_probability']:.4f}\n\n"
289
  f"**Audio duration:** {result['utterance_duration_sec']:.2f} seconds\n\n"
290
  f"**Windows analyzed:** {result['n_windows']}\n\n"
291
+ f"**Inference time:** {elapsed_ms:.0f} ms (CPU)\n\n"
292
+ f"**Threshold used:** {result['threshold_used']:.4f}")
293
 
294
  fig = make_per_window_plot(result["window_scores"], threshold=result["threshold_used"])
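Review note: the new badges interpolate the exception text and the probability percentages straight into HTML. An exception message that contains markup (for example a file name with angle brackets) would be rendered as HTML, and a probability slightly outside 0 to 1 would produce an out-of-range bar width. A minimal hardening sketch, assuming the standard-library `html.escape` is acceptable here; `error_badge` and `bar_width` are hypothetical helpers, not functions from `app.py`.

```python
import html


def error_badge(exc: Exception) -> str:
    """Render an exception as an HTML badge with the message escaped."""
    message = html.escape(f"{type(exc).__name__}: {exc}")
    return (
        "<div class='result-error'>"
        "<div class='result-placeholder-icon'>❌</div>"
        f"<div class='result-placeholder-text'><b>Error:</b> {message}</div>"
        "</div>"
    )


def bar_width(probability: float) -> str:
    """Convert a probability to a CSS width, clamped to the 0-100% range."""
    pct = min(max(probability * 100, 0.0), 100.0)
    return f"{pct:.1f}%"


badge = error_badge(ValueError("bad file <script>alert(1)</script>"))
print(bar_width(0.1234), bar_width(1.0000001), bar_width(-0.2))
```

The class names match the ones introduced by this commit, so the existing CSS would still apply to the escaped badge.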
295
 
 
312
  # ============================================================
313
 
314
  CUSTOM_CSS = """
315
+ /* ============================================================
316
+ STAGE 1: FOUNDATION — Modern AI aesthetic
317
+ Color system, typography, spacing, transitions
318
+ ============================================================ */
319
+
320
+ /* Import Inter for clean modern look */
321
+ @import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700;800&family=JetBrains+Mono:wght@400;500&display=swap');
322
+
323
+ /* ---------- Color tokens ---------- */
324
+ :root {
325
+ --brand-purple-50: #f5f3ff;
326
+ --brand-purple-100: #ede9fe;
327
+ --brand-purple-300: #c4b5fd;
328
+ --brand-purple-400: #a78bfa;
329
+ --brand-purple-500: #8b5cf6;
330
+ --brand-purple-600: #7c3aed;
331
+ --brand-purple-700: #6d28d9;
332
+ --brand-pink-400: #f472b6;
333
+ --brand-pink-500: #ec4899;
334
+ --accent-green: #10b981;
335
+ --accent-amber: #f59e0b;
336
+ --accent-red: #ef4444;
337
+ --gradient-brand: linear-gradient(135deg, #7c3aed 0%, #ec4899 100%);
338
+ --gradient-soft: linear-gradient(135deg, rgba(124, 58, 237, 0.08) 0%, rgba(236, 72, 153, 0.08) 100%);
339
+ --gradient-hero: radial-gradient(ellipse at top, rgba(124, 58, 237, 0.15) 0%, transparent 50%),
340
+ radial-gradient(ellipse at bottom, rgba(236, 72, 153, 0.10) 0%, transparent 50%);
341
+ }
342
+
343
+ /* ---------- Container & typography ---------- */
344
  .gradio-container {
345
+ font-family: 'Inter', ui-sans-serif, system-ui, -apple-system, sans-serif !important;
346
+ max-width: 1100px !important;
347
+ margin: 0 auto !important;
348
+ font-feature-settings: 'cv11', 'ss01';
349
+ }
350
+
351
+ /* Tighter headings */
352
+ .gradio-container h1 {
353
+ font-weight: 800 !important;
354
+ letter-spacing: -0.03em !important;
355
+ line-height: 1.1 !important;
356
+ }
357
+ .gradio-container h2 {
358
+ font-weight: 700 !important;
359
+ letter-spacing: -0.02em !important;
360
+ line-height: 1.2 !important;
361
+ }
362
+ .gradio-container h3 {
363
+ font-weight: 600 !important;
364
+ letter-spacing: -0.015em !important;
365
+ }
366
+ .gradio-container h4 {
367
+ font-weight: 600 !important;
368
+ letter-spacing: -0.01em !important;
369
+ }
370
+
371
+ /* Body text breathing room */
372
+ .gradio-container p {
373
+ line-height: 1.65 !important;
374
+ }
375
+
376
+ /* Monospace for code/pipeline blocks */
377
+ .gradio-container code, .gradio-container pre {
378
+ font-family: 'JetBrains Mono', ui-monospace, monospace !important;
379
+ }
380
+
381
+ /* ---------- Tab navigation polish ---------- */
382
+ .tab-nav {
383
+ border-bottom: 1px solid var(--border-color-primary, rgba(255,255,255,0.1)) !important;
384
+ margin-bottom: 1.5rem !important;
385
  }
386
  .tab-nav button {
387
+ font-size: 0.95rem !important;
388
  font-weight: 600 !important;
389
+ letter-spacing: -0.01em !important;
390
+ transition: all 0.2s ease !important;
391
+ border-radius: 0.5rem 0.5rem 0 0 !important;
392
+ }
393
+ .tab-nav button:hover {
394
+ background: var(--gradient-soft) !important;
395
+ }
396
+ .tab-nav button.selected {
397
+ border-bottom: 2px solid var(--brand-purple-500) !important;
398
+ color: var(--brand-purple-400) !important;
399
  }
400
+
401
+ /* ---------- Metric cards (Performance tab) ---------- */
402
  .metric-card {
403
+ background: var(--background-fill-secondary, rgba(124, 58, 237, 0.04));
404
+ color: var(--body-text-color, #111827);
405
+ padding: 1.75rem 1.5rem;
406
+ border-radius: 0.875rem;
407
  text-align: center;
408
+ border: 1px solid var(--border-color-primary, rgba(124, 58, 237, 0.15));
409
+ transition: transform 0.2s ease, box-shadow 0.2s ease;
410
+ }
411
+ .metric-card:hover {
412
+ transform: translateY(-2px);
413
+ box-shadow: 0 12px 24px -8px rgba(124, 58, 237, 0.25);
414
  }
415
  .metric-value {
416
+ font-size: 2.75rem;
417
+ font-weight: 800;
418
+ background: var(--gradient-brand);
419
+ -webkit-background-clip: text;
420
+ -webkit-text-fill-color: transparent;
421
+ background-clip: text;
422
+ line-height: 1.1;
423
+ letter-spacing: -0.02em;
424
  }
425
  .metric-label {
426
+ font-size: 0.8125rem;
427
+ color: var(--body-text-color-subdued, #6b7280);
428
  margin-top: 0.5rem;
429
+ opacity: 0.8;
430
+ text-transform: uppercase;
431
+ letter-spacing: 0.05em;
432
+ font-weight: 500;
433
  }
434
+
435
+ /* ---------- Context cards (Welcome tab) ---------- */
436
  .context-card {
437
+ background: var(--background-fill-secondary, #ffffff);
438
+ color: var(--body-text-color, #111827);
439
+ padding: 1.5rem;
440
+ border-radius: 0.875rem;
441
+ border: 1px solid var(--border-color-primary, rgba(124, 58, 237, 0.15));
442
  margin-bottom: 1rem;
443
+ transition: transform 0.2s ease, border-color 0.2s ease;
444
+ }
445
+ .context-card:hover {
446
+ transform: translateY(-2px);
447
+ border-color: var(--brand-purple-400) !important;
448
  }
449
  .context-card h4 {
450
+ color: var(--brand-purple-400);
451
  margin: 0 0 0.5rem 0;
452
  font-size: 1.05rem;
453
  }
454
  .context-card p {
455
  margin: 0;
456
+ color: var(--body-text-color, #4b5563);
457
+ line-height: 1.65;
458
+ opacity: 0.9;
459
+ }
460
+
461
+ /* ---------- Stage cards (Under the hood tab) ---------- */
462
+ .stage-card {
463
+ background: var(--background-fill-secondary, #f9fafb);
464
+ color: var(--body-text-color, #111827);
465
+ border: 1px solid var(--border-color-primary, rgba(124, 58, 237, 0.15));
466
+ padding: 1.5rem;
467
+ border-radius: 0.875rem;
468
+ margin: 0.5rem;
469
+ transition: transform 0.2s ease, box-shadow 0.2s ease;
470
+ }
471
+ .stage-card:hover {
472
+ transform: translateY(-2px);
473
+ box-shadow: 0 12px 24px -8px rgba(124, 58, 237, 0.2);
474
+ }
475
+ .stage-card p, .stage-card b {
476
+ color: var(--body-text-color, #111827);
477
+ }
478
+
479
+ /* ---------- Limitation alerts ---------- */
480
+ .limitation-warn {
481
+ background: rgba(251, 191, 36, 0.08);
482
+ border-left: 3px solid var(--accent-amber);
483
+ padding: 1rem 1.25rem;
484
+ border-radius: 0.5rem;
485
+ margin: 0.75rem 0;
486
+ color: var(--body-text-color, #111827);
487
+ }
488
+ .limitation-warn p, .limitation-warn b {
489
+ color: var(--body-text-color, #111827);
490
+ margin: 0;
491
  }
492
+ .limitation-danger {
493
+ background: rgba(239, 68, 68, 0.08);
494
+ border-left: 3px solid var(--accent-red);
495
+ padding: 1rem 1.25rem;
496
+ border-radius: 0.5rem;
497
+ margin: 0.75rem 0;
498
+ color: var(--body-text-color, #111827);
499
+ }
500
+ .limitation-danger p, .limitation-danger b {
501
+ color: var(--body-text-color, #111827);
502
+ margin: 0;
503
+ }
504
+
505
+ /* ---------- CTA section ---------- */
506
  .cta-section {
507
  text-align: center;
508
+ padding: 2.5rem 1.5rem;
509
+ background: var(--gradient-soft);
510
+ border-radius: 1.25rem;
511
+ margin: 2.5rem 0;
512
+ color: var(--body-text-color, #111827);
513
+ border: 1px solid var(--border-color-primary, rgba(124, 58, 237, 0.15));
514
+ }
515
+
516
+ /* ---------- Buttons polish ---------- */
517
+ .gradio-container button.lg {
518
+ font-weight: 600 !important;
519
+ letter-spacing: -0.01em !important;
520
+ transition: transform 0.15s ease, box-shadow 0.15s ease !important;
521
+ }
522
+ .gradio-container button.lg:hover {
523
+ transform: translateY(-1px) !important;
524
+ box-shadow: 0 8px 16px -4px rgba(124, 58, 237, 0.3) !important;
525
+ }
526
+ .gradio-container button.primary {
527
+ background: var(--gradient-brand) !important;
528
+ border: none !important;
529
+ }
530
+
531
+ /* ---------- Theme toggle ---------- */
532
+ #theme-toggle-row {
533
+ justify-content: flex-end;
534
+ margin-bottom: 0.5rem;
535
+ }
536
+ #theme-toggle-btn {
537
+ max-width: 140px !important;
538
+ min-width: 140px !important;
539
+ font-size: 0.85rem !important;
540
+ }
541
+
542
+ /* ---------- Subtle animated gradient background (very low opacity) ---------- */
543
+ html, body {
544
+ overflow-x: hidden !important;
545
+ max-width: 100vw;
546
+ }
547
+ body::before {
548
+ content: '';
549
+ position: fixed;
550
+ top: 0; left: 0;
551
+ width: 100vw; height: 100vh;
552
+ background: var(--gradient-hero);
553
+ pointer-events: none;
554
+ z-index: -1;
555
+ opacity: 0.6;
556
+ animation: gradientShift 20s ease-in-out infinite;
557
+ }
558
+ @keyframes gradientShift {
559
+ 0%, 100% { opacity: 0.5; }
560
+ 50% { opacity: 0.8; }
561
+ }
562
+
563
+ /* ---------- Reduce motion for accessibility ---------- */
564
+ @media (prefers-reduced-motion: reduce) {
565
+ *, *::before, *::after {
566
+ animation-duration: 0.01ms !important;
567
+ transition-duration: 0.01ms !important;
568
+ }
569
+ }
570
+
571
+ /* ============================================================
572
+ STAGE 2: WELCOME HERO
573
+ ============================================================ */
574
+
575
+ /* Hero container with animated glow */
576
+ .hero-section {
577
+ position: relative;
578
+ text-align: center;
579
+ padding: 3rem 1.5rem 1.5rem 1.5rem;
580
+ margin: 0 0 0.5rem 0;
581
+ overflow: hidden;
582
+ border-radius: 1.5rem;
583
+ }
584
+ .hero-section::before {
585
+ content: '';
586
+ position: absolute;
587
+ top: 50%;
588
+ left: 50%;
589
+ width: 600px;
590
+ height: 600px;
591
+ transform: translate(-50%, -50%);
592
+ background: radial-gradient(circle,
593
+ rgba(124, 58, 237, 0.25) 0%,
594
+ rgba(236, 72, 153, 0.15) 40%,
595
+ transparent 70%);
596
+ z-index: -1;
597
+ animation: heroGlow 8s ease-in-out infinite;
598
+ filter: blur(40px);
599
+ }
600
+ @keyframes heroGlow {
601
+ 0%, 100% { transform: translate(-50%, -50%) scale(1); opacity: 0.7; }
602
+ 50% { transform: translate(-50%, -50%) scale(1.15); opacity: 1; }
603
+ }
604
+
605
+ /* Hero eyebrow tag */
606
+ .hero-eyebrow {
607
+ display: inline-block;
608
+ padding: 0.4rem 1rem;
609
+ background: rgba(124, 58, 237, 0.12);
610
+ border: 1px solid rgba(124, 58, 237, 0.25);
611
+ border-radius: 999px;
612
+ font-size: 0.8rem;
613
+ font-weight: 600;
614
+ color: var(--brand-purple-400);
615
+ letter-spacing: 0.05em;
616
+ text-transform: uppercase;
617
+ margin-bottom: 1.5rem;
618
+ }
619
+
620
+ /* Massive gradient hero headline */
621
+ .hero-title {
622
+ font-size: clamp(2.5rem, 6vw, 4.5rem) !important;
623
+ font-weight: 800 !important;
624
+ line-height: 1.05 !important;
625
+ letter-spacing: -0.04em !important;
626
+ margin: 0 0 1rem 0 !important;
627
+ background: linear-gradient(90deg, #7c3aed 0%, #a78bfa 30%, #ec4899 70%, #fb7185 100%);
628
+ -webkit-background-clip: text;
629
+ -webkit-text-fill-color: transparent;
630
+ background-clip: text;
631
+ }
632
+
633
+ /* Hero subtitle */
634
+ .hero-subtitle {
635
+ font-size: clamp(1.1rem, 2.2vw, 1.4rem) !important;
636
+ font-weight: 500 !important;
637
+ color: var(--body-text-color, #4b5563);
638
+ opacity: 0.85;
639
+ max-width: 720px;
640
+ margin: 0 auto 0 auto !important;
641
+ line-height: 1.5 !important;
642
+ letter-spacing: -0.01em !important;
643
+ }
644
+
645
+ /* Section eyebrow + heading combo */
646
+ .section-header {
647
+ text-align: center;
648
+ margin: 1.5rem 0 1.5rem 0;
649
+ }
650
+ .section-eyebrow {
651
+ display: block;
652
+ font-size: 0.8rem;
653
+ font-weight: 600;
654
+ color: var(--brand-purple-400);
655
+ text-transform: uppercase;
656
+ letter-spacing: 0.1em;
657
+ margin-bottom: 0.5rem;
658
+ }
659
+ .section-title {
660
+ font-size: 2rem !important;
661
+ font-weight: 700 !important;
662
+ letter-spacing: -0.02em !important;
663
+ margin: 0 !important;
664
+ }
665
+
666
+ /* Redesigned context cards with icon, bigger, animated */
667
+ .context-card-v2 {
668
+ background: var(--background-fill-secondary, #ffffff);
669
+ border: 1px solid var(--border-color-primary, rgba(124, 58, 237, 0.15));
670
+ padding: 2rem 1.75rem;
671
+ border-radius: 1rem;
672
+ height: 100%;
673
+ transition: transform 0.25s ease, border-color 0.25s ease, box-shadow 0.25s ease;
674
+ position: relative;
675
+ overflow: hidden;
676
+ }
677
+ .context-card-v2::before {
678
+ content: '';
679
+ position: absolute;
680
+ top: 0; left: 0; right: 0;
681
+ height: 3px;
682
+ background: linear-gradient(90deg, transparent, var(--brand-purple-500), transparent);
683
+ opacity: 0;
684
+ transition: opacity 0.25s ease;
685
+ }
686
+ .context-card-v2:hover {
687
+ transform: translateY(-4px);
688
+ border-color: rgba(124, 58, 237, 0.4) !important;
689
+ box-shadow: 0 20px 40px -12px rgba(124, 58, 237, 0.2);
690
+ }
691
+ .context-card-v2:hover::before {
692
+ opacity: 1;
693
+ }
694
+ .context-card-icon {
695
+ width: 56px;
696
+ height: 56px;
697
+ border-radius: 14px;
698
+ background: linear-gradient(135deg, rgba(124, 58, 237, 0.15) 0%, rgba(236, 72, 153, 0.15) 100%);
699
+ display: flex !important;
700
+ align-items: center;
701
+ justify-content: center;
702
+ font-size: 1.75rem !important;
703
+ line-height: 1 !important;
704
+ margin-bottom: 1.25rem;
705
+ border: 1px solid rgba(124, 58, 237, 0.25);
706
+ }
707
+ .context-card-icon span {
708
+ font-size: 1.75rem !important;
709
+ line-height: 1 !important;
710
+ display: inline-block;
711
+ }
712
+ .context-card-v2 h4 {
713
+ color: var(--body-text-color, #111827) !important;
714
+ margin: 0 0 0.6rem 0 !important;
715
+ font-size: 1.15rem !important;
716
+ font-weight: 700 !important;
717
+ letter-spacing: -0.01em !important;
718
+ }
719
+ .context-card-v2 p {
720
+ margin: 0;
721
+ color: var(--body-text-color, #4b5563) !important;
722
+ line-height: 1.6 !important;
723
+ opacity: 0.85;
724
+ font-size: 0.95rem;
725
+ }
726
+
727
+ /* CTA section v2 — gradient bg with stronger presence */
728
+ .cta-section-v2 {
729
+ text-align: center;
730
+ padding: 2.5rem 2rem;
731
+ background: linear-gradient(135deg,
732
+ rgba(124, 58, 237, 0.12) 0%,
733
+ rgba(236, 72, 153, 0.12) 100%);
734
+ border-radius: 1.5rem;
735
+ margin: 2rem 0 1.5rem 0;
736
+ border: 1px solid rgba(124, 58, 237, 0.2);
737
+ position: relative;
738
+ overflow: hidden;
739
+ }
740
+ .cta-section-v2::before {
741
+ content: '';
742
+ position: absolute;
743
+ top: -50%; left: -50%;
744
+ width: 200%; height: 200%;
745
+ background: radial-gradient(circle, rgba(167, 139, 250, 0.1) 0%, transparent 50%);
746
+ animation: ctaGlow 12s ease-in-out infinite;
747
+ pointer-events: none;
748
+ }
749
+ @keyframes ctaGlow {
750
+ 0%, 100% { transform: translate(0, 0); }
751
+ 50% { transform: translate(20px, -20px); }
752
+ }
753
+ .cta-title {
754
+ font-size: 2rem !important;
755
+ font-weight: 800 !important;
756
+ letter-spacing: -0.02em !important;
757
+ margin: 0 0 0.75rem 0 !important;
758
+ color: #a78bfa;
759
+ background: linear-gradient(135deg, #a78bfa 0%, #ec4899 100%);
760
+ -webkit-background-clip: text;
761
+ -webkit-text-fill-color: transparent;
762
+ background-clip: text;
763
+ display: block;
764
+ }
765
+ .cta-subtitle {
766
+ font-size: 1.05rem;
767
+ color: var(--body-text-color, #4b5563);
768
+ opacity: 0.85;
769
+ margin: 0 0 1.75rem 0;
770
+ }
771
+
772
+ /* Footer credits */
773
+ .welcome-footer {
774
+ margin-top: 4rem;
775
+ padding-top: 2rem;
776
+ border-top: 1px solid var(--border-color-primary, rgba(124, 58, 237, 0.15));
777
+ text-align: center;
778
+ font-size: 0.9rem;
779
+ color: var(--body-text-color, #6b7280);
780
+ opacity: 0.75;
781
+ line-height: 1.8;
782
+ }
783
+ .welcome-footer a {
784
+ color: var(--brand-purple-400) !important;
785
+ text-decoration: none;
786
+ font-weight: 500;
787
+ }
788
+ .welcome-footer a:hover {
789
+ text-decoration: underline;
790
+ }
791
+
792
+
793
+ /* New card title (replaces h4 which gets stripped by Gradio sanitizer) */
794
+ .card-title {
795
+ color: var(--body-text-color, #111827) !important;
796
+ margin: 0 0 0.6rem 0 !important;
797
+ font-size: 1.15rem !important;
798
+ font-weight: 700 !important;
799
+ letter-spacing: -0.01em !important;
800
+ line-height: 1.3 !important;
801
+ }
802
+
803
+
804
+ /* ============================================================
805
+ STAGE 3: DETECTOR POLISH
806
+ ============================================================ */
807
+
808
+ /* Detector intro paragraph */
809
+ .detector-intro {
810
+ max-width: 720px;
811
+ margin: 0.75rem auto 0 auto !important;
812
+ font-size: 1.02rem !important;
813
+ color: var(--body-text-color, #4b5563);
814
+ opacity: 0.85;
815
+ line-height: 1.6 !important;
816
+ }
817
+
818
+ /* Step labels (numbered guidance) */
819
+ .step-label {
820
+ display: flex;
821
+ align-items: center;
822
+ gap: 0.6rem;
823
+ font-size: 0.85rem;
824
+ font-weight: 600;
825
+ color: var(--body-text-color-subdued, #6b7280);
826
+ text-transform: uppercase;
827
+ letter-spacing: 0.05em;
828
+ margin: 0.75rem 0 0.6rem 0;
829
+ opacity: 0.85;
830
+ }
831
+ .step-number {
832
+ display: inline-flex;
833
+ align-items: center;
834
+ justify-content: center;
835
+ width: 22px;
836
+ height: 22px;
837
+ border-radius: 50%;
838
+ background: var(--gradient-brand);
839
+ color: white;
840
+ font-size: 0.75rem;
841
+ font-weight: 700;
842
+ text-transform: none;
843
+ letter-spacing: 0;
844
+ }
845
+
846
+ /* Result placeholder (shown before any analysis) */
847
+ .result-placeholder {
848
+ background: var(--background-fill-secondary, rgba(124, 58, 237, 0.04));
849
+ border: 2px dashed var(--border-color-primary, rgba(124, 58, 237, 0.2));
850
+ border-radius: 1rem;
851
+ padding: 3rem 2rem;
852
+ text-align: center;
853
+ color: var(--body-text-color-subdued, #6b7280);
854
+ min-height: 200px;
855
+ display: flex;
856
+ flex-direction: column;
857
+ align-items: center;
858
+ justify-content: center;
859
+ gap: 0.75rem;
860
+ }
861
+ .result-placeholder-icon {
862
+ font-size: 2.5rem;
863
+ opacity: 0.6;
864
+ }
865
+ .result-placeholder-text {
866
+ font-size: 0.95rem;
867
+ opacity: 0.85;
868
+ line-height: 1.5;
869
+ }
870
+ .result-error {
871
+ background: rgba(239, 68, 68, 0.08);
872
+ border: 1px solid rgba(239, 68, 68, 0.3);
873
+ border-radius: 1rem;
874
+ padding: 1.5rem;
875
+ text-align: center;
876
+ display: flex;
877
+ flex-direction: column;
878
+ align-items: center;
879
+ gap: 0.5rem;
880
+ }
881
+
882
+ /* Result cards — bonafide (green) and spoof (red) variants */
883
+ .result-card-bonafide, .result-card-spoof {
884
  border-radius: 1rem;
885
+ padding: 1.75rem 1.5rem;
886
+ border: 1px solid;
887
+ position: relative;
888
+ overflow: hidden;
889
+ }
890
+ .result-card-bonafide {
891
+ background: linear-gradient(135deg, rgba(16, 185, 129, 0.08) 0%, rgba(16, 185, 129, 0.03) 100%);
892
+ border-color: rgba(16, 185, 129, 0.3);
893
+ }
894
+ .result-card-spoof {
895
+ background: linear-gradient(135deg, rgba(239, 68, 68, 0.08) 0%, rgba(239, 68, 68, 0.03) 100%);
896
+ border-color: rgba(239, 68, 68, 0.3);
897
+ }
898
+ .result-card-bonafide::before, .result-card-spoof::before {
899
+ content: '';
900
+ position: absolute;
901
+ top: 0; left: 0; right: 0;
902
+ height: 3px;
903
+ }
904
+ .result-card-bonafide::before { background: linear-gradient(90deg, transparent, #10b981, transparent); }
905
+ .result-card-spoof::before { background: linear-gradient(90deg, transparent, #ef4444, transparent); }
906
+
907
+ .result-card-header {
908
+ display: flex;
909
+ align-items: flex-start;
910
+ gap: 1rem;
911
+ margin-bottom: 1.25rem;
912
+ }
913
+ .result-card-icon {
914
+ width: 48px;
915
+ height: 48px;
916
+ border-radius: 12px;
917
+ display: flex;
918
+ align-items: center;
919
+ justify-content: center;
920
+ font-size: 1.5rem;
921
+ font-weight: 700;
922
+ flex-shrink: 0;
923
+ }
924
+ .result-card-bonafide .result-card-icon {
925
+ background: rgba(16, 185, 129, 0.15);
926
+ color: #10b981;
927
+ border: 1px solid rgba(16, 185, 129, 0.3);
928
+ }
929
+ .result-card-spoof .result-card-icon {
930
+ background: rgba(239, 68, 68, 0.15);
931
+ color: #ef4444;
932
+ border: 1px solid rgba(239, 68, 68, 0.3);
933
  }
934
+ .result-card-text { flex: 1; }
935
+ .result-card-verdict {
936
+ font-size: 1.4rem;
937
+ font-weight: 700;
938
+ color: var(--body-text-color, #111827);
939
+ letter-spacing: -0.01em;
940
+ line-height: 1.2;
941
+ margin-bottom: 0.25rem;
942
+ }
943
+ .result-card-verdict-sub {
944
+ font-size: 0.9rem;
945
+ color: var(--body-text-color, #4b5563);
946
+ opacity: 0.8;
947
+ line-height: 1.5;
948
+ }
949
+
950
+ /* Confidence section */
951
+ .result-card-confidence {
952
+ margin: 1rem 0;
953
+ }
954
+ .confidence-label {
955
+ display: flex;
956
+ justify-content: space-between;
957
+ font-size: 0.85rem;
958
+ font-weight: 600;
959
+ color: var(--body-text-color-subdued, #6b7280);
960
+ text-transform: uppercase;
961
+ letter-spacing: 0.05em;
962
+ margin-bottom: 0.5rem;
963
+ }
964
+ .confidence-value {
965
+ color: var(--body-text-color, #111827);
966
+ font-size: 1rem;
967
+ text-transform: none;
968
+ letter-spacing: 0;
969
+ }
970
+ .confidence-bar-track {
971
+ width: 100%;
972
+ height: 8px;
973
+ background: var(--border-color-primary, rgba(0,0,0,0.1));
974
+ border-radius: 999px;
975
+ overflow: hidden;
976
+ }
977
+ .confidence-bar-fill {
978
+ height: 100%;
979
+ background: var(--gradient-brand);
980
+ border-radius: 999px;
981
+ transition: width 0.5s ease-out;
982
+ }
983
+
984
+ /* Probability rows (synthetic vs authentic) */
985
+ .result-card-probs {
986
+ margin: 1.25rem 0;
987
+ padding: 1rem;
988
+ background: var(--background-fill-secondary, rgba(0,0,0,0.02));
989
+ border-radius: 0.75rem;
990
+ }
991
+ .prob-row {
992
+ display: flex;
993
+ align-items: center;
994
+ gap: 0.75rem;
995
+ margin: 0.5rem 0;
996
+ }
997
+ .prob-label {
998
+ font-size: 0.85rem;
999
+ font-weight: 500;
1000
+ color: var(--body-text-color-subdued, #6b7280);
1001
+ width: 80px;
1002
+ flex-shrink: 0;
1003
+ }
1004
+ .prob-bar-track {
1005
+ flex: 1;
1006
+ height: 6px;
1007
+ background: var(--border-color-primary, rgba(0,0,0,0.08));
1008
+ border-radius: 999px;
1009
+ overflow: hidden;
1010
+ }
1011
+ .prob-bar-fill {
1012
+ height: 100%;
1013
+ border-radius: 999px;
1014
+ transition: width 0.5s ease-out;
1015
+ }
1016
+ .prob-bar-spoof { background: #ef4444; }
1017
+ .prob-bar-bonafide { background: #10b981; }
1018
+ .prob-pct {
1019
+ font-size: 0.85rem;
1020
+ font-weight: 600;
1021
+ color: var(--body-text-color, #111827);
1022
+ width: 50px;
1023
+ text-align: right;
1024
+ font-variant-numeric: tabular-nums;
1025
+ }
1026
+
1027
+ /* Result card meta footer */
1028
+ .result-card-meta {
1029
+ display: flex;
1030
+ align-items: center;
1031
+ justify-content: center;
1032
+ gap: 0.5rem;
1033
+ flex-wrap: wrap;
1034
+ font-size: 0.8rem;
1035
+ color: var(--body-text-color-subdued, #9ca3af);
1036
+ margin-top: 1rem;
1037
+ padding-top: 1rem;
1038
+ border-top: 1px solid var(--border-color-primary, rgba(0,0,0,0.06));
1039
+ }
1040
+ .meta-dot {
1041
+ opacity: 0.5;
1042
+ }
1043
+
1044
+ /* Analyze button override */
1045
+ .analyze-button {
1046
+ width: 100% !important;
1047
+ margin-top: 0.25rem !important;
1048
+ }
1049
+
1050
  """
1051
 
1052
 
 
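Review note: the Stage 3 classes above (`result-card-*`, `confidence-bar-*`, `prob-*`) only style the markup. The bar widths have to arrive as inline `width` values from Python. The helper below is a minimal sketch of how that markup could be assembled. `render_result_card`, its parameters, and the verdict wording are hypothetical and not taken from this commit. Only the class names and the Synthetic/Authentic labels come from the stylesheet.

```python
def render_result_card(prediction, confidence_pct, spoof_prob, bonafide_prob):
    """Build result-card HTML using class names defined in CUSTOM_CSS (hypothetical helper)."""
    variant = "spoof" if prediction == "spoof" else "bonafide"
    # Verdict wording is illustrative only.
    verdict = "Likely synthetic" if variant == "spoof" else "Likely authentic"
    # Clamp so an out-of-range value can never overflow the bar track.
    width = max(0.0, min(100.0, confidence_pct))

    def prob_row(label, css_class, prob):
        pct = max(0.0, min(1.0, prob)) * 100
        return (
            "<div class='prob-row'>"
            f"<span class='prob-label'>{label}</span>"
            "<div class='prob-bar-track'>"
            f"<div class='prob-bar-fill {css_class}' style='width:{pct:.1f}%;'></div>"
            "</div>"
            f"<span class='prob-pct'>{pct:.1f}%</span>"
            "</div>"
        )

    return (
        f"<div class='result-card-{variant}'>"
        f"<div class='result-card-verdict'>{verdict}</div>"
        "<div class='result-card-confidence'>"
        "<div class='confidence-label'><span>Confidence</span>"
        f"<span class='confidence-value'>{width:.1f}%</span></div>"
        "<div class='confidence-bar-track'>"
        f"<div class='confidence-bar-fill' style='width:{width:.1f}%;'></div>"
        "</div></div>"
        "<div class='result-card-probs'>"
        + prob_row("Synthetic", "prob-bar-spoof", spoof_prob)
        + prob_row("Authentic", "prob-bar-bonafide", bonafide_prob)
        + "</div></div>"
    )
```

Clamping before formatting keeps a malformed score from producing a bar wider than its track. The track's `overflow: hidden` would mask that visually, but the printed percentage would still be wrong.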
1060
  css=CUSTOM_CSS,
1061
  ) as demo:
1062
 
1063
+ # Theme toggle button at top-right
1064
+ with gr.Row(elem_id="theme-toggle-row"):
1065
+ theme_btn = gr.Button("☾ Dark mode", elem_id="theme-toggle-btn", size="sm")
1066
+ theme_btn.click(
1067
+ fn=None,
1068
+ inputs=None,
1069
+ outputs=theme_btn,
1070
+ js="""() => {
1071
+ document.body.classList.toggle('dark');
1072
+ const isDark = document.body.classList.contains('dark');
1073
+ return isDark ? '☀️ Light mode' : '☾ Dark mode';
1074
+ }"""
1075
+ )
1076
+
1077
  gr.Markdown("""
1078
  # Deepfake Audio Detection
1079
  *Wav2Vec 2.0 fine-tuned on ASVspoof 2019 LA • Cross-dataset evaluated on ASVspoof 2021 LA & WaveFake*
 
1085
  # TAB 1: WELCOME
1086
  # ============================================================
1087
  with gr.Tab("Welcome", id=0):
1088
+ # Hero section
1089
+ gr.HTML("""
1090
+ <div class='hero-section'>
1091
+ <div class='hero-eyebrow'>Deep Learning Audio Forensics</div>
1092
+ <h1 class='hero-title'>Is this voice real?</h1>
1093
+ <p class='hero-subtitle'>
1094
+ Modern AI can clone any voice from just a few seconds of audio.
1095
+ This detector uses Wav2Vec 2.0 to tell synthetic speech apart from authentic recordings —
1096
+ with 0.69% Equal Error Rate on the ASVspoof 2019 LA benchmark.
1097
+ </p>
1098
+ </div>
1099
  """)
1100
 
1101
+ # Why this matters section
1102
+ gr.HTML("""
1103
+ <div class='section-header'>
1104
+ <div class='section-eyebrow'>Why this matters</div>
1105
+ <div class='section-title'>Voice deepfakes are already in the wild</div>
1106
+ </div>
1107
+ """)
1108
 
1109
  with gr.Row():
1110
  with gr.Column():
1111
  gr.HTML("""
1112
+ <div class='context-card-v2'>
1113
+ <div class='context-card-icon'><span style='font-size:1.6rem;line-height:1;'>📞</span></div>
1114
+ <div class='card-title'>Phone scams</div>
1115
+ <p>Voice clones are increasingly used to impersonate family members in
1116
+ "emergency call" scams. Reported cases have surged since 2022, with losses
1117
+ running into millions of dollars annually.</p>
1118
  </div>
1119
  """)
1120
  with gr.Column():
1121
  gr.HTML("""
1122
+ <div class='context-card-v2'>
1123
+ <div class='context-card-icon'><span style='font-size:1.6rem;line-height:1;'>📰</span></div>
1124
+ <div class='card-title'>Misinformation</div>
1125
+ <p>Fabricated political speeches, fake celebrity endorsements, and false
1126
+ statements attributed to public figures have circulated widely on social
1127
+ media platforms in recent election cycles.</p>
1128
  </div>
1129
  """)
1130
  with gr.Column():
1131
  gr.HTML("""
1132
+ <div class='context-card-v2'>
1133
+ <div class='context-card-icon'><span style='font-size:1.6rem;line-height:1;'>⚖️</span></div>
1134
+ <div class='card-title'>Trust in evidence</div>
1135
+ <p>Courts now have to grapple with whether audio recordings are authentic.
1136
+ The same challenge applies to investigative journalism and historical
1137
+ archive verification.</p>
1138
  </div>
1139
  """)
1140
 
1141
+ # CTA section
1142
+ gr.HTML("""
1143
+ <div class='cta-section-v2'>
1144
+ <div class='cta-title'>Try the detector</div>
1145
+ <div class='cta-subtitle'>
1146
+ Upload your own audio, record from your microphone, or pick an example to start.
1147
+ </div>
1148
+ </div>
1149
+ """)
1150
+ cta_btn = gr.Button("Open the detector →", variant="primary", size="lg")
1151
 
1152
+ gr.HTML("""
1153
+ <div class='welcome-footer'>
1154
+ <strong>Built by</strong> Sara Iqbal &amp; Areeba Arif &nbsp;·&nbsp; FAST-NUCES Spring 2026 Deep Learning Project<br>
1155
+ <a href='https://github.com/Saracasm/deepfake-audio-detection' target='_blank'>Source code on GitHub</a>
1156
+ &nbsp;·&nbsp;
1157
+ <a href='https://huggingface.co/Sara1708/deepfake-audio-wav2vec2' target='_blank'>Model weights on Hugging Face</a>
1158
+ </div>
1159
  """)
1160
 
1161
 
 
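Review note: the hero copy quotes a 0.69% Equal Error Rate. For readers unfamiliar with the metric, the EER is the operating point where the rate of missed spoofs equals the rate of bonafide clips wrongly flagged. The sketch below is a simple threshold sweep in plain Python. It assumes higher scores mean "more likely spoof" and is not the evaluation code used in this project.

```python
def equal_error_rate(spoof_scores, bonafide_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumes higher score = more likely spoof. Returns a fraction in [0, 1].
    """
    thresholds = sorted(set(spoof_scores) | set(bonafide_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # Miss rate: spoofed clips scored below the threshold (accepted as real).
        miss = sum(s < t for s in spoof_scores) / len(spoof_scores)
        # False-alarm rate: bonafide clips scored at or above the threshold.
        false_alarm = sum(b >= t for b in bonafide_scores) / len(bonafide_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


# Perfectly separated scores give an EER of 0.
print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))
```

With finite data the two error rates rarely cross exactly, so the sketch reports the midpoint at the threshold where they are closest.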
1163
  # TAB 2: DETECTOR
1164
  # ============================================================
1165
  with gr.Tab("Detector", id=1):
1166
+ gr.HTML("""
1167
+ <div class='section-header' style='margin-top: 1rem;'>
1168
+ <div class='section-eyebrow'>The detector</div>
1169
+ <div class='section-title'>Test the model on any audio</div>
1170
+ <p class='detector-intro'>
1171
+ Upload audio, record yourself, or pick an example. The detector returns a calibrated
1172
+ prediction with confidence, plus per-window analysis showing how evidence accumulates over time.
1173
+ </p>
1174
+ </div>
1175
  """)
1176
 
1177
+ with gr.Row(equal_height=False):
1178
  with gr.Column(scale=1):
1179
+ gr.HTML("<div class='step-label'><span class='step-number'>1</span> Provide audio</div>")
1180
  audio_input = gr.Audio(
1181
  sources=["upload", "microphone"],
1182
  type="filepath",
1183
+ label="",
1184
+ elem_classes=["audio-input-styled"],
1185
  )
 
1186
 
1187
+ gr.HTML("<div class='step-label' style='margin-top: 1.25rem;'><span class='step-number'>2</span> Run the detector</div>")
1188
+ analyze_btn = gr.Button("Analyze audio →", variant="primary", size="lg", elem_classes=["analyze-button"])
1189
+
1190
+ gr.HTML("<div class='step-label' style='margin-top: 1.5rem;'>Or try an example</div>")
1191
  gr.Examples(
1192
  examples=EXAMPLE_FILES,
1193
  inputs=audio_input,
1194
+ label="",
1195
  )
1196
 
1197
  with gr.Column(scale=1):
1198
+ gr.HTML("<div class='step-label'><span class='step-number'>3</span> Result</div>")
1199
+ badge_output = gr.HTML(value="""
1200
+ <div class='result-placeholder'>
1201
+ <div class='result-placeholder-icon'>🎤</div>
1202
+ <div class='result-placeholder-text'>Run the detector to see a prediction</div>
1203
+ </div>
1204
+ """, label=None)
1205
+
1206
+ with gr.Accordion("Detailed scores", open=False, elem_classes=["details-accordion"]):
1207
+ details_output = gr.Markdown(label="")
1208
 
1209
+ gr.HTML("<div class='step-label' style='margin-top: 2rem;'>Per-window analysis</div>")
1210
+ plot_output = gr.Plot(label="")
1211
 
1212
  with gr.Accordion("Raw output (JSON)", open=False):
1213
  raw_output = gr.JSON(label=None)
 
1316
 
1317
  with gr.Row():
1318
  gr.HTML("""
1319
+ <div class='stage-card'>
1320
+ <h4 style='color:#7c3aed;margin-top:0;'>Stage 1: frozen backbone, head only</h4>
1321
  <p>Train only the linear classification head, keeping all 95M Wav2Vec parameters frozen.
1322
  This proves that pretrained Wav2Vec representations already carry strong anti-spoofing signal.</p>
1323
+ <p style='margin-top:1rem;'><b>Result:</b> <span style='color:#a78bfa;font-size:1.2rem;font-weight:700;'>10.09% dev EER</span><br>
1324
  with just <b>1,538</b> trainable parameters.</p>
1325
  </div>
1326
  """)
1327
  gr.HTML("""
1328
+ <div class='stage-card'>
1329
+ <h4 style='color:#7c3aed;margin-top:0;'>Stage 2: top 2 layers unfrozen</h4>
1330
  <p>Unfreeze the top 2 transformer layers and the final LayerNorm. Lower the learning rate from 1e-3 to 1e-5
1331
  with 10% warmup followed by linear decay. Enable mixed precision (fp16) for speed.</p>
1332
+ <p style='margin-top:1rem;'><b>Result:</b> <span style='color:#34d399;font-size:1.2rem;font-weight:700;'>0.69% dev EER</span><br>
1333
+ a <b style='color:#34d399;'>93% relative error reduction</b> with 14.18M trainable params (15% of model).</p>
1334
  </div>
1335
  """)
1336
 
 
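Review note: the figures quoted in the two stage cards can be sanity-checked with a few lines of arithmetic. The sketch assumes a hidden width of 768 (the Wav2Vec 2.0 base configuration) and a two-class linear head. Neither is stated in this diff, though they are consistent with the 1,538-parameter count.

```python
HIDDEN_SIZE = 768  # assumption: Wav2Vec 2.0 base hidden width
NUM_CLASSES = 2    # bonafide vs. spoof

# A linear head has one weight per (input, class) pair plus one bias per class.
head_params = HIDDEN_SIZE * NUM_CLASSES + NUM_CLASSES


def relative_reduction(before, after):
    """Relative error reduction in percent."""
    return (before - after) / before * 100


stage1_eer, stage2_eer = 10.09, 0.69          # dev EER (%) from the stage cards
reduction = relative_reduction(stage1_eer, stage2_eer)
trainable_share = 14.18 / 95 * 100            # 14.18M trainable of ~95M total

print(head_params, round(reduction, 1), round(trainable_share, 1))
# → 1538 93.2 14.9
```

Under those assumptions, all three card claims (1,538 parameters, 93% relative reduction, roughly 15% of the model) check out.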
1346
  gr.Markdown("## Limitations (honest disclosure)")
1347
 
1348
  gr.HTML("""
1349
+ <div class='limitation-warn'>
1350
  <p><b>WaveFake out-of-domain generalization is poor</b> (~29% EER on LJSpeech vocoders).
1351
  The model learned ASVspoof-specific synthesis artifacts, not universal vocoder detection.
1352
  Future work: train on a mixed corpus including pure vocoder samples.</p>
1353
  </div>
1354
+ <div class='limitation-warn'>
1355
  <p><b>Codec sensitivity:</b> GSM and PSTN telephone codecs degrade EER by ~6 percentage points.
1356
  Codec augmentation during training would likely close this gap.</p>
1357
  </div>
1358
+ <div class='limitation-warn'>
1359
  <p><b>A10 attack family is consistently challenging</b> (15.54% EER on this attack alone).
1360
  This is a stable model weakness across both 2019 and 2021 evaluations.</p>
1361
  </div>
1362
+ <div class='limitation-danger'>
1363
  <p><b>Not a production deepfake detector.</b> Real-world deepfakes use synthesis methods this
1364
  model has never seen. Use this as a research demonstration, not for security-critical decisions.</p>
1365
  </div>