Sprint 5 of the report plan: cost modeling + quality/cost Pareto view
Sprint 5 and end of phase 0 of the report plan. An honest trade-off view
between quality (CER), cost in €, speed, and an optional carbon footprint.
Cost modeling (`picarones/core/pricing.py` + `data/pricing.yaml`):
- Indicative `pricing.yaml` table with local OCR engines (tesseract, pero_ocr,
kraken, calamari), cloud OCR APIs (mistral_ocr, google_vision,
azure_doc_intel), and common LLMs (gpt-4o, gpt-4o-mini, claude
sonnet/haiku, mistral-large, ministral-3b, pixtral). Each entry
carries a type, price per 1000 pages, source URL, date, and kWh/1000 pages.
Explicit warning that prices go stale quickly.
- `EngineCost` dataclass: computed fields (cost_per_1k_pages_eur,
co2_per_1k_pages_g) + textual assumptions displayed under the chart.
- `estimate_cost(...)`:
  - cloud → price from the table.
  - local → measured time (preferred) or indicative time × hourly rate (CPU
    default 0.08 €/h, GPU 1.20 €/h, both configurable).
  - OCR+LLM pipeline → the lookup prioritizes the LLM (which dominates the cost).
  - unknown → type "unknown" with the assumption "no entry".
- `build_costs_for_benchmark(engines, durations)` annotates the whole list.
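The local-cost and carbon rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `pricing.py`: the helper names `local_cost_per_1k_pages` and `co2_per_1k_pages_g` are hypothetical, the 0.08 €/h rate is the CPU default quoted above, and 58 g CO₂/kWh is the local grid default that appears in `PricingDefaults`.

```python
def local_cost_per_1k_pages(seconds_per_page: float, hourly_rate_eur: float = 0.08) -> float:
    """Cost in EUR of processing 1000 pages locally (hypothetical helper)."""
    hours_per_page = seconds_per_page / 3600.0
    return hours_per_page * hourly_rate_eur * 1000.0


def co2_per_1k_pages_g(kwh_per_1k_pages: float, grid_intensity_g_per_kwh: float = 58.0) -> float:
    """Grams of CO2 emitted per 1000 pages: energy used times grid intensity."""
    return kwh_per_1k_pages * grid_intensity_g_per_kwh


# A CPU engine measured at 2.7 s/page, billed at the default CPU rate.
cpu_cost = local_cost_per_1k_pages(2.7)
# A GPU engine at 3.6 s/page, billed at the default GPU rate of 1.20 EUR/h.
gpu_cost = local_cost_per_1k_pages(3.6, hourly_rate_eur=1.20)
# An illustrative 0.5 kWh per 1000 pages on the default local grid.
footprint = co2_per_1k_pages_g(0.5)

print(f"CPU: {cpu_cost:.2f} EUR, GPU: {gpu_cost:.2f} EUR, CO2: {footprint:.1f} g per 1000 pages")
```

The per-page time is the only measured input; everything else is an assumption, which is why each estimate carries its own list of assumptions in the report.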
Pareto algorithm (`statistics.py`):
- `compute_pareto_front(points, objectives, name_key, minimize)`: N
dimensions, mixed min/max objectives, points with missing values ignored.
- Used on 3 pairs: (CER, cost €), (CER, speed), (CER, CO₂).
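A non-dominated filter of this kind can be sketched in a few lines. The function below is an illustrative stand-in, not the actual `compute_pareto_front`: its name `pareto_front_sketch`, its return type (a list of names), and the `slow_and_bad` and `no_price` engines are assumptions made for the example. The pero_ocr and tesseract values are the ones quoted in the demo summary of this commit message.

```python
from __future__ import annotations

from typing import Any, Optional, Sequence


def pareto_front_sketch(
    points: Sequence[dict[str, Any]],
    objectives: Sequence[str],
    name_key: str = "engine",
    minimize: Optional[Sequence[bool]] = None,
) -> list[str]:
    """Return the names of the non-dominated points (illustrative sketch)."""
    if minimize is None:
        minimize = [True] * len(objectives)
    if len(minimize) != len(objectives):
        raise ValueError("minimize must have one flag per objective")

    # Skip points with a missing objective, and negate maximized objectives
    # so that "smaller is better" holds on every axis.
    usable: list[tuple[str, tuple[float, ...]]] = []
    for p in points:
        if any(p.get(o) is None for o in objectives):
            continue
        vec = tuple(float(p[o]) if m else -float(p[o]) for o, m in zip(objectives, minimize))
        usable.append((p[name_key], vec))

    def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
        # a dominates b if it is no worse everywhere and strictly better somewhere.
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return [name for name, vec in usable
            if not any(dominates(other, vec) for _, other in usable)]


points = [
    {"engine": "pero_ocr", "cer": 0.0013, "cost": 0.57},
    {"engine": "tesseract", "cer": 0.0142, "cost": 0.06},
    {"engine": "slow_and_bad", "cer": 0.0300, "cost": 0.90},  # hypothetical, dominated
    {"engine": "no_price", "cer": 0.0100, "cost": None},      # hypothetical, ignored
]
front = pareto_front_sketch(points, ["cer", "cost"])
print(front)
```

The pairwise comparison is quadratic in the number of engines, which is not a concern for a benchmark with a handful of engines.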
HTML view:
- New `pareto-card` card in `view_analyses.html` (green border).
- Toolbar with 3 buttons: "Cost € / 1000 pages" (default), "Speed (s/page)",
"Carbon (g CO₂)", the last one labeled ⚗ "experimental".
- Chart.js scatter canvas with a log X axis for cost, linear otherwise.
Pareto front in green (radius 8 px), dominated points in gray (radius 6 px).
Tooltip: engine name + CER + value on the selected X axis.
- Methodological note under the chart + `<details>` block "Detailed
assumptions per engine" listing price, type, link to the dated price
source, and each assumption collected by `estimate_cost`.
- Dedicated CSS (`.pareto-card`, `.pareto-toolbar`, `.pareto-toggle.active`,
`.pareto-experimental` with ⚗).
Narrative detectors enabled:
- `detect_pareto_alternative`: emits a HIGH Fact if the Pareto front
contains an engine other than the CER leader AND strictly cheaper,
with the savings ratio. FR/EN templates added.
- `detect_cost_outlier`: flags engines whose cost is ≥ 5× the median and that
are NOT on the front (hence dominated). MEDIUM importance. Templates added.
All 12 Fact types are now operational.
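The cost-outlier rule can be illustrated with a small sketch. The function name `cost_outliers_sketch` and the four engines below are hypothetical; only the rule itself (cost at least 5× the median, and not on the Pareto front) comes from the description above.

```python
import statistics


def cost_outliers_sketch(points, front, factor=5.0):
    """Names of engines costing >= factor x the median cost and absent from the front."""
    costs = [float(p["cost"]) for p in points if p.get("cost") is not None]
    if len(points) < 3 or not costs:
        return []  # too few engines for a median to be meaningful
    median_cost = statistics.median(costs)
    if median_cost <= 0:
        return []
    return [
        p["engine"]
        for p in points
        if p.get("cost") is not None
        and float(p["cost"]) >= factor * median_cost
        and p["engine"] not in front
    ]


# Hypothetical costs in EUR per 1000 pages; the median is 0.535, so the threshold is 2.675.
points = [
    {"engine": "engine_a", "cost": 0.06},
    {"engine": "engine_b", "cost": 0.57},
    {"engine": "engine_c", "cost": 5.00},
    {"engine": "engine_d", "cost": 0.50},
]
flagged = cost_outliers_sketch(points, front={"engine_a", "engine_b"})
spared = cost_outliers_sketch(points, front={"engine_a", "engine_c"})
print(flagged, spared)
```

An expensive engine that sits on the front is spared because its cost buys a quality level that no cheaper engine reaches.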
i18n: 9 new FR/EN keys (`h_pareto`, `pareto_axis_*`, `pareto_note`,
`pareto_assumptions_summary`, `pareto_front_label`, `pareto_dominated_label`,
`pareto_empty`).
Packaging: `data/*.yaml` added to `package-data` and `MANIFEST.in`.
Tests (`test_sprint20_pareto_pricing.py`), 28 in total:
- Pricing: table loading, missing file, cloud vs local, hourly-rate
override, measured time overrides indicative time, pipeline → LLM, unknown,
computed carbon.
- Pareto: trivial front, empty, single point, missing values
ignored, 3 dimensions, mixed min/max, length of `minimize` validated.
- Detectors: pareto_alternative emitted when the alternative is cheaper, empty
when the front is {leader} alone, cost_outlier flags expensive dominated
engines but not those on the front.
- Integration: Pareto section present, JSON contains `pareto.cost/speed/co2`,
summary cites the alternative, YAML present in the package, EN locale.
- Anti-hallucination: every number rendered in the pareto_alternative sentence
does come from the Fact payload.
Full suite: 1202 passed, 2 skipped (vs 1174 before). Zero regressions.
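The anti-hallucination check can be pictured with the sketch below. It is not the project's test: the helper names, the regular expression, and the `{"1000"}` whitelist (for the constant in "€/1000 pages") are assumptions for illustration. The template text is the English `pareto_alternative` template added in this commit, and the payload values are those of the demo summary.

```python
import re

TEMPLATE = (
    "At much lower cost, {engine} offers an interesting trade-off ({cer_pct} % "
    "CER for {cost} €/1000 pages, vs {leader_cer_pct} % / {leader_cost} € for "
    "{leader}, i.e. ×{cost_saving_ratio} cheaper)."
)
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_in_text(text: str) -> set:
    """All integer or decimal literals found in a string."""
    return set(NUMBER.findall(text))


def numbers_in_payload(payload: dict) -> set:
    """Numbers that a template is allowed to render, taken from the payload values."""
    found = set()
    for value in payload.values():
        if isinstance(value, bool):
            continue
        if isinstance(value, (int, float)):
            found.add(str(value))
        elif isinstance(value, str):
            found |= numbers_in_text(value)  # e.g. digits inside an engine name
    return found


def untraceable_numbers(sentence: str, payload: dict, whitelist=frozenset({"1000"})) -> set:
    """Numbers in the sentence that come neither from the payload nor from the whitelist."""
    return numbers_in_text(sentence) - numbers_in_payload(payload) - set(whitelist)


payload = {
    "engine": "tesseract", "leader": "pero_ocr",
    "cer_pct": 1.42, "cost": 0.06,
    "leader_cer_pct": 0.13, "leader_cost": 0.57,
    "cost_saving_ratio": 9.6,
}
sentence = TEMPLATE.format_map(payload)
clean = untraceable_numbers(sentence, payload)
tampered = untraceable_numbers(sentence.replace("×9.6", "×12"), payload)
print(clean, tampered)
```

A plain string comparison like this only works because the sketch renders with `str()` on both sides; a real check would have to account for any number formatting applied by the renderer.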
Example on the demo (8 docs, 3 engines + pipelines):
• On this corpus of 8 documents, pero_ocr achieves the lowest mean
CER (0.13 %).
• The engines pero_ocr, tesseract → gpt-4o, gpt-4o-vision (zero-shot), and
tesseract are not statistically distinguishable (Friedman-Nemenyi,
α = 0.05, n = 8 documents, CD = 2.157).
• At substantially lower cost, tesseract offers an interesting trade-off
(1.42 % CER for 0.06 €/1000 pages, versus 0.13 % / 0.57 € for
pero_ocr, i.e. ×9.6 cheaper).
Phase 0 of the report plan: completed with this sprint. Possible next
steps: Sprint 6 (contextual glossary + customization panel),
Sprint 7 (case studies + external validation), then Phase 1 (GT-free).
https://claude.ai/code/session_0162FdNNJyNvBuYzkgtsr9VB
- CLAUDE.md +4 -3
- MANIFEST.in +1 -0
- picarones/core/narrative/detectors.py +95 -4
- picarones/core/narrative/templates/en.yaml +9 -0
- picarones/core/narrative/templates/fr.yaml +9 -0
- picarones/core/pricing.py +309 -0
- picarones/core/statistics.py +80 -0
- picarones/data/pricing.yaml +136 -0
- picarones/report/generator.py +86 -0
- picarones/report/i18n/en.json +9 -0
- picarones/report/i18n/fr.json +9 -0
- picarones/report/templates/_app.js +118 -0
- picarones/report/templates/_styles.css +53 -0
- picarones/report/templates/view_analyses.html +25 -0
- pyproject.toml +1 -0
- tests/test_sprint20_pareto_pricing.py +371 -0
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -194,6 +194,7 @@ AZURE_DOC_INTEL_KEY=...
 | 17 | **Sprint 2 of the report plan**: refactor of `generator.py` (3690 → 617 lines) via Jinja2. The `_HTML_TEMPLATE` monolith is split into 10 external files under `picarones/report/templates/` (base + 5 views + header/footer + CSS + JS). The `i18n.py` i18n (101-key Python dict) is migrated to `picarones/report/i18n/{fr,en}.json`, loaded at import time. Added 16 non-regression tests (structure, determinism, i18n, guards against duplicated tags). |
 | 18 | **Sprint 3 of the report plan**: multi-engine Friedman test + Nemenyi post-hoc + Critical Difference Diagram (Demšar 2006). New module `core/statistics.py`: `friedman_test`, `nemenyi_posthoc`, `build_critical_difference_svg` with a Nemenyi table (k=2 to 50, α=0.05 and 0.01), pure-Python fallback (Wilson-Hilferty for chi²), optional scipy support (`stats` extra). Partial `_critical_difference.html` inserted at the top of the report, SVG rendered server-side (no JS), FR/EN i18n for the help texts. Narrative detector `detect_statistical_tie` enabled (reads `nemenyi.tied_groups`). 41 tests added (canonical cases, degenerate cases, SVG, report integration). |
 | 19 | **Sprint 4 of the report plan**: complete narrative engine + factual summary at the top. 9 detectors implemented (global_leader_cer, significant_gap, stratum_winner/collapse, error_profile_outlier, llm_hallucination_flag, robustness_fragile, speed_winner, confidence_warning). Arbiter (`arbiter.py`) with sorting by importance, non-redundancy, and suppression of Wilcoxon/Nemenyi contradictions. Renderer (`renderer.py`) reads the YAML templates `core/narrative/templates/{fr,en}.yaml` (10 templates per language) and renders deterministically via `str.format_map`. New partial `_narrative_summary.html` placed at the top of the report (between the header and the CDD). Anti-hallucination guard tested: every rendered number is traceable to the payload of the associated Fact. 32 tests (unit detectors, arbiter, renderer, E2E, traceability, HTML integration). `pareto_alternative` and `cost_outlier` remain stubs for Sprint 5. |
+| 20 | **Sprint 5 of the report plan**: cost modeling + Pareto view. New module `core/pricing.py` (`EngineCost`, `estimate_cost`, `build_costs_for_benchmark`) reads the indicative table `picarones/data/pricing.yaml` (local OCR + cloud APIs + LLMs). New algorithm `compute_pareto_front` in `statistics.py`, multi-objective (min/max), N dimensions. Chart.js view in `view_analyses.html` with the Pareto front highlighted and 3 axis toggles: cost € / speed / carbon (the last one labeled ⚗ experimental). Detectors `pareto_alternative` and `cost_outlier` enabled. FR/EN templates added. Collapsed "detailed assumptions" block under the chart with links to the price sources. 28 tests (local vs cloud pricing, hourly-rate override, canonical/degenerate/3D Pareto, detectors, HTML integration). |
 
 ---
 
@@ -219,12 +220,12 @@ parses the rendered summary and checks that every number is traceable to the payload
 (via `_numbers_in_payload`) extended with a strict whitelist of template
 constants (`95`, `100`).
 
-**Detectors enabled in the default registry (Sprint
+**Detectors enabled in the default registry (Sprint 20)**: all 12 are operational:
 - Sprint 3: `statistical_tie`
 - Sprint 4: `global_leader_cer`, `significant_gap`, `stratum_winner`, `stratum_collapse`,
   `error_profile_outlier`, `llm_hallucination_flag`, `robustness_fragile`,
   `speed_winner`, `confidence_warning`
-- Sprint 5: `pareto_alternative`, `cost_outlier`
+- Sprint 5: `pareto_alternative`, `cost_outlier`
 
 **Anti-contradiction rule** (arbiter): if `SIGNIFICANT_GAP` (uncorrected Wilcoxon)
 and `STATISTICAL_TIE` (corrected Nemenyi) concern the same engines, Nemenyi
@@ -240,7 +241,7 @@ to the `_narrative_summary.html` template (placed between `_header.html` and `_critical
 ## Development context
 
 - **Environment**: GitHub Codespaces (`/workspaces/Picarones`), Python 3.12
-- **Tests**:
+- **Tests**: 1202 passed, 2 skipped (Sprint 20)
 - **Active branch**: `claude/review-picarones-benchmarks-E3J42`
 - **Transcript of the development conversation**:
   `/mnt/transcripts/2026-03-11-14-01-41-picarones-ocr-bench-project.txt`
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -8,3 +8,4 @@ recursive-include picarones *.json *.yaml *.yml
 recursive-include picarones/report/templates *.j2 *.html *.css *.js
 recursive-include picarones/report/i18n *.json
 recursive-include picarones/core/narrative/templates *.yaml
+recursive-include picarones/data *.yaml
--- a/picarones/core/narrative/detectors.py
+++ b/picarones/core/narrative/detectors.py
@@ -162,8 +162,63 @@ def detect_significant_gap(benchmark_data: dict) -> list[Fact]:
 
 
 def detect_pareto_alternative(benchmark_data: dict) -> list[Fact]:
-    """Pareto-optimal engine other than the CER leader.
-
+    """Pareto-optimal engine other than the CER leader.
+
+    Reads ``benchmark_data["pareto"]["cost"]`` (Sprint 19) and emits a Fact if
+    the frontier contains an engine other than the CER leader, to highlight
+    the existence of an interesting cost/quality trade-off.
+    """
+    pareto = (benchmark_data.get("pareto") or {}).get("cost") or {}
+    front = pareto.get("front") or []
+    points = pareto.get("points") or []
+    if len(front) < 2:
+        return []
+
+    ranking = benchmark_data.get("ranking") or []
+    if not ranking:
+        return []
+    leader = ranking[0].get("engine")
+
+    # Cheapest engine on the front (excluding the leader)
+    alt: Optional[dict] = None
+    for p in points:
+        if p.get("engine") == leader:
+            continue
+        if p.get("engine") not in front:
+            continue
+        if alt is None or float(p.get("cost") or 0.0) < float(alt.get("cost") or 0.0):
+            alt = p
+    if alt is None:
+        return []
+
+    leader_point = next((p for p in points if p.get("engine") == leader), None)
+    if leader_point is None:
+        return []
+
+    alt_cer = float(alt.get("cer") or 0.0)
+    alt_cost = float(alt.get("cost") or 0.0)
+    leader_cer = float(leader_point.get("cer") or 0.0)
+    leader_cost = float(leader_point.get("cost") or 0.0)
+    if alt_cost >= leader_cost or alt_cost <= 0:
+        return []  # not actually cheaper, so not worth reporting
+
+    return [Fact(
+        type=FactType.PARETO_ALTERNATIVE,
+        importance=FactImportance.HIGH,
+        payload={
+            "engine": alt["engine"],
+            "leader": leader,
+            "cer": round(alt_cer, 4),
+            "cer_pct": round(alt_cer * 100, 2),
+            "cost": round(alt_cost, 2),
+            "leader_cer": round(leader_cer, 4),
+            "leader_cer_pct": round(leader_cer * 100, 2),
+            "leader_cost": round(leader_cost, 2),
+            "cost_saving_ratio": round(leader_cost / alt_cost, 1) if alt_cost > 0 else None,
+            "delta_cer_pct": round((alt_cer - leader_cer) * 100, 2),
+        },
+        engines_involved=(alt["engine"],),
+    )]
 
 
 def _stratum_cer_by_engine(benchmark_data: dict) -> dict[str, dict[str, list[float]]]:
@@ -430,8 +485,44 @@ def detect_robustness_fragile(benchmark_data: dict) -> list[Fact]:
 
 
 def detect_cost_outlier(benchmark_data: dict) -> list[Fact]:
-    """Engine
-
+    """Engine whose cost is highly disproportionate to what it delivers.
+
+    Flags an engine whose cost is ≥ 5× the median AND that is not on the
+    Pareto front (hence dominated by a cheaper engine OR one with a better CER).
+    """
+    pareto = (benchmark_data.get("pareto") or {}).get("cost") or {}
+    points = pareto.get("points") or []
+    front = set(pareto.get("front") or [])
+    if len(points) < 3:
+        return []
+
+    costs = [float(p["cost"]) for p in points if p.get("cost") is not None]
+    if not costs:
+        return []
+    median_cost = _stats.median(costs)
+    if median_cost <= 0:
+        return []
+
+    facts: list[Fact] = []
+    for p in points:
+        c = float(p.get("cost") or 0.0)
+        if c < 5.0 * median_cost:
+            continue
+        if p["engine"] in front:
+            continue  # on the front → cost justified by a unique quality level
+        facts.append(Fact(
+            type=FactType.COST_OUTLIER,
+            importance=FactImportance.MEDIUM,
+            payload={
+                "engine": p["engine"],
+                "cost": round(c, 2),
+                "median_cost": round(median_cost, 2),
+                "ratio_to_median": round(c / median_cost, 1),
+                "cer_pct": round(float(p.get("cer") or 0.0) * 100, 2),
+            },
+            engines_involved=(p["engine"],),
+        ))
+    return facts
 
 
 def _mean_duration_per_engine(benchmark_data: dict) -> dict[str, float]:
--- a/picarones/core/narrative/templates/en.yaml
+++ b/picarones/core/narrative/templates/en.yaml
@@ -44,3 +44,12 @@ speed_winner: >-
 confidence_warning: >-
   Ranking is fragile: the 95 % confidence interval of {engine} spans
   {ci_width_pct} CER points, compared with a gap of {gap_to_runner_up_pct} points to the runner-up.
+
+pareto_alternative: >-
+  At much lower cost, {engine} offers an interesting trade-off ({cer_pct} %
+  CER for {cost} €/1000 pages, vs {leader_cer_pct} % / {leader_cost} € for
+  {leader}, i.e. ×{cost_saving_ratio} cheaper).
+
+cost_outlier: >-
+  Disproportionate cost for {engine} ({cost} €/1000 pages, ×{ratio_to_median}
+  the median) without a compensating quality advantage (CER {cer_pct} %).
--- a/picarones/core/narrative/templates/fr.yaml
+++ b/picarones/core/narrative/templates/fr.yaml
@@ -48,3 +48,12 @@ speed_winner: >-
 confidence_warning: >-
   Classement fragile : l'intervalle de confiance à 95 % de {engine} s'étend
   sur {ci_width_pct} points de CER, à comparer à l'écart de {gap_to_runner_up_pct} points avec le second.
+
+pareto_alternative: >-
+  À coût sensiblement inférieur, {engine} offre un compromis intéressant
+  ({cer_pct} % de CER pour {cost} €/1000 pages, contre {leader_cer_pct} % /
+  {leader_cost} € pour {leader}, soit ×{cost_saving_ratio} moins cher).
+
+cost_outlier: >-
+  Coût disproportionné pour {engine} ({cost} €/1000 pages, ×{ratio_to_median}
+  la médiane) sans avantage de qualité compensatoire (CER {cer_pct} %).
--- /dev/null
+++ b/picarones/core/pricing.py
@@ -0,0 +1,309 @@
+"""Cost modeling: cloud APIs and local inference time.
+
+Used only by the cost/quality Pareto view of the report (Sprint 5).
+Prices are indicative and go stale quickly: see ``picarones/data/pricing.yaml``
+for the assumptions, dates, and reference URLs.
+
+Conventions
+-----------
+- Currency: EUR (indicative conversion from USD where applicable).
+- Cost is expressed per **1,000 pages** processed.
+- Local cost = mean inference time × hourly rate (configurable).
+- Optional carbon footprint: kWh × grid intensity in g CO₂/kWh of the
+  execution grid (low-carbon French mix by default for local runs,
+  hyperscaler cloud average for APIs).
+"""
+
+from __future__ import annotations
+
+import logging
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Optional
+
+import yaml
+
+logger = logging.getLogger(__name__)
+
+_DEFAULT_PRICING_PATH = Path(__file__).parent.parent / "data" / "pricing.yaml"
+
+
+@dataclass(frozen=True)
+class PricingDefaults:
+    """Default values from the pricing file (``meta`` section)."""
+
+    last_updated: Optional[str] = None
+    currency: str = "EUR"
+    hourly_rate_local_cpu_eur: float = 0.08
+    hourly_rate_local_gpu_eur: float = 1.20
+    grid_intensity_local: float = 58.0
+    grid_intensity_cloud: float = 380.0
+
+
+@dataclass
+class EngineCost:
+    """Estimated cost of an engine per 1,000 pages, with traceable assumptions.
+
+    The representation is meant to be immutable after construction: once the
+    user has chosen a local hourly rate, all instances share that
+    assumption through explicit injection in ``build_costs_for_benchmark``.
+    """
+
+    engine_key: str
+    """Name or model used as the key in the table (e.g. ``"gpt-4o"``, ``"tesseract"``)."""
+
+    type: str  # "local" | "cloud_api" | "unknown"
+
+    cost_per_1k_pages_eur: Optional[float] = None
+    """Cost per 1,000 pages in euros. ``None`` if the data is insufficient."""
+
+    currency: str = "EUR"
+
+    # Source / date
+    pricing_source_url: Optional[str] = None
+    pricing_date: Optional[str] = None
+
+    # For cloud APIs: raw price
+    api_price_per_1k_pages: Optional[float] = None
+
+    # For local engines: inference time and hourly rate used
+    local_mean_seconds_per_page: Optional[float] = None
+    hourly_rate_eur: Optional[float] = None
+
+    # Carbon footprint (an estimate, labeled "experimental" in the report)
+    kwh_per_1k_pages: Optional[float] = None
+    grid_intensity_g_co2_per_kwh: Optional[float] = None
+    co2_per_1k_pages_g: Optional[float] = None
+
+    notes: Optional[str] = None
+
+    assumptions: list[str] = field(default_factory=list)
+    """List of textual assumptions to display under the chart."""
+
+    def as_dict(self) -> dict:
+        return {
+            "engine_key": self.engine_key,
+            "type": self.type,
+            "cost_per_1k_pages_eur": self.cost_per_1k_pages_eur,
+            "currency": self.currency,
+            "pricing_source_url": self.pricing_source_url,
+            "pricing_date": self.pricing_date,
+            "api_price_per_1k_pages": self.api_price_per_1k_pages,
+            "local_mean_seconds_per_page": self.local_mean_seconds_per_page,
+            "hourly_rate_eur": self.hourly_rate_eur,
+            "kwh_per_1k_pages": self.kwh_per_1k_pages,
+            "grid_intensity_g_co2_per_kwh": self.grid_intensity_g_co2_per_kwh,
+            "co2_per_1k_pages_g": self.co2_per_1k_pages_g,
+            "notes": self.notes,
+            "assumptions": list(self.assumptions),
+        }
+
+
+def load_pricing_database(path: Optional[Path] = None) -> tuple[PricingDefaults, dict]:
+    """Load the YAML pricing table.
+
+    Returns ``(defaults, engines_table)`` where ``engines_table`` is a dict
+    ``{engine_key: raw_entry}``.
+    """
+    path = Path(path) if path else _DEFAULT_PRICING_PATH
+    if not path.exists():
+        logger.warning("[pricing] fichier %s introuvable", path)
+        return PricingDefaults(), {}
+    try:
+        with path.open(encoding="utf-8") as fh:
+            data = yaml.safe_load(fh) or {}
+    except yaml.YAMLError as e:
+        logger.warning("[pricing] échec parsing %s : %s", path, e)
+        return PricingDefaults(), {}
+
+    meta = data.get("meta", {}) or {}
+    defaults = PricingDefaults(
+        last_updated=meta.get("last_updated"),
+        currency=meta.get("currency", "EUR"),
+        hourly_rate_local_cpu_eur=float(meta.get("default_hourly_rate_local_cpu_eur", 0.08)),
+        hourly_rate_local_gpu_eur=float(meta.get("default_hourly_rate_local_gpu_eur", 1.20)),
+        grid_intensity_local=float(meta.get("default_grid_intensity_g_co2_per_kwh", 58.0)),
+        grid_intensity_cloud=float(meta.get("cloud_grid_intensity_g_co2_per_kwh", 380.0)),
+    )
+    engines_table = data.get("engines", {}) or {}
+    return defaults, engines_table
+
+
+def _match_key(engine_name: str, llm_model: Optional[str], table: dict) -> Optional[str]:
+    """Find the best key for this engine in the table.
+
+    Strategy: first the LLM model name (for pipelines), then the
+    OCR name, then a partial (substring) match as a safety net.
+    """
+    candidates = [llm_model, engine_name]
+    for c in candidates:
+        if c and c in table:
+            return c
+    # Partial matching, useful for "tesseract → gpt-4o" or "gpt-4o-vision"
+    for c in candidates:
+        if not c:
+            continue
+        for key in table:
+            if key in c:
+                return key
+    return None
+
+
| 152 |
+
def estimate_cost(
|
| 153 |
+
engine_name: str,
|
| 154 |
+
*,
|
| 155 |
+
llm_model: Optional[str] = None,
|
| 156 |
+
is_pipeline: bool = False,
|
| 157 |
+
measured_seconds_per_page: Optional[float] = None,
|
| 158 |
+
table: Optional[dict] = None,
|
| 159 |
+
defaults: Optional[PricingDefaults] = None,
|
| 160 |
+
hourly_rate_override_eur: Optional[float] = None,
|
| 161 |
+
) -> EngineCost:
|
| 162 |
+
"""Calcule le ``EngineCost`` pour un moteur donné.
|
| 163 |
+
|
| 164 |
+
Parameters
|
| 165 |
+
----------
|
| 166 |
+
engine_name:
|
| 167 |
+
Nom public du moteur (ex. ``"tesseract"``, ``"tesseract → gpt-4o"``).
|
| 168 |
+
llm_model:
|
| 169 |
+
Si pipeline OCR+LLM, le modèle LLM utilisé — prioritaire pour la
|
| 170 |
+
lookup car c'est lui qui domine le coût.
|
| 171 |
+
is_pipeline:
|
| 172 |
+
Indique un pipeline OCR+LLM (change la sémantique de lookup).
|
| 173 |
+
measured_seconds_per_page:
|
| 174 |
+
Temps moyen observé sur le benchmark courant. Remplace la valeur
|
| 175 |
+
indicative de la table si fournie (plus fiable).
|
| 176 |
+
table, defaults:
|
| 177 |
+
Overrides pour tests ou usage institutionnel.
|
| 178 |
+
hourly_rate_override_eur:
|
| 179 |
+
Taux horaire à utiliser pour le calcul local (sinon valeur table
|
| 180 |
+
ou défaut).
|
| 181 |
+
"""
|
| 182 |
+
if table is None or defaults is None:
|
| 183 |
+
_defaults, _table = load_pricing_database()
|
| 184 |
+
defaults = defaults or _defaults
|
| 185 |
+
table = table or _table
|
| 186 |
+
|
| 187 |
+
key = _match_key(engine_name, llm_model if is_pipeline else None, table)
|
| 188 |
+
if key is None:
|
| 189 |
+
return EngineCost(
|
| 190 |
+
engine_key=engine_name,
|
| 191 |
+
type="unknown",
|
| 192 |
+
assumptions=["Aucune entrée dans la table de prix pour ce moteur."],
|
| 193 |
+
)
|
| 194 |
+
|
| 195 |
+
entry = table[key]
|
| 196 |
+
etype = str(entry.get("type", "unknown"))
|
| 197 |
+
notes = entry.get("notes")
|
| 198 |
+
assumptions: list[str] = []
|
| 199 |
+
currency = defaults.currency
|
| 200 |
+
|
| 201 |
+
cost_eur: Optional[float] = None
|
| 202 |
+
api_price: Optional[float] = None
|
| 203 |
+
local_seconds = measured_seconds_per_page
|
| 204 |
+
hourly_rate = None
|
| 205 |
+
|
| 206 |
+
if etype == "cloud_api":
|
| 207 |
+
api_price = entry.get("api_price_per_1k_pages")
|
| 208 |
+
if api_price is not None:
|
| 209 |
+
cost_eur = float(api_price)
|
| 210 |
+
assumptions.append(
|
| 211 |
+
f"Prix API indicatif : {cost_eur:.2f} €/1000 pages "
|
| 212 |
+
f"(source : {entry.get('pricing_source_url', '—')}, {entry.get('pricing_date', 'date inconnue')})."
|
| 213 |
+
)
|
| 214 |
+
elif etype == "local":
|
| 215 |
+
indicative_seconds = entry.get("local_mean_seconds_per_page")
|
| 216 |
+
if local_seconds is None and indicative_seconds is not None:
|
| 217 |
+
local_seconds = float(indicative_seconds)
|
| 218 |
+
assumptions.append(
|
| 219 |
+
f"Temps d'inférence indicatif : {local_seconds:.1f} s/page (non mesuré sur ce benchmark)."
|
| 220 |
+
)
|
| 221 |
+
elif local_seconds is not None:
|
| 222 |
+
assumptions.append(
|
| 223 |
+
f"Temps d'inférence mesuré : {local_seconds:.1f} s/page (moyenne sur le corpus)."
|
| 224 |
+
)
|
| 225 |
+
|
| 226 |
+
hourly_rate = (
|
| 227 |
+
hourly_rate_override_eur
|
| 228 |
+
if hourly_rate_override_eur is not None
|
| 229 |
+
else entry.get("hourly_rate_override_eur")
|
| 230 |
+
)
|
| 231 |
+
if hourly_rate is None:
|
| 232 |
+
# Heuristic: use the GPU rate if the entry's notes mention "gpu", otherwise the CPU rate
|
| 233 |
+
hourly_rate = (
|
| 234 |
+
defaults.hourly_rate_local_gpu_eur
|
| 235 |
+
if "gpu" in str(notes or "").lower()
|
| 236 |
+
else defaults.hourly_rate_local_cpu_eur
|
| 237 |
+
)
|
| 238 |
+
hourly_rate = float(hourly_rate)
|
| 239 |
+
|
| 240 |
+
if local_seconds is not None and hourly_rate is not None:
|
| 241 |
+
cost_eur = (local_seconds / 3600.0) * hourly_rate * 1000.0
|
| 242 |
+
assumptions.append(
|
| 243 |
+
f"Taux horaire appliqué : {hourly_rate:.2f} €/h "
|
| 244 |
+
f"(défaut {'GPU' if hourly_rate >= 0.5 else 'CPU'})."
|
| 245 |
+
)
|
| 246 |
+
|
| 247 |
+
# Empreinte carbone optionnelle
|
| 248 |
+
kwh_1k = entry.get("kwh_per_1k_pages")
|
| 249 |
+
grid = (
|
| 250 |
+
entry.get("grid_intensity_g_co2_per_kwh")
|
| 251 |
+
or (defaults.grid_intensity_cloud if etype == "cloud_api" else defaults.grid_intensity_local)
|
| 252 |
+
)
|
| 253 |
+
co2_g = None
|
| 254 |
+
if kwh_1k is not None and grid is not None:
|
| 255 |
+
co2_g = float(kwh_1k) * float(grid)
|
| 256 |
+
|
| 257 |
+
return EngineCost(
|
| 258 |
+
engine_key=key,
|
| 259 |
+
type=etype,
|
| 260 |
+
cost_per_1k_pages_eur=cost_eur,
|
| 261 |
+
currency=currency,
|
| 262 |
+
pricing_source_url=entry.get("pricing_source_url"),
|
| 263 |
+
pricing_date=entry.get("pricing_date"),
|
| 264 |
+
api_price_per_1k_pages=api_price,
|
| 265 |
+
local_mean_seconds_per_page=local_seconds,
|
| 266 |
+
hourly_rate_eur=hourly_rate,
|
| 267 |
+
kwh_per_1k_pages=float(kwh_1k) if kwh_1k is not None else None,
|
| 268 |
+
grid_intensity_g_co2_per_kwh=float(grid) if grid is not None else None,
|
| 269 |
+
co2_per_1k_pages_g=co2_g,
|
| 270 |
+
notes=notes,
|
| 271 |
+
assumptions=assumptions,
|
| 272 |
+
)
|
| 273 |
+
|
| 274 |
+
|
| 275 |
+
def build_costs_for_benchmark(
|
| 276 |
+
engines_summary: list[dict],
|
| 277 |
+
durations_by_engine: dict[str, float],
|
| 278 |
+
*,
|
| 279 |
+
hourly_rate_local_eur: Optional[float] = None,
|
| 280 |
+
pricing_path: Optional[Path] = None,
|
| 281 |
+
) -> dict[str, dict]:
|
| 282 |
+
"""Calcule le coût de chaque moteur d'un benchmark.
|
| 283 |
+
|
| 284 |
+
Returns
|
| 285 |
+
-------
|
| 286 |
+
dict ``{engine_name: EngineCost.as_dict()}``.
|
| 287 |
+
"""
|
| 288 |
+
defaults, table = load_pricing_database(pricing_path)
|
| 289 |
+
out: dict[str, dict] = {}
|
| 290 |
+
for e in engines_summary:
|
| 291 |
+
name = e.get("name")
|
| 292 |
+
if not name:
|
| 293 |
+
continue
|
| 294 |
+
measured = durations_by_engine.get(name)
|
| 295 |
+
llm_model = None
|
| 296 |
+
pipeline_info = e.get("pipeline_info") or {}
|
| 297 |
+
if pipeline_info:
|
| 298 |
+
llm_model = pipeline_info.get("llm_model")
|
| 299 |
+
cost = estimate_cost(
|
| 300 |
+
engine_name=name,
|
| 301 |
+
llm_model=llm_model,
|
| 302 |
+
is_pipeline=bool(e.get("is_pipeline")),
|
| 303 |
+
measured_seconds_per_page=measured,
|
| 304 |
+
table=table,
|
| 305 |
+
defaults=defaults,
|
| 306 |
+
hourly_rate_override_eur=hourly_rate_local_eur,
|
| 307 |
+
)
|
| 308 |
+
out[name] = cost.as_dict()
|
| 309 |
+
return out
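The local branch of `estimate_cost` comes down to one formula: seconds per page, converted to hours, multiplied by the hourly rate and by 1000 pages. Below is a minimal sketch of that arithmetic using the indicative values from `pricing.yaml`. The helper name `local_cost_per_1k_pages` is hypothetical and is not part of `pricing.py`.

```python
def local_cost_per_1k_pages(seconds_per_page: float, hourly_rate_eur: float) -> float:
    """Cost in EUR of processing 1000 pages on a local machine."""
    hours_per_page = seconds_per_page / 3600.0
    return hours_per_page * hourly_rate_eur * 1000.0


# Indicative table values: tesseract (2.0 s/page, CPU at 0.08 EUR/h)
# and pero_ocr (18.0 s/page, GPU at 1.20 EUR/h).
tesseract_cost = local_cost_per_1k_pages(2.0, 0.08)
pero_cost = local_cost_per_1k_pages(18.0, 1.20)
print(f"tesseract: {tesseract_cost:.3f} EUR / 1000 pages")
print(f"pero_ocr:  {pero_cost:.2f} EUR / 1000 pages")
```

With these inputs a CPU engine lands around 0.04 EUR per 1000 pages and a GPU HTR engine around 6 EUR, a spread of two orders of magnitude, which is presumably why the cost axis of the chart is logarithmic.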
|
|
@@ -8,6 +8,7 @@ Fonctions fournies
|
|
| 8 |
- friedman_test(engine_cer_map) : Friedman (k moteurs, n documents) [Sprint 17]
|
| 9 |
- nemenyi_posthoc(engine_cer_map) : post-hoc Nemenyi avec critical distance [Sprint 17]
|
| 10 |
- build_critical_difference_svg(...) : rendu SVG du CDD (Demšar 2006) [Sprint 17]
|
|
|
|
| 11 |
- cluster_errors(...) : regroupement des patterns d'erreurs
|
| 12 |
- compute_correlation_matrix(...) : matrice de corrélation des métriques
|
| 13 |
- compute_reliability_curve(...) : courbe CER vs. % docs les plus faciles
|
|
@@ -757,6 +758,85 @@ def _svg_escape(text: str) -> str:
|
|
| 757 |
.replace("'", "'"))
|
| 758 |
|
| 759 |
|
|
| 760 |
# ---------------------------------------------------------------------------
|
| 761 |
# Clustering des patterns d'erreurs
|
| 762 |
# ---------------------------------------------------------------------------
|
|
|
|
| 8 |
- friedman_test(engine_cer_map) : Friedman (k moteurs, n documents) [Sprint 17]
|
| 9 |
- nemenyi_posthoc(engine_cer_map) : post-hoc Nemenyi avec critical distance [Sprint 17]
|
| 10 |
- build_critical_difference_svg(...) : rendu SVG du CDD (Demšar 2006) [Sprint 17]
|
| 11 |
+
- compute_pareto_front(points, ...) : frontière de Pareto multi-objectifs [Sprint 19]
|
| 12 |
- cluster_errors(...) : regroupement des patterns d'erreurs
|
| 13 |
- compute_correlation_matrix(...) : matrice de corrélation des métriques
|
| 14 |
- compute_reliability_curve(...) : courbe CER vs. % docs les plus faciles
|
|
|
|
| 758 |
.replace("'", "'"))
|
| 759 |
|
| 760 |
|
| 761 |
+
# ---------------------------------------------------------------------------
|
| 762 |
+
# Frontière de Pareto (Sprint 19)
|
| 763 |
+
# ---------------------------------------------------------------------------
|
| 764 |
+
|
| 765 |
+
def compute_pareto_front(
|
| 766 |
+
points: list[dict],
|
| 767 |
+
objectives: tuple[str, ...] = ("cer", "cost"),
|
| 768 |
+
name_key: str = "engine",
|
| 769 |
+
minimize: Optional[tuple[bool, ...]] = None,
|
| 770 |
+
) -> list[str]:
|
| 771 |
+
"""Calcule la frontière de Pareto sur ``len(objectives)`` dimensions.
|
| 772 |
+
|
| 773 |
+
A point ``p`` is on the Pareto front (non-dominated) if no other point is,
|
| 774 |
+
for ALL objectives, at least as good as ``p`` AND strictly better on at
|
| 775 |
+
least one of them.
|
| 776 |
+
|
| 777 |
+
Parameters
|
| 778 |
+
----------
|
| 779 |
+
points:
|
| 780 |
+
Liste de dicts. Chaque dict doit contenir ``name_key`` et toutes les
|
| 781 |
+
clés de ``objectives``. Les points dont une valeur d'objectif est
|
| 782 |
+
``None`` sont ignorés (pas de comparaison possible).
|
| 783 |
+
objectives:
|
| 784 |
+
Clés des objectifs à minimiser/maximiser.
|
| 785 |
+
name_key:
|
| 786 |
+
Clé identifiant le point (par défaut ``"engine"``).
|
| 787 |
+
minimize:
|
| 788 |
+
Pour chaque objectif, ``True`` = minimiser (ex. CER, coût),
|
| 789 |
+
``False`` = maximiser (ex. ancrage). Doit avoir la même longueur
|
| 790 |
+
que ``objectives``.
|
| 791 |
+
|
| 792 |
+
Returns
|
| 793 |
+
-------
|
| 794 |
+
Liste des ``name`` des points sur le front Pareto, ordre stable depuis
|
| 795 |
+
``points``.
|
| 796 |
+
"""
|
| 797 |
+
if minimize is None:
|
| 798 |
+
minimize = tuple(True for _ in objectives)
|
| 799 |
+
if len(minimize) != len(objectives):
|
| 800 |
+
raise ValueError("`minimize` doit avoir la même longueur que `objectives`")
|
| 801 |
+
|
| 802 |
+
valid = []
|
| 803 |
+
for p in points:
|
| 804 |
+
try:
|
| 805 |
+
vals = tuple(float(p[k]) for k in objectives)
|
| 806 |
+
except (KeyError, TypeError, ValueError):
|
| 807 |
+
continue
|
| 808 |
+
valid.append((p[name_key], vals))
|
| 809 |
+
|
| 810 |
+
front: list[str] = []
|
| 811 |
+
for name_a, vals_a in valid:
|
| 812 |
+
dominated = False
|
| 813 |
+
for name_b, vals_b in valid:
|
| 814 |
+
if name_a == name_b:
|
| 815 |
+
continue
|
| 816 |
+
# B dominates A if B is at least as good everywhere AND strictly better somewhere
|
| 817 |
+
better_or_equal_everywhere = True
|
| 818 |
+
strictly_better_somewhere = False
|
| 819 |
+
for va, vb, mini in zip(vals_a, vals_b, minimize):
|
| 820 |
+
if mini:
|
| 821 |
+
if vb > va:
|
| 822 |
+
better_or_equal_everywhere = False
|
| 823 |
+
break
|
| 824 |
+
if vb < va:
|
| 825 |
+
strictly_better_somewhere = True
|
| 826 |
+
else: # maximiser
|
| 827 |
+
if vb < va:
|
| 828 |
+
better_or_equal_everywhere = False
|
| 829 |
+
break
|
| 830 |
+
if vb > va:
|
| 831 |
+
strictly_better_somewhere = True
|
| 832 |
+
if better_or_equal_everywhere and strictly_better_somewhere:
|
| 833 |
+
dominated = True
|
| 834 |
+
break
|
| 835 |
+
if not dominated:
|
| 836 |
+
front.append(name_a)
|
| 837 |
+
return front
|
| 838 |
+
|
| 839 |
+
|
| 840 |
# ---------------------------------------------------------------------------
|
| 841 |
# Clustering des patterns d'erreurs
|
| 842 |
# ---------------------------------------------------------------------------
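The dominance test in `compute_pareto_front` can be illustrated on a small set of engines. The following is a compact sketch of the same rule, not the function from `statistics.py`: the names `dominates` and `pareto_front` are hypothetical, the CER values are invented for illustration, and only the costs are taken from `pricing.yaml`.

```python
from typing import Optional, Sequence


def dominates(b: Sequence[float], a: Sequence[float], minimize: Sequence[bool]) -> bool:
    """True if b is at least as good as a everywhere and strictly better somewhere."""
    at_least_as_good = all(
        (vb <= va) if mini else (vb >= va)
        for va, vb, mini in zip(a, b, minimize)
    )
    strictly_better = any(
        (vb < va) if mini else (vb > va)
        for va, vb, mini in zip(a, b, minimize)
    )
    return at_least_as_good and strictly_better


def pareto_front(points, objectives=("cer", "cost"), name_key="engine",
                 minimize: Optional[Sequence[bool]] = None) -> list:
    if minimize is None:
        minimize = tuple(True for _ in objectives)
    if len(minimize) != len(objectives):
        raise ValueError("`minimize` must have the same length as `objectives`")
    valid = []
    for p in points:
        try:
            vals = tuple(float(p[k]) for k in objectives)
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric objective: the point is skipped
        valid.append((p[name_key], vals))
    # Keep every point that no other valid point dominates (input order preserved).
    return [
        name_a
        for i, (name_a, vals_a) in enumerate(valid)
        if not any(dominates(vals_b, vals_a, minimize)
                   for j, (_, vals_b) in enumerate(valid) if j != i)
    ]


# CER values are made up; costs (EUR / 1000 pages) follow pricing.yaml.
points = [
    {"engine": "tesseract", "cer": 0.12, "cost": 0.04},
    {"engine": "mistral_ocr", "cer": 0.05, "cost": 0.90},
    {"engine": "azure_doc_intel", "cer": 0.06, "cost": 9.50},
    {"engine": "gpt-4o", "cer": 0.04, "cost": 7.50},
    {"engine": "unlisted_engine", "cer": 0.03, "cost": None},  # no price: ignored
]
front = pareto_front(points)
print(front)
```

In this made-up data, `azure_doc_intel` is dominated by `mistral_ocr` (lower CER and lower cost), while the three remaining priced engines each win on one objective. The sketch compares points by position rather than by name, so two entries sharing a name would still be compared against each other.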
|
|
@@ -0,0 +1,136 @@
|
|
| 1 |
+
# Base de prix indicative des moteurs OCR/HTR et des LLM utilisés dans les
|
| 2 |
+
# pipelines OCR+LLM. Sert uniquement à la vue Pareto coût/qualité du rapport.
|
| 3 |
+
#
|
| 4 |
+
# AVERTISSEMENT
|
| 5 |
+
# -------------
|
| 6 |
+
# Ces prix sont des estimations datées et vieillissent vite. Ils sont donnés
|
| 7 |
+
# à titre indicatif et ne remplacent pas une négociation commerciale ou un
|
| 8 |
+
# relevé sur facture. Toute institution menant un benchmark avec un budget
|
| 9 |
+
# réel doit surcharger ces valeurs via ``ReportGenerator(..., pricing=...)``.
|
| 10 |
+
#
|
| 11 |
+
# CONVENTIONS
|
| 12 |
+
# -----------
|
| 13 |
+
# - Unité monétaire : EUR (conversion indicative depuis USD quand applicable)
|
| 14 |
+
# - Prix exprimé par 1000 pages traitées (1 page = 1 document moyen patrimonial,
|
| 15 |
+
# environ 1 500 caractères ou ~2 000 tokens LLM).
|
| 16 |
+
# - kWh par 1000 pages : estimation énergétique pour le calcul carbone optionnel.
|
| 17 |
+
# - Intensité carbone du réseau (g CO2 / kWh) : dépend du mix électrique de la
|
| 18 |
+
# région où le moteur est exécuté (France ≈ 58, US moyen ≈ 400, Irlande ≈ 350).
|
| 19 |
+
#
|
| 20 |
+
# CATÉGORIES
|
| 21 |
+
# ----------
|
| 22 |
+
# - ``type: local`` : moteur open-source tournant sur machine de l'utilisateur.
|
| 23 |
+
# Coût effectif = temps d'inférence × taux horaire paramétré.
|
| 24 |
+
# - ``type: cloud_api`` : service facturé à la page ou au token.
|
| 25 |
+
|
| 26 |
+
meta:
|
| 27 |
+
last_updated: "2026-04-01"
|
| 28 |
+
currency: EUR
|
| 29 |
+
default_hourly_rate_local_cpu_eur: 0.08 # machine locale amortie
|
| 30 |
+
default_hourly_rate_local_gpu_eur: 1.20 # g4dn.xlarge ou équivalent
|
| 31 |
+
default_grid_intensity_g_co2_per_kwh: 58 # France 2025 (mix bas carbone)
|
| 32 |
+
cloud_grid_intensity_g_co2_per_kwh: 380 # moyenne cloud hyperscalers
|
| 33 |
+
|
| 34 |
+
engines:
|
| 35 |
+
# ── OCR classiques locaux ─────────────────────────────────────────
|
| 36 |
+
tesseract:
|
| 37 |
+
type: local
|
| 38 |
+
local_mean_seconds_per_page: 2.0
|
| 39 |
+
kwh_per_1k_pages: 0.012
|
| 40 |
+
notes: "Open-source, CPU uniquement. Rapide mais moins précis sur écriture manuscrite."
|
| 41 |
+
|
| 42 |
+
pero_ocr:
|
| 43 |
+
type: local
|
| 44 |
+
local_mean_seconds_per_page: 18.0
|
| 45 |
+
kwh_per_1k_pages: 0.300
|
| 46 |
+
hourly_rate_override_eur: 1.20 # GPU
|
| 47 |
+
notes: "HTR deep learning, GPU recommandé. Best-in-class sur documents historiques."
|
| 48 |
+
|
| 49 |
+
kraken:
|
| 50 |
+
type: local
|
| 51 |
+
local_mean_seconds_per_page: 8.0
|
| 52 |
+
kwh_per_1k_pages: 0.150
|
| 53 |
+
hourly_rate_override_eur: 1.20
|
| 54 |
+
notes: "HTR open-source, GPU recommandé. Modèles pré-entraînés via HTR-United."
|
| 55 |
+
|
| 56 |
+
calamari:
|
| 57 |
+
type: local
|
| 58 |
+
local_mean_seconds_per_page: 6.0
|
| 59 |
+
kwh_per_1k_pages: 0.100
|
| 60 |
+
hourly_rate_override_eur: 1.20
|
| 61 |
+
|
| 62 |
+
# ── APIs OCR cloud ────────────────────────────────────────────────
|
| 63 |
+
mistral_ocr:
|
| 64 |
+
type: cloud_api
|
| 65 |
+
api_price_per_1k_pages: 0.90 # ≈ 0.001 USD / page, endpoint /v1/ocr dédié
|
| 66 |
+
pricing_source_url: "https://mistral.ai/pricing"
|
| 67 |
+
pricing_date: "2026-01"
|
| 68 |
+
kwh_per_1k_pages: 0.120
|
| 69 |
+
notes: "Endpoint /v1/ocr dédié (pas chat/completions)."
|
| 70 |
+
|
| 71 |
+
google_vision:
|
| 72 |
+
type: cloud_api
|
| 73 |
+
api_price_per_1k_pages: 1.40 # Document Text Detection, 1-1000 = $1.50/1k
|
| 74 |
+
pricing_source_url: "https://cloud.google.com/vision/pricing"
|
| 75 |
+
pricing_date: "2026-01"
|
| 76 |
+
kwh_per_1k_pages: 0.120
|
| 77 |
+
|
| 78 |
+
azure_doc_intel:
|
| 79 |
+
type: cloud_api
|
| 80 |
+
api_price_per_1k_pages: 9.50 # Read S1 tier
|
| 81 |
+
pricing_source_url: "https://azure.microsoft.com/pricing/details/ai-document-intelligence/"
|
| 82 |
+
pricing_date: "2026-01"
|
| 83 |
+
kwh_per_1k_pages: 0.120
|
| 84 |
+
|
| 85 |
+
# ── LLM pour pipelines OCR+LLM ────────────────────────────────────
|
| 86 |
+
# Estimation par page : prompt ~500 tokens + réponse ~1500 tokens = 2k tokens.
|
| 87 |
+
# Les VLM consomment en plus des tokens image (~1k tokens pour une page A4).
|
| 88 |
+
"gpt-4o":
|
| 89 |
+
type: cloud_api
|
| 90 |
+
api_price_per_1k_pages: 7.50 # approx 2.5k tokens text + 1k image
|
| 91 |
+
pricing_source_url: "https://openai.com/api/pricing/"
|
| 92 |
+
pricing_date: "2026-01"
|
| 93 |
+
kwh_per_1k_pages: 0.200
|
| 94 |
+
|
| 95 |
+
"gpt-4o-mini":
|
| 96 |
+
type: cloud_api
|
| 97 |
+
api_price_per_1k_pages: 0.45
|
| 98 |
+
pricing_source_url: "https://openai.com/api/pricing/"
|
| 99 |
+
pricing_date: "2026-01"
|
| 100 |
+
kwh_per_1k_pages: 0.060
|
| 101 |
+
|
| 102 |
+
"claude-sonnet-4-6":
|
| 103 |
+
type: cloud_api
|
| 104 |
+
api_price_per_1k_pages: 6.00
|
| 105 |
+
pricing_source_url: "https://www.anthropic.com/pricing"
|
| 106 |
+
pricing_date: "2026-01"
|
| 107 |
+
kwh_per_1k_pages: 0.180
|
| 108 |
+
|
| 109 |
+
"claude-haiku-4-5":
|
| 110 |
+
type: cloud_api
|
| 111 |
+
api_price_per_1k_pages: 0.80
|
| 112 |
+
pricing_source_url: "https://www.anthropic.com/pricing"
|
| 113 |
+
pricing_date: "2026-01"
|
| 114 |
+
kwh_per_1k_pages: 0.070
|
| 115 |
+
|
| 116 |
+
"mistral-large-latest":
|
| 117 |
+
type: cloud_api
|
| 118 |
+
api_price_per_1k_pages: 2.40
|
| 119 |
+
pricing_source_url: "https://mistral.ai/pricing"
|
| 120 |
+
pricing_date: "2026-01"
|
| 121 |
+
kwh_per_1k_pages: 0.150
|
| 122 |
+
|
| 123 |
+
"ministral-3b-latest":
|
| 124 |
+
type: cloud_api
|
| 125 |
+
api_price_per_1k_pages: 0.08
|
| 126 |
+
pricing_source_url: "https://mistral.ai/pricing"
|
| 127 |
+
pricing_date: "2026-01"
|
| 128 |
+
kwh_per_1k_pages: 0.040
|
| 129 |
+
notes: "Text-only, ne supporte pas le mode multimodal."
|
| 130 |
+
|
| 131 |
+
"pixtral-large-latest":
|
| 132 |
+
type: cloud_api
|
| 133 |
+
api_price_per_1k_pages: 3.00
|
| 134 |
+
pricing_source_url: "https://mistral.ai/pricing"
|
| 135 |
+
pricing_date: "2026-01"
|
| 136 |
+
kwh_per_1k_pages: 0.170
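The optional carbon figure follows from two numbers in this file: the `kwh_per_1k_pages` of the entry and a grid intensity, which falls back to the local or cloud default from `meta` when the entry does not set its own. A rough sketch of that lookup order is given below. The function name `co2_per_1k_pages_g` and the 350 g/kWh override are illustrative assumptions, and unlike the `or` fallback in `estimate_cost` this version treats an explicit intensity of 0 as a valid value.

```python
from typing import Optional

# Defaults taken from the `meta` block of pricing.yaml.
GRID_LOCAL_G_PER_KWH = 58.0
GRID_CLOUD_G_PER_KWH = 380.0


def co2_per_1k_pages_g(kwh_per_1k_pages: Optional[float], engine_type: str,
                       entry_grid: Optional[float] = None) -> Optional[float]:
    """Grams of CO2 per 1000 pages, or None when no energy estimate exists."""
    if kwh_per_1k_pages is None:
        return None
    if entry_grid is not None:
        grid = entry_grid  # per-entry override wins
    elif engine_type == "cloud_api":
        grid = GRID_CLOUD_G_PER_KWH
    else:
        grid = GRID_LOCAL_G_PER_KWH
    return kwh_per_1k_pages * grid


tesseract_co2 = co2_per_1k_pages_g(0.012, "local")          # 0.012 kWh x 58
gpt4o_co2 = co2_per_1k_pages_g(0.200, "cloud_api")          # 0.200 kWh x 380
# Hypothetical override: a cloud OCR endpoint hosted on a 350 g/kWh grid.
mistral_co2 = co2_per_1k_pages_g(0.120, "cloud_api", entry_grid=350.0)
print(tesseract_co2, gpt4o_co2, mistral_co2)
```

Because both inputs are coarse estimates, the result is best read as an order of magnitude, which matches the "experimental" label on the carbon axis.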
|
|
@@ -47,7 +47,9 @@ from picarones.core.statistics import (
|
|
| 47 |
friedman_test,
|
| 48 |
nemenyi_posthoc,
|
| 49 |
build_critical_difference_svg,
|
|
|
|
| 50 |
)
|
|
|
|
| 51 |
from picarones.core.difficulty import compute_all_difficulties, difficulty_label
|
| 52 |
|
| 53 |
|
|
@@ -439,6 +441,88 @@ def _build_report_data(benchmark: BenchmarkResult, images_b64: dict[str, str]) -
|
|
| 439 |
"is_pipeline": report.is_pipeline,
|
| 440 |
})
|
| 441 |
|
|
| 442 |
# Scatter 2 : ratio longueur vs score d'ancrage (moteurs)
|
| 443 |
ratio_vs_anchor = []
|
| 444 |
for report in benchmark.engine_reports:
|
|
@@ -478,6 +562,8 @@ def _build_report_data(benchmark: BenchmarkResult, images_b64: dict[str, str]) -
|
|
| 478 |
# Sprint 10
|
| 479 |
"gini_vs_cer": gini_vs_cer,
|
| 480 |
"ratio_vs_anchor": ratio_vs_anchor,
|
|
|
|
|
|
|
| 481 |
}
|
| 482 |
|
| 483 |
|
|
|
|
| 47 |
friedman_test,
|
| 48 |
nemenyi_posthoc,
|
| 49 |
build_critical_difference_svg,
|
| 50 |
+
compute_pareto_front,
|
| 51 |
)
|
| 52 |
+
from picarones.core.pricing import build_costs_for_benchmark, load_pricing_database
|
| 53 |
from picarones.core.difficulty import compute_all_difficulties, difficulty_label
|
| 54 |
|
| 55 |
|
|
|
|
| 441 |
"is_pipeline": report.is_pipeline,
|
| 442 |
})
|
| 443 |
|
| 444 |
+
# ── Sprint 19 — Coûts et frontière de Pareto ────────────────────────
|
| 445 |
+
# Durée moyenne mesurée par moteur sur le benchmark courant (sec/page)
|
| 446 |
+
durations_by_engine: dict[str, float] = {}
|
| 447 |
+
for report in benchmark.engine_reports:
|
| 448 |
+
durs = [dr.duration_seconds for dr in report.document_results
|
| 449 |
+
if dr.duration_seconds is not None]
|
| 450 |
+
if durs:
|
| 451 |
+
durations_by_engine[report.engine_name] = sum(durs) / len(durs)
|
| 452 |
+
|
| 453 |
+
pricing_defaults, _ = load_pricing_database()
|
| 454 |
+
costs_by_engine = build_costs_for_benchmark(
|
| 455 |
+
engines_summary, durations_by_engine,
|
| 456 |
+
)
|
| 457 |
+
# Annoter chaque résumé moteur avec son coût et sa durée
|
| 458 |
+
for entry in engines_summary:
|
| 459 |
+
name = entry["name"]
|
| 460 |
+
entry["mean_duration_seconds"] = round(durations_by_engine.get(name, 0.0), 4) \
|
| 461 |
+
if name in durations_by_engine else None
|
| 462 |
+
entry["cost"] = costs_by_engine.get(name)
|
| 463 |
+
|
| 464 |
+
# Front Pareto sur (CER moyen, coût €/1000 pages) — moteurs avec les deux dispos
|
| 465 |
+
pareto_points = []
|
| 466 |
+
for entry in engines_summary:
|
| 467 |
+
cer = entry.get("cer")
|
| 468 |
+
cost = (entry.get("cost") or {}).get("cost_per_1k_pages_eur")
|
| 469 |
+
if cer is None or cost is None:
|
| 470 |
+
continue
|
| 471 |
+
pareto_points.append({"engine": entry["name"], "cer": cer, "cost": cost})
|
| 472 |
+
pareto_front_engines = compute_pareto_front(
|
| 473 |
+
pareto_points, objectives=("cer", "cost"),
|
| 474 |
+
)
|
| 475 |
+
|
| 476 |
+
# Front Pareto secondaire (CER, vitesse) pour le toggle "vitesse"
|
| 477 |
+
pareto_speed_points = []
|
| 478 |
+
for entry in engines_summary:
|
| 479 |
+
cer = entry.get("cer")
|
| 480 |
+
dur = entry.get("mean_duration_seconds")
|
| 481 |
+
if cer is None or dur is None:
|
| 482 |
+
continue
|
| 483 |
+
pareto_speed_points.append({"engine": entry["name"], "cer": cer, "dur": dur})
|
| 484 |
+
pareto_front_speed = compute_pareto_front(
|
| 485 |
+
pareto_speed_points, objectives=("cer", "dur"),
|
| 486 |
+
)
|
| 487 |
+
|
| 488 |
+
# Front Pareto carbone (CER, g CO2 / 1000 pages) — étiqueté expérimental
|
| 489 |
+
pareto_co2_points = []
|
| 490 |
+
for entry in engines_summary:
|
| 491 |
+
cer = entry.get("cer")
|
| 492 |
+
co2 = (entry.get("cost") or {}).get("co2_per_1k_pages_g")
|
| 493 |
+
if cer is None or co2 is None:
|
| 494 |
+
continue
|
| 495 |
+
pareto_co2_points.append({"engine": entry["name"], "cer": cer, "co2": co2})
|
| 496 |
+
pareto_front_co2 = compute_pareto_front(
|
| 497 |
+
pareto_co2_points, objectives=("cer", "co2"),
|
| 498 |
+
)
|
| 499 |
+
|
| 500 |
+
pareto_data = {
|
| 501 |
+
"cost": {
|
| 502 |
+
"points": pareto_points,
|
| 503 |
+
"front": pareto_front_engines,
|
| 504 |
+
"axis_label": "Coût (€ / 1000 pages)",
|
| 505 |
+
},
|
| 506 |
+
"speed": {
|
| 507 |
+
"points": pareto_speed_points,
|
| 508 |
+
"front": pareto_front_speed,
|
| 509 |
+
"axis_label": "Temps moyen (s / page)",
|
| 510 |
+
},
|
| 511 |
+
"co2": {
|
| 512 |
+
"points": pareto_co2_points,
|
| 513 |
+
"front": pareto_front_co2,
|
| 514 |
+
"axis_label": "Empreinte carbone (g CO₂ / 1000 pages, expérimental)",
|
| 515 |
+
},
|
| 516 |
+
"pricing_meta": {
|
| 517 |
+
"last_updated": pricing_defaults.last_updated,
|
| 518 |
+
"currency": pricing_defaults.currency,
|
| 519 |
+
"hourly_rate_local_cpu_eur": pricing_defaults.hourly_rate_local_cpu_eur,
|
| 520 |
+
"hourly_rate_local_gpu_eur": pricing_defaults.hourly_rate_local_gpu_eur,
|
| 521 |
+
"grid_intensity_local": pricing_defaults.grid_intensity_local,
|
| 522 |
+
"grid_intensity_cloud": pricing_defaults.grid_intensity_cloud,
|
| 523 |
+
},
|
| 524 |
+
}
|
| 525 |
+
|
| 526 |
# Scatter 2 : ratio longueur vs score d'ancrage (moteurs)
|
| 527 |
ratio_vs_anchor = []
|
| 528 |
for report in benchmark.engine_reports:
|
|
|
|
| 562 |
# Sprint 10
|
| 563 |
"gini_vs_cer": gini_vs_cer,
|
| 564 |
"ratio_vs_anchor": ratio_vs_anchor,
|
| 565 |
+
# Sprint 19 — vue Pareto coût/qualité avec variantes d'axe
|
| 566 |
+
"pareto": pareto_data,
|
| 567 |
}
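The three point-building loops in this hunk (cost, speed, CO₂) differ only in where they read the X value. One possible way to express the shared pattern is sketched below. `build_points` is a hypothetical helper, not something this commit adds, and the sample `engines_summary` entries are invented.

```python
from typing import Callable, Optional


def build_points(engines_summary: list, x_key: str,
                 get_x: Callable[[dict], Optional[float]]) -> list:
    """Collect {engine, cer, <x_key>} dicts, skipping engines with missing values."""
    points = []
    for entry in engines_summary:
        cer = entry.get("cer")
        x = get_x(entry)
        if cer is None or x is None:
            continue
        points.append({"engine": entry["name"], "cer": cer, x_key: x})
    return points


# Invented sample data shaped like the annotated engine summaries.
engines_summary = [
    {"name": "tesseract", "cer": 0.12, "mean_duration_seconds": 1.8,
     "cost": {"cost_per_1k_pages_eur": 0.04, "co2_per_1k_pages_g": 0.696}},
    {"name": "custom_engine", "cer": 0.07, "mean_duration_seconds": 4.2,
     "cost": {"cost_per_1k_pages_eur": None, "co2_per_1k_pages_g": None}},
]

cost_points = build_points(
    engines_summary, "cost",
    lambda e: (e.get("cost") or {}).get("cost_per_1k_pages_eur"))
speed_points = build_points(
    engines_summary, "dur", lambda e: e.get("mean_duration_seconds"))
co2_points = build_points(
    engines_summary, "co2",
    lambda e: (e.get("cost") or {}).get("co2_per_1k_pages_g"))
print(len(cost_points), len(speed_points), len(co2_points))
```

An engine without a price entry still appears on the speed axis, since each axis filters on its own X value only.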
|
| 568 |
|
| 569 |
|
|
@@ -65,6 +65,7 @@
|
|
| 65 |
"h_image": "Original Image",
|
| 66 |
"h_line_metrics": "Error Distribution by Line",
|
| 67 |
"h_pairwise": "Wilcoxon Tests — pairwise comparisons",
|
|
|
|
| 68 |
"h_quality_cer": "Image Quality ↔ CER (scatter plot)",
|
| 69 |
"h_radar": "Engine Profile (radar)",
|
| 70 |
"h_ranking": "Engine Ranking",
|
|
@@ -90,6 +91,14 @@
|
|
| 90 |
"no_line_metrics": "No line metrics available.",
|
| 91 |
"no_scatter": "Data not available.",
|
| 92 |
"pairwise_note": "Wilcoxon signed-rank test (non-parametric). Threshold α = 0.05.",
|
|
|
|
| 93 |
"percentile_title": "CER PERCENTILES",
|
| 94 |
"proportion_col": "Proportion",
|
| 95 |
"quality_cer_note": "Each point = one document. X-axis = image quality score [0–1]. Y-axis = CER. Negative correlation expected.",
|
|
|
|
| 65 |
"h_image": "Original Image",
|
| 66 |
"h_line_metrics": "Error Distribution by Line",
|
| 67 |
"h_pairwise": "Wilcoxon Tests — pairwise comparisons",
|
| 68 |
+
"h_pareto": "Quality / cost trade-off",
|
| 69 |
"h_quality_cer": "Image Quality ↔ CER (scatter plot)",
|
| 70 |
"h_radar": "Engine Profile (radar)",
|
| 71 |
"h_ranking": "Engine Ranking",
|
|
|
|
| 91 |
"no_line_metrics": "No line metrics available.",
|
| 92 |
"no_scatter": "Data not available.",
|
| 93 |
"pairwise_note": "Wilcoxon signed-rank test (non-parametric). Threshold α = 0.05.",
|
| 94 |
+
"pareto_assumptions_summary": "Detailed assumptions per engine",
|
| 95 |
+
"pareto_axis_co2": "Carbon (g CO₂)",
|
| 96 |
+
"pareto_axis_cost": "Cost € / 1000 pages",
|
| 97 |
+
"pareto_axis_speed": "Speed (s / page)",
|
| 98 |
+
"pareto_dominated_label": "Dominated",
|
| 99 |
+
"pareto_empty": "Not enough data for this view.",
|
| 100 |
+
"pareto_front_label": "Pareto front",
|
| 101 |
+
"pareto_note": "Engines on the Pareto front (highlighted) are those that no other engine matches or beats on both CER and the selected X-axis metric while being strictly better on at least one of them. Prices are indicative (internal table, dated). Carbon mode is experimental.",
|
| 102 |
"percentile_title": "CER PERCENTILES",
|
| 103 |
"proportion_col": "Proportion",
|
| 104 |
"quality_cer_note": "Each point = one document. X-axis = image quality score [0–1]. Y-axis = CER. Negative correlation expected.",
|
|
@@ -65,6 +65,7 @@
|
|
| 65 |
"h_image": "Image originale",
|
| 66 |
"h_line_metrics": "Distribution des erreurs par ligne",
|
| 67 |
"h_pairwise": "Tests de Wilcoxon — comparaisons par paires",
|
|
|
|
| 68 |
"h_quality_cer": "Qualité image ↔ CER (scatter plot)",
|
| 69 |
"h_radar": "Profil des moteurs (radar)",
|
| 70 |
"h_ranking": "Classement des moteurs",
|
|
@@ -90,6 +91,14 @@
|
|
| 90 |
"no_line_metrics": "Aucune métrique de ligne disponible.",
|
| 91 |
"no_scatter": "Données non disponibles.",
|
| 92 |
"pairwise_note": "Test signé-rangé de Wilcoxon (non-paramétrique). Seuil α = 0.05.",
|
|
|
|
| 93 |
"percentile_title": "PERCENTILES CER",
|
| 94 |
"proportion_col": "Proportion",
|
| 95 |
"quality_cer_note": "Chaque point = un document. Axe X = score qualité image [0–1]. Axe Y = CER. Corrélation négative attendue.",
|
|
|
|
| 65 |
"h_image": "Image originale",
|
| 66 |
"h_line_metrics": "Distribution des erreurs par ligne",
|
| 67 |
"h_pairwise": "Tests de Wilcoxon — comparaisons par paires",
|
| 68 |
+
"h_pareto": "Compromis qualité / coût",
|
| 69 |
"h_quality_cer": "Qualité image ↔ CER (scatter plot)",
|
| 70 |
"h_radar": "Profil des moteurs (radar)",
|
| 71 |
"h_ranking": "Classement des moteurs",
|
|
|
|
| 91 |
"no_line_metrics": "Aucune métrique de ligne disponible.",
|
| 92 |
"no_scatter": "Données non disponibles.",
|
| 93 |
"pairwise_note": "Test signé-rangé de Wilcoxon (non-paramétrique). Seuil α = 0.05.",
|
| 94 |
+
"pareto_assumptions_summary": "Hypothèses détaillées par moteur",
|
| 95 |
+
"pareto_axis_co2": "Carbone (g CO₂)",
|
| 96 |
+
"pareto_axis_cost": "Coût € / 1000 pages",
|
| 97 |
+
"pareto_axis_speed": "Vitesse (s / page)",
|
| 98 |
+
"pareto_dominated_label": "Dominés",
|
| 99 |
+
"pareto_empty": "Données insuffisantes pour cette vue.",
|
| 100 |
+
"pareto_front_label": "Front Pareto",
|
| 101 |
+
"pareto_note": "Les moteurs sur la frontière de Pareto (en évidence) sont ceux pour lesquels aucun autre moteur n'offre simultanément un meilleur CER ET un meilleur coût. Prix indicatifs (table interne, datée). Le mode carbone est expérimental.",
|
| 102 |
"percentile_title": "PERCENTILES CER",
|
| 103 |
"proportion_col": "Proportion",
|
| 104 |
"quality_cer_note": "Chaque point = un document. Axe X = score qualité image [0–1]. Axe Y = CER. Corrélation négative attendue.",
|
|
@@ -1704,6 +1704,122 @@ function toggleCDDHelp() {
|
|
| 1704 |
el.hidden = !el.hidden;
|
| 1705 |
}
|
| 1706 |
|
|
| 1707 |
// ── Sprint 7 — Mode présentation ────────────────────────────────
|
| 1708 |
let presentMode = false;
|
| 1709 |
function togglePresentMode() {
|
|
@@ -2071,6 +2187,8 @@ function init() {
|
|
| 2071 |
renderRobustMetrics();
|
| 2072 |
renderGallery();
|
| 2073 |
buildDocList();
|
|
|
|
|
|
|
| 2074 |
|
| 2075 |
// Restaurer l'état depuis l'URL
|
| 2076 |
const { view, params } = readURLState();
|
|
|
|
| 1704 |
el.hidden = !el.hidden;
|
| 1705 |
}
|
| 1706 |
|
| 1707 |
+
// ── Sprint 19 — Vue Pareto coût/qualité ─────────────────────────
|
| 1708 |
+
let _paretoChart = null;
|
| 1709 |
+
let _paretoAxis = 'cost';
|
| 1710 |
+
|
| 1711 |
+
function setParetoAxis(axis) {
|
| 1712 |
+
_paretoAxis = axis;
|
| 1713 |
+
document.querySelectorAll('.pareto-toggle').forEach(btn => {
|
| 1714 |
+
btn.classList.toggle('active', btn.dataset.axis === axis);
|
| 1715 |
+
});
|
| 1716 |
+
renderParetoChart();
|
| 1717 |
+
renderParetoAssumptions();
|
| 1718 |
+
}
|
| 1719 |
+
|
| 1720 |
+
function _paretoAxisConfig(axis) {
|
| 1721 |
+
const pareto = (DATA.pareto || {})[axis] || {};
|
| 1722 |
+
const xKey = axis === 'cost' ? 'cost' : (axis === 'speed' ? 'dur' : 'co2');
|
| 1723 |
+
const xLabel = pareto.axis_label ||
|
| 1724 |
+
(I18N['pareto_axis_' + axis] || axis);
|
| 1725 |
+
return { pareto, xKey, xLabel };
|
| 1726 |
+
}
|
| 1727 |
+
|
| 1728 |
+
function renderParetoChart() {
|
| 1729 |
+
const canvas = document.getElementById('pareto-chart');
|
| 1730 |
+
if (!canvas || !window.Chart || !DATA.pareto) return;
|
| 1731 |
+
|
| 1732 |
+
const { pareto, xKey, xLabel } = _paretoAxisConfig(_paretoAxis);
|
| 1733 |
+
const points = pareto.points || [];
|
| 1734 |
+
const frontNames = new Set(pareto.front || []);
|
| 1735 |
+
|
| 1736 |
+
if (_paretoChart) { _paretoChart.destroy(); _paretoChart = null; }
|
| 1737 |
+
if (points.length === 0) {
|
| 1738 |
+
const ctx = canvas.getContext('2d');
|
| 1739 |
+
ctx.clearRect(0, 0, canvas.width, canvas.height);
|
| 1740 |
+
ctx.fillStyle = '#64748b';
|
| 1741 |
+
ctx.font = '13px system-ui, sans-serif';
|
| 1742 |
+
ctx.fillText(I18N.pareto_empty || 'Données insuffisantes pour cette vue.', 10, 30);
|
| 1743 |
+
return;
|
| 1744 |
+
}
|
| 1745 |
+
|
| 1746 |
+
const frontPts = points.filter(p => frontNames.has(p.engine));
|
| 1747 |
+
const otherPts = points.filter(p => !frontNames.has(p.engine));
|
| 1748 |
+
|
| 1749 |
+
_paretoChart = new Chart(canvas.getContext('2d'), {
|
| 1750 |
+
type: 'scatter',
|
| 1751 |
+
data: {
|
| 1752 |
+
datasets: [
|
| 1753 |
+
{
|
| 1754 |
+
label: I18N.pareto_front_label || 'Front Pareto',
|
| 1755 |
+
data: frontPts.map(p => ({ x: p[xKey], y: p.cer * 100, engine: p.engine })),
|
| 1756 |
+
backgroundColor: '#16a34a',
|
| 1757 |
+
borderColor: '#166534',
|
| 1758 |
+
pointRadius: 8,
|
| 1759 |
+
pointHoverRadius: 10,
|
| 1760 |
+
},
|
| 1761 |
+
{
|
| 1762 |
+
label: I18N.pareto_dominated_label || 'Dominés',
|
| 1763 |
+
data: otherPts.map(p => ({ x: p[xKey], y: p.cer * 100, engine: p.engine })),
|
| 1764 |
+
backgroundColor: '#94a3b8',
|
| 1765 |
+
borderColor: '#64748b',
|
| 1766 |
+
pointRadius: 6,
|
| 1767 |
+
pointHoverRadius: 8,
|
| 1768 |
+
},
|
| 1769 |
+
],
|
| 1770 |
+
},
|
| 1771 |
+
options: {
|
| 1772 |
+
responsive: true,
|
| 1773 |
+
maintainAspectRatio: false,
|
| 1774 |
+
plugins: {
|
| 1775 |
+
legend: { position: 'bottom' },
|
| 1776 |
+
tooltip: {
|
| 1777 |
+
callbacks: {
|
| 1778 |
+
label: ctx => {
|
| 1779 |
+
const p = ctx.raw;
|
| 1780 |
+
return p.engine + ' — CER ' + p.y.toFixed(2) + ' %, ' +
|
| 1781 |
+
xLabel + ' : ' + p.x.toFixed(2);
|
| 1782 |
+
},
|
| 1783 |
+
},
|
| 1784 |
+
},
|
| 1785 |
+
},
|
| 1786 |
+
scales: {
|
| 1787 |
+
x: {
|
| 1788 |
+
type: _paretoAxis === 'cost' ? 'logarithmic' : 'linear',
|
| 1789 |
+
title: { display: true, text: xLabel },
|
| 1790 |
+
},
|
| 1791 |
+
y: {
|
| 1792 |
+
title: { display: true, text: I18N.col_cer || 'CER (%)' },
|
| 1793 |
+
ticks: { callback: v => v + ' %' },
|
| 1794 |
+
},
|
| 1795 |
+
},
|
| 1796 |
+
},
|
| 1797 |
+
});
|
| 1798 |
+
}
|
| 1799 |
+
|
| 1800 |
+
function renderParetoAssumptions() {
|
| 1801 |
+
const ul = document.getElementById('pareto-assumptions-list');
|
| 1802 |
+
if (!ul) return;
|
| 1803 |
+
ul.innerHTML = '';
|
| 1804 |
+
(DATA.engines || []).forEach(e => {
|
| 1805 |
+
const c = e.cost || {};
|
| 1806 |
+
const parts = [];
|
| 1807 |
+
if (c.cost_per_1k_pages_eur != null) {
|
| 1808 |
+
parts.push((c.cost_per_1k_pages_eur).toFixed(2) + ' €/1000 pages');
|
| 1809 |
+
}
|
| 1810 |
+
if (c.type) parts.push(c.type);
|
| 1811 |
+
if (c.pricing_source_url) {
|
| 1812 |
+
parts.push('<a href="' + c.pricing_source_url + '" target="_blank" rel="noopener">' +
|
| 1813 |
+
(c.pricing_date || 'source') + '</a>');
|
| 1814 |
+
}
|
| 1815 |
+
const assumptions = (c.assumptions || []).join(' ');
|
| 1816 |
+
const li = document.createElement('li');
|
| 1817 |
+
li.innerHTML = '<strong>' + e.name + '</strong> — ' + parts.join(' · ') +
|
| 1818 |
+
(assumptions ? ' <em>' + assumptions + '</em>' : '');
|
| 1819 |
+
ul.appendChild(li);
|
| 1820 |
+
});
|
| 1821 |
+
}
|
| 1822 |
+
|
| 1823 |
// ── Sprint 7 — Mode présentation ────────────────────────────────
|
| 1824 |
let presentMode = false;
|
| 1825 |
function togglePresentMode() {
|
|
|
|
| 2187 |
renderRobustMetrics();
|
| 2188 |
renderGallery();
|
| 2189 |
buildDocList();
|
| 2190 |
+
renderParetoChart();
|
| 2191 |
+
renderParetoAssumptions();
|
| 2192 |
|
| 2193 |
// Restaurer l'état depuis l'URL
|
| 2194 |
const { view, params } = readURLState();
|
|
@@ -669,3 +669,56 @@ body.present-mode .cdd-help { display: none !important; }
|
|
| 669 |
.synth-list li::marker { color: #2563eb; }
|
| 670 |
|
| 671 |
body.present-mode .synth-hint { display: none; }
|
| 672 |
+
|
| 673 |
+
/* ── Sprint 19: Pareto cost/quality view ────────────────────── */
|
| 674 |
+
.pareto-card {
|
| 675 |
+
border-left: 4px solid #16a34a;
|
| 676 |
+
}
|
| 677 |
+
.pareto-toolbar {
|
| 678 |
+
display: flex; gap: .5rem; flex-wrap: wrap;
|
| 679 |
+
margin: .5rem 0 .75rem;
|
| 680 |
+
}
|
| 681 |
+
.pareto-toggle {
|
| 682 |
+
border: 1px solid var(--border, #e2e8f0);
|
| 683 |
+
background: #fff;
|
| 684 |
+
padding: .35rem .75rem;
|
| 685 |
+
border-radius: 6px;
|
| 686 |
+
font-size: .82rem;
|
| 687 |
+
cursor: pointer;
|
| 688 |
+
color: var(--text, #0f172a);
|
| 689 |
+
}
|
| 690 |
+
.pareto-toggle:hover { background: #f1f5f9; }
|
| 691 |
+
.pareto-toggle.active {
|
| 692 |
+
background: #16a34a; color: #fff; border-color: #166534;
|
| 693 |
+
}
|
| 694 |
+
.pareto-toggle.pareto-experimental::after {
|
| 695 |
+
content: " ⚗";
|
| 696 |
+
color: #f59e0b;
|
| 697 |
+
margin-left: .15rem;
|
| 698 |
+
}
|
| 699 |
+
.chart-canvas-wrap {
|
| 700 |
+
position: relative;
|
| 701 |
+
height: 360px;
|
| 702 |
+
}
|
| 703 |
+
.pareto-note {
|
| 704 |
+
font-size: .75rem;
|
| 705 |
+
color: var(--text-muted, #64748b);
|
| 706 |
+
font-style: italic;
|
| 707 |
+
margin-top: .5rem;
|
| 708 |
+
line-height: 1.4;
|
| 709 |
+
}
|
| 710 |
+
.pareto-assumptions {
|
| 711 |
+
margin-top: .5rem;
|
| 712 |
+
font-size: .78rem;
|
| 713 |
+
color: var(--text, #0f172a);
|
| 714 |
+
}
|
| 715 |
+
.pareto-assumptions summary {
|
| 716 |
+
cursor: pointer;
|
| 717 |
+
font-weight: 600;
|
| 718 |
+
color: var(--text-muted, #64748b);
|
| 719 |
+
}
|
| 720 |
+
.pareto-assumptions ul {
|
| 721 |
+
margin: .4rem 0 0;
|
| 722 |
+
padding-left: 1.1rem;
|
| 723 |
+
}
|
| 724 |
+
.pareto-assumptions li { margin: .2rem 0; line-height: 1.4; }
|
|
@@ -125,6 +125,31 @@
|
|
| 125 |
</div>
|
| 126 |
</div>
|
| 127 |
|
| 128 |
+
<!-- Sprint 19: Pareto cost/quality view ────────────────────────── -->
|
| 129 |
+
<div class="chart-card pareto-card" style="grid-column:1/-1">
|
| 130 |
+
<h3 data-i18n="h_pareto">Compromis qualité / coût</h3>
|
| 131 |
+
<div class="pareto-toolbar">
|
| 132 |
+
<button class="pareto-toggle active" data-axis="cost" onclick="setParetoAxis('cost')"
|
| 133 |
+
data-i18n="pareto_axis_cost">Coût € / 1000 pages</button>
|
| 134 |
+
<button class="pareto-toggle" data-axis="speed" onclick="setParetoAxis('speed')"
|
| 135 |
+
data-i18n="pareto_axis_speed">Vitesse (s / page)</button>
|
| 136 |
+
<button class="pareto-toggle pareto-experimental" data-axis="co2"
|
| 137 |
+
onclick="setParetoAxis('co2')" data-i18n="pareto_axis_co2"
|
| 138 |
+
title="Estimation expérimentale">Carbone (g CO₂)</button>
|
| 139 |
+
</div>
|
| 140 |
+
<div class="chart-canvas-wrap"><canvas id="pareto-chart"></canvas></div>
|
| 141 |
+
<div id="pareto-method-note" class="pareto-note" data-i18n="pareto_note">
|
| 142 |
+
Les moteurs sur la frontière de Pareto (en évidence) sont ceux pour
|
| 143 |
+
lesquels aucun autre moteur n'offre simultanément un meilleur CER ET
|
| 144 |
+
un meilleur coût. Prix indicatifs (table interne, datée). Le mode
|
| 145 |
+
carbone est expérimental.
|
| 146 |
+
</div>
|
| 147 |
+
<details class="pareto-assumptions">
|
| 148 |
+
<summary data-i18n="pareto_assumptions_summary">Hypothèses détaillées par moteur</summary>
|
| 149 |
+
<ul id="pareto-assumptions-list"></ul>
|
| 150 |
+
</details>
|
| 151 |
+
</div>
|
| 152 |
+
|
| 153 |
<!-- Sprint 7 — Matrice de corrélation -->
|
| 154 |
<div class="chart-card technical" style="grid-column:1/-1">
|
| 155 |
<h3 data-i18n="h_correlation">Matrice de corrélation entre métriques</h3>
|
|
@@ -87,6 +87,7 @@ picarones = [
|
|
| 87 |
"report/templates/*.js",
|
| 88 |
"report/i18n/*.json",
|
| 89 |
"core/narrative/templates/*.yaml",
|
| 90 |
+
"data/*.yaml",
|
| 91 |
]
|
| 92 |
|
| 93 |
[tool.pytest.ini_options]
|
|
@@ -0,0 +1,371 @@
|
|
| 1 |
+
"""Tests Sprint 20: cost modelling + Pareto view.
|
| 2 |
+
|
| 3 |
+
Sprint 5 of the report plan. Covers:
|
| 4 |
+
1. `pricing.py`: loading the pricing table, local vs. cloud estimation.
|
| 5 |
+
2. `compute_pareto_front`: canonical and degenerate cases.
|
| 6 |
+
3. Narrative detectors `pareto_alternative` and `cost_outlier`.
|
| 7 |
+
4. Report integration: `_build_report_data` (cost, fronts, JSON) and HTML rendering.
|
| 8 |
+
5. Number traceability for the two new narrative templates.
|
| 9 |
+
"""
|
| 10 |
+
|
| 11 |
+
from __future__ import annotations
|
| 12 |
+
|
| 13 |
+
import re
|
| 14 |
+
from pathlib import Path
|
| 15 |
+
|
| 16 |
+
import pytest
|
| 17 |
+
|
| 18 |
+
from picarones.core.narrative import build_synthesis
|
| 19 |
+
from picarones.core.narrative.detectors import (
|
| 20 |
+
detect_cost_outlier,
|
| 21 |
+
detect_pareto_alternative,
|
| 22 |
+
)
|
| 23 |
+
from picarones.core.narrative.facts import FactType
|
| 24 |
+
from picarones.core.pricing import (
|
| 25 |
+
EngineCost,
|
| 26 |
+
build_costs_for_benchmark,
|
| 27 |
+
estimate_cost,
|
| 28 |
+
load_pricing_database,
|
| 29 |
+
)
|
| 30 |
+
from picarones.core.statistics import compute_pareto_front
|
| 31 |
+
|
| 32 |
+
|
| 33 |
+
# ---------------------------------------------------------------------------
|
| 34 |
+
# 1. Pricing
|
| 35 |
+
# ---------------------------------------------------------------------------
|
| 36 |
+
|
| 37 |
+
class TestLoadPricingDatabase:
|
| 38 |
+
def test_default_file_loads(self):
|
| 39 |
+
defaults, table = load_pricing_database()
|
| 40 |
+
assert defaults.currency == "EUR"
|
| 41 |
+
assert defaults.last_updated # must be set
|
| 42 |
+
assert "tesseract" in table
|
| 43 |
+
assert "gpt-4o" in table
|
| 44 |
+
assert "google_vision" in table
|
| 45 |
+
|
| 46 |
+
def test_missing_file_returns_empty(self, tmp_path):
|
| 47 |
+
missing = tmp_path / "nope.yaml"
|
| 48 |
+
defaults, table = load_pricing_database(missing)
|
| 49 |
+
assert table == {}
|
| 50 |
+
assert defaults.currency == "EUR" # fallback
|
| 51 |
+
|
| 52 |
+
|
| 53 |
+
class TestEstimateCost:
|
| 54 |
+
def test_cloud_api_uses_listed_price(self):
|
| 55 |
+
cost = estimate_cost("google_vision")
|
| 56 |
+
assert cost.type == "cloud_api"
|
| 57 |
+
assert cost.cost_per_1k_pages_eur > 0
|
| 58 |
+
assert cost.pricing_source_url is not None
|
| 59 |
+
assert cost.api_price_per_1k_pages == cost.cost_per_1k_pages_eur
|
| 60 |
+
|
| 61 |
+
def test_local_engine_uses_seconds_times_rate(self):
|
| 62 |
+
cost = estimate_cost("tesseract")
|
| 63 |
+
assert cost.type == "local"
|
| 64 |
+
# 2s/page × 1000 pages / 3600 × 0.08 €/h ≈ 0.044 €
|
| 65 |
+
assert cost.cost_per_1k_pages_eur == pytest.approx(0.044, abs=0.01)
|
| 66 |
+
assert "Temps d'inférence" in " ".join(cost.assumptions)
|
| 67 |
+
|
| 68 |
+
def test_measured_seconds_override_indicative(self):
|
| 69 |
+
cost = estimate_cost("tesseract", measured_seconds_per_page=10.0)
|
| 70 |
+
# Rate = 0.08 €/h → 10 × 1000 / 3600 × 0.08 ≈ 0.22 €
|
| 71 |
+
assert cost.cost_per_1k_pages_eur == pytest.approx(0.222, abs=0.01)
|
| 72 |
+
assert "mesuré" in " ".join(cost.assumptions)
|
| 73 |
+
|
| 74 |
+
def test_pipeline_prefers_llm_model(self):
|
| 75 |
+
cost = estimate_cost(
|
| 76 |
+
engine_name="tesseract → gpt-4o",
|
| 77 |
+
llm_model="gpt-4o",
|
| 78 |
+
is_pipeline=True,
|
| 79 |
+
)
|
| 80 |
+
assert cost.engine_key == "gpt-4o"
|
| 81 |
+
assert cost.type == "cloud_api"
|
| 82 |
+
|
| 83 |
+
def test_unknown_engine_returns_unknown_type(self):
|
| 84 |
+
cost = estimate_cost("totally-not-a-real-engine")
|
| 85 |
+
assert cost.type == "unknown"
|
| 86 |
+
assert cost.cost_per_1k_pages_eur is None
|
| 87 |
+
assert "Aucune entrée" in " ".join(cost.assumptions)
|
| 88 |
+
|
| 89 |
+
def test_hourly_rate_override(self):
|
| 90 |
+
cheap = estimate_cost("tesseract", hourly_rate_override_eur=0.01)
|
| 91 |
+
expensive = estimate_cost("tesseract", hourly_rate_override_eur=10.0)
|
| 92 |
+
assert expensive.cost_per_1k_pages_eur > cheap.cost_per_1k_pages_eur
|
| 93 |
+
|
| 94 |
+
def test_carbon_estimate_computed(self):
|
| 95 |
+
cost = estimate_cost("gpt-4o")
|
| 96 |
+
assert cost.co2_per_1k_pages_g is not None
|
| 97 |
+
assert cost.co2_per_1k_pages_g > 0
|
| 98 |
+
# kWh × grid intensity: positive and consistent
|
| 99 |
+
expected = cost.kwh_per_1k_pages * cost.grid_intensity_g_co2_per_kwh
|
| 100 |
+
assert cost.co2_per_1k_pages_g == pytest.approx(expected)
|
| 101 |
+
|
| 102 |
+
|
| 103 |
+
class TestBuildCostsForBenchmark:
|
| 104 |
+
def test_annotates_all_engines(self):
|
| 105 |
+
engines = [
|
| 106 |
+
{"name": "tesseract", "is_pipeline": False, "pipeline_info": {}},
|
| 107 |
+
{"name": "pipeline", "is_pipeline": True,
|
| 108 |
+
"pipeline_info": {"llm_model": "gpt-4o"}},
|
| 109 |
+
]
|
| 110 |
+
durations = {"tesseract": 1.5, "pipeline": 12.0}
|
| 111 |
+
costs = build_costs_for_benchmark(engines, durations)
|
| 112 |
+
assert "tesseract" in costs
|
| 113 |
+
assert "pipeline" in costs
|
| 114 |
+
assert costs["tesseract"]["type"] == "local"
|
| 115 |
+
assert costs["pipeline"]["type"] == "cloud_api"
|
| 116 |
+
|
| 117 |
+
|
| 118 |
+
# ---------------------------------------------------------------------------
|
| 119 |
+
# 2. Pareto
|
| 120 |
+
# ---------------------------------------------------------------------------
|
| 121 |
+
|
| 122 |
+
class TestComputeParetoFront:
|
| 123 |
+
def test_trivial_front(self):
|
| 124 |
+
points = [
|
| 125 |
+
{"engine": "A", "cer": 0.05, "cost": 1.0}, # best CER
|
| 126 |
+
{"engine": "B", "cer": 0.10, "cost": 0.1}, # best cost
|
| 127 |
+
{"engine": "C", "cer": 0.08, "cost": 2.0}, # dominated by A
|
| 128 |
+
]
|
| 129 |
+
front = compute_pareto_front(points)
|
| 130 |
+
assert set(front) == {"A", "B"}
|
| 131 |
+
|
| 132 |
+
def test_empty_input(self):
|
| 133 |
+
assert compute_pareto_front([]) == []
|
| 134 |
+
|
| 135 |
+
def test_single_point_is_its_own_front(self):
|
| 136 |
+
assert compute_pareto_front([{"engine": "X", "cer": 0.1, "cost": 1.0}]) == ["X"]
|
| 137 |
+
|
| 138 |
+
def test_skips_points_with_missing_values(self):
|
| 139 |
+
points = [
|
| 140 |
+
{"engine": "A", "cer": 0.05, "cost": 1.0},
|
| 141 |
+
{"engine": "B", "cost": 0.5}, # no cer
|
| 142 |
+
{"engine": "C", "cer": 0.10}, # no cost
|
| 143 |
+
]
|
| 144 |
+
front = compute_pareto_front(points)
|
| 145 |
+
assert front == ["A"]
|
| 146 |
+
|
| 147 |
+
def test_three_dimensional_front(self):
|
| 148 |
+
# 3 objectives to minimize: checks that the front computation works for k > 2
|
| 149 |
+
points = [
|
| 150 |
+
{"name": "A", "a": 1, "b": 10, "c": 100}, # best on a
|
| 151 |
+
{"name": "B", "a": 10, "b": 1, "c": 100}, # best on b
|
| 152 |
+
{"name": "C", "a": 10, "b": 10, "c": 1}, # best on c
|
| 153 |
+
{"name": "D", "a": 20, "b": 20, "c": 200}, # dominated on every axis
|
| 154 |
+
]
|
| 155 |
+
front = compute_pareto_front(
|
| 156 |
+
points, objectives=("a", "b", "c"), name_key="name",
|
| 157 |
+
)
|
| 158 |
+
assert set(front) == {"A", "B", "C"}
|
| 159 |
+
assert "D" not in front
|
| 160 |
+
|
| 161 |
+
def test_mixed_min_max(self):
|
| 162 |
+
# Minimize CER, maximize anchor
|
| 163 |
+
points = [
|
| 164 |
+
{"engine": "A", "cer": 0.05, "anchor": 0.95}, # best CER
|
| 165 |
+
{"engine": "B", "cer": 0.10, "anchor": 0.85}, # dominated by A
|
| 166 |
+
{"engine": "C", "cer": 0.08, "anchor": 0.99}, # best anchor
|
| 167 |
+
]
|
| 168 |
+
front = compute_pareto_front(
|
| 169 |
+
points,
|
| 170 |
+
objectives=("cer", "anchor"),
|
| 171 |
+
minimize=(True, False),
|
| 172 |
+
)
|
| 173 |
+
assert set(front) == {"A", "C"}
|
| 174 |
+
|
| 175 |
+
def test_minimize_length_mismatch_raises(self):
|
| 176 |
+
with pytest.raises(ValueError):
|
| 177 |
+
compute_pareto_front([{"engine": "A", "cer": 0.1, "cost": 1.0}],
|
| 178 |
+
objectives=("cer", "cost"),
|
| 179 |
+
minimize=(True,))
|
| 180 |
+
|
| 181 |
+
|
| 182 |
+
# ---------------------------------------------------------------------------
|
| 183 |
+
# 3. Narrative detectors: Pareto / cost
|
| 184 |
+
# ---------------------------------------------------------------------------
|
| 185 |
+
|
| 186 |
+
def _pareto_data(cost_points, front=None, speed_points=None, co2_points=None):
|
| 187 |
+
return {
|
| 188 |
+
"ranking": [{"engine": p["engine"], "mean_cer": p["cer"],
|
| 189 |
+
"documents": 10, "failed": 0} for p in cost_points],
|
| 190 |
+
"pareto": {
|
| 191 |
+
"cost": {"points": cost_points, "front": front or [p["engine"] for p in cost_points]},
|
| 192 |
+
"speed": {"points": speed_points or [], "front": []},
|
| 193 |
+
"co2": {"points": co2_points or [], "front": []},
|
| 194 |
+
},
|
| 195 |
+
}
|
| 196 |
+
|
| 197 |
+
|
| 198 |
+
class TestDetectParetoAlternative:
|
| 199 |
+
def test_emits_when_alt_is_cheaper(self):
|
| 200 |
+
data = _pareto_data(
|
| 201 |
+
[
|
| 202 |
+
{"engine": "best", "cer": 0.02, "cost": 5.0},
|
| 203 |
+
{"engine": "cheap", "cer": 0.04, "cost": 0.1},
|
| 204 |
+
{"engine": "dominated", "cer": 0.05, "cost": 3.0},
|
| 205 |
+
],
|
| 206 |
+
front=["best", "cheap"],
|
| 207 |
+
)
|
| 208 |
+
# Force "best" as the leader
|
| 209 |
+
data["ranking"] = [
|
| 210 |
+
{"engine": "best", "mean_cer": 0.02, "documents": 10, "failed": 0},
|
| 211 |
+
{"engine": "cheap", "mean_cer": 0.04, "documents": 10, "failed": 0},
|
| 212 |
+
{"engine": "dominated", "mean_cer": 0.05, "documents": 10, "failed": 0},
|
| 213 |
+
]
|
| 214 |
+
facts = detect_pareto_alternative(data)
|
| 215 |
+
assert len(facts) == 1
|
| 216 |
+
assert facts[0].payload["engine"] == "cheap"
|
| 217 |
+
assert facts[0].payload["leader"] == "best"
|
| 218 |
+
assert facts[0].payload["cost_saving_ratio"] >= 10
|
| 219 |
+
|
| 220 |
+
def test_empty_when_front_has_only_leader(self):
|
| 221 |
+
data = _pareto_data(
|
| 222 |
+
[{"engine": "best", "cer": 0.02, "cost": 5.0}],
|
| 223 |
+
front=["best"],
|
| 224 |
+
)
|
| 225 |
+
assert detect_pareto_alternative(data) == []
|
| 226 |
+
|
| 227 |
+
def test_empty_when_no_pareto_section(self):
|
| 228 |
+
assert detect_pareto_alternative({}) == []
|
| 229 |
+
|
| 230 |
+
|
| 231 |
+
class TestDetectCostOutlier:
|
| 232 |
+
def test_flags_expensive_dominated_engine(self):
|
| 233 |
+
data = _pareto_data(
|
| 234 |
+
[
|
| 235 |
+
{"engine": "cheap", "cer": 0.05, "cost": 0.1},
|
| 236 |
+
{"engine": "normal", "cer": 0.08, "cost": 1.0},
|
| 237 |
+
{"engine": "expensive_bad", "cer": 0.15, "cost": 20.0},
|
| 238 |
+
],
|
| 239 |
+
front=["cheap"],
|
| 240 |
+
)
|
| 241 |
+
facts = detect_cost_outlier(data)
|
| 242 |
+
assert any(f.payload["engine"] == "expensive_bad" for f in facts)
|
| 243 |
+
|
| 244 |
+
def test_does_not_flag_expensive_on_front(self):
|
| 245 |
+
# An expensive engine that sits on the front: its cost is justified by unmatched quality
|
| 246 |
+
data = _pareto_data(
|
| 247 |
+
[
|
| 248 |
+
{"engine": "cheap", "cer": 0.30, "cost": 0.1},
|
| 249 |
+
{"engine": "normal", "cer": 0.15, "cost": 1.0},
|
| 250 |
+
{"engine": "expensive_best", "cer": 0.02, "cost": 20.0},
|
| 251 |
+
],
|
| 252 |
+
front=["cheap", "expensive_best"],
|
| 253 |
+
)
|
| 254 |
+
facts = detect_cost_outlier(data)
|
| 255 |
+
names = {f.payload["engine"] for f in facts}
|
| 256 |
+
assert "expensive_best" not in names
|
| 257 |
+
|
| 258 |
+
|
| 259 |
+
# ---------------------------------------------------------------------------
|
| 260 |
+
# 4. HTML report integration
|
| 261 |
+
# ---------------------------------------------------------------------------
|
| 262 |
+
|
| 263 |
+
@pytest.fixture(scope="module")
|
| 264 |
+
def benchmark_result():
|
| 265 |
+
from picarones import fixtures
|
| 266 |
+
return fixtures.generate_sample_benchmark(n_docs=8)
|
| 267 |
+
|
| 268 |
+
|
| 269 |
+
class TestReportIntegration:
|
| 270 |
+
def test_report_contains_pareto_card(self, benchmark_result, tmp_path):
|
| 271 |
+
from picarones.report.generator import ReportGenerator
|
| 272 |
+
out = tmp_path / "report.html"
|
| 273 |
+
ReportGenerator(benchmark_result).generate(out)
|
| 274 |
+
html = out.read_text(encoding="utf-8")
|
| 275 |
+
assert 'class="chart-card pareto-card"' in html
|
| 276 |
+
assert 'id="pareto-chart"' in html
|
| 277 |
+
assert 'setParetoAxis(\'cost\')' in html
|
| 278 |
+
assert 'setParetoAxis(\'speed\')' in html
|
| 279 |
+
assert 'setParetoAxis(\'co2\')' in html
|
| 280 |
+
assert "pareto-experimental" in html # "experimental" label
|
| 281 |
+
|
| 282 |
+
def test_report_json_contains_pareto_data(self, benchmark_result):
|
| 283 |
+
from picarones.report.generator import _build_report_data
|
| 284 |
+
data = _build_report_data(benchmark_result, images_b64={})
|
| 285 |
+
pareto = data.get("pareto", {})
|
| 286 |
+
assert "cost" in pareto
|
| 287 |
+
assert "speed" in pareto
|
| 288 |
+
assert "co2" in pareto
|
| 289 |
+
assert "pricing_meta" in pareto
|
| 290 |
+
# Every engine must carry its cost field
|
| 291 |
+
for e in data["engines"]:
|
| 292 |
+
assert "cost" in e, f"Moteur {e.get('name')} sans champ cost"
|
| 293 |
+
|
| 294 |
+
def test_synthesis_may_include_pareto_sentence(self, benchmark_result, tmp_path):
|
| 295 |
+
# On the demo fixture, pero_ocr and tesseract are both on the front, so the
|
| 296 |
+
# synthesis should surface a cheaper alternative
|
| 297 |
+
from picarones.report.generator import ReportGenerator
|
| 298 |
+
out = tmp_path / "report.html"
|
| 299 |
+
ReportGenerator(benchmark_result).generate(out)
|
| 300 |
+
html = out.read_text(encoding="utf-8")
|
| 301 |
+
m = re.search(r'<ul class="synth-list">(.*?)</ul>', html, re.DOTALL)
|
| 302 |
+
assert m
|
| 303 |
+
ul_content = m.group(1)
|
| 304 |
+
# "compromis" is not hard-coded here (it depends on i18n); the check only
|
| 305 |
+
# requires "€" or the engine name "pero_ocr" (a sign pareto_alternative fired)
|
| 306 |
+
assert "€" in ul_content or "pero_ocr" in ul_content
|
| 307 |
+
|
| 308 |
+
def test_pricing_yaml_is_packaged(self):
|
| 309 |
+
"""Safeguard: the YAML file must be accessible from the package."""
|
| 310 |
+
from picarones.core.pricing import _DEFAULT_PRICING_PATH
|
| 311 |
+
assert Path(_DEFAULT_PRICING_PATH).exists()
|
| 312 |
+
|
| 313 |
+
def test_english_locale_renders_pareto_labels(self, benchmark_result, tmp_path):
|
| 314 |
+
from picarones.report.generator import ReportGenerator
|
| 315 |
+
out = tmp_path / "report_en.html"
|
| 316 |
+
ReportGenerator(benchmark_result, lang="en").generate(out)
|
| 317 |
+
html = out.read_text(encoding="utf-8")
|
| 318 |
+
assert 'data-i18n="h_pareto"' in html
|
| 319 |
+
assert 'data-i18n="pareto_axis_cost"' in html
|
| 320 |
+
|
| 321 |
+
|
| 322 |
+
# ---------------------------------------------------------------------------
|
| 323 |
+
# 5. Number traceability (anti-hallucination for the 2 new templates)
|
| 324 |
+
# ---------------------------------------------------------------------------
|
| 325 |
+
|
| 326 |
+
class TestAntiHallucinationOnPareto:
|
| 327 |
+
def test_pareto_alternative_numbers_traceable(self):
|
| 328 |
+
data = _pareto_data(
|
| 329 |
+
[
|
| 330 |
+
{"engine": "A", "cer": 0.02, "cost": 5.0},
|
| 331 |
+
{"engine": "B", "cer": 0.04, "cost": 0.25},
|
| 332 |
+
],
|
| 333 |
+
front=["A", "B"],
|
| 334 |
+
)
|
| 335 |
+
data["ranking"] = [
|
| 336 |
+
{"engine": "A", "mean_cer": 0.02, "documents": 10, "failed": 0},
|
| 337 |
+
{"engine": "B", "mean_cer": 0.04, "documents": 10, "failed": 0},
|
| 338 |
+
]
|
| 339 |
+
# Other fields required by build_synthesis
|
| 340 |
+
data.setdefault("meta", {"document_count": 10})
|
| 341 |
+
data.setdefault("engines", [
|
| 342 |
+
{"name": "A", "cer": 0.02},
|
| 343 |
+
{"name": "B", "cer": 0.04},
|
| 344 |
+
])
|
| 345 |
+
data.setdefault("statistics", {
|
| 346 |
+
"pairwise_wilcoxon": [], "bootstrap_cis": [],
|
| 347 |
+
"friedman": {}, "nemenyi": {"tied_groups": [], "mean_ranks": {}},
|
| 348 |
+
})
|
| 349 |
+
data.setdefault("documents", [])
|
| 350 |
+
|
| 351 |
+
result = build_synthesis(data, "fr")
|
| 352 |
+
# Find the Pareto sentence
|
| 353 |
+
pareto_sentences = [s for s in result["sentences"] if "compromis" in s or "€" in s]
|
| 354 |
+
assert pareto_sentences
|
| 355 |
+
# The main numbers must come from the payload: 4 (cer_pct=4), 0.25 (cost),
|
| 356 |
+
# 2 (leader_cer_pct=2), 5 (leader_cost), 20 (ratio=5/0.25)
|
| 357 |
+
facts_by_type = {f["type"]: f for f in result["facts"]}
|
| 358 |
+
assert FactType.PARETO_ALTERNATIVE.value in facts_by_type
|
| 359 |
+
payload = facts_by_type[FactType.PARETO_ALTERNATIVE.value]["payload"]
|
| 360 |
+
sentence = pareto_sentences[0]
|
| 361 |
+
for k in ("cost", "leader_cost", "cer_pct", "leader_cer_pct", "cost_saving_ratio"):
|
| 362 |
+
val = payload.get(k)
|
| 363 |
+
if val is None:
|
| 364 |
+
continue
|
| 365 |
+
# At least one representation of the number must appear
|
| 366 |
+
variants = {str(val), str(float(val)), f"{float(val):.1f}", f"{float(val):.2f}"}
|
| 367 |
+
if val == int(val):
|
| 368 |
+
variants.add(str(int(val)))
|
| 369 |
+
assert any(v in sentence for v in variants), (
|
| 370 |
+
f"Valeur {k}={val} absente de la phrase : {sentence!r}"
|
| 371 |
+
)
|
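
Reading aid for `TestComputeParetoFront`: the real `compute_pareto_front` is imported from `picarones.core.statistics` and is not shown in this test file. The block below is a minimal sketch of a dominance filter that satisfies the cases those tests exercise. It is not the project's implementation. The function name `pareto_front_sketch` is hypothetical, and the defaults (objectives `("cer", "cost")`, `name_key="engine"`, every axis minimized) are assumptions inferred from how the tests call the function.

```python
from typing import Any, List, Mapping, Optional, Sequence


def pareto_front_sketch(
    points: Sequence[Mapping[str, Any]],
    objectives: Sequence[str] = ("cer", "cost"),  # assumed default
    name_key: str = "engine",                     # assumed default
    minimize: Optional[Sequence[bool]] = None,    # None means "minimize all"
) -> List[Any]:
    """Return the names of the non-dominated points (illustrative only)."""
    if minimize is None:
        minimize = (True,) * len(objectives)
    if len(minimize) != len(objectives):
        raise ValueError("minimize must have one flag per objective")

    # Keep only points that define every objective, and negate the
    # maximized objectives so that "smaller is better" on every axis.
    rows = []
    for point in points:
        values = [point.get(key) for key in objectives]
        if any(value is None for value in values):
            continue
        signed = [v if is_min else -v for v, is_min in zip(values, minimize)]
        rows.append((point[name_key], signed))

    def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
        # a dominates b: no worse on every axis, strictly better on one.
        no_worse = all(x <= y for x, y in zip(a, b))
        strictly_better = any(x < y for x, y in zip(a, b))
        return no_worse and strictly_better

    # A point is on the front when no other point dominates it.
    return [
        name
        for name, vec in rows
        if not any(dominates(other, vec) for _, other in rows)
    ]


if __name__ == "__main__":
    demo = [
        {"engine": "A", "cer": 0.05, "cost": 1.0},
        {"engine": "B", "cer": 0.10, "cost": 0.1},
        {"engine": "C", "cer": 0.08, "cost": 2.0},
    ]
    print(pareto_front_sketch(demo))
```

The pairwise comparison is quadratic in the number of engines, which is a reasonable trade-off for a benchmark comparing a handful of engines; identical points do not dominate each other, so ties all stay on the front.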