Merge pull request #54 from maribakulj/claude/repo-analysis-a319T
This view is limited to 50 files because it contains too many changes.
- CLAUDE.md +0 -0
- README.md +1 -1
- SPECS.md +1 -1
- docs/architecture.md +37 -1
- docs/cli-workflows.md +1 -1
- docs/developer/index.md +10 -2
- docs/profiles.md +1 -1
- docs/roadmap/evolution-2026.md +2 -2
- docs/user/writing-a-pipeline-module.md +1 -1
- picarones/measurements/__init__.py +25 -0
- picarones/measurements/runner/__init__.py +103 -0
- picarones/measurements/runner/aggregation.py +82 -0
- picarones/measurements/runner/document.py +190 -0
- picarones/measurements/runner/ner_attach.py +133 -0
- picarones/measurements/{runner.py → runner/orchestration.py} +33 -558
- picarones/measurements/runner/partial.py +140 -0
- picarones/measurements/runner/workers.py +107 -0
- picarones/measurements/statistics.py +0 -1128
- picarones/measurements/statistics/__init__.py +82 -0
- picarones/measurements/statistics/bootstrap.py +47 -0
- picarones/measurements/statistics/cdd_render.py +171 -0
- picarones/measurements/statistics/clustering.py +158 -0
- picarones/measurements/statistics/correlation.py +75 -0
- picarones/measurements/statistics/distributions.py +88 -0
- picarones/measurements/statistics/friedman_nemenyi.py +350 -0
- picarones/measurements/statistics/pareto.py +87 -0
- picarones/measurements/statistics/wilcoxon.py +227 -0
- picarones/report/assets.py +203 -0
- picarones/report/calibration_render.py +2 -16
- picarones/report/error_absorption_render.py +17 -51
- picarones/report/generator.py +178 -775
- picarones/report/image_predictive_render.py +4 -18
- picarones/report/incremental_comparison_render.py +17 -19
- picarones/report/inter_engine_render.py +8 -15
- picarones/report/levers_render.py +9 -1
- picarones/report/lexical_modernization_render.py +5 -10
- picarones/report/longitudinal_render.py +15 -24
- picarones/report/marginal_cost_render.py +111 -0
- picarones/report/multirun_stability_render.py +2 -16
- picarones/report/ner_render.py +3 -22
- picarones/report/numerical_sequences_render.py +3 -18
- picarones/report/philological_render.py +5 -25
- picarones/report/pipeline_render.py +14 -24
- picarones/report/rare_token_recall_render.py +116 -0
- picarones/report/readability_render.py +13 -18
- picarones/report/render_helpers.py +422 -0
- picarones/report/report_data/__init__.py +132 -0
- picarones/report/report_data/_helpers.py +30 -0
- picarones/report/report_data/documents.py +167 -0
- picarones/report/report_data/engines.py +103 -0
CLAUDE.md
CHANGED
|
The diff for this file is too large to render.
See raw diff
|
|
|
README.md
CHANGED
|
@@ -385,7 +385,7 @@ ruff check picarones/ tests/
|
|
| 385 |
python -m mypy picarones/core/
|
| 386 |
```
|
| 387 |
|
| 388 |
-
**Test suite**: ~
|
| 389 |
floor at 85% (currently ~87%). The `network` marker excludes tests
|
| 390 |
requiring live HTTP.
|
| 391 |
|
|
|
|
| 385 |
python -m mypy picarones/core/
|
| 386 |
```
|
| 387 |
|
| 388 |
+
**Test suite**: ~3871 tests, ~3 min on a modern laptop. Coverage
|
| 389 |
floor at 85% (currently ~87%). The `network` marker excludes tests
|
| 390 |
requiring live HTTP.
|
| 391 |
|
SPECS.md
CHANGED
|
@@ -425,7 +425,7 @@ colonne) et `picarones/report/glossary/{fr,en}.yaml`.
|
|
| 425 |
|
| 426 |
**Note de traçabilité** : les références primaires (Demšar 2006,
|
| 427 |
Wilcoxon 1945, Efron 1979, etc.) sont citées dans les docstrings
|
| 428 |
-
de chaque fonction de `picarones/measurements/statistics
|
| 429 |
Le glossaire contextuel relie chaque métrique à sa publication
|
| 430 |
canonique (champ `reference`).
|
| 431 |
|
|
|
|
| 425 |
|
| 426 |
**Note de traçabilité** : les références primaires (Demšar 2006,
|
| 427 |
Wilcoxon 1945, Efron 1979, etc.) sont citées dans les docstrings
|
| 428 |
+
de chaque fonction de `picarones/measurements/statistics/`.
|
| 429 |
Le glossaire contextuel relie chaque métrique à sa publication
|
| 430 |
canonique (champ `reference`).
|
| 431 |
|
docs/architecture.md
CHANGED
|
@@ -41,7 +41,7 @@ Les implémentations distribuées par défaut dans le package `picarones`.
|
|
| 41 |
|
| 42 |
| Catégorie | Modules |
|
| 43 |
|---|---|
|
| 44 |
-
| Coeur | `metrics.py`, `statistics
|
| 45 |
| Erreurs | `confusion.py`, `taxonomy.py`, `taxonomy_comparison.py`, `taxonomy_cooccurrence.py`, `taxonomy_intra_doc.py` |
|
| 46 |
| Lignes/structure | `line_metrics.py`, `structure.py`, `worst_lines.py`, `char_scores.py` |
|
| 47 |
| Calibration/fiabilité | `calibration.py`, `reliability.py`, `hallucination.py` |
|
|
@@ -141,3 +141,39 @@ Organisés par cercle : `tests/core/`, `tests/measurements/`,
|
|
| 141 |
|
| 142 |
Un test du cercle N **n'importe pas** les implémentations des
|
| 143 |
cercles > N (sauf `tests/integration/`).
|
| 41 |
|
| 42 |
| Catégorie | Modules |
|
| 43 |
|---|---|
|
| 44 |
+
| Coeur | `metrics.py`, `statistics/` (sous-package), `runner.py`, `builtin_hooks.py`, `builtin_metrics.py`, `normalization.py` |
|
| 45 |
| Erreurs | `confusion.py`, `taxonomy.py`, `taxonomy_comparison.py`, `taxonomy_cooccurrence.py`, `taxonomy_intra_doc.py` |
|
| 46 |
| Lignes/structure | `line_metrics.py`, `structure.py`, `worst_lines.py`, `char_scores.py` |
|
| 47 |
| Calibration/fiabilité | `calibration.py`, `reliability.py`, `hallucination.py` |
|
|
|
|
| 141 |
|
| 142 |
Un test du cercle N **n'importe pas** les implémentations des
|
| 143 |
cercles > N (sauf `tests/integration/`).
|
| 144 |
+
|
| 145 |
+
## Convention de découpage des modules > 400 lignes
|
| 146 |
+
|
| 147 |
+
Quand un module Python dépasse 400 lignes ET contient plusieurs
|
| 148 |
+
responsabilités disjointes, le découper en **sous-package** plutôt
|
| 149 |
+
qu'en plusieurs modules à plat. Modèle de référence :
|
| 150 |
+
[`picarones/measurements/statistics/`](../picarones/measurements/statistics/)
|
| 151 |
+
issu du sprint « découpage de statistics.py » (mai 2026).
|
| 152 |
+
|
| 153 |
+
Convention :
|
| 154 |
+
|
| 155 |
+
1. **Renommer** `X.py` en `X/__init__.py` via `git mv` (préserve
|
| 156 |
+
l'historique du fichier original).
|
| 157 |
+
2. **Créer** dans `X/` un sous-module par famille fonctionnelle
|
| 158 |
+
(`bootstrap.py`, `wilcoxon.py`, `friedman_nemenyi.py`, etc.).
|
| 159 |
+
Chaque sous-module doit faire moins de ~400 lignes ; sinon
|
| 160 |
+
re-décomposer.
|
| 161 |
+
3. **`X/__init__.py`** ne contient QUE des ré-exports rétrocompat —
|
| 162 |
+
tous les symboles publics de l'ancien `X.py` doivent rester
|
| 163 |
+
importables via `from picarones.X import …`. Les symboles privés
|
| 164 |
+
ré-exportés doivent être ceux **réellement** consommés par les
|
| 165 |
+
tests (vérifié par grep, pas par supposition).
|
| 166 |
+
4. **`__all__`** explicite dans chaque sous-module et dans le
|
| 167 |
+
`__init__.py`.
|
| 168 |
+
5. **Tests architecture** (`tests/architecture/test_*.py`) doivent
|
| 169 |
+
continuer à passer : si nécessaire, étendre `_measurements_modules()`
|
| 170 |
+
ou `_imports_target_*` pour reconnaître les sous-packages.
|
| 171 |
+
6. **Préfixer les modules de rendu** par leur domaine
|
| 172 |
+
(`cdd_render.py` plutôt que `render_cdd.py`) pour cohérence avec
|
| 173 |
+
`picarones/report/*_render.py`.
|
| 174 |
+
|
| 175 |
+
**Quand NE PAS découper** : si les responsabilités sont fortement
|
| 176 |
+
couplées (ex: un orchestrateur qui appelle 12 sous-fonctions au
|
| 177 |
+
même endroit), le maintien dans un seul fichier > 400 lignes est
|
| 178 |
+
acceptable. Le budget par fichier (`tests/architecture/test_file_budgets.py`)
|
| 179 |
+
documente ces dérogations conscientes.
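The convention above (an `X/__init__.py` that only re-exports, with an explicit `__all__` in every submodule) can be sketched as follows. This is a minimal illustration, not the actual Picarones layout: the package name `stats_demo`, its submodules and the functions `bootstrap_mean` and `pareto_front` are hypothetical, and the package is generated in a temporary directory so the snippet runs on its own.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a hypothetical `stats_demo/` sub-package replacing a flat `stats_demo.py`."""
    pkg = root / "stats_demo"
    pkg.mkdir()
    # One submodule per functional family, each with an explicit __all__.
    (pkg / "bootstrap.py").write_text(textwrap.dedent('''
        __all__ = ["bootstrap_mean"]

        def bootstrap_mean(values):
            """Placeholder: plain mean, standing in for a real bootstrap."""
            return sum(values) / len(values)
    '''))
    (pkg / "pareto.py").write_text(textwrap.dedent('''
        __all__ = ["pareto_front"]

        def pareto_front(points):
            """Keep (cost, error) points that no other point dominates."""
            return [
                p for p in points
                if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)
            ]
    '''))
    # __init__.py contains ONLY re-exports, so pre-split imports keep working.
    (pkg / "__init__.py").write_text(textwrap.dedent('''
        from stats_demo.bootstrap import bootstrap_mean
        from stats_demo.pareto import pareto_front

        __all__ = ["bootstrap_mean", "pareto_front"]
    '''))


def load_demo_package():
    """Build the package in a temp dir and import it as a pre-split caller would."""
    root = Path(tempfile.mkdtemp())
    build_demo_package(root)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return importlib.import_module("stats_demo")


stats_demo = load_demo_package()
```

A caller that previously wrote `from stats_demo import pareto_front` needs no change after the split, which is the property rule 3 of the convention is meant to protect.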
|
docs/cli-workflows.md
CHANGED
|
@@ -133,7 +133,7 @@ picarones import iiif \
|
|
| 133 |
Télécharge un manifeste IIIF v2/v3 (BnF Gallica, Bodleian, Vatican…) et
|
| 134 |
crée un corpus local avec `.gt.txt` extraits de l'OCR ALTO si présent.
|
| 135 |
Depuis le chantier 4, IIIF et Gallica utilisent les mêmes helpers HTTP
|
| 136 |
-
factorisés ([`picarones/importers/_http.py`](../picarones/importers/_http.py))
|
| 137 |
avec garde-fou `file://`/`ftp://`/`javascript://`.
|
| 138 |
|
| 139 |
## Outils utilitaires
|
|
|
|
| 133 |
Télécharge un manifeste IIIF v2/v3 (BnF Gallica, Bodleian, Vatican…) et
|
| 134 |
crée un corpus local avec `.gt.txt` extraits de l'OCR ALTO si présent.
|
| 135 |
Depuis le chantier 4, IIIF et Gallica utilisent les mêmes helpers HTTP
|
| 136 |
+
factorisés ([`picarones/extras/importers/_http.py`](../picarones/extras/importers/_http.py))
|
| 137 |
avec garde-fou `file://`/`ftp://`/`javascript://`.
|
| 138 |
|
| 139 |
## Outils utilitaires
|
docs/developer/index.md
CHANGED
|
@@ -10,10 +10,18 @@ modules. En résumé :
|
|
| 10 |
|
| 11 |
```
|
| 12 |
picarones/
|
| 13 |
-
├── core/ # cœur analytique pur Python
|
| 14 |
│ ├── runner.py # orchestration ThreadPool/ProcessPool
|
| 15 |
│ ├── metrics.py # CER/WER/MER/WIL via jiwer
|
| 16 |
-
│ ├── statistics
|
|
|
|
| 17 |
│ ├── narrative/ # moteur de synthèse factuelle
|
| 18 |
│ ├── pricing.py # modèle de coût pour la vue Pareto
|
| 19 |
│ └── …
|
|
|
|
| 10 |
|
| 11 |
```
|
| 12 |
picarones/
|
| 13 |
+
├── core/ # cœur analytique pur Python (Cercle 1)
|
| 14 |
+
│ ├── pipeline.py # PipelineRunner pour pipelines composées
|
| 15 |
+
│ ├── corpus.py # Document, Corpus, GTLevel
|
| 16 |
+
│ ├── results.py # DocumentResult, EngineReport, BenchmarkResult
|
| 17 |
+
│ ├── modules.py # BaseModule, ArtifactType
|
| 18 |
+
│ ├── facts.py # Fact, FactType, registre narratif
|
| 19 |
+
│ └── …
|
| 20 |
+
├── measurements/ # métriques officielles (Cercle 2)
|
| 21 |
│ ├── runner.py # orchestration ThreadPool/ProcessPool
|
| 22 |
│ ├── metrics.py # CER/WER/MER/WIL via jiwer
|
| 23 |
+
│ ├── statistics/ # Wilcoxon, Friedman, Nemenyi, Pareto
|
| 24 |
+
│ │ (sous-package depuis le sprint « découpage statistics.py »)
|
| 25 |
│ ├── narrative/ # moteur de synthèse factuelle
|
| 26 |
│ ├── pricing.py # modèle de coût pour la vue Pareto
|
| 27 |
│ └── …
|
docs/profiles.md
CHANGED
|
@@ -150,7 +150,7 @@ def my_hook(*, ground_truth, hypothesis, image_path, corpus_lang, ocr_result):
|
|
| 150 |
|
| 151 |
- [`picarones/core/metric_hooks.py`](../picarones/core/metric_hooks.py)
|
| 152 |
— registre, profils, `run_document_hooks()`, `run_corpus_aggregators()`.
|
| 153 |
-
- [`picarones/
|
| 154 |
— les 12 hooks doc + 12 agrégateurs natifs Picarones.
|
| 155 |
- [`tests/test_metric_hooks.py`](../tests/test_metric_hooks.py)
|
| 156 |
— tests unitaires + rétrocompat profil `standard`.
|
|
|
|
| 150 |
|
| 151 |
- [`picarones/core/metric_hooks.py`](../picarones/core/metric_hooks.py)
|
| 152 |
— registre, profils, `run_document_hooks()`, `run_corpus_aggregators()`.
|
| 153 |
+
- [`picarones/measurements/builtin_hooks.py`](../picarones/measurements/builtin_hooks.py)
|
| 154 |
— les 12 hooks doc + 12 agrégateurs natifs Picarones.
|
| 155 |
- [`tests/test_metric_hooks.py`](../tests/test_metric_hooks.py)
|
| 156 |
— tests unitaires + rétrocompat profil `standard`.
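The registry-plus-profiles idea referenced here can be sketched with a toy implementation. This is an assumption-laden illustration, not the code of `picarones/core/metric_hooks.py`: `register_hook`, `run_hooks`, the two toy metrics and the profile sets are hypothetical; only the keyword-only hook signature is taken from the `my_hook` example in the hunk header above.

```python
import logging
from typing import Any, Callable, Dict, Set, Tuple

logger = logging.getLogger(__name__)

# name -> (profiles in which the hook is active, hook function)
_HOOKS: Dict[str, Tuple[Set[str], Callable[..., Any]]] = {}


def register_hook(name: str, profiles: Set[str]):
    """Hypothetical decorator recording a document-level hook under `name`."""
    def decorator(func):
        _HOOKS[name] = (set(profiles), func)
        return func
    return decorator


def run_hooks(profile: str, **context: Any) -> Dict[str, Any]:
    """Run every hook enabled for `profile`; a failing hook is logged and skipped."""
    results: Dict[str, Any] = {}
    for name, (profiles, func) in _HOOKS.items():
        if profile not in profiles:
            continue
        try:
            results[name] = func(**context)
        except Exception as exc:  # one broken hook must not abort the document
            logger.warning("[%s] hook skipped: %s", name, exc)
    return results


@register_hook("length_ratio", profiles={"standard", "full"})
def length_ratio(*, ground_truth, hypothesis, image_path, corpus_lang, ocr_result):
    """Toy metric: hypothesis length relative to the ground truth."""
    return len(hypothesis) / len(ground_truth)


@register_hook("uppercase_share", profiles={"full"})
def uppercase_share(*, ground_truth, hypothesis, image_path, corpus_lang, ocr_result):
    """Toy metric: share of uppercase characters in the hypothesis."""
    return sum(c.isupper() for c in hypothesis) / len(hypothesis)


context = dict(
    ground_truth="Bonjour", hypothesis="Bonjor",
    image_path="page_001.png", corpus_lang="fr", ocr_result=None,
)
standard = run_hooks("standard", **context)  # only hooks tagged "standard"
full = run_hooks("full", **context)          # both toy hooks
```

Selecting hooks by profile lets a cheap profile skip expensive metrics, while the per-hook `try/except` keeps one failing metric from discarding the rest of a document's results.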
|
docs/roadmap/evolution-2026.md
CHANGED
|
@@ -442,7 +442,7 @@ nouvelle dans le rapport.
|
|
| 442 |
|
| 443 |
**A.II.1.a — Précision sur entités nommées (NER).**
|
| 444 |
|
| 445 |
-
Nouveau module `picarones/
|
| 446 |
Stanza, modèle HIPE pour les corpus historiques. Choix paramétré par
|
| 447 |
profil (`fr_core_news_lg`, `xx_ent_wiki_sm`, `hipe2022`).
|
| 448 |
|
|
@@ -464,7 +464,7 @@ glossaire (entrée `ner_score`).
|
|
| 464 |
|
| 465 |
**A.II.1.b — Score de calibration des moteurs.**
|
| 466 |
|
| 467 |
-
Nouveau module `picarones/
|
| 468 |
fournissent une confidence par token ou par ligne (Tesseract `tsv`
|
| 469 |
output, Pero OCR via `PageLayout`, Mistral OCR via `confidence`, Google
|
| 470 |
Vision via `Word.confidence`). Ajout d'un champ
|
|
|
|
| 442 |
|
| 443 |
**A.II.1.a — Précision sur entités nommées (NER).**
|
| 444 |
|
| 445 |
+
Nouveau module `picarones/measurements/ner.py`. Backends : spaCy multilingue,
|
| 446 |
Stanza, modèle HIPE pour les corpus historiques. Choix paramétré par
|
| 447 |
profil (`fr_core_news_lg`, `xx_ent_wiki_sm`, `hipe2022`).
|
| 448 |
|
|
|
|
| 464 |
|
| 465 |
**A.II.1.b — Score de calibration des moteurs.**
|
| 466 |
|
| 467 |
+
Nouveau module `picarones/measurements/calibration.py`. Tous les moteurs cibles
|
| 468 |
fournissent une confidence par token ou par ligne (Tesseract `tsv`
|
| 469 |
output, Pero OCR via `PageLayout`, Mistral OCR via `confidence`, Google
|
| 470 |
Vision via `Word.confidence`). Ajout d'un champ
|
docs/user/writing-a-pipeline-module.md
CHANGED
|
@@ -350,7 +350,7 @@ brancher dans la pipeline et de mesurer.
|
|
| 350 |
### 6.b « Et si je veux juste tester une pipeline OCR seule, sans étapes en aval ? »
|
| 351 |
|
| 352 |
C'est exactement ce que fait le runner OCR historique
|
| 353 |
-
(`run_benchmark` dans `picarones/
|
| 354 |
toujours là, n'a pas changé, et reste la voie recommandée pour
|
| 355 |
les benchmarks d'OCR mono-étage.
|
| 356 |
|
|
|
|
| 350 |
### 6.b « Et si je veux juste tester une pipeline OCR seule, sans étapes en aval ? »
|
| 351 |
|
| 352 |
C'est exactement ce que fait le runner OCR historique
|
| 353 |
+
(`run_benchmark` dans `picarones/measurements/runner/`) — il est
|
| 354 |
toujours là, n'a pas changé, et reste la voie recommandée pour
|
| 355 |
les benchmarks d'OCR mono-étage.
|
| 356 |
|
picarones/measurements/__init__.py
CHANGED
|
@@ -151,3 +151,28 @@ from picarones.measurements import reading_order # noqa: F401
|
|
| 151 |
# Chantier 1 (post-Sprint 97) : métriques (ALTO, ALTO) pour évaluer
|
| 152 |
# les reconstructeurs ALTO contre une GT ALTO du document.
|
| 153 |
from picarones.measurements import alto_metrics # noqa: F401
|
| 151 |
# Chantier 1 (post-Sprint 97) : métriques (ALTO, ALTO) pour évaluer
|
| 152 |
# les reconstructeurs ALTO contre une GT ALTO du document.
|
| 153 |
from picarones.measurements import alto_metrics # noqa: F401
|
| 154 |
+
|
| 155 |
+
# ──────────────────────────────────────────────────────────────────────────
|
| 156 |
+
# Sprint « zéro dette actionnable » (mai 2026) — modules sans appel
|
| 157 |
+
# automatique par le runner OCR principal mais qui font partie de l'API
|
| 158 |
+
# publique de ``picarones.measurements``. L'import ici les rend
|
| 159 |
+
# accessibles en ``from picarones.measurements import X`` et garantit
|
| 160 |
+
# qu'aucun ne devient « test-only » silencieusement (cf.
|
| 161 |
+
# ``tests/architecture/test_module_coverage.py``).
|
| 162 |
+
#
|
| 163 |
+
# Distinction de scope :
|
| 164 |
+
# - Modules de calcul utilisés via les renderers HTML composables
|
| 165 |
+
# (l'utilisateur les compose lui-même selon son use case) :
|
| 166 |
+
from picarones.measurements import baseline_comparison # noqa: F401 # historique SQLite
|
| 167 |
+
from picarones.measurements import cost_projection # noqa: F401 # volume cible utilisateur
|
| 168 |
+
from picarones.measurements import equivalence_profile # noqa: F401 # curseur HTML
|
| 169 |
+
from picarones.measurements import error_absorption # noqa: F401 # jonction pipeline composée
|
| 170 |
+
from picarones.measurements import layout # noqa: F401 # GT ALTO requise (axe B)
|
| 171 |
+
from picarones.measurements import longitudinal # noqa: F401 # historique SQLite
|
| 172 |
+
from picarones.measurements import marginal_cost # noqa: F401 # paires de moteurs
|
| 173 |
+
from picarones.measurements import module_policy # noqa: F401 # outil d'audit
|
| 174 |
+
from picarones.measurements import ner_backends # noqa: F401 # factory backends NER
|
| 175 |
+
from picarones.measurements import rare_tokens # noqa: F401 # corpus-wide
|
| 176 |
+
from picarones.measurements import reliability # noqa: F401 # multi-runs
|
| 177 |
+
from picarones.measurements import taxonomy_cooccurrence # noqa: F401 # depuis taxonomy
|
| 178 |
+
from picarones.measurements import taxonomy_intra_doc # noqa: F401 # depuis taxonomy
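The comment block above says these imports keep modules from silently becoming "test-only". One way such a check might be written is sketched below. The contents of `tests/architecture/test_module_coverage.py` are not shown in this diff, so this is only an assumption about the general idea; the helper name `unimported_submodules` is hypothetical, and the standard-library package `json` stands in for `picarones.measurements`.

```python
import importlib
import pkgutil
import sys
from typing import Set


def unimported_submodules(package_name: str) -> Set[str]:
    """Return direct submodules that importing the package did not load."""
    package = importlib.import_module(package_name)
    # Every module or sub-package physically present in the package directory.
    discovered = {info.name for info in pkgutil.iter_modules(package.__path__)}
    # Those absent from sys.modules are not reachable from the package's
    # __init__.py (directly or transitively) and were not imported elsewhere.
    return {
        name for name in discovered
        if f"{package_name}.{name}" not in sys.modules
    }


missing = unimported_submodules("json")
```

An architecture test could then assert that the returned set is empty, or equal to an explicit allow-list, for the package under audit.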
|
picarones/measurements/runner/__init__.py
ADDED
|
@@ -0,0 +1,103 @@
|
| 1 |
+
"""Orchestrateur du benchmark.
|
| 2 |
+
|
| 3 |
+
Exécute les moteurs OCR/HTR sur le corpus de manière parallèle :
|
| 4 |
+
|
| 5 |
+
- ``ProcessPoolExecutor`` pour les moteurs CPU-bound (Tesseract, Pero OCR,
|
| 6 |
+
Kraken) — les workers picklables vivent dans :mod:`workers`.
|
| 7 |
+
- ``ThreadPoolExecutor`` pour les moteurs IO-bound / API (Mistral, Google,
|
| 8 |
+
Azure, LLMs).
|
| 9 |
+
|
| 10 |
+
Avant le sprint « découpage de runner.py » (mai 2026) ce module était
|
| 11 |
+
un fichier unique de 1019 lignes. Le sous-package éclate la
|
| 12 |
+
responsabilité par concern :
|
| 13 |
+
|
| 14 |
+
- :mod:`document` — calcul d'un :class:`DocumentResult` à partir d'un
|
| 15 |
+
OCR (métriques principales + hooks via ``run_document_hooks(profile)``).
|
| 16 |
+
- :mod:`workers` — fonctions de niveau module pour ``ProcessPoolExecutor``
|
| 17 |
+
(:func:`_cpu_doc_worker`) et ``ThreadPoolExecutor`` (:func:`_io_doc_worker`).
|
| 18 |
+
- :mod:`partial` — persistance NDJSON des résultats partiels pour
|
| 19 |
+
reprise sur interruption.
|
| 20 |
+
- :mod:`orchestration` — :func:`run_benchmark` (boucle principale,
|
| 21 |
+
pools, agrégation par moteur) + :func:`_build_pipeline_info`.
|
| 22 |
+
- :mod:`aggregation` — délégations rétrocompat vers les agrégateurs de
|
| 23 |
+
``builtin_hooks`` (chantier 2 post-Sprint 97).
|
| 24 |
+
- :mod:`ner_attach` — câblage NER au post-process (Sprint 40).
|
| 25 |
+
|
| 26 |
+
Ce ``__init__.py`` ré-exporte toute l'API publique historique pour que
|
| 27 |
+
les ~25 fichiers qui importent depuis ``picarones.measurements.runner``
|
| 28 |
+
continuent à fonctionner sans modification. Les symboles privés
|
| 29 |
+
``_compute_document_result``, ``_load_partial``, ``_partial_path``,
|
| 30 |
+
``_aggregate_*``, ``_calibration_from_engine_result`` sont ré-exportés
|
| 31 |
+
car les tests Sprint 13/40/42 les consomment directement.
|
| 32 |
+
"""
|
| 33 |
+
|
| 34 |
+
from picarones.measurements.runner.aggregation import (
|
| 35 |
+
_aggregate_calibration,
|
| 36 |
+
_aggregate_char_scores,
|
| 37 |
+
_aggregate_confusion,
|
| 38 |
+
_aggregate_hallucination,
|
| 39 |
+
_aggregate_image_quality,
|
| 40 |
+
_aggregate_line_metrics,
|
| 41 |
+
_aggregate_structure,
|
| 42 |
+
_aggregate_taxonomy,
|
| 43 |
+
)
|
| 44 |
+
from picarones.measurements.runner.document import (
|
| 45 |
+
_calibration_from_engine_result,
|
| 46 |
+
_compute_document_result,
|
| 47 |
+
_make_error_doc_result,
|
| 48 |
+
_make_timeout_doc_result,
|
| 49 |
+
)
|
| 50 |
+
from picarones.measurements.runner.ner_attach import (
|
| 51 |
+
_aggregate_ner,
|
| 52 |
+
_attach_ner_metrics,
|
| 53 |
+
)
|
| 54 |
+
from picarones.measurements.runner.orchestration import (
|
| 55 |
+
_build_pipeline_info,
|
| 56 |
+
run_benchmark,
|
| 57 |
+
)
|
| 58 |
+
from picarones.measurements.runner.partial import (
|
| 59 |
+
_delete_partial,
|
| 60 |
+
_load_partial,
|
| 61 |
+
_partial_path,
|
| 62 |
+
_partial_write_lock,
|
| 63 |
+
_sanitize_filename,
|
| 64 |
+
_save_partial_line,
|
| 65 |
+
)
|
| 66 |
+
from picarones.measurements.runner.workers import (
|
| 67 |
+
_cpu_doc_worker,
|
| 68 |
+
_io_doc_worker,
|
| 69 |
+
)
|
| 70 |
+
|
| 71 |
+
__all__ = [
|
| 72 |
+
# API publique principale
|
| 73 |
+
"run_benchmark",
|
| 74 |
+
# Helpers calcul document
|
| 75 |
+
"_compute_document_result",
|
| 76 |
+
"_calibration_from_engine_result",
|
| 77 |
+
"_make_error_doc_result",
|
| 78 |
+
"_make_timeout_doc_result",
|
| 79 |
+
# Workers picklables
|
| 80 |
+
"_cpu_doc_worker",
|
| 81 |
+
"_io_doc_worker",
|
| 82 |
+
# Persistance partial
|
| 83 |
+
"_partial_path",
|
| 84 |
+
"_load_partial",
|
| 85 |
+
"_save_partial_line",
|
| 86 |
+
"_delete_partial",
|
| 87 |
+
"_sanitize_filename",
|
| 88 |
+
"_partial_write_lock",
|
| 89 |
+
# Orchestration helper
|
| 90 |
+
"_build_pipeline_info",
|
| 91 |
+
# Délégations agrégation (rétrocompat tests Sprint 13/42)
|
| 92 |
+
"_aggregate_calibration",
|
| 93 |
+
"_aggregate_char_scores",
|
| 94 |
+
"_aggregate_confusion",
|
| 95 |
+
"_aggregate_hallucination",
|
| 96 |
+
"_aggregate_image_quality",
|
| 97 |
+
"_aggregate_line_metrics",
|
| 98 |
+
"_aggregate_structure",
|
| 99 |
+
"_aggregate_taxonomy",
|
| 100 |
+
# NER (Sprint 40)
|
| 101 |
+
"_aggregate_ner",
|
| 102 |
+
"_attach_ner_metrics",
|
| 103 |
+
]
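The docstring of this module describes two pools: processes for CPU-bound engines and threads for IO-bound or API engines. A minimal sketch of that dispatch is given below. It is not the code of `orchestration.py`: the names `CPU_BOUND_ENGINES`, `pick_executor`, `fake_ocr` and `run_engine` are hypothetical, and the engine classification is an assumption based only on the engines the docstring lists.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Type

# Hypothetical classification; the real runner may decide differently.
CPU_BOUND_ENGINES = {"tesseract", "pero_ocr", "kraken"}


def pick_executor(engine_name: str) -> Type[Executor]:
    """Processes for CPU-bound engines, threads for IO-bound / API engines."""
    if engine_name in CPU_BOUND_ENGINES:
        return ProcessPoolExecutor
    return ThreadPoolExecutor


def fake_ocr(doc_id: str) -> str:
    """Module-level function: picklable, so it could be sent to a process pool."""
    return f"text of {doc_id}"


def run_engine(
    engine_name: str,
    worker: Callable[[str], str],
    doc_ids: Iterable[str],
) -> List[str]:
    """Map `worker` over the documents with the executor suited to the engine."""
    executor_cls = pick_executor(engine_name)
    with executor_cls(max_workers=2) as pool:
        # pool.map preserves the order of doc_ids in its results.
        return list(pool.map(worker, doc_ids))


# An API-style engine goes through the thread pool.
texts = run_engine("mistral", fake_ocr, ["doc1", "doc2"])
```

Workers submitted to a `ProcessPoolExecutor` have to be picklable, which is why the docstring insists on module-level functions in `workers`; a script using the process branch should also be started under an `if __name__ == "__main__":` guard.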
|
picarones/measurements/runner/aggregation.py
ADDED
|
@@ -0,0 +1,82 @@
|
| 1 |
+
"""Délégations rétrocompat vers ``builtin_hooks._aggregate_*``.
|
| 2 |
+
|
| 3 |
+
Chantier 2 (post-Sprint 97) : la logique d'agrégation par-engine de
|
| 4 |
+
toutes les métriques (confusion, taxonomy, structure, image_quality,
|
| 5 |
+
line_metrics, hallucination, calibration, char_scores) vit désormais
|
| 6 |
+
dans :mod:`picarones.measurements.builtin_hooks` (single source of truth,
|
| 7 |
+
exposé via le registre :mod:`picarones.core.metric_hooks`).
|
| 8 |
+
|
| 9 |
+
Les noms ci-dessous restent disponibles depuis
|
| 10 |
+
``picarones.measurements.runner`` pour la rétrocompat des tests
|
| 11 |
+
Sprint 13 / 42 qui les importent directement.
|
| 12 |
+
"""
|
| 13 |
+
|
| 14 |
+
from __future__ import annotations
|
| 15 |
+
|
| 16 |
+
from typing import Optional
|
| 17 |
+
|
| 18 |
+
|
| 19 |
+
def _aggregate_confusion(doc_results: list) -> Optional[dict]:
|
| 20 |
+
"""Délégation vers :func:`builtin_hooks._aggregate_confusion`."""
|
| 21 |
+
from picarones.measurements.builtin_hooks import _aggregate_confusion as _impl
|
| 22 |
+
return _impl(doc_results)
|
| 23 |
+
|
| 24 |
+
|
| 25 |
+
def _aggregate_char_scores(doc_results: list) -> Optional[dict]:
|
| 26 |
+
"""Délégation vers :func:`builtin_hooks._aggregate_char_scores`."""
|
| 27 |
+
from picarones.measurements.builtin_hooks import _aggregate_char_scores as _impl
|
| 28 |
+
return _impl(doc_results)
|
| 29 |
+
|
| 30 |
+
|
| 31 |
+
def _aggregate_taxonomy(doc_results: list) -> Optional[dict]:
|
| 32 |
+
"""Délégation vers :func:`builtin_hooks._aggregate_taxonomy`."""
|
| 33 |
+
from picarones.measurements.builtin_hooks import _aggregate_taxonomy as _impl
|
| 34 |
+
return _impl(doc_results)
|
| 35 |
+
|
| 36 |
+
|
| 37 |
+
def _aggregate_structure(doc_results: list) -> Optional[dict]:
|
| 38 |
+
"""Délégation vers :func:`builtin_hooks._aggregate_structure`."""
|
| 39 |
+
from picarones.measurements.builtin_hooks import _aggregate_structure as _impl
|
| 40 |
+
return _impl(doc_results)
|
| 41 |
+
|
| 42 |
+
|
| 43 |
+
def _aggregate_image_quality(doc_results: list) -> Optional[dict]:
|
| 44 |
+
"""Délégation vers :func:`builtin_hooks._aggregate_image_quality`."""
|
| 45 |
+
from picarones.measurements.builtin_hooks import _aggregate_image_quality as _impl
|
| 46 |
+
return _impl(doc_results)
|
| 47 |
+
|
| 48 |
+
|
| 49 |
+
def _aggregate_line_metrics(doc_results: list) -> Optional[dict]:
|
| 50 |
+
"""Délégation vers :func:`builtin_hooks._aggregate_line_metrics`."""
|
| 51 |
+
from picarones.measurements.builtin_hooks import _aggregate_line_metrics as _impl
|
| 52 |
+
return _impl(doc_results)
|
| 53 |
+
|
| 54 |
+
|
| 55 |
+
def _aggregate_hallucination(doc_results: list) -> Optional[dict]:
|
| 56 |
+
"""Délégation vers :func:`builtin_hooks._aggregate_hallucination`."""
|
| 57 |
+
from picarones.measurements.builtin_hooks import _aggregate_hallucination as _impl
|
| 58 |
+
return _impl(doc_results)
|
| 59 |
+
|
| 60 |
+
|
| 61 |
+
def _aggregate_calibration(doc_results: list) -> Optional[dict]:
|
| 62 |
+
"""Délégation vers :func:`builtin_hooks._aggregate_calibration`.
|
| 63 |
+
|
| 64 |
+
Conservé pour la rétrocompat du test ``test_sprint42_calibration_runner``
|
| 65 |
+
qui importe directement depuis ``picarones.measurements.runner``. La
|
| 66 |
+
logique réelle vit dans :mod:`picarones.measurements.builtin_hooks`
|
| 67 |
+
(chantier 2 post-Sprint 97).
|
| 68 |
+
"""
|
| 69 |
+
from picarones.measurements.builtin_hooks import _aggregate_calibration as _impl
|
| 70 |
+
return _impl(doc_results)
|
| 71 |
+
|
| 72 |
+
|
| 73 |
+
__all__ = [
|
| 74 |
+
"_aggregate_calibration",
|
| 75 |
+
"_aggregate_char_scores",
|
| 76 |
+
"_aggregate_confusion",
|
| 77 |
+
"_aggregate_hallucination",
|
| 78 |
+
"_aggregate_image_quality",
|
| 79 |
+
"_aggregate_line_metrics",
|
| 80 |
+
"_aggregate_structure",
|
| 81 |
+
"_aggregate_taxonomy",
|
| 82 |
+
]
|
picarones/measurements/runner/document.py
ADDED
|
@@ -0,0 +1,190 @@
|
| 1 |
+
"""Construction d'un :class:`DocumentResult` à partir d'un OCR.
|
| 2 |
+
|
| 3 |
+
Centralise le calcul de toutes les métriques attachées à un document
|
| 4 |
+
unique : métriques principales (CER/WER/MER/WIL via jiwer), hooks
|
| 5 |
+
optionnels (calibration, taxonomy, philological, etc. — exécutés via
|
| 6 |
+
``run_document_hooks(profile)``), et meta pipeline OCR+LLM.
|
| 7 |
+
|
| 8 |
+
Aussi : helpers pour construire les ``DocumentResult`` synthétiques
|
| 9 |
+
en cas de timeout ou d'erreur d'engine (``_make_timeout_doc_result``,
|
| 10 |
+
``_make_error_doc_result``).
|
| 11 |
+
"""
|
| 12 |
+
|
| 13 |
+
from __future__ import annotations
|
| 14 |
+
|
| 15 |
+
from typing import Optional
|
| 16 |
+
|
| 17 |
+
from picarones.core.results import DocumentResult
|
| 18 |
+
from picarones.engines.base import EngineResult
|
| 19 |
+
from picarones.measurements.metrics import MetricsResult, compute_metrics
|
| 20 |
+
|
| 21 |
+
|
| 22 |
+
def _calibration_from_engine_result(
|
| 23 |
+
ground_truth: str,
|
| 24 |
+
token_confidences: list,
|
| 25 |
+
) -> Optional[dict]:
|
| 26 |
+
"""Délégation vers
|
| 27 |
+
:func:`picarones.measurements.builtin_hooks.calibration_from_engine_result`.
|
| 28 |
+
|
| 29 |
+
Conservé pour la rétrocompat des tests Sprint 42 qui font
|
| 30 |
+
``from picarones.measurements.runner import _calibration_from_engine_result``.
|
| 31 |
+
Toute évolution du calcul doit se faire dans ``builtin_hooks``.
|
| 32 |
+
"""
|
| 33 |
+
from picarones.measurements.builtin_hooks import calibration_from_engine_result
|
| 34 |
+
return calibration_from_engine_result(ground_truth, token_confidences)
|
| 35 |
+
|
| 36 |
+
|
| 37 |
+
def _compute_document_result(
|
| 38 |
+
doc_id: str,
|
| 39 |
+
image_path: str,
|
| 40 |
+
ground_truth: str,
|
| 41 |
+
ocr_result: EngineResult,
|
| 42 |
+
char_exclude: Optional[frozenset],
|
| 43 |
+
corpus_lang: str = "fr",
|
| 44 |
+
profile: str = "standard",
|
| 45 |
+
) -> DocumentResult:
|
| 46 |
+
"""Calcule toutes les métriques pour un document et retourne un DocumentResult.
|
| 47 |
+
|
| 48 |
+
Utilisable à la fois dans le processus principal (IO-bound) et dans les
|
| 49 |
+
sous-processus créés par ProcessPoolExecutor (CPU-bound).
|
| 50 |
+
Les imports lourds sont différés pour accélérer le démarrage des sous-processus.
|
| 51 |
+
|
| 52 |
+
Chantier 2 (post-Sprint 97) — refonte
|
| 53 |
+
------------------------------------
|
| 54 |
+
Les 11 ``try/except`` codés en dur (Sprints 5+10+39+42+61+86+87) sont
|
| 55 |
+
désormais centralisés dans ``picarones.measurements.builtin_hooks`` et
|
| 56 |
+
sélectionnés via ``run_document_hooks(profile)``. Le profil
|
| 57 |
+
``"standard"`` (défaut) reproduit strictement le comportement
|
| 58 |
+
pré-chantier-2. Les profils ``"minimal"``, ``"philological"``,
|
| 59 |
+
``"diagnostics"``, ``"economics"``, ``"pipeline"``, ``"full"``
|
| 60 |
+
permettent à l'utilisateur de moduler le coût de calcul.
|
| 61 |
+
"""
|
| 62 |
+
import logging as _logging
|
| 63 |
+
_logger = _logging.getLogger(__name__)
|
| 64 |
+
|
| 65 |
+
# Eager-load des hooks natifs pour peupler le registre dans les
|
| 66 |
+
# sous-processus du pool (le top-level ``import`` du runner ne le fait
|
| 67 |
+
# pas pour ne pas pénaliser le démarrage des moteurs minimaux).
|
| 68 |
+
import picarones.measurements.builtin_hooks # noqa: F401
|
| 69 |
+
from picarones.core.metric_hooks import run_document_hooks
|
| 70 |
+
|
| 71 |
+
if ocr_result.success:
|
| 72 |
+
metrics = compute_metrics(ground_truth, ocr_result.text, char_exclude=char_exclude)
|
| 73 |
+
else:
|
| 74 |
+
metrics = MetricsResult(
|
| 75 |
+
cer=1.0, cer_nfc=1.0, cer_caseless=1.0,
|
| 76 |
+
wer=1.0, wer_normalized=1.0, mer=1.0, wil=1.0,
|
| 77 |
+
reference_length=len(ground_truth),
|
| 78 |
+
hypothesis_length=0,
|
| 79 |
+
error=ocr_result.error,
|
| 80 |
+
)
|
| 81 |
+
|
| 82 |
+
ocr_intermediate = ocr_result.metadata.get("ocr_intermediate")
|
| 83 |
+
pipeline_meta: dict = {}
|
| 84 |
+
|
| 85 |
+
if ocr_result.metadata.get("is_pipeline"):
|
| 86 |
+
pipeline_meta = {
|
| 87 |
+
"pipeline_mode": ocr_result.metadata.get("pipeline_mode"),
|
| 88 |
+
"prompt_file": ocr_result.metadata.get("prompt_file"),
|
| 89 |
+
"llm_model": ocr_result.metadata.get("llm_model"),
|
| 90 |
+
"llm_provider": ocr_result.metadata.get("llm_provider"),
|
| 91 |
+
}
|
| 92 |
+
if ocr_intermediate is not None and ocr_result.success:
|
| 93 |
+
try:
|
| 94 |
+
from picarones.pipelines.over_normalization import detect_over_normalization
|
| 95 |
+
over_norm = detect_over_normalization(
|
| 96 |
+
ground_truth=ground_truth,
|
| 97 |
+
ocr_text=ocr_intermediate,
|
| 98 |
+
llm_text=ocr_result.text,
|
| 99 |
+
)
|
| 100 |
+
pipeline_meta["over_normalization"] = over_norm.as_dict()
|
| 101 |
+
except Exception as e:
|
| 102 |
+
_logger.warning("[over_normalization] fonctionnalité dégradée : %s", e)
|
| 103 |
+
|
| 104 |
+
# Document-level hooks: each hook produces a named attribute of the
|
| 105 |
+
# ``DocumentResult``. Hooks that do not apply in this context (OCR
|
| 106 |
+
# failure for ``requires_success`` hooks, missing
|
| 107 |
+
# ``token_confidences`` for ``calibration``) are skipped
|
| 108 |
+
# silently. Exceptions raised by a hook are caught and
|
| 109 |
+
# logged as warnings by ``run_document_hooks``.
|
| 110 |
+
extras = run_document_hooks(
|
| 111 |
+
profile,
|
| 112 |
+
ground_truth=ground_truth,
|
| 113 |
+
hypothesis=ocr_result.text,
|
| 114 |
+
image_path=image_path,
|
| 115 |
+
corpus_lang=corpus_lang,
|
| 116 |
+
ocr_result=ocr_result,
|
| 117 |
+
)
|
| 118 |
+
|
| 119 |
+
return DocumentResult(
|
| 120 |
+
doc_id=doc_id,
|
| 121 |
+
image_path=image_path,
|
| 122 |
+
ground_truth=ground_truth,
|
| 123 |
+
hypothesis=ocr_result.text,
|
| 124 |
+
metrics=metrics,
|
| 125 |
+
duration_seconds=ocr_result.duration_seconds,
|
| 126 |
+
engine_error=ocr_result.error,
|
| 127 |
+
ocr_intermediate=ocr_intermediate,
|
| 128 |
+
pipeline_metadata=pipeline_meta,
|
| 129 |
+
confusion_matrix=extras.get("confusion_matrix"),
|
| 130 |
+
char_scores=extras.get("char_scores"),
|
| 131 |
+
taxonomy=extras.get("taxonomy"),
|
| 132 |
+
structure=extras.get("structure"),
|
| 133 |
+
image_quality=extras.get("image_quality"),
|
| 134 |
+
line_metrics=extras.get("line_metrics"),
|
| 135 |
+
hallucination_metrics=extras.get("hallucination_metrics"),
|
| 136 |
+
calibration_metrics=extras.get("calibration_metrics"),
|
| 137 |
+
philological_metrics=extras.get("philological_metrics"),
|
| 138 |
+
searchability_metrics=extras.get("searchability_metrics"),
|
| 139 |
+
numerical_sequence_metrics=extras.get("numerical_sequence_metrics"),
|
| 140 |
+
readability_metrics=extras.get("readability_metrics"),
|
| 141 |
+
)
|
| 142 |
+
|
| 143 |
+
|
| 144 |
+
def _make_timeout_doc_result(doc: object, timeout_seconds: float) -> DocumentResult:
|
| 145 |
+
"""Synthetic DocumentResult for a document that exceeded the timeout."""
|
| 146 |
+
err = f"timeout ({timeout_seconds:.0f}s)"
|
| 147 |
+
metrics = MetricsResult(
|
| 148 |
+
cer=1.0, cer_nfc=1.0, cer_caseless=1.0,
|
| 149 |
+
wer=1.0, wer_normalized=1.0, mer=1.0, wil=1.0,
|
| 150 |
+
reference_length=len(doc.ground_truth), # type: ignore[attr-defined]
|
| 151 |
+
hypothesis_length=0,
|
| 152 |
+
error=err,
|
| 153 |
+
)
|
| 154 |
+
return DocumentResult(
|
| 155 |
+
doc_id=doc.doc_id, # type: ignore[attr-defined]
|
| 156 |
+
image_path=str(doc.image_path), # type: ignore[attr-defined]
|
| 157 |
+
ground_truth=doc.ground_truth, # type: ignore[attr-defined]
|
| 158 |
+
hypothesis="",
|
| 159 |
+
metrics=metrics,
|
| 160 |
+
duration_seconds=timeout_seconds,
|
| 161 |
+
engine_error=err,
|
| 162 |
+
)
|
| 163 |
+
|
| 164 |
+
|
| 165 |
+
def _make_error_doc_result(doc: object, error_msg: str) -> DocumentResult:
|
| 166 |
+
"""Synthetic DocumentResult for an error raised during an engine call."""
|
| 167 |
+
metrics = MetricsResult(
|
| 168 |
+
cer=1.0, cer_nfc=1.0, cer_caseless=1.0,
|
| 169 |
+
wer=1.0, wer_normalized=1.0, mer=1.0, wil=1.0,
|
| 170 |
+
reference_length=len(doc.ground_truth), # type: ignore[attr-defined]
|
| 171 |
+
hypothesis_length=0,
|
| 172 |
+
error=error_msg,
|
| 173 |
+
)
|
| 174 |
+
return DocumentResult(
|
| 175 |
+
doc_id=doc.doc_id, # type: ignore[attr-defined]
|
| 176 |
+
image_path=str(doc.image_path), # type: ignore[attr-defined]
|
| 177 |
+
ground_truth=doc.ground_truth, # type: ignore[attr-defined]
|
| 178 |
+
hypothesis="",
|
| 179 |
+
metrics=metrics,
|
| 180 |
+
duration_seconds=0.0,
|
| 181 |
+
engine_error=error_msg,
|
| 182 |
+
)
|
| 183 |
+
|
| 184 |
+
|
| 185 |
+
__all__ = [
|
| 186 |
+
"_calibration_from_engine_result",
|
| 187 |
+
"_compute_document_result",
|
| 188 |
+
"_make_error_doc_result",
|
| 189 |
+
"_make_timeout_doc_result",
|
| 190 |
+
]
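The three failure paths in this file (unsuccessful OCR, timeout, engine error) all build the same worst-case metrics: every error rate pinned at 1.0 and an empty hypothesis. A minimal sketch of how that shared pattern could be factored out, assuming a hypothetical `WorstCaseMetrics` dataclass as a stand-in for `MetricsResult` (whose definition is not shown in this diff) and a hypothetical `worst_case_metrics` helper:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class WorstCaseMetrics:
    """Hypothetical stand-in for MetricsResult; only the fields used in the diff."""

    cer: float
    cer_nfc: float
    cer_caseless: float
    wer: float
    wer_normalized: float
    mer: float
    wil: float
    reference_length: int
    hypothesis_length: int
    error: Optional[str] = None


def worst_case_metrics(ground_truth: str, error: Optional[str]) -> WorstCaseMetrics:
    """Build the 'everything failed' metrics used when no hypothesis exists."""
    return WorstCaseMetrics(
        cer=1.0, cer_nfc=1.0, cer_caseless=1.0,
        wer=1.0, wer_normalized=1.0, mer=1.0, wil=1.0,
        reference_length=len(ground_truth),
        hypothesis_length=0,  # nothing was recognized
        error=error,
    )
```

With such a helper, the timeout and error constructors would differ only in the error string and in `duration_seconds`.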
|
picarones/measurements/runner/ner_attach.py
ADDED
|
@@ -0,0 +1,133 @@
|
| 1 |
+
"""NER wiring in the benchmark post-processing step (Sprint 40).
|
| 2 |
+
|
| 3 |
+
The runner calls :func:`_attach_ner_metrics` once all
|
| 4 |
+
documents have been processed, for engines where the GT has an
|
| 5 |
+
``ENTITIES`` level (Sprint 32, multi-level GT).
|
| 6 |
+
|
| 7 |
+
The NER extractor is typically a :class:`SpacyEntityExtractor` wrapper
|
| 8 |
+
built via :func:`picarones.measurements.ner_backends.get_extractor`.
|
| 9 |
+
"""
|
| 10 |
+
|
| 11 |
+
from __future__ import annotations
|
| 12 |
+
|
| 13 |
+
import logging
|
| 14 |
+
|
| 15 |
+
from picarones.core.corpus import Corpus
|
| 16 |
+
|
| 17 |
+
logger = logging.getLogger(__name__)
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
def _attach_ner_metrics(
|
| 21 |
+
corpus: Corpus,
|
| 22 |
+
doc_results: list,
|
| 23 |
+
entity_extractor: callable,
|
| 24 |
+
) -> None:
|
| 25 |
+
"""Compute and attach ``DocumentResult.ner_metrics`` for each doc
|
| 26 |
+
whose GT has an ``ENTITIES`` level (Sprint 32).
|
| 27 |
+
|
| 28 |
+
The extractor is called on the OCR hypothesis ``dr.hypothesis``.
|
| 29 |
+
Errors are downgraded to warnings (not propagated) so that
|
| 30 |
+
a single document that crashes the NER does not break the
|
| 31 |
+
whole benchmark.
|
| 32 |
+
"""
|
| 33 |
+
try:
|
| 34 |
+
from picarones.core.corpus import GTLevel
|
| 35 |
+
from picarones.measurements.ner import compute_ner_metrics
|
| 36 |
+
except ImportError as exc:
|
| 37 |
+
logger.warning("[ner.attach] imports indisponibles : %s", exc)
|
| 38 |
+
return
|
| 39 |
+
|
| 40 |
+
docs_by_id = {d.doc_id: d for d in corpus.documents}
|
| 41 |
+
n_done = 0
|
| 42 |
+
for dr in doc_results:
|
| 43 |
+
if dr.engine_error is not None or not dr.hypothesis:
|
| 44 |
+
continue
|
| 45 |
+
doc = docs_by_id.get(dr.doc_id)
|
| 46 |
+
if doc is None or not doc.has_gt(GTLevel.ENTITIES):
|
| 47 |
+
continue
|
| 48 |
+
try:
|
| 49 |
+
gt_payload = doc.get_gt(GTLevel.ENTITIES)
|
| 50 |
+
gt_entities = list(gt_payload.entities) if gt_payload else []
|
| 51 |
+
hyp_entities = entity_extractor(dr.hypothesis) or []
|
| 52 |
+
dr.ner_metrics = compute_ner_metrics(gt_entities, hyp_entities)
|
| 53 |
+
n_done += 1
|
| 54 |
+
except Exception as exc: # noqa: BLE001
|
| 55 |
+
logger.warning(
|
| 56 |
+
"[ner.attach] %s : extraction/comparaison NER dégradée : %s",
|
| 57 |
+
dr.doc_id, exc,
|
| 58 |
+
)
|
| 59 |
+
|
| 60 |
+
if n_done > 0:
|
| 61 |
+
logger.info("[ner] %d documents évalués pour NER.", n_done)
|
| 62 |
+
|
| 63 |
+
|
| 64 |
+
def _aggregate_ner(doc_results: list) -> "dict | None":
|
| 65 |
+
"""Aggregate NER metrics at the engine level.
|
| 66 |
+
|
| 67 |
+
Recomputes *micro* precision/recall/F1 from the global sums
|
| 68 |
+
of TP/FP/FN, along with the per-category breakdown and the total
|
| 69 |
+
counts of hallucinated and missed entities.
|
| 70 |
+
"""
|
| 71 |
+
relevant = [dr for dr in doc_results if dr.ner_metrics is not None]
|
| 72 |
+
if not relevant:
|
| 73 |
+
return None
|
| 74 |
+
|
| 75 |
+
total_tp = 0
|
| 76 |
+
total_fp = 0
|
| 77 |
+
total_fn = 0
|
| 78 |
+
cat_tp: dict[str, int] = {}
|
| 79 |
+
cat_fp: dict[str, int] = {}
|
| 80 |
+
cat_fn: dict[str, int] = {}
|
| 81 |
+
total_hallucinated = 0
|
| 82 |
+
total_missed = 0
|
| 83 |
+
iou_threshold = 0.5
|
| 84 |
+
|
| 85 |
+
for dr in relevant:
|
| 86 |
+
m = dr.ner_metrics
|
| 87 |
+
total_tp += int(m.get("true_positives", 0))
|
| 88 |
+
total_fp += int(m.get("false_positives", 0))
|
| 89 |
+
total_fn += int(m.get("false_negatives", 0))
|
| 90 |
+
total_hallucinated += len(m.get("hallucinated_entities", []) or [])
|
| 91 |
+
total_missed += len(m.get("missed_entities", []) or [])
|
| 92 |
+
iou_threshold = float(m.get("iou_threshold", iou_threshold))
|
| 93 |
+
for cat, stats in (m.get("per_category") or {}).items():
|
| 94 |
+
cat_tp[cat] = cat_tp.get(cat, 0)
|
| 95 |
+
cat_fp[cat] = cat_fp.get(cat, 0)
|
| 96 |
+
cat_fn[cat] = cat_fn.get(cat, 0)
|
| 97 |
+
# Rebuild the per-category sums from support and P/R (approximate, because of rounding)
|
| 98 |
+
support = int(stats.get("support", 0))
|
| 99 |
+
recall = float(stats.get("recall", 0.0))
|
| 100 |
+
precision = float(stats.get("precision", 0.0))
|
| 101 |
+
tp_cat = round(support * recall) if support > 0 else 0
|
| 102 |
+
fn_cat = max(0, support - tp_cat)
|
| 103 |
+
fp_cat = (
|
| 104 |
+
round(tp_cat * (1 - precision) / precision)
|
| 105 |
+
if precision > 0 else 0
|
| 106 |
+
)
|
| 107 |
+
cat_tp[cat] += tp_cat
|
| 108 |
+
cat_fp[cat] += fp_cat
|
| 109 |
+
cat_fn[cat] += fn_cat
|
| 110 |
+
|
| 111 |
+
def _prf(tp: int, fp: int, fn: int) -> dict[str, float]:
|
| 112 |
+
p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
|
| 113 |
+
r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
|
| 114 |
+
f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
|
| 115 |
+
return {"precision": p, "recall": r, "f1": f1, "support": tp + fn}
|
| 116 |
+
|
| 117 |
+
return {
|
| 118 |
+
"global": _prf(total_tp, total_fp, total_fn),
|
| 119 |
+
"per_category": {
|
| 120 |
+
cat: _prf(cat_tp[cat], cat_fp[cat], cat_fn[cat])
|
| 121 |
+
for cat in sorted(set(cat_tp) | set(cat_fp) | set(cat_fn))
|
| 122 |
+
},
|
| 123 |
+
"true_positives": total_tp,
|
| 124 |
+
"false_positives": total_fp,
|
| 125 |
+
"false_negatives": total_fn,
|
| 126 |
+
"hallucinated_total": total_hallucinated,
|
| 127 |
+
"missed_total": total_missed,
|
| 128 |
+
"doc_count": len(relevant),
|
| 129 |
+
"iou_threshold": iou_threshold,
|
| 130 |
+
}
|
| 131 |
+
|
| 132 |
+
|
| 133 |
+
__all__ = ["_aggregate_ner", "_attach_ner_metrics"]
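`_aggregate_ner` pools TP/FP/FN counts across documents before computing precision, recall and F1 (micro-averaging), and rebuilds per-category counts from `support`, `precision` and `recall`. A standalone sketch of both steps, operating on plain dicts rather than on `DocumentResult`; the helper names `micro_prf` and `reconstruct_counts` are hypothetical and not part of the module:

```python
from __future__ import annotations


def prf(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Precision/recall/F1 from raw counts; 0.0 when a denominator is empty."""
    p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    return {"precision": p, "recall": r, "f1": f1, "support": tp + fn}


def micro_prf(per_doc: list[dict[str, int]]) -> dict[str, float]:
    """Micro-average: sum the counts over all documents, then compute P/R/F1 once."""
    tp = sum(d.get("true_positives", 0) for d in per_doc)
    fp = sum(d.get("false_positives", 0) for d in per_doc)
    fn = sum(d.get("false_negatives", 0) for d in per_doc)
    return prf(tp, fp, fn)


def reconstruct_counts(support: int, precision: float, recall: float) -> tuple[int, int, int]:
    """Recover (tp, fp, fn) from support and P/R, as the diff does per category.

    The result is only as exact as the rounding allows.
    """
    tp = round(support * recall) if support > 0 else 0
    fn = max(0, support - tp)
    fp = round(tp * (1 - precision) / precision) if precision > 0 else 0
    return tp, fp, fn
```

Micro-averaging weights each entity equally, so a long document with many entities influences the engine-level score more than a short one. Carrying raw per-category TP/FP/FN in the per-document payload would avoid the rounding step altogether, if the payload format allows it.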
|
picarones/measurements/{runner.py → runner/orchestration.py}
RENAMED
|
@@ -1,21 +1,25 @@
|
|
| 1 |
-
"""Benchmark orchestrator.
|
| 2 |
|
| 3 |
-
|
| 4 |
-
- ``ProcessPoolExecutor`` for CPU-bound engines (Tesseract, Pero OCR, Kraken)
|
| 5 |
-
- ``ThreadPoolExecutor`` for IO-bound / API engines (Mistral, Google, Azure, LLMs)
|
| 6 |
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
| 10 |
"""
|
| 11 |
|
| 12 |
from __future__ import annotations
|
| 13 |
|
| 14 |
import concurrent.futures
|
| 15 |
-
import json
|
| 16 |
import logging
|
| 17 |
-
import re
|
| 18 |
-
import tempfile
|
| 19 |
import threading
|
| 20 |
import time
|
| 21 |
from pathlib import Path
|
|
@@ -24,379 +28,28 @@ from typing import Optional
|
|
| 24 |
from tqdm import tqdm
|
| 25 |
|
| 26 |
from picarones.core.corpus import Corpus
|
| 27 |
-
from picarones.measurements.metrics import MetricsResult, compute_metrics
|
| 28 |
from picarones.core.results import BenchmarkResult, DocumentResult, EngineReport
|
| 29 |
-
from picarones.engines.base import BaseOCREngine
|
| 30 |
|
| 31 |
logger = logging.getLogger(__name__)
|
| 32 |
|
| 33 |
-
# Lock serializing writes of partial results
|
| 34 |
-
_partial_write_lock = threading.Lock()
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
# ---------------------------------------------------------------------------
|
| 38 |
-
# Module-level workers (required for ProcessPoolExecutor: they must be picklable)
|
| 39 |
-
# ---------------------------------------------------------------------------
|
| 40 |
-
|
| 41 |
-
def _cpu_doc_worker(args: tuple) -> "DocumentResult":
|
| 42 |
-
"""Worker for ProcessPoolExecutor (CPU-bound engines).
|
| 43 |
-
|
| 44 |
-
Instantiates the engine in the subprocess, runs the OCR and computes
|
| 45 |
-
all metrics. Must be a module-level function so that it can be
|
| 46 |
-
serialized by ``pickle``.
|
| 47 |
-
|
| 48 |
-
For backward compatibility, the ``args`` tuple may contain:
|
| 49 |
-
- 7 elements: legacy (Sprint 13)
|
| 50 |
-
- 8 elements: + ``corpus_lang`` (Sprint 87)
|
| 51 |
-
- 9 elements: + ``profile`` (chantier 2 post-Sprint 97)
|
| 52 |
-
"""
|
| 53 |
-
if len(args) == 9:
|
| 54 |
-
(engine_module, engine_class_name, engine_config, doc_id,
|
| 55 |
-
image_path, ground_truth, char_exclude_chars, corpus_lang,
|
| 56 |
-
profile) = args
|
| 57 |
-
elif len(args) == 8:
|
| 58 |
-
(engine_module, engine_class_name, engine_config, doc_id,
|
| 59 |
-
image_path, ground_truth, char_exclude_chars, corpus_lang) = args
|
| 60 |
-
profile = "standard"
|
| 61 |
-
else:
|
| 62 |
-
(engine_module, engine_class_name, engine_config, doc_id,
|
| 63 |
-
image_path, ground_truth, char_exclude_chars) = args
|
| 64 |
-
corpus_lang = "fr"
|
| 65 |
-
profile = "standard"
|
| 66 |
-
import importlib
|
| 67 |
-
mod = importlib.import_module(engine_module)
|
| 68 |
-
engine_cls = getattr(mod, engine_class_name)
|
| 69 |
-
engine = engine_cls(config=engine_config)
|
| 70 |
-
ocr_result = engine.run(image_path)
|
| 71 |
-
char_exclude = frozenset(char_exclude_chars) if char_exclude_chars else None
|
| 72 |
-
return _compute_document_result(
|
| 73 |
-
doc_id=doc_id,
|
| 74 |
-
image_path=image_path,
|
| 75 |
-
ground_truth=ground_truth,
|
| 76 |
-
ocr_result=ocr_result,
|
| 77 |
-
char_exclude=char_exclude,
|
| 78 |
-
corpus_lang=corpus_lang,
|
| 79 |
-
profile=profile,
|
| 80 |
-
)
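The removed `_cpu_doc_worker` accepts 7-, 8- or 9-element argument tuples and fills in `corpus_lang="fr"` and `profile="standard"` when they are missing. A minimal sketch of the same backward-compatible unpacking, written as padding with defaults instead of three branches; the `unpack_worker_args` helper is hypothetical, and unlike the original it rejects any other tuple length with an explicit message:

```python
from __future__ import annotations

# Field order as in the worker's argument tuple.
_FIELDS = (
    "engine_module", "engine_class_name", "engine_config", "doc_id",
    "image_path", "ground_truth", "char_exclude_chars",
    "corpus_lang", "profile",
)
# Defaults for the two trailing fields that older callers omit.
_TRAILING_DEFAULTS = ("fr", "standard")


def unpack_worker_args(args: tuple) -> dict:
    """Map a 7-, 8- or 9-element tuple onto named fields, padding with defaults."""
    if not 7 <= len(args) <= 9:
        raise ValueError(f"expected 7 to 9 worker arguments, got {len(args)}")
    # 7 elements -> both defaults, 8 -> only ``profile``, 9 -> none.
    padded = tuple(args) + _TRAILING_DEFAULTS[len(args) - 7:]
    return dict(zip(_FIELDS, padded))
```

A picklable dataclass or named tuple would make the contract explicit, at the cost of touching every call site that still builds the legacy tuples.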
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
def _io_doc_worker(
|
| 84 |
-
engine: BaseOCREngine,
|
| 85 |
-
doc: object,
|
| 86 |
-
char_exclude: Optional[frozenset],
|
| 87 |
-
corpus_lang: str = "fr",
|
| 88 |
-
profile: str = "standard",
|
| 89 |
-
) -> "DocumentResult":
|
| 90 |
-
"""Worker for ThreadPoolExecutor (IO-bound / API engines).
|
| 91 |
-
|
| 92 |
-
Runs the OCR and computes the metrics in a thread. The engine
|
| 93 |
-
instance is shared across threads; HTTP adapters generally hold
|
| 94 |
-
no mutable state between calls.
|
| 95 |
-
|
| 96 |
-
If the document carries precomputed OCR text (triplet corpus) and
|
| 97 |
-
the engine is an OCR+LLM pipeline, uses ``run_with_ocr_text()`` to
|
| 98 |
-
bypass the OCR step and test the LLM post-correction directly.
|
| 99 |
-
"""
|
| 100 |
-
doc_ocr_text = getattr(doc, "ocr_text", None)
|
| 101 |
-
if doc_ocr_text is not None:
|
| 102 |
-
# Triplet corpus: check whether the engine supports run_with_ocr_text
|
| 103 |
-
run_with = getattr(engine, "run_with_ocr_text", None)
|
| 104 |
-
if run_with is not None:
|
| 105 |
-
ocr_result = run_with(doc.image_path, doc_ocr_text) # type: ignore[attr-defined]
|
| 106 |
-
else:
|
| 107 |
-
# Plain OCR engine: ignore the precomputed OCR text
|
| 108 |
-
ocr_result = engine.run(doc.image_path) # type: ignore[attr-defined]
|
| 109 |
-
else:
|
| 110 |
-
ocr_result = engine.run(doc.image_path) # type: ignore[attr-defined]
|
| 111 |
-
|
| 112 |
-
return _compute_document_result(
|
| 113 |
-
doc_id=doc.doc_id, # type: ignore[attr-defined]
|
| 114 |
-
image_path=str(doc.image_path), # type: ignore[attr-defined]
|
| 115 |
-
ground_truth=doc.ground_truth, # type: ignore[attr-defined]
|
| 116 |
-
ocr_result=ocr_result,
|
| 117 |
-
char_exclude=char_exclude,
|
| 118 |
-
corpus_lang=corpus_lang,
|
| 119 |
-
profile=profile,
|
| 120 |
-
)
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
# ---------------------------------------------------------------------------
|
| 124 |
-
# Centralized per-document computation
|
| 125 |
-
# ---------------------------------------------------------------------------
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
# Chantier 2 (post-Sprint 97): the calibration helper logic now lives
|
| 129 |
-
# in :mod:`picarones.measurements.builtin_hooks`. This name stays exposed
|
| 130 |
-
# here for backward compatibility with the Sprint 42 tests that do
|
| 131 |
-
# ``from picarones.measurements.runner import _calibration_from_engine_result``.
|
| 132 |
-
def _calibration_from_engine_result(
|
| 133 |
-
ground_truth: str,
|
| 134 |
-
token_confidences: list,
|
| 135 |
-
) -> Optional[dict]:
|
| 136 |
-
"""Delegates to :func:`picarones.measurements.builtin_hooks.calibration_from_engine_result`.
|
| 137 |
-
|
| 138 |
-
Kept for backward compatibility with existing tests; any change
|
| 139 |
-
to the computation belongs in ``builtin_hooks``.
|
| 140 |
-
"""
|
| 141 |
-
from picarones.measurements.builtin_hooks import calibration_from_engine_result
|
| 142 |
-
return calibration_from_engine_result(ground_truth, token_confidences)
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
def _compute_document_result(
|
| 148 |
-
doc_id: str,
|
| 149 |
-
image_path: str,
|
| 150 |
-
ground_truth: str,
|
| 151 |
-
ocr_result: EngineResult,
|
| 152 |
-
char_exclude: Optional[frozenset],
|
| 153 |
-
corpus_lang: str = "fr",
|
| 154 |
-
profile: str = "standard",
|
| 155 |
-
) -> DocumentResult:
|
| 156 |
-
"""Compute all metrics for one document and return a DocumentResult.
|
| 157 |
-
|
| 158 |
-
Usable both in the main process (IO-bound) and in the
|
| 159 |
-
subprocesses created by ProcessPoolExecutor (CPU-bound).
|
| 160 |
-
Heavy imports are deferred to speed up subprocess startup.
|
| 161 |
-
|
| 162 |
-
Chantier 2 (post-Sprint 97): refactoring
|
| 163 |
-
----------------------------------------
|
| 164 |
-
The 11 hard-coded ``try/except`` blocks (Sprints 5+10+39+42+61+86+87) are
|
| 165 |
-
now centralized in ``picarones.measurements.builtin_hooks`` and
|
| 166 |
-
selected via ``run_document_hooks(profile)``. The
|
| 167 |
-
``"standard"`` profile (the default) strictly reproduces the
|
| 168 |
-
pre-chantier-2 behavior. The ``"minimal"``, ``"philological"``,
|
| 169 |
-
``"diagnostics"``, ``"economics"``, ``"pipeline"`` and ``"full"`` profiles
|
| 170 |
-
let the user tune the computation cost.
|
| 171 |
-
"""
|
| 172 |
-
import logging as _logging
|
| 173 |
-
_logger = _logging.getLogger(__name__)
|
| 174 |
-
|
| 175 |
-
# Eagerly load the built-in hooks to populate the registry in the
|
| 176 |
-
# pool's subprocesses (the runner's top-level ``import`` skips this
|
| 177 |
-
# so as not to slow down startup for minimal engines).
|
| 178 |
-
import picarones.measurements.builtin_hooks # noqa: F401
|
| 179 |
-
from picarones.core.metric_hooks import run_document_hooks
|
| 180 |
-
|
| 181 |
-
if ocr_result.success:
|
| 182 |
-
metrics = compute_metrics(ground_truth, ocr_result.text, char_exclude=char_exclude)
|
| 183 |
-
else:
|
| 184 |
-
metrics = MetricsResult(
|
| 185 |
-
cer=1.0, cer_nfc=1.0, cer_caseless=1.0,
|
| 186 |
-
wer=1.0, wer_normalized=1.0, mer=1.0, wil=1.0,
|
| 187 |
-
reference_length=len(ground_truth),
|
| 188 |
-
hypothesis_length=0,
|
| 189 |
-
error=ocr_result.error,
|
| 190 |
-
)
|
| 191 |
-
|
| 192 |
-
ocr_intermediate = ocr_result.metadata.get("ocr_intermediate")
|
| 193 |
-
pipeline_meta: dict = {}
|
| 194 |
-
|
| 195 |
-
if ocr_result.metadata.get("is_pipeline"):
|
| 196 |
-
pipeline_meta = {
|
| 197 |
-
"pipeline_mode": ocr_result.metadata.get("pipeline_mode"),
|
| 198 |
-
"prompt_file": ocr_result.metadata.get("prompt_file"),
|
| 199 |
-
"llm_model": ocr_result.metadata.get("llm_model"),
|
| 200 |
-
"llm_provider": ocr_result.metadata.get("llm_provider"),
|
| 201 |
-
}
|
| 202 |
-
if ocr_intermediate is not None and ocr_result.success:
|
| 203 |
-
try:
|
| 204 |
-
from picarones.pipelines.over_normalization import detect_over_normalization
|
| 205 |
-
over_norm = detect_over_normalization(
|
| 206 |
-
ground_truth=ground_truth,
|
| 207 |
-
ocr_text=ocr_intermediate,
|
| 208 |
-
llm_text=ocr_result.text,
|
| 209 |
-
)
|
| 210 |
-
pipeline_meta["over_normalization"] = over_norm.as_dict()
|
| 211 |
-
except Exception as e:
|
| 212 |
-
_logger.warning("[over_normalization] fonctionnalité dégradée : %s", e)
|
| 213 |
-
|
| 214 |
-
# Document-level hooks: each hook produces a named attribute of the
|
| 215 |
-
# ``DocumentResult``. Hooks that do not apply in this context (OCR
|
| 216 |
-
# failure for ``requires_success`` hooks, missing
|
| 217 |
-
# ``token_confidences`` for ``calibration``) are skipped
|
| 218 |
-
# silently. Exceptions raised by a hook are caught and
|
| 219 |
-
# logged as warnings by ``run_document_hooks``.
|
| 220 |
-
extras = run_document_hooks(
|
| 221 |
-
profile,
|
| 222 |
-
ground_truth=ground_truth,
|
| 223 |
-
hypothesis=ocr_result.text,
|
| 224 |
-
image_path=image_path,
|
| 225 |
-
corpus_lang=corpus_lang,
|
| 226 |
-
ocr_result=ocr_result,
|
| 227 |
-
)
|
| 228 |
-
|
| 229 |
-
return DocumentResult(
|
| 230 |
-
doc_id=doc_id,
|
| 231 |
-
image_path=image_path,
|
| 232 |
-
ground_truth=ground_truth,
|
| 233 |
-
hypothesis=ocr_result.text,
|
| 234 |
-
metrics=metrics,
|
| 235 |
-
duration_seconds=ocr_result.duration_seconds,
|
| 236 |
-
engine_error=ocr_result.error,
|
| 237 |
-
ocr_intermediate=ocr_intermediate,
|
| 238 |
-
pipeline_metadata=pipeline_meta,
|
| 239 |
-
confusion_matrix=extras.get("confusion_matrix"),
|
| 240 |
-
char_scores=extras.get("char_scores"),
|
| 241 |
-
taxonomy=extras.get("taxonomy"),
|
| 242 |
-
structure=extras.get("structure"),
|
| 243 |
-
image_quality=extras.get("image_quality"),
|
| 244 |
-
line_metrics=extras.get("line_metrics"),
|
| 245 |
-
hallucination_metrics=extras.get("hallucination_metrics"),
|
| 246 |
-
calibration_metrics=extras.get("calibration_metrics"),
|
| 247 |
-
philological_metrics=extras.get("philological_metrics"),
|
| 248 |
-
searchability_metrics=extras.get("searchability_metrics"),
|
| 249 |
-
numerical_sequence_metrics=extras.get("numerical_sequence_metrics"),
|
| 250 |
-
readability_metrics=extras.get("readability_metrics"),
|
| 251 |
-
)
|
| 252 |
-
|
| 253 |
-
|
| 254 |
-
def _make_timeout_doc_result(doc: object, timeout_seconds: float) -> DocumentResult:
|
| 255 |
-
"""Synthetic DocumentResult for a document that exceeded the timeout."""
|
| 256 |
-
err = f"timeout ({timeout_seconds:.0f}s)"
|
| 257 |
-
metrics = MetricsResult(
|
| 258 |
-
cer=1.0, cer_nfc=1.0, cer_caseless=1.0,
|
| 259 |
-
wer=1.0, wer_normalized=1.0, mer=1.0, wil=1.0,
|
| 260 |
-
reference_length=len(doc.ground_truth), # type: ignore[attr-defined]
|
| 261 |
-
hypothesis_length=0,
|
| 262 |
-
error=err,
|
| 263 |
-
)
|
| 264 |
-
return DocumentResult(
|
| 265 |
-
doc_id=doc.doc_id, # type: ignore[attr-defined]
|
| 266 |
-
image_path=str(doc.image_path), # type: ignore[attr-defined]
|
| 267 |
-
ground_truth=doc.ground_truth, # type: ignore[attr-defined]
|
| 268 |
-
hypothesis="",
|
| 269 |
-
metrics=metrics,
|
| 270 |
-
duration_seconds=timeout_seconds,
|
| 271 |
-
engine_error=err,
|
| 272 |
-
)
|
| 273 |
-
|
| 274 |
-
|
| 275 |
-
def _make_error_doc_result(doc: object, error_msg: str) -> DocumentResult:
|
| 276 |
-
"""Synthetic DocumentResult for a document that hit an unexpected error."""
|
| 277 |
-
metrics = MetricsResult(
|
| 278 |
-
cer=1.0, cer_nfc=1.0, cer_caseless=1.0,
|
| 279 |
-
wer=1.0, wer_normalized=1.0, mer=1.0, wil=1.0,
|
| 280 |
-
reference_length=len(doc.ground_truth), # type: ignore[attr-defined]
|
| 281 |
-
hypothesis_length=0,
|
| 282 |
-
error=error_msg,
|
| 283 |
-
)
|
| 284 |
-
return DocumentResult(
|
| 285 |
-
doc_id=doc.doc_id, # type: ignore[attr-defined]
|
| 286 |
-
image_path=str(doc.image_path), # type: ignore[attr-defined]
|
| 287 |
-
ground_truth=doc.ground_truth, # type: ignore[attr-defined]
|
| 288 |
-
hypothesis="",
|
| 289 |
-
metrics=metrics,
|
| 290 |
-
duration_seconds=0.0,
|
| 291 |
-
engine_error=error_msg,
|
| 292 |
-
)
|
| 293 |
-
|
| 294 |
-
|
| 295 |
-
# ---------------------------------------------------------------------------
|
| 296 |
-
# Partial results (save / resume)
|
| 297 |
-
# ---------------------------------------------------------------------------
|
| 298 |
-
|
| 299 |
-
def _sanitize_filename(s: str) -> str:
|
| 300 |
-
return re.sub(r"[^\w\-]", "_", s)[:64]
|
| 301 |
-
|
| 302 |
-
|
| 303 |
-
def _partial_path(
|
| 304 |
-
corpus_name: str,
|
| 305 |
-
engine_name: str,
|
| 306 |
-
partial_dir: Optional[str | Path],
|
| 307 |
-
) -> Path:
|
| 308 |
-
base = Path(partial_dir) if partial_dir else Path(tempfile.gettempdir())
|
| 309 |
-
name = (
|
| 310 |
-
f"picarones_{_sanitize_filename(corpus_name)}"
|
| 311 |
-
f"_{_sanitize_filename(engine_name)}.partial.json"
|
| 312 |
-
)
|
| 313 |
-
return base / name
|
| 314 |
-
|
| 315 |
-
|
| 316 |
-
def _load_partial(
|
| 317 |
-
corpus_name: str,
|
| 318 |
-
engine_name: str,
|
| 319 |
-
partial_dir: Optional[str | Path],
|
| 320 |
-
) -> tuple[Path, list[DocumentResult]]:
|
| 321 |
-
"""Load the partial results of a previous, interrupted run.
|
| 322 |
-
|
| 323 |
-
Returns
|
| 324 |
-
-------
|
| 325 |
-
(path, results): path of the partial file and list of the DocumentResult objects already computed.
|
| 326 |
-
"""
|
| 327 |
-
path = _partial_path(corpus_name, engine_name, partial_dir)
|
| 328 |
-
results: list[DocumentResult] = []
|
| 329 |
-
if not path.exists():
|
| 330 |
-
return path, results
|
| 331 |
-
|
| 332 |
-
try:
|
| 333 |
-
with path.open("r", encoding="utf-8") as fh:
|
| 334 |
-
for line in fh:
|
| 335 |
-
line = line.strip()
|
| 336 |
-
if not line:
|
| 337 |
-
continue
|
| 338 |
-
d = json.loads(line)
|
| 339 |
-
m = d.get("metrics", {})
|
| 340 |
-
metrics = MetricsResult(
|
| 341 |
-
cer=m.get("cer", 1.0),
|
| 342 |
-
cer_nfc=m.get("cer_nfc", 1.0),
|
| 343 |
-
cer_caseless=m.get("cer_caseless", 1.0),
|
| 344 |
-
wer=m.get("wer", 1.0),
|
| 345 |
-
wer_normalized=m.get("wer_normalized", 1.0),
|
| 346 |
-
mer=m.get("mer", 1.0),
|
| 347 |
-
wil=m.get("wil", 1.0),
|
| 348 |
-
reference_length=m.get("reference_length", 0),
|
| 349 |
-
hypothesis_length=m.get("hypothesis_length", 0),
|
| 350 |
-
error=m.get("error"),
|
| 351 |
-
)
|
| 352 |
-
results.append(DocumentResult(
|
| 353 |
-
doc_id=d["doc_id"],
|
| 354 |
-
image_path=d.get("image_path", ""),
|
| 355 |
-
ground_truth=d.get("ground_truth", ""),
|
| 356 |
-
hypothesis=d.get("hypothesis", ""),
|
| 357 |
-
metrics=metrics,
|
| 358 |
-
duration_seconds=d.get("duration_seconds", 0.0),
|
| 359 |
-
engine_error=d.get("engine_error"),
|
| 360 |
-
ocr_intermediate=d.get("ocr_intermediate"),
|
| 361 |
-
pipeline_metadata=d.get("pipeline_metadata", {}),
|
| 362 |
-
confusion_matrix=d.get("confusion_matrix"),
|
| 363 |
-
char_scores=d.get("char_scores"),
|
| 364 |
-
taxonomy=d.get("taxonomy"),
|
| 365 |
-
structure=d.get("structure"),
|
| 366 |
-
image_quality=d.get("image_quality"),
|
| 367 |
-
line_metrics=d.get("line_metrics"),
|
| 368 |
-
hallucination_metrics=d.get("hallucination_metrics"),
|
| 369 |
-
))
|
| 370 |
-
except Exception as e:
|
| 371 |
-
logger.warning("Impossible de charger les résultats partiels '%s' : %s", path, e)
|
| 372 |
-
results = []
|
| 373 |
-
|
| 374 |
-
return path, results
|
| 375 |
-
|
| 376 |
-
|
| 377 |
-
def _save_partial_line(partial_path: Path, doc_result: DocumentResult) -> None:
|
| 378 |
-
"""Append one NDJSON entry to the partial-results file (thread-safe)."""
|
| 379 |
-
try:
|
| 380 |
-
line = json.dumps(doc_result.as_dict(), ensure_ascii=False) + "\n"
|
| 381 |
-
with _partial_write_lock:
|
| 382 |
-
with partial_path.open("a", encoding="utf-8") as fh:
|
| 383 |
-
fh.write(line)
|
| 384 |
-
except Exception as e:
|
| 385 |
-
logger.warning("Impossible d'écrire dans le fichier partiel '%s' : %s", partial_path, e)
|
| 386 |
-
|
| 387 |
-
|
| 388 |
-
def _delete_partial(partial_path: Path) -> None:
|
| 389 |
-
"""Delete the partial-results file once an engine has finished."""
|
| 390 |
-
try:
|
| 391 |
-
if partial_path.exists():
|
| 392 |
-
partial_path.unlink()
|
| 393 |
-
except Exception as e:
|
| 394 |
-
logger.warning("Impossible de supprimer le fichier partiel '%s' : %s", partial_path, e)
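The partial-results helpers above rely on a simple pattern: one JSON object per line (NDJSON), appended under a lock so that concurrent threads never interleave writes, and read back line by line on resume. A self-contained sketch of that pattern with plain dicts; `append_record` and `load_records` are hypothetical names, not the module's API, and unlike the original helpers they let I/O and parsing errors propagate instead of logging a warning:

```python
from __future__ import annotations

import json
import threading
from pathlib import Path

# One process-wide lock: appends from several threads are serialized.
_write_lock = threading.Lock()


def append_record(path: Path, record: dict) -> None:
    """Append one record as a single NDJSON line."""
    line = json.dumps(record, ensure_ascii=False) + "\n"
    with _write_lock:
        with path.open("a", encoding="utf-8") as fh:
            fh.write(line)


def load_records(path: Path) -> list[dict]:
    """Read back every non-empty line; a missing file means nothing to resume."""
    if not path.exists():
        return []
    records: list[dict] = []
    with path.open("r", encoding="utf-8") as fh:
        for raw in fh:
            raw = raw.strip()
            if raw:
                records.append(json.loads(raw))
    return records
```

A `threading.Lock` only protects threads within one process. That matches the code above as long as the parent process, not the pool workers, performs the writes, which is an assumption this sketch does not verify.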
|
| 395 |
-
|
| 396 |
-
|
| 397 |
-
# ---------------------------------------------------------------------------
|
| 398 |
-
# Main benchmark
|
| 399 |
-
# ---------------------------------------------------------------------------
|
| 400 |
|
| 401 |
def run_benchmark(
|
| 402 |
corpus: Corpus,
|
|
@@ -838,182 +491,4 @@ def _build_pipeline_info(engine: BaseOCREngine, doc_results: list[DocumentResult
|
|
| 838 |
return info
|
| 839 |
|
| 840 |
|
| 841 |
-
|
| 842 |
-
# Aggregation helpers: backward-compatibility delegations
|
| 843 |
-
# ---------------------------------------------------------------------------
|
| 844 |
-
# Chantier 2 (post-Sprint 97): the implementations now live in
|
| 845 |
-
# :mod:`picarones.measurements.builtin_hooks` (single source of truth, exposed via
|
| 846 |
-
# the :mod:`picarones.core.metric_hooks` registry). The names below
|
| 847 |
-
# remain available from ``picarones.measurements.runner`` for backward compatibility
|
| 848 |
-
# with the Sprint 13 / 42 tests that import them directly.
|
| 849 |
-
|
| 850 |
-
def _aggregate_confusion(doc_results: list) -> Optional[dict]:
|
| 851 |
-
"""Delegates to :func:`builtin_hooks._aggregate_confusion`."""
|
| 852 |
-
from picarones.measurements.builtin_hooks import _aggregate_confusion as _impl
|
| 853 |
-
return _impl(doc_results)
|
| 854 |
-
|
| 855 |
-
|
| 856 |
-
def _aggregate_char_scores(doc_results: list) -> Optional[dict]:
|
| 857 |
-
"""Delegates to :func:`builtin_hooks._aggregate_char_scores`."""
|
| 858 |
-
from picarones.measurements.builtin_hooks import _aggregate_char_scores as _impl
|
| 859 |
-
return _impl(doc_results)
|
| 860 |
-
|
| 861 |
-
|
| 862 |
-
def _aggregate_taxonomy(doc_results: list) -> Optional[dict]:
|
| 863 |
-
"""Delegates to :func:`builtin_hooks._aggregate_taxonomy`."""
|
| 864 |
-
from picarones.measurements.builtin_hooks import _aggregate_taxonomy as _impl
|
| 865 |
-
return _impl(doc_results)
|
| 866 |
-
|
| 867 |
-
|
| 868 |
-
def _aggregate_structure(doc_results: list) -> Optional[dict]:
|
| 869 |
-
"""Delegates to :func:`builtin_hooks._aggregate_structure`."""
|
| 870 |
-
from picarones.measurements.builtin_hooks import _aggregate_structure as _impl
|
| 871 |
-
return _impl(doc_results)
|
| 872 |
-
|
| 873 |
-
|
| 874 |
-
def _aggregate_image_quality(doc_results: list) -> Optional[dict]:
|
| 875 |
-
"""Delegates to :func:`builtin_hooks._aggregate_image_quality`."""
|
| 876 |
-
from picarones.measurements.builtin_hooks import _aggregate_image_quality as _impl
|
| 877 |
-
return _impl(doc_results)
|
| 878 |
-
|
| 879 |
-
|
| 880 |
-
def _aggregate_line_metrics(doc_results: list) -> Optional[dict]:
|
| 881 |
-
"""Delegates to :func:`builtin_hooks._aggregate_line_metrics`."""
|
| 882 |
-
from picarones.measurements.builtin_hooks import _aggregate_line_metrics as _impl
|
| 883 |
-
return _impl(doc_results)
|
| 884 |
-
|
| 885 |
-
|
| 886 |
-
def _aggregate_hallucination(doc_results: list) -> Optional[dict]:
|
| 887 |
-
"""Delegates to :func:`builtin_hooks._aggregate_hallucination`."""
|
| 888 |
-
from picarones.measurements.builtin_hooks import _aggregate_hallucination as _impl
|
| 889 |
-
return _impl(doc_results)
|
| 890 |
-
|
| 891 |
-
|
| 892 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 893 |
-
# Sprint 40: NER extraction in post-processing, and aggregation
|
| 894 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 895 |
-
|
| 896 |
-
|
| 897 |
-
def _attach_ner_metrics(
|
| 898 |
-
corpus: Corpus,
|
| 899 |
-
doc_results: list,
|
| 900 |
-
entity_extractor: callable,
|
| 901 |
-
) -> None:
|
| 902 |
-
"""Calcule et attache ``DocumentResult.ner_metrics`` pour chaque doc
|
| 903 |
-
dont la GT possède un niveau ``ENTITIES`` (Sprint 32).
|
| 904 |
-
|
| 905 |
-
The extractor is called on the OCR hypothesis ``dr.hypothesis``.
|
| 906 |
-
Errors are downgraded to warnings (not propagated) so that
|
| 907 |
-
a single document that crashes the NER
|
| 908 |
-
does not break the benchmark.
|
| 909 |
-
"""
|
| 910 |
-
try:
|
| 911 |
-
from picarones.core.corpus import GTLevel
|
| 912 |
-
from picarones.measurements.ner import compute_ner_metrics
|
| 913 |
-
except ImportError as exc:
|
| 914 |
-
logger.warning("[ner.attach] imports indisponibles : %s", exc)
|
| 915 |
-
return
|
| 916 |
-
|
| 917 |
-
docs_by_id = {d.doc_id: d for d in corpus.documents}
|
| 918 |
-
n_done = 0
|
| 919 |
-
for dr in doc_results:
|
| 920 |
-
if dr.engine_error is not None or not dr.hypothesis:
|
| 921 |
-
continue
|
| 922 |
-
doc = docs_by_id.get(dr.doc_id)
|
| 923 |
-
if doc is None or not doc.has_gt(GTLevel.ENTITIES):
|
| 924 |
-
continue
|
| 925 |
-
try:
|
| 926 |
-
gt_payload = doc.get_gt(GTLevel.ENTITIES)
|
| 927 |
-
gt_entities = list(gt_payload.entities) if gt_payload else []
|
| 928 |
-
hyp_entities = entity_extractor(dr.hypothesis) or []
|
| 929 |
-
dr.ner_metrics = compute_ner_metrics(gt_entities, hyp_entities)
|
| 930 |
-
n_done += 1
|
| 931 |
-
except Exception as exc: # noqa: BLE001
|
| 932 |
-
logger.warning(
|
| 933 |
-
"[ner.attach] %s : extraction/comparaison NER dégradée : %s",
|
| 934 |
-
dr.doc_id, exc,
|
| 935 |
-
)
|
| 936 |
-
|
| 937 |
-
if n_done > 0:
|
| 938 |
-
logger.info("[ner] %d documents évalués pour NER.", n_done)
|
| 939 |
-
|
| 940 |
-
|
| 941 |
-
def _aggregate_calibration(doc_results: list) -> Optional[dict]:
|
| 942 |
-
"""Delegates to :func:`builtin_hooks._aggregate_calibration`.
|
| 943 |
-
|
| 944 |
-
Kept for backward compatibility with the ``test_sprint42_calibration_runner`` test,
|
| 945 |
-
which imports directly from ``picarones.measurements.runner``. The actual
|
| 946 |
-
logic lives in :mod:`picarones.measurements.builtin_hooks` (workstream 2,
|
| 947 |
-
post-Sprint 97).
|
| 948 |
-
"""
|
| 949 |
-
from picarones.measurements.builtin_hooks import _aggregate_calibration as _impl
|
| 950 |
-
return _impl(doc_results)
|
| 951 |
-
|
| 952 |
-
|
| 953 |
-
def _aggregate_ner(doc_results: list) -> Optional[dict]:
|
| 954 |
-
"""Aggregates NER metrics at the engine level.
|
| 955 |
-
|
| 956 |
-
Recomputes *micro* precision/recall/F1 from the global sums
|
| 957 |
-
of TP/FP/FN, along with the per-category breakdown and the total
|
| 958 |
-
counts of hallucinated and missed entities.
|
| 959 |
-
"""
|
| 960 |
-
relevant = [dr for dr in doc_results if dr.ner_metrics is not None]
|
| 961 |
-
if not relevant:
|
| 962 |
-
return None
|
| 963 |
-
|
| 964 |
-
total_tp = 0
|
| 965 |
-
total_fp = 0
|
| 966 |
-
total_fn = 0
|
| 967 |
-
cat_tp: dict[str, int] = {}
|
| 968 |
-
cat_fp: dict[str, int] = {}
|
| 969 |
-
cat_fn: dict[str, int] = {}
|
| 970 |
-
total_hallucinated = 0
|
| 971 |
-
total_missed = 0
|
| 972 |
-
iou_threshold = 0.5
|
| 973 |
-
|
| 974 |
-
for dr in relevant:
|
| 975 |
-
m = dr.ner_metrics
|
| 976 |
-
total_tp += int(m.get("true_positives", 0))
|
| 977 |
-
total_fp += int(m.get("false_positives", 0))
|
| 978 |
-
total_fn += int(m.get("false_negatives", 0))
|
| 979 |
-
total_hallucinated += len(m.get("hallucinated_entities", []) or [])
|
| 980 |
-
total_missed += len(m.get("missed_entities", []) or [])
|
| 981 |
-
iou_threshold = float(m.get("iou_threshold", iou_threshold))
|
| 982 |
-
for cat, stats in (m.get("per_category") or {}).items():
|
| 983 |
-
cat_tp[cat] = cat_tp.get(cat, 0)
|
| 984 |
-
cat_fp[cat] = cat_fp.get(cat, 0)
|
| 985 |
-
cat_fn[cat] = cat_fn.get(cat, 0)
|
| 986 |
-
# Reconstruct the per-category counts from support and P/R
|
| 987 |
-
support = int(stats.get("support", 0))
|
| 988 |
-
recall = float(stats.get("recall", 0.0))
|
| 989 |
-
precision = float(stats.get("precision", 0.0))
|
| 990 |
-
tp_cat = round(support * recall) if support > 0 else 0
|
| 991 |
-
fn_cat = max(0, support - tp_cat)
|
| 992 |
-
fp_cat = (
|
| 993 |
-
round(tp_cat * (1 - precision) / precision)
|
| 994 |
-
if precision > 0 else 0
|
| 995 |
-
)
|
| 996 |
-
cat_tp[cat] += tp_cat
|
| 997 |
-
cat_fp[cat] += fp_cat
|
| 998 |
-
cat_fn[cat] += fn_cat
|
| 999 |
-
|
| 1000 |
-
def _prf(tp: int, fp: int, fn: int) -> dict[str, float]:
|
| 1001 |
-
p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
|
| 1002 |
-
r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
|
| 1003 |
-
f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
|
| 1004 |
-
return {"precision": p, "recall": r, "f1": f1, "support": tp + fn}
|
| 1005 |
-
|
| 1006 |
-
return {
|
| 1007 |
-
"global": _prf(total_tp, total_fp, total_fn),
|
| 1008 |
-
"per_category": {
|
| 1009 |
-
cat: _prf(cat_tp[cat], cat_fp[cat], cat_fn[cat])
|
| 1010 |
-
for cat in sorted(set(cat_tp) | set(cat_fp) | set(cat_fn))
|
| 1011 |
-
},
|
| 1012 |
-
"true_positives": total_tp,
|
| 1013 |
-
"false_positives": total_fp,
|
| 1014 |
-
"false_negatives": total_fn,
|
| 1015 |
-
"hallucinated_total": total_hallucinated,
|
| 1016 |
-
"missed_total": total_missed,
|
| 1017 |
-
"doc_count": len(relevant),
|
| 1018 |
-
"iou_threshold": iou_threshold,
|
| 1019 |
-
}
|
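The micro-averaging described for `_aggregate_ner` can be sketched in isolation. This is a minimal sketch, assuming per-document metrics are plain dicts carrying the `true_positives`, `false_positives` and `false_negatives` keys read by the removed code; the helper name `micro_prf` and the sample counts are hypothetical.

```python
from __future__ import annotations


def micro_prf(per_doc: list[dict]) -> dict[str, float]:
    """Micro-averaged precision/recall/F1 from per-document TP/FP/FN counts."""
    # Sum the raw counts first: every entity then weighs the same,
    # regardless of which document it came from.
    tp = sum(int(m.get("true_positives", 0)) for m in per_doc)
    fp = sum(int(m.get("false_positives", 0)) for m in per_doc)
    fn = sum(int(m.get("false_negatives", 0)) for m in per_doc)
    # Guard every ratio against an empty denominator.
    p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    return {"precision": p, "recall": r, "f1": f1, "support": tp + fn}


# Hypothetical per-document counts, for illustration only.
docs = [
    {"true_positives": 3, "false_positives": 1, "false_negatives": 0},
    {"true_positives": 1, "false_positives": 0, "false_negatives": 4},
]
result = micro_prf(docs)
```

Summing counts before dividing weights every entity equally, whereas averaging per-document F1 scores would weight every document equally.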
|
|
|
| 1 |
+
"""Main benchmark orchestrator.
|
| 2 |
|
| 3 |
+
Contains :func:`run_benchmark` and its helper :func:`_build_pipeline_info`.
|
|
|
|
|
|
|
| 4 |
|
| 5 |
+
The runner executes each engine in the list on the full corpus:
|
| 6 |
+
|
| 7 |
+
- For CPU-bound engines (``execution_mode == "cpu"``:
|
| 8 |
+
Tesseract, Pero OCR, Kraken), uses a ``ProcessPoolExecutor``
|
| 9 |
+
and delegates to the picklable workers in :mod:`workers`.
|
| 10 |
+
- For IO-bound engines (Mistral, Google Vision, Azure, LLMs),
|
| 11 |
+
uses a ``ThreadPoolExecutor``.
|
| 12 |
+
|
| 13 |
+
Partial results (one NDJSON file per engine) are handled by
|
| 14 |
+
:mod:`partial`; the computation of an individual :class:`DocumentResult`
|
| 15 |
+
by :mod:`document`; and the final aggregation by the hooks delegated to
|
| 16 |
+
:mod:`builtin_hooks` (workstream 2, post-Sprint 97).
|
| 17 |
"""
|
| 18 |
|
| 19 |
from __future__ import annotations
|
| 20 |
|
| 21 |
import concurrent.futures
|
|
|
|
| 22 |
import logging
|
|
|
|
|
|
|
| 23 |
import threading
|
| 24 |
import time
|
| 25 |
from pathlib import Path
|
|
|
|
| 28 |
from tqdm import tqdm
|
| 29 |
|
| 30 |
from picarones.core.corpus import Corpus
|
|
|
|
| 31 |
from picarones.core.results import BenchmarkResult, DocumentResult, EngineReport
|
| 32 |
+
from picarones.engines.base import BaseOCREngine
|
| 33 |
+
from picarones.measurements.runner.document import (
|
| 34 |
+
_make_error_doc_result,
|
| 35 |
+
_make_timeout_doc_result,
|
| 36 |
+
)
|
| 37 |
+
from picarones.measurements.runner.ner_attach import (
|
| 38 |
+
_aggregate_ner,
|
| 39 |
+
_attach_ner_metrics,
|
| 40 |
+
)
|
| 41 |
+
from picarones.measurements.runner.partial import (
|
| 42 |
+
_delete_partial,
|
| 43 |
+
_load_partial,
|
| 44 |
+
_save_partial_line,
|
| 45 |
+
)
|
| 46 |
+
from picarones.measurements.runner.workers import (
|
| 47 |
+
_cpu_doc_worker,
|
| 48 |
+
_io_doc_worker,
|
| 49 |
+
)
|
| 50 |
|
| 51 |
logger = logging.getLogger(__name__)
|
| 52 |
|
|
| 53 |
|
| 54 |
def run_benchmark(
|
| 55 |
corpus: Corpus,
|
|
|
|
| 491 |
return info
|
| 492 |
|
| 493 |
|
| 494 |
+
__all__ = ["_build_pipeline_info", "run_benchmark"]
|
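The executor choice described in the orchestrator docstring could look roughly like the following. This is only a sketch: the visible diff does not show how `run_benchmark` actually selects its pool, so `pick_executor_class` and `fake_ocr` are hypothetical names, and the only detail taken from the source is the `execution_mode == "cpu"` convention.

```python
from __future__ import annotations

from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def pick_executor_class(execution_mode: str) -> type[Executor]:
    """Return a process pool for CPU-bound engines, a thread pool otherwise."""
    if execution_mode == "cpu":
        # Separate processes sidestep the GIL for local OCR engines.
        return ProcessPoolExecutor
    # Threads are enough when the time is spent waiting on HTTP calls.
    return ThreadPoolExecutor


def fake_ocr(doc_id: str) -> str:
    """Stand-in for an engine call (hypothetical)."""
    return doc_id.upper()


# Run the stand-in on three documents with the IO-bound strategy.
with pick_executor_class("io")(max_workers=2) as pool:
    outputs = list(pool.map(fake_ocr, ["a", "b", "c"]))
```

`Executor.map` returns results in input order, which keeps document results aligned with the corpus even when tasks finish out of order.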
picarones/measurements/runner/partial.py
ADDED
|
@@ -0,0 +1,140 @@
|
| 1 |
+
"""Persistence of partial benchmark results (NDJSON).
|
| 2 |
+
|
| 3 |
+
When the runner processes a corpus, it writes each ``DocumentResult``
|
| 4 |
+
to a file ``{partial_dir}/picarones_{corpus}_{engine}.partial.json``
|
| 5 |
+
in NDJSON format. If the benchmark is interrupted (Ctrl+C, crash, kill),
|
| 6 |
+
the next run resumes from this file without losing the work
|
| 7 |
+
already done.
|
| 8 |
+
|
| 9 |
+
Thread-safe: the module uses a :class:`threading.Lock` shared
|
| 10 |
+
by all writes to serialize appends.
|
| 11 |
+
"""
|
| 12 |
+
|
| 13 |
+
from __future__ import annotations
|
| 14 |
+
|
| 15 |
+
import json
|
| 16 |
+
import logging
|
| 17 |
+
import re
|
| 18 |
+
import tempfile
|
| 19 |
+
import threading
|
| 20 |
+
from pathlib import Path
|
| 21 |
+
from typing import Optional
|
| 22 |
+
|
| 23 |
+
from picarones.core.results import DocumentResult
|
| 24 |
+
from picarones.measurements.metrics import MetricsResult
|
| 25 |
+
|
| 26 |
+
logger = logging.getLogger(__name__)
|
| 27 |
+
|
| 28 |
+
# Lock that serializes writes of partial results.
|
| 29 |
+
# Shared by all call sites (IO and CPU workers take turns
|
| 30 |
+
# on the same queue).
|
| 31 |
+
_partial_write_lock = threading.Lock()
|
| 32 |
+
|
| 33 |
+
|
| 34 |
+
def _sanitize_filename(s: str) -> str:
|
| 35 |
+
return re.sub(r"[^\w\-]", "_", s)[:64]
|
| 36 |
+
|
| 37 |
+
|
| 38 |
+
def _partial_path(
|
| 39 |
+
corpus_name: str,
|
| 40 |
+
engine_name: str,
|
| 41 |
+
partial_dir: Optional[str | Path],
|
| 42 |
+
) -> Path:
|
| 43 |
+
base = Path(partial_dir) if partial_dir else Path(tempfile.gettempdir())
|
| 44 |
+
name = (
|
| 45 |
+
f"picarones_{_sanitize_filename(corpus_name)}"
|
| 46 |
+
f"_{_sanitize_filename(engine_name)}.partial.json"
|
| 47 |
+
)
|
| 48 |
+
return base / name
|
| 49 |
+
|
| 50 |
+
|
| 51 |
+
def _load_partial(
|
| 52 |
+
corpus_name: str,
|
| 53 |
+
engine_name: str,
|
| 54 |
+
partial_dir: Optional[str | Path],
|
| 55 |
+
) -> tuple[Path, list[DocumentResult]]:
|
| 56 |
+
"""Loads the partial results of a previous, interrupted run.
|
| 57 |
+
|
| 58 |
+
Returns
|
| 59 |
+
-------
|
| 60 |
+
(path, results): path of the partial file and list of the
|
| 61 |
+
DocumentResult objects already computed.
|
| 62 |
+
"""
|
| 63 |
+
path = _partial_path(corpus_name, engine_name, partial_dir)
|
| 64 |
+
results: list[DocumentResult] = []
|
| 65 |
+
if not path.exists():
|
| 66 |
+
return path, results
|
| 67 |
+
|
| 68 |
+
try:
|
| 69 |
+
with path.open("r", encoding="utf-8") as fh:
|
| 70 |
+
for line in fh:
|
| 71 |
+
line = line.strip()
|
| 72 |
+
if not line:
|
| 73 |
+
continue
|
| 74 |
+
d = json.loads(line)
|
| 75 |
+
m = d.get("metrics", {})
|
| 76 |
+
metrics = MetricsResult(
|
| 77 |
+
cer=m.get("cer", 1.0),
|
| 78 |
+
cer_nfc=m.get("cer_nfc", 1.0),
|
| 79 |
+
cer_caseless=m.get("cer_caseless", 1.0),
|
| 80 |
+
wer=m.get("wer", 1.0),
|
| 81 |
+
wer_normalized=m.get("wer_normalized", 1.0),
|
| 82 |
+
mer=m.get("mer", 1.0),
|
| 83 |
+
wil=m.get("wil", 1.0),
|
| 84 |
+
reference_length=m.get("reference_length", 0),
|
| 85 |
+
hypothesis_length=m.get("hypothesis_length", 0),
|
| 86 |
+
error=m.get("error"),
|
| 87 |
+
)
|
| 88 |
+
results.append(DocumentResult(
|
| 89 |
+
doc_id=d["doc_id"],
|
| 90 |
+
image_path=d.get("image_path", ""),
|
| 91 |
+
ground_truth=d.get("ground_truth", ""),
|
| 92 |
+
hypothesis=d.get("hypothesis", ""),
|
| 93 |
+
metrics=metrics,
|
| 94 |
+
duration_seconds=d.get("duration_seconds", 0.0),
|
| 95 |
+
engine_error=d.get("engine_error"),
|
| 96 |
+
ocr_intermediate=d.get("ocr_intermediate"),
|
| 97 |
+
pipeline_metadata=d.get("pipeline_metadata", {}),
|
| 98 |
+
confusion_matrix=d.get("confusion_matrix"),
|
| 99 |
+
char_scores=d.get("char_scores"),
|
| 100 |
+
taxonomy=d.get("taxonomy"),
|
| 101 |
+
structure=d.get("structure"),
|
| 102 |
+
image_quality=d.get("image_quality"),
|
| 103 |
+
line_metrics=d.get("line_metrics"),
|
| 104 |
+
hallucination_metrics=d.get("hallucination_metrics"),
|
| 105 |
+
))
|
| 106 |
+
except Exception as e:
|
| 107 |
+
logger.warning("Impossible de charger les résultats partiels '%s' : %s", path, e)
|
| 108 |
+
results = []
|
| 109 |
+
|
| 110 |
+
return path, results
|
| 111 |
+
|
| 112 |
+
|
| 113 |
+
def _save_partial_line(partial_path: Path, doc_result: DocumentResult) -> None:
|
| 114 |
+
"""Appends an NDJSON entry to the partial results file (thread-safe)."""
|
| 115 |
+
try:
|
| 116 |
+
line = json.dumps(doc_result.as_dict(), ensure_ascii=False) + "\n"
|
| 117 |
+
with _partial_write_lock:
|
| 118 |
+
with partial_path.open("a", encoding="utf-8") as fh:
|
| 119 |
+
fh.write(line)
|
| 120 |
+
except Exception as e:
|
| 121 |
+
logger.warning("Impossible d'écrire dans le fichier partiel '%s' : %s", partial_path, e)
|
| 122 |
+
|
| 123 |
+
|
| 124 |
+
def _delete_partial(partial_path: Path) -> None:
|
| 125 |
+
"""Deletes the partial results file once an engine has finished."""
|
| 126 |
+
try:
|
| 127 |
+
if partial_path.exists():
|
| 128 |
+
partial_path.unlink()
|
| 129 |
+
except Exception as e:
|
| 130 |
+
logger.warning("Impossible de supprimer le fichier partiel '%s' : %s", partial_path, e)
|
| 131 |
+
|
| 132 |
+
|
| 133 |
+
__all__ = [
|
| 134 |
+
"_delete_partial",
|
| 135 |
+
"_load_partial",
|
| 136 |
+
"_partial_path",
|
| 137 |
+
"_partial_write_lock",
|
| 138 |
+
"_sanitize_filename",
|
| 139 |
+
"_save_partial_line",
|
| 140 |
+
]
|
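The append-then-resume pattern that `partial.py` implements can be illustrated with plain dicts. This is a minimal sketch under stated assumptions: `append_record`, `load_records` and the record fields are hypothetical stand-ins for `_save_partial_line`, `_load_partial` and `DocumentResult.as_dict()`.

```python
from __future__ import annotations

import json
import tempfile
import threading
from pathlib import Path

# One lock for every writer thread in this process.
_lock = threading.Lock()


def append_record(path: Path, record: dict) -> None:
    """Append one JSON object as a single NDJSON line."""
    line = json.dumps(record, ensure_ascii=False) + "\n"
    with _lock:
        with path.open("a", encoding="utf-8") as fh:
            fh.write(line)


def load_records(path: Path) -> list[dict]:
    """Read back every non-empty line; a missing file means nothing was done."""
    if not path.exists():
        return []
    with path.open("r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    partial = Path(tmp) / "demo.partial.json"
    # First run: two documents finish before an interruption.
    append_record(partial, {"doc_id": "d1", "cer": 0.12})
    append_record(partial, {"doc_id": "d2", "cer": 0.30})
    # Second run: skip whatever is already on disk.
    done_ids = {r["doc_id"] for r in load_records(partial)}
    todo = [d for d in ["d1", "d2", "d3"] if d not in done_ids]
```

A `threading.Lock` only serializes writers inside one process, so this pattern relies on the appends being issued from a single process.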
picarones/measurements/runner/workers.py
ADDED
|
@@ -0,0 +1,107 @@
|
| 1 |
+
"""Module-level workers for the execution pools.
|
| 2 |
+
|
| 3 |
+
Two workers, one for each execution mode:
|
| 4 |
+
|
| 5 |
+
- :func:`_cpu_doc_worker`: for ``ProcessPoolExecutor`` (CPU-bound
|
| 6 |
+
engines, instantiated in the subprocess). Must be picklable,
|
| 7 |
+
which is why it is defined at module level.
|
| 8 |
+
- :func:`_io_doc_worker`: for ``ThreadPoolExecutor`` (IO-bound
|
| 9 |
+
engines / HTTP APIs). The engine instance is shared between
|
| 10 |
+
threads.
|
| 11 |
+
|
| 12 |
+
Both end up calling :func:`_compute_document_result` from the
|
| 13 |
+
:mod:`document` submodule to compute all the metrics.
|
| 14 |
+
"""
|
| 15 |
+
|
| 16 |
+
from __future__ import annotations
|
| 17 |
+
|
| 18 |
+
from typing import Optional
|
| 19 |
+
|
| 20 |
+
from picarones.core.results import DocumentResult
|
| 21 |
+
from picarones.engines.base import BaseOCREngine
|
| 22 |
+
from picarones.measurements.runner.document import _compute_document_result
|
| 23 |
+
|
| 24 |
+
|
| 25 |
+
def _cpu_doc_worker(args: tuple) -> "DocumentResult":
|
| 26 |
+
"""Worker for ProcessPoolExecutor (CPU-bound engines).
|
| 27 |
+
|
| 28 |
+
Instantiates the engine in the subprocess, runs OCR and computes
|
| 29 |
+
all the metrics. Must be a module-level function so that it can be
|
| 30 |
+
serialized by ``pickle``.
|
| 31 |
+
|
| 32 |
+
For backward compatibility, the ``args`` tuple may contain:
|
| 33 |
+
- 7 elements: legacy (Sprint 13)
|
| 34 |
+
- 8 elements: + ``corpus_lang`` (Sprint 87)
|
| 35 |
+
- 9 elements: + ``profile`` (workstream 2, post-Sprint 97)
|
| 36 |
+
"""
|
| 37 |
+
if len(args) == 9:
|
| 38 |
+
(engine_module, engine_class_name, engine_config, doc_id,
|
| 39 |
+
image_path, ground_truth, char_exclude_chars, corpus_lang,
|
| 40 |
+
profile) = args
|
| 41 |
+
elif len(args) == 8:
|
| 42 |
+
(engine_module, engine_class_name, engine_config, doc_id,
|
| 43 |
+
image_path, ground_truth, char_exclude_chars, corpus_lang) = args
|
| 44 |
+
profile = "standard"
|
| 45 |
+
else:
|
| 46 |
+
(engine_module, engine_class_name, engine_config, doc_id,
|
| 47 |
+
image_path, ground_truth, char_exclude_chars) = args
|
| 48 |
+
corpus_lang = "fr"
|
| 49 |
+
profile = "standard"
|
| 50 |
+
import importlib
|
| 51 |
+
mod = importlib.import_module(engine_module)
|
| 52 |
+
engine_cls = getattr(mod, engine_class_name)
|
| 53 |
+
engine = engine_cls(config=engine_config)
|
| 54 |
+
ocr_result = engine.run(image_path)
|
| 55 |
+
char_exclude = frozenset(char_exclude_chars) if char_exclude_chars else None
|
| 56 |
+
return _compute_document_result(
|
| 57 |
+
doc_id=doc_id,
|
| 58 |
+
image_path=image_path,
|
| 59 |
+
ground_truth=ground_truth,
|
| 60 |
+
ocr_result=ocr_result,
|
| 61 |
+
char_exclude=char_exclude,
|
| 62 |
+
corpus_lang=corpus_lang,
|
| 63 |
+
profile=profile,
|
| 64 |
+
)
|
| 65 |
+
|
| 66 |
+
|
| 67 |
+
def _io_doc_worker(
|
| 68 |
+
engine: BaseOCREngine,
|
| 69 |
+
doc: object,
|
| 70 |
+
char_exclude: Optional[frozenset],
|
| 71 |
+
corpus_lang: str = "fr",
|
| 72 |
+
profile: str = "standard",
|
| 73 |
+
) -> "DocumentResult":
|
| 74 |
+
"""Worker for ThreadPoolExecutor (IO-bound engines / APIs).
|
| 75 |
+
|
| 76 |
+
Runs OCR and computes the metrics in a thread. The engine
|
| 77 |
+
instance is shared between threads; HTTP adapters generally
|
| 78 |
+
hold no mutable state between calls.
|
| 79 |
+
|
| 80 |
+
If the document has precomputed OCR text (triplet corpus) and
|
| 81 |
+
the engine is an OCR+LLM pipeline, uses ``run_with_ocr_text()`` to
|
| 82 |
+
bypass the OCR step and test the LLM post-correction directly.
|
| 83 |
+
"""
|
| 84 |
+
doc_ocr_text = getattr(doc, "ocr_text", None)
|
| 85 |
+
if doc_ocr_text is not None:
|
| 86 |
+
# Triplet corpus: check whether the engine supports run_with_ocr_text
|
| 87 |
+
run_with = getattr(engine, "run_with_ocr_text", None)
|
| 88 |
+
if run_with is not None:
|
| 89 |
+
ocr_result = run_with(doc.image_path, doc_ocr_text) # type: ignore[attr-defined]
|
| 90 |
+
else:
|
| 91 |
+
# Plain OCR engine: ignore the precomputed OCR text
|
| 92 |
+
ocr_result = engine.run(doc.image_path) # type: ignore[attr-defined]
|
| 93 |
+
else:
|
| 94 |
+
ocr_result = engine.run(doc.image_path) # type: ignore[attr-defined]
|
| 95 |
+
|
| 96 |
+
return _compute_document_result(
|
| 97 |
+
doc_id=doc.doc_id, # type: ignore[attr-defined]
|
| 98 |
+
image_path=str(doc.image_path), # type: ignore[attr-defined]
|
| 99 |
+
ground_truth=doc.ground_truth, # type: ignore[attr-defined]
|
| 100 |
+
ocr_result=ocr_result,
|
| 101 |
+
char_exclude=char_exclude,
|
| 102 |
+
corpus_lang=corpus_lang,
|
| 103 |
+
profile=profile,
|
| 104 |
+
)
|
| 105 |
+
|
| 106 |
+
|
| 107 |
+
__all__ = ["_cpu_doc_worker", "_io_doc_worker"]
|
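Two mechanics of `_cpu_doc_worker` are worth isolating: the variable-length `args` tuple and the by-name engine lookup. The sketch below is a hedged alternative formulation, not the code from the diff; `unpack_worker_args` and `resolve_class` are hypothetical helpers, and unlike the original `else` branch this version rejects unexpected tuple lengths explicitly.

```python
from __future__ import annotations

import importlib

# Defaults for the two fields added after the legacy 7-element layout:
# corpus_lang ("fr") and profile ("standard"), as in the worker above.
_TRAILING_DEFAULTS = ("fr", "standard")


def unpack_worker_args(args: tuple) -> tuple:
    """Pad a 7- or 8-element tuple to the full 9-element layout."""
    if not 7 <= len(args) <= 9:
        raise ValueError(f"expected 7 to 9 elements, got {len(args)}")
    # len 7 -> append both defaults, len 8 -> append profile only, len 9 -> nothing.
    return args + _TRAILING_DEFAULTS[len(args) - 7:]


def resolve_class(module_name: str, class_name: str) -> type:
    """Look a class up by name, so that only strings cross the process boundary."""
    return getattr(importlib.import_module(module_name), class_name)


legacy = ("mod", "Cls", {}, "doc-1", "page.png", "ground truth", "")
padded = unpack_worker_args(legacy)
# Stdlib class used purely as a stand-in for an engine class.
cls = resolve_class("collections", "OrderedDict")
```

Passing the module and class names as strings means the engine itself never has to be pickled; each subprocess rebuilds it from its configuration.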
picarones/measurements/statistics.py
DELETED
|
@@ -1,1128 +0,0 @@
|
|
| 1 |
-
"""Statistical tests and error clustering for Picarones.
|
| 2 |
-
|
| 3 |
-
Functions provided
|
| 4 |
-
------------------
|
| 5 |
-
- wilcoxon_test(a, b) : Wilcoxon signed-rank test (2 paired engines)
|
| 6 |
-
- bootstrap_ci(values, ...) : 95% bootstrap confidence interval
|
| 7 |
-
- compute_pairwise_stats(...) : Wilcoxon matrix over all pairs
|
| 8 |
-
- friedman_test(engine_cer_map) : Friedman test (k engines, n documents) [Sprint 17]
|
| 9 |
-
- nemenyi_posthoc(engine_cer_map) : Nemenyi post-hoc test with critical distance [Sprint 17]
|
| 10 |
-
- build_critical_difference_svg(...) : SVG rendering of the CDD (Demšar 2006) [Sprint 17]
|
| 11 |
-
- compute_pareto_front(points, ...) : multi-objective Pareto front [Sprint 19]
|
| 12 |
-
- cluster_errors(...) : grouping of error patterns
|
| 13 |
-
- compute_correlation_matrix(...) : correlation matrix of the metrics
|
| 14 |
-
- compute_reliability_curve(...) : CER vs. % of easiest documents curve
|
| 15 |
-
- compute_venn_data(...) : Venn diagram for 2/3 engines
|
| 16 |
-
"""
|
| 17 |
-
|
| 18 |
-
from __future__ import annotations
|
| 19 |
-
|
| 20 |
-
import math
|
| 21 |
-
import random
|
| 22 |
-
import re
|
| 23 |
-
from collections import defaultdict
|
| 24 |
-
from dataclasses import dataclass
|
| 25 |
-
from typing import Optional
|
| 26 |
-
|
| 27 |
-
# Optional scipy import: used for the Wilcoxon test when available
|
| 28 |
-
# (exact method for n ≤ 25, normal approximation for n > 25).
|
| 29 |
-
# Without it, the native implementation (normal approximation for n ≥ 10)
|
| 30 |
-
# is used automatically.
|
| 31 |
-
try:
|
| 32 |
-
from scipy.stats import wilcoxon as _scipy_wilcoxon # type: ignore[import-untyped]
|
| 33 |
-
_SCIPY_AVAILABLE = True
|
| 34 |
-
except ImportError:
|
| 35 |
-
_SCIPY_AVAILABLE = False
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
# ---------------------------------------------------------------------------
|
| 39 |
-
# Bootstrap CI
|
| 40 |
-
# ---------------------------------------------------------------------------
|
| 41 |
-
|
| 42 |
-
def bootstrap_ci(
|
| 43 |
-
values: list[float],
|
| 44 |
-
n_iter: int = 1000,
|
| 45 |
-
ci: float = 0.95,
|
| 46 |
-
seed: int = 42,
|
| 47 |
-
) -> tuple[float, float]:
|
| 48 |
-
"""Bootstrap confidence interval.
|
| 49 |
-
|
| 50 |
-
Parameters
|
| 51 |
-
----------
|
| 52 |
-
values : list of values (e.g. per-document CER)
|
| 53 |
-
n_iter : number of bootstrap iterations (default 1000)
|
| 54 |
-
ci : confidence level (default 0.95 → 95%)
|
| 55 |
-
seed : RNG seed for reproducibility
|
| 56 |
-
|
| 57 |
-
Returns
|
| 58 |
-
-------
|
| 59 |
-
(lower, upper): the bounds of the confidence interval at level ``ci``
|
| 60 |
-
"""
|
| 61 |
-
if not values:
|
| 62 |
-
return (0.0, 0.0)
|
| 63 |
-
rng = random.Random(seed)
|
| 64 |
-
n = len(values)
|
| 65 |
-
means = []
|
| 66 |
-
for _ in range(n_iter):
|
| 67 |
-
sample = [values[rng.randint(0, n - 1)] for _ in range(n)]
|
| 68 |
-
means.append(sum(sample) / n)
|
| 69 |
-
means.sort()
|
| 70 |
-
alpha = (1.0 - ci) / 2.0
|
| 71 |
-
lo_idx = max(0, int(alpha * n_iter))
|
| 72 |
-
hi_idx = min(n_iter - 1, int((1.0 - alpha) * n_iter))
|
| 73 |
-
return (means[lo_idx], means[hi_idx])
|
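The percentile bootstrap implemented by `bootstrap_ci` can be tried on a small sample. The sketch below is a self-contained variant, not the removed function itself: it draws each resample with `random.Random.choices` instead of repeated `randint` calls, and `percentile_bootstrap_ci` and the CER values are hypothetical.

```python
from __future__ import annotations

import random


def percentile_bootstrap_ci(
    values: list[float],
    n_iter: int = 1000,
    ci: float = 0.95,
    seed: int = 42,
) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    if not values:
        return (0.0, 0.0)
    rng = random.Random(seed)  # fixed seed, so results are reproducible
    n = len(values)
    # Resample with replacement n_iter times and keep each resample's mean.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_iter))
    # Read the interval bounds off the sorted bootstrap means.
    alpha = (1.0 - ci) / 2.0
    lo_idx = max(0, int(alpha * n_iter))
    hi_idx = min(n_iter - 1, int((1.0 - alpha) * n_iter))
    return (means[lo_idx], means[hi_idx])


# Hypothetical per-document CER values.
cer_values = [0.05, 0.10, 0.08, 0.20, 0.12, 0.07, 0.15, 0.09]
lower, upper = percentile_bootstrap_ci(cer_values)
```

With only a handful of documents the interval is wide; it narrows as the corpus grows.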
| 74 |
-
|
| 75 |
-
|
| 76 |
-
# ---------------------------------------------------------------------------
|
| 77 |
-
# Wilcoxon signed-rank test (pure-Python implementation)
|
| 78 |
-
# ---------------------------------------------------------------------------
|
| 79 |
-
|
| 80 |
-
def wilcoxon_test(
|
| 81 |
-
a: list[float],
|
| 82 |
-
b: list[float],
|
| 83 |
-
zero_method: str = "wilcox",
|
| 84 |
-
) -> dict:
|
| 85 |
-
"""Wilcoxon signed-rank test between two paired CER series.
|
| 86 |
-
|
| 87 |
-
Returns a dict with:
|
| 88 |
-
- statistic : W = min(W⁺, W⁻)
|
| 89 |
-
- p_value : two-sided p-value
|
| 90 |
-
- significant : bool (p < 0.05)
|
| 91 |
-
- interpretation : human-readable sentence
|
| 92 |
-
- n_pairs : number of pairs used (after removing zeros)
|
| 93 |
-
- W_plus : sum of the ranks of the positive differences
|
| 94 |
-
- W_minus : sum of the ranks of the negative differences
|
| 95 |
-
|
| 96 |
-
Assumptions and limits
|
| 97 |
-
----------------------
|
| 98 |
-
* The observations are paired (same corpus, two different engines).
|
| 99 |
-
* The test is non-parametric: no normality assumption on the CER values.
|
| 100 |
-
* ``zero_method="wilcox"`` (default): pairs with no difference (aᵢ = bᵢ)
|
| 101 |
-
are simply excluded. The other methods (``"pratt"``, ``"zsplit"``)
|
| 102 |
-
require scipy.
|
| 103 |
-
* **Normal approximation** (native implementation, n ≥ 10):
|
| 104 |
-
The approximation is reasonable for n ≥ 10 and converges to the
|
| 105 |
-
exact distribution. For n < 10, a simplified critical-value table is
|
| 106 |
-
used (p ∈ {0.04, 0.20}), a **conservative** result.
|
| 107 |
-
* **scipy** (if installed): ``scipy.stats.wilcoxon`` is used instead
|
| 108 |
-
of the native approximation. scipy uses the exact method for n ≤ 25
|
| 109 |
-
and the normal approximation for n > 25, which is more accurate.
|
| 110 |
-
* **Validity**: the test assumes that the distribution of the
|
| 111 |
-
differences is symmetric. With very small n (< 5), the results are unreliable
|
| 112 |
-
whatever the method.
|
| 113 |
-
|
| 114 |
-
Parameters
|
| 115 |
-
----------
|
| 116 |
-
a, b : CER series (same length, same document order)
|
| 117 |
-
zero_method : handling of zero-difference pairs (default: ``"wilcox"``)
|
| 118 |
-
"""
|
| 119 |
-
if len(a) != len(b):
|
| 120 |
-
raise ValueError("Les deux listes doivent avoir la même longueur")
|
| 121 |
-
|
| 122 |
-
diffs = [x - y for x, y in zip(a, b)]
|
| 123 |
-
|
| 124 |
-
# Remove zeros ("wilcox" method)
|
| 125 |
-
if zero_method == "wilcox":
|
| 126 |
-
diffs = [d for d in diffs if d != 0.0]
|
| 127 |
-
|
| 128 |
-
n = len(diffs)
|
| 129 |
-
if n == 0:
|
| 130 |
-
return {
|
| 131 |
-
"statistic": 0.0,
|
| 132 |
-
"p_value": 1.0,
|
| 133 |
-
"significant": False,
|
| 134 |
-
"interpretation": "Aucune différence entre les deux concurrents.",
|
| 135 |
-
"n_pairs": 0,
|
| 136 |
-
}
|
| 137 |
-
|
| 138 |
-
# Ranks of the absolute values
|
| 139 |
-
abs_diffs = [abs(d) for d in diffs]
|
| 140 |
-
indexed = sorted(enumerate(abs_diffs), key=lambda x: x[1])
|
| 141 |
-
|
| 142 |
-
# Tie handling: average rank
|
| 143 |
-
ranks = [0.0] * n
|
| 144 |
-
i = 0
|
| 145 |
-
while i < n:
|
| 146 |
-
j = i
|
| 147 |
-
while j < n and abs_diffs[indexed[j][0]] == abs_diffs[indexed[i][0]]:
|
| 148 |
-
j += 1
|
| 149 |
-
avg_rank = (i + j + 1) / 2.0 # average rank (1-based)
|
| 150 |
-
for k in range(i, j):
|
| 151 |
-
ranks[indexed[k][0]] = avg_rank
|
| 152 |
-
i = j
|
| 153 |
-
|
| 154 |
-
W_plus = sum(ranks[k] for k in range(n) if diffs[k] > 0)
|
| 155 |
-
W_minus = sum(ranks[k] for k in range(n) if diffs[k] < 0)
|
| 156 |
-
W = min(W_plus, W_minus)
|
| 157 |
-
|
| 158 |
-
# p-value computation: scipy if available, otherwise the native approximation
|
| 159 |
-
if _SCIPY_AVAILABLE:
|
| 160 |
-
try:
|
| 161 |
-
scipy_res = _scipy_wilcoxon(diffs, zero_method=zero_method)
|
| 162 |
-
p_value = float(scipy_res.pvalue)
|
| 163 |
-
except Exception:
|
| 164 |
-
# Fall back to the native implementation if scipy raises
|
| 165 |
-
p_value = _native_p_value(n, W)
|
| 166 |
-
else:
|
| 167 |
-
p_value = _native_p_value(n, W)
|
| 168 |
-
|
| 169 |
-
significant = p_value < 0.05
|
| 170 |
-
|
| 171 |
-
if significant:
|
| 172 |
-
better = "premier" if W_plus < W_minus else "second"
|
| 173 |
-
interpretation = (
|
| 174 |
-
f"Différence statistiquement significative (p = {p_value:.4f} < 0.05). "
|
| 175 |
-
f"Le {better} concurrent obtient de meilleurs scores."
|
| 176 |
-
)
|
| 177 |
-
else:
|
| 178 |
-
interpretation = (
|
| 179 |
-
f"Différence non significative (p = {p_value:.4f} ≥ 0.05). "
|
| 180 |
-
"On ne peut pas conclure que l'un surpasse l'autre."
|
| 181 |
-
)
|
| 182 |
-
|
| 183 |
-
return {
|
| 184 |
-
"statistic": round(W, 4),
|
| 185 |
-
"p_value": round(p_value, 6),
|
| 186 |
-
"significant": significant,
|
| 187 |
-
"interpretation": interpretation,
|
| 188 |
-
"n_pairs": n,
|
| 189 |
-
"W_plus": round(W_plus, 4),
|
| 190 |
-
"W_minus": round(W_minus, 4),
|
| 191 |
-
}
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
def _normal_sf(z: float) -> float:
|
| 195 |
-
"""Survival function of the standard normal distribution (1 - CDF)."""
|
| 196 |
-
# Approximation Abramowitz & Stegun 26.2.17
|
| 197 |
-
t = 1.0 / (1.0 + 0.2316419 * abs(z))
|
| 198 |
-
poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
|
| 199 |
-
+ t * (-1.821255978 + t * 1.330274429))))
|
| 200 |
-
phi_z = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
|
| 201 |
-
p = phi_z * poly
|
| 202 |
-
return p if z >= 0 else 1.0 - p
|
| 203 |
-
|
| 204 |
-
|
| 205 |
-
# Critical values of W for two-sided α=0.05 (exact test, source: Wilcoxon tables)
|
| 206 |
-
_W_CRITICAL = {1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 2, 8: 3, 9: 5}
|
| 207 |
-
|
| 208 |
-
|
| 209 |
-
def _wilcoxon_exact_p(n: int, w: float) -> float:
|
| 210 |
-
"""Approximate p-value for small n (< 10) via a simplified critical-value table.
|
| 211 |
-
|
| 212 |
-
Note: the result is **conservative**; only two values are ever returned:
|
| 213 |
-
0.04 (significant at the 5% level) or 0.20 (not significant).
|
| 214 |
-
Prefer scipy when exact p-values are needed.
|
| 215 |
-
"""
|
| 216 |
-
critical = _W_CRITICAL.get(n, 0)
|
| 217 |
-
if w <= critical:
|
| 218 |
-
return 0.04  # significant at the 5% level
|
| 219 |
-
return 0.20  # not significant (conservative approximation)
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
def _native_p_value(n: int, W: float) -> float:
|
| 223 |
-
"""Compute the p-value via the normal approximation (n ≥ 10) or the critical-value table (n < 10)."""
|
| 224 |
-
if n >= 10:
|
| 225 |
-
mu = n * (n + 1) / 4.0
|
| 226 |
-
sigma2 = n * (n + 1) * (2 * n + 1) / 24.0
|
| 227 |
-
if sigma2 <= 0:
|
| 228 |
-
return 1.0
|
| 229 |
-
z = abs((W + 0.5) - mu) / math.sqrt(sigma2)  # continuity correction
|
| 230 |
-
return 2.0 * _normal_sf(z)  # two-sided test
|
| 231 |
-
return _wilcoxon_exact_p(n, W)
|
| 232 |
-
|
| 233 |
-
|
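As a worked example of the normal approximation in `_native_p_value` above, the following sketch evaluates the two-sided p-value for a hypothetical case with n = 10 non-zero differences and W = 8. The function name `wilcoxon_normal_p` and the input values are assumptions made for illustration; `math.erfc` stands in for the polynomial approximation.

```python
import math


def wilcoxon_normal_p(n: int, w: float) -> float:
    """Two-sided p-value for the signed-rank statistic W (normal approximation)."""
    mu = n * (n + 1) / 4.0                               # mean of W under H0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)  # std. dev. of W under H0
    z = abs((w + 0.5) - mu) / sigma                      # continuity correction
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)), i.e. the two-sided tail.
    return math.erfc(z / math.sqrt(2.0))


# n = 10: mu = 27.5, sigma^2 = 96.25, z = 19 / sqrt(96.25), roughly 1.94
p = wilcoxon_normal_p(n=10, w=8.0)
print(f"p = {p:.4f}")
```

In this case the p-value lands just above 0.05, so the pair would be reported as not significant.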
| 234 |
-
# ---------------------------------------------------------------------------
|
| 235 |
-
# Pairwise test matrix
|
| 236 |
-
# ---------------------------------------------------------------------------
|
| 237 |
-
|
| 238 |
-
def compute_pairwise_stats(
|
| 239 |
-
engine_cer_map: dict[str, list[float]],
|
| 240 |
-
) -> list[dict]:
|
| 241 |
-
"""Run Wilcoxon tests between all pairs of competitors.
|
| 242 |
-
|
| 243 |
-
Parameters
|
| 244 |
-
----------
|
| 245 |
-
engine_cer_map : dict {engine_name → [cer_doc1, cer_doc2, ...]}
|
| 246 |
-
|
| 247 |
-
Returns
|
| 248 |
-
-------
|
| 249 |
-
List of dicts, one per pair:
|
| 250 |
-
- engine_a, engine_b, statistic, p_value, significant, interpretation
|
| 251 |
-
"""
|
| 252 |
-
names = list(engine_cer_map.keys())
|
| 253 |
-
results = []
|
| 254 |
-
for i in range(len(names)):
|
| 255 |
-
for j in range(i + 1, len(names)):
|
| 256 |
-
a_name, b_name = names[i], names[j]
|
| 257 |
-
a_vals = engine_cer_map[a_name]
|
| 258 |
-
b_vals = engine_cer_map[b_name]
|
| 259 |
-
# Align lengths by truncating to the shorter series
|
| 260 |
-
min_len = min(len(a_vals), len(b_vals))
|
| 261 |
-
if min_len < 2:
|
| 262 |
-
continue
|
| 263 |
-
res = wilcoxon_test(a_vals[:min_len], b_vals[:min_len])
|
| 264 |
-
results.append({
|
| 265 |
-
"engine_a": a_name,
|
| 266 |
-
"engine_b": b_name,
|
| 267 |
-
**res,
|
| 268 |
-
})
|
| 269 |
-
return results
|
| 270 |
-
|
| 271 |
-
|
| 272 |
-
# ---------------------------------------------------------------------------
|
| 273 |
-
# Friedman test + Nemenyi post-hoc (Sprint 17)
|
| 274 |
-
# ---------------------------------------------------------------------------
|
| 275 |
-
#
|
| 276 |
-
# Reference: Demšar, J. (2006), "Statistical Comparisons of Classifiers over
|
| 277 |
-
# Multiple Data Sets", Journal of Machine Learning Research 7:1-30. De facto
|
| 278 |
-
# standard for comparing several systems over several datasets; here,
|
| 279 |
-
# several OCR engines over several documents. The CDD (critical difference
|
| 280 |
-
# diagram) derived from Nemenyi is the canonical rendering.
|
| 281 |
-
|
| 282 |
-
# Critical values of the Studentized range distribution divided by √2,
|
| 283 |
-
# for df = ∞ (the usual approximation for Nemenyi). Source: Tukey tables.
|
| 284 |
-
# Key: number of treatments k; value: q_α for α ∈ {0.05, 0.01}.
|
| 285 |
-
_NEMENYI_Q_TABLE = {
|
| 286 |
-
# k q_0.05 q_0.01
|
| 287 |
-
2: (1.960, 2.576),
|
| 288 |
-
3: (2.343, 2.913),
|
| 289 |
-
4: (2.569, 3.113),
|
| 290 |
-
5: (2.728, 3.255),
|
| 291 |
-
6: (2.850, 3.364),
|
| 292 |
-
7: (2.949, 3.452),
|
| 293 |
-
8: (3.031, 3.526),
|
| 294 |
-
9: (3.102, 3.590),
|
| 295 |
-
10: (3.164, 3.646),
|
| 296 |
-
11: (3.219, 3.696),
|
| 297 |
-
12: (3.268, 3.741),
|
| 298 |
-
13: (3.313, 3.781),
|
| 299 |
-
14: (3.354, 3.818),
|
| 300 |
-
15: (3.391, 3.853),
|
| 301 |
-
16: (3.426, 3.886),
|
| 302 |
-
17: (3.458, 3.916),
|
| 303 |
-
18: (3.489, 3.944),
|
| 304 |
-
19: (3.517, 3.970),
|
| 305 |
-
20: (3.544, 3.995),
|
| 306 |
-
25: (3.658, 4.095),
|
| 307 |
-
30: (3.739, 4.167),
|
| 308 |
-
40: (3.858, 4.272),
|
| 309 |
-
50: (3.945, 4.349),
|
| 310 |
-
}
|
| 311 |
-
|
| 312 |
-
|
| 313 |
-
def _chi_square_sf(x: float, df: int) -> float:
|
| 314 |
-
"""Survival function of the chi² distribution, 1 - CDF(x).
|
| 315 |
-
|
| 316 |
-
Uses scipy when available (exact method), otherwise Wilson-Hilferty
|
| 317 |
-
(a normal approximation that is accurate from df ≥ 3).
|
| 318 |
-
"""
|
| 319 |
-
if x <= 0 or df <= 0:
|
| 320 |
-
return 1.0
|
| 321 |
-
try:
|
| 322 |
-
from scipy.stats import chi2 as _chi2 # type: ignore[import-untyped]
|
| 323 |
-
return float(_chi2.sf(x, df))
|
| 324 |
-
except ImportError:
|
| 325 |
-
pass
|
| 326 |
-
# Wilson-Hilferty: cube-root transform that maps chi² to an approximately normal variable
|
| 327 |
-
z = (((x / df) ** (1.0 / 3.0)) - (1.0 - 2.0 / (9.0 * df))) / math.sqrt(2.0 / (9.0 * df))
|
| 328 |
-
return _normal_sf(z)
|
| 329 |
-
|
| 330 |
-
|
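To give a sense of how close the Wilson-Hilferty fallback in `_chi_square_sf` above gets, the sketch below evaluates it at the tabulated 95th percentile of the chi-square distribution with 3 degrees of freedom, where the exact survival function is 0.05. The function name `chi2_sf_wilson_hilferty` is hypothetical, and `math.erfc` is used for the normal tail instead of the module's polynomial approximation.

```python
import math


def chi2_sf_wilson_hilferty(x: float, df: int) -> float:
    """Approximate chi-square survival function via the Wilson-Hilferty transform."""
    if x <= 0 or df <= 0:
        return 1.0
    h = 2.0 / (9.0 * df)
    # (X / df)^(1/3) is approximately normal with mean 1 - h and variance h.
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # standard normal survival function


# 7.815 is the tabulated critical value for df = 3 at alpha = 0.05.
p_approx = chi2_sf_wilson_hilferty(7.815, 3)
print(f"Wilson-Hilferty: {p_approx:.4f} (exact value is 0.05)")
```

The approximation is slightly off in the third decimal here, which matters only for results that sit right at the 0.05 threshold.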
| 331 |
-
def _rank_row(values: list[float]) -> list[float]:
|
| 332 |
-
"""Ranks within a row: the smallest value gets rank 1. Ties receive the average rank."""
|
| 333 |
-
n = len(values)
|
| 334 |
-
indexed = sorted(range(n), key=lambda i: values[i])
|
| 335 |
-
ranks = [0.0] * n
|
| 336 |
-
i = 0
|
| 337 |
-
while i < n:
|
| 338 |
-
j = i
|
| 339 |
-
while j < n and values[indexed[j]] == values[indexed[i]]:
|
| 340 |
-
j += 1
|
| 341 |
-
avg_rank = (i + j + 1) / 2.0 # 1-based
|
| 342 |
-
for k in range(i, j):
|
| 343 |
-
ranks[indexed[k]] = avg_rank
|
| 344 |
-
i = j
|
| 345 |
-
return ranks
|
| 346 |
-
|
| 347 |
-
|
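The average-rank rule described for `_rank_row` above is easiest to see on a row that contains a tie. The following sketch reimplements the same idea under a hypothetical name, `average_ranks`, with made-up CER values; it is meant as an illustration rather than a copy of the module's code.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values in ascending order (1-based); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end (exclusive) of the run of values equal to values[order[i]].
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        # Positions i..j-1 hold 1-based ranks i+1..j, whose mean is (i + j + 1) / 2.
        mean_rank = (i + j + 1) / 2.0
        for pos in range(i, j):
            ranks[order[pos]] = mean_rank
        i = j
    return ranks


ranks = average_ranks([0.12, 0.05, 0.12, 0.30])
print(ranks)  # [2.5, 1.0, 2.5, 4.0]
```

The two tied values would have occupied ranks 2 and 3, so each receives 2.5, and the ranks still sum to k(k+1)/2 = 10.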
| 348 |
-
def _aligned_cer_matrix(
|
| 349 |
-
engine_cer_map: dict[str, list[float]],
|
| 350 |
-
) -> tuple[list[str], list[list[float]]]:
|
| 351 |
-
"""Build the matrix (k engines × n documents) aligned on the minimum
|
| 352 |
-
length. Returns ``(names, matrix)``, with one list of values per engine.
|
| 353 |
-
|
| 354 |
-
Friedman requires complete blocks (documents): if the engines were not
|
| 355 |
-
all run on the same documents, the series are truncated to the minimum
|
| 356 |
-
length, which is reported in the result as ``n_blocks``.
|
| 357 |
-
"""
|
| 358 |
-
names = list(engine_cer_map.keys())
|
| 359 |
-
if not names:
|
| 360 |
-
return [], []
|
| 361 |
-
min_len = min(len(v) for v in engine_cer_map.values())
|
| 362 |
-
if min_len == 0:
|
| 363 |
-
return names, []
|
| 364 |
-
matrix = [engine_cer_map[n][:min_len] for n in names]
|
| 365 |
-
return names, matrix
|
| 366 |
-
|
| 367 |
-
|
| 368 |
-
def friedman_test(engine_cer_map: dict[str, list[float]]) -> dict:
|
| 369 |
-
"""Friedman test: k engines over n paired documents.
|
| 370 |
-
|
| 371 |
-
Non-parametric equivalent of repeated-measures ANOVA for ordinal
|
| 372 |
-
data. Null hypothesis: all engines have the same average
|
| 373 |
-
performance. Rejection means at least one engine differs from the others.
|
| 374 |
-
|
| 375 |
-
Parameters
|
| 376 |
-
----------
|
| 377 |
-
engine_cer_map:
|
| 378 |
-
Dict ``{engine_name → [cer_doc1, cer_doc2, ...]}``. All engines
|
| 379 |
-
must have been evaluated on the same documents (in the same order).
|
| 380 |
-
|
| 381 |
-
Returns
|
| 382 |
-
-------
|
| 383 |
-
dict with:
|
| 384 |
-
- ``statistic`` : Q corrected for ties
|
| 385 |
-
- ``p_value`` : p-value (scipy if available, otherwise Wilson-Hilferty)
|
| 386 |
-
- ``significant`` : bool, p < 0.05
|
| 387 |
-
- ``df`` : degrees of freedom = k - 1
|
| 388 |
-
- ``n_blocks`` : number of documents (blocks) used
|
| 389 |
-
- ``n_engines`` : number of engines (k)
|
| 390 |
-
- ``mean_ranks`` : dict ``{engine: mean_rank}``
|
| 391 |
-
- ``interpretation``: human-readable sentence
|
| 392 |
-
- ``error`` : error code, present only when the test is not applicable
|
| 393 |
-
"""
|
| 394 |
-
names, matrix = _aligned_cer_matrix(engine_cer_map)
|
| 395 |
-
k = len(names)
|
| 396 |
-
n = len(matrix[0]) if matrix else 0
|
| 397 |
-
|
| 398 |
-
if k < 2:
|
| 399 |
-
return {
|
| 400 |
-
"statistic": 0.0, "p_value": 1.0, "significant": False,
|
| 401 |
-
"df": 0, "n_blocks": n, "n_engines": k,
|
| 402 |
-
"mean_ranks": {names[0]: 1.0} if k == 1 else {},
|
| 403 |
-
"interpretation": "Test de Friedman non applicable : il faut au moins 2 moteurs.",
|
| 404 |
-
"error": "not_enough_engines",
|
| 405 |
-
}
|
| 406 |
-
if n < 2:
|
| 407 |
-
return {
|
| 408 |
-
"statistic": 0.0, "p_value": 1.0, "significant": False,
|
| 409 |
-
"df": k - 1, "n_blocks": n, "n_engines": k,
|
| 410 |
-
"mean_ranks": {name: 1.0 for name in names},
|
| 411 |
-
"interpretation": "Test de Friedman non applicable : il faut au moins 2 documents communs.",
|
| 412 |
-
"error": "not_enough_blocks",
|
| 413 |
-
}
|
| 414 |
-
|
| 415 |
-
# Ranks per block (document): for each document, rank the k engines
|
| 416 |
-
ranks_by_engine: list[list[float]] = [[] for _ in range(k)]
|
| 417 |
-
for j in range(n):
|
| 418 |
-
row = [matrix[i][j] for i in range(k)]
|
| 419 |
-
row_ranks = _rank_row(row)
|
| 420 |
-
for i in range(k):
|
| 421 |
-
ranks_by_engine[i].append(row_ranks[i])
|
| 422 |
-
|
| 423 |
-
rank_sums = [sum(r) for r in ranks_by_engine]
|
| 424 |
-
mean_ranks = {names[i]: rank_sums[i] / n for i in range(k)}
|
| 425 |
-
|
| 426 |
-
# Uncorrected Q statistic (no ties)
|
| 427 |
-
# Q = 12 / (n·k·(k+1)) · Σ R_j² − 3·n·(k+1)
|
| 428 |
-
Q = (12.0 / (n * k * (k + 1))) * sum(rs ** 2 for rs in rank_sums) - 3.0 * n * (k + 1)
|
| 429 |
-
|
| 430 |
-
# Tie correction: adjusts Q when ranks are shared
|
| 431 |
-
# within some blocks. Formula: Q_corr = Q / (1 - T/(n·(k³−k)))
|
| 432 |
-
# where T = Σ (tⱼ³ − tⱼ) over all groups of ties.
|
| 433 |
-
tie_correction = 0.0
|
| 434 |
-
for j in range(n):
|
| 435 |
-
row = [matrix[i][j] for i in range(k)]
|
| 436 |
-
sorted_row = sorted(row)
|
| 437 |
-
i = 0
|
| 438 |
-
while i < len(sorted_row):
|
| 439 |
-
count = 1
|
| 440 |
-
while i + count < len(sorted_row) and sorted_row[i + count] == sorted_row[i]:
|
| 441 |
-
count += 1
|
| 442 |
-
if count > 1:
|
| 443 |
-
tie_correction += count ** 3 - count
|
| 444 |
-
i += count
|
| 445 |
-
denom = 1.0 - tie_correction / (n * (k ** 3 - k)) if k >= 2 else 1.0
|
| 446 |
-
if denom > 0:
|
| 447 |
-
Q = Q / denom
|
| 448 |
-
|
| 449 |
-
df = k - 1
|
| 450 |
-
p_value = _chi_square_sf(Q, df)
|
| 451 |
-
significant = p_value < 0.05
|
| 452 |
-
|
| 453 |
-
if significant:
|
| 454 |
-
interpretation = (
|
| 455 |
-
f"Test de Friedman significatif (Q = {Q:.3f}, df = {df}, p = {p_value:.4f}). "
|
| 456 |
-
f"Au moins un moteur diffère des autres — utiliser le post-hoc Nemenyi "
|
| 457 |
-
f"pour identifier les paires distinguables."
|
| 458 |
-
)
|
| 459 |
-
else:
|
| 460 |
-
interpretation = (
|
| 461 |
-
f"Test de Friedman non significatif (Q = {Q:.3f}, df = {df}, p = {p_value:.4f}). "
|
| 462 |
-
f"Aucune différence globale détectée entre les moteurs sur ce corpus."
|
| 463 |
-
)
|
| 464 |
-
|
| 465 |
-
return {
|
| 466 |
-
"statistic": round(Q, 4),
|
| 467 |
-
"p_value": round(p_value, 6),
|
| 468 |
-
"significant": significant,
|
| 469 |
-
"df": df,
|
| 470 |
-
"n_blocks": n,
|
| 471 |
-
"n_engines": k,
|
| 472 |
-
"mean_ranks": {k_: round(v, 4) for k_, v in mean_ranks.items()},
|
| 473 |
-
"interpretation": interpretation,
|
| 474 |
-
}
|
| 475 |
-
|
| 476 |
-
|
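The Friedman statistic computed by `friedman_test` above can be followed by hand on a small case. The sketch below uses three hypothetical engines and four documents with no ties, so the tie correction does not apply. All names and CER values are invented for illustration, and the p-value uses the closed form of the chi-square survival function that holds only for df = 2.

```python
import math

# Hypothetical CER values: engine_a is always best, engine_c always worst.
cer = {
    "engine_a": [0.02, 0.03, 0.01, 0.04],
    "engine_b": [0.05, 0.06, 0.04, 0.07],
    "engine_c": [0.09, 0.10, 0.08, 0.12],
}
names = list(cer)
k = len(names)          # number of engines
n = len(cer[names[0]])  # number of documents (blocks)

# Rank the engines within each document (no ties in this data) and sum per engine.
rank_sums = [0.0] * k
for doc in range(n):
    row = [cer[name][doc] for name in names]
    order = sorted(range(k), key=lambda i: row[i])
    for rank, idx in enumerate(order, start=1):
        rank_sums[idx] += rank

# Q = 12 / (n k (k+1)) * sum(R_j^2) - 3 n (k+1)
q_stat = 12.0 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3.0 * n * (k + 1)

# For df = k - 1 = 2 only, the chi-square survival function is exp(-x / 2).
p_value = math.exp(-q_stat / 2.0)
print(rank_sums, q_stat, round(p_value, 4))
```

With rank sums of 4, 8 and 12 the statistic comes out at Q = 8, and the p-value of about 0.018 would lead to rejecting the null hypothesis at the 5% level.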
| 477 |
-
def _nemenyi_critical_value(k: int, alpha: float = 0.05) -> Optional[float]:
|
| 478 |
-
"""Critical value q_α for k treatments, df = ∞.
|
| 479 |
-
|
| 480 |
-
Returns ``None`` if k < 2. For k > 50 the last tabulated value (k = 50) is reused.
|
| 481 |
-
"""
|
| 482 |
-
if k < 2:
|
| 483 |
-
return None
|
| 484 |
-
if k in _NEMENYI_Q_TABLE:
|
| 485 |
-
q05, q01 = _NEMENYI_Q_TABLE[k]
|
| 486 |
-
return q05 if alpha == 0.05 else q01 if alpha == 0.01 else q05
|
| 487 |
-
# Beyond the table: reuse the last tabulated entry (k = 50). Since q_α grows with k, this underestimates the true value.
|
| 488 |
-
max_k = max(_NEMENYI_Q_TABLE.keys())
|
| 489 |
-
if k > max_k:
|
| 490 |
-
q05, q01 = _NEMENYI_Q_TABLE[max_k]
|
| 491 |
-
return q05 if alpha == 0.05 else q01
|
| 492 |
-
# Between two tabulated keys: linear interpolation
|
| 493 |
-
keys = sorted(_NEMENYI_Q_TABLE.keys())
|
| 494 |
-
for i in range(len(keys) - 1):
|
| 495 |
-
if keys[i] < k < keys[i + 1]:
|
| 496 |
-
lo, hi = keys[i], keys[i + 1]
|
| 497 |
-
q_lo = _NEMENYI_Q_TABLE[lo][0 if alpha == 0.05 else 1]
|
| 498 |
-
q_hi = _NEMENYI_Q_TABLE[hi][0 if alpha == 0.05 else 1]
|
| 499 |
-
frac = (k - lo) / (hi - lo)
|
| 500 |
-
return q_lo + frac * (q_hi - q_lo)
|
| 501 |
-
return None
|
| 502 |
-
|
| 503 |
-
|
| 504 |
-
def nemenyi_posthoc(
|
| 505 |
-
engine_cer_map: dict[str, list[float]],
|
| 506 |
-
alpha: float = 0.05,
|
| 507 |
-
) -> dict:
|
| 508 |
-
"""Nemenyi post-hoc: identifies the pairs of engines that are statistically
|
| 509 |
-
indistinguishable after a Friedman test.
|
| 510 |
-
|
| 511 |
-
Computes the *critical distance* CD = q_α · √(k·(k+1) / (6·n)). Two engines
|
| 512 |
-
whose mean ranks differ by less than CD are **not**
|
| 513 |
-
statistically distinguishable at level α.
|
| 514 |
-
|
| 515 |
-
Returns
|
| 516 |
-
-------
|
| 517 |
-
dict with:
|
| 518 |
-
- ``alpha`` : significance level used
|
| 519 |
-
- ``critical_distance`` : computed CD
|
| 520 |
-
- ``q_alpha`` : critical value q_α taken from the table
|
| 521 |
-
- ``n_blocks``, ``n_engines``
|
| 522 |
-
- ``mean_ranks`` : mean rank per engine (dict)
|
| 523 |
-
- ``engines_sorted`` : list of engines sorted by increasing mean rank
|
| 524 |
-
- ``significant_matrix`` : boolean matrix (list[list[bool]]),
|
| 525 |
-
``True`` = significantly different pair
|
| 526 |
-
- ``tied_groups`` : list of lists of indistinguishable engines
|
| 527 |
-
(maximal groups of practical ties)
|
| 528 |
-
- ``error`` : present when the test is not applicable
|
| 529 |
-
"""
|
| 530 |
-
names, matrix = _aligned_cer_matrix(engine_cer_map)
|
| 531 |
-
k = len(names)
|
| 532 |
-
n = len(matrix[0]) if matrix else 0
|
| 533 |
-
|
| 534 |
-
if k < 2 or n < 2:
|
| 535 |
-
return {
|
| 536 |
-
"alpha": alpha,
|
| 537 |
-
"critical_distance": 0.0,
|
| 538 |
-
"q_alpha": 0.0,
|
| 539 |
-
"n_blocks": n,
|
| 540 |
-
"n_engines": k,
|
| 541 |
-
"mean_ranks": {name: 1.0 for name in names},
|
| 542 |
-
"engines_sorted": list(names),
|
| 543 |
-
"significant_matrix": [[False] * k for _ in range(k)],
|
| 544 |
-
"tied_groups": [list(names)] if names else [],
|
| 545 |
-
"error": "not_enough_data",
|
| 546 |
-
}
|
| 547 |
-
|
| 548 |
-
# Friedman already yields the mean ranks; they are recomputed here so the
|
| 549 |
-
# function stays standalone (without forcing the caller to chain both calls).
|
| 550 |
-
ranks_by_engine: list[list[float]] = [[] for _ in range(k)]
|
| 551 |
-
for j in range(n):
|
| 552 |
-
row = [matrix[i][j] for i in range(k)]
|
| 553 |
-
row_ranks = _rank_row(row)
|
| 554 |
-
for i in range(k):
|
| 555 |
-
ranks_by_engine[i].append(row_ranks[i])
|
| 556 |
-
|
| 557 |
-
mean_ranks_list = [sum(r) / n for r in ranks_by_engine]
|
| 558 |
-
mean_ranks = {names[i]: round(mean_ranks_list[i], 4) for i in range(k)}
|
| 559 |
-
|
| 560 |
-
q_alpha = _nemenyi_critical_value(k, alpha) or 0.0
|
| 561 |
-
critical_distance = q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))
|
| 562 |
-
|
| 563 |
-
# Significance matrix: pair (i, j) is significant if |R_i - R_j| > CD
|
| 564 |
-
significant_matrix = [
|
| 565 |
-
[
|
| 566 |
-
(i != j) and (abs(mean_ranks_list[i] - mean_ranks_list[j]) > critical_distance)
|
| 567 |
-
for j in range(k)
|
| 568 |
-
]
|
| 569 |
-
for i in range(k)
|
| 570 |
-
]
|
| 571 |
-
|
| 572 |
-
# Groups of practical ties: sliding window over the sorted ranks.
|
| 573 |
-
# Two engines belong to the same group if their gap is ≤ CD.
|
| 574 |
-
order = sorted(range(k), key=lambda i: mean_ranks_list[i])
|
| 575 |
-
sorted_names = [names[i] for i in order]
|
| 576 |
-
sorted_ranks = [mean_ranks_list[i] for i in order]
|
| 577 |
-
|
| 578 |
-
tied_groups: list[list[str]] = []
|
| 579 |
-
i = 0
|
| 580 |
-
while i < len(sorted_names):
|
| 581 |
-
# extend the group while the next engine is within CD of the group's first member
|
| 582 |
-
j = i
|
| 583 |
-
while j + 1 < len(sorted_names) and (sorted_ranks[j + 1] - sorted_ranks[i]) <= critical_distance:
|
| 584 |
-
j += 1
|
| 585 |
-
tied_groups.append(sorted_names[i:j + 1])
|
| 586 |
-
i = j + 1 if j > i else i + 1
|
| 587 |
-
|
| 588 |
-
return {
|
| 589 |
-
"alpha": alpha,
|
| 590 |
-
"critical_distance": round(critical_distance, 4),
|
| 591 |
-
"q_alpha": round(q_alpha, 4),
|
| 592 |
-
"n_blocks": n,
|
| 593 |
-
"n_engines": k,
|
| 594 |
-
"mean_ranks": mean_ranks,
|
| 595 |
-
"engines_sorted": sorted_names,
|
| 596 |
-
"significant_matrix": significant_matrix,
|
| 597 |
-
"tied_groups": tied_groups,
|
| 598 |
-
}
|
| 599 |
-
|
| 600 |
-
|
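The critical distance used by `nemenyi_posthoc` above is a one-line formula, and a small numeric case shows how it separates pairs. The sketch assumes three engines evaluated on four documents, takes q_0.05 = 2.343 for k = 3 from the table in this file, and uses hypothetical mean ranks of 1, 2 and 3.

```python
import math

k, n = 3, 4        # 3 engines, 4 documents
q_alpha = 2.343    # q_0.05 for k = 3 (from _NEMENYI_Q_TABLE)

# CD = q_alpha * sqrt(k (k + 1) / (6 n))
cd = q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))

# Hypothetical mean ranks, e.g. from a Friedman test on the same data.
mean_ranks = {"engine_a": 1.0, "engine_b": 2.0, "engine_c": 3.0}
names = list(mean_ranks)

# A pair is significantly different when its mean-rank gap exceeds CD.
significant = {
    (a, b): abs(mean_ranks[a] - mean_ranks[b]) > cd
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
print(round(cd, 4), significant)
```

With only four documents the critical distance is about 1.66, so only the best and worst engines are distinguishable; adjacent engines, one rank apart, are not.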
| 601 |
-
# ---------------------------------------------------------------------------
|
| 602 |
-
# Critical Difference Diagram: SVG rendering (Sprint 17)
|
| 603 |
-
# ---------------------------------------------------------------------------
|
| 604 |
-
|
| 605 |
-
def build_critical_difference_svg(
|
| 606 |
-
nemenyi_result: dict,
|
| 607 |
-
width: int = 780,
|
| 608 |
-
row_height: int = 22,
|
| 609 |
-
) -> str:
|
| 610 |
-
"""Generate the SVG for the Critical Difference Diagram (Demšar 2006).
|
| 611 |
-
|
| 612 |
-
The diagram shows:
|
| 613 |
-
* a horizontal axis of mean ranks (1 to k),
|
| 614 |
-
* each engine placed on the axis at its mean rank,
|
| 615 |
-
* thick horizontal bars connecting statistically
|
| 616 |
-
indistinguishable engines (distance ≤ CD),
|
| 617 |
-
* the length of CD drawn above the axis for reference.
|
| 618 |
-
|
| 619 |
-
Parameters
|
| 620 |
-
----------
|
| 621 |
-
nemenyi_result:
|
| 622 |
-
Result of ``nemenyi_posthoc``.
|
| 623 |
-
width:
|
| 624 |
-
Total width of the SVG in pixels.
|
| 625 |
-
row_height:
|
| 626 |
-
Height of each engine label row; the total height adapts to the number of rows.
|
| 627 |
-
|
| 628 |
-
Returns
|
| 629 |
-
-------
|
| 630 |
-
String containing the SVG (root element ``<svg>…</svg>``).
|
| 631 |
-
"""
|
| 632 |
-
k = nemenyi_result.get("n_engines", 0)
|
| 633 |
-
if k < 2 or nemenyi_result.get("error"):
|
| 634 |
-
return (
|
| 635 |
-
'<svg xmlns="http://www.w3.org/2000/svg" width="100%" height="40" '
|
| 636 |
-
'role="img" aria-label="Critical Difference Diagram indisponible">'
|
| 637 |
-
'<text x="10" y="24" font-family="sans-serif" font-size="12" fill="#666">'
|
| 638 |
-
'Critical Difference Diagram non calculable — données insuffisantes.'
|
| 639 |
-
'</text></svg>'
|
| 640 |
-
)
|
| 641 |
-
|
| 642 |
-
engines_sorted: list[str] = list(nemenyi_result.get("engines_sorted", []))
|
| 643 |
-
mean_ranks: dict[str, float] = dict(nemenyi_result.get("mean_ranks", {}))
|
| 644 |
-
tied_groups: list[list[str]] = list(nemenyi_result.get("tied_groups", []))
|
| 645 |
-
cd: float = float(nemenyi_result.get("critical_distance", 0.0))
|
| 646 |
-
|
| 647 |
-
# Dimensions
|
| 648 |
-
left_pad, right_pad = 40, 40
|
| 649 |
-
top_pad = 50  # room for the CD reference display
|
| 650 |
-
axis_y = top_pad + 10
|
| 651 |
-
bars_start_y = axis_y + 20  # first tie bar below the axis
|
| 652 |
-
# Stack one line per group plus one line per label
|
| 653 |
-
label_rows = k  # each engine gets its own label row
|
| 654 |
-
bars_count = len(tied_groups)
|
| 655 |
-
total_h = bars_start_y + bars_count * 10 + label_rows * row_height + 20
|
| 656 |
-
|
| 657 |
-
axis_x0, axis_x1 = left_pad, width - right_pad
|
| 658 |
-
axis_width = axis_x1 - axis_x0
|
| 659 |
-
|
| 660 |
-
def x_for_rank(r: float) -> float:
|
| 661 |
-
# Rank 1 on the left, rank k on the right
|
| 662 |
-
if k <= 1:
|
| 663 |
-
return axis_x0
|
| 664 |
-
return axis_x0 + (r - 1.0) / (k - 1.0) * axis_width
|
| 665 |
-
|
| 666 |
-
parts: list[str] = []
|
| 667 |
-
parts.append(
|
| 668 |
-
f'<svg xmlns="http://www.w3.org/2000/svg" width="100%" viewBox="0 0 {width} {total_h}" '
|
| 669 |
-
f'role="img" aria-label="Critical Difference Diagram (Friedman-Nemenyi)" '
|
| 670 |
-
f'font-family="system-ui, -apple-system, sans-serif">'
|
| 671 |
-
)
|
| 672 |
-
parts.append('<style>.cd-axis{stroke:#334155;stroke-width:1.5}.cd-tick{stroke:#334155;stroke-width:1}'
|
| 673 |
-
'.cd-label{fill:#0f172a;font-size:11px}'
|
| 674 |
-
'.cd-tie{stroke:#0f172a;stroke-width:4;stroke-linecap:round}'
|
| 675 |
-
'.cd-cd-bar{stroke:#dc2626;stroke-width:2}'
|
| 676 |
-
'.cd-cd-txt{fill:#dc2626;font-size:11px;font-weight:600}'
|
| 677 |
-
'.cd-name{fill:#0f172a;font-size:12px}'
|
| 678 |
-
'.cd-rank{fill:#64748b;font-size:10px}'
|
| 679 |
-
'</style>')
|
| 680 |
-
|
| 681 |
-
# Reference CD bar (top, at the left end of the axis)
|
| 682 |
-
if cd > 0 and k >= 2:
|
| 683 |
-
cd_bar_x0 = axis_x0
|
| 684 |
-
cd_bar_x1 = axis_x0 + (cd / max(1, k - 1)) * axis_width
|
| 685 |
-
cd_y = top_pad - 20
|
| 686 |
-
parts.append(f'<line class="cd-cd-bar" x1="{cd_bar_x0:.1f}" y1="{cd_y}" '
|
| 687 |
-
f'x2="{cd_bar_x1:.1f}" y2="{cd_y}"/>')
|
| 688 |
-
parts.append(f'<line class="cd-cd-bar" x1="{cd_bar_x0:.1f}" y1="{cd_y - 4}" '
|
| 689 |
-
f'x2="{cd_bar_x0:.1f}" y2="{cd_y + 4}"/>')
|
| 690 |
-
parts.append(f'<line class="cd-cd-bar" x1="{cd_bar_x1:.1f}" y1="{cd_y - 4}" '
|
| 691 |
-
f'x2="{cd_bar_x1:.1f}" y2="{cd_y + 4}"/>')
|
| 692 |
-
parts.append(f'<text class="cd-cd-txt" x="{(cd_bar_x0 + cd_bar_x1)/2:.1f}" y="{cd_y - 8}" '
|
| 693 |
-
f'text-anchor="middle">CD = {cd:.3f}</text>')
|
| 694 |
-
|
| 695 |
-
# Main axis
|
| 696 |
-
parts.append(f'<line class="cd-axis" x1="{axis_x0}" y1="{axis_y}" '
|
| 697 |
-
f'x2="{axis_x1}" y2="{axis_y}"/>')
|
| 698 |
-
# Integer ticks
|
| 699 |
-
for r in range(1, k + 1):
|
| 700 |
-
xt = x_for_rank(r)
|
| 701 |
-
parts.append(f'<line class="cd-tick" x1="{xt:.1f}" y1="{axis_y - 5}" '
|
| 702 |
-
f'x2="{xt:.1f}" y2="{axis_y + 5}"/>')
|
| 703 |
-
parts.append(f'<text class="cd-label" x="{xt:.1f}" y="{axis_y - 9}" '
|
| 704 |
-
f'text-anchor="middle">{r}</text>')
|
| 705 |
-
|
| 706 |
-
# Bars connecting indistinguishable groups
|
| 707 |
-
for i, group in enumerate(tied_groups):
|
| 708 |
-
if len(group) < 2:
|
| 709 |
-
continue
|
| 710 |
-
rs = [mean_ranks[n] for n in group]
|
| 711 |
-
x0 = x_for_rank(min(rs))
|
| 712 |
-
x1 = x_for_rank(max(rs))
|
| 713 |
-
y_bar = bars_start_y + i * 10
|
| 714 |
-
parts.append(f'<line class="cd-tie" x1="{x0 - 3:.1f}" y1="{y_bar}" '
|
| 715 |
-
f'x2="{x1 + 3:.1f}" y2="{y_bar}"/>')
|
| 716 |
-
|
| 717 |
-
# Engine labels: the half with the lowest mean ranks on the left, the other half on the right
|
| 718 |
-
labels_y_base = bars_start_y + bars_count * 10 + 15
|
| 719 |
-
half = (len(engines_sorted) + 1) // 2
|
| 720 |
-
left_engines = engines_sorted[:half]
|
| 721 |
-
right_engines = engines_sorted[half:]
|
| 722 |
-
|
| 723 |
-
for idx, name in enumerate(left_engines):
|
| 724 |
-
r = mean_ranks[name]
|
| 725 |
-
x = x_for_rank(r)
|
| 726 |
-
y_label = labels_y_base + idx * row_height
|
| 727 |
-
# Connector line from the engine's axis position to its label
|
| 728 |
-
parts.append(f'<line class="cd-tick" x1="{x:.1f}" y1="{axis_y + 6}" '
|
| 729 |
-
f'x2="{x:.1f}" y2="{y_label - 4}"/>')
|
| 730 |
-
parts.append(f'<line class="cd-tick" x1="{x:.1f}" y1="{y_label - 4}" '
|
| 731 |
-
f'x2="{axis_x0 - 4:.1f}" y2="{y_label - 4}"/>')
|
| 732 |
-
parts.append(f'<text class="cd-name" x="{axis_x0 - 6:.1f}" y="{y_label}" '
|
| 733 |
-
f'text-anchor="end">{_svg_escape(name)} '
|
| 734 |
-
f'<tspan class="cd-rank">({r:.2f})</tspan></text>')
|
| 735 |
-
|
| 736 |
-
for idx, name in enumerate(right_engines):
|
| 737 |
-
r = mean_ranks[name]
|
| 738 |
-
x = x_for_rank(r)
|
| 739 |
-
y_label = labels_y_base + idx * row_height
|
| 740 |
-
parts.append(f'<line class="cd-tick" x1="{x:.1f}" y1="{axis_y + 6}" '
|
| 741 |
-
f'x2="{x:.1f}" y2="{y_label - 4}"/>')
|
| 742 |
-
parts.append(f'<line class="cd-tick" x1="{x:.1f}" y1="{y_label - 4}" '
|
| 743 |
-
f'x2="{axis_x1 + 4:.1f}" y2="{y_label - 4}"/>')
|
| 744 |
-
parts.append(f'<text class="cd-name" x="{axis_x1 + 6:.1f}" y="{y_label}" '
|
| 745 |
-
f'text-anchor="start">{_svg_escape(name)} '
|
| 746 |
-
f'<tspan class="cd-rank">({r:.2f})</tspan></text>')
|
| 747 |
-
|
| 748 |
-
parts.append('</svg>')
|
| 749 |
-
return "".join(parts)
|
| 750 |
-
|
| 751 |
-
|
| 752 |
-
def _svg_escape(text: str) -> str:
|
| 753 |
-
"""Escape text for safe inclusion in an SVG/XML node."""
|
| 754 |
-
return (text.replace("&", "&amp;")
|
| 755 |
-
.replace("<", "&lt;")
|
| 756 |
-
.replace(">", "&gt;")
|
| 757 |
-
.replace('"', "&quot;")
|
| 758 |
-
.replace("'", "&#39;"))
|
| 759 |
-
|
| 760 |
-
|
| 761 |
-
# ---------------------------------------------------------------------------
|
| 762 |
-
# Pareto front (Sprint 19)
|
| 763 |
-
# ---------------------------------------------------------------------------
|
| 764 |
-
|
| 765 |
-
def compute_pareto_front(
|
| 766 |
-
points: list[dict],
|
| 767 |
-
objectives: tuple[str, ...] = ("cer", "cost"),
|
| 768 |
-
name_key: str = "engine",
|
| 769 |
-
minimize: Optional[tuple[bool, ...]] = None,
|
| 770 |
-
) -> list[str]:
|
| 771 |
-
"""Compute the Pareto front over ``len(objectives)`` dimensions.
|
| 772 |
-
|
| 773 |
-
A point ``p`` is on the Pareto front (non-dominated) if no other point is
|
| 774 |
-
at least as good on ALL objectives AND strictly better on at least
|
| 775 |
-
one of them.
|
| 776 |
-
|
| 777 |
-
Parameters
|
| 778 |
-
----------
|
| 779 |
-
points:
|
| 780 |
-
List of dicts. Each dict must contain ``name_key`` and every
|
| 781 |
-
key in ``objectives``. Points with an objective value of
|
| 782 |
-
``None`` are skipped (no comparison is possible).
|
| 783 |
-
objectives:
|
| 784 |
-
Keys of the objectives to minimize/maximize.
|
| 785 |
-
name_key:
|
| 786 |
-
Key identifying the point (defaults to ``"engine"``).
|
| 787 |
-
minimize:
|
| 788 |
-
For each objective, ``True`` = minimize (e.g. CER, cost),
|
| 789 |
-
``False`` = maximize (e.g. anchoring). Must have the same length
|
| 790 |
-
as ``objectives``.
|
| 791 |
-
|
| 792 |
-
Returns
|
| 793 |
-
-------
|
| 794 |
-
List of the ``name`` values of the points on the Pareto front, in stable order from
|
| 795 |
-
``points``.
|
| 796 |
-
"""
|
| 797 |
-
if minimize is None:
|
| 798 |
-
minimize = tuple(True for _ in objectives)
|
| 799 |
-
if len(minimize) != len(objectives):
|
| 800 |
-
raise ValueError("`minimize` doit avoir la même longueur que `objectives`")
|
| 801 |
-
|
| 802 |
-
valid = []
|
| 803 |
-
for p in points:
|
| 804 |
-
try:
|
| 805 |
-
vals = tuple(float(p[k]) for k in objectives)
|
| 806 |
-
except (KeyError, TypeError, ValueError):
|
| 807 |
-
continue
|
| 808 |
-
valid.append((p[name_key], vals))
|
| 809 |
-
|
| 810 |
-
front: list[str] = []
|
| 811 |
-
for name_a, vals_a in valid:
|
| 812 |
-
dominated = False
|
| 813 |
-
for name_b, vals_b in valid:
|
| 814 |
-
if name_a == name_b:
|
| 815 |
-
continue
|
| 816 |
-
# B dominates A if B is at least as good everywhere AND strictly better somewhere
|
| 817 |
-
better_or_equal_everywhere = True
|
| 818 |
-
strictly_better_somewhere = False
|
| 819 |
-
for va, vb, mini in zip(vals_a, vals_b, minimize):
|
| 820 |
-
if mini:
|
| 821 |
-
if vb > va:
|
| 822 |
-
better_or_equal_everywhere = False
|
| 823 |
-
break
|
| 824 |
-
if vb < va:
|
| 825 |
-
strictly_better_somewhere = True
|
| 826 |
-
else:  # maximize
|
| 827 |
-
if vb < va:
|
| 828 |
-
better_or_equal_everywhere = False
|
| 829 |
-
break
|
| 830 |
-
if vb > va:
|
| 831 |
-
strictly_better_somewhere = True
|
| 832 |
-
if better_or_equal_everywhere and strictly_better_somewhere:
|
| 833 |
-
dominated = True
|
| 834 |
-
break
|
| 835 |
-
if not dominated:
|
| 836 |
-
front.append(name_a)
|
| 837 |
-
return front
|
| 838 |
-
|
| 839 |
-
|
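The dominance rule applied by `compute_pareto_front` above can be illustrated on four hypothetical engines scored on two objectives to minimize, CER and cost. The sketch below is a compact reformulation with invented data and a hypothetical helper named `dominates`; it handles only the all-minimize case, unlike the function in this file, which also supports maximization.

```python
def dominates(b: tuple[float, ...], a: tuple[float, ...]) -> bool:
    """True if b is at least as good as a everywhere and strictly better somewhere."""
    return all(vb <= va for vb, va in zip(b, a)) and any(vb < va for vb, va in zip(b, a))


# Hypothetical (cer, cost) pairs; lower is better on both axes.
points = {
    "engine_a": (0.02, 10.0),
    "engine_b": (0.05, 2.0),
    "engine_c": (0.06, 5.0),   # worse than engine_b on both objectives
    "engine_d": (0.02, 12.0),  # same CER as engine_a but more expensive
}

front = [
    name
    for name, vals in points.items()
    if not any(
        dominates(other, vals)
        for other_name, other in points.items()
        if other_name != name
    )
]
print(front)  # ['engine_a', 'engine_b']
```

`engine_a` and `engine_b` represent different trade-offs (accuracy versus cost), so neither dominates the other and both stay on the front.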
| 840 |
-
# ---------------------------------------------------------------------------
|
| 841 |
-
# Clustering of error patterns
|
| 842 |
-
# ---------------------------------------------------------------------------
|
| 843 |
-
|
| 844 |
-
# Frequent error patterns (OCR + HTR on heritage documents)
|
| 845 |
-
_ERROR_PATTERNS = [
|
| 846 |
-
# (pattern_re, label)
|
| 847 |
-
(r"\brn\b.*\bm\b|\bm\b.*\brn\b|rn→m|m→rn", "confusion rn/m"),
|
| 848 |
-
(r"[lI]→1|1→[lI]|l→1|1→l|I→1|1→I", "confusion l/1/I"),
|
| 849 |
-
(r"u→n|n→u|v→u|u→v", "confusion u/n/v"),
|
| 850 |
-
(r"[oO]→0|0→[oO]", "confusion O/0"),
|
| 851 |
-
(r"ſ→[fs]|[fs]→ſ", "confusion ſ/f/s"),
|
| 852 |
-
(r"é→e|è→e|ê→e|e→[éèê]", "erreur diacritique é/e"),
|
| 853 |
-
(r"œ→oe|oe→œ|æ→ae|ae→æ", "ligature œ/æ"),
|
| 854 |
-
(r"[fF]i→fi|fi→[fF]i", "ligature fi"),
|
| 855 |
-
(r"[fF]l→fl|fl→[fF]l", "ligature fl"),
|
| 856 |
-
(r"\s+→''|''→\s+", "segmentation espace"),
|
| 857 |
-
]
|
| 858 |
-
|
| 859 |
-
def _extract_error_pairs(gt: str, hyp: str) -> list[tuple[str, str]]:
|
| 860 |
-
"""Extract (gt_char_seq, hyp_char_seq) error pairs: substitutions, plus deletions and insertions with an empty counterpart."""
|
| 861 |
-
# Sprint A3 (B-1): import from Cercle 1, no more Cercle 2→3 violation.
|
| 862 |
-
from picarones.core.diff_utils import compute_word_diff
|
| 863 |
-
ops = compute_word_diff(gt, hyp)
|
| 864 |
-
pairs = []
|
| 865 |
-
for op in ops:
|
| 866 |
-
if op["op"] == "replace":
|
| 867 |
-
pairs.append((op["old"], op["new"]))
|
| 868 |
-
elif op["op"] == "delete":
|
| 869 |
-
pairs.append((op["text"], ""))
|
| 870 |
-
elif op["op"] == "insert":
|
| 871 |
-
pairs.append(("", op["text"]))
|
| 872 |
-
return pairs
|
| 873 |
-
|
| 874 |
-
|
| 875 |
-
@dataclass
|
| 876 |
-
class ErrorCluster:
|
| 877 |
-
"""A cluster of similar errors."""
|
| 878 |
-
cluster_id: int
|
| 879 |
-
label: str
|
| 880 |
-
"""Human-readable description of the pattern (e.g. 'confusion rn/m')."""
|
| 881 |
-
count: int
|
| 882 |
-
examples: list[dict]
|
| 883 |
-
"""List of {engine, gt_fragment, ocr_fragment}."""
|
| 884 |
-
|
| 885 |
-
def as_dict(self) -> dict:
|
| 886 |
-
return {
|
| 887 |
-
"cluster_id": self.cluster_id,
|
| 888 |
-
"label": self.label,
|
| 889 |
-
"count": self.count,
|
| 890 |
-
"examples": self.examples[:5],  # at most 5 examples
|
| 891 |
-
}
|
| 892 |
-
|
| 893 |
-
|
| 894 |
-
def cluster_errors(
|
| 895 |
-
error_data: list[dict],
|
| 896 |
-
max_clusters: int = 8,
|
| 897 |
-
) -> list[ErrorCluster]:
|
| 898 |
-
"""Group errors into clusters with human-readable labels.
|
| 899 |
-
|
| 900 |
-
Parameters
|
| 901 |
-
----------
|
| 902 |
-
error_data : list of dicts {engine, gt, hypothesis}
|
| 903 |
-
max_clusters : maximum number of clusters to return
|
| 904 |
-
|
| 905 |
-
Returns
|
| 906 |
-
-------
|
| 907 |
-
List of ErrorCluster sorted by decreasing count.
|
| 908 |
-
"""
|
| 909 |
-
# Collect all error patterns with their context
|
| 910 |
-
# Key: error category → list of examples
|
| 911 |
-
bucket: dict[str, list[dict]] = defaultdict(list)
|
| 912 |
-
other_pairs: list[dict] = []
|
| 913 |
-
|
| 914 |
-
for item in error_data:
|
| 915 |
-
engine = item.get("engine", "")
|
| 916 |
-
gt = item.get("gt", "")
|
| 917 |
-
hyp = item.get("hypothesis", "")
|
| 918 |
-
pairs = _extract_error_pairs(gt, hyp)
|
| 919 |
-
|
| 920 |
-
for old, new in pairs:
|
| 921 |
-
if not old and not new:
|
| 922 |
-
continue
|
| 923 |
-
matched = False
|
| 924 |
-
# Try to match a known pattern
|
| 925 |
-
probe = f"{old}→{new}"
|
| 926 |
-
for _pat, label in _ERROR_PATTERNS:
|
| 927 |
-
try:
|
| 928 |
-
if re.search(_pat, probe, re.IGNORECASE):
|
| 929 |
-
bucket[label].append({
|
| 930 |
-
"engine": engine,
|
| 931 |
-
"gt_fragment": old,
|
| 932 |
-
"ocr_fragment": new,
|
| 933 |
-
})
|
| 934 |
-
matched = True
|
| 935 |
-
break
|
| 936 |
-
except re.error:
|
| 937 |
-
pass
|
| 938 |
-
|
| 939 |
-
if not matched:
|
| 940 |
-
# Group the remaining substitutions by character pair
|
| 941 |
-
if len(old) <= 3 and len(new) <= 3:
|
| 942 |
-
key = f"{old}→{new}" if (old and new) else (f"—→{new}" if new else f"{old}→—")
|
| 943 |
-
bucket[key].append({
|
| 944 |
-
"engine": engine,
|
| 945 |
-
"gt_fragment": old,
|
| 946 |
-
"ocr_fragment": new,
|
| 947 |
-
})
|
| 948 |
-
else:
|
| 949 |
-
other_pairs.append({
|
| 950 |
-
"engine": engine,
|
| 951 |
-
"gt_fragment": old,
|
| 952 |
-
"ocr_fragment": new,
|
| 953 |
-
})
|
| 954 |
-
|
| 955 |
-
# Construire les clusters triés par fréquence
|
| 956 |
-
clusters: list[ErrorCluster] = []
|
| 957 |
-
cluster_id = 1
|
| 958 |
-
sorted_buckets = sorted(bucket.items(), key=lambda x: -len(x[1]))
|
| 959 |
-
|
| 960 |
-
for label, examples in sorted_buckets[:max_clusters - 1]:
|
| 961 |
-
clusters.append(ErrorCluster(
|
| 962 |
-
cluster_id=cluster_id,
|
| 963 |
-
label=label,
|
| 964 |
-
count=len(examples),
|
| 965 |
-
examples=examples,
|
| 966 |
-
))
|
| 967 |
-
cluster_id += 1
|
| 968 |
-
|
| 969 |
-
# Cluster "autres"
|
| 970 |
-
if other_pairs:
|
| 971 |
-
clusters.append(ErrorCluster(
|
| 972 |
-
cluster_id=cluster_id,
|
| 973 |
-
label="autres substitutions",
|
| 974 |
-
count=len(other_pairs),
|
| 975 |
-
examples=other_pairs,
|
| 976 |
-
))
|
| 977 |
-
|
| 978 |
-
# Trier par count décroissant et limiter
|
| 979 |
-
clusters.sort(key=lambda c: -c.count)
|
| 980 |
-
return clusters[:max_clusters]
|
-
-
-# ---------------------------------------------------------------------------
-# Correlation matrix between metrics
-# ---------------------------------------------------------------------------
-
-def _pearson(x: list[float], y: list[float]) -> float:
-    """Pearson correlation coefficient."""
-    n = len(x)
-    if n < 2:
-        return 0.0
-    mx = sum(x) / n
-    my = sum(y) / n
-    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
-    den = math.sqrt(
-        sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y)
-    )
-    return num / den if den > 0 else 0.0
-
-
-def compute_correlation_matrix(
-    metrics_per_doc: list[dict],
-    metric_keys: Optional[list[str]] = None,
-) -> dict:
-    """Compute the correlation matrix between all numeric metrics.
-
-    Parameters
-    ----------
-    metrics_per_doc : list of dicts, one per document, holding the metrics
-    metric_keys : keys to include (None → all numeric keys)
-
-    Returns
-    -------
-    {
-        "labels": [...],
-        "matrix": [[r_ij, ...], ...]  // Pearson coefficients
-    }
-    """
-    if not metrics_per_doc:
-        return {"labels": [], "matrix": []}
-
-    if metric_keys is None:
-        # Infer the numeric keys from the first document
-        sample = metrics_per_doc[0]
-        metric_keys = [k for k, v in sample.items() if isinstance(v, (int, float))]
-
-    # Build the vectors
-    vectors: dict[str, list[float]] = {k: [] for k in metric_keys}
-    for doc in metrics_per_doc:
-        for k in metric_keys:
-            v = doc.get(k)
-            vectors[k].append(float(v) if v is not None else 0.0)
-
-    # Compute the matrix
-    labels = metric_keys
-    n = len(labels)
-    matrix = []
-    for i in range(n):
-        row = []
-        for j in range(n):
-            r = _pearson(vectors[labels[i]], vectors[labels[j]])
-            row.append(round(r, 4))
-        matrix.append(row)
-
-    return {"labels": labels, "matrix": matrix}
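Reviewer sketch: to make the behaviour of the correlation helper concrete, here is a minimal self-contained illustration. It does not import Picarones; the function name `pearson`, the sample `docs` and the metric keys `cer`/`wer` are hypothetical stand-ins chosen for the example.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson r, returning 0.0 when it is undefined (n < 2 or zero variance)."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den > 0 else 0.0


# Hypothetical per-document metrics: WER is exactly twice the CER here.
docs = [
    {"cer": 0.10, "wer": 0.20},
    {"cer": 0.20, "wer": 0.40},
    {"cer": 0.30, "wer": 0.60},
]
keys = ["cer", "wer"]
vectors = {k: [float(d[k]) for d in docs] for k in keys}
matrix = [[round(pearson(vectors[a], vectors[b]), 4) for b in keys] for a in keys]
print(matrix)  # [[1.0, 1.0], [1.0, 1.0]]
```

One consequence of the zero-variance rule worth keeping in mind: a metric that is constant across all documents yields 0.0 everywhere in its row, diagonal included, rather than 1.0.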
-
-
-# ---------------------------------------------------------------------------
-# Reliability curve
-# ---------------------------------------------------------------------------
-
-def compute_reliability_curve(
-    cer_values: list[float],
-    steps: int = 20,
-) -> list[dict]:
-    """For the easiest X% of documents, what is the mean CER?
-
-    Returns
-    -------
-    List of {pct_docs: float, mean_cer: float}
-    """
-    if not cer_values:
-        return []
-    sorted_cer = sorted(cer_values)
-    n = len(sorted_cer)
-    points = []
-    for step in range(1, steps + 1):
-        pct = step / steps
-        cutoff = max(1, int(pct * n))
-        subset = sorted_cer[:cutoff]
-        mean_cer = sum(subset) / len(subset)
-        points.append({"pct_docs": round(pct * 100, 1), "mean_cer": round(mean_cer, 6)})
-    return points
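Reviewer sketch: the reliability curve is a running mean over the documents sorted from easiest to hardest. The compact version below is only an illustration under that reading; `reliability_curve` and the four CER values are made up for the example, and it returns tuples instead of dicts to stay short.

```python
def reliability_curve(cer_values: list[float], steps: int = 4) -> list[tuple[float, float]]:
    """Mean CER of the easiest pct% of documents, for pct in equal steps."""
    if not cer_values:
        return []
    ordered = sorted(cer_values)  # easiest (lowest CER) first
    n = len(ordered)
    points = []
    for step in range(1, steps + 1):
        pct = step / steps
        cutoff = max(1, int(pct * n))  # always keep at least one document
        subset = ordered[:cutoff]
        points.append((round(pct * 100, 1), round(sum(subset) / len(subset), 6)))
    return points


curve = reliability_curve([0.40, 0.10, 0.30, 0.20])
print(curve)  # [(25.0, 0.1), (50.0, 0.15), (75.0, 0.2), (100.0, 0.25)]
```

Because of `max(1, int(pct * n))`, a corpus with fewer documents than steps repeats the easiest document for the first few points, so the curve starts flat.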
-
-
-# ---------------------------------------------------------------------------
-# Data for the Venn diagram (shared / exclusive errors)
-# ---------------------------------------------------------------------------
-
-def compute_venn_data(
-    engine_error_sets: dict[str, set[str]],
-) -> dict:
-    """Compute the cardinalities for a Venn diagram between 2 or 3 competitors.
-
-    Parameters
-    ----------
-    engine_error_sets : {engine_name → set of doc_id:error_token_pair strings}
-
-    Returns
-    -------
-    For 2 competitors:
-        {only_a, only_b, both, label_a, label_b}
-    For 3 competitors:
-        {only_a, only_b, only_c, ab, ac, bc, abc, label_a, label_b, label_c}
-    """
-    names = list(engine_error_sets.keys())[:3]  # at most 3 for a readable Venn diagram
-    if len(names) < 2:
-        return {}
-
-    sets = {n: engine_error_sets[n] for n in names}
-
-    if len(names) == 2:
-        a, b = names
-        sa, sb = sets[a], sets[b]
-        return {
-            "type": "venn2",
-            "label_a": a,
-            "label_b": b,
-            "only_a": len(sa - sb),
-            "only_b": len(sb - sa),
-            "both": len(sa & sb),
-        }
-    else:
-        a, b, c = names
-        sa, sb, sc = sets[a], sets[b], sets[c]
-        return {
-            "type": "venn3",
-            "label_a": a,
-            "label_b": b,
-            "label_c": c,
-            "only_a": len(sa - sb - sc),
-            "only_b": len(sb - sa - sc),
-            "only_c": len(sc - sa - sb),
-            "ab": len((sa & sb) - sc),
-            "ac": len((sa & sc) - sb),
-            "bc": len((sb & sc) - sa),
-            "abc": len(sa & sb & sc),
-        }
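Reviewer sketch: the seven regions of the three-set Venn diagram partition the union of the error sets, which gives a cheap sanity check. The engine names and error keys below are invented for illustration; only plain Python set operations are used.

```python
# Hypothetical error sets, keyed by "doc_id:gt→ocr" strings.
errors = {
    "engine_a": {"d1:rn→m", "d2:l→1", "d3:u→n"},
    "engine_b": {"d2:l→1", "d4:O→0"},
    "engine_c": {"d2:l→1", "d3:u→n", "d5:é→e"},
}
sa, sb, sc = errors.values()  # dicts preserve insertion order

regions = {
    "only_a": len(sa - sb - sc),
    "only_b": len(sb - sa - sc),
    "only_c": len(sc - sa - sb),
    "ab": len((sa & sb) - sc),   # shared by a and b, but not c
    "ac": len((sa & sc) - sb),
    "bc": len((sb & sc) - sa),
    "abc": len(sa & sb & sc),    # shared by all three
}
print(regions)
# The regions are disjoint and cover the union exactly.
assert sum(regions.values()) == len(sa | sb | sc)
```

With more than three engines, only the first three dictionary entries would be compared, so the caller's insertion order decides which engines appear in the diagram.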
picarones/measurements/statistics/__init__.py
ADDED
|
@@ -0,0 +1,82 @@
+"""Statistical tests and error clustering for Picarones.
+
+Before the "splitting of statistics.py" sprint (2026-05-02), this module
+was a single 1128-line file mixing Wilcoxon, Friedman, Nemenyi,
+bootstrap, Pareto, clustering, correlation, distribution curves and
+the SVG rendering of the Critical Difference Diagram.
+
+The subpackage splits the responsibility by statistical family:
+
+- :mod:`bootstrap`: bootstrap CI by resampling.
+- :mod:`wilcoxon`: signed-rank test + pairwise matrix.
+- :mod:`friedman_nemenyi`: multi-engine Friedman + Nemenyi post-hoc
+  (computation only, no rendering).
+- :mod:`cdd_render`: SVG rendering of the Critical Difference Diagram.
+- :mod:`pareto`: multi-objective Pareto front.
+- :mod:`clustering`: grouping of OCR/HTR error patterns.
+- :mod:`correlation`: correlation matrix between metrics.
+- :mod:`distributions`: reliability curve and 2/3-set Venn data.
+
+This ``__init__.py`` re-exports the whole historical public API so
+that the ~30 files importing from
+``picarones.measurements.statistics`` keep working without
+modification. The private symbols ``_SCIPY_AVAILABLE``,
+``_chi_square_sf``, ``_nemenyi_critical_value``, ``_rank_row`` are
+also re-exported because some tests consume them directly.
+"""
+
+from picarones.measurements.statistics.bootstrap import bootstrap_ci
+from picarones.measurements.statistics.cdd_render import (
+    build_critical_difference_svg,
+)
+from picarones.measurements.statistics.clustering import (
+    ErrorCluster,
+    cluster_errors,
+)
+from picarones.measurements.statistics.correlation import (
+    compute_correlation_matrix,
+)
+from picarones.measurements.statistics.distributions import (
+    compute_reliability_curve,
+    compute_venn_data,
+)
+from picarones.measurements.statistics.friedman_nemenyi import (
+    _chi_square_sf,
+    _nemenyi_critical_value,
+    _rank_row,
+    friedman_test,
+    nemenyi_posthoc,
+)
+from picarones.measurements.statistics.pareto import compute_pareto_front
+from picarones.measurements.statistics.wilcoxon import (
+    _SCIPY_AVAILABLE,
+    compute_pairwise_stats,
+    wilcoxon_test,
+)
+
+__all__ = [
+    # Bootstrap
+    "bootstrap_ci",
+    # Wilcoxon
+    "wilcoxon_test",
+    "compute_pairwise_stats",
+    # Friedman / Nemenyi
+    "friedman_test",
+    "nemenyi_posthoc",
+    "build_critical_difference_svg",
+    # Pareto
+    "compute_pareto_front",
+    # Clustering
+    "ErrorCluster",
+    "cluster_errors",
+    # Correlation
+    "compute_correlation_matrix",
+    # Distributions
+    "compute_reliability_curve",
+    "compute_venn_data",
+    # Re-exported private symbols (consumed by some tests)
+    "_SCIPY_AVAILABLE",
+    "_chi_square_sf",
+    "_nemenyi_critical_value",
+    "_rank_row",
+]
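Reviewer sketch: a re-exporting `__init__.py` like this one is easy to break when a symbol is renamed in a submodule. One possible guard, shown here on a synthetic module so the snippet runs without Picarones installed, is to check that every name listed in `__all__` really is an attribute of the package. The helper `missing_exports` and the fake `facade` module are hypothetical and not part of this PR.

```python
import types


def missing_exports(module: types.ModuleType) -> list[str]:
    """Names listed in module.__all__ that the module does not actually define."""
    return [name for name in getattr(module, "__all__", []) if not hasattr(module, name)]


# Synthetic stand-in for a facade package: two symbols are bound, one is not.
facade = types.ModuleType("facade")
facade.bootstrap_ci = lambda values: (0.0, 0.0)
facade._rank_row = lambda row: row
facade.__all__ = ["bootstrap_ci", "_rank_row", "friedman_test"]

print(missing_exports(facade))  # ['friedman_test']
```

In a real test one would presumably pass the imported `picarones.measurements.statistics` package and assert that the result is empty. Note that listing underscore-prefixed names in `__all__` also makes `from package import *` pick them up, which is the standard Python behaviour when `__all__` is defined.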
picarones/measurements/statistics/bootstrap.py
ADDED
|
@@ -0,0 +1,47 @@
+"""Bootstrap confidence interval (Sprint 7).
+
+Non-parametric resampling method. It makes no assumption of a
+normal distribution, which suits the skewed CER distributions
+typical of heritage corpora.
+"""
+
+from __future__ import annotations
+
+import random
+
+
+def bootstrap_ci(
+    values: list[float],
+    n_iter: int = 1000,
+    ci: float = 0.95,
+    seed: int = 42,
+) -> tuple[float, float]:
+    """Bootstrap confidence interval.
+
+    Parameters
+    ----------
+    values : list of values (e.g. per-document CER)
+    n_iter : number of bootstrap iterations (default 1000)
+    ci     : confidence level (default 0.95 → 95%)
+    seed   : RNG seed for reproducibility
+
+    Returns
+    -------
+    (lower, upper): the bounds of the CI at confidence level ``ci``
+    """
+    if not values:
+        return (0.0, 0.0)
+    rng = random.Random(seed)
+    n = len(values)
+    means = []
+    for _ in range(n_iter):
+        sample = [values[rng.randint(0, n - 1)] for _ in range(n)]
+        means.append(sum(sample) / n)
+    means.sort()
+    alpha = (1.0 - ci) / 2.0
+    lo_idx = max(0, int(alpha * n_iter))
+    hi_idx = min(n_iter - 1, int((1.0 - alpha) * n_iter))
+    return (means[lo_idx], means[hi_idx])
+
+
+__all__ = ["bootstrap_ci"]
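Reviewer sketch: the function above is a percentile bootstrap of the mean. The variant below illustrates the same idea in a self-contained form; it is an assumption-laden rewrite, not the module's code: the name `percentile_bootstrap_ci` is hypothetical, and it draws resamples with `random.Random.choices` instead of `randint`, so its numeric output will not match `bootstrap_ci` draw for draw.

```python
import random


def percentile_bootstrap_ci(
    values: list[float],
    n_iter: int = 1000,
    ci: float = 0.95,
    seed: int = 42,
) -> tuple[float, float]:
    """Percentile bootstrap CI of the mean, deterministic for a given seed."""
    if not values:
        return (0.0, 0.0)
    rng = random.Random(seed)  # private RNG: does not disturb the global state
    n = len(values)
    # Resample n values with replacement, n_iter times, and keep each mean.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_iter))
    alpha = (1.0 - ci) / 2.0
    lo_idx = max(0, int(alpha * n_iter))
    hi_idx = min(n_iter - 1, int((1.0 - alpha) * n_iter))
    return (means[lo_idx], means[hi_idx])


# Hypothetical per-document CER values with one hard outlier.
cer = [0.02, 0.05, 0.03, 0.40, 0.04, 0.06]
lower, upper = percentile_bootstrap_ci(cer)
print(f"95% CI of the mean CER: [{lower:.3f}, {upper:.3f}]")
```

Seeding a dedicated `random.Random` instance is what makes two calls with the same inputs return the same interval, which matters when the bounds end up in a generated report.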
picarones/measurements/statistics/cdd_render.py
ADDED
|
@@ -0,0 +1,171 @@
+"""SVG rendering of the Critical Difference Diagram (Sprint 17).
+
+Canonical visualisation of the Friedman-Nemenyi result (Demšar 2006):
+a horizontal axis of mean ranks plus horizontal bars connecting the
+engines that are statistically indistinguishable at the α threshold.
+
+The module is kept separate from the computation (:mod:`friedman_nemenyi`)
+to respect the "computation vs presentation" distinction: a PNG, PDF
+or other renderer could be added without touching the computation.
+"""
+
+from __future__ import annotations
+
+
+def build_critical_difference_svg(
+    nemenyi_result: dict,
+    width: int = 780,
+    row_height: int = 22,
+) -> str:
+    """Generate the SVG of the Critical Difference Diagram (Demšar 2006).
+
+    The diagram shows:
+      * a horizontal axis of mean ranks (1 to k),
+      * each engine placed on the axis at its mean rank,
+      * thick horizontal bars connecting statistically
+        indistinguishable engines (distance ≤ CD),
+      * the length of CD shown above the axis for reference.
+
+    Parameters
+    ----------
+    nemenyi_result:
+        Result of ``nemenyi_posthoc``.
+    width:
+        Total width of the SVG in pixels.
+    row_height:
+        Height of each engine label row (auto-adaptive).
+
+    Returns
+    -------
+    String containing the SVG (root tag ``<svg>…</svg>``).
+    """
+    k = nemenyi_result.get("n_engines", 0)
+    if k < 2 or nemenyi_result.get("error"):
+        return (
+            '<svg xmlns="http://www.w3.org/2000/svg" width="100%" height="40" '
+            'role="img" aria-label="Critical Difference Diagram indisponible">'
+            '<text x="10" y="24" font-family="sans-serif" font-size="12" fill="#666">'
+            'Critical Difference Diagram non calculable — données insuffisantes.'
+            '</text></svg>'
+        )
+
+    engines_sorted: list[str] = list(nemenyi_result.get("engines_sorted", []))
+    mean_ranks: dict[str, float] = dict(nemenyi_result.get("mean_ranks", {}))
+    tied_groups: list[list[str]] = list(nemenyi_result.get("tied_groups", []))
+    cd: float = float(nemenyi_result.get("critical_distance", 0.0))
+
+    # Dimensions
+    left_pad, right_pad = 40, 40
+    top_pad = 50  # room for the CD display
+    axis_y = top_pad + 10
+    bars_start_y = axis_y + 20  # first tie bar below the axis
+    # Stack one row per group + one row per label
+    label_rows = k  # each engine has its own label row
+    bars_count = len(tied_groups)
+    total_h = bars_start_y + bars_count * 10 + label_rows * row_height + 20
+
+    axis_x0, axis_x1 = left_pad, width - right_pad
+    axis_width = axis_x1 - axis_x0
+
+    def x_for_rank(r: float) -> float:
+        # Rank 1 on the left, rank k on the right
+        if k <= 1:
+            return axis_x0
+        return axis_x0 + (r - 1.0) / (k - 1.0) * axis_width
+
+    parts: list[str] = []
+    parts.append(
+        f'<svg xmlns="http://www.w3.org/2000/svg" width="100%" viewBox="0 0 {width} {total_h}" '
+        f'role="img" aria-label="Critical Difference Diagram (Friedman-Nemenyi)" '
+        f'font-family="system-ui, -apple-system, sans-serif">'
+    )
+    parts.append('<style>.cd-axis{stroke:#334155;stroke-width:1.5}.cd-tick{stroke:#334155;stroke-width:1}'
+                 '.cd-label{fill:#0f172a;font-size:11px}'
+                 '.cd-tie{stroke:#0f172a;stroke-width:4;stroke-linecap:round}'
+                 '.cd-cd-bar{stroke:#dc2626;stroke-width:2}'
+                 '.cd-cd-txt{fill:#dc2626;font-size:11px;font-weight:600}'
+                 '.cd-name{fill:#0f172a;font-size:12px}'
+                 '.cd-rank{fill:#64748b;font-size:10px}'
+                 '</style>')
+
+    # Reference CD bar (top, at the left end of the axis)
+    if cd > 0 and k >= 2:
+        cd_bar_x0 = axis_x0
+        cd_bar_x1 = axis_x0 + (cd / max(1, k - 1)) * axis_width
+        cd_y = top_pad - 20
+        parts.append(f'<line class="cd-cd-bar" x1="{cd_bar_x0:.1f}" y1="{cd_y}" '
+                     f'x2="{cd_bar_x1:.1f}" y2="{cd_y}"/>')
+        parts.append(f'<line class="cd-cd-bar" x1="{cd_bar_x0:.1f}" y1="{cd_y - 4}" '
+                     f'x2="{cd_bar_x0:.1f}" y2="{cd_y + 4}"/>')
+        parts.append(f'<line class="cd-cd-bar" x1="{cd_bar_x1:.1f}" y1="{cd_y - 4}" '
+                     f'x2="{cd_bar_x1:.1f}" y2="{cd_y + 4}"/>')
+        parts.append(f'<text class="cd-cd-txt" x="{(cd_bar_x0 + cd_bar_x1)/2:.1f}" y="{cd_y - 8}" '
+                     f'text-anchor="middle">CD = {cd:.3f}</text>')
+
+    # Main axis
+    parts.append(f'<line class="cd-axis" x1="{axis_x0}" y1="{axis_y}" '
+                 f'x2="{axis_x1}" y2="{axis_y}"/>')
+    # Integer ticks
+    for r in range(1, k + 1):
+        xt = x_for_rank(r)
+        parts.append(f'<line class="cd-tick" x1="{xt:.1f}" y1="{axis_y - 5}" '
+                     f'x2="{xt:.1f}" y2="{axis_y + 5}"/>')
+        parts.append(f'<text class="cd-label" x="{xt:.1f}" y="{axis_y - 9}" '
+                     f'text-anchor="middle">{r}</text>')
+
+    # Bars connecting the indistinguishable groups
+    for i, group in enumerate(tied_groups):
+        if len(group) < 2:
+            continue
+        rs = [mean_ranks[n] for n in group]
+        x0 = x_for_rank(min(rs))
+        x1 = x_for_rank(max(rs))
+        y_bar = bars_start_y + i * 10
+        parts.append(f'<line class="cd-tie" x1="{x0 - 3:.1f}" y1="{y_bar}" '
+                     f'x2="{x1 + 3:.1f}" y2="{y_bar}"/>')
+
+    # Engine labels: the lower-ranked half on the left, the rest on the right
+    labels_y_base = bars_start_y + bars_count * 10 + 15
+    half = (len(engines_sorted) + 1) // 2
+    left_engines = engines_sorted[:half]
+    right_engines = engines_sorted[half:]
+
+    for idx, name in enumerate(left_engines):
+        r = mean_ranks[name]
+        x = x_for_rank(r)
+        y_label = labels_y_base + idx * row_height
+        # Connector from the engine's position on the axis to its label
+        parts.append(f'<line class="cd-tick" x1="{x:.1f}" y1="{axis_y + 6}" '
+                     f'x2="{x:.1f}" y2="{y_label - 4}"/>')
+        parts.append(f'<line class="cd-tick" x1="{x:.1f}" y1="{y_label - 4}" '
+                     f'x2="{axis_x0 - 4:.1f}" y2="{y_label - 4}"/>')
+        parts.append(f'<text class="cd-name" x="{axis_x0 - 6:.1f}" y="{y_label}" '
+                     f'text-anchor="end">{_svg_escape(name)} '
+                     f'<tspan class="cd-rank">({r:.2f})</tspan></text>')
+
+    for idx, name in enumerate(right_engines):
+        r = mean_ranks[name]
+        x = x_for_rank(r)
+        y_label = labels_y_base + idx * row_height
+        parts.append(f'<line class="cd-tick" x1="{x:.1f}" y1="{axis_y + 6}" '
+                     f'x2="{x:.1f}" y2="{y_label - 4}"/>')
+        parts.append(f'<line class="cd-tick" x1="{x:.1f}" y1="{y_label - 4}" '
+                     f'x2="{axis_x1 + 4:.1f}" y2="{y_label - 4}"/>')
+        parts.append(f'<text class="cd-name" x="{axis_x1 + 6:.1f}" y="{y_label}" '
+                     f'text-anchor="start">{_svg_escape(name)} '
+                     f'<tspan class="cd-rank">({r:.2f})</tspan></text>')
+
+    parts.append('</svg>')
+    return "".join(parts)
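Reviewer sketch: the geometry of the diagram boils down to a linear map from mean rank to x coordinate, and the red CD bar uses the same scale. The standalone functions below restate that arithmetic with the default sizes from the signature (780 px wide, 40 px padding on each side); the names `x_for_rank` and `cd_bar_length` as top-level functions are an illustration only, since the real mapping is a closure inside the renderer.

```python
def x_for_rank(rank: float, k: int, width: int = 780,
               left_pad: int = 40, right_pad: int = 40) -> float:
    """Map a mean rank in [1, k] linearly onto the horizontal axis."""
    axis_x0 = left_pad
    axis_width = (width - right_pad) - axis_x0
    if k <= 1:
        return float(axis_x0)  # degenerate axis: everything at the left end
    return axis_x0 + (rank - 1.0) / (k - 1.0) * axis_width


def cd_bar_length(cd: float, k: int, width: int = 780,
                  left_pad: int = 40, right_pad: int = 40) -> float:
    """Length in pixels of a critical distance expressed in rank units."""
    axis_width = width - right_pad - left_pad
    return cd / max(1, k - 1) * axis_width


# With 5 engines the axis spans x = 40 .. 740, so one rank unit is 175 px.
print(x_for_rank(1, k=5), x_for_rank(3, k=5), x_for_rank(5, k=5))  # 40.0 390.0 740.0
print(cd_bar_length(1.0, k=5))  # 175.0
```

Two engines are joined by a tie bar when their mean ranks differ by at most CD, so on this scale any pair closer than `cd_bar_length(cd, k)` pixels is visually "within one CD bar" of each other.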
+
+
+def _svg_escape(text: str) -> str:
+    """Escape text for safe inclusion in an SVG/XML node."""
+    return (text.replace("&", "&amp;")
+                .replace("<", "&lt;")
+                .replace(">", "&gt;")
+                .replace('"', "&quot;")
+                .replace("'", "&#39;"))
+
+
+__all__ = ["build_critical_difference_svg"]
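Reviewer sketch: engine names are interpolated into the SVG, so they need XML escaping, and the ampersand has to be handled first or the other entities get double-escaped. For comparison, the standard library's `xml.sax.saxutils.escape` does the same job; the wrapper name `svg_escape` and the sample engine name are hypothetical, and the choice of `&#39;` for the apostrophe is an assumption of this example.

```python
from xml.sax.saxutils import escape


def svg_escape(text: str) -> str:
    """Escape &, <, > (built in) plus both quote characters (extra entities)."""
    return escape(text, {'"': "&quot;", "'": "&#39;"})


label = svg_escape('Tesseract <v5> & "Kraken"')
print(label)  # Tesseract &lt;v5&gt; &amp; &quot;Kraken&quot;
```

`escape` replaces `&` before anything else, which is the same ordering the chained `.replace` calls rely on.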
picarones/measurements/statistics/clustering.py
ADDED
|
@@ -0,0 +1,158 @@
+"""Clustering of error patterns (Sprint 7).
+
+Groups frequent OCR/HTR substitutions into readable clusters
+("confusion rn/m", "ligature œ/æ", etc.) for the HTML report.
+"""
+
+from __future__ import annotations
+
+import re
+from collections import defaultdict
+from dataclasses import dataclass
+
+from picarones.core.diff_utils import compute_word_diff
+
+# Frequent error patterns (OCR + HTR on heritage documents)
+_ERROR_PATTERNS = [
+    # (pattern_re, label)
+    (r"\brn\b.*\bm\b|\bm\b.*\brn\b|rn→m|m→rn", "confusion rn/m"),
+    (r"[lI]→1|1→[lI]|l→1|1→l|I→1|1→I", "confusion l/1/I"),
+    (r"u→n|n→u|v→u|u→v", "confusion u/n/v"),
+    (r"[oO]→0|0→[oO]", "confusion O/0"),
+    (r"ſ→[fs]|[fs]→ſ", "confusion ſ/f/s"),
+    (r"é→e|è→e|ê→e|e→[éèê]", "erreur diacritique é/e"),
+    (r"œ→oe|oe→œ|æ→ae|ae→æ", "ligature œ/æ"),
+    (r"[fF]i→fi|fi→[fF]i", "ligature fi"),
+    (r"[fF]l→fl|fl→[fF]l", "ligature fl"),
+    (r"\s+→''|''→\s+", "segmentation espace"),
+]
+
+
+def _extract_error_pairs(gt: str, hyp: str) -> list[tuple[str, str]]:
+    """Extract the (gt_char_seq, hyp_char_seq) pairs of substitution errors.
+
+    The import of ``compute_word_diff`` sits at the top level of the module
+    (circle 1 → circle 2, the allowed direction). It used to be lazy
+    to work around a circle violation (Sprint A3) that no longer exists.
+    """
+    ops = compute_word_diff(gt, hyp)
+    pairs = []
+    for op in ops:
+        if op["op"] == "replace":
+            pairs.append((op["old"], op["new"]))
+        elif op["op"] == "delete":
+            pairs.append((op["text"], ""))
+        elif op["op"] == "insert":
+            pairs.append(("", op["text"]))
+    return pairs
+
+
+@dataclass
+class ErrorCluster:
+    """A cluster of similar errors."""
+    cluster_id: int
+    label: str
+    """Human-readable description of the pattern (e.g. 'confusion rn/m')."""
+    count: int
+    examples: list[dict]
+    """List of {engine, gt_fragment, ocr_fragment}."""
+
+    def as_dict(self) -> dict:
+        return {
+            "cluster_id": self.cluster_id,
+            "label": self.label,
+            "count": self.count,
+            "examples": self.examples[:5],  # at most 5 examples
+        }
+
+
| 69 |
+
def cluster_errors(
|
| 70 |
+
error_data: list[dict],
|
| 71 |
+
max_clusters: int = 8,
|
| 72 |
+
) -> list[ErrorCluster]:
|
| 73 |
+
"""Regroupe les erreurs en clusters avec labels lisibles.
|
| 74 |
+
|
| 75 |
+
Parameters
|
| 76 |
+
----------
|
| 77 |
+
error_data : liste de dicts {engine, gt, hypothesis}
|
| 78 |
+
max_clusters : nombre max de clusters à retourner
|
| 79 |
+
|
| 80 |
+
Returns
|
| 81 |
+
-------
|
| 82 |
+
Liste de ErrorCluster triée par count décroissant.
|
| 83 |
+
"""
|
| 84 |
+
# Collecter tous les patterns d'erreur avec contexte
|
| 85 |
+
# Clé : catégorie d'erreur → liste d'exemples
|
| 86 |
+
bucket: dict[str, list[dict]] = defaultdict(list)
|
| 87 |
+
other_pairs: list[dict] = []
|
| 88 |
+
|
| 89 |
+
for item in error_data:
|
| 90 |
+
engine = item.get("engine", "")
|
| 91 |
+
gt = item.get("gt", "")
|
| 92 |
+
hyp = item.get("hypothesis", "")
|
| 93 |
+
pairs = _extract_error_pairs(gt, hyp)
|
| 94 |
+
|
| 95 |
+
for old, new in pairs:
|
| 96 |
+
if not old and not new:
|
| 97 |
+
continue
|
| 98 |
+
matched = False
|
| 99 |
+
# Essayer de matcher un pattern connu
|
| 100 |
+
probe = f"{old}→{new}"
|
| 101 |
+
for _pat, label in _ERROR_PATTERNS:
|
| 102 |
+
try:
|
| 103 |
+
if re.search(_pat, probe, re.IGNORECASE):
|
| 104 |
+
bucket[label].append({
|
| 105 |
+
"engine": engine,
|
| 106 |
+
"gt_fragment": old,
|
| 107 |
+
"ocr_fragment": new,
|
| 108 |
+
})
|
| 109 |
+
matched = True
|
| 110 |
+
break
|
| 111 |
+
except re.error:
|
| 112 |
+
pass
|
| 113 |
+
|
| 114 |
+
if not matched:
|
| 115 |
+
# Regrouper les substitutions restantes par paire de caractères
|
| 116 |
+
if len(old) <= 3 and len(new) <= 3:
|
| 117 |
+
key = f"{old}→{new}" if (old and new) else (f"—→{new}" if new else f"{old}→—")
|
| 118 |
+
bucket[key].append({
|
| 119 |
+
"engine": engine,
|
| 120 |
+
"gt_fragment": old,
|
| 121 |
+
"ocr_fragment": new,
|
| 122 |
+
})
|
| 123 |
+
else:
|
| 124 |
+
other_pairs.append({
|
| 125 |
+
"engine": engine,
|
| 126 |
+
"gt_fragment": old,
|
| 127 |
+
"ocr_fragment": new,
|
| 128 |
+
})
|
| 129 |
+
|
| 130 |
+
# Construire les clusters triés par fréquence
|
| 131 |
+
clusters: list[ErrorCluster] = []
|
| 132 |
+
cluster_id = 1
|
| 133 |
+
sorted_buckets = sorted(bucket.items(), key=lambda x: -len(x[1]))
|
| 134 |
+
|
| 135 |
+
for label, examples in sorted_buckets[:max_clusters - 1]:
|
| 136 |
+
clusters.append(ErrorCluster(
|
| 137 |
+
cluster_id=cluster_id,
|
| 138 |
+
label=label,
|
| 139 |
+
count=len(examples),
|
| 140 |
+
examples=examples,
|
| 141 |
+
))
|
| 142 |
+
cluster_id += 1
|
| 143 |
+
|
| 144 |
+
# Cluster "autres"
|
| 145 |
+
if other_pairs:
|
| 146 |
+
clusters.append(ErrorCluster(
|
| 147 |
+
cluster_id=cluster_id,
|
| 148 |
+
label="autres substitutions",
|
| 149 |
+
count=len(other_pairs),
|
| 150 |
+
examples=other_pairs,
|
| 151 |
+
))
|
| 152 |
+
|
| 153 |
+
# Trier par count décroissant et limiter
|
| 154 |
+
clusters.sort(key=lambda c: -c.count)
|
| 155 |
+
return clusters[:max_clusters]
|
| 156 |
+
|
| 157 |
+
|
| 158 |
+
__all__ = ["ErrorCluster", "cluster_errors"]
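
As a rough illustration of the bucketing step in `cluster_errors`, the sketch below builds the same `old→new` probe string and files each pair under the first matching label. It is a simplified stand-in, not the module's code: `PATTERNS` is a hypothetical two-entry table, `bucket_pairs` is an illustrative name, and the original's `re.IGNORECASE` flag and length-based fallback are left out.

```python
import re
from collections import defaultdict

# Hypothetical, reduced pattern table for illustration (two entries only).
PATTERNS = [
    (r"rn→m|m→rn", "confusion rn/m"),
    (r"[oO]→0|0→[oO]", "confusion O/0"),
]


def bucket_pairs(pairs):
    """Group (gt, ocr) fragments under the first matching label."""
    buckets = defaultdict(list)
    for old, new in pairs:
        probe = f"{old}→{new}"
        for pattern, label in PATTERNS:
            if re.search(pattern, probe):
                buckets[label].append((old, new))
                break
        else:
            # No known pattern matched: use the raw pair as its own key.
            buckets[probe].append((old, new))
    return dict(buckets)


result = bucket_pairs([("rn", "m"), ("O", "0"), ("m", "rn"), ("a", "e")])
for label, examples in result.items():
    print(label, examples)
```

Encoding the pair as a single probe string lets one regular expression cover both directions of a confusion (`rn→m` and `m→rn`), at the cost of depending on the arrow character never appearing in the text itself.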
picarones/measurements/statistics/correlation.py
ADDED
@@ -0,0 +1,75 @@
"""Correlation matrix between metrics (Sprint 7).

Pearson coefficient between all numeric metrics of a
DocumentResult: it reveals redundancies (CER ↔ WER ≈ 1) and
partly independent dimensions (CER ↔ image_quality ≈ 0.5).
"""

from __future__ import annotations

import math
from typing import Optional


def _pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient.

    Returns 0.0 when fewer than 2 points are given or when either
    series has zero variance.
    """
    n = len(x)
    if n < 2:
        return 0.0
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = math.sqrt(
        sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y)
    )
    return num / den if den > 0 else 0.0


def compute_correlation_matrix(
    metrics_per_doc: list[dict],
    metric_keys: Optional[list[str]] = None,
) -> dict:
    """Compute the correlation matrix between all numeric metrics.

    Parameters
    ----------
    metrics_per_doc : list of dicts, one per document, holding the metrics
    metric_keys : keys to include (None → all numeric keys of the first document)

    Returns
    -------
    {
        "labels": [...],
        "matrix": [[r_ij, ...], ...]  // Pearson coefficients
    }
    """
    if not metrics_per_doc:
        return {"labels": [], "matrix": []}

    if metric_keys is None:
        # Infer the numeric keys from the first document
        sample = metrics_per_doc[0]
        metric_keys = [k for k, v in sample.items() if isinstance(v, (int, float))]

    # Build the vectors (a missing or None value is replaced by 0.0)
    vectors: dict[str, list[float]] = {k: [] for k in metric_keys}
    for doc in metrics_per_doc:
        for k in metric_keys:
            v = doc.get(k)
            vectors[k].append(float(v) if v is not None else 0.0)

    # Compute the matrix
    labels = metric_keys
    n = len(labels)
    matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            r = _pearson(vectors[labels[i]], vectors[labels[j]])
            row.append(round(r, 4))
        matrix.append(row)

    return {"labels": labels, "matrix": matrix}


__all__ = ["compute_correlation_matrix"]
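
To make the redundancy claim in the module docstring concrete, here is a small self-contained sketch, assuming three made-up documents. The `pearson` helper mirrors the formula used above; the `docs` values and the `quality` key are invented for illustration and do not come from the project.

```python
import math


def pearson(x, y):
    """Pearson r; 0.0 for fewer than 2 points or zero variance."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den > 0 else 0.0


# Invented per-document metrics: WER is exactly twice CER here.
docs = [
    {"cer": 0.10, "wer": 0.20, "quality": 0.9},
    {"cer": 0.20, "wer": 0.40, "quality": 0.7},
    {"cer": 0.30, "wer": 0.60, "quality": 0.8},
]
keys = ["cer", "wer", "quality"]
matrix = [
    [round(pearson([d[a] for d in docs], [d[b] for d in docs]), 4) for b in keys]
    for a in keys
]
for key, row in zip(keys, matrix):
    print(key, row)
```

In this toy data the CER/WER entry is 1.0 because one metric is a linear function of the other, while the CER/quality entry is -0.5.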
picarones/measurements/statistics/distributions.py
ADDED
@@ -0,0 +1,88 @@
"""Performance distribution curves (Sprint 7).

- :func:`compute_reliability_curve`: for the easiest X % of documents,
  what is the mean CER? Reveals whether an engine has a catastrophic
  long tail.
- :func:`compute_venn_data`: cardinalities for a 2- or 3-engine Venn
  diagram over the sets of errors made.
"""

from __future__ import annotations


def compute_reliability_curve(
    cer_values: list[float],
    steps: int = 20,
) -> list[dict]:
    """For the easiest X% of documents, what is the mean CER?

    Returns
    -------
    List of {pct_docs: float, mean_cer: float}
    """
    if not cer_values:
        return []
    sorted_cer = sorted(cer_values)
    n = len(sorted_cer)
    points = []
    for step in range(1, steps + 1):
        pct = step / steps
        cutoff = max(1, int(pct * n))
        subset = sorted_cer[:cutoff]
        mean_cer = sum(subset) / len(subset)
        points.append({"pct_docs": round(pct * 100, 1), "mean_cer": round(mean_cer, 6)})
    return points


def compute_venn_data(
    engine_error_sets: dict[str, set[str]],
) -> dict:
    """Compute the cardinalities for a Venn diagram between 2 or 3 competitors.

    Parameters
    ----------
    engine_error_sets : {engine_name → set of doc_id:error_token_pair strings}

    Returns
    -------
    For 2 competitors:
        {type, only_a, only_b, both, label_a, label_b}
    For 3 competitors:
        {type, only_a, only_b, only_c, ab, ac, bc, abc, label_a, label_b, label_c}
    An empty dict if fewer than 2 competitors are given.
    """
    names = list(engine_error_sets.keys())[:3]  # at most 3 for a readable Venn
    if len(names) < 2:
        return {}

    sets = {n: engine_error_sets[n] for n in names}

    if len(names) == 2:
        a, b = names
        sa, sb = sets[a], sets[b]
        return {
            "type": "venn2",
            "label_a": a,
            "label_b": b,
            "only_a": len(sa - sb),
            "only_b": len(sb - sa),
            "both": len(sa & sb),
        }
    else:
        a, b, c = names
        sa, sb, sc = sets[a], sets[b], sets[c]
        return {
            "type": "venn3",
            "label_a": a,
            "label_b": b,
            "label_c": c,
            "only_a": len(sa - sb - sc),
            "only_b": len(sb - sa - sc),
            "only_c": len(sc - sa - sb),
            "ab": len((sa & sb) - sc),
            "ac": len((sa & sc) - sb),
            "bc": len((sb & sc) - sa),
            "abc": len(sa & sb & sc),
        }


__all__ = ["compute_reliability_curve", "compute_venn_data"]
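
The "long tail" that the reliability curve is meant to expose can be seen on a toy corpus. The sketch below re-implements the same cumulative-mean idea in a few lines, assuming four invented CER values (three good documents and one catastrophic one). `reliability_curve` is an illustrative name, not the module's function.

```python
def reliability_curve(cer_values, steps):
    """Mean CER over the easiest step/steps fraction of documents."""
    ordered = sorted(cer_values)
    n = len(ordered)
    points = []
    for step in range(1, steps + 1):
        pct = step / steps
        cutoff = max(1, int(pct * n))  # always keep at least one document
        subset = ordered[:cutoff]
        points.append({
            "pct_docs": round(pct * 100, 1),
            "mean_cer": round(sum(subset) / len(subset), 6),
        })
    return points


# Invented corpus: three easy pages and one failure.
curve = reliability_curve([0.02, 0.03, 0.01, 0.90], steps=4)
for point in curve:
    print(point)
```

The mean CER stays at or below 0.02 for the easiest 75 % of documents and jumps to 0.24 once the last document is included. A plain corpus-wide average would hide this.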
picarones/measurements/statistics/friedman_nemenyi.py
ADDED
@@ -0,0 +1,350 @@
"""Friedman test + Nemenyi post-hoc (Sprint 17).

Reference: Demšar, J. (2006), "Statistical Comparisons of Classifiers
over Multiple Data Sets", Journal of Machine Learning Research 7:1-30.
De facto standard for comparing several systems over several
datasets; here, several OCR engines over several documents.

The canonical visual rendering (Critical Difference Diagram) lives in
:mod:`picarones.measurements.statistics.cdd_render` to keep
computation (this module) separate from presentation (the other one).
"""

from __future__ import annotations

import math
from typing import Optional

from picarones.measurements.statistics.wilcoxon import _normal_sf

# Critical values of the Studentized Range distribution divided by √2,
# for df = ∞ (the usual approximation for Nemenyi). Source: Tukey tables.
# Key: number of treatments k; value: q_α for α ∈ {0.05, 0.01}.
_NEMENYI_Q_TABLE = {
    # k   q_0.05  q_0.01
    2: (1.960, 2.576),
    3: (2.343, 2.913),
    4: (2.569, 3.113),
    5: (2.728, 3.255),
    6: (2.850, 3.364),
    7: (2.949, 3.452),
    8: (3.031, 3.526),
    9: (3.102, 3.590),
    10: (3.164, 3.646),
    11: (3.219, 3.696),
    12: (3.268, 3.741),
    13: (3.313, 3.781),
    14: (3.354, 3.818),
    15: (3.391, 3.853),
    16: (3.426, 3.886),
    17: (3.458, 3.916),
    18: (3.489, 3.944),
    19: (3.517, 3.970),
    20: (3.544, 3.995),
    25: (3.658, 4.095),
    30: (3.739, 4.167),
    40: (3.858, 4.272),
    50: (3.945, 4.349),
}


def _chi_square_sf(x: float, df: int) -> float:
    """Survival function of the chi-square distribution, 1 - CDF(x).

    Uses scipy if available (exact method), otherwise Wilson-Hilferty
    (normal approximation, accurate from df ≥ 3).
    """
    if x <= 0 or df <= 0:
        return 1.0
    try:
        from scipy.stats import chi2 as _chi2  # type: ignore[import-untyped]
        return float(_chi2.sf(x, df))
    except ImportError:
        pass
    # Wilson-Hilferty: maps the chi-square variable to a normal approximation
    z = (((x / df) ** (1.0 / 3.0)) - (1.0 - 2.0 / (9.0 * df))) / math.sqrt(2.0 / (9.0 * df))
    return _normal_sf(z)


def _rank_row(values: list[float]) -> list[float]:
    """Ranks within a row: smallest = rank 1. Ties get average ranks."""
    n = len(values)
    indexed = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n and values[indexed[j]] == values[indexed[i]]:
            j += 1
        avg_rank = (i + j + 1) / 2.0  # 1-based
        for k in range(i, j):
            ranks[indexed[k]] = avg_rank
        i = j
    return ranks


def _aligned_cer_matrix(
    engine_cer_map: dict[str, list[float]],
) -> tuple[list[str], list[list[float]]]:
    """Build the (k engines × n documents) matrix aligned on the minimal
    length. Returns ``(names, matrix)`` with one list of values per engine.

    Friedman requires complete blocks (documents): if the engines were not
    all run on the same documents, the series are truncated to the minimal
    length, which is reported in the result via ``n_blocks``.
    """
    names = list(engine_cer_map.keys())
    if not names:
        return [], []
    min_len = min(len(v) for v in engine_cer_map.values())
    if min_len == 0:
        return names, []
    matrix = [engine_cer_map[n][:min_len] for n in names]
    return names, matrix


def friedman_test(engine_cer_map: dict[str, list[float]]) -> dict:
    """Friedman test: k engines on n paired documents.

    Non-parametric equivalent of repeated-measures ANOVA for ordinal
    data. Null hypothesis: all engines have the same average
    performance. Rejection → at least one engine differs from the others.

    Parameters
    ----------
    engine_cer_map:
        Dict ``{engine_name → [cer_doc1, cer_doc2, ...]}``. All engines
        must have been evaluated on the same documents (in the same order).

    Returns
    -------
    dict with:
    - ``statistic``     : Q corrected for ties
    - ``p_value``       : p-value (scipy if available, otherwise Wilson-Hilferty)
    - ``significant``   : bool, p < 0.05
    - ``df``            : degrees of freedom = k - 1
    - ``n_blocks``      : number of documents (blocks) used
    - ``n_engines``     : number of engines (k)
    - ``mean_ranks``    : dict ``{engine: mean_rank}``
    - ``interpretation``: readable sentence
    - ``error``         : message, present only if the test is not applicable
    """
    names, matrix = _aligned_cer_matrix(engine_cer_map)
    k = len(names)
    n = len(matrix[0]) if matrix else 0

    if k < 2:
        return {
            "statistic": 0.0, "p_value": 1.0, "significant": False,
            "df": 0, "n_blocks": n, "n_engines": k,
            "mean_ranks": {names[0]: 1.0} if k == 1 else {},
            "interpretation": "Test de Friedman non applicable : il faut au moins 2 moteurs.",
            "error": "not_enough_engines",
        }
    if n < 2:
        return {
            "statistic": 0.0, "p_value": 1.0, "significant": False,
            "df": k - 1, "n_blocks": n, "n_engines": k,
            "mean_ranks": {name: 1.0 for name in names},
            "interpretation": "Test de Friedman non applicable : il faut au moins 2 documents communs.",
            "error": "not_enough_blocks",
        }

    # Ranks per block (document): for each document, rank the k engines
    ranks_by_engine: list[list[float]] = [[] for _ in range(k)]
    for j in range(n):
        row = [matrix[i][j] for i in range(k)]
        row_ranks = _rank_row(row)
        for i in range(k):
            ranks_by_engine[i].append(row_ranks[i])

    rank_sums = [sum(r) for r in ranks_by_engine]
    mean_ranks = {names[i]: rank_sums[i] / n for i in range(k)}

    # Uncorrected Q statistic (assumes no ties)
    # Q = 12 / (n·k·(k+1)) · Σ R_j² − 3·n·(k+1)
    Q = (12.0 / (n * k * (k + 1))) * sum(rs ** 2 for rs in rank_sums) - 3.0 * n * (k + 1)

    # Tie correction: adjusts Q when ranks are shared within some blocks.
    # Formula: Q_corr = Q / (1 - T/(n·(k³−k)))
    # where T = Σ (tⱼ³ − tⱼ) over all groups of tied values.
    tie_correction = 0.0
    for j in range(n):
        row = [matrix[i][j] for i in range(k)]
        sorted_row = sorted(row)
        i = 0
        while i < len(sorted_row):
            count = 1
            while i + count < len(sorted_row) and sorted_row[i + count] == sorted_row[i]:
                count += 1
            if count > 1:
                tie_correction += count ** 3 - count
            i += count
    denom = 1.0 - tie_correction / (n * (k ** 3 - k)) if k >= 2 else 1.0
    if denom > 0:
        Q = Q / denom

    df = k - 1
    p_value = _chi_square_sf(Q, df)
    significant = p_value < 0.05

    if significant:
        interpretation = (
            f"Test de Friedman significatif (Q = {Q:.3f}, df = {df}, p = {p_value:.4f}). "
            f"Au moins un moteur diffère des autres — utiliser le post-hoc Nemenyi "
            f"pour identifier les paires distinguables."
        )
    else:
        interpretation = (
            f"Test de Friedman non significatif (Q = {Q:.3f}, df = {df}, p = {p_value:.4f}). "
            f"Aucune différence globale détectée entre les moteurs sur ce corpus."
        )

    return {
        "statistic": round(Q, 4),
        "p_value": round(p_value, 6),
        "significant": significant,
        "df": df,
        "n_blocks": n,
        "n_engines": k,
        "mean_ranks": {k_: round(v, 4) for k_, v in mean_ranks.items()},
        "interpretation": interpretation,
    }


def _nemenyi_critical_value(k: int, alpha: float = 0.05) -> Optional[float]:
    """Critical value q_α for k treatments, df = ∞.

    Returns ``None`` only if k < 2. For k above the table (> 50) the value
    for k = 50 is reused; between two tabulated keys the value is linearly
    interpolated. Only α = 0.05 and α = 0.01 are tabulated; any other α
    silently falls back to one of these two columns.
    """
    if k < 2:
        return None
    if k in _NEMENYI_Q_TABLE:
        q05, q01 = _NEMENYI_Q_TABLE[k]
        return q05 if alpha == 0.05 else q01 if alpha == 0.01 else q05
    # Beyond the table: reuse the value of the largest tabulated k.
    # q_α grows with k, so this underestimates the true critical value
    # (i.e. it is anti-conservative for k > 50).
    max_k = max(_NEMENYI_Q_TABLE.keys())
    if k > max_k:
        q05, q01 = _NEMENYI_Q_TABLE[max_k]
        return q05 if alpha == 0.05 else q01
    # Between two keys: linear interpolation
    keys = sorted(_NEMENYI_Q_TABLE.keys())
    for i in range(len(keys) - 1):
        if keys[i] < k < keys[i + 1]:
            lo, hi = keys[i], keys[i + 1]
            q_lo = _NEMENYI_Q_TABLE[lo][0 if alpha == 0.05 else 1]
            q_hi = _NEMENYI_Q_TABLE[hi][0 if alpha == 0.05 else 1]
            frac = (k - lo) / (hi - lo)
            return q_lo + frac * (q_hi - q_lo)
    return None


def nemenyi_posthoc(
    engine_cer_map: dict[str, list[float]],
    alpha: float = 0.05,
) -> dict:
    """Nemenyi post-hoc: identifies the pairs of engines that are
    statistically indistinguishable after a Friedman test.

    Computes the *critical distance* CD = q_α · √(k·(k+1) / (6·n)). Two engines
    whose mean ranks differ by less than CD are **not**
    statistically distinguishable at level α.

    Returns
    -------
    dict with:
    - ``alpha``               : level used
    - ``critical_distance``   : computed CD
    - ``q_alpha``             : critical value q_α taken from the table
    - ``n_blocks``, ``n_engines``
    - ``mean_ranks``          : mean ranks per engine (dict)
    - ``engines_sorted``      : list of engines sorted by increasing rank
    - ``significant_matrix``  : bool matrix (list[list[bool]]),
                                ``True`` = significantly different pair
    - ``tied_groups``         : list of lists of indistinguishable engines
                                (groups of practical ties)
    - ``error``               : present if the test is not applicable
    """
    names, matrix = _aligned_cer_matrix(engine_cer_map)
    k = len(names)
    n = len(matrix[0]) if matrix else 0

    if k < 2 or n < 2:
        return {
            "alpha": alpha,
            "critical_distance": 0.0,
            "q_alpha": 0.0,
            "n_blocks": n,
            "n_engines": k,
            "mean_ranks": {name: 1.0 for name in names},
            "engines_sorted": list(names),
            "significant_matrix": [[False] * k for _ in range(k)],
            "tied_groups": [list(names)] if names else [],
            "error": "not_enough_data",
        }

    # Friedman provides the mean ranks; they are recomputed here so that this
    # function stays standalone (callers need not chain the two calls).
    ranks_by_engine: list[list[float]] = [[] for _ in range(k)]
    for j in range(n):
        row = [matrix[i][j] for i in range(k)]
        row_ranks = _rank_row(row)
        for i in range(k):
            ranks_by_engine[i].append(row_ranks[i])

    mean_ranks_list = [sum(r) / n for r in ranks_by_engine]
    mean_ranks = {names[i]: round(mean_ranks_list[i], 4) for i in range(k)}

    q_alpha = _nemenyi_critical_value(k, alpha) or 0.0
    critical_distance = q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))

    # Significance matrix: pair (i, j) is significant if |R_i - R_j| > CD
    significant_matrix = [
        [
            (i != j) and (abs(mean_ranks_list[i] - mean_ranks_list[j]) > critical_distance)
            for j in range(k)
        ]
        for i in range(k)
    ]

    # Practical tie groups: sliding window over the sorted ranks.
    # Two engines are in the same group if their gap is ≤ CD.
    order = sorted(range(k), key=lambda i: mean_ranks_list[i])
    sorted_names = [names[i] for i in order]
    sorted_ranks = [mean_ranks_list[i] for i in order]

    tied_groups: list[list[str]] = []
    i = 0
    while i < len(sorted_names):
        # extend the group while the next engine is within CD of the group's first engine
        j = i
        while j + 1 < len(sorted_names) and (sorted_ranks[j + 1] - sorted_ranks[i]) <= critical_distance:
            j += 1
        tied_groups.append(sorted_names[i:j + 1])
        i = j + 1 if j > i else i + 1

    return {
        "alpha": alpha,
        "critical_distance": round(critical_distance, 4),
        "q_alpha": round(q_alpha, 4),
        "n_blocks": n,
        "n_engines": k,
        "mean_ranks": mean_ranks,
        "engines_sorted": sorted_names,
        "significant_matrix": significant_matrix,
        "tied_groups": tied_groups,
    }


__all__ = [
    # Public symbols.
    "friedman_test",
    "nemenyi_posthoc",
    # Re-exported private symbols (consumed by the Sprint 18 tests).
    # Note: ``_aligned_cer_matrix`` stays strictly internal to the module
    # (used only by friedman_test and nemenyi_posthoc); it is neither
    # in __all__ nor re-exported by the sub-package's __init__.py.
    "_chi_square_sf",
    "_nemenyi_critical_value",
    "_rank_row",
]
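
A hand-checkable instance of the two formulas above (the uncorrected Q and the critical distance) may help when reading the results. The sketch assumes three invented engines `A`, `B`, `C` on four documents, with no ties, so the tie correction is a no-op. The CER values are made up; `q_alpha = 2.343` is the tabulated value for k = 3, α = 0.05 from `_NEMENYI_Q_TABLE`. The closed-form p-value `exp(-Q/2)` is valid for df = 2 only.

```python
import math

# Invented CER series: A is always best, C always worst.
cer = {
    "A": [0.10, 0.12, 0.08, 0.11],
    "B": [0.20, 0.22, 0.18, 0.21],
    "C": [0.30, 0.32, 0.28, 0.31],
}
names = list(cer)
k, n = len(names), len(cer["A"])

# Rank the engines within each document (1 = lowest CER); no ties here.
rank_sums = dict.fromkeys(names, 0.0)
for j in range(n):
    order = sorted(names, key=lambda name: cer[name][j])
    for rank, name in enumerate(order, start=1):
        rank_sums[name] += rank

# Q = 12 / (n*k*(k+1)) * sum(R_j^2) - 3*n*(k+1)
q_stat = 12.0 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums.values()) - 3.0 * n * (k + 1)
p_value = math.exp(-q_stat / 2.0)  # exact chi-square survival function, df = 2 only

# CD = q_alpha * sqrt(k*(k+1) / (6*n))
q_alpha = 2.343  # k = 3, alpha = 0.05
cd = q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))

mean_ranks = {name: rank_sums[name] / n for name in names}
significant_pairs = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(mean_ranks[a] - mean_ranks[b]) > cd
]
print(q_stat, round(p_value, 4), round(cd, 4), significant_pairs)
```

With rank sums 4, 8 and 12 the statistic is Q = 0.25 × 224 − 48 = 8. The critical distance is about 1.657, so only the A/C pair (mean ranks 1 and 3) is separated. With so few documents, even consistently ordered neighbours such as A/B remain statistically tied.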
picarones/measurements/statistics/pareto.py
ADDED
@@ -0,0 +1,87 @@
"""Multi-objective Pareto front (Sprint 19).

Generic algorithm over N objectives (CER, cost, speed, CO₂…).
Returns the names of the non-dominated points.
"""

from __future__ import annotations

from typing import Optional


def compute_pareto_front(
    points: list[dict],
    objectives: tuple[str, ...] = ("cer", "cost"),
    name_key: str = "engine",
    minimize: Optional[tuple[bool, ...]] = None,
) -> list[str]:
    """Compute the Pareto front over ``len(objectives)`` dimensions.

    A point ``p`` is on the Pareto front (non-dominated) if no other point
    is at least as good on ALL objectives AND strictly better on at least
    one of them.

    Parameters
    ----------
    points:
        List of dicts. Each dict must contain ``name_key`` and all the
        keys in ``objectives``. Points with a missing or ``None`` objective
        value are ignored (no comparison possible).
    objectives:
        Keys of the objectives to minimise/maximise.
    name_key:
        Key identifying the point (default ``"engine"``).
    minimize:
        For each objective, ``True`` = minimise (e.g. CER, cost),
        ``False`` = maximise (e.g. anchoring). Must have the same length
        as ``objectives``. Defaults to minimising every objective.

    Returns
    -------
    List of the ``name`` values of the points on the Pareto front, in the
    same order as ``points``.
    """
    if minimize is None:
        minimize = tuple(True for _ in objectives)
    if len(minimize) != len(objectives):
        raise ValueError("`minimize` doit avoir la même longueur que `objectives`")

    valid = []
    for p in points:
        try:
            vals = tuple(float(p[k]) for k in objectives)
        except (KeyError, TypeError, ValueError):
            continue
        valid.append((p[name_key], vals))

    front: list[str] = []
    for name_a, vals_a in valid:
        dominated = False
        for name_b, vals_b in valid:
            if name_a == name_b:
                continue
            # B dominates A if B is at least as good everywhere AND strictly better somewhere
            better_or_equal_everywhere = True
            strictly_better_somewhere = False
            for va, vb, mini in zip(vals_a, vals_b, minimize):
                if mini:
                    if vb > va:
                        better_or_equal_everywhere = False
                        break
                    if vb < va:
                        strictly_better_somewhere = True
                else:  # maximise
                    if vb < va:
                        better_or_equal_everywhere = False
                        break
                    if vb > va:
                        strictly_better_somewhere = True
            if better_or_equal_everywhere and strictly_better_somewhere:
                dominated = True
                break
        if not dominated:
            front.append(name_a)
    return front


__all__ = ["compute_pareto_front"]
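
The dominance rule in the docstring can be tried on a small made-up example. The sketch below assumes four hypothetical engines scored on `(cer, cost)`, both to be minimised; `dominates` and the engine names are illustrative and not part of the module.

```python
def dominates(b, a):
    """True if b is at least as good as a everywhere and strictly better
    somewhere, with every objective minimised."""
    at_least_as_good = all(vb <= va for va, vb in zip(a, b))
    strictly_better = any(vb < va for va, vb in zip(a, b))
    return at_least_as_good and strictly_better


# Invented (cer, cost) scores.
engines = {
    "fast": (0.12, 1.0),
    "accurate": (0.05, 4.0),
    "wasteful": (0.12, 5.0),  # same CER as "fast" but more expensive
    "balanced": (0.08, 2.0),
}

front = [
    name
    for name, vals in engines.items()
    if not any(dominates(other, vals) for o, other in engines.items() if o != name)
]
print(front)
```

Only `wasteful` drops out: `fast` matches its CER at a lower cost. The other three each win on at least one objective against every rival, so none dominates another.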
picarones/measurements/statistics/wilcoxon.py
ADDED
@@ -0,0 +1,227 @@
| 1 |
+
"""Wilcoxon signed-rank test + pairwise tests (Sprint 7).
|
| 2 |
+
|
| 3 |
+
Non-parametric test for comparing two paired series (same
|
| 4 |
+
documents, two different engines). Uses scipy if available
|
| 5 |
+
(exact method for n ≤ 25), otherwise a native normal approximation (n ≥ 10)
|
| 6 |
+
or a simplified critical-value table for very small n.
|
| 7 |
+
"""
|
| 8 |
+
|
| 9 |
+
from __future__ import annotations
|
| 10 |
+
|
| 11 |
+
import math
|
| 12 |
+
|
| 13 |
+
# Optional scipy import: used for the Wilcoxon test when available
|
| 14 |
+
# (exact method for n ≤ 25, normal approximation for n > 25).
|
| 15 |
+
# Without it, the native implementation (normal approximation for n ≥ 10)
|
| 16 |
+
# is used automatically.
|
| 17 |
+
try:
|
| 18 |
+
from scipy.stats import wilcoxon as _scipy_wilcoxon # type: ignore[import-untyped]
|
| 19 |
+
_SCIPY_AVAILABLE = True
|
| 20 |
+
except ImportError:
|
| 21 |
+
_SCIPY_AVAILABLE = False
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
def wilcoxon_test(
|
| 25 |
+
a: list[float],
|
| 26 |
+
b: list[float],
|
| 27 |
+
zero_method: str = "wilcox",
|
| 28 |
+
) -> dict:
|
| 29 |
+
"""Wilcoxon signed-rank test between two paired CER series.
|
| 30 |
+
|
| 31 |
+
Returns a dict with:
|
| 32 |
+
- statistic : W = min(W⁺, W⁻)
|
| 33 |
+
- p_value : two-sided p-value
|
| 34 |
+
- significant : bool (p < 0.05)
|
| 35 |
+
- interpretation : human-readable sentence
|
| 36 |
+
- n_pairs : number of pairs used (after dropping zeros)
|
| 37 |
+
- W_plus : sum of the ranks of the positive differences
|
| 38 |
+
- W_minus : sum of the ranks of the negative differences
|
| 39 |
+
|
| 40 |
+
Assumptions and limitations
|
| 41 |
+
---------------------------
|
| 42 |
+
* Observations are paired (same corpus, two different engines).
|
| 43 |
+
* The test is non-parametric: no normality assumption on the CER values.
|
| 44 |
+
* ``zero_method="wilcox"`` (default): pairs with no difference (aᵢ = bᵢ)
|
| 45 |
+
are simply excluded. The other methods (``"pratt"``, ``"zsplit"``)
|
| 46 |
+
require scipy.
|
| 47 |
+
* **Normal approximation** (native implementation, n ≥ 10):
|
| 48 |
+
The approximation is reasonable for n ≥ 10 and converges to the
|
| 49 |
+
exact distribution. For n < 10, a simplified critical-value table is
|
| 50 |
+
used (p ∈ {0.04, 0.20}), a **conservative** result.
|
| 51 |
+
* **scipy** (if installed): ``scipy.stats.wilcoxon`` is used instead
|
| 52 |
+
of the native approximation. scipy uses the exact method for n ≤ 25
|
| 53 |
+
and the normal approximation for n > 25, which is more accurate.
|
| 54 |
+
* **Validity**: the test assumes the distribution of the differences is
|
| 55 |
+
symmetric. With very small n (< 5), results are unreliable
|
| 56 |
+
whatever the method.
|
| 57 |
+
|
| 58 |
+
Parameters
|
| 59 |
+
----------
|
| 60 |
+
a, b : CER series (same length, same document order)
|
| 61 |
+
zero_method : handling of zero-difference pairs (default: ``"wilcox"``)
|
| 62 |
+
"""
|
| 63 |
+
if len(a) != len(b):
|
| 64 |
+
raise ValueError("Les deux listes doivent avoir la même longueur")
|
| 65 |
+
|
| 66 |
+
diffs = [x - y for x, y in zip(a, b)]
|
| 67 |
+
|
| 68 |
+
# Drop zero differences ("wilcox" method)
|
| 69 |
+
if zero_method == "wilcox":
|
| 70 |
+
diffs = [d for d in diffs if d != 0.0]
|
| 71 |
+
|
| 72 |
+
n = len(diffs)
|
| 73 |
+
if n == 0:
|
| 74 |
+
return {
|
| 75 |
+
"statistic": 0.0,
|
| 76 |
+
"p_value": 1.0,
|
| 77 |
+
"significant": False,
|
| 78 |
+
"interpretation": "Aucune différence entre les deux concurrents.",
|
| 79 |
+
"n_pairs": 0,
|
| 80 |
+
}
|
| 81 |
+
|
| 82 |
+
# Ranks of the absolute values
|
| 83 |
+
abs_diffs = [abs(d) for d in diffs]
|
| 84 |
+
indexed = sorted(enumerate(abs_diffs), key=lambda x: x[1])
|
| 85 |
+
|
| 86 |
+
# Tie handling: average rank
|
| 87 |
+
ranks = [0.0] * n
|
| 88 |
+
i = 0
|
| 89 |
+
while i < n:
|
| 90 |
+
j = i
|
| 91 |
+
while j < n and abs_diffs[indexed[j][0]] == abs_diffs[indexed[i][0]]:
|
| 92 |
+
j += 1
|
| 93 |
+
avg_rank = (i + j + 1) / 2.0 # average rank (1-based)
|
| 94 |
+
for k in range(i, j):
|
| 95 |
+
ranks[indexed[k][0]] = avg_rank
|
| 96 |
+
i = j
|
| 97 |
+
|
| 98 |
+
W_plus = sum(ranks[k] for k in range(n) if diffs[k] > 0)
|
| 99 |
+
W_minus = sum(ranks[k] for k in range(n) if diffs[k] < 0)
|
| 100 |
+
W = min(W_plus, W_minus)
|
| 101 |
+
|
| 102 |
+
# p-value: scipy if available, otherwise the native approximation
|
| 103 |
+
if _SCIPY_AVAILABLE:
|
| 104 |
+
try:
|
| 105 |
+
scipy_res = _scipy_wilcoxon(diffs, zero_method=zero_method)
|
| 106 |
+
p_value = float(scipy_res.pvalue)
|
| 107 |
+
except Exception: # noqa: BLE001 — fallback gracieux
|
| 108 |
+
# Fall back to the native implementation if scipy raises
|
| 109 |
+
p_value = _native_p_value(n, W)
|
| 110 |
+
else:
|
| 111 |
+
p_value = _native_p_value(n, W)
|
| 112 |
+
|
| 113 |
+
significant = p_value < 0.05
|
| 114 |
+
|
| 115 |
+
if significant:
|
| 116 |
+
better = "premier" if W_plus < W_minus else "second"
|
| 117 |
+
interpretation = (
|
| 118 |
+
f"Différence statistiquement significative (p = {p_value:.4f} < 0.05). "
|
| 119 |
+
f"Le {better} concurrent obtient de meilleurs scores."
|
| 120 |
+
)
|
| 121 |
+
else:
|
| 122 |
+
interpretation = (
|
| 123 |
+
f"Différence non significative (p = {p_value:.4f} ≥ 0.05). "
|
| 124 |
+
"On ne peut pas conclure que l'un surpasse l'autre."
|
| 125 |
+
)
|
| 126 |
+
|
| 127 |
+
return {
|
| 128 |
+
"statistic": round(W, 4),
|
| 129 |
+
"p_value": round(p_value, 6),
|
| 130 |
+
"significant": significant,
|
| 131 |
+
"interpretation": interpretation,
|
| 132 |
+
"n_pairs": n,
|
| 133 |
+
"W_plus": round(W_plus, 4),
|
| 134 |
+
"W_minus": round(W_minus, 4),
|
| 135 |
+
}
|
| 136 |
+
|
| 137 |
+
|
| 138 |
+
def _normal_sf(z: float) -> float:
|
| 139 |
+
"""Survival function of the standard normal distribution (1 - CDF).
|
| 140 |
+
|
| 141 |
+
Abramowitz & Stegun approximation 26.2.17. Used by this
|
| 142 |
+
module family for Wilcoxon AND by friedman_nemenyi for the
|
| 143 |
+
Wilson-Hilferty fallback when scipy is not available.
|
| 144 |
+
"""
|
| 145 |
+
t = 1.0 / (1.0 + 0.2316419 * abs(z))
|
| 146 |
+
poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
|
| 147 |
+
+ t * (-1.821255978 + t * 1.330274429))))
|
| 148 |
+
phi_z = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
|
| 149 |
+
p = phi_z * poly
|
| 150 |
+
return p if z >= 0 else 1.0 - p
|
| 151 |
+
|
| 152 |
+
|
| 153 |
+
# Critical W values for two-sided α=0.05 (exact test, source: Wilcoxon tables); -1 = no critical value exists (n < 6)
|
| 154 |
+
_W_CRITICAL = {1: -1, 2: -1, 3: -1, 4: -1, 5: -1, 6: 0, 7: 2, 8: 3, 9: 5}
|
| 155 |
+
|
| 156 |
+
|
| 157 |
+
def _wilcoxon_exact_p(n: int, w: float) -> float:
|
| 158 |
+
"""Approximate p-value for small n (< 10) via a simplified critical-value table.
|
| 159 |
+
|
| 160 |
+
Note: **conservative** result, since only two values are returned:
|
| 161 |
+
0.04 (significant at the 5 % level) or 0.20 (not significant).
|
| 162 |
+
Prefer scipy for exact p-values.
|
| 163 |
+
"""
|
| 164 |
+
critical = _W_CRITICAL.get(n, 0)
|
| 165 |
+
if w <= critical:
|
| 166 |
+
return 0.04 # significant at the 5 % level
|
| 167 |
+
return 0.20 # not significant (conservative approximation)
|
| 168 |
+
|
| 169 |
+
|
| 170 |
+
def _native_p_value(n: int, W: float) -> float:
|
| 171 |
+
"""Compute the p-value via the normal approximation (n ≥ 10) or the critical-value table (n < 10)."""
|
| 172 |
+
if n >= 10:
|
| 173 |
+
mu = n * (n + 1) / 4.0
|
| 174 |
+
sigma2 = n * (n + 1) * (2 * n + 1) / 24.0
|
| 175 |
+
if sigma2 <= 0:
|
| 176 |
+
return 1.0
|
| 177 |
+
z = abs((W + 0.5) - mu) / math.sqrt(sigma2) # continuity correction
|
| 178 |
+
return 2.0 * _normal_sf(z) # two-sided test
|
| 179 |
+
return _wilcoxon_exact_p(n, W)
|
| 180 |
+
|
| 181 |
+
|
| 182 |
+
def compute_pairwise_stats(
|
| 183 |
+
engine_cer_map: dict[str, list[float]],
|
| 184 |
+
) -> list[dict]:
|
| 185 |
+
"""Run the Wilcoxon test between every pair of competitors.
|
| 186 |
+
|
| 187 |
+
Parameters
|
| 188 |
+
----------
|
| 189 |
+
engine_cer_map : dict {engine_name → [cer_doc1, cer_doc2, ...]}
|
| 190 |
+
|
| 191 |
+
Returns
|
| 192 |
+
-------
|
| 193 |
+
List of dicts, one per pair:
|
| 194 |
+
- engine_a, engine_b, statistic, p_value, significant, interpretation
|
| 195 |
+
"""
|
| 196 |
+
names = list(engine_cer_map.keys())
|
| 197 |
+
results = []
|
| 198 |
+
for i in range(len(names)):
|
| 199 |
+
for j in range(i + 1, len(names)):
|
| 200 |
+
a_name, b_name = names[i], names[j]
|
| 201 |
+
a_vals = engine_cer_map[a_name]
|
| 202 |
+
b_vals = engine_cer_map[b_name]
|
| 203 |
+
# Align lengths (truncate both series to the shorter one)
|
| 204 |
+
min_len = min(len(a_vals), len(b_vals))
|
| 205 |
+
if min_len < 2:
|
| 206 |
+
continue
|
| 207 |
+
res = wilcoxon_test(a_vals[:min_len], b_vals[:min_len])
|
| 208 |
+
results.append({
|
| 209 |
+
"engine_a": a_name,
|
| 210 |
+
"engine_b": b_name,
|
| 211 |
+
**res,
|
| 212 |
+
})
|
| 213 |
+
return results
|
| 214 |
+
|
| 215 |
+
|
| 216 |
+
__all__ = [
|
| 217 |
+
# Public symbols: stable signature, consumed directly by the
|
| 218 |
+
# tests through the re-export in ``picarones.measurements.statistics``.
|
| 219 |
+
"compute_pairwise_stats",
|
| 220 |
+
"wilcoxon_test",
|
| 221 |
+
# Re-exported private symbols (consumed by some tests):
|
| 222 |
+
# ``_SCIPY_AVAILABLE`` is used to skip the scipy tests when
|
| 223 |
+
# the dependency is not installed. ``_normal_sf`` is also
|
| 224 |
+
# imported by :mod:`friedman_nemenyi` as a pure math utility.
|
| 225 |
+
"_SCIPY_AVAILABLE",
|
| 226 |
+
"_normal_sf",
|
| 227 |
+
]
|
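For small samples, the two-valued table fallback could in principle be replaced by an exact computation. The sketch below is not part of the module: `exact_two_sided_p` is a hypothetical name, and it assumes n non-zero differences with no ties. It counts, by dynamic programming, how many subsets of the ranks 1..n sum to each value, which gives the null distribution of W⁺.

```python
def exact_two_sided_p(n: int, w: float) -> float:
    """Exact two-sided p-value for W = min(W+, W-).

    Assumes n untied, non-zero paired differences (illustrative sketch).
    """
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of the ranks {1..n} whose sum equals s.
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Iterate downwards so each rank is used at most once per subset.
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    lower_tail = sum(counts[: int(w) + 1])  # number of outcomes with W+ <= w
    # Under H0 each of the 2**n sign patterns is equally likely;
    # doubling the lower tail gives the two-sided p-value.
    return min(1.0, 2.0 * lower_tail / 2 ** n)


print(exact_two_sided_p(5, 0))   # smallest attainable p-value for n = 5
print(exact_two_sided_p(9, 5))
print(exact_two_sided_p(9, 6))
```

With n = 5 the smallest attainable two-sided p-value is 2/32 = 0.0625, so significance at the 5 % level cannot be reached for n ≤ 5. For n = 9 the p-value crosses 0.05 between W = 5 and W = 6, which agrees with the critical value 5 listed for n = 9 in `_W_CRITICAL`.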
picarones/report/assets.py
ADDED
|
@@ -0,0 +1,203 @@
|
| 1 |
+
"""Loading and preparation of the HTML report assets.
|
| 2 |
+
|
| 3 |
+
This module gathers everything related to the binary resources
|
| 4 |
+
embedded in or referenced by the report:
|
| 5 |
+
|
| 6 |
+
- ``load_vendor_js`` reads a vendored JS file (Chart.js, etc.).
|
| 7 |
+
- ``encode_image_b64`` resizes an image and encodes it as a data URI.
|
| 8 |
+
- ``encode_images_b64_from_result`` iterates over a BenchmarkResult.
|
| 9 |
+
- ``externalize_images_to_dir`` writes the images to disk next to
|
| 10 |
+
the HTML (``--lazy-images`` mode from Sprint A5).
|
| 11 |
+
|
| 12 |
+
Extracted from ``picarones/report/generator.py`` during the
|
| 13 |
+
splitting sprint: isolates image and vendor I/O from the rest of the orchestration.
|
| 14 |
+
"""
|
| 15 |
+
|
| 16 |
+
from __future__ import annotations
|
| 17 |
+
|
| 18 |
+
import base64
|
| 19 |
+
import io
|
| 20 |
+
import logging
|
| 21 |
+
from pathlib import Path
|
| 22 |
+
from typing import TYPE_CHECKING
|
| 23 |
+
|
| 24 |
+
if TYPE_CHECKING:
|
| 25 |
+
from picarones.core.results import BenchmarkResult
|
| 26 |
+
|
| 27 |
+
logger = logging.getLogger(__name__)
|
| 28 |
+
|
| 29 |
+
#: Directory holding the bundled JS resources.
|
| 30 |
+
_VENDOR_DIR = Path(__file__).parent / "vendor"
|
| 31 |
+
|
| 32 |
+
|
| 33 |
+
def load_vendor_js(name: str) -> str:
|
| 34 |
+
"""Read a vendored JS file and return its content.
|
| 35 |
+
|
| 36 |
+
If the file does not exist, return a JS comment that
|
| 37 |
+
keeps the report valid (no SyntaxError in the browser).
|
| 38 |
+
"""
|
| 39 |
+
p = _VENDOR_DIR / name
|
| 40 |
+
if p.exists():
|
| 41 |
+
return p.read_text(encoding="utf-8")
|
| 42 |
+
return f"/* vendor/{name} non trouvé */"
|
| 43 |
+
|
| 44 |
+
|
| 45 |
+
def encode_image_b64(image_path: str, max_width: int = 1200) -> str:
|
| 46 |
+
"""Read an image, resize it if needed, and return a base64 data URI.
|
| 47 |
+
|
| 48 |
+
Returns ``""`` if the image cannot be found or if encoding
|
| 49 |
+
fails (Pillow unavailable, unsupported format, corrupted file).
|
| 50 |
+
Logs a warning in the latter case: the report stays
|
| 51 |
+
functional but the image will be missing from the gallery.
|
| 52 |
+
|
| 53 |
+
Distinguishes ``ImportError`` (Pillow not installed, an
|
| 54 |
+
environment problem) from everything else (a per-image problem) to help
|
| 55 |
+
diagnosis in production logs.
|
| 56 |
+
"""
|
| 57 |
+
p = Path(image_path)
|
| 58 |
+
if not p.exists():
|
| 59 |
+
return ""
|
| 60 |
+
try:
|
| 61 |
+
from PIL import Image
|
| 62 |
+
except ImportError as exc:
|
| 63 |
+
logger.warning(
|
| 64 |
+
"[report] Pillow indisponible : %s — toutes les images "
|
| 65 |
+
"du rapport seront omises. Installer ``pip install Pillow`` "
|
| 66 |
+
"ou ``pip install picarones[report]``.",
|
| 67 |
+
exc,
|
| 68 |
+
)
|
| 69 |
+
return ""
|
| 70 |
+
try:
|
| 71 |
+
with Image.open(p) as img:
|
| 72 |
+
if img.width > max_width:
|
| 73 |
+
ratio = max_width / img.width
|
| 74 |
+
new_h = max(1, int(img.height * ratio))
|
| 75 |
+
img = img.resize((max_width, new_h), Image.LANCZOS)
|
| 76 |
+
# Convert to RGB to avoid mode issues (RGBA, palette…)
|
| 77 |
+
if img.mode not in ("RGB", "L"):
|
| 78 |
+
img = img.convert("RGB")
|
| 79 |
+
buf = io.BytesIO()
|
| 80 |
+
fmt = "JPEG" if p.suffix.lower() in (".jpg", ".jpeg") else "PNG"
|
| 81 |
+
img.save(buf, format=fmt, optimize=True, quality=85)
|
| 82 |
+
b64 = base64.b64encode(buf.getvalue()).decode("ascii")
|
| 83 |
+
mime = "image/jpeg" if fmt == "JPEG" else "image/png"
|
| 84 |
+
return f"data:{mime};base64,{b64}"
|
| 85 |
+
except Exception as exc: # noqa: BLE001 — fallback gracieux + warning
|
| 86 |
+
logger.warning(
|
| 87 |
+
"[report] échec d'encodage base64 de l'image %s : %s — "
|
| 88 |
+
"le rapport ignorera cette image",
|
| 89 |
+
image_path,
|
| 90 |
+
exc,
|
| 91 |
+
)
|
| 92 |
+
return ""
|
| 93 |
+
|
| 94 |
+
|
| 95 |
+
def encode_images_b64_from_result(
|
| 96 |
+
benchmark: "BenchmarkResult", max_width: int = 1200,
|
| 97 |
+
) -> dict[str, str]:
|
| 98 |
+
"""Encode every image of a BenchmarkResult as base64.
|
| 99 |
+
|
| 100 |
+
Returns
|
| 101 |
+
-------
|
| 102 |
+
dict
|
| 103 |
+
``{doc_id: data_uri}``
|
| 104 |
+
"""
|
| 105 |
+
images: dict[str, str] = {}
|
| 106 |
+
if not benchmark.engine_reports:
|
| 107 |
+
return images
|
| 108 |
+
for dr in benchmark.engine_reports[0].document_results:
|
| 109 |
+
if dr.image_path and dr.doc_id not in images:
|
| 110 |
+
uri = encode_image_b64(dr.image_path, max_width=max_width)
|
| 111 |
+
if uri:
|
| 112 |
+
images[dr.doc_id] = uri
|
| 113 |
+
return images
|
| 114 |
+
|
| 115 |
+
|
| 116 |
+
def externalize_images_to_dir(
|
| 117 |
+
benchmark: "BenchmarkResult",
|
| 118 |
+
output_dir: Path,
|
| 119 |
+
max_width: int = 1200,
|
| 120 |
+
asset_subdir: str = "report-assets",
|
| 121 |
+
) -> dict[str, str]:
|
| 122 |
+
"""Sprint A5 (item M-16): write the images to disk in a
|
| 123 |
+
subdirectory next to the HTML, and return ``{doc_id: relative_url}``.
|
| 124 |
+
|
| 125 |
+
"Lazy loading" mode: instead of embedding every image as
|
| 126 |
+
base64 in the HTML (50 MB+ for a corpus of 100 documents,
|
| 127 |
+
~200 MB+ for 1,000 documents), they are externalized as local
|
| 128 |
+
PNG/JPEG files. The HTML references them through
|
| 129 |
+
``<img src="report-assets/…">`` with ``loading="lazy"`` on the
|
| 130 |
+
browser side.
|
| 131 |
+
|
| 132 |
+
The report stays self-contained as long as the user copies the
|
| 133 |
+
``report-assets/`` folder alongside the HTML (see CLI ``--lazy-images``).
|
| 134 |
+
|
| 135 |
+
Parameters
|
| 136 |
+
----------
|
| 137 |
+
benchmark:
|
| 138 |
+
Benchmark result (reads ``image_path`` from each DocumentResult).
|
| 139 |
+
output_dir:
|
| 140 |
+
Directory where the HTML will be written; the assets subdirectory is
|
| 141 |
+
created next to it.
|
| 142 |
+
max_width:
|
| 143 |
+
Maximum width for resizing (consistent with
|
| 144 |
+
``encode_image_b64``).
|
| 145 |
+
asset_subdir:
|
| 146 |
+
Name of the assets subdirectory (default ``"report-assets"``).
|
| 147 |
+
|
| 148 |
+
Returns
|
| 149 |
+
-------
|
| 150 |
+
dict[str, str]
|
| 151 |
+
``{doc_id: "report-assets/<doc_id>.png"}`` (relative URL
|
| 152 |
+
usable directly in an HTML ``src`` attribute).
|
| 153 |
+
"""
|
| 154 |
+
from PIL import Image
|
| 155 |
+
|
| 156 |
+
assets_dir = output_dir / asset_subdir
|
| 157 |
+
assets_dir.mkdir(parents=True, exist_ok=True)
|
| 158 |
+
out: dict[str, str] = {}
|
| 159 |
+
|
| 160 |
+
seen_ids: set[str] = set()
|
| 161 |
+
for engine_report in benchmark.engine_reports:
|
| 162 |
+
for dr in engine_report.document_results:
|
| 163 |
+
doc_id = dr.doc_id
|
| 164 |
+
if doc_id in seen_ids:
|
| 165 |
+
continue
|
| 166 |
+
seen_ids.add(doc_id)
|
| 167 |
+
try:
|
| 168 |
+
src = Path(dr.image_path)
|
| 169 |
+
if not src.exists():
|
| 170 |
+
continue
|
| 171 |
+
# File name derived from the doc_id, normalized to drop
|
| 172 |
+
# characters that are unsafe for the filesystem.
|
| 173 |
+
safe_id = "".join(
|
| 174 |
+
c if c.isalnum() or c in "._-" else "_" for c in doc_id
|
| 175 |
+
)
|
| 176 |
+
dest = assets_dir / f"{safe_id}{src.suffix.lower() or '.png'}"
|
| 177 |
+
with Image.open(src) as img:
|
| 178 |
+
if img.width > max_width:
|
| 179 |
+
ratio = max_width / img.width
|
| 180 |
+
new_h = max(1, int(img.height * ratio))
|
| 181 |
+
img = img.resize((max_width, new_h), Image.LANCZOS)
|
| 182 |
+
if img.mode not in ("RGB", "L"):
|
| 183 |
+
img = img.convert("RGB")
|
| 184 |
+
fmt = "JPEG" if dest.suffix in (".jpg", ".jpeg") else "PNG"
|
| 185 |
+
img.save(dest, format=fmt, optimize=True, quality=85)
|
| 186 |
+
# Relative URL (POSIX style even on Windows, for HTML).
|
| 187 |
+
out[doc_id] = f"{asset_subdir}/{dest.name}"
|
| 188 |
+
except Exception as exc: # noqa: BLE001 — fallback silencieux + warning
|
| 189 |
+
logger.warning(
|
| 190 |
+
"[report] échec d'externalisation de l'image %s : %s — "
|
| 191 |
+
"le rapport ignorera cette image",
|
| 192 |
+
dr.image_path,
|
| 193 |
+
exc,
|
| 194 |
+
)
|
| 195 |
+
return out
|
| 196 |
+
|
| 197 |
+
|
| 198 |
+
__all__ = [
|
| 199 |
+
"load_vendor_js",
|
| 200 |
+
"encode_image_b64",
|
| 201 |
+
"encode_images_b64_from_result",
|
| 202 |
+
"externalize_images_to_dir",
|
| 203 |
+
]
|
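The file-naming rule used by `externalize_images_to_dir` (keep alphanumerics and `._-`, replace everything else with `_`, lowercase the suffix, default to `.png`) can be tried in isolation. The helper name `safe_asset_name` and the sample ids below are assumptions made for illustration; the module inlines this logic rather than exposing such a function.

```python
def safe_asset_name(doc_id: str, suffix: str) -> str:
    """Map a document id to a filesystem-safe file name (hypothetical helper)."""
    # Same character filter as in the module: anything outside
    # alphanumerics and "._-" becomes an underscore.
    safe_id = "".join(c if c.isalnum() or c in "._-" else "_" for c in doc_id)
    # An empty suffix falls back to ".png", as in the module.
    return f"{safe_id}{suffix.lower() or '.png'}"


print(safe_asset_name("corpus/page 01", ".JPG"))
print(safe_asset_name("a/b", ""), safe_asset_name("a_b", ""))
```

One consequence worth keeping in mind: distinct ids such as `a/b` and `a_b` map to the same file name under this rule, so a corpus with such ids would see one image overwrite the other.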
picarones/report/calibration_render.py
CHANGED
|
@@ -28,21 +28,7 @@ from __future__ import annotations
|
|
| 28 |
from html import escape as _e
|
| 29 |
from typing import Optional
|
| 30 |
|
| 31 |
-
|
| 32 |
-
def _color_for_ece(ece: float) -> str:
|
| 33 |
-
"""Green (ECE = 0, well calibrated) → red (ECE = 0.5+) gradient."""
|
| 34 |
-
f = max(0.0, min(1.0, ece * 2.0)) # ECE > 0.5 → max red
|
| 35 |
-
if f <= 0.5:
|
| 36 |
-
ratio = f / 0.5
|
| 37 |
-
r = int(130 + (240 - 130) * ratio)
|
| 38 |
-
g = int(200 + (220 - 200) * ratio)
|
| 39 |
-
b = int(130 + (130 - 130) * ratio)
|
| 40 |
-
else:
|
| 41 |
-
ratio = (f - 0.5) / 0.5
|
| 42 |
-
r = int(240 + (220 - 240) * ratio)
|
| 43 |
-
g = int(220 + (100 - 220) * ratio)
|
| 44 |
-
b = int(130 + (100 - 130) * ratio)
|
| 45 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 46 |
|
| 47 |
|
| 48 |
def _engines_with_calibration(engines_summary: list[dict]) -> list[dict]:
|
|
@@ -98,7 +84,7 @@ def build_calibration_summary_html(
|
|
| 98 |
acc = float(agg.get("overall_accuracy") or 0.0)
|
| 99 |
conf = float(agg.get("overall_confidence") or 0.0)
|
| 100 |
doc_count = int(agg.get("doc_count") or 0)
|
| 101 |
-
bg =
|
| 102 |
parts.append("<tr>")
|
| 103 |
parts.append(
|
| 104 |
f'<td style="padding:.3rem .5rem;font-weight:600">'
|
|
|
|
| 28 |
from html import escape as _e
|
| 29 |
from typing import Optional
|
| 30 |
|
| 31 |
+
from picarones.report.render_helpers import color_traffic_light
|
| 32 |
|
| 33 |
|
| 34 |
def _engines_with_calibration(engines_summary: list[dict]) -> list[dict]:
|
|
|
|
| 84 |
acc = float(agg.get("overall_accuracy") or 0.0)
|
| 85 |
conf = float(agg.get("overall_confidence") or 0.0)
|
| 86 |
doc_count = int(agg.get("doc_count") or 0)
|
| 87 |
+
bg = color_traffic_light(ece, low_is_good=True, scale_max=0.5)
|
| 88 |
parts.append("<tr>")
|
| 89 |
parts.append(
|
| 90 |
f'<td style="padding:.3rem .5rem;font-weight:600">'
|
picarones/report/error_absorption_render.py
CHANGED
|
@@ -51,55 +51,15 @@ from __future__ import annotations
|
|
| 51 |
from html import escape as _e
|
| 52 |
from typing import Optional
|
| 53 |
|
|
|
|
| 54 |
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
b = 70
|
| 63 |
-
else:
|
| 64 |
-
t = (f - 0.5) / 0.5
|
| 65 |
-
r = int(235 + (60 - 235) * t)
|
| 66 |
-
g = int(200 + (160 - 200) * t)
|
| 67 |
-
b = int(70 + (90 - 70) * t)
|
| 68 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
def _color_for_introduction(rate: float) -> str:
|
| 72 |
-
"""Low (green) → high (red): good = few introduced."""
|
| 73 |
-
f = max(0.0, min(1.0, rate))
|
| 74 |
-
if f < 0.5:
|
| 75 |
-
t = f / 0.5
|
| 76 |
-
r = int(60 + (235 - 60) * t)
|
| 77 |
-
g = int(160 + (180 - 160) * t)
|
| 78 |
-
b = int(90 + (60 - 90) * t)
|
| 79 |
-
else:
|
| 80 |
-
t = (f - 0.5) / 0.5
|
| 81 |
-
r = int(235 + (220 - 235) * t)
|
| 82 |
-
g = int(180 + (50 - 180) * t)
|
| 83 |
-
b = int(60 + (50 - 60) * t)
|
| 84 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
def _color_for_net(net: int, max_abs: int) -> str:
|
| 88 |
-
"""Green if positive, red if negative. Saturates at max_abs."""
|
| 89 |
-
if max_abs <= 0 or net == 0:
|
| 90 |
-
return "#a7f0a7"
|
| 91 |
-
f = max(-1.0, min(1.0, net / max_abs))
|
| 92 |
-
if f >= 0:
|
| 93 |
-
# light green → deep green
|
| 94 |
-
r = int(167 + (90 - 167) * f)
|
| 95 |
-
g = int(240 + (200 - 240) * f)
|
| 96 |
-
b = int(167 + (90 - 167) * f)
|
| 97 |
-
else:
|
| 98 |
-
f = -f
|
| 99 |
-
r = int(167 + (220 - 167) * f)
|
| 100 |
-
g = int(240 + (50 - 240) * f)
|
| 101 |
-
b = int(167 + (50 - 167) * f)
|
| 102 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 103 |
|
| 104 |
|
| 105 |
def build_error_absorption_html(
|
|
@@ -186,7 +146,7 @@ def build_error_absorption_html(
|
|
| 186 |
intro_rate = entry.get("introduction_rate")
|
| 187 |
if isinstance(corr_rate, (int, float)):
|
| 188 |
corr_rate_str = f"{corr_rate * 100:.1f}%"
|
| 189 |
-
corr_color =
|
| 190 |
corr_cell = (
|
| 191 |
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 192 |
f'background:{corr_color};font-family:monospace;'
|
|
@@ -199,7 +159,7 @@ def build_error_absorption_html(
|
|
| 199 |
)
|
| 200 |
if isinstance(intro_rate, (int, float)):
|
| 201 |
intro_rate_str = f"{intro_rate * 100:.1f}%"
|
| 202 |
-
intro_color =
|
| 203 |
intro_cell = (
|
| 204 |
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 205 |
f'background:{intro_color};font-family:monospace;'
|
|
@@ -210,7 +170,13 @@ def build_error_absorption_html(
|
|
| 210 |
'<td style="padding:.4rem .6rem;text-align:right;'
|
| 211 |
'opacity:.4">—</td>'
|
| 212 |
)
|
| 213 |
-
net_color =
|
| 214 |
intro_sample = entry.get("introduced_tokens_sample") or []
|
| 215 |
sample_cell_text = ", ".join(
|
| 216 |
_e(str(t)) for t in intro_sample[:sample_max]
|
|
|
|
| 51 |
from html import escape as _e
|
| 52 |
from typing import Optional
|
| 53 |
|
| 54 |
+
from picarones.report.render_helpers import color_diverging, color_traffic_light
|
| 55 |
|
| 56 |
+
|
| 57 |
+
# "Net improvement" palette: light green at the centre, deep green
|
| 58 |
+
# when favourable (net > 0), red when unfavourable (net < 0). Centred
|
| 59 |
+
# on light green because a zero delta already means "no regression".
|
| 60 |
+
_NET_NEUTRAL_RGB = (167, 240, 167)
|
| 61 |
+
_NET_POSITIVE_RGB = (90, 200, 90)
|
| 62 |
+
_NET_NEGATIVE_RGB = (220, 50, 50)
|
| 63 |
|
| 64 |
|
| 65 |
def build_error_absorption_html(
|
|
|
|
| 146 |
intro_rate = entry.get("introduction_rate")
|
| 147 |
if isinstance(corr_rate, (int, float)):
|
| 148 |
corr_rate_str = f"{corr_rate * 100:.1f}%"
|
| 149 |
+
corr_color = color_traffic_light(float(corr_rate))
|
| 150 |
corr_cell = (
|
| 151 |
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 152 |
f'background:{corr_color};font-family:monospace;'
|
|
|
|
| 159 |
)
|
| 160 |
if isinstance(intro_rate, (int, float)):
|
| 161 |
intro_rate_str = f"{intro_rate * 100:.1f}%"
|
| 162 |
+
intro_color = color_traffic_light(float(intro_rate), low_is_good=True)
|
| 163 |
intro_cell = (
|
| 164 |
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 165 |
f'background:{intro_color};font-family:monospace;'
|
|
|
|
| 170 |
'<td style="padding:.4rem .6rem;text-align:right;'
|
| 171 |
'opacity:.4">—</td>'
|
| 172 |
)
|
| 173 |
+
net_color = color_diverging(
|
| 174 |
+
float(net),
|
| 175 |
+
max_abs=float(max_abs_net) if max_abs_net else 1.0,
|
| 176 |
+
neutral_rgb=_NET_NEUTRAL_RGB,
|
| 177 |
+
positive_rgb=_NET_POSITIVE_RGB,
|
| 178 |
+
negative_rgb=_NET_NEGATIVE_RGB,
|
| 179 |
+
)
|
| 180 |
intro_sample = entry.get("introduced_tokens_sample") or []
|
| 181 |
sample_cell_text = ", ".join(
|
| 182 |
_e(str(t)) for t in intro_sample[:sample_max]
|
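The body of `color_diverging` lives in `picarones.report.render_helpers` and is not shown in this diff. The sketch below is only a guess at its behaviour, reconstructed from the removed `_color_for_net` function and from the keyword arguments used at the call site; the name `color_diverging_sketch` is hypothetical and the real helper may differ.

```python
from __future__ import annotations

RGB = tuple[int, int, int]


def color_diverging_sketch(
    value: float,
    max_abs: float,
    neutral_rgb: RGB,
    positive_rgb: RGB,
    negative_rgb: RGB,
) -> str:
    """Linear blend from a neutral colour towards a positive or negative one."""
    if max_abs <= 0 or value == 0:
        return "#{:02x}{:02x}{:02x}".format(*neutral_rgb)
    # Clamp to [-1, 1] so the colour saturates at +/- max_abs.
    f = max(-1.0, min(1.0, value / max_abs))
    target = positive_rgb if f >= 0 else negative_rgb
    t = abs(f)
    # Interpolate each channel between neutral and target.
    r, g, b = (int(n + (c - n) * t) for n, c in zip(neutral_rgb, target))
    return f"#{r:02x}{g:02x}{b:02x}"


NEUTRAL, POSITIVE, NEGATIVE = (167, 240, 167), (90, 200, 90), (220, 50, 50)
print(color_diverging_sketch(0, 5, NEUTRAL, POSITIVE, NEGATIVE))
print(color_diverging_sketch(5, 5, NEUTRAL, POSITIVE, NEGATIVE))
print(color_diverging_sketch(-10, 5, NEUTRAL, POSITIVE, NEGATIVE))
```

Passing the three palette endpoints as parameters, as the refactored call does, lets each renderer keep its own colours while sharing one interpolation routine.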
picarones/report/generator.py
CHANGED
|
@@ -11,667 +11,56 @@ Vues disponibles
|
|
| 11 |
2. Gallery: image grid with a coloured CER badge
|
| 12 |
3. Document: zoomable image + coloured GT / OCR diff per engine
|
| 13 |
4. Analyses: CER histogram + radar chart
|
| 14 |
"""
|
| 15 |
|
| 16 |
from __future__ import annotations
|
| 17 |
|
| 18 |
-
import base64
|
| 19 |
-
import io
|
| 20 |
import json
|
| 21 |
import logging
|
| 22 |
from pathlib import Path
|
| 23 |
from typing import Any, Optional
|
| 24 |
|
| 25 |
-
logger = logging.getLogger(__name__)
|
| 26 |
-
|
| 27 |
-
# ---------------------------------------------------------------------------
|
| 28 |
-
# Ressources vendor (embarquées dans le rapport HTML)
|
| 29 |
-
# ---------------------------------------------------------------------------
|
| 30 |
-
|
| 31 |
-
_VENDOR_DIR = Path(__file__).parent / "vendor"
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
def _load_vendor_js(name: str) -> str:
|
| 35 |
-
"""Lit un fichier JS vendorisé et retourne son contenu."""
|
| 36 |
-
p = _VENDOR_DIR / name
|
| 37 |
-
if p.exists():
|
| 38 |
-
return p.read_text(encoding="utf-8")
|
| 39 |
-
return f"/* vendor/{name} non trouvé */"
|
| 40 |
-
|
| 41 |
from picarones.core.results import BenchmarkResult
|
| 42 |
-
from picarones.
|
| 43 |
-
from picarones.
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
compute_venn_data,
|
| 48 |
-
cluster_errors,
|
| 49 |
-
bootstrap_ci,
|
| 50 |
-
friedman_test,
|
| 51 |
-
nemenyi_posthoc,
|
| 52 |
-
build_critical_difference_svg,
|
| 53 |
-
compute_pareto_front,
|
| 54 |
)
|
| 55 |
-
from picarones.measurements.pricing import build_costs_for_benchmark, load_pricing_database
|
| 56 |
-
from picarones.measurements.difficulty import compute_all_difficulties, difficulty_label
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
# ---------------------------------------------------------------------------
|
| 60 |
-
# Helpers
|
| 61 |
-
# ---------------------------------------------------------------------------
|
| 62 |
-
|
| 63 |
-
def _encode_image_b64(image_path: str, max_width: int = 1200) -> str:
|
| 64 |
-
"""Lit une image, la redimensionne si besoin, et retourne un data-URI base64."""
|
| 65 |
-
try:
|
| 66 |
-
from PIL import Image
|
| 67 |
-
p = Path(image_path)
|
| 68 |
-
if not p.exists():
|
| 69 |
-
return ""
|
| 70 |
-
with Image.open(p) as img:
|
| 71 |
-
if img.width > max_width:
|
| 72 |
-
ratio = max_width / img.width
|
| 73 |
-
new_h = max(1, int(img.height * ratio))
|
| 74 |
-
img = img.resize((max_width, new_h), Image.LANCZOS)
|
| 75 |
-
# Convertir en RGB pour éviter les problèmes de mode (RGBA, palette…)
|
| 76 |
-
if img.mode not in ("RGB", "L"):
|
| 77 |
-
img = img.convert("RGB")
|
| 78 |
-
buf = io.BytesIO()
|
| 79 |
-
fmt = "JPEG" if p.suffix.lower() in (".jpg", ".jpeg") else "PNG"
|
| 80 |
-
img.save(buf, format=fmt, optimize=True, quality=85)
|
| 81 |
-
b64 = base64.b64encode(buf.getvalue()).decode("ascii")
|
| 82 |
-
mime = "image/jpeg" if fmt == "JPEG" else "image/png"
|
| 83 |
-
return f"data:{mime};base64,{b64}"
|
| 84 |
-
except Exception:
|
| 85 |
-
return ""
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
def _externalize_images_to_dir(
|
| 89 |
-
benchmark: "BenchmarkResult",
|
| 90 |
-
output_dir: Path,
|
| 91 |
-
max_width: int = 1200,
|
| 92 |
-
asset_subdir: str = "report-assets",
|
| 93 |
-
) -> dict[str, str]:
|
| 94 |
-
"""Sprint A5 (item M-16) — écrit les images sur disque dans un
|
| 95 |
-
sous-dossier à côté du HTML, et retourne ``{doc_id: url_relative}``.
|
| 96 |
-
|
| 97 |
-
Mode « lazy loading » : au lieu d'embarquer chaque image en
|
| 98 |
-
base64 dans le HTML (50 MB+ pour un corpus de 100 documents,
|
| 99 |
-
~200 MB+ pour 1 000 documents), on les externalise en fichiers
|
| 100 |
-
PNG/JPEG locaux. Le HTML les référence via ``<img src="report-assets/…">``
|
| 101 |
-
avec ``loading="lazy"`` côté navigateur.
|
| 102 |
-
|
| 103 |
-
Le rapport reste auto-portant si l'utilisateur copie le dossier
|
| 104 |
-
``report-assets/`` à côté du HTML (cf. CLI ``--lazy-images``).
|
| 105 |
-
|
| 106 |
-
Parameters
|
| 107 |
-
----------
|
| 108 |
-
benchmark:
|
| 109 |
-
Résultat de benchmark (lit ``image_path`` de chaque DocumentResult).
|
| 110 |
-
output_dir:
|
| 111 |
-
Dossier où le HTML sera écrit ; le sous-dossier d'assets sera
|
| 112 |
-
créé à côté.
|
| 113 |
-
max_width:
|
| 114 |
-
Largeur max du redimensionnement (cohérent avec
|
| 115 |
-
``_encode_image_b64``).
|
| 116 |
-
asset_subdir:
|
| 117 |
-
Nom du sous-dossier d'assets (défaut ``"report-assets"``).
|
| 118 |
-
|
| 119 |
-
Returns
|
| 120 |
-
-------
|
| 121 |
-
dict[str, str]
|
| 122 |
-
``{doc_id: "report-assets/<doc_id>.png"}`` (URL relative
|
| 123 |
-
consommable directement dans un attribut HTML ``src``).
|
| 124 |
-
"""
|
| 125 |
-
from PIL import Image
|
| 126 |
-
|
| 127 |
-
assets_dir = output_dir / asset_subdir
|
| 128 |
-
assets_dir.mkdir(parents=True, exist_ok=True)
|
| 129 |
-
out: dict[str, str] = {}
|
| 130 |
-
|
| 131 |
-
seen_ids: set[str] = set()
|
| 132 |
-
for engine_report in benchmark.engine_reports:
|
| 133 |
-
for dr in engine_report.document_results:
|
| 134 |
-
doc_id = dr.doc_id
|
| 135 |
-
if doc_id in seen_ids:
|
| 136 |
-
continue
|
| 137 |
-
seen_ids.add(doc_id)
|
| 138 |
-
try:
|
| 139 |
-
src = Path(dr.image_path)
|
| 140 |
-
if not src.exists():
|
| 141 |
-
continue
|
| 142 |
-
# File name derived from doc_id, normalized to drop
|
| 143 |
-
# characters that are unsafe for the filesystem.
|
| 144 |
-
safe_id = "".join(
|
| 145 |
-
c if c.isalnum() or c in "._-" else "_" for c in doc_id
|
| 146 |
-
)
|
| 147 |
-
dest = assets_dir / f"{safe_id}{src.suffix.lower() or '.png'}"
|
| 148 |
-
with Image.open(src) as img:
|
| 149 |
-
if img.width > max_width:
|
| 150 |
-
ratio = max_width / img.width
|
| 151 |
-
new_h = max(1, int(img.height * ratio))
|
| 152 |
-
img = img.resize((max_width, new_h), Image.LANCZOS)
|
| 153 |
-
if img.mode not in ("RGB", "L"):
|
| 154 |
-
img = img.convert("RGB")
|
| 155 |
-
fmt = "JPEG" if dest.suffix in (".jpg", ".jpeg") else "PNG"
|
| 156 |
-
img.save(dest, format=fmt, optimize=True, quality=85)
|
| 157 |
-
# Relative URL (POSIX style even on Windows, for HTML).
|
| 158 |
-
out[doc_id] = f"{asset_subdir}/{dest.name}"
|
| 159 |
-
except Exception as exc: # noqa: BLE001 - silent fallback + warning
|
| 160 |
-
logger.warning(
|
| 161 |
-
"[report] échec d'externalisation de l'image %s : %s — "
|
| 162 |
-
"le rapport ignorera cette image",
|
| 163 |
-
dr.image_path,
|
| 164 |
-
exc,
|
| 165 |
-
)
|
| 166 |
-
return out
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
def _encode_images_b64_from_result(benchmark: "BenchmarkResult", max_width: int = 1200) -> dict[str, str]:
|
| 170 |
-
"""Encode all images of a BenchmarkResult as base64.
|
| 171 |
-
|
| 172 |
-
Returns
|
| 173 |
-
-------
|
| 174 |
-
dict
|
| 175 |
-
``{doc_id: data_uri}``
|
| 176 |
-
"""
|
| 177 |
-
images: dict[str, str] = {}
|
| 178 |
-
if not benchmark.engine_reports:
|
| 179 |
-
return images
|
| 180 |
-
for dr in benchmark.engine_reports[0].document_results:
|
| 181 |
-
if dr.image_path and dr.doc_id not in images:
|
| 182 |
-
uri = _encode_image_b64(dr.image_path, max_width=max_width)
|
| 183 |
-
if uri:
|
| 184 |
-
images[dr.doc_id] = uri
|
| 185 |
-
return images
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
def _cer_color(cer: float) -> str:
|
| 189 |
-
"""Return a CSS color for a given CER score (0→green, 1→red)."""
|
| 190 |
-
from picarones.report.colors import COLOR_GREEN, COLOR_YELLOW, COLOR_ORANGE, COLOR_RED
|
| 191 |
-
if cer < 0.05:
|
| 192 |
-
return COLOR_GREEN
|
| 193 |
-
if cer < 0.15:
|
| 194 |
-
return COLOR_YELLOW
|
| 195 |
-
if cer < 0.30:
|
| 196 |
-
return COLOR_ORANGE
|
| 197 |
-
return COLOR_RED
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
def _cer_bg(cer: float) -> str:
|
| 201 |
-
from picarones.report.colors import BG_GREEN, BG_YELLOW, BG_ORANGE, BG_RED
|
| 202 |
-
if cer < 0.05:
|
| 203 |
-
return BG_GREEN
|
| 204 |
-
if cer < 0.15:
|
| 205 |
-
return BG_YELLOW
|
| 206 |
-
if cer < 0.30:
|
| 207 |
-
return BG_ORANGE
|
| 208 |
-
return BG_RED
|
| 209 |
-
|
| 210 |
-
|
| 211 |
-
def _pct(v: Optional[float], decimals: int = 2) -> str:
|
| 212 |
-
if v is None:
|
| 213 |
-
return "—"
|
| 214 |
-
return f"{v * 100:.{decimals}f} %"
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
def _safe(v: Optional[float], decimals: int = 4) -> float:
|
| 218 |
-
return round(v or 0.0, decimals)
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
# ---------------------------------------------------------------------------
|
| 222 |
-
# Data preparation
|
| 223 |
-
# ---------------------------------------------------------------------------
|
| 224 |
-
|
| 225 |
-
def _build_report_data(benchmark: BenchmarkResult, images_b64: dict[str, str]) -> dict:
|
| 226 |
-
"""Convert a BenchmarkResult into a JSON-ready dict for the HTML report."""
|
| 227 |
-
|
| 228 |
-
engines_summary = []
|
| 229 |
-
for report in benchmark.engine_reports:
|
| 230 |
-
agg = report.aggregated_metrics
|
| 231 |
-
diplo_agg = agg.get("cer_diplomatic", {})
|
| 232 |
-
entry: dict = {
|
| 233 |
-
"name": report.engine_name,
|
| 234 |
-
"version": report.engine_version,
|
| 235 |
-
"cer": _safe(agg.get("cer", {}).get("mean")),
|
| 236 |
-
"wer": _safe(agg.get("wer", {}).get("mean")),
|
| 237 |
-
"mer": _safe(agg.get("mer", {}).get("mean")),
|
| 238 |
-
"wil": _safe(agg.get("wil", {}).get("mean")),
|
| 239 |
-
"cer_median": _safe(agg.get("cer", {}).get("median")),
|
| 240 |
-
"cer_min": _safe(agg.get("cer", {}).get("min")),
|
| 241 |
-
"cer_max": _safe(agg.get("cer", {}).get("max")),
|
| 242 |
-
"doc_count": agg.get("document_count", 0),
|
| 243 |
-
"failed": agg.get("failed_count", 0),
|
| 244 |
-
# Diplomatic CER (after historical normalization: ſ=s, u=v, i=j…)
|
| 245 |
-
"cer_diplomatic": _safe(diplo_agg.get("mean")) if diplo_agg else None,
|
| 246 |
-
"cer_diplomatic_profile": diplo_agg.get("profile"),
|
| 247 |
-
# Distribution for the histogram: list of per-document CER values
|
| 248 |
-
"cer_values": [
|
| 249 |
-
_safe(dr.metrics.cer)
|
| 250 |
-
for dr in report.document_results
|
| 251 |
-
if dr.metrics.error is None
|
| 252 |
-
],
|
| 253 |
-
"cer_diplomatic_values": [
|
| 254 |
-
_safe(dr.metrics.cer_diplomatic)
|
| 255 |
-
for dr in report.document_results
|
| 256 |
-
if dr.metrics.error is None and dr.metrics.cer_diplomatic is not None
|
| 257 |
-
],
|
| 258 |
-
# OCR+LLM pipeline fields (empty for OCR-only engines)
|
| 259 |
-
"is_pipeline": report.is_pipeline,
|
| 260 |
-
"pipeline_info": report.pipeline_info,
|
| 261 |
-
# Sprint 5: advanced heritage-document metrics
|
| 262 |
-
"ligature_score": _safe(report.ligature_score) if report.ligature_score is not None else None,
|
| 263 |
-
"diacritic_score": _safe(report.diacritic_score) if report.diacritic_score is not None else None,
|
| 264 |
-
"aggregated_confusion": report.aggregated_confusion,
|
| 265 |
-
"aggregated_taxonomy": report.aggregated_taxonomy,
|
| 266 |
-
"aggregated_structure": report.aggregated_structure,
|
| 267 |
-
"aggregated_image_quality": report.aggregated_image_quality,
|
| 268 |
-
# Sprint 10: error distribution + VLM hallucinations
|
| 269 |
-
"gini": _safe(report.aggregated_line_metrics.get("gini_mean")) if report.aggregated_line_metrics else None,
|
| 270 |
-
"cer_p90": _safe(report.aggregated_line_metrics.get("percentiles", {}).get("p90")) if report.aggregated_line_metrics else None,
|
| 271 |
-
"cer_p99": _safe(report.aggregated_line_metrics.get("percentiles", {}).get("p99")) if report.aggregated_line_metrics else None,
|
| 272 |
-
"catastrophic_rate_30": _safe(report.aggregated_line_metrics.get("catastrophic_rate", {}).get("0.3")) if report.aggregated_line_metrics else None,
|
| 273 |
-
"aggregated_line_metrics": report.aggregated_line_metrics,
|
| 274 |
-
"anchor_score": _safe(report.aggregated_hallucination.get("anchor_score_mean")) if report.aggregated_hallucination else None,
|
| 275 |
-
"length_ratio": _safe(report.aggregated_hallucination.get("length_ratio_mean")) if report.aggregated_hallucination else None,
|
| 276 |
-
"hallucinating_doc_rate": _safe(report.aggregated_hallucination.get("hallucinating_doc_rate")) if report.aggregated_hallucination else None,
|
| 277 |
-
"aggregated_hallucination": report.aggregated_hallucination,
|
| 278 |
-
# Sprint 41: aggregated NER (None if nothing was computed)
|
| 279 |
-
"aggregated_ner": report.aggregated_ner,
|
| 280 |
-
# Sprint 43: aggregated calibration (None if the engine exposed
|
| 281 |
-
# no confidence values on this corpus)
|
| 282 |
-
"aggregated_calibration": report.aggregated_calibration,
|
| 283 |
-
# Sprint 62: aggregated philological profile (None if this
|
| 284 |
-
# engine has no philological signal on the corpus)
|
| 285 |
-
"aggregated_philological": report.aggregated_philological,
|
| 286 |
-
# Sprint 86: A.II.5 (fuzzy searchability + numerical
|
| 287 |
-
# sequences). None if no document has a signal.
|
| 288 |
-
"aggregated_searchability": report.aggregated_searchability,
|
| 289 |
-
"aggregated_numerical_sequences": (
|
| 290 |
-
report.aggregated_numerical_sequences
|
| 291 |
-
),
|
| 292 |
-
# Sprint 87: A.II.2 (aggregated Flesch delta)
|
| 293 |
-
"aggregated_readability": report.aggregated_readability,
|
| 294 |
-
"is_vlm": report.pipeline_info.get("is_vlm", False) if report.pipeline_info else False,
|
| 295 |
-
}
|
| 296 |
-
engines_summary.append(entry)
|
| 297 |
-
|
| 298 |
-
# Documents (gallery view + detail view)
|
| 299 |
-
# Collect every doc_id from the union of all engines,
|
| 300 |
-
# preserving order of appearance (first engine first, then any additions).
|
| 301 |
-
seen_doc_ids: set[str] = set()
|
| 302 |
-
doc_ids_ordered: list[str] = []
|
| 303 |
-
for report in benchmark.engine_reports:
|
| 304 |
-
for dr in report.document_results:
|
| 305 |
-
if dr.doc_id not in seen_doc_ids:
|
| 306 |
-
seen_doc_ids.add(dr.doc_id)
|
| 307 |
-
doc_ids_ordered.append(dr.doc_id)
|
| 308 |
-
|
| 309 |
-
# Cross index: doc_id → {engine_name → DocumentResult}
|
| 310 |
-
doc_engine_map: dict[str, dict] = {did: {} for did in doc_ids_ordered}
|
| 311 |
-
for report in benchmark.engine_reports:
|
| 312 |
-
for dr in report.document_results:
|
| 313 |
-
doc_engine_map.setdefault(dr.doc_id, {})[report.engine_name] = dr
|
| 314 |
-
|
| 315 |
-
documents = []
|
| 316 |
-
for doc_id in doc_ids_ordered:
|
| 317 |
-
engine_results = []
|
| 318 |
-
gt = ""
|
| 319 |
-
image_path = ""
|
| 320 |
-
for engine_name in [r.engine_name for r in benchmark.engine_reports]:
|
| 321 |
-
dr = doc_engine_map[doc_id].get(engine_name)
|
| 322 |
-
if dr is None:
|
| 323 |
-
continue
|
| 324 |
-
gt = dr.ground_truth
|
| 325 |
-
image_path = dr.image_path
|
| 326 |
-
diff_ops = compute_char_diff(dr.ground_truth, dr.hypothesis)
|
| 327 |
-
er_entry: dict = {
|
| 328 |
-
"engine": engine_name,
|
| 329 |
-
"hypothesis": dr.hypothesis,
|
| 330 |
-
"cer": _safe(dr.metrics.cer),
|
| 331 |
-
"cer_diplomatic": _safe(dr.metrics.cer_diplomatic) if dr.metrics.cer_diplomatic is not None else None,
|
| 332 |
-
"wer": _safe(dr.metrics.wer),
|
| 333 |
-
"mer": _safe(dr.metrics.mer),
|
| 334 |
-
"wil": _safe(dr.metrics.wil),
|
| 335 |
-
"duration": dr.duration_seconds,
|
| 336 |
-
"error": dr.engine_error,
|
| 337 |
-
"diff": diff_ops,
|
| 338 |
-
}
|
| 339 |
-
# Fields specific to OCR+LLM pipelines
|
| 340 |
-
if dr.ocr_intermediate is not None:
|
| 341 |
-
er_entry["ocr_intermediate"] = dr.ocr_intermediate
|
| 342 |
-
er_entry["ocr_diff"] = compute_word_diff(dr.ground_truth, dr.ocr_intermediate)
|
| 343 |
-
er_entry["llm_correction_diff"] = compute_word_diff(dr.ocr_intermediate, dr.hypothesis)
|
| 344 |
-
if dr.pipeline_metadata:
|
| 345 |
-
on = dr.pipeline_metadata.get("over_normalization")
|
| 346 |
-
if on is not None:
|
| 347 |
-
er_entry["over_normalization"] = on
|
| 348 |
-
er_entry["pipeline_mode"] = dr.pipeline_metadata.get("pipeline_mode")
|
| 349 |
-
# Sprint 5: advanced per-document metrics
|
| 350 |
-
if dr.char_scores is not None:
|
| 351 |
-
er_entry["ligature_score"] = _safe(dr.char_scores.get("ligature", {}).get("score"))
|
| 352 |
-
er_entry["diacritic_score"] = _safe(dr.char_scores.get("diacritic", {}).get("score"))
|
| 353 |
-
if dr.taxonomy is not None:
|
| 354 |
-
er_entry["taxonomy"] = dr.taxonomy
|
| 355 |
-
if dr.structure is not None:
|
| 356 |
-
er_entry["structure"] = dr.structure
|
| 357 |
-
if dr.image_quality is not None:
|
| 358 |
-
er_entry["image_quality"] = dr.image_quality
|
| 359 |
-
# Sprint 10
|
| 360 |
-
if dr.line_metrics is not None:
|
| 361 |
-
er_entry["line_metrics"] = dr.line_metrics
|
| 362 |
-
if dr.hallucination_metrics is not None:
|
| 363 |
-
er_entry["hallucination_metrics"] = dr.hallucination_metrics
|
| 364 |
-
engine_results.append(er_entry)
|
| 365 |
-
|
| 366 |
-
# Mean CER for this document (for the gallery badge)
|
| 367 |
-
cer_values = [er["cer"] for er in engine_results if er["error"] is None]
|
| 368 |
-
mean_cer = sum(cer_values) / len(cer_values) if cer_values else 1.0
|
| 369 |
-
best_engine = min(engine_results, key=lambda x: x["cer"], default=None)
|
| 370 |
-
|
| 371 |
-
# Script type (from per-document metadata if available)
|
| 372 |
-
script_type = ""
|
| 373 |
-
first_dr = doc_engine_map[doc_id].get(
|
| 374 |
-
benchmark.engine_reports[0].engine_name if benchmark.engine_reports else None
|
| 375 |
-
)
|
| 376 |
-
if first_dr and first_dr.image_quality:
|
| 377 |
-
script_type = first_dr.image_quality.get("script_type", "")
|
| 378 |
-
|
| 379 |
-
documents.append({
|
| 380 |
-
"doc_id": doc_id,
|
| 381 |
-
"image_path": image_path,
|
| 382 |
-
"image_b64": images_b64.get(doc_id, ""),
|
| 383 |
-
"ground_truth": gt,
|
| 384 |
-
"mean_cer": _safe(mean_cer),
|
| 385 |
-
"best_engine": best_engine["engine"] if best_engine else "",
|
| 386 |
-
"engine_results": engine_results,
|
| 387 |
-
"script_type": script_type,
|
| 388 |
-
})
|
| 389 |
-
|
| 390 |
-
# ── Sprint 7: intrinsic difficulty score ─────────────────────────────
|
| 391 |
-
gt_map = {d["doc_id"]: d["ground_truth"] for d in documents}
|
| 392 |
-
cer_map: dict[str, dict[str, float]] = {d["doc_id"]: {} for d in documents}
|
| 393 |
-
iq_map: dict[str, float] = {}
|
| 394 |
-
for report in benchmark.engine_reports:
|
| 395 |
-
for dr in report.document_results:
|
| 396 |
-
cer_map.setdefault(dr.doc_id, {})[report.engine_name] = _safe(dr.metrics.cer)
|
| 397 |
-
if dr.image_quality and "quality_score" in dr.image_quality:
|
| 398 |
-
iq_map[dr.doc_id] = dr.image_quality["quality_score"]
|
| 399 |
-
difficulty_scores = compute_all_difficulties(
|
| 400 |
-
doc_ids=doc_ids_ordered,
|
| 401 |
-
ground_truths=gt_map,
|
| 402 |
-
cer_map=cer_map,
|
| 403 |
-
image_quality_map=iq_map or None,
|
| 404 |
-
)
|
| 405 |
-
# Add difficulty_score to each document
|
| 406 |
-
for doc in documents:
|
| 407 |
-
ds = difficulty_scores.get(doc["doc_id"])
|
| 408 |
-
if ds:
|
| 409 |
-
doc["difficulty_score"] = _safe(ds.score)
|
| 410 |
-
doc["difficulty_label"] = difficulty_label(ds.score)
|
| 411 |
-
else:
|
| 412 |
-
doc["difficulty_score"] = 0.5
|
| 413 |
-
doc["difficulty_label"] = "Modéré"
|
| 414 |
-
|
| 415 |
-
# ── Sprint 7: statistical tests (pairwise Wilcoxon + bootstrap CI) ───
|
| 416 |
-
engine_cer_map_stats: dict[str, list[float]] = {}
|
| 417 |
-
for report in benchmark.engine_reports:
|
| 418 |
-
vals = [_safe(dr.metrics.cer) for dr in report.document_results if dr.metrics.error is None]
|
| 419 |
-
if vals:
|
| 420 |
-
engine_cer_map_stats[report.engine_name] = vals
|
| 421 |
-
|
| 422 |
-
pairwise_stats = compute_pairwise_stats(engine_cer_map_stats)
|
| 423 |
-
|
| 424 |
-
# ── Sprint 17 — Friedman + Nemenyi ──────────────────────────────────
|
| 425 |
-
# Strict alignment on the same document order: the map is rebuilt
|
| 426 |
-
# from the documents common to all engines, otherwise Friedman
|
| 427 |
-
# is not applicable.
|
| 428 |
-
engine_cer_aligned: dict[str, list[float]] = {}
|
| 429 |
-
common_doc_ids: Optional[set[str]] = None
|
| 430 |
-
for report in benchmark.engine_reports:
|
| 431 |
-
doc_ids = {dr.doc_id for dr in report.document_results if dr.metrics.error is None}
|
| 432 |
-
common_doc_ids = doc_ids if common_doc_ids is None else common_doc_ids & doc_ids
|
| 433 |
-
if common_doc_ids:
|
| 434 |
-
ordered_common = [d for d in doc_ids_ordered if d in common_doc_ids]
|
| 435 |
-
for report in benchmark.engine_reports:
|
| 436 |
-
dr_by_id = {dr.doc_id: dr for dr in report.document_results}
|
| 437 |
-
engine_cer_aligned[report.engine_name] = [
|
| 438 |
-
_safe(dr_by_id[d].metrics.cer) for d in ordered_common
|
| 439 |
-
]
|
| 440 |
-
|
| 441 |
-
friedman = friedman_test(engine_cer_aligned) if engine_cer_aligned else {
|
| 442 |
-
"statistic": 0.0, "p_value": 1.0, "significant": False,
|
| 443 |
-
"df": 0, "n_blocks": 0, "n_engines": 0, "mean_ranks": {},
|
| 444 |
-
"interpretation": "Test de Friedman non calculé — aucun document commun.",
|
| 445 |
-
"error": "no_common_documents",
|
| 446 |
-
}
|
| 447 |
-
nemenyi = nemenyi_posthoc(engine_cer_aligned) if engine_cer_aligned else {
|
| 448 |
-
"alpha": 0.05, "critical_distance": 0.0, "q_alpha": 0.0,
|
| 449 |
-
"n_blocks": 0, "n_engines": 0, "mean_ranks": {},
|
| 450 |
-
"engines_sorted": [], "significant_matrix": [], "tied_groups": [],
|
| 451 |
-
"error": "no_common_documents",
|
| 452 |
-
}
|
| 453 |
-
|
| 454 |
-
bootstrap_cis: list[dict] = []
|
| 455 |
-
for engine_name, vals in engine_cer_map_stats.items():
|
| 456 |
-
lo, hi = bootstrap_ci(vals)
|
| 457 |
-
mean_v = sum(vals) / len(vals) if vals else 0.0
|
| 458 |
-
bootstrap_cis.append({
|
| 459 |
-
"engine": engine_name,
|
| 460 |
-
"mean": _safe(mean_v),
|
| 461 |
-
"ci_lower": _safe(lo),
|
| 462 |
-
"ci_upper": _safe(hi),
|
| 463 |
-
})
|
| 464 |
-
|
| 465 |
-
# ── Sprint 7: reliability curves ─────────────────────────────────────
|
| 466 |
-
reliability_curves: list[dict] = []
|
| 467 |
-
for report in benchmark.engine_reports:
|
| 468 |
-
vals = [_safe(dr.metrics.cer) for dr in report.document_results if dr.metrics.error is None]
|
| 469 |
-
curve = compute_reliability_curve(vals)
|
| 470 |
-
reliability_curves.append({
|
| 471 |
-
"engine": report.engine_name,
|
| 472 |
-
"points": curve,
|
| 473 |
-
})
|
| 474 |
-
|
| 475 |
-
# ── Sprint 7: Venn of shared / exclusive errors ──────────────────────
|
| 476 |
-
# Build the per-engine error sets: {engine → set(doc_id:gt_tok:hyp_tok)}
|
| 477 |
-
venn_error_sets: dict[str, set[str]] = {}
|
| 478 |
-
for report in benchmark.engine_reports:
|
| 479 |
-
error_set: set[str] = set()
|
| 480 |
-
for dr in report.document_results:
|
| 481 |
-
ops = compute_word_diff(dr.ground_truth, dr.hypothesis)
|
| 482 |
-
for op in ops:
|
| 483 |
-
if op["op"] in ("replace", "delete", "insert"):
|
| 484 |
-
key = f"{dr.doc_id}:{op.get('old', op.get('text',''))}:{op.get('new', op.get('text',''))}"
|
| 485 |
-
error_set.add(key)
|
| 486 |
-
venn_error_sets[report.engine_name] = error_set
|
| 487 |
-
|
| 488 |
-
venn_data = compute_venn_data(venn_error_sets)
|
| 489 |
-
|
| 490 |
-
# ── Sprint 7: clustering of error patterns ───────────────────────────
|
| 491 |
-
error_data_all: list[dict] = []
|
| 492 |
-
for report in benchmark.engine_reports:
|
| 493 |
-
for dr in report.document_results:
|
| 494 |
-
error_data_all.append({
|
| 495 |
-
"engine": report.engine_name,
|
| 496 |
-
"gt": dr.ground_truth,
|
| 497 |
-
"hypothesis": dr.hypothesis,
|
| 498 |
-
})
|
| 499 |
-
error_clusters_raw = cluster_errors(error_data_all, max_clusters=8)
|
| 500 |
-
error_clusters = [c.as_dict() for c in error_clusters_raw]
|
| 501 |
-
|
| 502 |
-
# ── Sprint 7: correlation matrix ─────────────────────────────────────
|
| 503 |
-
# For each engine: a list of per-document metric dicts
|
| 504 |
-
correlation_per_engine: list[dict] = []
|
| 505 |
-
for report in benchmark.engine_reports:
|
| 506 |
-
metrics_list = []
|
| 507 |
-
for dr in report.document_results:
|
| 508 |
-
if dr.metrics.error is not None:
|
| 509 |
-
continue
|
| 510 |
-
entry: dict[str, float] = {
|
| 511 |
-
"cer": _safe(dr.metrics.cer),
|
| 512 |
-
"wer": _safe(dr.metrics.wer),
|
| 513 |
-
"mer": _safe(dr.metrics.mer),
|
| 514 |
-
"wil": _safe(dr.metrics.wil),
|
| 515 |
-
}
|
| 516 |
-
if dr.image_quality:
|
| 517 |
-
entry["quality_score"] = _safe(dr.image_quality.get("quality_score", 0.5))
|
| 518 |
-
entry["sharpness"] = _safe(dr.image_quality.get("sharpness_score", 0.5))
|
| 519 |
-
if dr.char_scores:
|
| 520 |
-
entry["ligature"] = _safe(dr.char_scores.get("ligature", {}).get("score", 0.5))
|
| 521 |
-
entry["diacritic"] = _safe(dr.char_scores.get("diacritic", {}).get("score", 0.5))
|
| 522 |
-
metrics_list.append(entry)
|
| 523 |
-
if metrics_list:
|
| 524 |
-
corr = compute_correlation_matrix(metrics_list)
|
| 525 |
-
correlation_per_engine.append({
|
| 526 |
-
"engine": report.engine_name,
|
| 527 |
-
**corr,
|
| 528 |
-
})
|
| 529 |
|
| 530 |
-
|
| 531 |
-
|
| 532 |
-
|
| 533 |
-
|
| 534 |
-
|
| 535 |
-
cer_val = report.mean_cer
|
| 536 |
-
if gini_val is not None and cer_val is not None:
|
| 537 |
-
gini_vs_cer.append({
|
| 538 |
-
"engine": report.engine_name,
|
| 539 |
-
"cer": _safe(cer_val),
|
| 540 |
-
"gini": _safe(gini_val),
|
| 541 |
-
"is_pipeline": report.is_pipeline,
|
| 542 |
-
})
|
| 543 |
|
| 544 |
-
|
| 545 |
-
# Mean duration measured per engine on the current benchmark (sec/page)
|
| 546 |
-
durations_by_engine: dict[str, float] = {}
|
| 547 |
-
for report in benchmark.engine_reports:
|
| 548 |
-
durs = [dr.duration_seconds for dr in report.document_results
|
| 549 |
-
if dr.duration_seconds is not None]
|
| 550 |
-
if durs:
|
| 551 |
-
durations_by_engine[report.engine_name] = sum(durs) / len(durs)
|
| 552 |
-
|
| 553 |
-
pricing_defaults, _ = load_pricing_database()
|
| 554 |
-
costs_by_engine = build_costs_for_benchmark(
|
| 555 |
-
engines_summary, durations_by_engine,
|
| 556 |
-
)
|
| 557 |
-
# Annotate each engine summary with its cost and duration
|
| 558 |
-
for entry in engines_summary:
|
| 559 |
-
name = entry["name"]
|
| 560 |
-
entry["mean_duration_seconds"] = round(durations_by_engine.get(name, 0.0), 4) \
|
| 561 |
-
if name in durations_by_engine else None
|
| 562 |
-
entry["cost"] = costs_by_engine.get(name)
|
| 563 |
-
|
| 564 |
-
# Pareto front on (mean CER, cost in € / 1000 pages): engines with both values available
|
| 565 |
-
pareto_points = []
|
| 566 |
-
for entry in engines_summary:
|
| 567 |
-
cer = entry.get("cer")
|
| 568 |
-
cost = (entry.get("cost") or {}).get("cost_per_1k_pages_eur")
|
| 569 |
-
if cer is None or cost is None:
|
| 570 |
-
continue
|
| 571 |
-
pareto_points.append({"engine": entry["name"], "cer": cer, "cost": cost})
|
| 572 |
-
pareto_front_engines = compute_pareto_front(
|
| 573 |
-
pareto_points, objectives=("cer", "cost"),
|
| 574 |
-
)
|
| 575 |
-
|
| 576 |
-
# Secondary Pareto front (CER, speed) for the "speed" toggle
|
| 577 |
-
pareto_speed_points = []
|
| 578 |
-
for entry in engines_summary:
|
| 579 |
-
cer = entry.get("cer")
|
| 580 |
-
dur = entry.get("mean_duration_seconds")
|
| 581 |
-
if cer is None or dur is None:
|
| 582 |
-
continue
|
| 583 |
-
pareto_speed_points.append({"engine": entry["name"], "cer": cer, "dur": dur})
|
| 584 |
-
pareto_front_speed = compute_pareto_front(
|
| 585 |
-
pareto_speed_points, objectives=("cer", "dur"),
|
| 586 |
-
)
|
| 587 |
-
|
| 588 |
-
# Carbon Pareto front (CER, g CO2 / 1000 pages), labeled experimental
|
| 589 |
-
pareto_co2_points = []
|
| 590 |
-
for entry in engines_summary:
|
| 591 |
-
cer = entry.get("cer")
|
| 592 |
-
co2 = (entry.get("cost") or {}).get("co2_per_1k_pages_g")
|
| 593 |
-
if cer is None or co2 is None:
|
| 594 |
-
continue
|
| 595 |
-
pareto_co2_points.append({"engine": entry["name"], "cer": cer, "co2": co2})
|
| 596 |
-
pareto_front_co2 = compute_pareto_front(
|
| 597 |
-
pareto_co2_points, objectives=("cer", "co2"),
|
| 598 |
-
)
|
| 599 |
-
|
| 600 |
-
pareto_data = {
|
| 601 |
-
"cost": {
|
| 602 |
-
"points": pareto_points,
|
| 603 |
-
"front": pareto_front_engines,
|
| 604 |
-
"axis_label": "Coût (€ / 1000 pages)",
|
| 605 |
-
},
|
| 606 |
-
"speed": {
|
| 607 |
-
"points": pareto_speed_points,
|
| 608 |
-
"front": pareto_front_speed,
|
| 609 |
-
"axis_label": "Temps moyen (s / page)",
|
| 610 |
-
},
|
| 611 |
-
"co2": {
|
| 612 |
-
"points": pareto_co2_points,
|
| 613 |
-
"front": pareto_front_co2,
|
| 614 |
-
"axis_label": "Empreinte carbone (g CO₂ / 1000 pages, expérimental)",
|
| 615 |
-
},
|
| 616 |
-
"pricing_meta": {
|
| 617 |
-
"last_updated": pricing_defaults.last_updated,
|
| 618 |
-
"currency": pricing_defaults.currency,
|
| 619 |
-
"hourly_rate_local_cpu_eur": pricing_defaults.hourly_rate_local_cpu_eur,
|
| 620 |
-
"hourly_rate_local_gpu_eur": pricing_defaults.hourly_rate_local_gpu_eur,
|
| 621 |
-
"grid_intensity_local": pricing_defaults.grid_intensity_local,
|
| 622 |
-
"grid_intensity_cloud": pricing_defaults.grid_intensity_cloud,
|
| 623 |
-
},
|
| 624 |
-
}
|
| 625 |
-
|
| 626 |
-
# Scatter 2: length ratio vs anchor score (per engine)
|
| 627 |
-
ratio_vs_anchor = []
|
| 628 |
-
for report in benchmark.engine_reports:
|
| 629 |
-
if report.aggregated_hallucination:
|
| 630 |
-
ratio_vs_anchor.append({
|
| 631 |
-
"engine": report.engine_name,
|
| 632 |
-
"length_ratio": _safe(report.aggregated_hallucination.get("length_ratio_mean", 1.0)),
|
| 633 |
-
"anchor_score": _safe(report.aggregated_hallucination.get("anchor_score_mean", 1.0)),
|
| 634 |
-
"hallucinating_rate": _safe(report.aggregated_hallucination.get("hallucinating_doc_rate", 0.0)),
|
| 635 |
-
"is_vlm": report.pipeline_info.get("is_vlm", False) if report.pipeline_info else False,
|
| 636 |
-
})
|
| 637 |
-
|
| 638 |
-
return {
|
| 639 |
-
"meta": {
|
| 640 |
-
"corpus_name": benchmark.corpus_name,
|
| 641 |
-
"corpus_source": benchmark.corpus_source,
|
| 642 |
-
"document_count": benchmark.document_count,
|
| 643 |
-
"run_date": benchmark.run_date,
|
| 644 |
-
"picarones_version": benchmark.picarones_version,
|
| 645 |
-
"metadata": benchmark.metadata,
|
| 646 |
-
},
|
| 647 |
-
"ranking": benchmark.ranking(),
|
| 648 |
-
"engines": engines_summary,
|
| 649 |
-
"documents": documents,
|
| 650 |
-
# Sprint 7
|
| 651 |
-
"statistics": {
|
| 652 |
-
"pairwise_wilcoxon": pairwise_stats,
|
| 653 |
-
"bootstrap_cis": bootstrap_cis,
|
| 654 |
-
# Sprint 17: multi-engine Friedman + Nemenyi post-hoc + CDD
|
| 655 |
-
"friedman": friedman,
|
| 656 |
-
"nemenyi": nemenyi,
|
| 657 |
-
},
|
| 658 |
-
"reliability_curves": reliability_curves,
|
| 659 |
-
"venn_data": venn_data,
|
| 660 |
-
"error_clusters": error_clusters,
|
| 661 |
-
"correlation_per_engine": correlation_per_engine,
|
| 662 |
-
# Sprint 10
|
| 663 |
-
"gini_vs_cer": gini_vs_cer,
|
| 664 |
-
"ratio_vs_anchor": ratio_vs_anchor,
|
| 665 |
-
# Sprint 19: cost/quality Pareto view with axis variants
|
| 666 |
-
"pareto": pareto_data,
|
| 667 |
-
# Sprint 36: inter-engine analysis (taxonomic divergence +
|
| 668 |
-
# complementarity / oracle). ``None`` if fewer than 2 engines.
|
| 669 |
-
"inter_engine_analysis": benchmark.inter_engine_analysis,
|
| 670 |
-
# Sprint 45-46: stratification by script_type
|
| 671 |
-
"available_strata": benchmark.available_strata(),
|
| 672 |
-
"stratified_ranking": benchmark.stratified_ranking() or None,
|
| 673 |
-
"corpus_homogeneity": benchmark.corpus_homogeneity(),
|
| 674 |
-
}
|
| 675 |
|
| 676 |
|
| 677 |
# ---------------------------------------------------------------------------
|
|
@@ -691,8 +80,8 @@ def _build_jinja_env():
|
|
| 691 |
Autoescape désactivé : le comportement est équivalent à celui du
|
| 692 |
``_HTML_TEMPLATE.format()`` historique. Les variables injectées
|
| 693 |
(JSON embarqué, SVG généré, synthèse narrative issue de templates
|
| 694 |
-
internes) sont toutes produites par le code Picarones et ne
|
| 695 |
-
pas d'échappement HTML.
|
| 696 |
"""
|
| 697 |
from jinja2 import Environment, FileSystemLoader
|
| 698 |
env = Environment(
|
|
@@ -834,174 +223,188 @@ class ReportGenerator:
|
|
| 834 |
glossary = load_glossary(self.lang)
|
| 835 |
glossary_json = json.dumps(glossary, ensure_ascii=False, separators=(",", ":"))
|
| 836 |
|
| 837 |
-
|
| 838 |
-
|
| 839 |
from picarones.report.inter_engine_render import (
|
| 840 |
build_divergence_matrix_html,
|
| 841 |
build_oracle_gap_html,
|
| 842 |
)
|
| 843 |
-
|
| 844 |
-
report_data.get("inter_engine_analysis"),
|
| 845 |
-
labels=labels,
|
| 846 |
-
)
|
| 847 |
-
oracle_gap_html = build_oracle_gap_html(
|
| 848 |
-
report_data.get("inter_engine_analysis"),
|
| 849 |
-
labels=labels,
|
| 850 |
-
)
|
| 851 |
-
|
| 852 |
-
# Sprint 41: NER section (F1 summary per engine + heatmap per
|
| 853 |
-
# category). Empty if no engine has aggregated_ner.
|
| 854 |
from picarones.report.ner_render import (
|
| 855 |
build_ner_per_category_html,
|
| 856 |
build_ner_summary_html,
|
| 857 |
)
|
| 858 |
-
ner_summary_html = build_ner_summary_html(
|
| 859 |
-
report_data.get("engines", []),
|
| 860 |
-
labels=labels,
|
| 861 |
-
)
|
| 862 |
-
ner_per_category_html = build_ner_per_category_html(
|
| 863 |
-
report_data.get("engines", []),
|
| 864 |
-
labels=labels,
|
| 865 |
-
)
|
| 866 |
-
|
| 867 |
# Sprint 43 — section calibration (tableau ECE/MCE + grille de
|
| 868 |
-
# reliability diagrams par moteur).
|
| 869 |
-
# de aggregated_calibration.
|
| 870 |
from picarones.report.calibration_render import (
|
| 871 |
build_calibration_summary_html,
|
| 872 |
build_reliability_diagrams_grid_html,
|
| 873 |
)
|
| 874 |
-
|
| 875 |
-
report_data.get("engines", []),
|
| 876 |
-
labels=labels,
|
| 877 |
-
)
|
| 878 |
-
reliability_diagrams_html = build_reliability_diagrams_grid_html(
|
| 879 |
-
report_data.get("engines", []),
|
| 880 |
-
labels=labels,
|
| 881 |
-
)
|
| 882 |
-
|
| 883 |
-
# Sprint 46: stratified section (one table per stratum). Empty if
|
| 884 |
-
# no stratum is available.
|
| 885 |
from picarones.report.stratification_render import (
|
| 886 |
build_stratified_ranking_html,
|
| 887 |
)
|
| 888 |
-
|
| 889 |
-
report_data.get("stratified_ranking"),
|
| 890 |
-
report_data.get("available_strata"),
|
| 891 |
-
report_data.get("corpus_homogeneity"),
|
| 892 |
-
labels=labels,
|
| 893 |
-
)
|
| 894 |
-
|
| 895 |
-
# Sprint 62: philological profile (6 adaptive sections built on the
|
| 896 |
-
# philological modules from Sprints 55-60). Empty if no engine
|
| 897 |
-
# has aggregated_philological.
|
| 898 |
from picarones.report.philological_render import (
|
| 899 |
build_philological_profile_html,
|
| 900 |
)
|
| 901 |
-
|
| 902 |
-
report_data.get("engines", []),
|
| 903 |
-
labels=labels,
|
| 904 |
-
)
|
| 905 |
-
|
| 906 |
-
# Sprint 86: A.II.5, fuzzy searchability +
|
| 907 |
-
# numerical sequences. Adaptive: "" if there is no signal.
|
| 908 |
from picarones.report.searchability_render import (
|
| 909 |
build_searchability_summary_html,
|
| 910 |
)
|
| 911 |
from picarones.report.numerical_sequences_render import (
|
| 912 |
build_numerical_sequences_html,
|
| 913 |
)
|
| 914 |
-
searchability_html = build_searchability_summary_html(
|
| 915 |
-
report_data.get("engines", []), labels=labels,
|
| 916 |
-
)
|
| 917 |
-
numerical_sequences_html = build_numerical_sequences_html(
|
| 918 |
-
report_data.get("engines", []), labels=labels,
|
| 919 |
-
)
|
| 920 |
-
|
| 921 |
# Sprint 87: A.II.2, readability (Flesch delta).
|
| 922 |
-
# Adaptive: "" if no engine has a signal.
|
| 923 |
from picarones.report.readability_render import (
|
| 924 |
build_readability_summary_html,
|
| 925 |
)
|
| 926 |
-
readability_html = build_readability_summary_html(
|
| 927 |
-
report_data.get("engines", []), labels=labels,
|
| 928 |
-
)
|
| 929 |
-
|
| 930 |
# Sprint 89: A.II.8b, inter-engine specialization.
|
| 931 |
-
# Adaptive: "" if fewer than 2 engines have a taxonomy.
|
| 932 |
from picarones.report.specialization_render import (
|
| 933 |
build_specialization_html,
|
| 934 |
)
|
| 935 |
-
#
|
| 936 |
-
# ``aggregated_taxonomy``; an engine without a taxonomy
|
| 937 |
-
# is excluded.
|
| 938 |
-
_taxos: dict = {}
|
| 939 |
-
for eng in report_data.get("engines", []):
|
| 940 |
-
tax = eng.get("aggregated_taxonomy")
|
| 941 |
-
if isinstance(tax, dict):
|
| 942 |
-
counts = tax.get("counts") if "counts" in tax else tax
|
| 943 |
-
if isinstance(counts, dict) and counts:
|
| 944 |
-
_taxos[eng.get("name", "?")] = {
|
| 945 |
-
k: float(v) for k, v in counts.items()
|
| 946 |
-
if isinstance(v, (int, float))
|
| 947 |
-
}
|
| 948 |
-
specialization_html = build_specialization_html(
|
| 949 |
-
_taxos, labels=labels,
|
| 950 |
-
)
|
| 951 |
-
|
| 952 |
-
# Workstream 3 (post-Sprint 97): 3 new thematic views
|
| 953 |
-
# that group the orphaned renderers into collapsible
|
| 954 |
-
# sections. Adaptive: returns "" if no subsection
|
| 955 |
-
# has a signal, so the template card is hidden.
|
| 956 |
from picarones.report.views import (
|
| 957 |
build_advanced_taxonomy_view_html,
|
| 958 |
build_diagnostics_view_html,
|
| 959 |
build_economics_view_html,
|
| 960 |
)
|
| 961 |
-
|
| 962 |
-
|
| 963 |
-
|
| 964 |
)
|
| 965 |
-
|
| 966 |
-
|
| 967 |
)
|
| 968 |
-
|
| 969 |
-
|
| 970 |
)
|
| 971 |
-
|
| 972 |
-
|
| 973 |
-
template = env.get_template("base.html.j2")
|
| 974 |
-
html = template.render(
|
| 975 |
-
corpus_name=self.benchmark.corpus_name,
|
| 976 |
-
picarones_version=self.benchmark.picarones_version,
|
| 977 |
-
report_data_json=report_json,
|
| 978 |
-
i18n_json=i18n_json,
|
| 979 |
-
html_lang=labels.get("html_lang", "fr"),
|
| 980 |
-
chartjs_inline=chartjs_js,
|
| 981 |
-
critical_difference_svg=cdd_svg,
|
| 982 |
-
friedman=report_data.get("statistics", {}).get("friedman", {}),
|
| 983 |
-
synthesis=synthesis,
|
| 984 |
-
glossary_json=glossary_json,
|
| 985 |
-
divergence_matrix_html=divergence_matrix_html,
|
| 986 |
-
oracle_gap_html=oracle_gap_html,
|
| 987 |
-
ner_summary_html=ner_summary_html,
|
| 988 |
-
ner_per_category_html=ner_per_category_html,
|
| 989 |
-
calibration_summary_html=calibration_summary_html,
|
| 990 |
-
reliability_diagrams_html=reliability_diagrams_html,
|
| 991 |
-
stratified_ranking_html=stratified_ranking_html,
|
| 992 |
-
philological_profile_html=philological_profile_html,
|
| 993 |
-
searchability_html=searchability_html,
|
| 994 |
-
numerical_sequences_html=numerical_sequences_html,
|
| 995 |
-
readability_html=readability_html,
|
| 996 |
-
specialization_html=specialization_html,
|
| 997 |
-
# Workstream 3: composed thematic views
|
| 998 |
-
economics_view_html=economics_view_html,
|
| 999 |
-
advanced_taxonomy_view_html=advanced_taxonomy_view_html,
|
| 1000 |
-
diagnostics_view_html=diagnostics_view_html,
|
| 1001 |
)
|
| 1002 |
|
| 1003 |
-
|
| 1004 |
-
|
| 1005 |
|
| 1006 |
@classmethod
|
| 1007 |
def from_json(cls, json_path: str | Path, **kwargs) -> "ReportGenerator":
|
|
|
|
| 11 |
2. Gallery: image grid with a colored CER badge
|
| 12 |
3. Document: zoomable image + colored GT / OCR diff per engine
|
| 13 |
4. Analyses: CER histogram + radar chart
|
| 14 |
+
|
| 15 |
+
Architecture
|
| 16 |
+
------------
|
| 17 |
+
This module is the **orchestrator**. The heavy responsibilities are
|
| 18 |
+
split into submodules:
|
| 19 |
+
|
| 20 |
+
- :mod:`picarones.report.assets`: vendor.js loading, base64 image
|
| 21 |
+
encoding, lazy externalization.
|
| 22 |
+
- :mod:`picarones.report.report_data`: construction of the JSON dict
|
| 23 |
+
passed to the template (engines, documents, statistics, Pareto, etc.).
|
| 24 |
+
- :mod:`picarones.report.render_helpers`: shared colors / SVG helpers.
|
| 25 |
+
|
| 26 |
+
Backward compatibility
|
| 27 |
+
----------------------
|
| 28 |
+
Two legacy names are **still imported by tests** under
|
| 29 |
+
their ``_`` prefix and must be preserved:
|
| 30 |
+
|
| 31 |
+
- ``_build_report_data`` (imported by 14 test files).
|
| 32 |
+
- ``_cer_color`` (imported by ``tests/report/test_report.py``).
|
| 33 |
+
|
| 34 |
+
The other names ``_pct``, ``_safe``, ``_cer_bg``, ``_encode_image_b64``,
|
| 35 |
+
``_encode_images_b64_from_result``, ``_externalize_images_to_dir``,
|
| 36 |
+
``_load_vendor_js`` are either used internally (the last 3,
|
| 37 |
+
see :meth:`ReportGenerator.generate`) or available under their
|
| 38 |
+
canonical name in :mod:`picarones.report.assets` or
|
| 39 |
+
:mod:`picarones.report.render_helpers`.
|
| 40 |
"""
|
| 41 |
|
| 42 |
from __future__ import annotations
|
| 43 |
|
|
|
|
|
|
|
| 44 |
import json
|
| 45 |
import logging
|
| 46 |
from pathlib import Path
|
| 47 |
from typing import Any, Optional
|
| 48 |
|
| 49 |
from picarones.core.results import BenchmarkResult
|
| 50 |
+
from picarones.measurements.statistics import build_critical_difference_svg
|
| 51 |
+
from picarones.report.assets import (
|
| 52 |
+
encode_images_b64_from_result as _encode_images_b64_from_result,
|
| 53 |
+
externalize_images_to_dir as _externalize_images_to_dir,
|
| 54 |
+
load_vendor_js as _load_vendor_js,
|
| 55 |
)
|
| 56 |
|
| 57 |
+
# Backward-compatible re-exports consumed by external tests (see the module
|
| 58 |
+
# docstring). The end-of-line directive documents the re-export intent
|
| 59 |
+
# and prevents ruff from flagging the import as unused.
|
| 60 |
+
from picarones.report.render_helpers import cer_step_color as _cer_color # noqa: F401
|
| 61 |
+
from picarones.report.report_data import build_report_data as _build_report_data # noqa: F401
|
| 62 |
|
| 63 |
+
logger = logging.getLogger(__name__)
|
| 64 |
|
| 65 |
|
| 66 |
# ---------------------------------------------------------------------------
|
|
|
|
| 80 |
Autoescape is disabled: the behaviour is equivalent to that of the
|
| 81 |
legacy ``_HTML_TEMPLATE.format()``. The injected variables
|
| 82 |
(embedded JSON, generated SVG, narrative synthesis built from internal
|
| 83 |
+
templates) are all produced by Picarones code and do not
|
| 84 |
+
require HTML escaping.
|
| 85 |
"""
|
| 86 |
from jinja2 import Environment, FileSystemLoader
|
| 87 |
env = Environment(
|
|
|
|
| 223 |
glossary = load_glossary(self.lang)
|
| 224 |
glossary_json = json.dumps(glossary, ensure_ascii=False, separators=(",", ":"))
|
| 225 |
|
| 226 |
+
section_html = self._build_section_html(report_data, labels)
|
| 227 |
+
|
| 228 |
+
env = _build_jinja_env()
|
| 229 |
+
template = env.get_template("base.html.j2")
|
| 230 |
+
html = template.render(
|
| 231 |
+
corpus_name=self.benchmark.corpus_name,
|
| 232 |
+
picarones_version=self.benchmark.picarones_version,
|
| 233 |
+
report_data_json=report_json,
|
| 234 |
+
i18n_json=i18n_json,
|
| 235 |
+
html_lang=labels.get("html_lang", "fr"),
|
| 236 |
+
chartjs_inline=chartjs_js,
|
| 237 |
+
critical_difference_svg=cdd_svg,
|
| 238 |
+
friedman=report_data.get("statistics", {}).get("friedman", {}),
|
| 239 |
+
synthesis=synthesis,
|
| 240 |
+
glossary_json=glossary_json,
|
| 241 |
+
**section_html,
|
| 242 |
+
)
|
| 243 |
+
|
| 244 |
+
output_path.write_text(html, encoding="utf-8")
|
| 245 |
+
return output_path.resolve()
|
| 246 |
+
|
| 247 |
+
def _build_section_html(
|
| 248 |
+
self, report_data: dict, labels: dict[str, str],
|
| 249 |
+
) -> dict[str, str]:
|
| 250 |
+
"""Builds all the conditional HTML sections of the report.
|
| 251 |
+
|
| 252 |
+
Each renderer (NER, calibration, philology, etc.) is called
|
| 253 |
+
independently. A section returns ``""`` if no
|
| 254 |
+
engine has a signal for it; the template handles the conditional
|
| 255 |
+
display.
|
| 256 |
+
|
| 257 |
+
Returns
|
| 258 |
+
-------
|
| 259 |
+
dict[str, str]
|
| 260 |
+
Map ``{section_name: html}`` to splat into
|
| 261 |
+
``template.render(**section_html)``.
|
| 262 |
+
"""
|
| 263 |
+
engines = report_data.get("engines", [])
|
| 264 |
+
|
| 265 |
+
# Sprint 37: inter-engine section (divergence matrix + oracle).
|
| 266 |
from picarones.report.inter_engine_render import (
|
| 267 |
build_divergence_matrix_html,
|
| 268 |
build_oracle_gap_html,
|
| 269 |
)
|
| 270 |
+
# Sprint 41: NER section (F1 summary per engine + per-category heatmap).
|
| 271 |
from picarones.report.ner_render import (
|
| 272 |
build_ner_per_category_html,
|
| 273 |
build_ner_summary_html,
|
| 274 |
)
|
| 275 |
# Sprint 43: calibration section (ECE/MCE table + grid of
|
| 276 |
+
# reliability diagrams per engine).
|
|
|
|
| 277 |
from picarones.report.calibration_render import (
|
| 278 |
build_calibration_summary_html,
|
| 279 |
build_reliability_diagrams_grid_html,
|
| 280 |
)
|
| 281 |
+
# Sprint 46: stratified section (table per stratum).
|
| 282 |
from picarones.report.stratification_render import (
|
| 283 |
build_stratified_ranking_html,
|
| 284 |
)
|
| 285 |
+
# Sprint 62: philological profile (6 adaptive sections).
|
| 286 |
from picarones.report.philological_render import (
|
| 287 |
build_philological_profile_html,
|
| 288 |
)
|
| 289 |
+
# Sprint 86, A.II.5: fuzzy searchability + numerical sequences.
|
| 290 |
from picarones.report.searchability_render import (
|
| 291 |
build_searchability_summary_html,
|
| 292 |
)
|
| 293 |
from picarones.report.numerical_sequences_render import (
|
| 294 |
build_numerical_sequences_html,
|
| 295 |
)
|
| 296 |
# Sprint 87, A.II.2: readability (Flesch delta).
|
|
|
|
| 297 |
from picarones.report.readability_render import (
|
| 298 |
build_readability_summary_html,
|
| 299 |
)
|
| 300 |
# Sprint 89, A.II.8b: inter-engine specialization.
|
|
|
|
| 301 |
from picarones.report.specialization_render import (
|
| 302 |
build_specialization_html,
|
| 303 |
)
|
| 304 |
+
# Workstream 3 (post-Sprint 97): 3 composed thematic views.
|
| 305 |
from picarones.report.views import (
|
| 306 |
build_advanced_taxonomy_view_html,
|
| 307 |
build_diagnostics_view_html,
|
| 308 |
build_economics_view_html,
|
| 309 |
)
|
| 310 |
+
# Sprint "wiring of test-only modules" (May 2026): sections
|
| 311 |
+
# that consume the new metrics computed in
|
| 312 |
+
# ``report_data.extra_metrics``.
|
| 313 |
+
from picarones.report.marginal_cost_render import (
|
| 314 |
+
build_marginal_cost_html,
|
| 315 |
)
|
| 316 |
+
from picarones.report.rare_token_recall_render import (
|
| 317 |
+
build_rare_token_recall_html,
|
| 318 |
)
|
| 319 |
+
from picarones.report.taxonomy_cooccurrence_render import (
|
| 320 |
+
build_taxonomy_cooccurrence_html,
|
| 321 |
)
|
| 322 |
+
from picarones.report.taxonomy_intra_doc_render import (
|
| 323 |
+
build_taxonomy_intra_doc_html,
|
| 324 |
)
|
| 325 |
|
| 326 |
+
# Specialization: builds an {engine: counts} map from each engine's
|
| 327 |
+
# ``aggregated_taxonomy``; an engine without a taxonomy is excluded.
|
| 328 |
+
taxos: dict = {}
|
| 329 |
+
for eng in engines:
|
| 330 |
+
tax = eng.get("aggregated_taxonomy")
|
| 331 |
+
if isinstance(tax, dict):
|
| 332 |
+
counts = tax.get("counts") if "counts" in tax else tax
|
| 333 |
+
if isinstance(counts, dict) and counts:
|
| 334 |
+
taxos[eng.get("name", "?")] = {
|
| 335 |
+
k: float(v) for k, v in counts.items()
|
| 336 |
+
if isinstance(v, (int, float))
|
| 337 |
+
}
|
| 338 |
+
|
| 339 |
+
return {
|
| 340 |
+
# Sprint 37
|
| 341 |
+
"divergence_matrix_html": build_divergence_matrix_html(
|
| 342 |
+
report_data.get("inter_engine_analysis"), labels=labels,
|
| 343 |
+
),
|
| 344 |
+
"oracle_gap_html": build_oracle_gap_html(
|
| 345 |
+
report_data.get("inter_engine_analysis"), labels=labels,
|
| 346 |
+
),
|
| 347 |
+
# Sprint 41
|
| 348 |
+
"ner_summary_html": build_ner_summary_html(engines, labels=labels),
|
| 349 |
+
"ner_per_category_html": build_ner_per_category_html(engines, labels=labels),
|
| 350 |
+
# Sprint 43
|
| 351 |
+
"calibration_summary_html": build_calibration_summary_html(
|
| 352 |
+
engines, labels=labels,
|
| 353 |
+
),
|
| 354 |
+
"reliability_diagrams_html": build_reliability_diagrams_grid_html(
|
| 355 |
+
engines, labels=labels,
|
| 356 |
+
),
|
| 357 |
+
# Sprint 46
|
| 358 |
+
"stratified_ranking_html": build_stratified_ranking_html(
|
| 359 |
+
report_data.get("stratified_ranking"),
|
| 360 |
+
report_data.get("available_strata"),
|
| 361 |
+
report_data.get("corpus_homogeneity"),
|
| 362 |
+
labels=labels,
|
| 363 |
+
),
|
| 364 |
+
# Sprint 62
|
| 365 |
+
"philological_profile_html": build_philological_profile_html(
|
| 366 |
+
engines, labels=labels,
|
| 367 |
+
),
|
| 368 |
+
# Sprint 86
|
| 369 |
+
"searchability_html": build_searchability_summary_html(
|
| 370 |
+
engines, labels=labels,
|
| 371 |
+
),
|
| 372 |
+
"numerical_sequences_html": build_numerical_sequences_html(
|
| 373 |
+
engines, labels=labels,
|
| 374 |
+
),
|
| 375 |
+
# Sprint 87
|
| 376 |
+
"readability_html": build_readability_summary_html(
|
| 377 |
+
engines, labels=labels,
|
| 378 |
+
),
|
| 379 |
+
# Sprint 89
|
| 380 |
+
"specialization_html": build_specialization_html(taxos, labels=labels),
|
| 381 |
+
# Workstream 3: composed thematic views
|
| 382 |
+
"economics_view_html": build_economics_view_html(
|
| 383 |
+
report_data, labels=labels,
|
| 384 |
+
engine_reports=self.benchmark.engine_reports,
|
| 385 |
+
),
|
| 386 |
+
"advanced_taxonomy_view_html": build_advanced_taxonomy_view_html(
|
| 387 |
+
report_data, labels=labels,
|
| 388 |
+
),
|
| 389 |
+
"diagnostics_view_html": build_diagnostics_view_html(
|
| 390 |
+
report_data, labels=labels,
|
| 391 |
+
),
|
| 392 |
+
# Sprint "wiring of test-only modules" (May 2026):
|
| 393 |
+
# 4 new sections for the modules wired into
|
| 394 |
+
# ``report_data.extra_metrics``. Adaptive: "" if there is no signal.
|
| 395 |
+
"taxonomy_cooccurrence_html": build_taxonomy_cooccurrence_html(
|
| 396 |
+
report_data.get("taxonomy_cooccurrence"), labels=labels,
|
| 397 |
+
),
|
| 398 |
+
"taxonomy_intra_doc_html": build_taxonomy_intra_doc_html(
|
| 399 |
+
report_data.get("taxonomy_intra_doc"), labels=labels,
|
| 400 |
+
),
|
| 401 |
+
"rare_token_recall_html": build_rare_token_recall_html(
|
| 402 |
+
report_data.get("rare_token_recall"), labels=labels,
|
| 403 |
+
),
|
| 404 |
+
"marginal_cost_html": build_marginal_cost_html(
|
| 405 |
+
report_data.get("marginal_cost"), labels=labels,
|
| 406 |
+
),
|
| 407 |
+
}
|
| 408 |
|
| 409 |
@classmethod
|
| 410 |
def from_json(cls, json_path: str | Path, **kwargs) -> "ReportGenerator":
|
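The `_build_section_html` refactor above follows a simple contract: each renderer returns `""` when it has no signal, and the resulting dict is splatted into the template call. The sketch below illustrates that contract together with the taxonomy normalisation loop. `render_specialization` and `render_page` are hypothetical stand-ins for the real renderer and the Jinja template, neither of which is shown in this diff.

```python
from typing import Optional


def normalise_taxonomy(engines: list[dict]) -> dict[str, dict[str, float]]:
    """Build {engine_name: {error_class: count}}; engines without taxonomy are skipped."""
    taxos: dict[str, dict[str, float]] = {}
    for eng in engines:
        tax = eng.get("aggregated_taxonomy")
        if not isinstance(tax, dict):
            continue
        # Accept both {"counts": {...}} and a bare {class: count} mapping.
        counts = tax.get("counts") if "counts" in tax else tax
        if isinstance(counts, dict) and counts:
            taxos[eng.get("name", "?")] = {
                k: float(v) for k, v in counts.items() if isinstance(v, (int, float))
            }
    return taxos


def render_specialization(
    taxos: dict[str, dict[str, float]],
    labels: Optional[dict[str, str]] = None,
) -> str:
    """Hypothetical stand-in renderer: returns "" with fewer than 2 engines."""
    if len(taxos) < 2:
        return ""
    title = (labels or {}).get("specialization_title", "Specialization")
    rows = "".join(
        f"<li>{name}: {sum(counts.values()):g} errors</li>"
        for name, counts in sorted(taxos.items())
    )
    return f"<h2>{title}</h2><ul>{rows}</ul>"


def build_section_html(engines: list[dict], labels: dict[str, str]) -> dict[str, str]:
    """Map {section_name: html}; empty strings mark sections to hide."""
    return {
        "specialization_html": render_specialization(
            normalise_taxonomy(engines), labels
        ),
    }


def render_page(title: str, **sections: str) -> str:
    """Hypothetical stand-in for template.render(**section_html)."""
    cards = "".join(
        f"<section>{html}</section>" for html in sections.values() if html
    )
    return f"<h1>{title}</h1>{cards}"
```

Because every section is a plain string keyed by name, adding a renderer only touches the returned dict, and the caller's `render(...)` signature stays unchanged.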
picarones/report/image_predictive_render.py
CHANGED
|
@@ -36,21 +36,7 @@ from __future__ import annotations
|
|
| 36 |
from html import escape as _e
|
| 37 |
from typing import Optional
|
| 38 |
|
| 39 |
-
|
| 40 |
-
def _color_for_score(score: float) -> str:
|
| 41 |
-
"""Green (low) → orange → red (high)."""
|
| 42 |
-
f = max(0.0, min(1.0, score))
|
| 43 |
-
if f < 0.5:
|
| 44 |
-
t = f / 0.5
|
| 45 |
-
r = int(167 + (235 - 167) * t)
|
| 46 |
-
g = int(240 + (180 - 240) * t)
|
| 47 |
-
b = int(167 + (60 - 167) * t)
|
| 48 |
-
else:
|
| 49 |
-
t = (f - 0.5) / 0.5
|
| 50 |
-
r = int(235 + (220 - 235) * t)
|
| 51 |
-
g = int(180 + (50 - 180) * t)
|
| 52 |
-
b = int(60 + (50 - 60) * t)
|
| 53 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 54 |
|
| 55 |
|
| 56 |
_FEATURE_LABEL_KEYS = {
|
|
@@ -79,7 +65,7 @@ def _render_complexity_block(
|
|
| 79 |
mx = float(aggregated.get("complexity_max") or 0.0)
|
| 80 |
sd = float(aggregated.get("complexity_stdev") or 0.0)
|
| 81 |
n_docs = int(aggregated.get("n_docs") or 0)
|
| 82 |
-
color_mean =
|
| 83 |
return (
|
| 84 |
f'<div style="font-weight:600;margin:.4rem 0 .3rem 0">'
|
| 85 |
f'{_e(h_complex)}</div>'
|
|
@@ -130,7 +116,7 @@ def _render_homogeneity_block(
|
|
| 130 |
"imgpred_feat_norm", "Contribution normalisée",
|
| 131 |
)
|
| 132 |
score = float(homogeneity.get("score") or 0.0)
|
| 133 |
-
color =
|
| 134 |
parts = [
|
| 135 |
f'<div style="font-weight:600;margin:.4rem 0 .3rem 0">'
|
| 136 |
f'{_e(h_homo)} : '
|
|
@@ -157,7 +143,7 @@ def _render_homogeneity_block(
|
|
| 157 |
feat_mean = float(slot.get("mean") or 0.0)
|
| 158 |
feat_stdev = float(slot.get("stdev") or 0.0)
|
| 159 |
feat_norm = float(slot.get("normalised") or 0.0)
|
| 160 |
-
norm_color =
|
| 161 |
parts.append(
|
| 162 |
f'<tr>'
|
| 163 |
f'<td style="padding:.4rem .6rem">{_e(feat_label)}</td>'
|
|
|
|
| 36 |
from html import escape as _e
|
| 37 |
from typing import Optional
|
| 38 |
|
| 39 |
+
from picarones.report.render_helpers import color_traffic_light
|
| 40 |
|
| 41 |
|
| 42 |
_FEATURE_LABEL_KEYS = {
|
|
|
|
| 65 |
mx = float(aggregated.get("complexity_max") or 0.0)
|
| 66 |
sd = float(aggregated.get("complexity_stdev") or 0.0)
|
| 67 |
n_docs = int(aggregated.get("n_docs") or 0)
|
| 68 |
+
color_mean = color_traffic_light(mean, low_is_good=True)
|
| 69 |
return (
|
| 70 |
f'<div style="font-weight:600;margin:.4rem 0 .3rem 0">'
|
| 71 |
f'{_e(h_complex)}</div>'
|
|
|
|
| 116 |
"imgpred_feat_norm", "Contribution normalisée",
|
| 117 |
)
|
| 118 |
score = float(homogeneity.get("score") or 0.0)
|
| 119 |
+
color = color_traffic_light(score, low_is_good=True)
|
| 120 |
parts = [
|
| 121 |
f'<div style="font-weight:600;margin:.4rem 0 .3rem 0">'
|
| 122 |
f'{_e(h_homo)} : '
|
|
|
|
| 143 |
feat_mean = float(slot.get("mean") or 0.0)
|
| 144 |
feat_stdev = float(slot.get("stdev") or 0.0)
|
| 145 |
feat_norm = float(slot.get("normalised") or 0.0)
|
| 146 |
+
norm_color = color_traffic_light(feat_norm, low_is_good=True)
|
| 147 |
parts.append(
|
| 148 |
f'<tr>'
|
| 149 |
f'<td style="padding:.4rem .6rem">{_e(feat_label)}</td>'
|
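The removed `_color_for_score` helper is a two-segment interpolation (green to orange, then orange to red). A shared helper could generalise it roughly as sketched below. The keyword names `low_is_good`, `scale_min` and `scale_max` mirror the call sites visible in this diff, but the body, the defaults and the name `traffic_light_sketch` are assumptions, not the actual `render_helpers.color_traffic_light` implementation.

```python
# RGB anchors taken from the removed _color_for_score helper.
GREEN = (167, 240, 167)
ORANGE = (235, 180, 60)
RED = (220, 50, 50)


def _lerp(a: int, b: int, t: float) -> int:
    """Linear interpolation between two channel values, truncated to int."""
    return int(a + (b - a) * t)


def traffic_light_sketch(
    score: float,
    *,
    low_is_good: bool = True,
    scale_min: float = 0.0,
    scale_max: float = 1.0,
) -> str:
    """Map a score to a green/orange/red hex colour (illustrative sketch)."""
    span = scale_max - scale_min
    # Degenerate scale: treat the score as sitting at the low end.
    f = 0.0 if span == 0 else (score - scale_min) / span
    f = max(0.0, min(1.0, f))
    if not low_is_good:
        f = 1.0 - f  # flip so that high scores land on green
    if f < 0.5:
        start, end, t = GREEN, ORANGE, f / 0.5
    else:
        start, end, t = ORANGE, RED, (f - 0.5) / 0.5
    r, g, b = (_lerp(s, e, t) for s, e in zip(start, end))
    return f"#{r:02x}{g:02x}{b:02x}"
```

With the defaults, this reproduces the removed helper on `[0, 1]`; the extra keywords only rescale and flip the input before the same interpolation.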
picarones/report/incremental_comparison_render.py
CHANGED
|
@@ -41,28 +41,26 @@ from __future__ import annotations
|
|
| 41 |
from html import escape as _e
|
| 42 |
from typing import Optional
|
| 43 |
|
|
|
|
| 44 |
|
| 45 |
-
|
|
|
|
| 46 |
score: float, low: float, high: float, higher_is_better: bool,
|
| 47 |
) -> str:
|
| 48 |
-
"""
|
| 49 |
if high == low:
|
| 50 |
-
return
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
r = int(167 + (235 - 167) * t)
|
| 58 |
-
g = int(240 + (180 - 240) * t)
|
| 59 |
-
b = int(167 + (60 - 167) * t)
|
| 60 |
-
else:
|
| 61 |
-
t = (rel - 0.5) / 0.5
|
| 62 |
-
r = int(235 + (220 - 235) * t)
|
| 63 |
-
g = int(180 + (50 - 180) * t)
|
| 64 |
-
b = int(60 + (50 - 60) * t)
|
| 65 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 66 |
|
| 67 |
|
| 68 |
def _format_score(value: Optional[float]) -> str:
|
|
@@ -160,7 +158,7 @@ def build_incremental_comparison_html(
|
|
| 160 |
rank = d.get("mean_rank")
|
| 161 |
n_obs = int(d.get("n_observations") or 0)
|
| 162 |
if isinstance(mean, (int, float)):
|
| 163 |
-
color =
|
| 164 |
float(mean), low, high, higher_is_better,
|
| 165 |
)
|
| 166 |
mean_cell = (
|
|
|
|
| 41 |
from html import escape as _e
|
| 42 |
from typing import Optional
|
| 43 |
|
| 44 |
+
from picarones.report.render_helpers import color_traffic_light
|
| 45 |
|
| 46 |
+
|
| 47 |
+
def _bg_for_relative_score(
|
| 48 |
score: float, low: float, high: float, higher_is_better: bool,
|
| 49 |
) -> str:
|
| 50 |
+
"""Maps ``score`` onto a [low, high] range and returns a
|
| 51 |
+
traffic-light cell colour.
|
| 52 |
+
|
| 53 |
+
If ``higher_is_better=True``, ``score=high`` is green; otherwise
|
| 54 |
+
``score=low`` is green.
|
| 55 |
+
"""
|
| 56 |
if high == low:
|
| 57 |
+
return color_traffic_light(1.0) # neutre vert clair
|
| 58 |
+
return color_traffic_light(
|
| 59 |
+
score,
|
| 60 |
+
low_is_good=not higher_is_better,
|
| 61 |
+
scale_min=low,
|
| 62 |
+
scale_max=high,
|
| 63 |
+
)
|
| 64 |
|
| 65 |
|
| 66 |
def _format_score(value: Optional[float]) -> str:
|
|
|
|
| 158 |
rank = d.get("mean_rank")
|
| 159 |
n_obs = int(d.get("n_observations") or 0)
|
| 160 |
if isinstance(mean, (int, float)):
|
| 161 |
+
color = _bg_for_relative_score(
|
| 162 |
float(mean), low, high, higher_is_better,
|
| 163 |
)
|
| 164 |
mean_cell = (
|
picarones/report/inter_engine_render.py
CHANGED
|
@@ -21,20 +21,10 @@ from __future__ import annotations
|
|
| 21 |
from html import escape as _e
|
| 22 |
from typing import Optional
|
| 23 |
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
Returns a CSS hex color. ``vmax = 0`` → white.
|
| 29 |
-
"""
|
| 30 |
-
if vmax <= 0:
|
| 31 |
-
return "#ffffff"
|
| 32 |
-
ratio = max(0.0, min(1.0, value / vmax))
|
| 33 |
-
# White (255,255,255) to strong red (200, 60, 60)
|
| 34 |
-
r = int(255 - (255 - 200) * ratio)
|
| 35 |
-
g = int(255 - (255 - 60) * ratio)
|
| 36 |
-
b = int(255 - (255 - 60) * ratio)
|
| 37 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 38 |
|
| 39 |
|
| 40 |
def build_divergence_matrix_html(
|
|
@@ -126,7 +116,10 @@ def build_divergence_matrix_html(
|
|
| 126 |
f'font-style:italic">{_e(diag_label)}</td>'
|
| 127 |
)
|
| 128 |
else:
|
| 129 |
-
bg =
|
|
|
|
|
|
|
|
|
|
| 130 |
# Dark text stays readable (no strong threshold on the light red).
|
| 131 |
parts.append(
|
| 132 |
f'<td style="padding:.3rem .5rem;text-align:center;'
|
|
|
|
| 21 |
from html import escape as _e
|
| 22 |
from typing import Optional
|
| 23 |
|
| 24 |
+
from picarones.report.render_helpers import (
|
| 25 |
+
GRADIENT_TARGET_RED,
|
| 26 |
+
color_single_gradient,
|
| 27 |
+
)
|
| 28 |
|
| 29 |
|
| 30 |
def build_divergence_matrix_html(
|
|
|
|
| 116 |
f'font-style:italic">{_e(diag_label)}</td>'
|
| 117 |
)
|
| 118 |
else:
|
| 119 |
+
bg = (
|
| 120 |
+
color_single_gradient(v, end_rgb=GRADIENT_TARGET_RED, max_value=vmax)
|
| 121 |
+
if vmax > 0 else "#ffffff"
|
| 122 |
+
)
|
| 123 |
# Dark text stays readable (no strong threshold on the light red).
|
| 124 |
parts.append(
|
| 125 |
f'<td style="padding:.3rem .5rem;text-align:center;'
|
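The removed heatmap helper in this file interpolated from white to a strong red `(200, 60, 60)` in proportion to `value / vmax`. A single-hue gradient helper along those lines might look like the sketch below. The parameter names `end_rgb` and `max_value` follow the new call site, while the function name and body are assumptions rather than the real `color_single_gradient`.

```python
def single_gradient_sketch(
    value: float,
    *,
    end_rgb: tuple[int, int, int],
    max_value: float = 1.0,
) -> str:
    """Interpolate from white to ``end_rgb`` as value goes from 0 to max_value."""
    if max_value <= 0:
        return "#ffffff"  # no usable scale: fall back to white
    ratio = max(0.0, min(1.0, value / max_value))
    # Each channel moves from 255 (white) towards its target value.
    r, g, b = (int(255 - (255 - channel) * ratio) for channel in end_rgb)
    return f"#{r:02x}{g:02x}{b:02x}"
```

Passing the target colour as a parameter is what lets one helper serve both the red divergence matrix and the orange lexical-modernization table touched by this commit.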
picarones/report/levers_render.py
CHANGED
|
@@ -25,9 +25,12 @@ recommandation : la phrase est purement descriptive.
|
|
| 25 |
|
| 26 |
from __future__ import annotations
|
| 27 |
|
|
|
|
| 28 |
from html import escape as _e
|
| 29 |
from typing import Iterable, Optional
|
| 30 |
|
|
|
|
|
|
|
| 31 |
|
| 32 |
def _lever_label(lever_type: str, labels: dict[str, str]) -> str:
|
| 33 |
return labels.get(f"levers_label_{lever_type}", lever_type)
|
|
@@ -223,7 +226,12 @@ def build_levers_section_html(
|
|
| 223 |
continue
|
| 224 |
try:
|
| 225 |
sentence = formatter(payload, labels)
|
| 226 |
-
except Exception:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 227 |
continue
|
| 228 |
if not sentence:
|
| 229 |
continue
|
|
|
|
| 25 |
|
| 26 |
from __future__ import annotations
|
| 27 |
|
| 28 |
+
import logging
|
| 29 |
from html import escape as _e
|
| 30 |
from typing import Iterable, Optional
|
| 31 |
|
| 32 |
+
logger = logging.getLogger(__name__)
|
| 33 |
+
|
| 34 |
|
| 35 |
def _lever_label(lever_type: str, labels: dict[str, str]) -> str:
|
| 36 |
return labels.get(f"levers_label_{lever_type}", lever_type)
|
|
|
|
| 226 |
continue
|
| 227 |
try:
|
| 228 |
sentence = formatter(payload, labels)
|
| 229 |
+
except Exception as exc: # noqa: BLE001 (a broken formatter must not break the section)
|
| 230 |
+
logger.warning(
|
| 231 |
+
"[levers_render] formatter %r a échoué sur payload=%r : %s — "
|
| 232 |
+
"ce levier sera omis du rapport",
|
| 233 |
+
lv_type, payload, exc,
|
| 234 |
+
)
|
| 235 |
continue
|
| 236 |
if not sentence:
|
| 237 |
continue
|
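The change to `build_levers_section_html` replaces a silent `except Exception: continue` with a logged warning, so a failing formatter is skipped but leaves a trace. A self-contained sketch of that pattern follows. The lever dict shape (`type`, `payload`) and the `format_levers` name are assumptions made for illustration.

```python
import logging
from typing import Callable

logger = logging.getLogger("levers_sketch")

Formatter = Callable[[dict, dict], str]


def format_levers(
    levers: list[dict],
    formatters: dict[str, Formatter],
    labels: dict[str, str],
) -> list[str]:
    """Return one sentence per lever, skipping (and logging) broken formatters."""
    sentences: list[str] = []
    for lever in levers:
        lv_type = lever.get("type")
        formatter = formatters.get(lv_type)
        if formatter is None:
            continue  # unknown lever type: nothing to render
        payload = lever.get("payload") or {}
        try:
            sentence = formatter(payload, labels)
        except Exception as exc:  # noqa: BLE001 (isolate one bad formatter)
            logger.warning(
                "formatter %r failed on payload=%r: %s; lever omitted",
                lv_type, payload, exc,
            )
            continue
        if sentence:
            sentences.append(sentence)
    return sentences
```

Logging with lazy `%r` arguments keeps the payload out of string formatting unless the warning is actually emitted.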
picarones/report/lexical_modernization_render.py
CHANGED
|
@@ -19,15 +19,10 @@ from html import escape as _e
|
|
| 19 |
from typing import Optional
|
| 20 |
|
| 21 |
from picarones.measurements.lexical_modernization import top_modernized_tokens
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
f = max(0.0, min(1.0, rate))
|
| 27 |
-
r = int(255 + (194 - 255) * f)
|
| 28 |
-
g = int(255 + (65 - 255) * f)
|
| 29 |
-
b = int(255 + (12 - 255) * f)
|
| 30 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 31 |
|
| 32 |
|
| 33 |
def _format_variants(variants: dict, max_show: int = 3) -> str:
|
|
@@ -96,7 +91,7 @@ def build_lexical_modernization_html(
|
|
| 96 |
rate = slot.get("rate_modernized", 0.0)
|
| 97 |
n_total = slot.get("n_total", 0)
|
| 98 |
variants_str = _format_variants(slot.get("variants") or {})
|
| 99 |
-
rate_color =
|
| 100 |
parts.append(
|
| 101 |
f'<tr>'
|
| 102 |
f'<td style="padding:.3rem .5rem;font-family:monospace">'
|
|
|
|
| 19 |
from typing import Optional
|
| 20 |
|
| 21 |
from picarones.measurements.lexical_modernization import top_modernized_tokens
|
| 22 |
+
from picarones.report.render_helpers import (
|
| 23 |
+
GRADIENT_TARGET_ORANGE,
|
| 24 |
+
color_single_gradient,
|
| 25 |
+
)
|
| 26 |
|
| 27 |
|
| 28 |
def _format_variants(variants: dict, max_show: int = 3) -> str:
|
|
|
|
| 91 |
rate = slot.get("rate_modernized", 0.0)
|
| 92 |
n_total = slot.get("n_total", 0)
|
| 93 |
variants_str = _format_variants(slot.get("variants") or {})
|
| 94 |
+
rate_color = color_single_gradient(rate, end_rgb=GRADIENT_TARGET_ORANGE)
|
| 95 |
parts.append(
|
| 96 |
f'<tr>'
|
| 97 |
f'<td style="padding:.3rem .5rem;font-family:monospace">'
|
picarones/report/longitudinal_render.py
CHANGED
|
@@ -33,32 +33,23 @@ from __future__ import annotations
|
|
| 33 |
from html import escape as _e
|
| 34 |
from typing import Optional
|
| 35 |
|
|
|
|
| 36 |
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
|
|
|
|
|
|
|
|
|
| 40 |
if abs(delta_pct) < 1.0:
|
| 41 |
return "#a7f0a7"
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
b = int(167 + (60 - 167) * t)
|
| 50 |
-
else:
|
| 51 |
-
t = (f - 0.5) / 0.5
|
| 52 |
-
r = int(235 + (220 - 235) * t)
|
| 53 |
-
g = int(180 + (50 - 180) * t)
|
| 54 |
-
b = int(60 + (50 - 60) * t)
|
| 55 |
-
else:
|
| 56 |
-
# green → blue (improvement)
|
| 57 |
-
f = -f
|
| 58 |
-
r = int(167 + (90 - 167) * f)
|
| 59 |
-
g = int(240 + (160 - 240) * f)
|
| 60 |
-
b = int(167 + (210 - 167) * f)
|
| 61 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 62 |
|
| 63 |
|
| 64 |
def build_longitudinal_html(
|
|
@@ -126,7 +117,7 @@ def build_longitudinal_html(
|
|
| 126 |
first_cer = float(entry.get("first_cer") or 0.0)
|
| 127 |
last_cer = float(entry.get("last_cer") or 0.0)
|
| 128 |
delta_pct = float(entry.get("absolute_delta_pct") or 0.0)
|
| 129 |
-
delta_color =
|
| 130 |
trend = entry.get("trend") or {}
|
| 131 |
slope = trend.get("slope")
|
| 132 |
r2 = trend.get("r_squared")
|
|
|
|
| 33 |
from html import escape as _e
|
| 34 |
from typing import Optional
|
| 35 |
|
| 36 |
+
from picarones.report.render_helpers import color_diverging
|
| 37 |
|
| 38 |
+
|
| 39 |
+
def _bg_for_cer_delta(delta_pct: float) -> str:
|
| 40 |
+
"""Colored cell for a CER delta in percentage points:
|
| 41 |
+
green if delta ≈ 0, orange/red for a regression, blue for an improvement.
|
| 42 |
+
Saturates at ±5 points.
|
| 43 |
+
"""
|
| 44 |
if abs(delta_pct) < 1.0:
|
| 45 |
return "#a7f0a7"
|
| 46 |
+
return color_diverging(
|
| 47 |
+
delta_pct,
|
| 48 |
+
max_abs=5.0,
|
| 49 |
+
neutral_rgb=(167, 240, 167),
|
| 50 |
+
positive_rgb=(220, 50, 50),
|
| 51 |
+
negative_rgb=(90, 160, 210),
|
| 52 |
+
)
|
| 53 |
|
| 54 |
|
| 55 |
def build_longitudinal_html(
|
|
|
|
| 117 |
first_cer = float(entry.get("first_cer") or 0.0)
|
| 118 |
last_cer = float(entry.get("last_cer") or 0.0)
|
| 119 |
delta_pct = float(entry.get("absolute_delta_pct") or 0.0)
|
| 120 |
+
delta_color = _bg_for_cer_delta(delta_pct)
|
| 121 |
trend = entry.get("trend") or {}
|
| 122 |
slope = trend.get("slope")
|
| 123 |
r2 = trend.get("r_squared")
|
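`_bg_for_cer_delta` now delegates to a diverging colour scale: neutral green near zero, red for regressions, blue for improvements, saturating at ±5 points. The sketch below shows one plausible reading of such a helper, using the keyword names from the call site. The linear interpolation and the name `diverging_sketch` are assumptions, and the real `color_diverging` may differ (the removed code, for instance, went through an intermediate orange).

```python
Rgb = tuple[int, int, int]


def diverging_sketch(
    value: float,
    *,
    max_abs: float,
    neutral_rgb: Rgb,
    positive_rgb: Rgb,
    negative_rgb: Rgb,
) -> str:
    """Blend from neutral towards the positive or negative colour (sketch)."""
    if max_abs <= 0:
        f = 0.0  # no usable scale: stay on the neutral colour
    else:
        f = max(-1.0, min(1.0, value / max_abs))  # saturate at ±max_abs
    target = positive_rgb if f >= 0 else negative_rgb
    t = abs(f)
    r, g, b = (int(n + (c - n) * t) for n, c in zip(neutral_rgb, target))
    return f"#{r:02x}{g:02x}{b:02x}"
```

Called with the arguments from the diff, a delta of +5 or more maps to the full red and -5 or less to the full blue, while the `abs(delta_pct) < 1.0` guard in `_bg_for_cer_delta` short-circuits small deltas to plain green before the scale is consulted.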
picarones/report/marginal_cost_render.py
ADDED
|
@@ -0,0 +1,111 @@
|
| 1 |
+
"""HTML rendering of the inter-engine marginal cost (Sprint 91, A.II.6).
|
| 2 |
+
|
| 3 |
+
Summary table of (A → B) pairs with the additional cost
|
| 4 |
+
per avoided error. Adaptive: returns ``""`` if there are fewer than 2 engines
|
| 5 |
+
or if no pair has usable cost/error data.
|
| 6 |
+
|
| 7 |
+
Lets an archivist see: *"switching from Tesseract to GPT-4o
|
| 8 |
+
costs X € more per avoided error; is that justified for my
|
| 9 |
+
budget?"*
|
| 10 |
+
"""
|
| 11 |
+
|
| 12 |
+
from __future__ import annotations
|
| 13 |
+
|
| 14 |
+
from html import escape as _e
|
| 15 |
+
from typing import Optional
|
| 16 |
+
|
| 17 |
+
|
| 18 |
+
def build_marginal_cost_html(
|
| 19 |
+
matrix: Optional[list[dict]],
|
| 20 |
+
labels: Optional[dict[str, str]] = None,
|
| 21 |
+
) -> str:
|
| 22 |
+
"""Builds the inter-engine marginal cost table.
|
| 23 |
+
|
| 24 |
+
Parameters
|
| 25 |
+
----------
|
| 26 |
+
matrix:
|
| 27 |
+
Output of
|
| 28 |
+
:func:`picarones.report.report_data.extra_metrics.compute_marginal_cost_section`.
|
| 29 |
+
List of dicts sorted by increasing marginal cost. If ``None``
|
| 30 |
+
or empty, returns ``""``.
|
| 31 |
+
labels:
|
| 32 |
+
Optional i18n dict.
|
| 33 |
+
"""
|
| 34 |
+
if not matrix:
|
| 35 |
+
return ""
|
| 36 |
+
labels = labels or {}
|
| 37 |
+
title = labels.get(
|
| 38 |
+
"marginal_cost_title",
|
| 39 |
+
"Coût marginal inter-moteurs (€ par erreur évitée)",
|
| 40 |
+
)
|
| 41 |
+
note = labels.get(
|
| 42 |
+
"marginal_cost_note",
|
| 43 |
+
"Pour chaque paire de moteurs (A → B), coût additionnel par "
|
| 44 |
+
"erreur évitée en passant de A à B. Valeur basse = changement "
|
| 45 |
+
"rentable. ‘Dominé’ = B est moins cher ET plus précis. Estimation "
|
| 46 |
+
"des erreurs basée sur ``cer × 1000`` (proxy par 1000 pages).",
|
| 47 |
+
)
|
| 48 |
+
h_from = labels.get("marginal_cost_from", "Depuis")
|
| 49 |
+
h_to = labels.get("marginal_cost_to", "Vers")
|
| 50 |
+
h_avoided = labels.get("marginal_cost_avoided", "Erreurs évitées")
|
| 51 |
+
h_delta = labels.get("marginal_cost_delta", "Coût Δ (€)")
|
| 52 |
+
h_per_err = labels.get("marginal_cost_per_err", "€ / erreur évitée")
|
| 53 |
+
h_dominated = labels.get("marginal_cost_dominated", "Dominé ?")
|
| 54 |
+
|
| 55 |
+
parts = [
|
| 56 |
+
'<section class="marginal-cost-section" style="margin:1rem 0">',
|
| 57 |
+
f'<h3 style="margin:0 0 .3rem 0">{_e(title)}</h3>',
|
| 58 |
+
f'<div style="font-size:.85rem;opacity:.75;margin-bottom:.5rem">'
|
| 59 |
+
f'{_e(note)}</div>',
|
| 60 |
+
'<table style="border-collapse:collapse;width:100%;'
|
| 61 |
+
'font-size:.9rem">',
|
| 62 |
+
'<thead><tr>',
|
| 63 |
+
]
|
| 64 |
+
for h in (h_from, h_to, h_avoided, h_delta, h_per_err, h_dominated):
|
| 65 |
+
parts.append(
|
| 66 |
+
f'<th scope="col" style="padding:.4rem .6rem;text-align:left;'
|
| 67 |
+
f'border-bottom:1px solid #ccc;font-weight:600">{_e(h)}</th>'
|
| 68 |
+
)
|
| 69 |
+
parts.append('</tr></thead><tbody>')
|
| 70 |
+
|
| 71 |
+
for row in matrix:
|
| 72 |
+
engine_a = row.get("engine_a") or row.get("from") or "?"
|
| 73 |
+
engine_b = row.get("engine_b") or row.get("to") or "?"
|
| 74 |
+
n_avoided = row.get("n_errors_avoided")
|
| 75 |
+
cost_delta = row.get("cost_delta")
|
| 76 |
+
cost_per_err = row.get("cost_per_avoided_error")
|
| 77 |
+
dominated = row.get("dominated", False)
|
| 78 |
+
|
| 79 |
+
n_avoided_cell = (
|
| 80 |
+
f"{int(n_avoided)}" if isinstance(n_avoided, (int, float)) else "—"
|
| 81 |
+
)
|
| 82 |
+
cost_delta_cell = (
|
| 83 |
+
f"{cost_delta:+.2f}" if isinstance(cost_delta, (int, float)) else "—"
|
| 84 |
+
)
|
| 85 |
+
if isinstance(cost_per_err, (int, float)):
|
| 86 |
+
cost_per_err_cell = f"{cost_per_err:.2f}"
|
| 87 |
+
else:
|
| 88 |
+
cost_per_err_cell = "—"
|
| 89 |
+
dominated_cell = (
|
| 90 |
+
'<span style="color:#16a34a;font-weight:600">✓ B dominé par A</span>'
|
| 91 |
+
if dominated else "—"
|
| 92 |
+
)
|
| 93 |
+
|
| 94 |
+
parts.append(
|
| 95 |
+
f'<tr>'
|
| 96 |
+
f'<td style="padding:.4rem .6rem">{_e(str(engine_a))}</td>'
|
| 97 |
+
f'<td style="padding:.4rem .6rem">{_e(str(engine_b))}</td>'
|
| 98 |
+
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 99 |
+
f'font-family:monospace">{n_avoided_cell}</td>'
|
| 100 |
+
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 101 |
+
f'font-family:monospace">{cost_delta_cell}</td>'
|
| 102 |
+
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 103 |
+
f'font-family:monospace;font-weight:600">{cost_per_err_cell}</td>'
|
| 104 |
+
f'<td style="padding:.4rem .6rem">{dominated_cell}</td>'
|
| 105 |
+
f'</tr>'
|
| 106 |
+
)
|
| 107 |
+
parts.append('</tbody></table></section>')
|
| 108 |
+
return "".join(parts)
|
| 109 |
+
|
| 110 |
+
|
| 111 |
+
__all__ = ["build_marginal_cost_html"]
|
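The renderer above only formats rows. The function that produces them (`compute_marginal_cost_section`) is not part of this section of the diff. As an illustration of the arithmetic described in the table note (errors estimated as `cer × 1000`), here is a hedged sketch of how one row could be derived. The function name `marginal_cost_row_sketch`, the assumption that costs are expressed for the same 1000-page basis, and the rule for leaving the per-error cost empty are all hypothetical; only the row keys are taken from what `build_marginal_cost_html` reads.

```python
from typing import Optional


def marginal_cost_row_sketch(
    engine_a: str, cer_a: float, cost_a: float,
    engine_b: str, cer_b: float, cost_b: float,
) -> dict:
    """Hypothetical row builder, not the project's compute function."""
    # Proxy from the table note: estimated errors = cer × 1000.
    n_avoided = (cer_a - cer_b) * 1000.0
    cost_delta = cost_b - cost_a
    # A per-error cost is only meaningful when B costs more and avoids errors.
    cost_per_err: Optional[float] = None
    if n_avoided > 0 and cost_delta > 0:
        cost_per_err = cost_delta / n_avoided
    # Per the note: flagged when B is cheaper AND more accurate.
    dominated = cost_delta < 0 and n_avoided > 0
    return {
        "engine_a": engine_a,
        "engine_b": engine_b,
        "n_errors_avoided": n_avoided,
        "cost_delta": cost_delta,
        "cost_per_avoided_error": cost_per_err,
        "dominated": dominated,
    }


row = marginal_cost_row_sketch("tesseract", 0.08, 0.0, "gpt-4o", 0.03, 12.5)
print(row["cost_per_avoided_error"])
```

A list of such rows, sorted by `cost_per_avoided_error`, has the shape that the `matrix` parameter documents.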
picarones/report/multirun_stability_render.py
CHANGED
|
@@ -43,21 +43,7 @@ from __future__ import annotations
|
|
| 43 |
from html import escape as _e
|
| 44 |
from typing import Optional
|
| 45 |
|
| 46 |
-
|
| 47 |
-
def _color_for_cv(cv: float) -> str:
|
| 48 |
-
"""Green (≈0) → orange (10%) → red (≥ 25%)."""
|
| 49 |
-
f = max(0.0, min(1.0, cv / 0.25))
|
| 50 |
-
if f < 0.5:
|
| 51 |
-
t = f / 0.5
|
| 52 |
-
r = int(167 + (235 - 167) * t)
|
| 53 |
-
g = int(240 + (180 - 240) * t)
|
| 54 |
-
b = int(167 + (60 - 167) * t)
|
| 55 |
-
else:
|
| 56 |
-
t = (f - 0.5) / 0.5
|
| 57 |
-
r = int(235 + (220 - 235) * t)
|
| 58 |
-
g = int(180 + (50 - 180) * t)
|
| 59 |
-
b = int(60 + (50 - 60) * t)
|
| 60 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 61 |
|
| 62 |
|
| 63 |
def build_multirun_stability_html(
|
|
@@ -128,7 +114,7 @@ def build_multirun_stability_html(
|
|
| 128 |
else:
|
| 129 |
cer_str = "—"
|
| 130 |
if isinstance(cer_cv, (int, float)):
|
| 131 |
-
cv_color =
|
| 132 |
cv_cell = (
|
| 133 |
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 134 |
f'background:{cv_color};font-family:monospace;'
|
|
|
|
| 43 |
from html import escape as _e
|
| 44 |
from typing import Optional
|
| 45 |
|
| 46 |
+
from picarones.report.render_helpers import color_traffic_light
|
| 47 |
|
| 48 |
|
| 49 |
def build_multirun_stability_html(
|
|
|
|
| 114 |
else:
|
| 115 |
cer_str = "—"
|
| 116 |
if isinstance(cer_cv, (int, float)):
|
| 117 |
+
cv_color = color_traffic_light(float(cer_cv), low_is_good=True, scale_max=0.25)
|
| 118 |
cv_cell = (
|
| 119 |
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 120 |
f'background:{cv_color};font-family:monospace;'
|
picarones/report/ner_render.py
CHANGED
|
@@ -23,26 +23,7 @@ from __future__ import annotations
|
|
| 23 |
from html import escape as _e
|
| 24 |
from typing import Optional
|
| 25 |
|
| 26 |
-
|
| 27 |
-
def _color_for_f1(f1: float) -> str:
|
| 28 |
-
"""Red → yellow → green gradient proportional to ``f1`` ∈ [0, 1].
|
| 29 |
-
|
| 30 |
-
F1 = 0 → light red, F1 = 0.5 → pale yellow, F1 = 1 → light green.
|
| 31 |
-
"""
|
| 32 |
-
f = max(0.0, min(1.0, f1))
|
| 33 |
-
# Two-segment linear interpolation:
|
| 34 |
-
# 0 → (220, 100, 100) (red), 0.5 → (240, 220, 130), 1 → (130, 200, 130) (green)
|
| 35 |
-
if f <= 0.5:
|
| 36 |
-
ratio = f / 0.5
|
| 37 |
-
r = int(220 + (240 - 220) * ratio)
|
| 38 |
-
g = int(100 + (220 - 100) * ratio)
|
| 39 |
-
b = int(100 + (130 - 100) * ratio)
|
| 40 |
-
else:
|
| 41 |
-
ratio = (f - 0.5) / 0.5
|
| 42 |
-
r = int(240 + (130 - 240) * ratio)
|
| 43 |
-
g = int(220 + (200 - 220) * ratio)
|
| 44 |
-
b = int(130 + (130 - 130) * ratio)
|
| 45 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 46 |
|
| 47 |
|
| 48 |
def _engines_with_ner(engines_summary: list[dict]) -> list[dict]:
|
|
@@ -110,7 +91,7 @@ def build_ner_summary_html(
|
|
| 110 |
doc_count = int(agg.get("doc_count") or 0)
|
| 111 |
hallucinated = int(agg.get("hallucinated_total") or 0)
|
| 112 |
missed = int(agg.get("missed_total") or 0)
|
| 113 |
-
bg =
|
| 114 |
parts.append("<tr>")
|
| 115 |
parts.append(
|
| 116 |
f'<td style="padding:.3rem .5rem;font-weight:600">'
|
|
@@ -222,7 +203,7 @@ def build_ner_per_category_html(
|
|
| 222 |
else:
|
| 223 |
f1 = float(stats.get("f1") or 0.0)
|
| 224 |
support = int(stats.get("support", 0))
|
| 225 |
-
bg =
|
| 226 |
parts.append(
|
| 227 |
f'<td style="padding:.3rem .5rem;text-align:center;'
|
| 228 |
f'background:{bg};color:#222;'
|
|
|
|
| 23 |
from html import escape as _e
|
| 24 |
from typing import Optional
|
| 25 |
|
| 26 |
+
from picarones.report.render_helpers import color_traffic_light
|
| 27 |
|
| 28 |
|
| 29 |
def _engines_with_ner(engines_summary: list[dict]) -> list[dict]:
|
|
|
|
| 91 |
doc_count = int(agg.get("doc_count") or 0)
|
| 92 |
hallucinated = int(agg.get("hallucinated_total") or 0)
|
| 93 |
missed = int(agg.get("missed_total") or 0)
|
| 94 |
+
bg = color_traffic_light(f1)
|
| 95 |
parts.append("<tr>")
|
| 96 |
parts.append(
|
| 97 |
f'<td style="padding:.3rem .5rem;font-weight:600">'
|
|
|
|
| 203 |
else:
|
| 204 |
f1 = float(stats.get("f1") or 0.0)
|
| 205 |
support = int(stats.get("support", 0))
|
| 206 |
+
bg = color_traffic_light(f1)
|
| 207 |
parts.append(
|
| 208 |
f'<td style="padding:.3rem .5rem;text-align:center;'
|
| 209 |
f'background:{bg};color:#222;'
|
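The call sites in this diff use `color_traffic_light` with up to three keyword arguments (`low_is_good`, `scale_min`, `scale_max`), but its body is defined in `render_helpers.py` and is not visible here. The sketch below shows one way such a helper could work: normalize the value into [0, 1], optionally invert it, then apply a two-segment interpolation. It reuses the red/yellow/green palette of the removed `_color_for_f1`. The name `color_traffic_light_sketch`, the default bounds, and the palette choice are assumptions; the real helper may use different colors.

```python
def color_traffic_light_sketch(
    value: float,
    low_is_good: bool = False,
    scale_min: float = 0.0,
    scale_max: float = 1.0,
) -> str:
    """Hypothetical stand-in for ``render_helpers.color_traffic_light``."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    # Normalize into [0, 1] and clamp out-of-range values.
    f = (value - scale_min) / (scale_max - scale_min)
    f = max(0.0, min(1.0, f))
    # For metrics where a low value is good (CV, rank), flip the scale.
    if low_is_good:
        f = 1.0 - f
    # Palette taken from the removed _color_for_f1 (assumed, not confirmed).
    red, yellow, green = (220, 100, 100), (240, 220, 130), (130, 200, 130)
    if f <= 0.5:
        start, end, t = red, yellow, f / 0.5
    else:
        start, end, t = yellow, green, (f - 0.5) / 0.5
    r, g, b = (int(s + (e - s) * t) for s, e in zip(start, end))
    return f"#{r:02x}{g:02x}{b:02x}"


# F1-style score (high is good) and CV-style score (low is good).
print(color_traffic_light_sketch(1.0))
print(color_traffic_light_sketch(0.25, low_is_good=True, scale_max=0.25))
```

Under these assumptions, a rank-based coloring such as `_bg_for_rank` reduces to a single call with `low_is_good=True`, `scale_min=1.0`, and `scale_max=float(total)`.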
picarones/report/numerical_sequences_render.py
CHANGED
|
@@ -24,22 +24,7 @@ from html import escape as _e
|
|
| 24 |
from typing import Optional
|
| 25 |
|
| 26 |
from picarones.measurements.numerical_sequences import CATEGORIES
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
def _color_for_score(score: float) -> str:
|
| 30 |
-
"""Red → yellow → green gradient."""
|
| 31 |
-
f = max(0.0, min(1.0, score))
|
| 32 |
-
if f < 0.5:
|
| 33 |
-
t = f / 0.5
|
| 34 |
-
r = 235
|
| 35 |
-
g = int(70 + (200 - 70) * t)
|
| 36 |
-
b = 70
|
| 37 |
-
else:
|
| 38 |
-
t = (f - 0.5) / 0.5
|
| 39 |
-
r = int(235 + (60 - 235) * t)
|
| 40 |
-
g = int(200 + (160 - 200) * t)
|
| 41 |
-
b = int(70 + (90 - 70) * t)
|
| 42 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 43 |
|
| 44 |
|
| 45 |
def _category_columns_with_signal(rows: list[dict]) -> list[str]:
|
|
@@ -125,7 +110,7 @@ def build_numerical_sequences_html(
|
|
| 125 |
global_strict = float(agg.get("global_strict_score") or 0.0)
|
| 126 |
global_value = float(agg.get("global_value_score") or 0.0)
|
| 127 |
n_total = int(agg.get("n_total") or 0)
|
| 128 |
-
global_color =
|
| 129 |
parts.append(
|
| 130 |
f'<tr>'
|
| 131 |
f'<td style="padding:.4rem .6rem">{_e(str(name))}</td>'
|
|
@@ -148,7 +133,7 @@ def build_numerical_sequences_html(
|
|
| 148 |
continue
|
| 149 |
strict = float(cat_data.get("strict_score") or 0.0)
|
| 150 |
value = float(cat_data.get("value_score") or 0.0)
|
| 151 |
-
color =
|
| 152 |
parts.append(
|
| 153 |
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 154 |
f'background:{color};font-family:monospace">'
|
|
|
|
| 24 |
from typing import Optional
|
| 25 |
|
| 26 |
from picarones.measurements.numerical_sequences import CATEGORIES
|
| 27 |
+
from picarones.report.render_helpers import color_traffic_light
|
| 28 |
|
| 29 |
|
| 30 |
def _category_columns_with_signal(rows: list[dict]) -> list[str]:
|
|
|
|
| 110 |
global_strict = float(agg.get("global_strict_score") or 0.0)
|
| 111 |
global_value = float(agg.get("global_value_score") or 0.0)
|
| 112 |
n_total = int(agg.get("n_total") or 0)
|
| 113 |
+
global_color = color_traffic_light(global_strict)
|
| 114 |
parts.append(
|
| 115 |
f'<tr>'
|
| 116 |
f'<td style="padding:.4rem .6rem">{_e(str(name))}</td>'
|
|
|
|
| 133 |
continue
|
| 134 |
strict = float(cat_data.get("strict_score") or 0.0)
|
| 135 |
value = float(cat_data.get("value_score") or 0.0)
|
| 136 |
+
color = color_traffic_light(strict)
|
| 137 |
parts.append(
|
| 138 |
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 139 |
f'background:{color};font-family:monospace">'
|
picarones/report/philological_render.py
CHANGED
|
@@ -36,34 +36,14 @@ from __future__ import annotations
|
|
| 36 |
from html import escape as _e
|
| 37 |
from typing import Optional
|
| 38 |
|
|
|
|
|
|
|
| 39 |
|
| 40 |
# ──────────────────────────────────────────────────────────────────────────
|
| 41 |
# Coloring helpers
|
| 42 |
# ──────────────────────────────────────────────────────────────────────────
|
| 43 |
|
| 44 |
|
| 45 |
-
def _color_for_score(score: float) -> str:
|
| 46 |
-
"""Red → yellow → green gradient proportional to ``score`` ∈ [0, 1].
|
| 47 |
-
|
| 48 |
-
Identical to ``ner_render._color_for_f1``. The philological
|
| 49 |
-
scores (preservation, coverage, accuracy) follow the same
|
| 50 |
-
"higher is better" semantics, so the gradient
|
| 51 |
-
is valid.
|
| 52 |
-
"""
|
| 53 |
-
f = max(0.0, min(1.0, score))
|
| 54 |
-
if f <= 0.5:
|
| 55 |
-
ratio = f / 0.5
|
| 56 |
-
r = int(220 + (240 - 220) * ratio)
|
| 57 |
-
g = int(100 + (220 - 100) * ratio)
|
| 58 |
-
b = int(100 + (130 - 100) * ratio)
|
| 59 |
-
else:
|
| 60 |
-
ratio = (f - 0.5) / 0.5
|
| 61 |
-
r = int(240 + (130 - 240) * ratio)
|
| 62 |
-
g = int(220 + (200 - 220) * ratio)
|
| 63 |
-
b = int(130 + (130 - 130) * ratio)
|
| 64 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 65 |
-
|
| 66 |
-
|
| 67 |
def _engines_with_module(
|
| 68 |
engines_summary: list[dict], module: str,
|
| 69 |
) -> list[dict]:
|
|
@@ -83,7 +63,7 @@ def _score_cell(score: Optional[float], extra: str = "") -> str:
|
|
| 83 |
'<td style="padding:.3rem .5rem;text-align:center;'
|
| 84 |
'background:#f0f0f0;color:#999">—</td>'
|
| 85 |
)
|
| 86 |
-
color =
|
| 87 |
text = f"{score * 100:.1f}%"
|
| 88 |
if extra:
|
| 89 |
text += f" <span style=\"opacity:.6;font-size:.85em\">({_e(extra)})</span>"
|
|
@@ -539,8 +519,8 @@ def build_roman_numerals_section(
|
|
| 539 |
# the semantics "the higher, the more the OCR has
|
| 540 |
# adopted this status".
|
| 541 |
color = (
|
| 542 |
-
|
| 543 |
-
else
|
| 544 |
)
|
| 545 |
parts.append(
|
| 546 |
f'<td style="padding:.3rem .5rem;text-align:center;'
|
|
|
|
| 36 |
from html import escape as _e
|
| 37 |
from typing import Optional
|
| 38 |
|
| 39 |
+
from picarones.report.render_helpers import color_traffic_light
|
| 40 |
+
|
| 41 |
|
| 42 |
# ──────────────────────────────────────────────────────────────────────────
|
| 43 |
# Coloring helpers
|
| 44 |
# ──────────────────────────────────────────────────────────────────────────
|
| 45 |
|
| 46 |
|
| 47 |
def _engines_with_module(
|
| 48 |
engines_summary: list[dict], module: str,
|
| 49 |
) -> list[dict]:
|
|
|
|
| 63 |
'<td style="padding:.3rem .5rem;text-align:center;'
|
| 64 |
'background:#f0f0f0;color:#999">—</td>'
|
| 65 |
)
|
| 66 |
+
color = color_traffic_light(score)
|
| 67 |
text = f"{score * 100:.1f}%"
|
| 68 |
if extra:
|
| 69 |
text += f" <span style=\"opacity:.6;font-size:.85em\">({_e(extra)})</span>"
|
|
|
|
| 519 |
# the semantics "the higher, the more the OCR has
|
| 520 |
# adopted this status".
|
| 521 |
color = (
|
| 522 |
+
color_traffic_light(1.0 - ratio) if status == "lost"
|
| 523 |
+
else color_traffic_light(ratio)
|
| 524 |
)
|
| 525 |
parts.append(
|
| 526 |
f'<td style="padding:.3rem .5rem;text-align:center;'
|
picarones/report/pipeline_render.py
CHANGED
|
@@ -50,6 +50,7 @@ from typing import Optional
|
|
| 50 |
from picarones.core.modules import ArtifactType
|
| 51 |
from picarones.measurements.pipeline_benchmark import PipelineBenchmarkResult
|
| 52 |
from picarones.measurements.pipeline_comparison import PipelineComparisonResult
|
|
|
|
| 53 |
|
| 54 |
|
| 55 |
# ──────────────────────────────────────────────────────────────────────────
|
|
@@ -57,22 +58,6 @@ from picarones.measurements.pipeline_comparison import PipelineComparisonResult
|
|
| 57 |
# ──────────────────────────────────────────────────────────────────────────
|
| 58 |
|
| 59 |
|
| 60 |
-
def _color_for_success_rate(rate: float) -> str:
|
| 61 |
-
"""Red → yellow → green gradient for the success rate."""
|
| 62 |
-
f = max(0.0, min(1.0, rate))
|
| 63 |
-
if f <= 0.5:
|
| 64 |
-
ratio = f / 0.5
|
| 65 |
-
r = int(220 + (240 - 220) * ratio)
|
| 66 |
-
g = int(100 + (220 - 100) * ratio)
|
| 67 |
-
b = int(100 + (130 - 100) * ratio)
|
| 68 |
-
else:
|
| 69 |
-
ratio = (f - 0.5) / 0.5
|
| 70 |
-
r = int(240 + (130 - 240) * ratio)
|
| 71 |
-
g = int(220 + (200 - 220) * ratio)
|
| 72 |
-
b = int(130 + (130 - 130) * ratio)
|
| 73 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 74 |
-
|
| 75 |
-
|
| 76 |
def _format_duration(seconds: float) -> str:
|
| 77 |
"""Formats a duration in ms if < 1 s, otherwise in s."""
|
| 78 |
if seconds < 1.0:
|
|
@@ -109,7 +94,7 @@ def build_pipeline_summary_html(
|
|
| 109 |
failed = bench.n_pipelines_failed
|
| 110 |
total = bench.n_docs
|
| 111 |
rate = success / total if total > 0 else 0.0
|
| 112 |
-
color =
|
| 113 |
|
| 114 |
parts = [
|
| 115 |
'<div class="pipeline-summary" '
|
|
@@ -195,7 +180,7 @@ def build_pipeline_steps_table_html(
|
|
| 195 |
|
| 196 |
for agg in bench.per_step_aggregates:
|
| 197 |
rate = agg.success_rate
|
| 198 |
-
rate_color =
|
| 199 |
# Junction metrics: for each artifact type,
|
| 200 |
# list of mean metrics
|
| 201 |
metrics_cells: list[str] = []
|
|
@@ -381,12 +366,17 @@ class RankingSpec:
|
|
| 381 |
return f"{self.artifact_type.value}.{self.metric_name}"
|
| 382 |
|
| 383 |
|
| 384 |
-
def
|
| 385 |
-
"""Gradient vert (
|
|
|
|
|
|
|
|
|
|
|
|
|
| 386 |
if total <= 1:
|
| 387 |
-
return
|
| 388 |
-
|
| 389 |
-
|
|
|
|
| 390 |
|
| 391 |
|
| 392 |
def build_pipeline_ranking_table_html(
|
|
@@ -444,7 +434,7 @@ def build_pipeline_ranking_table_html(
|
|
| 444 |
rank += 1
|
| 445 |
rank_str = str(rank)
|
| 446 |
value_str = f"{value:.4f}"
|
| 447 |
-
rank_color =
|
| 448 |
parts.append(
|
| 449 |
f'<tr>'
|
| 450 |
f'<td style="padding:.3rem .5rem;text-align:center;'
|
|
|
|
| 50 |
from picarones.core.modules import ArtifactType
|
| 51 |
from picarones.measurements.pipeline_benchmark import PipelineBenchmarkResult
|
| 52 |
from picarones.measurements.pipeline_comparison import PipelineComparisonResult
|
| 53 |
+
from picarones.report.render_helpers import color_traffic_light
|
| 54 |
|
| 55 |
|
| 56 |
# ──────────────────────────────────────────────────────────────────────────
|
|
|
|
| 58 |
# ──────────────────────────────────────────────────────────────────────────
|
| 59 |
|
| 60 |
|
| 61 |
def _format_duration(seconds: float) -> str:
|
| 62 |
"""Formats a duration in ms if < 1 s, otherwise in s."""
|
| 63 |
if seconds < 1.0:
|
|
|
|
| 94 |
failed = bench.n_pipelines_failed
|
| 95 |
total = bench.n_docs
|
| 96 |
rate = success / total if total > 0 else 0.0
|
| 97 |
+
color = color_traffic_light(rate)
|
| 98 |
|
| 99 |
parts = [
|
| 100 |
'<div class="pipeline-summary" '
|
|
|
|
| 180 |
|
| 181 |
for agg in bench.per_step_aggregates:
|
| 182 |
rate = agg.success_rate
|
| 183 |
+
rate_color = color_traffic_light(rate)
|
| 184 |
# Junction metrics: for each artifact type,
|
| 185 |
# list of mean metrics
|
| 186 |
metrics_cells: list[str] = []
|
|
|
|
| 366 |
return f"{self.artifact_type.value}.{self.metric_name}"
|
| 367 |
|
| 368 |
|
| 369 |
+
def _bg_for_rank(rank: int, total: int) -> str:
|
| 370 |
+
"""Green (rank 1) → red (last rank) gradient.
|
| 371 |
+
|
| 372 |
+
Mapping: ``rank ∈ [1, total]`` → ``color_traffic_light`` with
|
| 373 |
+
``low_is_good=True`` (low rank = good).
|
| 374 |
+
"""
|
| 375 |
if total <= 1:
|
| 376 |
+
return color_traffic_light(1.0)
|
| 377 |
+
return color_traffic_light(
|
| 378 |
+
float(rank), low_is_good=True, scale_min=1.0, scale_max=float(total),
|
| 379 |
+
)
|
| 380 |
|
| 381 |
|
| 382 |
def build_pipeline_ranking_table_html(
|
|
|
|
| 434 |
rank += 1
|
| 435 |
rank_str = str(rank)
|
| 436 |
value_str = f"{value:.4f}"
|
| 437 |
+
rank_color = _bg_for_rank(rank, n_with_value)
|
| 438 |
parts.append(
|
| 439 |
f'<tr>'
|
| 440 |
f'<td style="padding:.3rem .5rem;text-align:center;'
|
picarones/report/rare_token_recall_render.py
ADDED
|
@@ -0,0 +1,116 @@
|
| 1 |
+
"""HTML rendering of rare-token recall (Sprint 71, A.I.1).
|
| 2 |
+
|
| 3 |
+
Small summary table of engine × {n_rare_tokens, n_recalled,
|
| 4 |
+
recall, n_docs}. Adaptive: returns ``""`` if there is no data.
|
| 5 |
+
|
| 6 |
+
Critical for prosopographic indexing: an OCR engine that systematically
|
| 7 |
+
misses rare proper nouns produces a corpus that is
|
| 8 |
+
unusable for research, even with a respectable overall CER.
|
| 9 |
+
"""
|
| 10 |
+
|
| 11 |
+
from __future__ import annotations
|
| 12 |
+
|
| 13 |
+
from html import escape as _e
|
| 14 |
+
from typing import Optional
|
| 15 |
+
|
| 16 |
+
from picarones.report.render_helpers import color_traffic_light
|
| 17 |
+
|
| 18 |
+
|
| 19 |
+
def build_rare_token_recall_html(
|
| 20 |
+
per_engine: Optional[dict[str, dict]],
|
| 21 |
+
labels: Optional[dict[str, str]] = None,
|
| 22 |
+
) -> str:
|
| 23 |
+
"""Builds the summary table of rare-token recall.
|
| 24 |
+
|
| 25 |
+
Parameters
|
| 26 |
+
----------
|
| 27 |
+
per_engine:
|
| 28 |
+
Output of
|
| 29 |
+
:func:`picarones.report.report_data.extra_metrics.compute_rare_token_recall_per_engine`.
|
| 30 |
+
Dict ``{engine_name: {n_rare_tokens, n_recalled, recall, n_docs, max_freq}}``.
|
| 31 |
+
If ``None`` or empty, returns ``""``.
|
| 32 |
+
labels:
|
| 33 |
+
Optional i18n dict.
|
| 34 |
+
"""
|
| 35 |
+
if not per_engine:
|
| 36 |
+
return ""
|
| 37 |
+
labels = labels or {}
|
| 38 |
+
title = labels.get(
|
| 39 |
+
"rare_token_title", "Recall sur tokens rares (hapax + dis legomena)",
|
| 40 |
+
)
|
| 41 |
+
note = labels.get(
|
| 42 |
+
"rare_token_note",
|
| 43 |
+
"Pour chaque moteur, fraction des tokens rares (apparaissant ≤ 2 "
|
| 44 |
+
"fois dans la GT du corpus) effectivement transcrits. Critique "
|
| 45 |
+
"pour l'indexation prosopographique — un OCR qui rate les noms "
|
| 46 |
+
"propres rares rend le corpus inutilisable pour la recherche.",
|
| 47 |
+
)
|
| 48 |
+
h_engine = labels.get("rare_token_engine", "Moteur")
|
| 49 |
+
h_recall = labels.get("rare_token_recall", "Recall")
|
| 50 |
+
h_recalled = labels.get("rare_token_recalled", "Tokens recalled")
|
| 51 |
+
h_total = labels.get("rare_token_total", "Tokens rares (corpus)")
|
| 52 |
+
h_docs = labels.get("rare_token_docs", "Docs évalués")
|
| 53 |
+
|
| 54 |
+
rows = [
|
| 55 |
+
(engine, info)
|
| 56 |
+
for engine, info in per_engine.items()
|
| 57 |
+
if isinstance(info, dict)
|
| 58 |
+
]
|
| 59 |
+
if not rows:
|
| 60 |
+
return ""
|
| 61 |
+
|
| 62 |
+
parts = [
|
| 63 |
+
'<section class="rare-token-section" style="margin:1rem 0">',
|
| 64 |
+
f'<h3 style="margin:0 0 .3rem 0">{_e(title)}</h3>',
|
| 65 |
+
f'<div style="font-size:.85rem;opacity:.75;margin-bottom:.5rem">'
|
| 66 |
+
f'{_e(note)}</div>',
|
| 67 |
+
'<table style="border-collapse:collapse;width:100%;'
|
| 68 |
+
'font-size:.9rem">',
|
| 69 |
+
'<thead><tr>',
|
| 70 |
+
]
|
| 71 |
+
for h in (h_engine, h_recall, h_recalled, h_total, h_docs):
|
| 72 |
+
parts.append(
|
| 73 |
+
f'<th scope="col" style="padding:.4rem .6rem;text-align:left;'
|
| 74 |
+
f'border-bottom:1px solid #ccc;font-weight:600">{_e(h)}</th>'
|
| 75 |
+
)
|
| 76 |
+
parts.append('</tr></thead><tbody>')
|
| 77 |
+
|
| 78 |
+
# Sort by descending recall (best first, None last).
|
| 79 |
+
sorted_rows = sorted(
|
| 80 |
+
rows,
|
| 81 |
+
key=lambda kv: -(kv[1].get("recall") or -1.0),
|
| 82 |
+
)
|
| 83 |
+
for engine, info in sorted_rows:
|
| 84 |
+
recall = info.get("recall")
|
| 85 |
+
n_recalled = int(info.get("n_recalled") or 0)
|
| 86 |
+
n_total = int(info.get("n_rare_tokens") or 0)
|
| 87 |
+
n_docs = int(info.get("n_docs") or 0)
|
| 88 |
+
if isinstance(recall, (int, float)):
|
| 89 |
+
recall_color = color_traffic_light(float(recall))
|
| 90 |
+
recall_cell = (
|
| 91 |
+
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 92 |
+
f'background:{recall_color};font-family:monospace;'
|
| 93 |
+
f'font-weight:600">{recall * 100:.1f} %</td>'
|
| 94 |
+
)
|
| 95 |
+
else:
|
| 96 |
+
recall_cell = (
|
| 97 |
+
'<td style="padding:.4rem .6rem;text-align:right;'
|
| 98 |
+
'opacity:.4">—</td>'
|
| 99 |
+
)
|
| 100 |
+
parts.append(
|
| 101 |
+
f'<tr>'
|
| 102 |
+
f'<td style="padding:.4rem .6rem">{_e(str(engine))}</td>'
|
| 103 |
+
f'{recall_cell}'
|
| 104 |
+
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 105 |
+
f'font-family:monospace">{n_recalled}</td>'
|
| 106 |
+
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 107 |
+
f'font-family:monospace">{n_total}</td>'
|
| 108 |
+
f'<td style="padding:.4rem .6rem;text-align:right;'
|
| 109 |
+
f'font-family:monospace">{n_docs}</td>'
|
| 110 |
+
f'</tr>'
|
| 111 |
+
)
|
| 112 |
+
parts.append('</tbody></table></section>')
|
| 113 |
+
return "".join(parts)
|
| 114 |
+
|
| 115 |
+
|
| 116 |
+
__all__ = ["build_rare_token_recall_html"]
|
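The table note above defines rare tokens as those appearing at most twice in the ground truth of the corpus, and recall as the fraction of them that the engine actually transcribed. The function that computes this (`compute_rare_token_recall_per_engine`) is not shown in this section, so the following is only a sketch under stated assumptions: whitespace tokenization, exact string matching, and per-document matching of each rare occurrence. The name `rare_token_recall_sketch` is hypothetical; the returned keys mirror those listed in the docstring.

```python
from collections import Counter


def rare_token_recall_sketch(gt_docs, ocr_docs, max_freq=2):
    """Hypothetical recall on tokens seen <= max_freq times in the GT corpus."""
    gt_tokens = [doc.split() for doc in gt_docs]
    # Corpus-level frequencies decide which tokens count as rare.
    corpus_freq = Counter(tok for toks in gt_tokens for tok in toks)
    rare = {tok for tok, n in corpus_freq.items() if n <= max_freq}

    n_rare = 0
    n_recalled = 0
    for toks, ocr in zip(gt_tokens, ocr_docs):
        # Each OCR token can match at most one ground-truth occurrence.
        available = Counter(ocr.split())
        for tok in toks:
            if tok not in rare:
                continue
            n_rare += 1
            if available[tok] > 0:
                available[tok] -= 1
                n_recalled += 1

    return {
        "n_rare_tokens": n_rare,
        "n_recalled": n_recalled,
        "recall": n_recalled / n_rare if n_rare else None,
        "n_docs": len(gt_tokens),
        "max_freq": max_freq,
    }


gt = ["le roi Childebert signe", "le roi signe le acte"]
ocr = ["le roi Childehert signe", "le roi signe le acte"]
print(rare_token_recall_sketch(gt, ocr))
```

In this toy corpus the misread proper noun is the only missed rare token, which illustrates why an engine can show a good overall CER and still lose the names that matter for indexing.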
picarones/report/readability_render.py
CHANGED
|
@@ -25,27 +25,22 @@ from __future__ import annotations
|
|
| 25 |
from html import escape as _e
|
| 26 |
from typing import Optional
|
| 27 |
|
|
|
|
| 28 |
|
| 29 |
-
def _color_for_delta(delta: float) -> str:
|
| 30 |
-
"""Green at the center, orange if over-normalized, blue if under-normalized.
|
| 31 |
|
| 32 |
-
|
|
|
|
|
|
|
| 33 |
"""
|
| 34 |
if abs(delta) <= 1.0:
|
| 35 |
-
return "#a7f0a7" # light green
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
f = -f
|
| 44 |
-
# green → deep blue
|
| 45 |
-
r = int(167 + (90 - 167) * f)
|
| 46 |
-
g = int(240 + (160 - 240) * f)
|
| 47 |
-
b = int(167 + (210 - 167) * f)
|
| 48 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
| 49 |
|
| 50 |
|
| 51 |
def build_readability_summary_html(
|
|
@@ -107,7 +102,7 @@ def build_readability_summary_html(
|
|
| 107 |
over_rate = float(agg.get("over_normalized_rate") or 0.0)
|
| 108 |
n_under = int(agg.get("n_under_normalized") or 0)
|
| 109 |
n_docs = int(agg.get("n_docs") or 0)
|
| 110 |
-
color =
|
| 111 |
parts.append(
|
| 112 |
f'<tr>'
|
| 113 |
f'<td style="padding:.4rem .6rem">{_e(str(name))}</td>'
|
|
|
|
| 25 |
from html import escape as _e
|
| 26 |
from typing import Optional
|
| 27 |
|
| 28 |
+
from picarones.report.render_helpers import color_diverging
|
| 29 |
|
|
|
|
|
|
|
| 30 |
|
| 31 |
+
def _bg_for_flesch_delta(delta: float) -> str:
|
| 32 |
+
"""Green at the center (delta ≈ 0), orange for over-normalization (delta > 0),
|
| 33 |
+
blue for under-normalization (delta < 0). Saturates at ±15 Flesch points.
|
| 34 |
"""
|
| 35 |
if abs(delta) <= 1.0:
|
| 36 |
+
return "#a7f0a7" # neutral light green, indistinguishable from noise
|
| 37 |
+
return color_diverging(
|
| 38 |
+
delta,
|
| 39 |
+
max_abs=15.0,
|
| 40 |
+
neutral_rgb=(167, 240, 167),
|
| 41 |
+
positive_rgb=(220, 140, 60),
|
| 42 |
+
negative_rgb=(90, 160, 210),
|
| 43 |
+
)
|
| 44 |
|
| 45 |
|
| 46 |
def build_readability_summary_html(
|
|
|
|
| 102 |
over_rate = float(agg.get("over_normalized_rate") or 0.0)
|
| 103 |
n_under = int(agg.get("n_under_normalized") or 0)
|
| 104 |
n_docs = int(agg.get("n_docs") or 0)
|
| 105 |
+
color = _bg_for_flesch_delta(delta_mean)
|
| 106 |
parts.append(
|
| 107 |
f'<tr>'
|
| 108 |
f'<td style="padding:.4rem .6rem">{_e(str(name))}</td>'
|
picarones/report/render_helpers.py
ADDED
|
@@ -0,0 +1,422 @@
|
| 1 |
+
"""Shared rendering helpers.
|
| 2 |
+
|
| 3 |
+
Centralizes the coloring functions and the SVG grid builder that
|
| 4 |
+
were previously duplicated in each ``*_render.py``. Before this
|
| 5 |
+
consolidation, the project had 25 different versions of
|
| 6 |
+
``_color_for_*`` (all red/yellow/green or white/color gradients,
|
| 7 |
+
each slightly different) and 2 versions of ``_build_heatmap_svg``
|
| 8 |
+
(class × position matrix). The test
|
| 9 |
+
``tests/architecture/test_render_helpers.py`` measures this duplication
|
| 10 |
+
and blocks its reappearance.
|
| 11 |
+
|
| 12 |
+
API
|
| 13 |
+
---
|
| 14 |
+
- :func:`color_traffic_light` — gradient rouge → jaune → vert. Couvre
|
| 15 |
+
la majorité des cellules du rapport (CER, F1, recall, ECE, deficit,
|
| 16 |
+
drag, CV, etc.). Argument ``low_is_good`` pour inverser la sémantique.
|
| 17 |
+
- :func:`color_single_gradient` — gradient blanc → couleur intense.
|
| 18 |
+
Utilisé pour les heatmaps Jaccard, densité, lexical modernization.
|
| 19 |
+
- :func:`color_diverging` — gradient signé (négatif → neutre → positif).
|
| 20 |
+
Utilisé pour les deltas Flesch, amélioration nette, sur/sous-norm.
|
| 21 |
+
- :func:`text_color_for_bg` — noir ou blanc selon la luminosité du fond.
|
| 22 |
+
- :func:`build_grid_svg` — builder de heatmap SVG paramétré.
|
| 23 |
+
|
| 24 |
+
Conventions de bornes
|
| 25 |
+
---------------------
|
| 26 |
+
Trois conventions de paramétrage cohabitent (par dessein, pas par
|
| 27 |
+
maladresse) :
|
| 28 |
+
|
| 29 |
+
- :func:`color_traffic_light` accepte ``scale_min`` + ``scale_max``
|
| 30 |
+
parce que les cellules concernées (CER, ECE, deficit) peuvent
|
| 31 |
+
démarrer à une borne basse non nulle (rang 1 = vert, ou
|
| 32 |
+
``scale_min=0.30`` pour démarrer le dégradé à partir d'un seuil).
|
| 33 |
+
- :func:`color_single_gradient` accepte ``max_value`` parce que ces
|
| 34 |
+
cellules (Jaccard, densité) sont toujours bornées en bas par 0 —
|
| 35 |
+
pas besoin de ``scale_min``.
|
| 36 |
+
- :func:`color_diverging` accepte ``max_abs`` parce que ces cellules
|
| 37 |
+
(deltas signés) sont symétriques autour de 0 — la borne est la
|
| 38 |
+
même des deux côtés.
|
| 39 |
+
|
| 40 |
+
Le choix des couleurs reflète la sémantique métier :
|
| 41 |
+
|
| 42 |
+
- **Traffic-light** rouge/jaune/vert : convention historique
|
| 43 |
+
largement comprise pour vision trichromate normale. **Compromis
|
| 44 |
+
d'accessibilité accepté** : la confusion rouge/vert affecte ~8 %
|
| 45 |
+
des hommes (deutéranopie/protanopie). Une migration vers la
|
| 46 |
+
palette Okabe-Ito de :mod:`picarones.report.colors` est tracée
|
| 47 |
+
comme dette dans un sprint dédié.
|
| 48 |
+
- **Diverging** bleu/vert/orange par défaut : vert au centre =
|
| 49 |
+
neutre, extrémités opposées sémantiquement, et ces 3 teintes
|
| 50 |
+
restent distinguables en daltonisme deutéranope. Choix retenu
|
| 51 |
+
parce que les cellules diverging sont moins nombreuses et
|
| 52 |
+
qu'on a pu repartir de zéro en les écrivant.
|
| 53 |
+
|
| 54 |
+
Palette
|
| 55 |
+
-------
|
| 56 |
+
Les bornes RGB des dégradés traffic-light sont la moyenne des palettes
|
| 57 |
+
ad hoc qui peuplaient les 25 helpers d'origine. Cohérence visuelle
|
| 58 |
+
unifiée tout en restant proche du rendu antérieur (≤ 10 unités RGB
|
| 59 |
+
d'écart sur la majorité des bornes), pour ne pas casser les tests
|
| 60 |
+
d'intégration HTML existants.
|
| 61 |
+
"""
|
| 62 |
+
|
| 63 |
+
from __future__ import annotations
|
| 64 |
+
|
| 65 |
+
from html import escape as _e
|
| 66 |
+
from typing import Callable, Optional
|
| 67 |
+
|
| 68 |
+
|
# ──────────────────────────────────────────────────────────────────
# Palettes: RGB bounds shared by all gradients.
#
# Editorial choice: we keep the "red → yellow → green" spirit of the
# historical helpers rather than the color-blind-friendly Okabe-Ito
# palette from ``colors.py`` (used for the main badges). Migrating
# table cells to Okabe-Ito would be a dedicated accessibility sprint,
# out of scope for this consolidation.
# ──────────────────────────────────────────────────────────────────
GRADIENT_RED_RGB: tuple[int, int, int] = (220, 100, 100)
GRADIENT_YELLOW_RGB: tuple[int, int, int] = (240, 220, 130)
GRADIENT_GREEN_RGB: tuple[int, int, int] = (130, 200, 130)

#: Target colors for the common single gradients.
GRADIENT_TARGET_BLUE: tuple[int, int, int] = (30, 58, 138)  # Jaccard, specialization
GRADIENT_TARGET_ORANGE: tuple[int, int, int] = (194, 65, 12)  # density, lexical mod.
GRADIENT_TARGET_RED: tuple[int, int, int] = (200, 60, 60)  # inter-engine divergence

#: Target colors for the diverging gradients.
DIVERGING_NEGATIVE_RGB: tuple[int, int, int] = (95, 145, 215)  # blue (under-norm)
DIVERGING_NEUTRAL_RGB: tuple[int, int, int] = (130, 200, 130)  # green (center, OK)
DIVERGING_POSITIVE_RGB: tuple[int, int, int] = (220, 130, 60)  # orange (over-norm)


# ──────────────────────────────────────────────────────────────────
# Internal helpers
# ──────────────────────────────────────────────────────────────────
def _interp(a: int, b: int, t: float) -> int:
    """Linear interpolation clamped to an RGB channel in [0, 255]."""
    return max(0, min(255, int(a + (b - a) * t)))


def _rgb_to_hex(r: int, g: int, b: int) -> str:
    return f"#{r:02x}{g:02x}{b:02x}"


# ──────────────────────────────────────────────────────────────────
# Public API: colors
# ──────────────────────────────────────────────────────────────────
def color_traffic_light(
    value: float,
    *,
    low_is_good: bool = False,
    scale_max: float = 1.0,
    scale_min: float = 0.0,
) -> str:
    """Red → yellow → green gradient proportional to ``value``.

    Parameters
    ----------
    value : float
        Value to color.
    low_is_good : bool, default ``False``
        If ``True``, ``value = scale_min`` → green and ``value = scale_max``
        → red ("lower is better" semantics: ECE, deficit, drag, CV,
        error introduction rate…).
        If ``False`` (default), the reverse applies ("higher is better"
        semantics: F1, recall, correction rate…).
    scale_max : float, default ``1.0``
        Upper bound of the scale. Beyond it, the color saturates.
    scale_min : float, default ``0.0``
        Lower bound of the scale. Below it, the color saturates.

    Returns
    -------
    str
        Hex color in ``#rrggbb`` format.
    """
    span = scale_max - scale_min
    if span <= 0:
        # Degenerate scale: fall back to the midpoint (yellow).
        f = 0.5
    else:
        f = (value - scale_min) / span
    f = max(0.0, min(1.0, f))
    if low_is_good:
        f = 1.0 - f
    if f <= 0.5:
        t = f / 0.5
        r = _interp(GRADIENT_RED_RGB[0], GRADIENT_YELLOW_RGB[0], t)
        g = _interp(GRADIENT_RED_RGB[1], GRADIENT_YELLOW_RGB[1], t)
        b = _interp(GRADIENT_RED_RGB[2], GRADIENT_YELLOW_RGB[2], t)
    else:
        t = (f - 0.5) / 0.5
        r = _interp(GRADIENT_YELLOW_RGB[0], GRADIENT_GREEN_RGB[0], t)
        g = _interp(GRADIENT_YELLOW_RGB[1], GRADIENT_GREEN_RGB[1], t)
        b = _interp(GRADIENT_YELLOW_RGB[2], GRADIENT_GREEN_RGB[2], t)
    return _rgb_to_hex(r, g, b)


def color_single_gradient(
    value: float,
    *,
    end_rgb: tuple[int, int, int],
    max_value: float = 1.0,
    start_rgb: tuple[int, int, int] = (255, 255, 255),
) -> str:
    """Simple ``start_rgb`` → ``end_rgb`` gradient proportional to ``value/max_value``.

    Used for heatmaps that have no "good/bad" semantics, only an
    intensity (Jaccard, occurrence density, lexical modernization
    rate).
    """
    if max_value <= 0:
        f = 0.0
    else:
        f = max(0.0, min(1.0, value / max_value))
    r = _interp(start_rgb[0], end_rgb[0], f)
    g = _interp(start_rgb[1], end_rgb[1], f)
    b = _interp(start_rgb[2], end_rgb[2], f)
    return _rgb_to_hex(r, g, b)


def color_diverging(
    value: float,
    *,
    max_abs: float = 1.0,
    negative_rgb: tuple[int, int, int] = DIVERGING_NEGATIVE_RGB,
    neutral_rgb: tuple[int, int, int] = DIVERGING_NEUTRAL_RGB,
    positive_rgb: tuple[int, int, int] = DIVERGING_POSITIVE_RGB,
) -> str:
    """Signed gradient: ``value < 0`` → ``negative_rgb`` (blue by default),
    ``value ≈ 0`` → ``neutral_rgb`` (green by default),
    ``value > 0`` → ``positive_rgb`` (orange by default).

    Saturates at ``|value| = max_abs``.
    """
    if max_abs <= 0:
        return _rgb_to_hex(*neutral_rgb)
    f = max(-1.0, min(1.0, value / max_abs))
    if f >= 0:
        r = _interp(neutral_rgb[0], positive_rgb[0], f)
        g = _interp(neutral_rgb[1], positive_rgb[1], f)
        b = _interp(neutral_rgb[2], positive_rgb[2], f)
    else:
        t = -f
        r = _interp(neutral_rgb[0], negative_rgb[0], t)
        g = _interp(neutral_rgb[1], negative_rgb[1], t)
        b = _interp(neutral_rgb[2], negative_rgb[2], t)
    return _rgb_to_hex(r, g, b)


def text_color_for_bg(intensity: float, *, threshold: float = 0.55) -> str:
    """Return ``"#fff"`` on a dark background, ``"#222"`` on a light one.

    ``intensity`` ∈ [0, 1]: 0 = light background, 1 = very dark background.
    For single-gradient heatmaps, this is typically the same value as
    the one passed to :func:`color_single_gradient`.
    """
    return "#fff" if intensity > threshold else "#222"


# ──────────────────────────────────────────────────────────────────
# Public API: stepped CER scale (report badges)
# ──────────────────────────────────────────────────────────────────
#
# The report's quality badges (gallery, ranking table) do not use a
# continuous gradient but a discrete 4-step scale calibrated on the
# usual editorial thresholds:
#
#   < 5 %  : green  (ready for direct publication)
#   < 15 % : yellow (light human proofreading)
#   < 30 % : orange (systematic human proofreading)
#   ≥ 30 % : red    (catastrophic, to be redone)
#
# The colors are imported from :mod:`picarones.report.colors`
# (color-blind-friendly Okabe-Ito palette, enabled by default).


def cer_step_color(cer: float) -> str:
    """CSS text color for a CER score, by steps.

    See the scale in the documentation block above.
    """
    from picarones.report.colors import (
        COLOR_GREEN,
        COLOR_ORANGE,
        COLOR_RED,
        COLOR_YELLOW,
    )
    if cer < 0.05:
        return COLOR_GREEN
    if cer < 0.15:
        return COLOR_YELLOW
    if cer < 0.30:
        return COLOR_ORANGE
    return COLOR_RED


def cer_step_bg(cer: float) -> str:
    """CSS background color associated with :func:`cer_step_color`."""
    from picarones.report.colors import (
        BG_GREEN,
        BG_ORANGE,
        BG_RED,
        BG_YELLOW,
    )
    if cer < 0.05:
        return BG_GREEN
    if cer < 0.15:
        return BG_YELLOW
    if cer < 0.30:
        return BG_ORANGE
    return BG_RED


# ──────────────────────────────────────────────────────────────────
# Public API: SVG grid
# ──────────────────────────────────────────────────────────────────
def build_grid_svg(
    *,
    n_rows: int,
    n_cols: int,
    row_label_fn: Callable[[int], str],
    col_label_fn: Callable[[int], str],
    cell_color_fn: Callable[[int, int], str],
    cell_text_fn: Callable[[int, int], Optional[str]] = lambda r, c: None,
    cell_text_color_fn: Callable[[int, int], str] = lambda r, c: "#222",
    cell_w: int = 36,
    cell_h: int = 36,
    label_left: int = 130,
    label_top: int = 80,
    rotate_col_labels: bool = False,
    aria_label: str = "Heatmap",
    x_axis_title: Optional[str] = None,
) -> str:
    """Build a configurable SVG heatmap.

    Common architecture of the two historical ``_build_heatmap_svg``
    functions (taxonomy_cooccurrence and taxonomy_intra_doc), shared here.

    Parameters
    ----------
    n_rows, n_cols : int
        Grid dimensions.
    row_label_fn, col_label_fn : Callable[[int], str]
        Labels for the rows (left) and columns (top).
    cell_color_fn : Callable[[int, int], str]
        Returns the background hex color for cell (row, col).
    cell_text_fn : Callable[[int, int], Optional[str]]
        Text to display in the cell, or ``None`` to display nothing.
    cell_text_color_fn : Callable[[int, int], str]
        Cell text color (typically obtained via
        :func:`text_color_for_bg`).
    cell_w, cell_h : int
        Dimensions of each cell in pixels.
    label_left, label_top : int
        Margins reserved for the labels.
    rotate_col_labels : bool
        If ``True``, column labels are rotated by -45°
        (useful when they are long).
    aria_label : str
        Accessibility label of the SVG.
    x_axis_title : Optional[str]
        Optional title of the horizontal axis, shown at the bottom of the SVG.

    Returns
    -------
    str
        Complete SVG, or ``""`` if the grid is empty.
    """
    if n_rows == 0 or n_cols == 0:
        return ""

    extra_bottom = 30 if x_axis_title else 10
    width = label_left + n_cols * cell_w + 10
    height = label_top + n_rows * cell_h + extra_bottom

    parts: list[str] = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}" '
        f'viewBox="0 0 {width} {height}" '
        f'role="img" aria-label="{_e(aria_label)}">',
    ]

    # Column labels
    for j in range(n_cols):
        cx = label_left + j * cell_w + cell_w // 2
        cy = label_top - 6
        label = _e(col_label_fn(j))
        if rotate_col_labels:
            parts.append(
                f'<text x="{cx}" y="{cy}" '
                f'transform="rotate(-45 {cx} {cy})" '
                f'font-size="11" fill="#333" text-anchor="start">'
                f'{label}</text>'
            )
        else:
            parts.append(
                f'<text x="{cx}" y="{cy}" '
                f'font-size="10" fill="#666" text-anchor="middle">'
                f'{label}</text>'
            )

    # Cells + row labels
    for i in range(n_rows):
        rx = label_left - 6
        ry = label_top + i * cell_h + cell_h // 2 + 4
        parts.append(
            f'<text x="{rx}" y="{ry}" '
            f'font-size="11" fill="#333" text-anchor="end">'
            f'{_e(row_label_fn(i))}</text>'
        )
        for j in range(n_cols):
            x = label_left + j * cell_w
            y = label_top + i * cell_h
            color = cell_color_fn(i, j)
            parts.append(
                f'<rect x="{x}" y="{y}" '
                f'width="{cell_w}" height="{cell_h}" '
                f'fill="{color}" stroke="#ddd" stroke-width="0.5"/>'
            )
            text = cell_text_fn(i, j)
            if text is not None:
                text_color = cell_text_color_fn(i, j)
                parts.append(
                    f'<text x="{x + cell_w // 2}" '
                    f'y="{y + cell_h // 2 + 4}" '
                    f'font-size="10" fill="{text_color}" '
                    f'text-anchor="middle">'
                    f'{_e(text)}</text>'
                )

    if x_axis_title:
        cx_axis = label_left + (n_cols * cell_w) // 2
        cy_axis = height - 6
        parts.append(
            f'<text x="{cx_axis}" y="{cy_axis}" '
            f'font-size="11" fill="#666" text-anchor="middle" '
            f'font-style="italic">'
            f'{_e(x_axis_title)}</text>'
        )

    parts.append("</svg>")
    return "".join(parts)


__all__ = [
    "GRADIENT_RED_RGB",
    "GRADIENT_YELLOW_RGB",
    "GRADIENT_GREEN_RGB",
    "GRADIENT_TARGET_BLUE",
    "GRADIENT_TARGET_ORANGE",
    "GRADIENT_TARGET_RED",
    "DIVERGING_NEGATIVE_RGB",
    "DIVERGING_NEUTRAL_RGB",
    "DIVERGING_POSITIVE_RGB",
    "cer_step_color",
    "cer_step_bg",
    "color_traffic_light",
    "color_single_gradient",
    "color_diverging",
    "text_color_for_bg",
    "build_grid_svg",
]
|
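As a quick illustration of the traffic-light convention documented in this file, the sketch below re-implements the piecewise red → yellow → green interpolation outside the package. The three RGB bounds are copied from the diff; the names `traffic_light_sketch` and `_lerp` are hypothetical, and this is only a minimal approximation under those assumptions, not the project's module.

```python
# Sketch only: mirrors the red → yellow → green interpolation shown in the diff.
# `traffic_light_sketch` and `_lerp` are hypothetical names, not picarones API.
RED = (220, 100, 100)
YELLOW = (240, 220, 130)
GREEN = (130, 200, 130)


def _lerp(a, b, t):
    """Interpolate one RGB channel and clamp it to [0, 255]."""
    return max(0, min(255, int(a + (b - a) * t)))


def traffic_light_sketch(value, *, low_is_good=False, scale_min=0.0, scale_max=1.0):
    span = scale_max - scale_min
    # Degenerate scale: fall back to the midpoint of the gradient.
    f = 0.5 if span <= 0 else (value - scale_min) / span
    f = max(0.0, min(1.0, f))  # saturate outside [scale_min, scale_max]
    if low_is_good:
        f = 1.0 - f
    if f <= 0.5:
        lo, hi, t = RED, YELLOW, f / 0.5
    else:
        lo, hi, t = YELLOW, GREEN, (f - 0.5) / 0.5
    r, g, b = (_lerp(x, y, t) for x, y in zip(lo, hi))
    return f"#{r:02x}{g:02x}{b:02x}"


# A "lower is better" metric at its minimum lands on the green bound.
print(traffic_light_sketch(0.0, low_is_good=True))  # "#82c882"
# The midpoint of the scale lands on the yellow bound.
print(traffic_light_sketch(0.5))                    # "#f0dc82"
```

Passing `scale_min=0.30` in this sketch keeps every value at or below 0.30 on the first color of the gradient, which is the thresholding behaviour the module docstring describes.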
picarones/report/report_data/__init__.py
ADDED
|
@@ -0,0 +1,132 @@
"""Builds the data dict consumed by the Jinja template.

Before the split, ``picarones.report.generator._build_report_data``
took 463 lines to turn a :class:`BenchmarkResult` into a Jinja-ready
dict. Sprint after sprint, that function piled up independent blocks:
engines, documents, statistics, scatter plots, Pareto front, etc.

This subpackage splits the construction into thematic modules:

- :mod:`engines`: per-engine summary (``engines_summary``).
- :mod:`documents`: gallery view + detail view + Sprint 7 difficulty.
- :mod:`statistics`: Wilcoxon, Friedman, Nemenyi, bootstrap CIs,
  reliability curves, Venn, error clusters, correlations.
- :mod:`scatter`: Gini vs CER, ratio vs anchor (Sprint 10).
- :mod:`pareto`: 3 Pareto fronts + pricing metadata (Sprint 19).
  Exposes two separate functions: :func:`attach_engine_costs`
  (mutating) and :func:`build_pareto_section` (pure).

The public API :func:`build_report_data` orchestrates these modules in
the right order. The two-step Pareto sequence
(``attach_engine_costs`` → ``build_pareto_section``) makes the
mutation explicit: the subpackage's ``build_*`` functions are pure,
except ``attach_engine_costs``, whose name says so.
"""

from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from picarones.core.results import BenchmarkResult

from picarones.report.report_data.documents import (
    annotate_documents_with_difficulty,
    build_documents,
)
from picarones.report.report_data.engines import build_engines_summary
from picarones.report.report_data.extra_metrics import (
    compute_marginal_cost_section,
    compute_rare_token_recall_per_engine,
    compute_taxonomy_cooccurrence_section,
    compute_taxonomy_intra_doc_section,
)
from picarones.report.report_data.pareto import (
    attach_engine_costs,
    build_pareto_section,
)
from picarones.report.report_data.scatter import (
    build_gini_vs_cer,
    build_ratio_vs_anchor,
)
from picarones.report.report_data.statistics import (
    build_bootstrap_cis,
    build_correlation_per_engine,
    build_error_clusters,
    build_friedman_and_nemenyi,
    build_pairwise_wilcoxon,
    build_reliability_curves,
    build_venn_data,
)


def build_report_data(
    benchmark: "BenchmarkResult", images_b64: dict[str, str],
) -> dict:
    """Turn a :class:`BenchmarkResult` into a dict for the HTML report.

    Critical order:

    1. Build ``engines_summary`` (pure).
    2. Build ``documents``, then annotate them with difficulty (mutates
       ``documents``).
    3. **Attach** the costs to ``engines_summary`` (mutating, explicit
       name).
    4. **Build** the Pareto block (pure, reads the attached costs).
    """
    engines_summary = build_engines_summary(benchmark)
    documents = build_documents(benchmark, images_b64)
    annotate_documents_with_difficulty(benchmark, documents)

    attach_engine_costs(engines_summary, benchmark)
    pareto_data = build_pareto_section(engines_summary)

    return {
        "meta": {
            "corpus_name": benchmark.corpus_name,
            "corpus_source": benchmark.corpus_source,
            "document_count": benchmark.document_count,
            "run_date": benchmark.run_date,
            "picarones_version": benchmark.picarones_version,
            "metadata": benchmark.metadata,
        },
        "ranking": benchmark.ranking(),
        "engines": engines_summary,
        "documents": documents,
        # Sprint 7
        "statistics": {
            "pairwise_wilcoxon": build_pairwise_wilcoxon(benchmark),
            "bootstrap_cis": build_bootstrap_cis(benchmark),
            **build_friedman_and_nemenyi(benchmark),
        },
        "reliability_curves": build_reliability_curves(benchmark),
        "venn_data": build_venn_data(benchmark),
        "error_clusters": build_error_clusters(benchmark),
        "correlation_per_engine": build_correlation_per_engine(benchmark),
        # Sprint 10
        "gini_vs_cer": build_gini_vs_cer(benchmark),
        "ratio_vs_anchor": build_ratio_vs_anchor(benchmark),
        # Sprint 19: cost/quality Pareto view with axis variants
        "pareto": pareto_data,
        # Sprint 36: inter-engine analysis (taxonomic divergence +
        # complementarity / oracle). ``None`` if fewer than 2 engines.
        "inter_engine_analysis": benchmark.inter_engine_analysis,
        # Sprint 45-46: stratification by script_type
        "available_strata": benchmark.available_strata(),
        "stratified_ranking": benchmark.stratified_ranking() or None,
        "corpus_homogeneity": benchmark.corpus_homogeneity(),
        # "Wiring the test-only modules" sprint (May 2026): corpus-wide
        # metrics that had not been surfaced in the report until then.
        # Sprint 71 (A.I.1): recall on rare tokens (hapax + dis legomena).
        "rare_token_recall": compute_rare_token_recall_per_engine(benchmark),
        # Sprint 75 (A.I.4): inter-class taxonomic co-occurrence.
        "taxonomy_cooccurrence": compute_taxonomy_cooccurrence_section(benchmark),
        # Sprint 76 (A.I.4): class × position heatmap (intra-document).
        "taxonomy_intra_doc": compute_taxonomy_intra_doc_section(benchmark),
        # Sprint 91 (A.II.6): marginal cost matrix between engine pairs.
        "marginal_cost": compute_marginal_cost_section(engines_summary),
    }


__all__ = ["build_report_data"]
|
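The ordering constraint in `build_report_data` (a mutating "attach" step, then a pure "build" step that reads what was attached) can be illustrated with a toy version. Everything below is hypothetical: `attach_costs_sketch`, `pareto_front_sketch`, the `cost` and `cer` keys, and the price table are stand-ins chosen for illustration. They are not the signatures or data of `attach_engine_costs` and `build_pareto_section`, whose bodies are not shown in this diff.

```python
# Toy illustration of "mutating attach, then pure build".
# All names and numbers are hypothetical; this is not the picarones API.
def attach_costs_sketch(engines, prices):
    """Mutate each engine summary in place by adding a ``cost`` key."""
    for engine in engines:
        engine["cost"] = prices.get(engine["name"])


def pareto_front_sketch(engines):
    """Pure: return the names of engines not dominated on (cost, cer)."""
    priced = [e for e in engines if e.get("cost") is not None]
    front = []
    for e in priced:
        dominated = any(
            o["cost"] <= e["cost"]
            and o["cer"] <= e["cer"]
            and (o["cost"] < e["cost"] or o["cer"] < e["cer"])
            for o in priced
        )
        if not dominated:
            front.append(e["name"])
    return front


engines = [
    {"name": "a", "cer": 0.04},
    {"name": "b", "cer": 0.10},
    {"name": "c", "cer": 0.12},
]

# Calling the pure step first finds no priced engine, hence an empty front.
print(pareto_front_sketch(engines))  # []

attach_costs_sketch(engines, {"a": 5.0, "b": 1.0, "c": 2.0})
front = pareto_front_sketch(engines)
print(front)  # ['a', 'b']  ("c" is both costlier and less accurate than "b")
```

Keeping the mutation in one explicitly named function, as the docstring above argues, means a reader can tell at the call site which step changes `engines_summary` and which steps only read it.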
picarones/report/report_data/_helpers.py
ADDED
|
@@ -0,0 +1,30 @@
"""Numeric helpers internal to the report_data subpackage.

Small utility functions shared by all section builders (engines,
documents, statistics, scatter, pareto). Do not import them from
outside the subpackage: these helpers are specific to the conventions
of the JSON dict consumed by the template.
"""

from __future__ import annotations

from typing import Optional


def safe_round(v: Optional[float], decimals: int = 4) -> float:
    """Round an optional float; ``None`` becomes ``0.0``."""
    return round(v or 0.0, decimals)


def percent_string(v: Optional[float], decimals: int = 2) -> str:
    """Format a ratio in [0, 1] as a percentage string: ``0.4723 → "47.23 %"``.

    ``None`` → ``"—"``. Kept for backward compatibility with possible
    external callers (historical Sprint 7).
    """
    if v is None:
        return "—"
    return f"{v * 100:.{decimals}f} %"


__all__ = ["safe_round", "percent_string"]
|
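To make the `None` conventions of these two helpers concrete, the snippet below restates them in standalone form and calls them on a few values. It is a sketch that assumes the helper bodies shown in the diff are complete; the sample inputs are made up for illustration.

```python
from typing import Optional


def safe_round(v: Optional[float], decimals: int = 4) -> float:
    # `v or 0.0` maps None (and 0.0 itself) to 0.0 before rounding.
    return round(v or 0.0, decimals)


def percent_string(v: Optional[float], decimals: int = 2) -> str:
    if v is None:
        return "\u2014"  # same placeholder character as in the diff
    return f"{v * 100:.{decimals}f} %"


print(safe_round(None))        # 0.0
print(safe_round(0.123456))    # 0.1235
print(percent_string(0.4723))  # 47.23 %
print(percent_string(0.5, 0))  # 50 %
```

One consequence worth keeping in mind when reading the report data: after `safe_round`, a missing score and a true zero are indistinguishable, whereas `percent_string` keeps them apart by returning a placeholder for `None`.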
picarones/report/report_data/documents.py
ADDED
|
@@ -0,0 +1,167 @@
"""Builds the ``documents`` list (gallery view + detail view).

For each document in the corpus, aggregates the hypotheses of all
engines with their metrics, the character-by-character diff, and the
fields specific to OCR+LLM pipelines (intermediate output, mode,
over-normalization).

:func:`annotate_documents_with_difficulty` then enriches each
document with its intrinsic difficulty score (Sprint 7).
"""

from __future__ import annotations

from typing import TYPE_CHECKING

from picarones.core.diff_utils import compute_char_diff, compute_word_diff
from picarones.measurements.difficulty import (
    compute_all_difficulties,
    difficulty_label,
)
from picarones.report.report_data._helpers import safe_round

if TYPE_CHECKING:
    from picarones.core.results import BenchmarkResult


def build_documents(
    benchmark: "BenchmarkResult", images_b64: dict[str, str],
) -> list[dict]:
    """Return the ordered list of documents ready for the template.

    Document order preserves the order of first appearance (first
    engine first, then additions from the following engines when some
    documents are not covered by every engine).
    """
    seen_doc_ids: set[str] = set()
    doc_ids_ordered: list[str] = []
    for report in benchmark.engine_reports:
        for dr in report.document_results:
            if dr.doc_id not in seen_doc_ids:
                seen_doc_ids.add(dr.doc_id)
                doc_ids_ordered.append(dr.doc_id)

    # Cross index: doc_id → {engine_name → DocumentResult}
    doc_engine_map: dict[str, dict] = {did: {} for did in doc_ids_ordered}
    for report in benchmark.engine_reports:
        for dr in report.document_results:
            doc_engine_map.setdefault(dr.doc_id, {})[report.engine_name] = dr

    documents: list[dict] = []
    engine_names = [r.engine_name for r in benchmark.engine_reports]
    for doc_id in doc_ids_ordered:
        engine_results: list[dict] = []
        gt = ""
        image_path = ""
        for engine_name in engine_names:
            dr = doc_engine_map[doc_id].get(engine_name)
            if dr is None:
                continue
            gt = dr.ground_truth
            image_path = dr.image_path
            er_entry = _build_engine_result_entry(engine_name, dr)
            engine_results.append(er_entry)

        # Mean CER on this document (for the gallery badge)
        cer_values = [er["cer"] for er in engine_results if er["error"] is None]
        mean_cer = sum(cer_values) / len(cer_values) if cer_values else 1.0
        best_engine = min(engine_results, key=lambda x: x["cer"], default=None)

        # Script type (from per-document metadata if available)
        script_type = ""
        first_engine = engine_names[0] if engine_names else None
        first_dr = doc_engine_map[doc_id].get(first_engine)
        if first_dr and first_dr.image_quality:
            script_type = first_dr.image_quality.get("script_type", "")

        documents.append({
            "doc_id": doc_id,
            "image_path": image_path,
            "image_b64": images_b64.get(doc_id, ""),
            "ground_truth": gt,
            "mean_cer": safe_round(mean_cer),
            "best_engine": best_engine["engine"] if best_engine else "",
            "engine_results": engine_results,
            "script_type": script_type,
        })
    return documents


def _build_engine_result_entry(engine_name: str, dr) -> dict:
    """Build one engine entry for a given document (extracted for readability)."""
    diff_ops = compute_char_diff(dr.ground_truth, dr.hypothesis)
    er_entry: dict = {
        "engine": engine_name,
        "hypothesis": dr.hypothesis,
        "cer": safe_round(dr.metrics.cer),
        "cer_diplomatic": safe_round(dr.metrics.cer_diplomatic) if dr.metrics.cer_diplomatic is not None else None,
        "wer": safe_round(dr.metrics.wer),
        "mer": safe_round(dr.metrics.mer),
        "wil": safe_round(dr.metrics.wil),
        "duration": dr.duration_seconds,
        "error": dr.engine_error,
        "diff": diff_ops,
    }
    # Fields specific to OCR+LLM pipelines
    if dr.ocr_intermediate is not None:
        er_entry["ocr_intermediate"] = dr.ocr_intermediate
        er_entry["ocr_diff"] = compute_word_diff(dr.ground_truth, dr.ocr_intermediate)
        er_entry["llm_correction_diff"] = compute_word_diff(dr.ocr_intermediate, dr.hypothesis)
    if dr.pipeline_metadata:
        on = dr.pipeline_metadata.get("over_normalization")
        if on is not None:
            er_entry["over_normalization"] = on
        er_entry["pipeline_mode"] = dr.pipeline_metadata.get("pipeline_mode")
    # Sprint 5: advanced per-document metrics
    if dr.char_scores is not None:
        er_entry["ligature_score"] = safe_round(dr.char_scores.get("ligature", {}).get("score"))
        er_entry["diacritic_score"] = safe_round(dr.char_scores.get("diacritic", {}).get("score"))
    if dr.taxonomy is not None:
        er_entry["taxonomy"] = dr.taxonomy
    if dr.structure is not None:
        er_entry["structure"] = dr.structure
    if dr.image_quality is not None:
        er_entry["image_quality"] = dr.image_quality
    # Sprint 10
    if dr.line_metrics is not None:
        er_entry["line_metrics"] = dr.line_metrics
    if dr.hallucination_metrics is not None:
        er_entry["hallucination_metrics"] = dr.hallucination_metrics
    return er_entry


+def annotate_documents_with_difficulty(
+    benchmark: "BenchmarkResult", documents: list[dict],
+) -> None:
+    """Annotate each document dict with its difficulty score (Sprint 7).
+
+    Modifies ``documents`` in place. The default values ``0.5`` /
+    ``"Modéré"`` are assigned when the difficulty could not be
+    computed (for example, a degenerate corpus).
+    """
+    doc_ids_ordered = [d["doc_id"] for d in documents]
+    gt_map = {d["doc_id"]: d["ground_truth"] for d in documents}
+    cer_map: dict[str, dict[str, float]] = {d["doc_id"]: {} for d in documents}
+    iq_map: dict[str, float] = {}
+    for report in benchmark.engine_reports:
+        for dr in report.document_results:
+            cer_map.setdefault(dr.doc_id, {})[report.engine_name] = safe_round(dr.metrics.cer)
+            if dr.image_quality and "quality_score" in dr.image_quality:
+                iq_map[dr.doc_id] = dr.image_quality["quality_score"]
+    difficulty_scores = compute_all_difficulties(
+        doc_ids=doc_ids_ordered,
+        ground_truths=gt_map,
+        cer_map=cer_map,
+        image_quality_map=iq_map or None,
+    )
+    for doc in documents:
+        ds = difficulty_scores.get(doc["doc_id"])
+        if ds:
+            doc["difficulty_score"] = safe_round(ds.score)
+            doc["difficulty_label"] = difficulty_label(ds.score)
+        else:
+            doc["difficulty_score"] = 0.5
+            doc["difficulty_label"] = "Modéré"
+
+
+__all__ = ["build_documents", "annotate_documents_with_difficulty"]
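Review note: the per-document aggregation in `build_documents` can be exercised in isolation. The sketch below is a minimal reproduction under stated assumptions: `summarize_document` and the sample entries are hypothetical names and data, not part of the commit, and rounding via `safe_round` is left out because that helper is not shown in this diff.

```python
def summarize_document(engine_results: list[dict]) -> dict:
    """Hypothetical helper mirroring the per-document aggregation in the diff."""
    # Only engines that ran without error contribute to the mean.
    cer_values = [er["cer"] for er in engine_results if er["error"] is None]
    # Fall back to a CER of 1.0 when no engine produced a usable result.
    mean_cer = sum(cer_values) / len(cer_values) if cer_values else 1.0
    # As in the diff, the minimum is taken over all entries, errored or not.
    best = min(engine_results, key=lambda er: er["cer"], default=None)
    return {"mean_cer": mean_cer, "best_engine": best["engine"] if best else ""}


# Hypothetical engine entries for a single document.
results = [
    {"engine": "engine_a", "cer": 0.25, "error": None},
    {"engine": "engine_b", "cer": 0.75, "error": None},
    {"engine": "engine_c", "cer": 1.0, "error": "timeout"},
]
summary = summarize_document(results)
empty_summary = summarize_document([])
```

Note that the mean excludes errored engines while the `min` call does not, so an errored entry with a low recorded CER could still be reported as the best engine.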
picarones/report/report_data/engines.py
ADDED
|
@@ -0,0 +1,103 @@
+"""Build the per-engine summary (``engines_summary``).
+
+For each ``EngineReport``, collects aggregated metrics (CER, WER,
+MER, WIL), the CER distribution for the histogram, advanced
+heritage metrics (Sprint 5), the error distribution (Sprint 10), NER
+(Sprint 41), calibration (Sprint 43), the philological profile (Sprint
+62), searchability + numerical sequences (Sprint 86), readability
+(Sprint 87), and OCR+LLM pipeline indicators.
+
+Costs (mean duration, price per 1k pages, CO₂) are added
+later by :mod:`picarones.report.report_data.pareto`, which
+needs them to compute the Pareto fronts.
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+from picarones.report.report_data._helpers import safe_round
+
+if TYPE_CHECKING:
+    from picarones.core.results import BenchmarkResult
+
+
+def build_engines_summary(benchmark: "BenchmarkResult") -> list[dict]:
+    """Return the list of engine dicts, one entry per ``EngineReport``."""
+    engines_summary: list[dict] = []
+    for report in benchmark.engine_reports:
+        agg = report.aggregated_metrics
+        diplo_agg = agg.get("cer_diplomatic", {})
+
+        line_metrics = report.aggregated_line_metrics
+        halluc = report.aggregated_hallucination
+
+        entry: dict = {
+            "name": report.engine_name,
+            "version": report.engine_version,
+            "cer": safe_round(agg.get("cer", {}).get("mean")),
+            "wer": safe_round(agg.get("wer", {}).get("mean")),
+            "mer": safe_round(agg.get("mer", {}).get("mean")),
+            "wil": safe_round(agg.get("wil", {}).get("mean")),
+            "cer_median": safe_round(agg.get("cer", {}).get("median")),
+            "cer_min": safe_round(agg.get("cer", {}).get("min")),
+            "cer_max": safe_round(agg.get("cer", {}).get("max")),
+            "doc_count": agg.get("document_count", 0),
+            "failed": agg.get("failed_count", 0),
+            # Diplomatic CER (after historical normalization: ſ=s, u=v, i=j…)
+            "cer_diplomatic": safe_round(diplo_agg.get("mean")) if diplo_agg else None,
+            "cer_diplomatic_profile": diplo_agg.get("profile"),
+            # Distribution for the histogram: list of per-document CER values
+            "cer_values": [
+                safe_round(dr.metrics.cer)
+                for dr in report.document_results
+                if dr.metrics.error is None
+            ],
+            "cer_diplomatic_values": [
+                safe_round(dr.metrics.cer_diplomatic)
+                for dr in report.document_results
+                if dr.metrics.error is None and dr.metrics.cer_diplomatic is not None
+            ],
+            # OCR+LLM pipeline fields (empty for OCR-only engines)
+            "is_pipeline": report.is_pipeline,
+            "pipeline_info": report.pipeline_info,
+            # Sprint 5: advanced heritage metrics
+            "ligature_score": safe_round(report.ligature_score) if report.ligature_score is not None else None,
+            "diacritic_score": safe_round(report.diacritic_score) if report.diacritic_score is not None else None,
+            "aggregated_confusion": report.aggregated_confusion,
+            "aggregated_taxonomy": report.aggregated_taxonomy,
+            "aggregated_structure": report.aggregated_structure,
+            "aggregated_image_quality": report.aggregated_image_quality,
+            # Sprint 10: error distribution + VLM hallucinations
+            "gini": safe_round(line_metrics.get("gini_mean")) if line_metrics else None,
+            "cer_p90": safe_round(line_metrics.get("percentiles", {}).get("p90")) if line_metrics else None,
+            "cer_p99": safe_round(line_metrics.get("percentiles", {}).get("p99")) if line_metrics else None,
+            "catastrophic_rate_30": safe_round(line_metrics.get("catastrophic_rate", {}).get("0.3")) if line_metrics else None,
+            "aggregated_line_metrics": line_metrics,
+            "anchor_score": safe_round(halluc.get("anchor_score_mean")) if halluc else None,
+            "length_ratio": safe_round(halluc.get("length_ratio_mean")) if halluc else None,
+            "hallucinating_doc_rate": safe_round(halluc.get("hallucinating_doc_rate")) if halluc else None,
+            "aggregated_hallucination": halluc,
+            # Sprint 41: aggregated NER (None if nothing was computed)
+            "aggregated_ner": report.aggregated_ner,
+            # Sprint 43: aggregated calibration (None if the engine exposed
+            # no confidence values on this corpus)
+            "aggregated_calibration": report.aggregated_calibration,
+            # Sprint 62: aggregated philological profile (None if there is no
+            # philological signal in the corpus for this engine)
+            "aggregated_philological": report.aggregated_philological,
+            # Sprint 86: A.II.5 (fuzzy searchability + numerical
+            # sequences). None if no document has a signal.
+            "aggregated_searchability": report.aggregated_searchability,
+            "aggregated_numerical_sequences": (
+                report.aggregated_numerical_sequences
+            ),
+            # Sprint 87: A.II.2 (aggregated Flesch delta)
+            "aggregated_readability": report.aggregated_readability,
+            "is_vlm": report.pipeline_info.get("is_vlm", False) if report.pipeline_info else False,
+        }
+        engines_summary.append(entry)
+    return engines_summary
+
+
+__all__ = ["build_engines_summary"]
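Review note: `build_engines_summary` relies on chained `.get(..., {})` lookups so that a metric missing from `aggregated_metrics` yields `None` rather than raising `KeyError`. The sketch below illustrates that pattern only. The `safe_round` defined here is a hypothetical stand-in that passes `None` through (the real helper lives in `picarones.report.report_data._helpers` and is not shown in this diff), and `summarize_metric` and the sample dict are likewise assumptions.

```python
from typing import Optional


def safe_round(value: Optional[float], ndigits: int = 4) -> Optional[float]:
    """Hypothetical stand-in: round a number and pass None through unchanged."""
    return None if value is None else round(value, ndigits)


def summarize_metric(agg: dict, metric: str) -> dict:
    """Extract the mean and median of one metric from a nested aggregate dict."""
    # A missing metric falls back to an empty dict, so the inner .get() returns None.
    stats = agg.get(metric, {})
    return {
        metric: safe_round(stats.get("mean")),
        f"{metric}_median": safe_round(stats.get("median")),
    }


# Hypothetical aggregate: CER is present, WER was never computed.
agg = {"cer": {"mean": 0.123456, "median": 0.1}}
cer_summary = summarize_metric(agg, "cer")
wer_summary = summarize_metric(agg, "wer")
```

With this stand-in, the absent `wer` metric produces `None` values, which is consistent with how the summary dict in the diff passes possibly missing means straight to `safe_round`.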