refactor(measurements): split runner.py (1019 → 6 sub-modules) by concern
Sprint "splitting runner.py": the last god-module in the repo. The
monolithic 1019-line file mixed orchestration, per-document
computation, picklable workers, NDJSON persistence, aggregations and
NER wiring. It is now split by concern into a sub-package,
``picarones/measurements/runner/``.
Structure of the new sub-package:
- ``__init__.py`` (103 lines): 100% backward-compatible re-exports.
- ``orchestration.py`` (494): ``run_benchmark`` (main loop, pools,
  per-engine aggregation) + ``_build_pipeline_info``.
- ``document.py`` (190): ``_compute_document_result`` (all metrics +
  hooks via ``run_document_hooks``) +
  ``_calibration_from_engine_result`` + timeout/error helpers.
- ``partial.py`` (140): NDJSON persistence of partial results (lock,
  sanitize, write/read/delete).
- ``ner_attach.py`` (133): post-process NER wiring (Sprint 40) + NER
  aggregation.
- ``workers.py`` (107): module-level functions for
  ``ProcessPoolExecutor`` (``_cpu_doc_worker``) and
  ``ThreadPoolExecutor`` (``_io_doc_worker``).
- ``aggregation.py`` (82): 8 backward-compatible delegations to
  ``builtin_hooks._aggregate_*`` (workstream 2, post-Sprint 97).
The largest sub-module (``orchestration.py``) stays at 494 lines,
under the calibrated budget of 575 (current + ~15%).
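The NDJSON persistence attributed to ``partial.py`` above can be pictured with a minimal sketch. Only the responsibilities (lock, write/read/delete) come from the list; the function names, signatures and the tolerance for a truncated last line are assumptions made for illustration, and the real module may differ.

```python
import json
import threading
from pathlib import Path

# Hypothetical stand-in for the shared lock: serializes appends across threads.
_write_lock = threading.Lock()


def save_partial_line(path: Path, record: dict) -> None:
    """Append one JSON object per line (NDJSON) while holding the lock."""
    line = json.dumps(record, ensure_ascii=False)
    with _write_lock:
        with path.open("a", encoding="utf-8") as fh:
            fh.write(line + "\n")


def load_partial(path: Path) -> list[dict]:
    """Read back every parsable line; ignore a damaged (interrupted) line."""
    if not path.exists():
        return []
    records = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        try:
            records.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # assumption: a half-written line is simply skipped
    return records


def delete_partial(path: Path) -> None:
    """Remove the partial file once the run has completed."""
    path.unlink(missing_ok=True)
```

One line per document keeps each write small and append-only, which is what makes resuming after an interruption cheap in this sketch.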
Backward compatibility
----------------------
The ~25 files that do ``from picarones.measurements.runner
import X`` keep working unchanged thanks to the re-exports in
``__init__.py``. Private symbols are re-exported for the tests that
consume them directly: ``_compute_document_result``
(test_sprint13), ``_calibration_from_engine_result`` (test_sprint42),
``_aggregate_*`` (test_sprint13/42), ``_attach_ner_metrics`` +
``_aggregate_ner`` (test_sprint40), ``_cpu_doc_worker`` /
``_io_doc_worker`` (test_sprint13), ``_save_partial_line`` +
``_load_partial`` (test_sprint13).
Git history is preserved via ``git mv runner.py
runner/__init__.py``, followed by a Write of the new contents.
Recursive audit discipline (2 rounds)
-------------------------------------
**Round 1** (Explore agent targeted at 8 angles: public API, circular
imports, cross-references, shared lock, side effects, docstrings,
dead code, monkey-patching):
- 0 critical, 0 major. 2 minor findings, both false positives
  (``_aggregate_ner`` does live in ``ner_attach.py``, for semantic
  coherence; ``Optional`` is actually used in all 8 signatures).
**Round 2** (manual out-of-spec audit on 7 angles: boundary coherence,
``__module__`` introspection, logger naming, relative/absolute imports,
private symbols exposed vs consumed, comparative size, parity with
``statistics/``):
- ``test_sprint40`` uses
  ``caplog.at_level(logger="picarones.measurements.runner")``.
  Verified: Python logger propagation forwards the records from
  ``runner.ner_attach`` up to the parent. Test passes.
- ``test_sprint13`` uses ``__import__(...)._compute_document_result``.
  Verified: the re-export makes the introspection transparent. Test
  passes.
- All inter-module imports are absolute (consistent with
  ``statistics/`` and the rest of the project).
- 6 private symbols are exposed with no test consumer
  (``_partial_write_lock``, ``_make_*_doc_result``, ``_delete_partial``,
  ``_sanitize_filename``, ``_build_pipeline_info``): kept for strict
  backward compatibility (they were public attributes of ``runner.py``).
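The logger-propagation point from Round 2 can be reproduced in isolation. This is a simplified sketch, not the project's test: it attaches a collecting handler to the parent logger instead of using pytest's ``caplog``, and the ``_Collect`` class is invented for the demonstration.

```python
import logging

records: list[logging.LogRecord] = []


class _Collect(logging.Handler):
    """Hypothetical handler that just stores the records it receives."""

    def emit(self, record: logging.LogRecord) -> None:
        records.append(record)


# Handler and level are set on the parent only.
parent = logging.getLogger("picarones.measurements.runner")
parent.addHandler(_Collect())
parent.setLevel(logging.WARNING)

# The child has no handler and no level of its own: it inherits the
# effective level from the parent and propagates its records upward.
child = logging.getLogger("picarones.measurements.runner.ner_attach")
child.warning("degraded")

# The record reaches the parent's handler but keeps the child's name.
assert [r.name for r in records] == ["picarones.measurements.runner.ner_attach"]
```

Because the dotted logger names form a hierarchy, moving code into a sub-module changes ``record.name`` but not which ancestor handlers see the record.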
Invariant calibration
---------------------
- ``BROKEN_PATHS_BASELINE``: 72 → 73. A historical audit
  (``docs/audits/institutional-readiness-2026-05.md``) references
  ``picarones/measurements/runner.py``, which has become a sub-package.
  Convention: historical audits are untouchable, so the baseline is
  raised with a justification.
- ``FILE_BUDGETS``: the ``runner.py`` entry is removed (the file no
  longer exists) and ``runner/orchestration.py: 575`` is added
  (current 494 + ~15%).
- ``writing-a-pipeline-module.md`` line 353: ``runner.py`` →
  ``runner/``, fixed in place (living doc).
Test ``test_partial_file_created_during_run`` fixed:
``patch.object(runner_mod, "_save_partial_line")`` was no longer
enough because ``orchestration.py`` imports ``_save_partial_line``
directly from ``partial.py``. The patch now targets
``orchestration._save_partial_line`` instead.
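Why the patch target had to move can be shown with a throwaway package built in memory. The ``demo_pkg`` modules and the ``save`` / ``run`` names are hypothetical stand-ins for the package, ``partial.py`` and ``orchestration.py``; only the name-binding behaviour is the point.

```python
import types
from unittest import mock

# demo_pkg/partial.py: defines the real function.
partial = types.ModuleType("demo_pkg.partial")
exec("def save(line):\n    return 'real'", partial.__dict__)

# demo_pkg/orchestration.py: equivalent of "from demo_pkg.partial import save",
# which binds the name ``save`` in orchestration's own namespace.
orchestration = types.ModuleType("demo_pkg.orchestration")
orchestration.save = partial.save
exec("def run():\n    return save('x')", orchestration.__dict__)

# demo_pkg/__init__.py: re-exports both names.
pkg = types.ModuleType("demo_pkg")
pkg.save = partial.save
pkg.run = orchestration.run

# Patching the re-export only rebinds ``pkg.save``; ``run`` never looks there.
with mock.patch.object(pkg, "save", return_value="patched"):
    via_package = pkg.run()

# Patching the name in the module where it is looked up takes effect.
with mock.patch.object(orchestration, "save", return_value="patched"):
    via_submodule = pkg.run()

assert via_package == "real"
assert via_submodule == "patched"
```

A function resolves global names in the module where it was defined, so a mock has to replace the binding in that module, not a re-export elsewhere.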
Test suite: 3865 passed, 2 skipped (parity). ruff: All checks passed!
https://claude.ai/code/session_018us43uphCvqwm2TARqyYoH
- README.md +1 -1
- docs/user/writing-a-pipeline-module.md +1 -1
- picarones/measurements/runner/__init__.py +103 -0
- picarones/measurements/runner/aggregation.py +82 -0
- picarones/measurements/runner/document.py +190 -0
- picarones/measurements/runner/ner_attach.py +133 -0
- picarones/measurements/{runner.py → runner/orchestration.py} +33 -558
- picarones/measurements/runner/partial.py +140 -0
- picarones/measurements/runner/workers.py +107 -0
- tests/architecture/test_doc_paths.py +10 -7
- tests/architecture/test_file_budgets.py +5 -1
- tests/integration/test_sprint13_parallelisation_stats.py +11 -3
README.md
@@ -385,7 +385,7 @@ ruff check picarones/ tests/
 python -m mypy picarones/core/
 ```
 
-**Test suite**: ~
+**Test suite**: ~3871 tests, ~3 min on a modern laptop. Coverage
 floor at 85% (currently ~87%). The `network` marker excludes tests
 requiring live HTTP.
 
docs/user/writing-a-pipeline-module.md
@@ -350,7 +350,7 @@ plug into the pipeline and measure.
 ### 6.b "What if I just want to test an OCR pipeline on its own, with no downstream steps?"
 
 That is exactly what the historical OCR runner does
-(`run_benchmark` in `picarones/measurements/runner
+(`run_benchmark` in `picarones/measurements/runner/`): it is
 still there, has not changed, and remains the recommended route for
 single-stage OCR benchmarks.
 
picarones/measurements/runner/__init__.py
@@ -0,0 +1,103 @@
+"""Benchmark orchestrator.
+
+Runs the OCR/HTR engines over the corpus in parallel:
+
+- ``ProcessPoolExecutor`` for CPU-bound engines (Tesseract, Pero OCR,
+  Kraken); the picklable workers live in :mod:`workers`.
+- ``ThreadPoolExecutor`` for IO-bound / API engines (Mistral, Google,
+  Azure, LLMs).
+
+Before the "splitting runner.py" sprint (May 2026) this module was
+a single 1019-line file. The sub-package splits the
+responsibility by concern:
+
+- :mod:`document`: builds a :class:`DocumentResult` from an
+  OCR output (main metrics + hooks via ``run_document_hooks(profile)``).
+- :mod:`workers`: module-level functions for ``ProcessPoolExecutor``
+  (:func:`_cpu_doc_worker`) and ``ThreadPoolExecutor`` (:func:`_io_doc_worker`).
+- :mod:`partial`: NDJSON persistence of partial results, to
+  resume after an interruption.
+- :mod:`orchestration`: :func:`run_benchmark` (main loop,
+  pools, per-engine aggregation) + :func:`_build_pipeline_info`.
+- :mod:`aggregation`: backward-compatible delegations to the aggregators in
+  ``builtin_hooks`` (workstream 2, post-Sprint 97).
+- :mod:`ner_attach`: NER wiring at post-process time (Sprint 40).
+
+This ``__init__.py`` re-exports the whole historical public API so that
+the ~25 files importing from ``picarones.measurements.runner``
+keep working unchanged. The private symbols
+``_compute_document_result``, ``_load_partial``, ``_partial_path``,
+``_aggregate_*``, ``_calibration_from_engine_result`` are re-exported
+because the Sprint 13/40/42 tests consume them directly.
+"""
+
+from picarones.measurements.runner.aggregation import (
+    _aggregate_calibration,
+    _aggregate_char_scores,
+    _aggregate_confusion,
+    _aggregate_hallucination,
+    _aggregate_image_quality,
+    _aggregate_line_metrics,
+    _aggregate_structure,
+    _aggregate_taxonomy,
+)
+from picarones.measurements.runner.document import (
+    _calibration_from_engine_result,
+    _compute_document_result,
+    _make_error_doc_result,
+    _make_timeout_doc_result,
+)
+from picarones.measurements.runner.ner_attach import (
+    _aggregate_ner,
+    _attach_ner_metrics,
+)
+from picarones.measurements.runner.orchestration import (
+    _build_pipeline_info,
+    run_benchmark,
+)
+from picarones.measurements.runner.partial import (
+    _delete_partial,
+    _load_partial,
+    _partial_path,
+    _partial_write_lock,
+    _sanitize_filename,
+    _save_partial_line,
+)
+from picarones.measurements.runner.workers import (
+    _cpu_doc_worker,
+    _io_doc_worker,
+)
+
+__all__ = [
+    # Main public API
+    "run_benchmark",
+    # Document computation helpers
+    "_compute_document_result",
+    "_calibration_from_engine_result",
+    "_make_error_doc_result",
+    "_make_timeout_doc_result",
+    # Picklable workers
+    "_cpu_doc_worker",
+    "_io_doc_worker",
+    # Partial-result persistence
+    "_partial_path",
+    "_load_partial",
+    "_save_partial_line",
+    "_delete_partial",
+    "_sanitize_filename",
+    "_partial_write_lock",
+    # Orchestration helper
+    "_build_pipeline_info",
+    # Aggregation delegations (backward compatibility for Sprint 13/42 tests)
+    "_aggregate_calibration",
+    "_aggregate_char_scores",
+    "_aggregate_confusion",
+    "_aggregate_hallucination",
+    "_aggregate_image_quality",
+    "_aggregate_line_metrics",
+    "_aggregate_structure",
+    "_aggregate_taxonomy",
+    # NER (Sprint 40)
+    "_aggregate_ner",
+    "_attach_ner_metrics",
+]
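The split between process pools and thread pools described in the module docstring can be sketched as follows. The ``engine_kind`` label, ``make_executor`` and ``fake_io_worker`` are hypothetical names chosen for the example; how the real ``run_benchmark`` decides between the two pools is not shown in this diff.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def make_executor(engine_kind: str, max_workers: int) -> Executor:
    """Pick the pool type from a hypothetical ``engine_kind`` label."""
    if engine_kind == "cpu":
        # Separate processes sidestep the GIL for CPU-heavy engines; the
        # submitted callable and its arguments must then be picklable,
        # hence module-level worker functions.
        return ProcessPoolExecutor(max_workers=max_workers)
    # Threads are enough when workers mostly wait on network I/O.
    return ThreadPoolExecutor(max_workers=max_workers)


def fake_io_worker(doc_id: str) -> str:
    """Stand-in for an API call; defined at module level on purpose."""
    return doc_id.upper()


if __name__ == "__main__":
    with make_executor("io", max_workers=2) as pool:
        # map() preserves input order regardless of completion order.
        print(list(pool.map(fake_io_worker, ["a", "b"])))
```

Keeping the workers at module level, as ``workers.py`` does, means the same function can be handed to either pool type.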
picarones/measurements/runner/aggregation.py
@@ -0,0 +1,82 @@
+"""Backward-compatible delegations to ``builtin_hooks._aggregate_*``.
+
+Workstream 2 (post-Sprint 97): the per-engine aggregation logic for
+all metrics (confusion, taxonomy, structure, image_quality,
+line_metrics, hallucination, calibration, char_scores) now lives
+in :mod:`picarones.measurements.builtin_hooks` (single source of truth,
+exposed via the :mod:`picarones.core.metric_hooks` registry).
+
+The names below remain available from
+``picarones.measurements.runner`` for backward compatibility with the
+Sprint 13 / 42 tests, which import them directly.
+"""
+
+from __future__ import annotations
+
+from typing import Optional
+
+
+def _aggregate_confusion(doc_results: list) -> Optional[dict]:
+    """Delegation to :func:`builtin_hooks._aggregate_confusion`."""
+    from picarones.measurements.builtin_hooks import _aggregate_confusion as _impl
+    return _impl(doc_results)
+
+
+def _aggregate_char_scores(doc_results: list) -> Optional[dict]:
+    """Delegation to :func:`builtin_hooks._aggregate_char_scores`."""
+    from picarones.measurements.builtin_hooks import _aggregate_char_scores as _impl
+    return _impl(doc_results)
+
+
+def _aggregate_taxonomy(doc_results: list) -> Optional[dict]:
+    """Delegation to :func:`builtin_hooks._aggregate_taxonomy`."""
+    from picarones.measurements.builtin_hooks import _aggregate_taxonomy as _impl
+    return _impl(doc_results)
+
+
+def _aggregate_structure(doc_results: list) -> Optional[dict]:
+    """Delegation to :func:`builtin_hooks._aggregate_structure`."""
+    from picarones.measurements.builtin_hooks import _aggregate_structure as _impl
+    return _impl(doc_results)
+
+
+def _aggregate_image_quality(doc_results: list) -> Optional[dict]:
+    """Delegation to :func:`builtin_hooks._aggregate_image_quality`."""
+    from picarones.measurements.builtin_hooks import _aggregate_image_quality as _impl
+    return _impl(doc_results)
+
+
+def _aggregate_line_metrics(doc_results: list) -> Optional[dict]:
+    """Delegation to :func:`builtin_hooks._aggregate_line_metrics`."""
+    from picarones.measurements.builtin_hooks import _aggregate_line_metrics as _impl
+    return _impl(doc_results)
+
+
+def _aggregate_hallucination(doc_results: list) -> Optional[dict]:
+    """Delegation to :func:`builtin_hooks._aggregate_hallucination`."""
+    from picarones.measurements.builtin_hooks import _aggregate_hallucination as _impl
+    return _impl(doc_results)
+
+
+def _aggregate_calibration(doc_results: list) -> Optional[dict]:
+    """Delegation to :func:`builtin_hooks._aggregate_calibration`.
+
+    Kept for backward compatibility with the test ``test_sprint42_calibration_runner``,
+    which imports directly from ``picarones.measurements.runner``. The
+    actual logic lives in :mod:`picarones.measurements.builtin_hooks`
+    (workstream 2, post-Sprint 97).
+    """
+    from picarones.measurements.builtin_hooks import _aggregate_calibration as _impl
+    return _impl(doc_results)
+
+
+__all__ = [
+    "_aggregate_calibration",
+    "_aggregate_char_scores",
+    "_aggregate_confusion",
+    "_aggregate_hallucination",
+    "_aggregate_image_quality",
+    "_aggregate_line_metrics",
+    "_aggregate_structure",
+    "_aggregate_taxonomy",
+]
picarones/measurements/runner/document.py
@@ -0,0 +1,190 @@
+"""Construction of a :class:`DocumentResult` from an OCR output.
+
+Centralizes the computation of every metric attached to a single
+document: main metrics (CER/WER/MER/WIL via jiwer), optional
+hooks (calibration, taxonomy, philological, etc., run via
+``run_document_hooks(profile)``), and OCR+LLM pipeline metadata.
+
+Also: helpers that build synthetic ``DocumentResult`` objects
+for an engine timeout or error (``_make_timeout_doc_result``,
+``_make_error_doc_result``).
+"""
+
+from __future__ import annotations
+
+from typing import Optional
+
+from picarones.core.results import DocumentResult
+from picarones.engines.base import EngineResult
+from picarones.measurements.metrics import MetricsResult, compute_metrics
+
+
+def _calibration_from_engine_result(
+    ground_truth: str,
+    token_confidences: list,
+) -> Optional[dict]:
+    """Delegation to
+    :func:`picarones.measurements.builtin_hooks.calibration_from_engine_result`.
+
+    Kept for backward compatibility with the Sprint 42 tests, which do
+    ``from picarones.measurements.runner import _calibration_from_engine_result``.
+    Any change to the computation must be made in ``builtin_hooks``.
+    """
+    from picarones.measurements.builtin_hooks import calibration_from_engine_result
+    return calibration_from_engine_result(ground_truth, token_confidences)
+
+
+def _compute_document_result(
+    doc_id: str,
+    image_path: str,
+    ground_truth: str,
+    ocr_result: EngineResult,
+    char_exclude: Optional[frozenset],
+    corpus_lang: str = "fr",
+    profile: str = "standard",
+) -> DocumentResult:
+    """Computes all metrics for one document and returns a DocumentResult.
+
+    Usable both in the main process (IO-bound) and in the
+    subprocesses created by ProcessPoolExecutor (CPU-bound).
+    Heavy imports are deferred to speed up subprocess startup.
+
+    Workstream 2 (post-Sprint 97): overhaul
+    ---------------------------------------
+    The 11 hard-coded ``try/except`` blocks (Sprints 5+10+39+42+61+86+87) are
+    now centralized in ``picarones.measurements.builtin_hooks`` and
+    selected via ``run_document_hooks(profile)``. The
+    ``"standard"`` profile (default) strictly reproduces the
+    pre-workstream-2 behaviour. The ``"minimal"``, ``"philological"``,
+    ``"diagnostics"``, ``"economics"``, ``"pipeline"``, ``"full"`` profiles
+    let the user adjust the computation cost.
+    """
+    import logging as _logging
+    _logger = _logging.getLogger(__name__)
+
+    # Eager-load the built-in hooks to populate the registry in the
+    # pool's subprocesses (the runner's top-level ``import`` does not do
+    # it, so as not to slow down startup for minimal engines).
+    import picarones.measurements.builtin_hooks  # noqa: F401
+    from picarones.core.metric_hooks import run_document_hooks
+
+    if ocr_result.success:
+        metrics = compute_metrics(ground_truth, ocr_result.text, char_exclude=char_exclude)
+    else:
+        metrics = MetricsResult(
+            cer=1.0, cer_nfc=1.0, cer_caseless=1.0,
+            wer=1.0, wer_normalized=1.0, mer=1.0, wil=1.0,
+            reference_length=len(ground_truth),
+            hypothesis_length=0,
+            error=ocr_result.error,
+        )
+
+    ocr_intermediate = ocr_result.metadata.get("ocr_intermediate")
+    pipeline_meta: dict = {}
+
+    if ocr_result.metadata.get("is_pipeline"):
+        pipeline_meta = {
+            "pipeline_mode": ocr_result.metadata.get("pipeline_mode"),
+            "prompt_file": ocr_result.metadata.get("prompt_file"),
+            "llm_model": ocr_result.metadata.get("llm_model"),
+            "llm_provider": ocr_result.metadata.get("llm_provider"),
+        }
+        if ocr_intermediate is not None and ocr_result.success:
+            try:
+                from picarones.pipelines.over_normalization import detect_over_normalization
+                over_norm = detect_over_normalization(
+                    ground_truth=ground_truth,
+                    ocr_text=ocr_intermediate,
+                    llm_text=ocr_result.text,
+                )
+                pipeline_meta["over_normalization"] = over_norm.as_dict()
+            except Exception as e:
+                _logger.warning("[over_normalization] fonctionnalité dégradée : %s", e)
+
+    # Document-level hooks: each hook produces one named attribute of
+    # the ``DocumentResult``. Hooks that are invalid in this context (OCR
+    # failure for ``requires_success`` hooks, missing
+    # ``token_confidences`` for ``calibration``) are skipped
+    # silently. Exceptions raised by a hook are
+    # caught and logged as warnings by ``run_document_hooks``.
+    extras = run_document_hooks(
+        profile,
+        ground_truth=ground_truth,
+        hypothesis=ocr_result.text,
+        image_path=image_path,
+        corpus_lang=corpus_lang,
+        ocr_result=ocr_result,
+    )
+
+    return DocumentResult(
+        doc_id=doc_id,
+        image_path=image_path,
+        ground_truth=ground_truth,
+        hypothesis=ocr_result.text,
+        metrics=metrics,
+        duration_seconds=ocr_result.duration_seconds,
+        engine_error=ocr_result.error,
+        ocr_intermediate=ocr_intermediate,
+        pipeline_metadata=pipeline_meta,
+        confusion_matrix=extras.get("confusion_matrix"),
+        char_scores=extras.get("char_scores"),
+        taxonomy=extras.get("taxonomy"),
+        structure=extras.get("structure"),
+        image_quality=extras.get("image_quality"),
+        line_metrics=extras.get("line_metrics"),
+        hallucination_metrics=extras.get("hallucination_metrics"),
+        calibration_metrics=extras.get("calibration_metrics"),
+        philological_metrics=extras.get("philological_metrics"),
+        searchability_metrics=extras.get("searchability_metrics"),
+        numerical_sequence_metrics=extras.get("numerical_sequence_metrics"),
+        readability_metrics=extras.get("readability_metrics"),
+    )
+
+
+def _make_timeout_doc_result(doc: object, timeout_seconds: float) -> DocumentResult:
+    """Synthetic DocumentResult for a document that exceeded the timeout."""
+    err = f"timeout ({timeout_seconds:.0f}s)"
+    metrics = MetricsResult(
+        cer=1.0, cer_nfc=1.0, cer_caseless=1.0,
+        wer=1.0, wer_normalized=1.0, mer=1.0, wil=1.0,
+        reference_length=len(doc.ground_truth),  # type: ignore[attr-defined]
+        hypothesis_length=0,
+        error=err,
+    )
+    return DocumentResult(
+        doc_id=doc.doc_id,  # type: ignore[attr-defined]
+        image_path=str(doc.image_path),  # type: ignore[attr-defined]
+        ground_truth=doc.ground_truth,  # type: ignore[attr-defined]
+        hypothesis="",
+        metrics=metrics,
+        duration_seconds=timeout_seconds,
+        engine_error=err,
+    )
+
+
+def _make_error_doc_result(doc: object, error_msg: str) -> DocumentResult:
+    """Synthetic DocumentResult for an error during an engine call."""
+    metrics = MetricsResult(
+        cer=1.0, cer_nfc=1.0, cer_caseless=1.0,
+        wer=1.0, wer_normalized=1.0, mer=1.0, wil=1.0,
+        reference_length=len(doc.ground_truth),  # type: ignore[attr-defined]
+        hypothesis_length=0,
+        error=error_msg,
+    )
+    return DocumentResult(
+        doc_id=doc.doc_id,  # type: ignore[attr-defined]
+        image_path=str(doc.image_path),  # type: ignore[attr-defined]
+        ground_truth=doc.ground_truth,  # type: ignore[attr-defined]
+        hypothesis="",
+        metrics=metrics,
+        duration_seconds=0.0,
+        engine_error=error_msg,
+    )
+
+
+__all__ = [
+    "_calibration_from_engine_result",
+    "_compute_document_result",
+    "_make_error_doc_result",
+    "_make_timeout_doc_result",
+]
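The hook mechanism that ``_compute_document_result`` relies on (profile selection, silent skipping, exceptions downgraded to warnings) can be pictured with a small registry. This is a sketch under assumptions: ``register_hook``, ``run_hooks`` and the two example hooks are hypothetical, and the real ``picarones.core.metric_hooks`` API is only known here through its call site.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)

# name -> (function, profiles it belongs to, needs a successful OCR)
_HOOKS: dict[str, tuple[Callable[..., Any], frozenset, bool]] = {}


def register_hook(name: str, profiles: set, requires_success: bool = True):
    """Decorator that records a hook under ``name`` (hypothetical API)."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        _HOOKS[name] = (fn, frozenset(profiles), requires_success)
        return fn
    return decorator


def run_hooks(profile: str, *, success: bool, **ctx: Any) -> dict[str, Any]:
    """Run every hook of ``profile``; a failing hook never aborts the others."""
    extras: dict[str, Any] = {}
    for name, (fn, profiles, requires_success) in _HOOKS.items():
        if profile not in profiles:
            continue  # hook not part of this profile
        if requires_success and not success:
            continue  # invalid in this context: skipped silently
        try:
            extras[name] = fn(**ctx)
        except Exception as exc:
            logger.warning("[%s] hook failed: %s", name, exc)
    return extras


@register_hook("char_count", profiles={"minimal", "standard"})
def _char_count(ground_truth: str, hypothesis: str) -> dict:
    return {"gt": len(ground_truth), "hyp": len(hypothesis)}


@register_hook("broken", profiles={"standard"})
def _broken(ground_truth: str, hypothesis: str) -> dict:
    raise ValueError("boom")
```

With such a registry the caller can read results with ``extras.get("...")``, as the diff does, so a skipped or failed hook simply yields ``None`` for its attribute.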
@@ -0,0 +1,133 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Câblage NER au post-process du benchmark (Sprint 40).
|
| 2 |
+
|
| 3 |
+
Le runner appelle :func:`_attach_ner_metrics` après que tous les
|
| 4 |
+
documents ont été calculés, pour les moteurs où la GT possède un
|
| 5 |
+
niveau ``ENTITIES`` (Sprint 32 — multi-level GT).
|
| 6 |
+
|
| 7 |
+
L'extracteur NER est typiquement un wrapper :class:`SpacyEntityExtractor`
|
| 8 |
+
construit via :func:`picarones.measurements.ner_backends.get_extractor`.
|
| 9 |
+
"""
|
| 10 |
+
|
| 11 |
+
from __future__ import annotations
|
| 12 |
+
|
| 13 |
+
import logging
|
| 14 |
+
|
| 15 |
+
from picarones.core.corpus import Corpus
|
| 16 |
+
|
| 17 |
+
logger = logging.getLogger(__name__)
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
def _attach_ner_metrics(
|
| 21 |
+
corpus: Corpus,
|
| 22 |
+
doc_results: list,
|
| 23 |
+
entity_extractor: callable,
|
| 24 |
+
) -> None:
|
| 25 |
+
"""Calcule et attache ``DocumentResult.ner_metrics`` pour chaque doc
|
| 26 |
+
dont la GT possède un niveau ``ENTITIES`` (Sprint 32).
|
| 27 |
+
|
| 28 |
+
L'extracteur est appelé sur l'hypothèse OCR ``dr.hypothesis``.
|
| 29 |
+
Les erreurs sont dégradées en warnings (pas de propagation) afin
|
| 30 |
+
de ne pas casser le benchmark si un document spécifique fait
|
| 31 |
+
crasher le NER.
|
| 32 |
+
"""
|
| 33 |
+
try:
|
| 34 |
+
from picarones.core.corpus import GTLevel
|
| 35 |
+
from picarones.measurements.ner import compute_ner_metrics
|
| 36 |
+
except ImportError as exc:
|
| 37 |
+
logger.warning("[ner.attach] imports indisponibles : %s", exc)
|
| 38 |
+
return
|
| 39 |
+
|
| 40 |
+
docs_by_id = {d.doc_id: d for d in corpus.documents}
|
| 41 |
+
n_done = 0
|
| 42 |
+
for dr in doc_results:
|
| 43 |
+
if dr.engine_error is not None or not dr.hypothesis:
|
| 44 |
+
continue
|
| 45 |
+
doc = docs_by_id.get(dr.doc_id)
|
| 46 |
+
if doc is None or not doc.has_gt(GTLevel.ENTITIES):
|
| 47 |
+
continue
|
| 48 |
+
try:
|
| 49 |
+
gt_payload = doc.get_gt(GTLevel.ENTITIES)
|
| 50 |
+
gt_entities = list(gt_payload.entities) if gt_payload else []
|
| 51 |
+
hyp_entities = entity_extractor(dr.hypothesis) or []
|
| 52 |
+
dr.ner_metrics = compute_ner_metrics(gt_entities, hyp_entities)
|
| 53 |
+
n_done += 1
|
| 54 |
+
except Exception as exc: # noqa: BLE001
|
| 55 |
+
logger.warning(
|
| 56 |
+
"[ner.attach] %s : extraction/comparaison NER dégradée : %s",
|
| 57 |
+
dr.doc_id, exc,
|
| 58 |
+
)
|
| 59 |
+
|
| 60 |
+
if n_done > 0:
|
| 61 |
+
logger.info("[ner] %d documents évalués pour NER.", n_done)
|
| 62 |
+
|
| 63 |
+
|
| 64 |
+
def _aggregate_ner(doc_results: list) -> "dict | None":
|
| 65 |
+
"""Agrège les métriques NER au niveau du moteur.
|
| 66 |
+
|
| 67 |
+
Recalcule precision/recall/F1 *micro* à partir des sommes globales
|
| 68 |
+
de TP/FP/FN, plus le détail par catégorie, plus les compteurs
|
| 69 |
+
totaux d'hallucinations et d'entités manquées.
|
| 70 |
+
"""
|
| 71 |
+
relevant = [dr for dr in doc_results if dr.ner_metrics is not None]
|
| 72 |
+
if not relevant:
|
| 73 |
+
return None
|
| 74 |
+
|
| 75 |
+
total_tp = 0
|
| 76 |
+
total_fp = 0
|
| 77 |
+
total_fn = 0
|
| 78 |
+
cat_tp: dict[str, int] = {}
|
| 79 |
+
cat_fp: dict[str, int] = {}
|
| 80 |
+
cat_fn: dict[str, int] = {}
|
| 81 |
+
total_hallucinated = 0
|
| 82 |
+
total_missed = 0
|
| 83 |
+
iou_threshold = 0.5
|
| 84 |
+
|
| 85 |
+
for dr in relevant:
|
| 86 |
+
m = dr.ner_metrics
|
| 87 |
+
total_tp += int(m.get("true_positives", 0))
|
| 88 |
+
total_fp += int(m.get("false_positives", 0))
|
| 89 |
+
total_fn += int(m.get("false_negatives", 0))
|
| 90 |
+
total_hallucinated += len(m.get("hallucinated_entities", []) or [])
|
| 91 |
+
total_missed += len(m.get("missed_entities", []) or [])
|
| 92 |
+
iou_threshold = float(m.get("iou_threshold", iou_threshold))
|
| 93 |
+
for cat, stats in (m.get("per_category") or {}).items():
|
| 94 |
+
cat_tp[cat] = cat_tp.get(cat, 0)
|
| 95 |
+
cat_fp[cat] = cat_fp.get(cat, 0)
|
| 96 |
+
cat_fn[cat] = cat_fn.get(cat, 0)
|
| 97 |
+
# Reconstitue les sommes par catégorie via support et P/R
|
| 98 |
+
support = int(stats.get("support", 0))
|
| 99 |
+
recall = float(stats.get("recall", 0.0))
|
| 100 |
+
precision = float(stats.get("precision", 0.0))
|
| 101 |
+
tp_cat = round(support * recall) if support > 0 else 0
|
| 102 |
+
fn_cat = max(0, support - tp_cat)
|
| 103 |
+
fp_cat = (
|
| 104 |
+
round(tp_cat * (1 - precision) / precision)
|
| 105 |
+
if precision > 0 else 0
|
| 106 |
+
)
|
| 107 |
+
cat_tp[cat] += tp_cat
|
| 108 |
+
cat_fp[cat] += fp_cat
|
| 109 |
+
cat_fn[cat] += fn_cat
|
| 110 |
+
|
| 111 |
+
def _prf(tp: int, fp: int, fn: int) -> dict[str, float]:
|
| 112 |
+
p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
|
| 113 |
+
r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
|
| 114 |
+
f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
|
| 115 |
+
return {"precision": p, "recall": r, "f1": f1, "support": tp + fn}
|
| 116 |
+
|
| 117 |
+
return {
|
| 118 |
+
"global": _prf(total_tp, total_fp, total_fn),
|
| 119 |
+
"per_category": {
|
| 120 |
+
cat: _prf(cat_tp[cat], cat_fp[cat], cat_fn[cat])
|
| 121 |
+
for cat in sorted(set(cat_tp) | set(cat_fp) | set(cat_fn))
|
| 122 |
+
},
|
| 123 |
+
"true_positives": total_tp,
|
| 124 |
+
"false_positives": total_fp,
|
| 125 |
+
"false_negatives": total_fn,
|
| 126 |
+
"hallucinated_total": total_hallucinated,
|
| 127 |
+
"missed_total": total_missed,
|
| 128 |
+
"doc_count": len(relevant),
|
| 129 |
+
"iou_threshold": iou_threshold,
|
| 130 |
+
}
|
| 131 |
+
|
| 132 |
+
|
| 133 |
+
__all__ = ["_aggregate_ner", "_attach_ner_metrics"]
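The per-category reconstruction and micro-averaging done by `_aggregate_ner` can be illustrated with a small standalone sketch. It assumes each per-category entry carries only `support`, `precision` and `recall`, as the code above reads them. The helper names `prf` and `counts_from_stats` and the sample data are hypothetical and not part of the commit.

```python
from __future__ import annotations


def prf(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Precision / recall / F1 from raw counts, guarding against division by zero."""
    p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    return {"precision": p, "recall": r, "f1": f1, "support": tp + fn}


def counts_from_stats(support: int, precision: float, recall: float) -> tuple[int, int, int]:
    """Rebuild (tp, fp, fn) for one category from support, precision and recall."""
    tp = round(support * recall) if support > 0 else 0
    fn = max(0, support - tp)
    fp = round(tp * (1 - precision) / precision) if precision > 0 else 0
    return tp, fp, fn


# Two hypothetical documents reporting the same category.
docs = [
    {"support": 4, "precision": 0.75, "recall": 0.75},  # rebuilds tp=3, fp=1, fn=1
    {"support": 2, "precision": 1.0, "recall": 0.5},    # rebuilds tp=1, fp=0, fn=1
]

tp = fp = fn = 0
for stats in docs:
    d_tp, d_fp, d_fn = counts_from_stats(**stats)
    tp, fp, fn = tp + d_tp, fp + d_fp, fn + d_fn

# Micro-average: sum the counts first, then compute the ratios once.
micro = prf(tp, fp, fn)
```

Rebuilding counts from ratios is lossy: when a category's precision is 0, its false positives come back as 0 under this scheme. Summing raw per-category TP/FP/FN, if the per-document metrics expose them, would likely be more robust.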
|
|
@@ -1,21 +1,25 @@
|
|
| 1 |
-
"""Benchmark orchestrator.
|
| 2 |
|
| 3 |
-
|
| 4 |
-
- ``ProcessPoolExecutor`` for CPU-bound engines (Tesseract, Pero OCR, Kraken)
|
| 5 |
-
- ``ThreadPoolExecutor`` for IO-bound / API engines (Mistral, Google, Azure, LLMs)
|
| 6 |
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
"""
|
| 11 |
|
| 12 |
from __future__ import annotations
|
| 13 |
|
| 14 |
import concurrent.futures
|
| 15 |
-
import json
|
| 16 |
import logging
|
| 17 |
-
import re
|
| 18 |
-
import tempfile
|
| 19 |
import threading
|
| 20 |
import time
|
| 21 |
from pathlib import Path
|
|
@@ -24,379 +28,28 @@ from typing import Optional
|
|
| 24 |
from tqdm import tqdm
|
| 25 |
|
| 26 |
from picarones.core.corpus import Corpus
|
| 27 |
-
from picarones.measurements.metrics import MetricsResult, compute_metrics
|
| 28 |
from picarones.core.results import BenchmarkResult, DocumentResult, EngineReport
|
| 29 |
-
from picarones.engines.base import BaseOCREngine
|
| 30 |
|
| 31 |
logger = logging.getLogger(__name__)
|
| 32 |
|
| 33 |
-
# Lock that serializes writes of partial results
|
| 34 |
-
_partial_write_lock = threading.Lock()
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
# ---------------------------------------------------------------------------
|
| 38 |
-
# Module-level workers (required by ProcessPoolExecutor: they must be picklable)
|
| 39 |
-
# ---------------------------------------------------------------------------
|
| 40 |
-
|
| 41 |
-
def _cpu_doc_worker(args: tuple) -> "DocumentResult":
|
| 42 |
-
"""Worker for ProcessPoolExecutor (CPU-bound engines).
|
| 43 |
-
|
| 44 |
-
Instantiates the engine in the subprocess, runs OCR and computes
|
| 45 |
-
all metrics. Must be a module-level function so that it can be
|
| 46 |
-
serialized by ``pickle``.
|
| 47 |
-
|
| 48 |
-
For backward compatibility, the ``args`` tuple may contain:
|
| 49 |
-
- 7 elements: legacy (Sprint 13)
|
| 50 |
-
- 8 elements: + ``corpus_lang`` (Sprint 87)
|
| 51 |
-
- 9 elements: + ``profile`` (chantier 2 post-Sprint 97)
|
| 52 |
-
"""
|
| 53 |
-
if len(args) == 9:
|
| 54 |
-
(engine_module, engine_class_name, engine_config, doc_id,
|
| 55 |
-
image_path, ground_truth, char_exclude_chars, corpus_lang,
|
| 56 |
-
profile) = args
|
| 57 |
-
elif len(args) == 8:
|
| 58 |
-
(engine_module, engine_class_name, engine_config, doc_id,
|
| 59 |
-
image_path, ground_truth, char_exclude_chars, corpus_lang) = args
|
| 60 |
-
profile = "standard"
|
| 61 |
-
else:
|
| 62 |
-
(engine_module, engine_class_name, engine_config, doc_id,
|
| 63 |
-
image_path, ground_truth, char_exclude_chars) = args
|
| 64 |
-
corpus_lang = "fr"
|
| 65 |
-
profile = "standard"
|
| 66 |
-
import importlib
|
| 67 |
-
mod = importlib.import_module(engine_module)
|
| 68 |
-
engine_cls = getattr(mod, engine_class_name)
|
| 69 |
-
engine = engine_cls(config=engine_config)
|
| 70 |
-
ocr_result = engine.run(image_path)
|
| 71 |
-
char_exclude = frozenset(char_exclude_chars) if char_exclude_chars else None
|
| 72 |
-
return _compute_document_result(
|
| 73 |
-
doc_id=doc_id,
|
| 74 |
-
image_path=image_path,
|
| 75 |
-
ground_truth=ground_truth,
|
| 76 |
-
ocr_result=ocr_result,
|
| 77 |
-
char_exclude=char_exclude,
|
| 78 |
-
corpus_lang=corpus_lang,
|
| 79 |
-
profile=profile,
|
| 80 |
-
)
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
def _io_doc_worker(
|
| 84 |
-
engine: BaseOCREngine,
|
| 85 |
-
doc: object,
|
| 86 |
-
char_exclude: Optional[frozenset],
|
| 87 |
-
corpus_lang: str = "fr",
|
| 88 |
-
profile: str = "standard",
|
| 89 |
-
) -> "DocumentResult":
|
| 90 |
-
"""Worker for ThreadPoolExecutor (IO-bound / API engines).
|
| 91 |
-
|
| 92 |
-
Runs OCR and computes the metrics in a thread. The engine instance
|
| 93 |
-
is shared between threads; HTTP adapters generally hold no
|
| 94 |
-
mutable state between calls.
|
| 95 |
-
|
| 96 |
-
If the document has precomputed OCR text (triplet corpus) and the
|
| 97 |
-
engine is an OCR+LLM pipeline, uses ``run_with_ocr_text()`` to
|
| 98 |
-
bypass the OCR step and test the LLM post-correction directly.
|
| 99 |
-
"""
|
| 100 |
-
doc_ocr_text = getattr(doc, "ocr_text", None)
|
| 101 |
-
if doc_ocr_text is not None:
|
| 102 |
-
# Triplet corpus: check whether the engine supports run_with_ocr_text
|
| 103 |
-
run_with = getattr(engine, "run_with_ocr_text", None)
|
| 104 |
-
if run_with is not None:
|
| 105 |
-
ocr_result = run_with(doc.image_path, doc_ocr_text) # type: ignore[attr-defined]
|
| 106 |
-
else:
|
| 107 |
-
# Plain OCR engine: ignore the precomputed OCR text
|
| 108 |
-
ocr_result = engine.run(doc.image_path) # type: ignore[attr-defined]
|
| 109 |
-
else:
|
| 110 |
-
ocr_result = engine.run(doc.image_path) # type: ignore[attr-defined]
|
| 111 |
-
|
| 112 |
-
return _compute_document_result(
|
| 113 |
-
doc_id=doc.doc_id, # type: ignore[attr-defined]
|
| 114 |
-
image_path=str(doc.image_path), # type: ignore[attr-defined]
|
| 115 |
-
ground_truth=doc.ground_truth, # type: ignore[attr-defined]
|
| 116 |
-
ocr_result=ocr_result,
|
| 117 |
-
char_exclude=char_exclude,
|
| 118 |
-
corpus_lang=corpus_lang,
|
| 119 |
-
profile=profile,
|
| 120 |
-
)
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
# ---------------------------------------------------------------------------
|
| 124 |
-
# Centralized per-document computation
|
| 125 |
-
# ---------------------------------------------------------------------------
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
# Chantier 2 (post-Sprint 97): the calibration helper logic now lives
|
| 129 |
-
# in :mod:`picarones.measurements.builtin_hooks`. This name stays exposed
|
| 130 |
-
# here for backward compatibility with the Sprint 42 tests, which do
|
| 131 |
-
# ``from picarones.measurements.runner import _calibration_from_engine_result``.
|
| 132 |
-
def _calibration_from_engine_result(
|
| 133 |
-
ground_truth: str,
|
| 134 |
-
token_confidences: list,
|
| 135 |
-
) -> Optional[dict]:
|
| 136 |
-
"""Delegates to :func:`picarones.measurements.builtin_hooks.calibration_from_engine_result`.
|
| 137 |
-
|
| 138 |
-
Kept for backward compatibility with existing tests; any change to
|
| 139 |
-
the computation must be made in ``builtin_hooks``.
|
| 140 |
-
"""
|
| 141 |
-
from picarones.measurements.builtin_hooks import calibration_from_engine_result
|
| 142 |
-
return calibration_from_engine_result(ground_truth, token_confidences)
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
def _compute_document_result(
|
| 148 |
-
doc_id: str,
|
| 149 |
-
image_path: str,
|
| 150 |
-
ground_truth: str,
|
| 151 |
-
ocr_result: EngineResult,
|
| 152 |
-
char_exclude: Optional[frozenset],
|
| 153 |
-
corpus_lang: str = "fr",
|
| 154 |
-
profile: str = "standard",
|
| 155 |
-
) -> DocumentResult:
|
| 156 |
-
"""Computes all metrics for one document and returns a DocumentResult.
|
| 157 |
-
|
| 158 |
-
Usable both in the main process (IO-bound) and in the
|
| 159 |
-
subprocesses created by ProcessPoolExecutor (CPU-bound).
|
| 160 |
-
Heavy imports are deferred to speed up subprocess startup.
|
| 161 |
-
|
| 162 |
-
Chantier 2 (post-Sprint 97): rework
|
| 163 |
-
------------------------------------
|
| 164 |
-
The 11 hard-coded ``try/except`` blocks (Sprints 5+10+39+42+61+86+87) are
|
| 165 |
-
now centralized in ``picarones.measurements.builtin_hooks`` and
|
| 166 |
-
selected via ``run_document_hooks(profile)``. The ``"standard"``
|
| 167 |
-
profile (default) strictly reproduces the pre-chantier-2
|
| 168 |
-
behavior. The ``"minimal"``, ``"philological"``,
|
| 169 |
-
``"diagnostics"``, ``"economics"``, ``"pipeline"`` and ``"full"`` profiles
|
| 170 |
-
let the user adjust the computation cost.
|
| 171 |
-
"""
|
| 172 |
-
import logging as _logging
|
| 173 |
-
_logger = _logging.getLogger(__name__)
|
| 174 |
-
|
| 175 |
-
# Eagerly load the built-in hooks to populate the registry in the
|
| 176 |
-
# pool subprocesses (the runner's top-level ``import`` does not do
|
| 177 |
-
# so, to avoid slowing down startup for minimal engines).
|
| 178 |
-
import picarones.measurements.builtin_hooks # noqa: F401
|
| 179 |
-
from picarones.core.metric_hooks import run_document_hooks
|
| 180 |
-
|
| 181 |
-
if ocr_result.success:
|
| 182 |
-
metrics = compute_metrics(ground_truth, ocr_result.text, char_exclude=char_exclude)
|
| 183 |
-
else:
|
| 184 |
-
metrics = MetricsResult(
|
| 185 |
-
cer=1.0, cer_nfc=1.0, cer_caseless=1.0,
|
| 186 |
-
wer=1.0, wer_normalized=1.0, mer=1.0, wil=1.0,
|
| 187 |
-
reference_length=len(ground_truth),
|
| 188 |
-
hypothesis_length=0,
|
| 189 |
-
error=ocr_result.error,
|
| 190 |
-
)
|
| 191 |
-
|
| 192 |
-
ocr_intermediate = ocr_result.metadata.get("ocr_intermediate")
|
| 193 |
-
pipeline_meta: dict = {}
|
| 194 |
-
|
| 195 |
-
if ocr_result.metadata.get("is_pipeline"):
|
| 196 |
-
pipeline_meta = {
|
| 197 |
-
"pipeline_mode": ocr_result.metadata.get("pipeline_mode"),
|
| 198 |
-
"prompt_file": ocr_result.metadata.get("prompt_file"),
|
| 199 |
-
"llm_model": ocr_result.metadata.get("llm_model"),
|
| 200 |
-
"llm_provider": ocr_result.metadata.get("llm_provider"),
|
| 201 |
-
}
|
| 202 |
-
if ocr_intermediate is not None and ocr_result.success:
|
| 203 |
-
try:
|
| 204 |
-
from picarones.pipelines.over_normalization import detect_over_normalization
|
| 205 |
-
over_norm = detect_over_normalization(
|
| 206 |
-
ground_truth=ground_truth,
|
| 207 |
-
ocr_text=ocr_intermediate,
|
| 208 |
-
llm_text=ocr_result.text,
|
| 209 |
-
)
|
| 210 |
-
pipeline_meta["over_normalization"] = over_norm.as_dict()
|
| 211 |
-
except Exception as e:
|
| 212 |
-
_logger.warning("[over_normalization] fonctionnalité dégradée : %s", e)
|
| 213 |
-
|
| 214 |
-
# Document-level hooks: each hook produces one named attribute of the
|
| 215 |
-
# ``DocumentResult``. Hooks that are invalid in this context (OCR
|
| 216 |
-
# failure for ``requires_success`` hooks, missing
|
| 217 |
-
# ``token_confidences`` for ``calibration``) are skipped
|
| 218 |
-
# silently. Exceptions raised by a hook are
|
| 219 |
-
# caught and logged as warnings by ``run_document_hooks``.
|
| 220 |
-
extras = run_document_hooks(
|
| 221 |
-
profile,
|
| 222 |
-
ground_truth=ground_truth,
|
| 223 |
-
hypothesis=ocr_result.text,
|
| 224 |
-
image_path=image_path,
|
| 225 |
-
corpus_lang=corpus_lang,
|
| 226 |
-
ocr_result=ocr_result,
|
| 227 |
-
)
|
| 228 |
-
|
| 229 |
-
return DocumentResult(
|
| 230 |
-
doc_id=doc_id,
|
| 231 |
-
image_path=image_path,
|
| 232 |
-
ground_truth=ground_truth,
|
| 233 |
-
hypothesis=ocr_result.text,
|
| 234 |
-
metrics=metrics,
|
| 235 |
-
duration_seconds=ocr_result.duration_seconds,
|
| 236 |
-
engine_error=ocr_result.error,
|
| 237 |
-
ocr_intermediate=ocr_intermediate,
|
| 238 |
-
pipeline_metadata=pipeline_meta,
|
| 239 |
-
confusion_matrix=extras.get("confusion_matrix"),
|
| 240 |
-
char_scores=extras.get("char_scores"),
|
| 241 |
-
taxonomy=extras.get("taxonomy"),
|
| 242 |
-
structure=extras.get("structure"),
|
| 243 |
-
image_quality=extras.get("image_quality"),
|
| 244 |
-
line_metrics=extras.get("line_metrics"),
|
| 245 |
-
hallucination_metrics=extras.get("hallucination_metrics"),
|
| 246 |
-
calibration_metrics=extras.get("calibration_metrics"),
|
| 247 |
-
philological_metrics=extras.get("philological_metrics"),
|
| 248 |
-
searchability_metrics=extras.get("searchability_metrics"),
|
| 249 |
-
numerical_sequence_metrics=extras.get("numerical_sequence_metrics"),
|
| 250 |
-
readability_metrics=extras.get("readability_metrics"),
|
| 251 |
-
)
|
| 252 |
-
|
| 253 |
-
|
| 254 |
-
def _make_timeout_doc_result(doc: object, timeout_seconds: float) -> DocumentResult:
|
| 255 |
-
"""Synthetic DocumentResult for a document that exceeded the timeout."""
|
| 256 |
-
err = f"timeout ({timeout_seconds:.0f}s)"
|
| 257 |
-
metrics = MetricsResult(
|
| 258 |
-
cer=1.0, cer_nfc=1.0, cer_caseless=1.0,
|
| 259 |
-
wer=1.0, wer_normalized=1.0, mer=1.0, wil=1.0,
|
| 260 |
-
reference_length=len(doc.ground_truth), # type: ignore[attr-defined]
|
| 261 |
-
hypothesis_length=0,
|
| 262 |
-
error=err,
|
| 263 |
-
)
|
| 264 |
-
return DocumentResult(
|
| 265 |
-
doc_id=doc.doc_id, # type: ignore[attr-defined]
|
| 266 |
-
image_path=str(doc.image_path), # type: ignore[attr-defined]
|
| 267 |
-
ground_truth=doc.ground_truth, # type: ignore[attr-defined]
|
| 268 |
-
hypothesis="",
|
| 269 |
-
metrics=metrics,
|
| 270 |
-
duration_seconds=timeout_seconds,
|
| 271 |
-
engine_error=err,
|
| 272 |
-
)
|
| 273 |
-
|
| 274 |
-
|
| 275 |
-
def _make_error_doc_result(doc: object, error_msg: str) -> DocumentResult:
|
| 276 |
-
"""Synthetic DocumentResult for a document that hit an unexpected error."""
|
| 277 |
-
metrics = MetricsResult(
|
| 278 |
-
cer=1.0, cer_nfc=1.0, cer_caseless=1.0,
|
| 279 |
-
wer=1.0, wer_normalized=1.0, mer=1.0, wil=1.0,
|
| 280 |
-
reference_length=len(doc.ground_truth), # type: ignore[attr-defined]
|
| 281 |
-
hypothesis_length=0,
|
| 282 |
-
error=error_msg,
|
| 283 |
-
)
|
| 284 |
-
return DocumentResult(
|
| 285 |
-
doc_id=doc.doc_id, # type: ignore[attr-defined]
|
| 286 |
-
image_path=str(doc.image_path), # type: ignore[attr-defined]
|
| 287 |
-
ground_truth=doc.ground_truth, # type: ignore[attr-defined]
|
| 288 |
-
hypothesis="",
|
| 289 |
-
metrics=metrics,
|
| 290 |
-
duration_seconds=0.0,
|
| 291 |
-
engine_error=error_msg,
|
| 292 |
-
)
|
| 293 |
-
|
| 294 |
-
|
| 295 |
-
# ---------------------------------------------------------------------------
|
| 296 |
-
# Partial results (save / resume)
|
| 297 |
-
# ---------------------------------------------------------------------------
|
| 298 |
-
|
| 299 |
-
def _sanitize_filename(s: str) -> str:
|
| 300 |
-
return re.sub(r"[^\w\-]", "_", s)[:64]
|
| 301 |
-
|
| 302 |
-
|
| 303 |
-
def _partial_path(
|
| 304 |
-
corpus_name: str,
|
| 305 |
-
engine_name: str,
|
| 306 |
-
partial_dir: Optional[str | Path],
|
| 307 |
-
) -> Path:
|
| 308 |
-
base = Path(partial_dir) if partial_dir else Path(tempfile.gettempdir())
|
| 309 |
-
name = (
|
| 310 |
-
f"picarones_{_sanitize_filename(corpus_name)}"
|
| 311 |
-
f"_{_sanitize_filename(engine_name)}.partial.json"
|
| 312 |
-
)
|
| 313 |
-
return base / name
|
| 314 |
-
|
| 315 |
-
|
| 316 |
-
def _load_partial(
|
| 317 |
-
corpus_name: str,
|
| 318 |
-
engine_name: str,
|
| 319 |
-
partial_dir: Optional[str | Path],
|
| 320 |
-
) -> tuple[Path, list[DocumentResult]]:
|
| 321 |
-
"""Loads the partial results of a previous, interrupted run.
|
| 322 |
-
|
| 323 |
-
Returns
|
| 324 |
-
-------
|
| 325 |
-
(path, results): path of the partial file and list of the DocumentResult objects already computed.
|
| 326 |
-
"""
|
| 327 |
-
path = _partial_path(corpus_name, engine_name, partial_dir)
|
| 328 |
-
results: list[DocumentResult] = []
|
| 329 |
-
if not path.exists():
|
| 330 |
-
return path, results
|
| 331 |
-
|
| 332 |
-
try:
|
| 333 |
-
with path.open("r", encoding="utf-8") as fh:
|
| 334 |
-
for line in fh:
|
| 335 |
-
line = line.strip()
|
| 336 |
-
if not line:
|
| 337 |
-
continue
|
| 338 |
-
d = json.loads(line)
|
| 339 |
-
m = d.get("metrics", {})
|
| 340 |
-
metrics = MetricsResult(
|
| 341 |
-
cer=m.get("cer", 1.0),
|
| 342 |
-
cer_nfc=m.get("cer_nfc", 1.0),
|
| 343 |
-
cer_caseless=m.get("cer_caseless", 1.0),
|
| 344 |
-
wer=m.get("wer", 1.0),
|
| 345 |
-
wer_normalized=m.get("wer_normalized", 1.0),
|
| 346 |
-
mer=m.get("mer", 1.0),
|
| 347 |
-
wil=m.get("wil", 1.0),
|
| 348 |
-
reference_length=m.get("reference_length", 0),
|
| 349 |
-
hypothesis_length=m.get("hypothesis_length", 0),
|
| 350 |
-
error=m.get("error"),
|
| 351 |
-
)
|
| 352 |
-
results.append(DocumentResult(
|
| 353 |
-
doc_id=d["doc_id"],
|
| 354 |
-
image_path=d.get("image_path", ""),
|
| 355 |
-
ground_truth=d.get("ground_truth", ""),
|
| 356 |
-
hypothesis=d.get("hypothesis", ""),
|
| 357 |
-
metrics=metrics,
|
| 358 |
-
duration_seconds=d.get("duration_seconds", 0.0),
|
| 359 |
-
engine_error=d.get("engine_error"),
|
| 360 |
-
ocr_intermediate=d.get("ocr_intermediate"),
|
| 361 |
-
pipeline_metadata=d.get("pipeline_metadata", {}),
|
| 362 |
-
confusion_matrix=d.get("confusion_matrix"),
|
| 363 |
-
char_scores=d.get("char_scores"),
|
| 364 |
-
taxonomy=d.get("taxonomy"),
|
| 365 |
-
structure=d.get("structure"),
|
| 366 |
-
image_quality=d.get("image_quality"),
|
| 367 |
-
line_metrics=d.get("line_metrics"),
|
| 368 |
-
hallucination_metrics=d.get("hallucination_metrics"),
|
| 369 |
-
))
|
| 370 |
-
except Exception as e:
|
| 371 |
-
logger.warning("Impossible de charger les résultats partiels '%s' : %s", path, e)
|
| 372 |
-
results = []
|
| 373 |
-
|
| 374 |
-
return path, results
|
| 375 |
-
|
| 376 |
-
|
| 377 |
-
def _save_partial_line(partial_path: Path, doc_result: DocumentResult) -> None:
|
| 378 |
-
"""Appends one NDJSON entry to the partial-results file (thread-safe)."""
|
| 379 |
-
try:
|
| 380 |
-
line = json.dumps(doc_result.as_dict(), ensure_ascii=False) + "\n"
|
| 381 |
-
with _partial_write_lock:
|
| 382 |
-
with partial_path.open("a", encoding="utf-8") as fh:
|
| 383 |
-
fh.write(line)
|
| 384 |
-
except Exception as e:
|
| 385 |
-
logger.warning("Impossible d'écrire dans le fichier partiel '%s' : %s", partial_path, e)
|
| 386 |
-
|
| 387 |
-
|
| 388 |
-
def _delete_partial(partial_path: Path) -> None:
|
| 389 |
-
"""Deletes the partial-results file once an engine has finished."""
|
| 390 |
-
try:
|
| 391 |
-
if partial_path.exists():
|
| 392 |
-
partial_path.unlink()
|
| 393 |
-
except Exception as e:
|
| 394 |
-
logger.warning("Impossible de supprimer le fichier partiel '%s' : %s", partial_path, e)
|
| 395 |
-
|
| 396 |
-
|
| 397 |
-
# ---------------------------------------------------------------------------
|
| 398 |
-
# Main benchmark
|
| 399 |
-
# ---------------------------------------------------------------------------
|
| 400 |
|
| 401 |
def run_benchmark(
|
| 402 |
corpus: Corpus,
|
|
@@ -838,182 +491,4 @@ def _build_pipeline_info(engine: BaseOCREngine, doc_results: list[DocumentResult
|
|
| 838 |
return info
|
| 839 |
|
| 840 |
|
| 841 |
-
|
| 842 |
-
# Aggregation helpers: backward-compatible delegations
|
| 843 |
-
# ---------------------------------------------------------------------------
|
| 844 |
-
# Chantier 2 (post-Sprint 97): the implementations now live in
|
| 845 |
-
# :mod:`picarones.measurements.builtin_hooks` (single source of truth, exposed via
|
| 846 |
-
# the :mod:`picarones.core.metric_hooks` registry). The names below
|
| 847 |
-
# remain available from ``picarones.measurements.runner`` for backward compatibility
|
| 848 |
-
# with the Sprint 13 / 42 tests that import them directly.
|
| 849 |
-
|
| 850 |
-
def _aggregate_confusion(doc_results: list) -> Optional[dict]:
|
| 851 |
-
"""Delegates to :func:`builtin_hooks._aggregate_confusion`."""
|
| 852 |
-
from picarones.measurements.builtin_hooks import _aggregate_confusion as _impl
|
| 853 |
-
return _impl(doc_results)
|
| 854 |
-
|
| 855 |
-
|
| 856 |
-
def _aggregate_char_scores(doc_results: list) -> Optional[dict]:
|
| 857 |
-
"""Delegates to :func:`builtin_hooks._aggregate_char_scores`."""
|
| 858 |
-
from picarones.measurements.builtin_hooks import _aggregate_char_scores as _impl
|
| 859 |
-
return _impl(doc_results)
|
| 860 |
-
|
| 861 |
-
|
| 862 |
-
def _aggregate_taxonomy(doc_results: list) -> Optional[dict]:
|
| 863 |
-
"""Delegates to :func:`builtin_hooks._aggregate_taxonomy`."""
|
| 864 |
-
from picarones.measurements.builtin_hooks import _aggregate_taxonomy as _impl
|
| 865 |
-
return _impl(doc_results)
|
| 866 |
-
|
| 867 |
-
|
| 868 |
-
def _aggregate_structure(doc_results: list) -> Optional[dict]:
|
| 869 |
-
"""Delegates to :func:`builtin_hooks._aggregate_structure`."""
|
| 870 |
-
from picarones.measurements.builtin_hooks import _aggregate_structure as _impl
|
| 871 |
-
return _impl(doc_results)
|
| 872 |
-
|
| 873 |
-
|
| 874 |
-
def _aggregate_image_quality(doc_results: list) -> Optional[dict]:
|
| 875 |
-
"""Delegates to :func:`builtin_hooks._aggregate_image_quality`."""
|
| 876 |
-
from picarones.measurements.builtin_hooks import _aggregate_image_quality as _impl
|
| 877 |
-
return _impl(doc_results)
|
| 878 |
-
|
| 879 |
-
|
| 880 |
-
def _aggregate_line_metrics(doc_results: list) -> Optional[dict]:
|
| 881 |
-
"""Delegates to :func:`builtin_hooks._aggregate_line_metrics`."""
|
| 882 |
-
from picarones.measurements.builtin_hooks import _aggregate_line_metrics as _impl
|
| 883 |
-
return _impl(doc_results)
|
| 884 |
-
|
| 885 |
-
|
| 886 |
-
def _aggregate_hallucination(doc_results: list) -> Optional[dict]:
|
| 887 |
-
"""Delegates to :func:`builtin_hooks._aggregate_hallucination`."""
|
| 888 |
-
from picarones.measurements.builtin_hooks import _aggregate_hallucination as _impl
|
| 889 |
-
return _impl(doc_results)
|
| 890 |
-
|
| 891 |
-
|
| 892 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 893 |
-
# Sprint 40: NER extraction at post-processing time, and aggregation
|
| 894 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 895 |
-
|
| 896 |
-
|
| 897 |
-
def _attach_ner_metrics(
|
| 898 |
-
corpus: Corpus,
|
| 899 |
-
doc_results: list,
|
| 900 |
-
entity_extractor: callable,
|
| 901 |
-
) -> None:
|
| 902 |
-
"""Computes and attaches ``DocumentResult.ner_metrics`` for each doc
|
| 903 |
-
whose GT has an ``ENTITIES`` level (Sprint 32).
|
| 904 |
-
|
| 905 |
-
The extractor is called on the OCR hypothesis ``dr.hypothesis``.
|
| 906 |
-
Errors are downgraded to warnings (not propagated) so that
|
| 907 |
-
the benchmark does not break if one specific document
|
| 908 |
-
crashes the NER.
|
| 909 |
-
"""
|
| 910 |
-
try:
|
| 911 |
-
from picarones.core.corpus import GTLevel
|
| 912 |
-
from picarones.measurements.ner import compute_ner_metrics
|
| 913 |
-
except ImportError as exc:
|
| 914 |
-
logger.warning("[ner.attach] imports indisponibles : %s", exc)
|
| 915 |
-
return
|
| 916 |
-
|
| 917 |
-
docs_by_id = {d.doc_id: d for d in corpus.documents}
|
| 918 |
-
n_done = 0
|
| 919 |
-
for dr in doc_results:
|
| 920 |
-
if dr.engine_error is not None or not dr.hypothesis:
|
| 921 |
-
continue
|
| 922 |
-
doc = docs_by_id.get(dr.doc_id)
|
| 923 |
-
if doc is None or not doc.has_gt(GTLevel.ENTITIES):
|
| 924 |
-
continue
|
| 925 |
-
try:
|
| 926 |
-
gt_payload = doc.get_gt(GTLevel.ENTITIES)
|
| 927 |
-
gt_entities = list(gt_payload.entities) if gt_payload else []
|
| 928 |
-
hyp_entities = entity_extractor(dr.hypothesis) or []
|
| 929 |
-
dr.ner_metrics = compute_ner_metrics(gt_entities, hyp_entities)
|
| 930 |
-
n_done += 1
|
| 931 |
-
except Exception as exc: # noqa: BLE001
|
| 932 |
-
logger.warning(
|
| 933 |
-
"[ner.attach] %s : extraction/comparaison NER dégradée : %s",
|
| 934 |
-
dr.doc_id, exc,
|
| 935 |
-
)
|
| 936 |
-
|
| 937 |
-
if n_done > 0:
|
| 938 |
-
logger.info("[ner] %d documents évalués pour NER.", n_done)
|
| 939 |
-
|
| 940 |
-
|
| 941 |
-
def _aggregate_calibration(doc_results: list) -> Optional[dict]:
|
| 942 |
-
"""Delegates to :func:`builtin_hooks._aggregate_calibration`.
|
| 943 |
-
|
| 944 |
-
Kept for backward compatibility with the ``test_sprint42_calibration_runner`` test,
|
| 945 |
-
which imports directly from ``picarones.measurements.runner``. The actual
|
| 946 |
-
logic lives in :mod:`picarones.measurements.builtin_hooks` (chantier 2
|
| 947 |
-
post-Sprint 97).
|
| 948 |
-
"""
|
| 949 |
-
from picarones.measurements.builtin_hooks import _aggregate_calibration as _impl
|
| 950 |
-
return _impl(doc_results)
|
| 951 |
-
|
| 952 |
-
|
| 953 |
-
def _aggregate_ner(doc_results: list) -> Optional[dict]:
|
| 954 |
-
"""Aggregates NER metrics at the engine level.
|
| 955 |
-
|
| 956 |
-
Recomputes *micro* precision/recall/F1 from the global sums
|
| 957 |
-
of TP/FP/FN, plus the per-category breakdown, plus the total
|
| 958 |
-
counts of hallucinated and missed entities.
|
| 959 |
-
"""
|
| 960 |
-
relevant = [dr for dr in doc_results if dr.ner_metrics is not None]
|
| 961 |
-
if not relevant:
|
| 962 |
-
return None
|
| 963 |
-
|
| 964 |
-
total_tp = 0
|
| 965 |
-
total_fp = 0
|
| 966 |
-
total_fn = 0
|
| 967 |
-
cat_tp: dict[str, int] = {}
|
| 968 |
-
cat_fp: dict[str, int] = {}
|
| 969 |
-
cat_fn: dict[str, int] = {}
|
| 970 |
-
total_hallucinated = 0
|
| 971 |
-
total_missed = 0
|
| 972 |
-
iou_threshold = 0.5
|
| 973 |
-
|
| 974 |
-
for dr in relevant:
|
| 975 |
-
m = dr.ner_metrics
|
| 976 |
-
total_tp += int(m.get("true_positives", 0))
|
| 977 |
-
total_fp += int(m.get("false_positives", 0))
|
| 978 |
-
total_fn += int(m.get("false_negatives", 0))
|
| 979 |
-
total_hallucinated += len(m.get("hallucinated_entities", []) or [])
|
| 980 |
-
total_missed += len(m.get("missed_entities", []) or [])
|
| 981 |
-
iou_threshold = float(m.get("iou_threshold", iou_threshold))
|
| 982 |
-
for cat, stats in (m.get("per_category") or {}).items():
|
| 983 |
-
cat_tp[cat] = cat_tp.get(cat, 0)
|
| 984 |
-
cat_fp[cat] = cat_fp.get(cat, 0)
|
| 985 |
-
cat_fn[cat] = cat_fn.get(cat, 0)
|
| 986 |
-
# Reconstruct the per-category sums from support and P/R
|
| 987 |
-
support = int(stats.get("support", 0))
|
| 988 |
-
recall = float(stats.get("recall", 0.0))
|
| 989 |
-
precision = float(stats.get("precision", 0.0))
|
| 990 |
-
tp_cat = round(support * recall) if support > 0 else 0
|
| 991 |
-
fn_cat = max(0, support - tp_cat)
|
| 992 |
-
fp_cat = (
|
| 993 |
-
round(tp_cat * (1 - precision) / precision)
|
| 994 |
-
if precision > 0 else 0
|
| 995 |
-
)
|
| 996 |
-
cat_tp[cat] += tp_cat
|
| 997 |
-
cat_fp[cat] += fp_cat
|
| 998 |
-
cat_fn[cat] += fn_cat
|
| 999 |
-
|
| 1000 |
-
def _prf(tp: int, fp: int, fn: int) -> dict[str, float]:
|
| 1001 |
-
p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
|
| 1002 |
-
r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
|
| 1003 |
-
f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
|
| 1004 |
-
return {"precision": p, "recall": r, "f1": f1, "support": tp + fn}
|
| 1005 |
-
|
| 1006 |
-
return {
|
| 1007 |
-
"global": _prf(total_tp, total_fp, total_fn),
|
| 1008 |
-
"per_category": {
|
| 1009 |
-
cat: _prf(cat_tp[cat], cat_fp[cat], cat_fn[cat])
|
| 1010 |
-
for cat in sorted(set(cat_tp) | set(cat_fp) | set(cat_fn))
|
| 1011 |
-
},
|
| 1012 |
-
"true_positives": total_tp,
|
| 1013 |
-
"false_positives": total_fp,
|
| 1014 |
-
"false_negatives": total_fn,
|
| 1015 |
-
"hallucinated_total": total_hallucinated,
|
| 1016 |
-
"missed_total": total_missed,
|
| 1017 |
-
"doc_count": len(relevant),
|
| 1018 |
-
"iou_threshold": iou_threshold,
|
| 1019 |
-
}
|
|
|
|
| 1 |
+
"""Main benchmark orchestrator.
|
| 2 |
|
| 3 |
+
Contains :func:`run_benchmark` and its helper :func:`_build_pipeline_info`.
|
|
|
|
|
|
|
| 4 |
|
| 5 |
+
The runner executes each engine in the list on the full corpus:
|
| 6 |
+
|
| 7 |
+
- For CPU-bound engines (``execution_mode == "cpu"``:
|
| 8 |
+
Tesseract, Pero OCR, Kraken), uses a ``ProcessPoolExecutor``
|
| 9 |
+
and delegates to the picklable workers in :mod:`workers`.
|
| 10 |
+
- For IO-bound engines (Mistral, Google Vision, Azure, LLMs),
|
| 11 |
+
uses a ``ThreadPoolExecutor``.
|
| 12 |
+
|
| 13 |
+
Partial results (one NDJSON file per engine) are handled by
|
| 14 |
+
:mod:`partial`; the computation of an individual :class:`DocumentResult`
|
| 15 |
+
by :mod:`document`; the final aggregation by the hooks delegated to
|
| 16 |
+
:mod:`builtin_hooks` (chantier 2 post-Sprint 97).
|
| 17 |
"""
|
| 18 |
|
| 19 |
from __future__ import annotations
|
| 20 |
|
| 21 |
import concurrent.futures
|
|
|
|
| 22 |
import logging
|
|
|
|
|
|
|
| 23 |
import threading
|
| 24 |
import time
|
| 25 |
from pathlib import Path
|
|
|
|
| 28 |
from tqdm import tqdm
|
| 29 |
|
| 30 |
from picarones.core.corpus import Corpus
|
|
|
|
| 31 |
from picarones.core.results import BenchmarkResult, DocumentResult, EngineReport
|
| 32 |
+
from picarones.engines.base import BaseOCREngine
|
| 33 |
+
from picarones.measurements.runner.document import (
|
| 34 |
+
_make_error_doc_result,
|
| 35 |
+
_make_timeout_doc_result,
|
| 36 |
+
)
|
| 37 |
+
from picarones.measurements.runner.ner_attach import (
|
| 38 |
+
_aggregate_ner,
|
| 39 |
+
_attach_ner_metrics,
|
| 40 |
+
)
|
| 41 |
+
from picarones.measurements.runner.partial import (
|
| 42 |
+
_delete_partial,
|
| 43 |
+
_load_partial,
|
| 44 |
+
_save_partial_line,
|
| 45 |
+
)
|
| 46 |
+
from picarones.measurements.runner.workers import (
|
| 47 |
+
_cpu_doc_worker,
|
| 48 |
+
_io_doc_worker,
|
| 49 |
+
)
|
| 50 |
|
| 51 |
logger = logging.getLogger(__name__)
|
| 52 |
|
| 53 |
|
| 54 |
def run_benchmark(
|
| 55 |
corpus: Corpus,
|
|
|
|
| 491 |
return info
|
| 492 |
|
| 493 |
|
| 494 |
+
__all__ = ["_build_pipeline_info", "run_benchmark"]
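The executor choice described in the `orchestration.py` docstring can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of `run_benchmark`: only the `"cpu"` value of `execution_mode` comes from the docstring, while `executor_class_for`, `fake_ocr`, `run_all` and the `"io"` value are hypothetical names introduced here.

```python
from __future__ import annotations

from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def executor_class_for(execution_mode: str) -> type[Executor]:
    """Processes for CPU-bound engines, threads for everything else."""
    return ProcessPoolExecutor if execution_mode == "cpu" else ThreadPoolExecutor


def fake_ocr(doc_id: str) -> str:
    """Stand-in for an engine call; module-level so a process pool could pickle it."""
    return doc_id.upper()


def run_all(execution_mode: str, doc_ids: list[str], max_workers: int = 2) -> list[str]:
    """Run the stand-in engine over all documents with the matching pool type."""
    pool_cls = executor_class_for(execution_mode)
    with pool_cls(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, whatever the completion order.
        return list(pool.map(fake_ocr, doc_ids))


if __name__ == "__main__":
    print(run_all("io", ["doc-a", "doc-b"]))
```

Keeping the worker at module level matters only for the process pool, which has to pickle the callable and its arguments; the thread pool can share an engine instance directly.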
|
|
@@ -0,0 +1,140 @@
+"""Persistence of partial benchmark results (NDJSON).
+
+While the runner processes a corpus, it writes each ``DocumentResult``
+to a file ``{partial_dir}/picarones_{corpus}_{engine}.partial.json``
+in NDJSON format. If the benchmark is interrupted (Ctrl+C, crash, kill),
+the next run resumes from this file without losing the work already
+done.
+
+Thread-safe: the module uses a :class:`threading.Lock` shared by all
+writes to serialize the appends.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import re
+import tempfile
+import threading
+from pathlib import Path
+from typing import Optional
+
+from picarones.core.results import DocumentResult
+from picarones.measurements.metrics import MetricsResult
+
+logger = logging.getLogger(__name__)
+
+# Lock that serializes the writes of partial results.
+# Shared by all call sites (IO and CPU workers take turns
+# on the same queue).
+_partial_write_lock = threading.Lock()
+
+
+def _sanitize_filename(s: str) -> str:
+    return re.sub(r"[^\w\-]", "_", s)[:64]
+
+
+def _partial_path(
+    corpus_name: str,
+    engine_name: str,
+    partial_dir: Optional[str | Path],
+) -> Path:
+    base = Path(partial_dir) if partial_dir else Path(tempfile.gettempdir())
+    name = (
+        f"picarones_{_sanitize_filename(corpus_name)}"
+        f"_{_sanitize_filename(engine_name)}.partial.json"
+    )
+    return base / name
+
+
+def _load_partial(
+    corpus_name: str,
+    engine_name: str,
+    partial_dir: Optional[str | Path],
+) -> tuple[Path, list[DocumentResult]]:
+    """Load the partial results of a previous, interrupted run.
+
+    Returns
+    -------
+    (path, results): path of the partial file and list of the
+    DocumentResult objects already computed.
+    """
+    path = _partial_path(corpus_name, engine_name, partial_dir)
+    results: list[DocumentResult] = []
+    if not path.exists():
+        return path, results
+
+    try:
+        with path.open("r", encoding="utf-8") as fh:
+            for line in fh:
+                line = line.strip()
+                if not line:
+                    continue
+                d = json.loads(line)
+                m = d.get("metrics", {})
+                metrics = MetricsResult(
+                    cer=m.get("cer", 1.0),
+                    cer_nfc=m.get("cer_nfc", 1.0),
+                    cer_caseless=m.get("cer_caseless", 1.0),
+                    wer=m.get("wer", 1.0),
+                    wer_normalized=m.get("wer_normalized", 1.0),
+                    mer=m.get("mer", 1.0),
+                    wil=m.get("wil", 1.0),
+                    reference_length=m.get("reference_length", 0),
+                    hypothesis_length=m.get("hypothesis_length", 0),
+                    error=m.get("error"),
+                )
+                results.append(DocumentResult(
+                    doc_id=d["doc_id"],
+                    image_path=d.get("image_path", ""),
+                    ground_truth=d.get("ground_truth", ""),
+                    hypothesis=d.get("hypothesis", ""),
+                    metrics=metrics,
+                    duration_seconds=d.get("duration_seconds", 0.0),
+                    engine_error=d.get("engine_error"),
+                    ocr_intermediate=d.get("ocr_intermediate"),
+                    pipeline_metadata=d.get("pipeline_metadata", {}),
+                    confusion_matrix=d.get("confusion_matrix"),
+                    char_scores=d.get("char_scores"),
+                    taxonomy=d.get("taxonomy"),
+                    structure=d.get("structure"),
+                    image_quality=d.get("image_quality"),
+                    line_metrics=d.get("line_metrics"),
+                    hallucination_metrics=d.get("hallucination_metrics"),
+                ))
+    except Exception as e:
+        logger.warning("Impossible de charger les résultats partiels '%s' : %s", path, e)
+        results = []
+
+    return path, results
+
+
+def _save_partial_line(partial_path: Path, doc_result: DocumentResult) -> None:
+    """Append one NDJSON entry to the partial results file (thread-safe)."""
+    try:
+        line = json.dumps(doc_result.as_dict(), ensure_ascii=False) + "\n"
+        with _partial_write_lock:
+            with partial_path.open("a", encoding="utf-8") as fh:
+                fh.write(line)
+    except Exception as e:
+        logger.warning("Impossible d'écrire dans le fichier partiel '%s' : %s", partial_path, e)
+
+
+def _delete_partial(partial_path: Path) -> None:
+    """Delete the partial results file once an engine has finished."""
+    try:
+        if partial_path.exists():
+            partial_path.unlink()
+    except Exception as e:
+        logger.warning("Impossible de supprimer le fichier partiel '%s' : %s", partial_path, e)
+
+
+__all__ = [
+    "_delete_partial",
+    "_load_partial",
+    "_partial_path",
+    "_partial_write_lock",
+    "_sanitize_filename",
+    "_save_partial_line",
+]
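
The append-then-resume pattern used by ``partial.py`` can be sketched in isolation as follows. This is a minimal illustration, not the project's code: it works on plain dicts instead of ``DocumentResult``, and the names ``append_record``, ``load_records`` and ``pending_ids`` are hypothetical. How ``run_benchmark`` actually consumes the reloaded results is not shown in this diff, so the last helper is only an assumption about one plausible use.

```python
from __future__ import annotations

import json
import threading
from pathlib import Path

# Hypothetical stand-in for ``_partial_write_lock``.
_lock = threading.Lock()


def append_record(path: Path, record: dict) -> None:
    """Append one JSON object as a single NDJSON line."""
    line = json.dumps(record, ensure_ascii=False) + "\n"
    with _lock:  # one writer at a time, so lines never interleave
        with path.open("a", encoding="utf-8") as fh:
            fh.write(line)


def load_records(path: Path) -> list[dict]:
    """Read back every non-empty line; a missing file means nothing to resume."""
    if not path.exists():
        return []
    with path.open("r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def pending_ids(all_ids: list[str], done: list[dict]) -> list[str]:
    """Assumed resume step: keep only the documents not yet recorded."""
    done_ids = {r["doc_id"] for r in done}
    return [doc_id for doc_id in all_ids if doc_id not in done_ids]
```

Because each record is a complete line, an interrupted run leaves at worst one truncated final line. Note that ``_load_partial`` above discards all partial results if any line fails to parse.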
@@ -0,0 +1,107 @@
+"""Module-level workers for the execution pools.
+
+Two workers, one for each execution mode:
+
+- :func:`_cpu_doc_worker`: for ``ProcessPoolExecutor`` (CPU-bound
+  engines, instantiated in the subprocess). Must be picklable,
+  which is why it is defined at module level.
+- :func:`_io_doc_worker`: for ``ThreadPoolExecutor`` (IO-bound
+  engines / HTTP APIs). The engine instance is shared between
+  threads.
+
+Both end up calling :func:`_compute_document_result` from the
+:mod:`document` sub-module to compute all metrics.
+"""
+
+from __future__ import annotations
+
+from typing import Optional
+
+from picarones.core.results import DocumentResult
+from picarones.engines.base import BaseOCREngine
+from picarones.measurements.runner.document import _compute_document_result
+
+
+def _cpu_doc_worker(args: tuple) -> "DocumentResult":
+    """Worker for ProcessPoolExecutor (CPU-bound engines).
+
+    Instantiates the engine in the subprocess, runs OCR and computes
+    all metrics. Must be a module-level function so that ``pickle``
+    can serialize it.
+
+    For backward compatibility, the ``args`` tuple may contain:
+    - 7 elements: legacy (Sprint 13)
+    - 8 elements: + ``corpus_lang`` (Sprint 87)
+    - 9 elements: + ``profile`` (workstream 2, post-Sprint 97)
+    """
+    if len(args) == 9:
+        (engine_module, engine_class_name, engine_config, doc_id,
+         image_path, ground_truth, char_exclude_chars, corpus_lang,
+         profile) = args
+    elif len(args) == 8:
+        (engine_module, engine_class_name, engine_config, doc_id,
+         image_path, ground_truth, char_exclude_chars, corpus_lang) = args
+        profile = "standard"
+    else:
+        (engine_module, engine_class_name, engine_config, doc_id,
+         image_path, ground_truth, char_exclude_chars) = args
+        corpus_lang = "fr"
+        profile = "standard"
+    import importlib
+    mod = importlib.import_module(engine_module)
+    engine_cls = getattr(mod, engine_class_name)
+    engine = engine_cls(config=engine_config)
+    ocr_result = engine.run(image_path)
+    char_exclude = frozenset(char_exclude_chars) if char_exclude_chars else None
+    return _compute_document_result(
+        doc_id=doc_id,
+        image_path=image_path,
+        ground_truth=ground_truth,
+        ocr_result=ocr_result,
+        char_exclude=char_exclude,
+        corpus_lang=corpus_lang,
+        profile=profile,
+    )
+
+
+def _io_doc_worker(
+    engine: BaseOCREngine,
+    doc: object,
+    char_exclude: Optional[frozenset],
+    corpus_lang: str = "fr",
+    profile: str = "standard",
+) -> "DocumentResult":
+    """Worker for ThreadPoolExecutor (IO-bound engines / APIs).
+
+    Runs OCR and computes the metrics in a thread. The engine instance
+    is shared between threads; HTTP adapters generally hold no mutable
+    state between calls.
+
+    If the document carries pre-computed OCR text (triplet corpus) and
+    the engine is an OCR+LLM pipeline, uses ``run_with_ocr_text()`` to
+    bypass the OCR step and test the LLM post-correction directly.
+    """
+    doc_ocr_text = getattr(doc, "ocr_text", None)
+    if doc_ocr_text is not None:
+        # Triplet corpus: check whether the engine supports run_with_ocr_text
+        run_with = getattr(engine, "run_with_ocr_text", None)
+        if run_with is not None:
+            ocr_result = run_with(doc.image_path, doc_ocr_text)  # type: ignore[attr-defined]
+        else:
+            # Plain OCR engine: ignore the pre-computed OCR text
+            ocr_result = engine.run(doc.image_path)  # type: ignore[attr-defined]
+    else:
+        ocr_result = engine.run(doc.image_path)  # type: ignore[attr-defined]
+
+    return _compute_document_result(
+        doc_id=doc.doc_id,  # type: ignore[attr-defined]
+        image_path=str(doc.image_path),  # type: ignore[attr-defined]
+        ground_truth=doc.ground_truth,  # type: ignore[attr-defined]
+        ocr_result=ocr_result,
+        char_exclude=char_exclude,
+        corpus_lang=corpus_lang,
+        profile=profile,
+    )
+
+
+__all__ = ["_cpu_doc_worker", "_io_doc_worker"]
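
The 7/8/9-element compatibility rule documented in ``_cpu_doc_worker`` can be expressed as a small padding helper. The sketch below is an alternative formulation for illustration only: ``normalize_worker_args`` is a hypothetical name that does not exist in the repository, and the defaults ``"fr"`` and ``"standard"`` are copied from the worker above.

```python
from __future__ import annotations

# Defaults copied from the worker above: corpus_lang, then profile.
_TRAILING_DEFAULTS = ("fr", "standard")
_FULL_ARITY = 9


def normalize_worker_args(args: tuple) -> tuple:
    """Pad a 7- or 8-element legacy tuple to the 9-element layout."""
    missing = _FULL_ARITY - len(args)
    if not 0 <= missing <= len(_TRAILING_DEFAULTS):
        raise ValueError(f"expected 7 to 9 elements, got {len(args)}")
    # Append only the defaults for the trailing positions that are absent.
    return args + _TRAILING_DEFAULTS[len(_TRAILING_DEFAULTS) - missing:]
```

Compared with the three-branch unpacking in the diff, this variant rejects unexpected lengths with an explicit message. The committed code would instead fail inside the ``else`` branch when the tuple cannot be unpacked into seven names.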
@@ -38,15 +38,18 @@ REPO_ROOT = Path(__file__).resolve().parents[2]
 #: historical ones reference ``picarones/measurements/statistics.py``
 #: which is now a sub-package. Baseline raised.
 #: - 72 ("zero actionable debt" sprint, 2026-05-02): 50 paths
-#: fixed in bulk: 44 in CLAUDE.md
-#:
-#:
-#:
+#: fixed in bulk: 44 in CLAUDE.md + 6 in living docs.
+#: - 73 ("runner.py split" sprint, 2026-05-03):
+#: ``picarones/measurements/runner.py`` is now a sub-package
+#: ``runner/``. ``docs/user/writing-a-pipeline-module.md`` was
+#: fixed in place; one historical audit
+#: (``docs/audits/institutional-readiness-2026-05.md``) references
+#: the old path and is left untouched by convention.
 #:
-#: The
+#: The remaining 73 are **ALL** in:
 #: - ``CHANGELOG.md`` (67): versioned historical log, not to be touched.
-#: - ``docs/audits/*.md`` (
-BROKEN_PATHS_BASELINE =
+#: - ``docs/audits/*.md`` (6): historical audits, not to be touched.
+BROKEN_PATHS_BASELINE = 73

 #: Glob patterns of documentation files to scan.
 DOC_GLOBS: tuple[str, ...] = (
@@ -39,7 +39,11 @@ FILE_BUDGETS: dict[str, int] = {
     # ``picarones/measurements/statistics/`` during the
     # "statistics.py split" sprint (2026-05-02). No file in the
     # family exceeds 350 lines any more, so no entry is required.
-
+    # runner.py (1019 lines) was split into the sub-package
+    # ``picarones/measurements/runner/`` during the
+    # "runner.py split" sprint (2026-05-03). The largest sub-module
+    # is ``orchestration.py`` (494 lines), monitored below.
+    "picarones/measurements/runner/orchestration.py": 575,  # currently 494
     # --- Refactor ("generator.py split" sprint): went from
     # 1063 to 431 lines via extraction to picarones/report/assets.py
     # and the sub-package picarones/report/report_data/. Tight budget
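
The test that consumes ``FILE_BUDGETS`` is not part of this diff. As a rough sketch of what a per-file line budget amounts to, the check could look like the following. ``over_budget`` and the in-memory ``counts`` dict are assumptions made for illustration; only the path and the numbers 575 and 494 come from the hunk above.

```python
from __future__ import annotations


def over_budget(line_counts: dict[str, int], budgets: dict[str, int]) -> list[str]:
    """Return the budgeted paths whose line count exceeds their budget."""
    return sorted(
        path
        for path, budget in budgets.items()
        if line_counts.get(path, 0) > budget
    )


# Values taken from the hunk above; the counts would normally be measured on disk.
budgets = {"picarones/measurements/runner/orchestration.py": 575}
counts = {"picarones/measurements/runner/orchestration.py": 494}
violations = over_budget(counts, budgets)
```

With the current 494 lines against a budget of 575, ``violations`` is empty. The file would have to grow by more than 81 lines before the check reports it.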
@@ -260,10 +260,16 @@ class TestRunnerPartialResults:
         from picarones.core.corpus import load_corpus_from_directory
         from picarones.measurements.runner import run_benchmark
         from picarones.engines.base import BaseOCREngine
-
+        # "runner.py split" sprint (May 2026): ``_save_partial_line``
+        # now lives in the ``runner.partial`` sub-module; the re-export
+        # in ``runner.__init__`` is a frozen reference. To dynamically
+        # patch the function used by ``run_benchmark``, the source
+        # module must be targeted.
+        from picarones.measurements.runner import partial as _partial_mod
+        from picarones.measurements.runner import orchestration as _orch_mod

         save_calls: list[str] = []
-        original_save =
+        original_save = _partial_mod._save_partial_line

         def tracking_save(path, doc_result):
             save_calls.append(doc_result.doc_id)
@@ -276,7 +282,9 @@ class TestRunnerPartialResults:
             def _run_ocr(self, image_path): return "texte"

         corpus = load_corpus_from_directory(str(tmp_corpus))
-
+        # Patch the function directly in the orchestrator, which
+        # imported it from ``partial`` at load time.
+        with patch.object(_orch_mod, "_save_partial_line", side_effect=tracking_save):
             run_benchmark(
                 corpus, [MockEngine()],
                 show_progress=False,
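
The reason the test patches ``orchestration`` rather than ``partial`` is the usual "patch where the name is looked up" rule: ``from partial import f`` copies the reference into the importing module's namespace. The sketch below reproduces this with two throwaway modules built in memory. The module and function names (``partial``, ``orchestration``, ``save``, ``run``) are simplified stand-ins, not the project's code.

```python
import types
from unittest.mock import patch

# Stand-in for ``runner.partial``: the module that defines the function.
partial = types.ModuleType("partial")
partial.save = lambda doc_id: f"saved {doc_id}"

# Stand-in for ``runner.orchestration``: it holds its own reference,
# which is what ``from partial import save`` would create.
orchestration = types.ModuleType("orchestration")
orchestration.save = partial.save
exec(
    "def run(doc_id):\n"
    "    return save(doc_id)\n",  # resolved in orchestration's globals
    orchestration.__dict__,
)

calls = []

# Patching the defining module does not affect the copied reference.
with patch.object(partial, "save", side_effect=calls.append):
    orchestration.run("doc-1")

# Patching the consuming module is what intercepts the call.
with patch.object(orchestration, "save", side_effect=calls.append):
    orchestration.run("doc-2")
```

After both blocks run, ``calls`` contains only ``"doc-2"``, which mirrors why the updated test targets ``_orch_mod`` for the ``patch.object`` call.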