feat(services): Phase B3-final commit 6: remove the 3 purely legacy modules
Phase B3-final commit 6/7. Verify that no active caller still imports
the 3 legacy modules, then delete them.
Modules removed (3, ~1000 net LOC)
- picarones/app/services/benchmark_runner.py
(entry point ``run_benchmark_via_service``, deprecated in B3)
- picarones/app/services/_benchmark_execution.py
(internal helper of the legacy orchestration)
- picarones/app/services/_benchmark_orchestration.py
(run_benchmark_unified / run_benchmark_with_partial)
Tests removed (now obsolete)
- tests/integration/test_migration_invariance.py
- tests/integration/snapshots/migration_invariance.json
(purpose: guarantee that BenchmarkResult stayed invariant during the
migration; the migration is complete, so the safeguard has done its job)
Tests migrated (imports redirected to the canonical private helpers)
- test_sprint_d2cdef_features.py : _aggregate_ner_metrics from
_benchmark_ner (instead of the benchmark_runner re-export)
- test_sprint_d2b_partial_dir_resume.py : _engine_config_for_fingerprint
from _benchmark_helpers
- test_sprint_h2b_canonical_in_runner.py + test_s9_resolver_collision.py
+ test_s9_ocr_engine_naming_contract.py : build_adapter_resolver +
engine_to_pipeline_spec from _benchmark_adapter_resolver
- test_phase1_post_rewrite_wiring.py : _engine_config_for_fingerprint
from _benchmark_helpers
Tests adapted (they now inspect the modern API instead of the legacy one)
- test_public_api.py : TestRunnerApi (legacy contract test) removed,
replaced by test_prepare_preset_args_exposed_at_root (contract
test for the modern API). test_run_benchmark_via_service_still_callable_with_warning
removed (the function no longer exists).
- test_sprint_a14_s1_normalization_propagation.py : inspects
RunSpec.normalization_profile + prepare_preset_args (instead of
run_benchmark_via_service).
- test_sprint6_web_interface.py : inspects
RunOrchestrator.execute_preset.progress_callback (instead of
run_benchmark_via_service).
- test_run_spec_b1_extended.py : compares the RunSpec defaults with those of
prepare_preset_args (instead of run_benchmark_via_service).
picarones/__init__.py
- Example docstring: replaces
``from picarones.app.services.benchmark_runner import run_benchmark_via_service``
with
``from picarones import RunOrchestrator, RunSpec, load_run_spec_from_yaml``
+ ``from picarones.app.services import prepare_preset_args,
run_result_to_benchmark_result``.
Verification (grep): no residual import of
``picarones.app.services.benchmark_runner``,
``_benchmark_execution`` or ``_benchmark_orchestration`` in
``picarones/`` or ``tests/``.
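The grep check above could be automated along the following lines. This is only a sketch, not the command the commit actually ran: the function name ``find_residual_imports`` and the exact regular expression are assumptions made for illustration.

```python
import re
from pathlib import Path

# Names taken from the commit message; the pattern itself is an assumption
# about what "residual import" should match.
LEGACY_PATTERN = re.compile(
    r"picarones\.app\.services\.benchmark_runner\b"
    r"|\b_benchmark_execution\b"
    r"|\b_benchmark_orchestration\b"
)


def find_residual_imports(roots):
    """Return (path, line number, line) for every line mentioning a legacy module."""
    hits = []
    for root in roots:
        for path in sorted(Path(root).rglob("*.py")):
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if LEGACY_PATTERN.search(line):
                    hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical usage from the repository root.
    for path, lineno, line in find_residual_imports(["picarones", "tests"]):
        print(f"{path}:{lineno}: {line}")
```

An empty result corresponds to the "no residual import" statement; sibling modules such as ``_benchmark_helpers`` are deliberately not matched.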
Tests: 786 passed on the impacted scope (app/web/public_api/
sprint_a14_s1/phase1_security). Full global suite still running.
- picarones/__init__.py +2 -1
- picarones/app/services/_benchmark_execution.py +0 -168
- picarones/app/services/_benchmark_orchestration.py +0 -303
- picarones/app/services/benchmark_runner.py +0 -335
- tests/app/schemas/test_run_spec_b1_extended.py +11 -11
- tests/app/test_s9_resolver_collision.py +3 -1
- tests/app/test_sprint_d2b_partial_dir_resume.py +1 -1
- tests/app/test_sprint_d2cdef_features.py +2 -2
- tests/app/test_sprint_h2b_canonical_in_runner.py +1 -1
- tests/evaluation/metrics/test_sprint_a14_s1_normalization_propagation.py +21 -5
- tests/evaluation/test_public_api.py +21 -63
- tests/integration/snapshots/migration_invariance.json +0 -470
- tests/integration/test_migration_invariance.py +0 -289
- tests/security/test_phase1_post_rewrite_wiring.py +1 -1
- tests/web/test_s9_ocr_engine_naming_contract.py +3 -1
- tests/web/test_sprint6_web_interface.py +7 -6
@@ -11,7 +11,8 @@ here to allow:
 For implementations (metric computation, runner, OCR adapters…),
 use the explicit sub-packages:
 
->>> from picarones
+>>> from picarones import RunOrchestrator, RunSpec, load_run_spec_from_yaml
+>>> from picarones.app.services import prepare_preset_args, run_result_to_benchmark_result
 >>> from picarones.evaluation.metrics.text_metrics import compute_metrics
 >>> from picarones.adapters.ocr.tesseract import TesseractAdapter
 
@@ -1,168 +0,0 @@
-"""``BenchmarkService`` orchestration: module extracted from the god module
-``benchmark_runner.py`` during Phase 6 (round 4) of the code-quality
-audit (2026-05).
-
-.. deprecated:: 2.0.0
-    Internal helper module of the legacy
-    ``run_benchmark_via_service`` path. Phase B7 (May 2026): will be
-    removed in Phase B8 when ``run_benchmark_via_service`` goes away.
-    The ``RunOrchestrator`` implements its own orchestration via
-    ``execute()`` / ``execute_preset()`` without depending on this module.
-
-Public surface (re-exported by ``benchmark_runner.py`` to
-preserve the existing internal imports):
-
-- :func:`execute_via_benchmark_service`: runs
-  ``BenchmarkService.run`` on the converted specs. Wraps the
-  inputs factory + GT + RunContext + cancel_event.
-
-The functions ``_run_benchmark_unified`` and
-``_run_benchmark_with_partial`` (which consume the final ``BenchmarkResult``)
-stay in ``benchmark_runner.py`` because they depend
-on a large number of internal helpers (NER attach, fingerprint,
-partial store, etc.). Extracting them would require extracting
-all of those helpers too, so that work was postponed.
-"""
-
-from __future__ import annotations
-
-import logging
-import threading
-from typing import TYPE_CHECKING, Any, Callable
-
-from picarones.domain.artifacts import ArtifactType
-from picarones.domain.corpus import CorpusSpec
-from picarones.domain.documents import DocumentRef
-from picarones.domain.errors import PicaronesError
-from picarones.domain.pipeline_spec import PipelineSpec
-
-if TYPE_CHECKING:
-    pass
-
-logger = logging.getLogger(__name__)
-
-
-def execute_via_benchmark_service(
-    *,
-    corpus_spec: CorpusSpec,
-    pipeline_specs: list[PipelineSpec],
-    adapter_resolver: Callable[[str], Any],
-    workspace_uri: str,
-    code_version: str,
-    timeout_seconds: float,
-    progress_callback: Callable[[str, int, str], None] | None = None,
-    cancel_event: Any | None = None,
-    pipeline_to_engine_name: dict[str, str] | None = None,
-) -> Any:
-    """Run ``BenchmarkService.run`` on the converted specs.
-
-    Views are passed as an empty list: the metrics are computed
-    on the converter side via ``compute_metrics``, directly on the
-    hypotheses extracted from the artifacts. A simple, consistent pattern:
-    the metrics are also computed at benchmark time
-    (not via ``EvaluationView``).
-    """
-    from picarones.app.services.benchmark_service import BenchmarkService
-    from picarones.evaluation.projectors.registry import ProjectorRegistry
-    from picarones.evaluation.registry.registry import MetricRegistry
-    from picarones.evaluation.views.executor import (
-        DefaultEvaluationViewExecutor,
-    )
-    from picarones.pipeline.executor import PipelineExecutor
-    from picarones.pipeline.runner import CorpusRunner
-    from picarones.pipeline.types import RunContext
-
-    executor = PipelineExecutor(adapter_resolver=adapter_resolver)
-    runner = CorpusRunner(
-        executor,
-        max_in_flight=2,
-        timeout_seconds_per_doc=timeout_seconds,
-    )
-
-    # Minimal ViewExecutor: empty registries.
-    view_executor = DefaultEvaluationViewExecutor.from_registries(
-        metric_registry=MetricRegistry(),
-        projector_registry=ProjectorRegistry(),
-        payload_loader=lambda art: None,
-    )
-    bench = BenchmarkService(
-        corpus_runner=runner,
-        view_executor=view_executor,
-        code_version=code_version,
-    )
-
-    # Factory for the initial inputs (always an IMAGE built from the URI).
-    def inputs_factory(doc: DocumentRef) -> dict[ArtifactType, Any]:
-        from picarones.domain.artifacts import Artifact
-
-        if doc.image_uri is None:
-            raise PicaronesError(
-                f"Document {doc.id!r} sans image_uri — la pipeline "
-                "par défaut consomme une IMAGE en entrée.",
-            )
-        return {
-            ArtifactType.IMAGE: Artifact(
-                id=f"{doc.id}:image",
-                document_id=doc.id,
-                type=ArtifactType.IMAGE,
-                uri=doc.image_uri,
-            ),
-        }
-
-    # GT factory: unused because ``views=[]``.
-    def gt_factory(doc: DocumentRef, art_type: ArtifactType) -> Any:
-        return None
-
-    counter_lock = threading.Lock()
-    counter_state = {"doc_idx": 0}
-
-    def context_factory(
-        doc: DocumentRef, pipeline_name: str,
-    ) -> RunContext:
-        if progress_callback is not None:
-            with counter_lock:
-                idx = counter_state["doc_idx"]
-                counter_state["doc_idx"] = idx + 1
-            engine_name = (
-                pipeline_to_engine_name.get(pipeline_name, pipeline_name)
-                if pipeline_to_engine_name is not None
-                else pipeline_name
-            )
-            try:
-                progress_callback(engine_name, idx, doc.id)
-            except Exception as exc:  # noqa: BLE001
-                # Callback errors are silently ignored;
-                # a crashing caller must not bring down the
-                # benchmark. Logged at debug level for diagnosis.
-                logger.debug(
-                    "[benchmark_execution] progress_callback raised: %s",
-                    exc,
-                )
-        return RunContext(
-            document_id=doc.id,
-            code_version=code_version,
-            pipeline_name=pipeline_name,
-            workspace_uri=workspace_uri,
-        )
-
-    # Propagate the cancel_event to the CorpusRunner.
-    if cancel_event is not None:
-        original_run = runner.run
-
-        def _runner_run_with_cancel(*args: Any, **kwargs: Any) -> Any:
-            kwargs.setdefault("cancel_event", cancel_event)
-            return original_run(*args, **kwargs)
-
-        runner.run = _runner_run_with_cancel  # type: ignore[method-assign]
-
-    return bench.run(
-        corpus=corpus_spec,
-        pipelines=pipeline_specs,
-        views=[],
-        ground_truth_factory=gt_factory,
-        pipeline_inputs_factory=inputs_factory,
-        context_factory=context_factory,
-    )
-
-
-__all__ = ["execute_via_benchmark_service"]
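The progress reporting in the deleted ``context_factory`` combines two ideas: a lock-protected document counter, and a callback whose exceptions are swallowed so that a faulty caller cannot abort the benchmark. A standalone sketch of that pattern follows, assuming only the ``(engine_name, index, doc_id)`` callback signature shown in the diff; the helper name ``make_progress_notifier`` is hypothetical and not part of picarones.

```python
import logging
import threading
from typing import Callable, Optional

logger = logging.getLogger(__name__)

ProgressCallback = Callable[[str, int, str], None]


def make_progress_notifier(
    progress_callback: Optional[ProgressCallback],
) -> Callable[[str, str], None]:
    """Build a thread-safe notifier that hands out increasing document indices."""
    lock = threading.Lock()
    state = {"doc_idx": 0}

    def notify(engine_name: str, doc_id: str) -> None:
        if progress_callback is None:
            return
        # Only the counter update is done under the lock; the callback
        # itself runs outside it so a slow callback does not block workers.
        with lock:
            idx = state["doc_idx"]
            state["doc_idx"] = idx + 1
        try:
            progress_callback(engine_name, idx, doc_id)
        except Exception as exc:  # deliberately broad, as in the deleted code
            logger.debug("progress_callback raised: %s", exc)

    return notify
```

Each call receives a unique index even when several worker threads report concurrently, but the order in which callbacks complete is not guaranteed.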
@@ -1,303 +0,0 @@
-"""Internal benchmark orchestration: unified vs with_partial.
-
-Module extracted from the god module ``benchmark_runner.py`` during
-Phase 6 (round 6) of the code-quality audit (2026-05).
-
-.. deprecated:: 2.0.0
-    Internal helper module of the legacy
-    ``run_benchmark_via_service`` path. Phase B7 (May 2026): will be
-    removed in Phase B8 when ``run_benchmark_via_service`` goes away.
-
-    The ``RunOrchestrator`` implements its own resume on
-    interruption, pivoted per pipeline, via
-    ``picarones.app.services._orchestrator_partial``, which serializes
-    typed ``PipelineResult`` objects (a JSONL format distinct from the
-    legacy ``partial_store``, which serializes ``DocumentResult``).
-
-Public surface (re-exported by ``benchmark_runner.py`` with a
-``_`` prefix to preserve the historical private API):
-
-- :func:`run_benchmark_unified`: fast path without intermediate
-  persistence (a single multi-engine ``BenchmarkService.run``).
-- :func:`run_benchmark_with_partial`: per-engine resume path
-  with intermediate NDJSON. If the run crashes or is cancelled,
-  the engines already processed are kept; the resume loads
-  the partials and recomputes only the missing docs.
-
-The choice between the two is governed by the ``partial_dir``
-argument of ``run_benchmark_via_service``:
-
-- ``None`` → ``run_benchmark_unified`` (demo, CI, smoke workflows).
-- ``Path(...)`` → ``run_benchmark_with_partial`` (long-running,
-  production, institutional benchmark workflows).
-"""
-
-from __future__ import annotations
-
-import logging
-import tempfile
-from pathlib import Path
-from typing import TYPE_CHECKING, Any, Callable
-
-from picarones.app.services._benchmark_adapter_resolver import (
-    build_adapter_resolver,
-    engine_to_pipeline_spec,
-)
-from picarones.app.services._benchmark_conversions import (
-    corpus_to_corpus_spec,
-)
-from picarones.app.services._benchmark_converter import (
-    run_result_to_benchmark_result,
-)
-from picarones.app.services._benchmark_execution import (
-    execute_via_benchmark_service,
-)
-from picarones.app.services._benchmark_helpers import (
-    _build_pipeline_info,
-    _engine_config_for_fingerprint,
-    _safe_engine_version,
-)
-
-if TYPE_CHECKING:
-    from picarones.evaluation.corpus import Corpus
-
-logger = logging.getLogger(__name__)
-
-
-def run_benchmark_unified(
-    *,
-    corpus: "Corpus",
-    engines: list[Any],
-    char_exclude: Any | None,
-    normalization_profile: Any | None,
-    profile: str,
-    code_version: str,
-    progress_callback: Callable[[str, int, str], None] | None,
-    timeout_seconds: float,
-    cancel_event: Any | None,
-) -> Any:
-    """Fast path: a single multi-engine ``BenchmarkService.run``.
-
-    No intermediate persistence: if the run crashes, everything is
-    lost. Used when ``partial_dir`` is ``None``.
-    """
-    with tempfile.TemporaryDirectory(prefix="picarones_bench_") as ws:
-        workspace = Path(ws)
-        gt_dir = workspace / "gt"
-        gt_dir.mkdir()
-        run_dir = workspace / "run"
-        run_dir.mkdir()
-
-        corpus_spec = corpus_to_corpus_spec(corpus, workspace_dir=gt_dir)
-        pipeline_specs = [engine_to_pipeline_spec(e) for e in engines]
-        adapter_resolver = build_adapter_resolver(engines)
-        pipeline_to_engine_name = {
-            spec.name: engine.name
-            for spec, engine in zip(pipeline_specs, engines)
-        }
-
-        run_result = execute_via_benchmark_service(
-            corpus_spec=corpus_spec,
-            pipeline_specs=pipeline_specs,
-            adapter_resolver=adapter_resolver,
-            workspace_uri=str(run_dir),
-            code_version=code_version,
-            timeout_seconds=timeout_seconds,
-            progress_callback=progress_callback,
-            cancel_event=cancel_event,
-            pipeline_to_engine_name=pipeline_to_engine_name,
-        )
-
-        return run_result_to_benchmark_result(
-            run_result,
-            corpus=corpus,
-            engines=engines,
-            char_exclude=char_exclude,
-            normalization_profile=normalization_profile,
-            profile=profile,
-        )
-
-
-def run_benchmark_with_partial(
-    *,
-    corpus: "Corpus",
-    engines: list[Any],
-    partial_dir: Path,
-    char_exclude: Any | None,
-    normalization_profile: Any | None,
-    profile: str,
-    code_version: str,
-    progress_callback: Callable[[str, int, str], None] | None,
-    timeout_seconds: float,
-    cancel_event: Any | None,
-) -> Any:
-    """Resume path: per-engine with intermediate NDJSON.
-
-    For each engine, loads the existing partial, filters out the docs
-    already processed, runs ``BenchmarkService`` on the remaining ones,
-    and persists each new ``DocumentResult`` as it goes.
-    """
-    from picarones.app.services.partial_store import (
-        _delete_partial,
-        _load_partial,
-        _save_partial_line,
-        partial_path_for_engine,
-    )
-    from picarones.evaluation.benchmark_result import (
-        BenchmarkResult,
-        EngineReport,
-    )
-    from picarones.evaluation.corpus import Corpus as LegacyCorpus
-    from picarones.evaluation.metric_hooks import run_corpus_aggregators
-    # Force the self-registration of the builtin hooks (decorators).
-    import picarones.evaluation.metrics.builtin_hooks  # noqa: F401
-    from picarones.evaluation.metric_result import aggregate_metrics
-
-    partial_dir.mkdir(parents=True, exist_ok=True)
-
-    # Index of the docs by ID: used to reorder the reloaded
-    # DocumentResult objects according to the original corpus order.
-    doc_order = {doc.doc_id: idx for idx, doc in enumerate(corpus.documents)}
-
-    engine_reports: list[Any] = []
-
-    for engine in engines:
-        # Check for cancellation between engines.
-        if cancel_event is not None and getattr(
-            cancel_event, "is_set", lambda: False,
-        )():
-            logger.info(
-                "[partial_dir] benchmark annulé avant l'engine '%s' "
-                "— partials conservés pour reprise.", engine.name,
-            )
-            break
-
-        # Phase 2.3: the fingerprint includes engine config + normalization
-        # profile + char_exclude + corpus files (mtime/size) +
-        # code version. Two runs with different configs →
-        # distinct partial files → no silent reuse
-        # of incompatible results.
-        partial_path = partial_path_for_engine(
-            corpus=corpus,
-            engine=engine,
-            partial_dir=partial_dir,
-            engine_config=_engine_config_for_fingerprint(engine),
-            normalization_profile=normalization_profile,
-            char_exclude=char_exclude,
-            profile=profile,
-            code_version=code_version,
-        )
-        loaded_results = _load_partial(partial_path)
-        loaded_doc_ids = {dr.doc_id for dr in loaded_results}
-
-        if loaded_results:
-            logger.info(
-                "[partial_dir] reprise '%s' : %d/%d docs déjà traités.",
-                engine.name, len(loaded_results), len(corpus.documents),
-            )
-
-        remaining_docs = [
-            d for d in corpus.documents if d.doc_id not in loaded_doc_ids
-        ]
-
-        new_doc_results: list[Any] = []
-        if remaining_docs:
-            # Sub-corpus with only the remaining docs. The original
-            # ``name`` is kept so that the partial paths stay
-            # consistent if a re-run happens.
-            sub_corpus = LegacyCorpus(
-                name=corpus.name,
-                documents=remaining_docs,
-                source_path=corpus.source_path,
-            )
-
-            with tempfile.TemporaryDirectory(
-                prefix="picarones_bench_partial_",
-            ) as ws:
-                workspace = Path(ws)
-                gt_dir = workspace / "gt"
-                gt_dir.mkdir()
-                run_dir = workspace / "run"
-                run_dir.mkdir()
-
-                sub_corpus_spec = corpus_to_corpus_spec(
-                    sub_corpus, workspace_dir=gt_dir,
-                )
-                pipeline_spec = engine_to_pipeline_spec(engine)
-                adapter_resolver = build_adapter_resolver([engine])
-                pipeline_to_engine_name = {pipeline_spec.name: engine.name}
-
-                run_result = execute_via_benchmark_service(
-                    corpus_spec=sub_corpus_spec,
-                    pipeline_specs=[pipeline_spec],
-                    adapter_resolver=adapter_resolver,
-                    workspace_uri=str(run_dir),
-                    code_version=code_version,
-                    timeout_seconds=timeout_seconds,
-                    progress_callback=progress_callback,
-                    cancel_event=cancel_event,
-                    pipeline_to_engine_name=pipeline_to_engine_name,
-                )
-
-                # Convert this sub-RunResult into an EngineReport with
-                # only the remaining docs, then extract the
-                # ``DocumentResult`` objects to append to the partial.
-                sub_report = run_result_to_benchmark_result(
-                    run_result,
-                    corpus=sub_corpus,
-                    engines=[engine],
-                    char_exclude=char_exclude,
-                    normalization_profile=normalization_profile,
-                    profile=profile,
-                )
-                new_doc_results = list(
-                    sub_report.engine_reports[0].document_results,
-                )
-
-                # Append to the partial: a mid-engine cancel preserves
-                # what has already been computed.
-                for dr in new_doc_results:
-                    _save_partial_line(partial_path, dr)
-
-        # Merge: loaded + new, reordered according to the original corpus.
-        all_doc_results = list(loaded_results) + new_doc_results
-        all_doc_results.sort(key=lambda dr: doc_order.get(dr.doc_id, 0))
-
-        aggregated = aggregate_metrics([d.metrics for d in all_doc_results])
-        pipeline_info = _build_pipeline_info(engine)
-        agg_values = run_corpus_aggregators(profile, all_doc_results)
-
-        engine_reports.append(
-            EngineReport(
-                engine_name=engine.name,
-                engine_version=_safe_engine_version(engine),
-                engine_config=getattr(engine, "config", {}) or {},
-                document_results=all_doc_results,
-                aggregated_metrics=aggregated,
-                pipeline_info=pipeline_info,
-                **agg_values,
-            ),
-        )
-
-        # Engine processed successfully → clean up the partial. If we
-        # get here without an exception, all the docs are in
-        # ``all_doc_results``.
-        _delete_partial(partial_path)
-
-    # Phase 3.2 code-quality audit: consume_fallback_log is idempotent.
-    from picarones.adapters.corpus._fallback_log import consume_fallback_log
-    fallbacks = consume_fallback_log()
-    metadata: dict[str, Any] = {}
-    if fallbacks:
-        metadata["importer_fallbacks"] = fallbacks
-
-    return BenchmarkResult(
-        corpus_name=corpus.name,
-        corpus_source=str(corpus.source_path) if corpus.source_path else None,
-        document_count=len(corpus.documents),
-        engine_reports=engine_reports,
-        metadata=metadata,
-    )
-
-
-__all__ = ["run_benchmark_unified", "run_benchmark_with_partial"]
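The resume logic of the deleted ``run_benchmark_with_partial`` boils down to: load the NDJSON partial, skip the documents already present, process the rest, append each result, then merge in corpus order. The sketch below illustrates that idea with plain dicts. The helpers ``load_partial``, ``append_partial_line`` and ``resume_engine`` are hypothetical stand-ins, not the real ``partial_store`` functions (whose implementation is not shown in this diff). Unlike the deleted code, the sketch appends each record as soon as it is computed, tolerates a truncated last line, and does not delete the partial on success.

```python
import json
from pathlib import Path
from typing import Callable


def load_partial(path: Path) -> list[dict]:
    """Read one JSON record per line; skip blank or truncated lines."""
    if not path.exists():
        return []
    records = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # Assumption: a crash may leave a half-written last line.
            continue
    return records


def append_partial_line(path: Path, record: dict) -> None:
    """Append a single record as one NDJSON line."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def resume_engine(
    doc_ids: list[str],
    partial_path: Path,
    process: Callable[[str], dict],
) -> list[dict]:
    """Process only the docs missing from the partial, then merge in corpus order."""
    loaded = load_partial(partial_path)
    done = {rec["doc_id"] for rec in loaded}
    new_records = []
    for doc_id in doc_ids:
        if doc_id in done:
            continue
        record = process(doc_id)  # must return a dict with a "doc_id" key
        append_partial_line(partial_path, record)
        new_records.append(record)
    order = {doc_id: idx for idx, doc_id in enumerate(doc_ids)}
    return sorted(loaded + new_records, key=lambda rec: order.get(rec["doc_id"], 0))
```

Calling ``resume_engine`` a second time with the same partial file would process nothing and simply return the merged records.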
@@ -1,335 +0,0 @@
|
|
| 1 |
-
"""Entry point CLI/web — façade ``run_benchmark_via_service``.
|
| 2 |
-
|
| 3 |
-
.. deprecated:: 2.0.0
|
| 4 |
-
Module deprecated en Phase B7 du chantier de migration Option B
|
| 5 |
-
(mai 2026). Utiliser :class:`picarones.RunOrchestrator` qui
|
| 6 |
-
consomme un ``RunSpec`` Pydantic.
|
| 7 |
-
|
| 8 |
-
- La fonction ``run_benchmark_via_service`` émet une
|
| 9 |
-
``DeprecationWarning`` à chaque appel.
|
| 10 |
-
- Aucun call site actif ne subsiste dans ``picarones/`` —
|
| 11 |
-
CLI/Web utilisent désormais directement le pattern 3 étapes
|
| 12 |
-
``prepare_preset_args → execute_preset →
|
| 13 |
-
run_result_to_benchmark_result`` (cf.
|
| 14 |
-
:mod:`picarones.app.services.python_helpers`).
|
| 15 |
-
- Retrait du module prévu **Phase B3-final commit 6** (suivant).
|
| 16 |
-
|
| 17 |
-
Pour migrer votre code, voir le guide
|
| 18 |
-
``docs/migration/option_b_user_guide.md``.
|
| 19 |
-
|
| 20 |
-
Présente l'API mono-call ``run_benchmark_via_service(corpus,
|
| 21 |
-
engines, ...)`` consommée par ``picarones.interfaces.cli`` et
|
| 22 |
-
``picarones.interfaces.web``. S'appuie en interne sur le service
|
| 23 |
-
canonique (``BenchmarkService``, ``PipelineExecutor``,
|
| 24 |
-
``CorpusRunner``).
|
| 25 |
-
|
| 26 |
-
Pourquoi cette façade
|
| 27 |
-
---------------------
|
| 28 |
-
``BenchmarkService`` consomme ``CorpusSpec`` (références
|
| 29 |
-
filesystem, Pydantic, immutable) et ``PipelineSpec`` (déclaratif).
|
| 30 |
-
Les interfaces utilisateur (CLI, web upload) raisonnent en
|
| 31 |
-
``Corpus`` riche en behavior + liste de moteurs OCR/LLM. Ce
|
| 32 |
-
module fait la conversion entre les deux modèles, expose une API
|
| 33 |
-
mono-call ergonomique et restitue un ``BenchmarkResult``.
|
| 34 |
-
"""
|
| 35 |
-
|
| 36 |
-
from __future__ import annotations
|
| 37 |
-
|
| 38 |
-
import logging
|
| 39 |
-
from pathlib import Path
|
| 40 |
-
from typing import TYPE_CHECKING, Any, Callable
|
| 41 |
-
|
| 42 |
-
if TYPE_CHECKING:
|
| 43 |
-
from picarones.evaluation.corpus import Corpus
|
| 44 |
-
|
| 45 |
-
logger = logging.getLogger(__name__)
|
| 46 |
-
|
| 47 |
-
# Le ``OCRLLMPipelineConfig`` (couche 4) est consommé exclusivement
|
| 48 |
-
# par duck typing (``is_pipeline``, ``ocr_adapter``, ``llm_adapter``,
|
| 49 |
-
# ``mode``, ``prompt_template``) pour respecter l'inward-only :
|
| 50 |
-
# ``app/`` ne doit pas importer ``pipeline/llm_pipeline_config``
|
| 51 |
-
# directement.
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
# ──────────────────────────────────────────────────────────────────────
|
| 55 |
-
# Mapping Document → DocumentRef
|
| 56 |
-
# ──────────────────────────────────────────────────────────────────────
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
# Phase 6 (round 3) audit code-quality (2026-05) — extraction des
|
| 60 |
-
# conversions Document/Corpus + helpers GT vers
|
| 61 |
-
# ``_benchmark_conversions.py``. Réexport pour préserver l'API
|
| 62 |
-
# publique (CLI/web consomment ces noms).
|
| 63 |
-
from picarones.app.services._benchmark_conversions import ( # noqa: F401
-    _DEFAULT_SUFFIXES,
-    _has_text_gt,
-    _payload_to_text,
-    _resolve_gt_uri,
-    _safe_doc_id,
-    corpus_to_corpus_spec,
-    document_to_document_ref,
-)
-
-
-# ──────────────────────────────────────────────────────────────────────
-# Mapping RunResult → BenchmarkResult
-# ──────────────────────────────────────────────────────────────────────
-
-
-# Phase 6 (round 6) code-quality audit (2026-05): converter
-# ``run_result_to_benchmark_result`` extracted into its dedicated module.
-from picarones.app.services._benchmark_converter import (  # noqa: F401
-    run_result_to_benchmark_result,
-)
-# ──────────────────────────────────────────────────────────────────────
-# Private helpers of the RunResult → BenchmarkResult converter
-# ──────────────────────────────────────────────────────────────────────
-
-
-# Phase 6 (round 5) code-quality audit (2026-05): extraction of the
-# internal ``RunResult → BenchmarkResult`` conversion helpers
-# into ``_benchmark_helpers.py`` (~260 LOC). Re-exported for
-# internal calls and for the tests that patch these symbols.
-from picarones.app.services._benchmark_helpers import (  # noqa: F401
-    _OCRResultLike,
-    _build_pipeline_info,
-    _build_pipeline_metadata,
-    _engine_config_for_fingerprint,
-    _extract_first_error,
-    _extract_text_outputs,
-    _extract_token_confidences,
-    _resolve_corpus_lang,
-    _safe_engine_version,
-)
-# Phase 6 (round 2): extraction of the engine→spec + resolver block.
-from picarones.app.services._benchmark_adapter_resolver import (  # noqa: F401
-    _canonical_adapter_to_spec,
-    _is_canonical_adapter,
-    _llm_adapter_name,
-    _ocr_llm_pipeline_to_spec,
-    _safe_pipeline_name,
-    build_adapter_resolver,
-    engine_to_pipeline_spec,
-)
-
-
-def run_benchmark_via_service(
-    corpus: "Corpus",
-    engines: list[Any],
-    *,
-    char_exclude: Any | None = None,
-    normalization_profile: Any | None = None,
-    output_json: Any | None = None,
-    code_version: str | None = None,
-    show_progress: bool = True,  # noqa: ARG001
-    progress_callback: Callable[[str, int, str], None] | None = None,
-    timeout_seconds: float = 60.0,
-    cancel_event: Any | None = None,
-    partial_dir: str | Path | None = None,
-    entity_extractor: Callable[[str], list[dict]] | None = None,
-    profile: str = "standard",
-) -> Any:
-    """Facade ``run_benchmark`` →
-    ``BenchmarkService`` rewrite.
-
-    Exposes the historical signature of
-    ``picarones.app.services.benchmark_runner.run_benchmark`` but relies
-    internally on the rewrite (``CorpusSpec``, ``PipelineSpec``,
-    ``PipelineExecutor``, ``BenchmarkService``). Pivot of Sprint D
-    in the v2.0 plan.
-
-    Current scope (D.1.d, MVP)
-    --------------------------
-    This first version handles the simplest case:
-
-    - One or more ``BaseOCREngine`` (OCR alone or an OCR+LLM pipeline
-      via ``OCRLLMPipeline``).
-    - A ``Corpus`` with image_path + ground_truth (TEXT) per document.
-    - CER/WER metrics computed via ``compute_metrics`` on the
-      hypotheses extracted from the produced artifacts.
-    - Conversion to a ``BenchmarkResult`` compatible with the
-      historical consumers (HTML report, narrative engine).
-
-    Deferred scope (D.2)
-    --------------------
-    The following parameters are **accepted but ignored** in
-    this MVP, because the rewrite handles these aspects natively:
-
-    - ``show_progress`` (tqdm).
-
-    To tune corpus-wide parallelism, go through
-    ``CorpusRunner.max_in_flight`` directly (pipeline layer).
-
-    Measurement profile (D.2.f)
-    ---------------------------
-    ``profile`` is validated at startup via
-    ``picarones.evaluation.metric_hooks.validate_profile``. An
-    unknown profile raises ``PicaronesError``. The value does not
-    yet affect the document-level hooks (that would be the subject
-    of a later sprint, outside the v2.0 scope).
-
-    NER attach (D.2.e)
-    ------------------
-    If ``entity_extractor`` is provided, once the
-    ``DocumentResult`` objects are computed, the service calls the
-    extractor on each OCR hypothesis for documents whose GT has an
-    ``ENTITIES`` level, then attaches the NER metrics (``ner_metrics``
-    per document, ``aggregated_ner`` at engine level).
-
-    Resume after interruption (D.2.b)
-    ---------------------------------
-    If ``partial_dir`` is provided, the benchmark runs in
-    **per-engine resumable** mode:
-
-    - For each engine, look for a file
-      ``{partial_dir}/picarones_{corpus}_{engine}.partial.jsonl``
-      left by a previous interrupted run.
-    - The ``DocumentResult`` entries already persisted there are
-      reused as-is (no recomputation).
-    - Only the remaining documents are submitted to the ``BenchmarkService``.
-    - Each new ``DocumentResult`` is appended to the
-      partial before moving on to the next one.
-    - Once an engine has been processed successfully, its partial is
-      deleted.
-
-    When ``partial_dir`` is ``None`` (default), a single
-    multi-engine pass is run (fast path, no intermediate
-    persistence).
-
-    Parameters
-    ----------
-    corpus:
-        Corpus.
-    engines:
-        List of engines/pipelines to benchmark.
-    char_exclude:
-        Filter passed to ``compute_metrics``.
-    normalization_profile:
-        Normalization profile passed to ``compute_metrics``.
-    output_json:
-        If provided, the ``BenchmarkResult`` is serialized as JSON
-        at this path (BenchmarkResult serialization).
-    code_version:
-        Code version injected into the ``RunContext`` /
-        ``RunManifest``. Default: ``picarones.__version__``.
-    timeout_seconds:
-        Per-document timeout propagated to the ``CorpusRunner``.
-
-    Returns
-    -------
-    BenchmarkResult
-        Format compatible with the historical consumers.
-
-    Raises
-    ------
-    PicaronesError
-        If the engines do not all declare a unique ``name``
-        (see ``build_adapter_resolver``).
-    """
-    # Phase B3, migration Option B (May 2026): ``run_benchmark_via_service``
-    # is now deprecated. Use ``picarones.RunOrchestrator``,
-    # which consumes a Pydantic ``RunSpec`` and natively exposes the 4
-    # JSONL files. The function will be removed in Phase B8 (post-
-    # deprecation release); this warning helps identify the call
-    # sites to migrate.
-    #
-    # ``stacklevel=2`` so that the warning points at the caller (and not
-    # at this line). ``stacklevel=3`` would point at the caller of the
-    # caller (useful if this is wrapped again in a private helper).
-    import warnings as _warnings
-    _warnings.warn(
-        "run_benchmark_via_service est déprécié depuis Phase B3 de la "
-        "migration Option B. Utiliser picarones.RunOrchestrator qui "
-        "consomme un RunSpec Pydantic. Retrait prévu en Phase B8.",
-        DeprecationWarning,
-        stacklevel=2,
-    )
-
-    # D.2.f: validate ``profile`` early. An unknown name raises
-    # ``PicaronesError`` before the benchmark starts, rather
-    # than degrading silently later on.
-    from picarones.evaluation.metric_hooks import validate_profile
-
-    validate_profile(profile)
-
-    if code_version is None:
-        # The architecture scanner rejects ``from picarones import __version__``
-        # because it classifies ``picarones`` (without a subpackage) as a
-        # non-whitelisted external lib for the ``app/`` layer. This is
-        # worked around via importlib (dynamic declaration).
-        import importlib
-
-        try:
-            code_version = importlib.import_module("picarones").__version__
-        except (ImportError, AttributeError):
-            code_version = "unknown"
-
-    if partial_dir is None:
-        benchmark_result = _run_benchmark_unified(
-            corpus=corpus,
-            engines=engines,
-            char_exclude=char_exclude,
-            normalization_profile=normalization_profile,
-            profile=profile,
-            code_version=code_version,
-            progress_callback=progress_callback,
-            timeout_seconds=timeout_seconds,
-            cancel_event=cancel_event,
-        )
-    else:
-        benchmark_result = _run_benchmark_with_partial(
-            corpus=corpus,
-            engines=engines,
-            partial_dir=Path(partial_dir),
-            char_exclude=char_exclude,
-            normalization_profile=normalization_profile,
-            profile=profile,
-            code_version=code_version,
-            progress_callback=progress_callback,
-            timeout_seconds=timeout_seconds,
-            cancel_event=cancel_event,
-        )
-
-    # D.2.e: NER attach post-process. Idempotent: recomputed on
-    # every run, even in resume mode (the ner_metrics are not
-    # persisted in the partial NDJSON
-    # which computed NER after the doc loop).
-    if entity_extractor is not None:
-        _attach_ner_metrics_to_benchmark(
-            benchmark_result, corpus, entity_extractor,
-        )
-
-    # Optional JSON serialization
-    if output_json is not None:
-        _persist_benchmark_result_json(benchmark_result, Path(output_json))
-
-    return benchmark_result
-
-
-# Phase 6 code-quality audit (2026-05): NER aggregation extracted
-# into ``_benchmark_ner.py``. The names ``_attach_ner_metrics_to_benchmark``
-# and ``_aggregate_ner_metrics`` remain here as aliases so as not to
-# break internal calls (the other runner functions refer to
-# them) or the tests that patch these symbols via monkeypatch.
-from picarones.app.services._benchmark_ner import (  # noqa: F401
-    aggregate_ner_metrics as _aggregate_ner_metrics,
-    attach_ner_metrics_to_benchmark as _attach_ner_metrics_to_benchmark,
-)
-
-
-# Phase 6 (round 6): orchestration extracted.
-from picarones.app.services._benchmark_orchestration import (  # noqa: F401
-    run_benchmark_unified as _run_benchmark_unified,
-    run_benchmark_with_partial as _run_benchmark_with_partial,
-)
-
-# Phase 6 (round 4) code-quality audit (2026-05): extraction of
-# ``_execute_via_benchmark_service`` into ``_benchmark_execution.py``.
-# Alias kept for the internal calls from
-# ``_run_benchmark_unified`` and ``_run_benchmark_with_partial``.
-from picarones.app.services._benchmark_execution import (  # noqa: F401
-    execute_via_benchmark_service as _execute_via_benchmark_service,
-)
-from picarones.app.services._benchmark_persistence import (
-    persist_benchmark_result_json as _persist_benchmark_result_json,
-)
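The removed function relied on `warnings.warn(..., stacklevel=2)` so that the deprecation is attributed to the call site rather than to the deprecated function itself. A minimal sketch of that pattern follows; `legacy_entry_point` and `collect_deprecations` are hypothetical names used for illustration only and are not part of picarones.

```python
import warnings


def legacy_entry_point() -> str:
    """Hypothetical deprecated function (illustration only)."""
    warnings.warn(
        "legacy_entry_point is deprecated; use the new orchestrator instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return "ok"


def collect_deprecations() -> list:
    """Call the deprecated function and return the recorded DeprecationWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is filtered out by default in many contexts;
        # "always" makes sure it is recorded here.
        warnings.simplefilter("always")
        legacy_entry_point()
    return [w for w in caught if issubclass(w.category, DeprecationWarning)]
```

The same `catch_warnings(record=True)` approach is what the removed contract test used to check that the warning was emitted.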
@@ -82,19 +82,22 @@ class TestDefaults:
         assert spec.output_json is None
         assert spec.timeout_seconds_per_doc == 60.0
 
-    def[…]
+    def test_defaults_match_prepare_preset_args_defaults(
         self, tmp_path: Path,
     ) -> None:
         """The default values of ``RunSpec`` match those of
-        ``[…]
-[…]
+        ``prepare_preset_args``, for consistency with the public
+        Python API (callers that instantiate adapters).
+
+        Phase B3-final (May 2026): this test replaces the former
+        ``test_defaults_match_run_benchmark_via_service_defaults``,
+        which inspected the removed legacy function.
         """
-        from picarones.app.services.benchmark_runner import (
-            run_benchmark_via_service,
-        )
         import inspect
 
-[…]
+        from picarones.app.services import prepare_preset_args
+
+        sig = inspect.signature(prepare_preset_args)
         defaults = {
             name: param.default
             for name, param in sig.parameters.items()
@@ -102,14 +105,11 @@ class TestDefaults:
         }
         spec = load_run_spec_from_yaml(_minimal_yaml(output_dir=tmp_path / "out"))
 
-        # The names differ slightly (RunSpec.timeout_seconds_per_doc
-        # vs run_benchmark_via_service.timeout_seconds, but the
-        # semantics are identical: per-document timeout).
         assert spec.char_exclude == defaults["char_exclude"]
         assert spec.normalization_profile == defaults["normalization_profile"]
         assert spec.partial_dir == defaults["partial_dir"]
         assert spec.profile == defaults["profile"]
-        assert spec.timeout_seconds_per_doc == defaults["[…]
+        assert spec.timeout_seconds_per_doc == defaults["timeout_seconds_per_doc"]
 
 
 # ──────────────────────────────────────────────────────────────────────
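The test above compares the keyword defaults of a function signature with the defaults of a declarative spec. A self-contained sketch of that comparison is given here, under loud assumptions: `SpecSketch`, `prepare_args_sketch` and `mismatched_defaults` are hypothetical stand-ins (a plain dataclass replaces the Pydantic `RunSpec`), not the project's actual API.

```python
import inspect
from dataclasses import dataclass, fields
from typing import Any, Callable, Optional


@dataclass
class SpecSketch:
    """Hypothetical stand-in for a declarative run specification."""
    profile: str = "standard"
    partial_dir: Optional[str] = None
    timeout_seconds_per_doc: float = 60.0


def prepare_args_sketch(
    corpus: Any,
    *,
    profile: str = "standard",
    partial_dir: Optional[str] = None,
    timeout_seconds_per_doc: float = 60.0,
) -> dict:
    """Hypothetical Python-side entry point mirroring the spec fields."""
    return {
        "corpus": corpus,
        "profile": profile,
        "partial_dir": partial_dir,
        "timeout_seconds_per_doc": timeout_seconds_per_doc,
    }


def mismatched_defaults(func: Callable[..., Any], spec_cls: type) -> dict:
    """Return {name: (signature_default, spec_default)} for shared names that differ."""
    sig_defaults = {
        name: param.default
        for name, param in inspect.signature(func).parameters.items()
        if param.default is not inspect.Parameter.empty
    }
    spec_defaults = {f.name: f.default for f in fields(spec_cls)}
    shared = sig_defaults.keys() & spec_defaults.keys()
    return {
        name: (sig_defaults[name], spec_defaults[name])
        for name in sorted(shared)
        if sig_defaults[name] != spec_defaults[name]
    }
```

An empty result from `mismatched_defaults(prepare_args_sketch, SpecSketch)` means the two entry points agree on every shared default; a non-empty one names the parameter that drifted.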
@@ -29,7 +29,9 @@ from __future__ import annotations
 
 import pytest
 
-from picarones.app.services.[…]
+from picarones.app.services._benchmark_adapter_resolver import (
+    build_adapter_resolver,
+)
 from picarones.domain.errors import PicaronesError
 
 
@@ -29,7 +29,7 @@ from picarones.app.services.partial_store import (
     _save_partial_line,
     partial_path_for_engine,
 )
-from picarones.app.services.[…]
+from picarones.app.services._benchmark_helpers import (
     _engine_config_for_fingerprint,
 )
 from tests._migration_helpers import run_via_orchestrator
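The resume-on-interruption behaviour exercised by this test file (reuse results already persisted in a partial JSONL file, append each new result, delete the partial once the engine finishes) can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `run_resumable`, its parameters and the record layout (`doc_id` plus arbitrary metrics) are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def run_resumable(
    doc_ids: Iterable[str],
    process: Callable[[str], dict],
    partial_path: Path,
) -> List[dict]:
    """Process documents one by one, persisting each result as a JSONL line."""
    # 1. Load whatever a previous, interrupted run already persisted.
    done: Dict[str, dict] = {}
    if partial_path.exists():
        with partial_path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:
                    record = json.loads(line)
                    done[record["doc_id"]] = record

    # 2. Reuse persisted results; compute and append only the missing ones.
    results: List[dict] = []
    with partial_path.open("a", encoding="utf-8") as fh:
        for doc_id in doc_ids:
            if doc_id in done:
                results.append(done[doc_id])
                continue
            record = {"doc_id": doc_id, **process(doc_id)}
            fh.write(json.dumps(record) + "\n")
            fh.flush()  # keep the partial usable if the run dies right after
            results.append(record)

    # 3. Everything succeeded: the partial file is no longer needed.
    partial_path.unlink()
    return results
```

If `process` raises midway, the exception propagates before the `unlink()` call, so the partial file survives and the next call skips the documents it already contains.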
@@ -22,8 +22,8 @@ import pytest
 
 from picarones.adapters.llm.base import BaseLLMAdapter
 from picarones.adapters.ocr.base import BaseOCRAdapter
-from picarones.app.services.[…]
-    _aggregate_ner_metrics,
+from picarones.app.services._benchmark_ner import (
+    aggregate_ner_metrics as _aggregate_ner_metrics,
 )
 from picarones.domain.artifacts import Artifact, ArtifactType
 from picarones.evaluation.corpus import (
@@ -20,7 +20,7 @@ from picarones.adapters.ocr import (
     PrecomputedTextAdapter,
     ocr_adapter_from_name,
 )
-from picarones.app.services.[…]
+from picarones.app.services._benchmark_adapter_resolver import (
     build_adapter_resolver,
     engine_to_pipeline_spec,
 )
@@ -19,7 +19,8 @@ from __future__ import annotations
 
 import inspect
 
-from picarones.app.[…]
+from picarones.app.schemas.run_spec import RunSpec
+from picarones.app.services import prepare_preset_args
 from picarones.evaluation.metrics.normalization import (
     NORMALIZATION_PROFILES,
     get_builtin_profile,
@@ -27,11 +28,26 @@ from picarones.evaluation.metrics.normalization import (
 
 
 class TestRunBenchmarkSignature:
-[…]
-[…]
-[…]
+    """Phase B3-final (May 2026): propagation of
+    ``normalization_profile`` is now carried by ``RunSpec``
+    (Pydantic field) and by ``prepare_preset_args`` (kwarg).
+    ``run_benchmark_via_service`` has been removed."""
+
+    def test_run_spec_exposes_normalization_profile(self) -> None:
+        """``RunSpec.normalization_profile`` is a documented
+        Pydantic field (see Phase B1)."""
+        assert "normalization_profile" in RunSpec.model_fields
+        field = RunSpec.model_fields["normalization_profile"]
+        # Optional field, default None.
+        assert field.default is None
+
+    def test_prepare_preset_args_accepts_normalization_profile(
+        self,
+    ) -> None:
+        """``prepare_preset_args`` propagates the profile to the RunSpec."""
+        sig = inspect.signature(prepare_preset_args)
         assert "normalization_profile" in sig.parameters
-        #[…]
+        # Optional by default.
         assert sig.parameters["normalization_profile"].default is None
 
 
@@ -197,44 +197,16 @@ class TestMetricsApi:
 
 
 # ──────────────────────────────────────────────────────────────────────────
-# 5. picarones.app.services.benchmark_runner —[…]
+# 5. (formerly) ``picarones.app.services.benchmark_runner``:
+# removed in Phase B3-final (May 2026, migration Option B).
 # ──────────────────────────────────────────────────────────────────────────
-[…]
-[…]
-[…]
-[…]
-[…]
-[…]
-[…]
-            "picarones.app.services.benchmark_runner",
-            "run_benchmark_via_service",
-        )
-
-    def test_run_benchmark_via_service_keyword_args(self):
-        """The key parameters (corpus, engines, profile…) must remain
-        accessible in the rewrite adapter. Adding a required argument =
-        breaking change."""
-        from picarones.app.services.benchmark_runner import (
-            run_benchmark_via_service,
-        )
-        sig = inspect.signature(run_benchmark_via_service)
-        params = sig.parameters
-        # Contractual arguments: their presence is guaranteed in order
-        # to stay compatible with the historical callers.
-        # Phase 4.1 code-quality audit (2026-05): removal of
-        # ``max_workers`` (parameter absorbed without effect via
-        # ``noqa: ARG001``; the rewrite goes through
-        # ``CorpusRunner.max_in_flight``). Minor break
-        # documented in CHANGELOG v2.0.
-        for name in [
-            "corpus", "engines", "output_json", "show_progress",
-            "char_exclude", "timeout_seconds",
-            "profile",
-        ]:
-            assert name in params, (
-                f"run_benchmark_via_service : argument '{name}' a disparu "
-                f"(signature : {sig})"
-            )
+# The ``benchmark_runner.py`` module carried the legacy entry point
+# ``run_benchmark_via_service``, which has been replaced by
+# ``picarones.RunOrchestrator`` (consuming a Pydantic ``RunSpec``
+# or pre-built domain objects via ``execute_preset()``).
+# The legacy contract test was removed along with the module. See
+# ``TestRunOrchestratorApi`` below for the contract of
+# the current canonical entry point.
 
 
 # ──────────────────────────────────────────────────────────────────────────
@@ -289,33 +261,19 @@ class TestRunOrchestratorApi:
                 f"Phase B3 — '{name}' devrait être dans picarones.__all__"
             )
 
-    def[…]
-        """[…]
-[…]
-[…]
-[…]
-        from picarones.app.services[…]
-[…]
-[…]
-[…]
-        corp = Corpus(name="warn_test", documents=[])
-        with warnings.catch_warnings(record=True) as caught:
-            warnings.simplefilter("always")
-            try:
-                run_benchmark_via_service(corp, [])
-            except Exception:
-                # The benchmark fails on an empty corpus, but that does
-                # not matter: only the emission of the warning is tested.
-                pass
-
-        deprecation_warnings = [
-            w for w in caught if issubclass(w.category, DeprecationWarning)
-        ]
-        assert len(deprecation_warnings) >= 1, (
-            "run_benchmark_via_service devrait émettre une "
-            "DeprecationWarning (Phase B3)"
+    def test_prepare_preset_args_exposed_at_root(self):
+        """Phase B3-final: ``prepare_preset_args`` is the public API
+        for Python callers that instantiate their adapters
+        in memory (as opposed to YAML loading via ``RunSpec``).
+        """
+        from picarones.app.services import (
+            PresetArgs,
+            prepare_preset_args,
+            run_result_to_benchmark_result,
         )
-        assert[…]
+        assert callable(prepare_preset_args)
+        assert callable(run_result_to_benchmark_result)
+        assert inspect.isclass(PresetArgs)
 
 
 # ──────────────────────────────────────────────────────────────────────────
@@ -1,470 +0,0 @@
-{
-  "corpus": {
-    "document_count": 2,
-    "name": "invariance_corpus",
-    "source": null
-  },
-  "engine_reports": [
-    {
-      "aggregated_char_scores": {
-        "diacritic": {
-          "correctly_recognized": 0,
-          "score": 1.0,
-          "total_in_gt": 0
-        },
-        "ligature": {
-          "correctly_recognized": 0,
-          "per_ligature": {},
-          "score": 1.0,
-          "total_in_gt": 0
-        }
-      },
-      "aggregated_confusion": {
-        "matrix": {},
-        "total_deletions": 0,
-        "total_insertions": 0,
-        "total_substitutions": 1
-      },
-      "aggregated_hallucination": {
-        "anchor_score_mean": 0.5,
-        "anchor_score_min": 0.0,
-        "document_count": 2,
-        "hallucinating_doc_count": 1,
-        "hallucinating_doc_rate": 0.5,
-        "length_ratio_mean": 1.0,
-        "net_insertion_rate_mean": 0.25
-      },
-      "aggregated_line_metrics": {
-        "catastrophic_rate": {
-          "0.3": 0.0,
-          "0.5": 0.0,
-          "1.0": 0.0
-        },
-        "document_count": 2,
-        "gini_mean": 0.0,
-        "gini_stdev": 0.0,
-        "heatmap": [
-          0.0,
-          0.0,
-          0.0,
-          0.0,
-          0.0,
-          0.0,
-          0.0,
-          0.0,
-          0.0,
-          0.045455
-        ],
-        "mean_cer_mean": 0.045455,
-        "percentiles": {
-          "p50": 0.045455,
-          "p75": 0.045455,
-          "p90": 0.045455,
-          "p95": 0.045455,
-          "p99": 0.045455
-        }
-      },
-      "aggregated_metrics": {
-        "cer": {
-          "max": 0.090909,
-          "mean": 0.045455,
-          "median": 0.045455,
-          "min": 0.0,
-          "stdev": 0.064282
-        },
-        "cer_caseless": {
-          "max": 0.090909,
-          "mean": 0.045455,
-          "median": 0.045455,
-          "min": 0.0,
-          "stdev": 0.064282
-        },
-        "cer_diplomatic": {
-          "max": 0.090909,
-          "mean": 0.045455,
-          "median": 0.045455,
-          "min": 0.0,
-          "profile": "medieval_french",
-          "stdev": 0.064282
-        },
-        "cer_nfc": {
-          "max": 0.090909,
-          "mean": 0.045455,
-          "median": 0.045455,
-          "min": 0.0,
-          "stdev": 0.064282
-        },
-        "document_count": 2,
-        "failed_count": 0,
-        "mer": {
-          "max": 0.5,
-          "mean": 0.25,
-          "median": 0.25,
-          "min": 0.0,
-          "stdev": 0.353553
-        },
-        "wer": {
-          "max": 0.5,
-          "mean": 0.25,
-          "median": 0.25,
-          "min": 0.0,
-          "stdev": 0.353553
-        },
-        "wer_normalized": {
-          "max": 0.5,
-          "mean": 0.25,
-          "median": 0.25,
-          "min": 0.0,
-          "stdev": 0.353553
-        },
-        "wil": {
-          "max": 0.75,
-          "mean": 0.375,
-          "median": 0.375,
-          "min": 0.0,
-          "stdev": 0.53033
-        }
-      },
-      "aggregated_searchability": {
-        "max_distance": 2,
-        "missed_tokens_sample": [],
-        "n_docs": 2,
-        "n_gt_tokens": 5,
-        "n_searchable": 5,
-        "recall": 1.0
-      },
-      "aggregated_structure": {
-        "document_count": 2,
-        "mean_line_accuracy": 1.0,
-        "mean_line_fragmentation_rate": 0.0,
-        "mean_line_fusion_rate": 0.0,
-        "mean_paragraph_conservation": 1.0,
-        "mean_reading_order_score": 0.75
-      },
-      "aggregated_taxonomy": {
-        "class_distribution": {
-          "abbreviation_error": 0.0,
-          "case_error": 0.0,
-          "diacritic_error": 0.0,
-          "hapax": 1.0,
-          "lacuna": 0.0,
-          "ligature_error": 0.0,
-          "oov_character": 0.0,
-          "segmentation_error": 0.0,
-          "visual_confusion": 0.0
-        },
-        "counts": {
-          "abbreviation_error": 0,
-          "case_error": 0,
-          "diacritic_error": 0,
-          "hapax": 1,
-          "lacuna": 0,
-          "ligature_error": 0,
-          "oov_character": 0,
-          "segmentation_error": 0,
-          "visual_confusion": 0
-        },
-        "total_errors": 1
-      },
-      "document_results": [
-        {
-          "char_scores": {
-            "diacritic": {
-              "correctly_recognized": 0,
-              "per_diacritic": {},
-              "score": 1.0,
-              "total_in_gt": 0
-            },
-            "ligature": {
-              "correctly_recognized": 0,
-              "per_ligature": {},
-              "score": 1.0,
-              "total_in_gt": 0
-            }
-          },
-          "confusion_matrix": {
-            "matrix": {},
-            "total_deletions": 0,
-            "total_insertions": 0,
-            "total_substitutions": 0
-          },
-          "doc_id": "doc1",
-          "duration_seconds": 0.0,
-          "engine_error": null,
-          "ground_truth": "Bonjour le monde",
-          "hallucination_metrics": {
-            "anchor_score": 1.0,
-            "anchor_threshold_used": 0.5,
-            "gt_word_count": 3,
-            "hallucinated_blocks": [],
-            "hyp_word_count": 3,
-            "is_hallucinating": false,
-            "length_ratio": 1.0,
-            "length_ratio_threshold_used": 1.2,
-            "net_inserted_words": 0,
-            "net_insertion_rate": 0.0,
-            "ngram_size_used": 3
-          },
-          "hypothesis": "Bonjour le monde",
-          "image_path": "FIXTURES/doc1.png",
-          "line_metrics": {
-            "catastrophic_rate": {
-              "0.3": 0.0,
-              "0.5": 0.0,
-              "1.0": 0.0
-            },
-            "cer_per_line": [
-              0.0
-            ],
-            "gini": 0.0,
-            "heatmap": [
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0
-            ],
-            "line_count": 1,
-            "mean_cer": 0.0,
-            "percentiles": {
-              "p50": 0.0,
-              "p75": 0.0,
-              "p90": 0.0,
-              "p95": 0.0,
-              "p99": 0.0
-            }
-          },
-          "metrics": {
-            "cer": 0.0,
-            "cer_caseless": 0.0,
-            "cer_diplomatic": 0.0,
-            "cer_nfc": 0.0,
-            "diplomatic_profile_name": "medieval_french",
-            "error": null,
-            "hypothesis_length": 16,
-            "mer": 0.0,
-            "reference_length": 16,
-            "wer": 0.0,
-            "wer_normalized": 0.0,
-            "wil": 0.0
-          },
-          "searchability_metrics": {
-            "max_distance": 2,
-            "missed_tokens": [],
-            "n_gt_tokens": 3,
-            "n_searchable": 3,
-            "recall": 1.0
-          },
-          "structure": {
-            "gt_line_count": 1,
-            "line_accuracy": 1.0,
-            "line_fragmentation_count": 0,
-            "line_fragmentation_rate": 0.0,
-            "line_fusion_count": 0,
-            "line_fusion_rate": 0.0,
-            "ocr_line_count": 1,
-            "paragraph_conservation_score": 1.0,
-            "reading_order_score": 1.0
-          },
-          "taxonomy": {
-            "class_distribution": {},
-            "counts": {
-              "abbreviation_error": 0,
-              "case_error": 0,
-              "diacritic_error": 0,
-              "hapax": 0,
-              "lacuna": 0,
-              "ligature_error": 0,
-              "oov_character": 0,
-              "segmentation_error": 0,
-              "visual_confusion": 0
-            },
-            "examples": {
-              "abbreviation_error": [],
-              "case_error": [],
-              "diacritic_error": [],
-              "hapax": [],
-              "lacuna": [],
-              "ligature_error": [],
-              "oov_character": [],
-              "segmentation_error": [],
-              "visual_confusion": []
-            },
-            "total_errors": 0
-          }
-        },
-        {
-          "char_scores": {
-            "diacritic": {
-              "correctly_recognized": 0,
-              "per_diacritic": {},
-              "score": 1.0,
-              "total_in_gt": 0
-            },
-            "ligature": {
-              "correctly_recognized": 0,
-              "per_ligature": {},
-              "score": 1.0,
-              "total_in_gt": 0
-            }
-          },
-          "confusion_matrix": {
-            "matrix": {
-              "l": {
-                "i": 1
-              }
-            },
-            "total_deletions": 0,
-            "total_insertions": 0,
-            "total_substitutions": 1
-          },
-          "doc_id": "doc2",
|
| 327 |
-
"duration_seconds": 0.0,
|
| 328 |
-
"engine_error": null,
|
| 329 |
-
"ground_truth": "Hello world",
|
| 330 |
-
"hallucination_metrics": {
|
| 331 |
-
"anchor_score": 0.0,
|
| 332 |
-
"anchor_threshold_used": 0.5,
|
| 333 |
-
"gt_word_count": 2,
|
| 334 |
-
"hallucinated_blocks": [],
|
| 335 |
-
"hyp_word_count": 2,
|
| 336 |
-
"is_hallucinating": true,
|
| 337 |
-
"length_ratio": 1.0,
|
| 338 |
-
"length_ratio_threshold_used": 1.2,
|
| 339 |
-
"net_inserted_words": 1,
|
| 340 |
-
"net_insertion_rate": 0.5,
|
| 341 |
-
"ngram_size_used": 3
|
| 342 |
-
},
|
| 343 |
-
"hypothesis": "Helio world",
|
| 344 |
-
"image_path": "FIXTURES/doc2.png",
|
| 345 |
-
"line_metrics": {
|
| 346 |
-
"catastrophic_rate": {
|
| 347 |
-
"0.3": 0.0,
|
| 348 |
-
"0.5": 0.0,
|
| 349 |
-
"1.0": 0.0
|
| 350 |
-
},
|
| 351 |
-
"cer_per_line": [
|
| 352 |
-
0.090909
|
| 353 |
-
],
|
| 354 |
-
"gini": 0.0,
|
| 355 |
-
"heatmap": [
|
| 356 |
-
0.0,
|
| 357 |
-
0.0,
|
| 358 |
-
0.0,
|
| 359 |
-
0.0,
|
| 360 |
-
0.0,
|
| 361 |
-
0.0,
|
| 362 |
-
0.0,
|
| 363 |
-
0.0,
|
| 364 |
-
0.0,
|
| 365 |
-
0.090909
|
| 366 |
-
],
|
| 367 |
-
"line_count": 1,
|
| 368 |
-
"mean_cer": 0.090909,
|
| 369 |
-
"percentiles": {
|
| 370 |
-
"p50": 0.090909,
|
| 371 |
-
"p75": 0.090909,
|
| 372 |
-
"p90": 0.090909,
|
| 373 |
-
"p95": 0.090909,
|
| 374 |
-
"p99": 0.090909
|
| 375 |
-
}
|
| 376 |
-
},
|
| 377 |
-
"metrics": {
|
| 378 |
-
"cer": 0.090909,
|
| 379 |
-
"cer_caseless": 0.090909,
|
| 380 |
-
"cer_diplomatic": 0.090909,
|
| 381 |
-
"cer_nfc": 0.090909,
|
| 382 |
-
"diplomatic_profile_name": "medieval_french",
|
| 383 |
-
"error": null,
|
| 384 |
-
"hypothesis_length": 11,
|
| 385 |
-
"mer": 0.5,
|
| 386 |
-
"reference_length": 11,
|
| 387 |
-
"wer": 0.5,
|
| 388 |
-
"wer_normalized": 0.5,
|
| 389 |
-
"wil": 0.75
|
| 390 |
-
},
|
| 391 |
-
"searchability_metrics": {
|
| 392 |
-
"max_distance": 2,
|
| 393 |
-
"missed_tokens": [],
|
| 394 |
-
"n_gt_tokens": 2,
|
| 395 |
-
"n_searchable": 2,
|
| 396 |
-
"recall": 1.0
|
| 397 |
-
},
|
| 398 |
-
"structure": {
|
| 399 |
-
"gt_line_count": 1,
|
| 400 |
-
"line_accuracy": 1.0,
|
| 401 |
-
"line_fragmentation_count": 0,
|
| 402 |
-
"line_fragmentation_rate": 0.0,
|
| 403 |
-
"line_fusion_count": 0,
|
| 404 |
-
"line_fusion_rate": 0.0,
|
| 405 |
-
"ocr_line_count": 1,
|
| 406 |
-
"paragraph_conservation_score": 1.0,
|
| 407 |
-
"reading_order_score": 0.5
|
| 408 |
-
},
|
| 409 |
-
"taxonomy": {
|
| 410 |
-
"class_distribution": {
|
| 411 |
-
"abbreviation_error": 0.0,
|
| 412 |
-
"case_error": 0.0,
|
| 413 |
-
"diacritic_error": 0.0,
|
| 414 |
-
"hapax": 1.0,
|
| 415 |
-
"lacuna": 0.0,
|
| 416 |
-
"ligature_error": 0.0,
|
| 417 |
-
"oov_character": 0.0,
|
| 418 |
-
"segmentation_error": 0.0,
|
| 419 |
-
"visual_confusion": 0.0
|
| 420 |
-
},
|
| 421 |
-
"counts": {
|
| 422 |
-
"abbreviation_error": 0,
|
| 423 |
-
"case_error": 0,
|
| 424 |
-
"diacritic_error": 0,
|
| 425 |
-
"hapax": 1,
|
| 426 |
-
"lacuna": 0,
|
| 427 |
-
"ligature_error": 0,
|
| 428 |
-
"oov_character": 0,
|
| 429 |
-
"segmentation_error": 0,
|
| 430 |
-
"visual_confusion": 0
|
| 431 |
-
},
|
| 432 |
-
"examples": {
|
| 433 |
-
"abbreviation_error": [],
|
| 434 |
-
"case_error": [],
|
| 435 |
-
"diacritic_error": [],
|
| 436 |
-
"hapax": [
|
| 437 |
-
{
|
| 438 |
-
"gt": "Hello",
|
| 439 |
-
"ocr": "Helio"
|
| 440 |
-
}
|
| 441 |
-
],
|
| 442 |
-
"lacuna": [],
|
| 443 |
-
"ligature_error": [],
|
| 444 |
-
"oov_character": [],
|
| 445 |
-
"segmentation_error": [],
|
| 446 |
-
"visual_confusion": []
|
| 447 |
-
},
|
| 448 |
-
"total_errors": 1
|
| 449 |
-
}
|
| 450 |
-
}
|
| 451 |
-
],
|
| 452 |
-
"engine_config": {},
|
| 453 |
-
"engine_name": "precomputed_invariance",
|
| 454 |
-
"engine_version": "PINNED"
|
| 455 |
-
}
|
| 456 |
-
],
|
| 457 |
-
"metadata": {},
|
| 458 |
-
"picarones_version": "PINNED",
|
| 459 |
-
"ranking": [
|
| 460 |
-
{
|
| 461 |
-
"documents": 2,
|
| 462 |
-
"engine": "precomputed_invariance",
|
| 463 |
-
"failed": 0,
|
| 464 |
-
"mean_cer": 0.045455,
|
| 465 |
-
"mean_wer": 0.25,
|
| 466 |
-
"median_cer": 0.045455
|
| 467 |
-
}
|
| 468 |
-
],
|
| 469 |
-
"run_date": "PINNED"
|
| 470 |
-
}
|
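The `cer`, `wer`, `mean_cer` and `mean_wer` values in the deleted snapshot above can be reproduced by hand. The following is a minimal sketch, assuming CER and WER are plain Levenshtein distances normalized by the reference length; the helper names `levenshtein`, `cer` and `wer` are illustrative and are not the project's own implementation.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits divided by reference length."""
    return levenshtein(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word edits divided by the number of reference words."""
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / len(ref_words)


# The two fixture documents from the snapshot (ground truth, hypothesis).
docs = [("Bonjour le monde", "Bonjour le monde"),
        ("Hello world", "Helio world")]

cers = [cer(gt, hyp) for gt, hyp in docs]
wers = [wer(gt, hyp) for gt, hyp in docs]
mean_cer = sum(cers) / len(cers)
mean_wer = sum(wers) / len(wers)

print(round(cers[1], 6), round(mean_cer, 6), mean_wer)
```

The single `l` → `i` substitution in `doc2` costs one edit over 11 reference characters, and one word out of two is wrong, which matches the per-document and ranking values recorded in the snapshot after rounding to 6 decimals.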
@@ -1,289 +0,0 @@
|
|
| 1 |
-
"""Run-to-run invariance test for the Option B migration.
|
| 2 |
-
|
| 3 |
-
Phase B0 of the migration effort ``run_benchmark_via_service`` →
|
| 4 |
-
``RunOrchestrator.execute(RunSpec)``.
|
| 5 |
-
|
| 6 |
-
Role
|
| 7 |
-
----
|
| 8 |
-
This test runs a **deterministic** benchmark (a mini corpus of 2 docs +
|
| 9 |
-
``PrecomputedTextAdapter``) through the current facade
|
| 10 |
-
``run_benchmark_via_service`` and compares its normalized
|
| 11 |
-
``BenchmarkResult`` to a JSON snapshot stored in
|
| 12 |
-
``tests/integration/snapshots/migration_invariance.json``.
|
| 13 |
-
|
| 14 |
-
Why
|
| 15 |
-
--------
|
| 16 |
-
During the migration to ``RunOrchestrator``, 7 features are ported
|
| 17 |
-
(``progress_callback``, ``cancel_event``, ``partial_dir``,
|
| 18 |
-
``entity_extractor``, ``char_exclude``, ``normalization_profile``,
|
| 19 |
-
``profile``, ``output_json``). Each port must preserve
|
| 20 |
-
**exactly** the numerical behavior of the existing path. This test
|
| 21 |
-
acts as a safety net: if an internal refactoring changes the
|
| 22 |
-
result (CER, aggregation, engine order, JSON structure), the
|
| 23 |
-
snapshot diverges and CI fails.
|
| 24 |
-
|
| 25 |
-
The test uses **no** external dependency (no Tesseract, no
|
| 26 |
-
network). The ``PrecomputedTextAdapter`` reads a text file written
|
| 27 |
-
to disk, so the output is 100% deterministic.
|
| 28 |
-
|
| 29 |
-
Updating the snapshot
|
| 30 |
-
-----------------------
|
| 31 |
-
If an **intentional** change alters the result (e.g. a new
|
| 32 |
-
field in ``BenchmarkResult``), regenerate the snapshot:
|
| 33 |
-
|
| 34 |
-
PICARONES_UPDATE_SNAPSHOT=1 python -m pytest \
|
| 35 |
-
tests/integration/test_migration_invariance.py
|
| 36 |
-
|
| 37 |
-
Then inspect the snapshot's git diff before committing.
|
| 38 |
-
|
| 39 |
-
Normalization
|
| 40 |
-
-------------
|
| 41 |
-
Volatile fields are neutralized before comparison:
|
| 42 |
-
|
| 43 |
-
- ``picarones_version`` → ``"PINNED"``
|
| 44 |
-
- ``run_date`` → ``"PINNED"``
|
| 45 |
-
- ``corpus.source`` → ``"FIXTURES/corpus"``
|
| 46 |
-
- ``image_path`` → ``"FIXTURES/docN.png"``
|
| 47 |
-
- ``duration_seconds`` → ``0.0``
|
| 48 |
-
- Any other field containing ``tmp_path`` → replaced with
|
| 49 |
-
``"FIXTURES/..."``
|
| 50 |
-
|
| 51 |
-
This keeps the snapshot stable across operating systems and across runs.
|
| 52 |
-
"""
|
| 53 |
-
|
| 54 |
-
from __future__ import annotations
|
| 55 |
-
|
| 56 |
-
import json
|
| 57 |
-
import os
|
| 58 |
-
import re
|
| 59 |
-
from pathlib import Path
|
| 60 |
-
from typing import Any
|
| 61 |
-
|
| 62 |
-
import pytest
|
| 63 |
-
|
| 64 |
-
from picarones.adapters.ocr.precomputed import PrecomputedTextAdapter
|
| 65 |
-
from picarones.app.services.benchmark_runner import run_benchmark_via_service
|
| 66 |
-
from picarones.evaluation.corpus import Corpus, Document
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
SNAPSHOT_PATH = (
|
| 70 |
-
Path(__file__).parent / "snapshots" / "migration_invariance.json"
|
| 71 |
-
)
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
# ──────────────────────────────────────────────────────────────────────
|
| 75 |
-
# Deterministic fixtures
|
| 76 |
-
# ──────────────────────────────────────────────────────────────────────
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
def _make_invariance_corpus(tmp_path: Path) -> Corpus:
|
| 80 |
-
"""Mini corpus of 2 documents with GT + precomputed text.
|
| 81 |
-
|
| 82 |
-
The precomputed text differs slightly from the GT so as to produce
|
| 83 |
-
non-trivial CER/WER metrics (which are therefore more discriminating
|
| 84 |
-
in the snapshot).
|
| 85 |
-
"""
|
| 86 |
-
documents: list[Document] = []
|
| 87 |
-
|
| 88 |
-
# Doc 1: GT = "Bonjour le monde", OCR = "Bonjour le monde" → CER 0.0
|
| 89 |
-
doc1_img = tmp_path / "doc1.png"
|
| 90 |
-
doc1_img.write_bytes(b"\x89PNG\r\n\x1a\n") # minimal PNG header
|
| 91 |
-
doc1_ocr = tmp_path / "doc1.invariance.txt"
|
| 92 |
-
doc1_ocr.write_text("Bonjour le monde", encoding="utf-8")
|
| 93 |
-
documents.append(Document(
|
| 94 |
-
image_path=doc1_img,
|
| 95 |
-
ground_truth="Bonjour le monde",
|
| 96 |
-
doc_id="doc1",
|
| 97 |
-
))
|
| 98 |
-
|
| 99 |
-
# Doc 2: GT = "Hello world", OCR = "Helio world" → non-zero CER
|
| 100 |
-
doc2_img = tmp_path / "doc2.png"
|
| 101 |
-
doc2_img.write_bytes(b"\x89PNG\r\n\x1a\n")
|
| 102 |
-
doc2_ocr = tmp_path / "doc2.invariance.txt"
|
| 103 |
-
doc2_ocr.write_text("Helio world", encoding="utf-8")
|
| 104 |
-
documents.append(Document(
|
| 105 |
-
image_path=doc2_img,
|
| 106 |
-
ground_truth="Hello world",
|
| 107 |
-
doc_id="doc2",
|
| 108 |
-
))
|
| 109 |
-
|
| 110 |
-
return Corpus(name="invariance_corpus", documents=documents)
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
def _make_invariance_engine() -> PrecomputedTextAdapter:
|
| 114 |
-
"""``PrecomputedTextAdapter`` that reads ``<stem>.invariance.txt``."""
|
| 115 |
-
return PrecomputedTextAdapter(source_label="invariance")
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
# ──────────────────────────────────────────────────────────────────────
|
| 119 |
-
# Snapshot normalization
|
| 120 |
-
# ──────────────────────────────────────────────────────────────────────
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
def _normalize_for_snapshot(data: Any, tmp_path: Path) -> Any:
|
| 124 |
-
"""Recursively normalize the volatile fields of the ``BenchmarkResult``.
|
| 125 |
-
|
| 126 |
-
Replaces ``tmp_path`` with ``"FIXTURES"`` in every string
|
| 127 |
-
value. Neutralizes the explicitly volatile fields
|
| 128 |
-
(``duration_seconds``, ``run_date``, ``picarones_version``,
|
| 129 |
-
``engine_version``, ``code_version``).
|
| 130 |
-
"""
|
| 131 |
-
tmp_str = str(tmp_path)
|
| 132 |
-
# Pattern matching tmp_path/anything (for absolute paths
|
| 133 |
-
# that appear as string values rather than as keys).
|
| 134 |
-
tmp_re = re.compile(re.escape(tmp_str))
|
| 135 |
-
|
| 136 |
-
def _normalize(value: Any, *, key: str | None = None) -> Any:
|
| 137 |
-
if isinstance(value, dict):
|
| 138 |
-
return {k: _normalize(v, key=k) for k, v in value.items()}
|
| 139 |
-
if isinstance(value, list):
|
| 140 |
-
return [_normalize(item) for item in value]
|
| 141 |
-
if isinstance(value, str):
|
| 142 |
-
return tmp_re.sub("FIXTURES", value)
|
| 143 |
-
if isinstance(value, float):
|
| 144 |
-
# Neutralize durations (volatile from one run to the next).
|
| 145 |
-
if key == "duration_seconds":
|
| 146 |
-
return 0.0
|
| 147 |
-
# Keep other floats at a reasonable precision
|
| 148 |
-
# to absorb minimal floating-point noise.
|
| 149 |
-
return round(value, 6)
|
| 150 |
-
return value
|
| 151 |
-
|
| 152 |
-
normalized = _normalize(data)
|
| 153 |
-
|
| 154 |
-
# Root-level volatile fields: neutralized in post-processing
|
| 155 |
-
# because their value does not contain ``tmp_path``.
|
| 156 |
-
if isinstance(normalized, dict):
|
| 157 |
-
for volatile_key in ("picarones_version", "run_date"):
|
| 158 |
-
if volatile_key in normalized:
|
| 159 |
-
normalized[volatile_key] = "PINNED"
|
| 160 |
-
|
| 161 |
-
# engine_version may appear in each engine_report.
|
| 162 |
-
for report in normalized.get("engine_reports", []):
|
| 163 |
-
if "engine_version" in report:
|
| 164 |
-
report["engine_version"] = "PINNED"
|
| 165 |
-
# pipeline_info sometimes carries paths or metadata.
|
| 166 |
-
pipeline_info = report.get("pipeline_info")
|
| 167 |
-
if isinstance(pipeline_info, dict):
|
| 168 |
-
if "code_version" in pipeline_info:
|
| 169 |
-
pipeline_info["code_version"] = "PINNED"
|
| 170 |
-
|
| 171 |
-
return normalized
|
| 172 |
-
|
| 173 |
-
|
| 174 |
-
# ──────────────────────────────────────────────────────────────────────
|
| 175 |
-
# Snapshot comparison
|
| 176 |
-
# ──────────────────────────────────────────────────────────────────────
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
def _load_snapshot() -> dict | None:
|
| 180 |
-
if not SNAPSHOT_PATH.exists():
|
| 181 |
-
return None
|
| 182 |
-
return json.loads(SNAPSHOT_PATH.read_text(encoding="utf-8"))
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
def _write_snapshot(data: dict) -> None:
|
| 186 |
-
SNAPSHOT_PATH.parent.mkdir(parents=True, exist_ok=True)
|
| 187 |
-
SNAPSHOT_PATH.write_text(
|
| 188 |
-
json.dumps(data, ensure_ascii=False, indent=2, sort_keys=True),
|
| 189 |
-
encoding="utf-8",
|
| 190 |
-
)
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
def _should_update_snapshot() -> bool:
|
| 194 |
-
return os.environ.get("PICARONES_UPDATE_SNAPSHOT") == "1"
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
# ──────────────────────────────────────────────────────────────────────
|
| 198 |
-
# Main test
|
| 199 |
-
# ──────────────────────────────────────────────────────────────────────
|
| 200 |
-
|
| 201 |
-
|
| 202 |
-
def test_run_benchmark_via_service_invariance(tmp_path: Path) -> None:
|
| 203 |
-
"""Invariance snapshot of the current behavior.
|
| 204 |
-
|
| 205 |
-
This test is the safety net of the Option B migration. It must
|
| 206 |
-
stay green at every step of the effort (B1, B2, B3, B4, ...) as long
|
| 207 |
-
as ``run_benchmark_via_service`` is the public facade.
|
| 208 |
-
|
| 209 |
-
Once the migration is finished and ``run_benchmark_via_service``
|
| 210 |
-
is removed (Phase B8), this test will be retired or migrated to
|
| 211 |
-
``RunOrchestrator.execute()``.
|
| 212 |
-
"""
|
| 213 |
-
corpus = _make_invariance_corpus(tmp_path)
|
| 214 |
-
engine = _make_invariance_engine()
|
| 215 |
-
|
| 216 |
-
benchmark_result = run_benchmark_via_service(
|
| 217 |
-
corpus=corpus,
|
| 218 |
-
engines=[engine],
|
| 219 |
-
code_version="invariance-test-1.0.0",
|
| 220 |
-
)
|
| 221 |
-
|
| 222 |
-
actual_normalized = _normalize_for_snapshot(
|
| 223 |
-
benchmark_result.as_dict(), tmp_path,
|
| 224 |
-
)
|
| 225 |
-
|
| 226 |
-
snapshot = _load_snapshot()
|
| 227 |
-
if snapshot is None or _should_update_snapshot():
|
| 228 |
-
_write_snapshot(actual_normalized)
|
| 229 |
-
if snapshot is None:
|
| 230 |
-
pytest.skip(
|
| 231 |
-
f"Snapshot created for the first time at "
|
| 232 |
-
f"{SNAPSHOT_PATH.relative_to(Path.cwd())}. "
|
| 233 |
-
f"Check its contents, then re-run the test."
|
| 234 |
-
)
|
| 235 |
-
else:
|
| 236 |
-
# Explicit update mode: the snapshot was written, so the test passes
|
| 237 |
-
# without further checks. The operator is responsible
|
| 238 |
-
# for inspecting the git diff.
|
| 239 |
-
return
|
| 240 |
-
|
| 241 |
-
assert actual_normalized == snapshot, (
|
| 242 |
-
"BenchmarkResult diverges from the invariance snapshot.\n"
|
| 243 |
-
f"Snapshot : {SNAPSHOT_PATH}\n"
|
| 244 |
-
"If the divergence is intentional, regenerate with:\n"
|
| 245 |
-
" PICARONES_UPDATE_SNAPSHOT=1 python -m pytest "
|
| 246 |
-
f"{Path(__file__).relative_to(Path.cwd())}\n"
|
| 247 |
-
"and inspect the snapshot's git diff before committing."
|
| 248 |
-
)
|
| 249 |
-
|
| 250 |
-
|
| 251 |
-
# ──────────────────────────────────────────────────────────────────────
|
| 252 |
-
# Auxiliary test: checks that the normalization itself is stable
|
| 253 |
-
# ──────────────────────────────────────────────────────────────────────
|
| 254 |
-
|
| 255 |
-
|
| 256 |
-
def test_normalization_is_idempotent(tmp_path: Path) -> None:
|
| 257 |
-
"""Normalizing an already-normalized dict does not change it.
|
| 258 |
-
|
| 259 |
-
Guarantees that the normalization can be re-applied without drift.
|
| 260 |
-
A didactic test of the snapshot mechanics.
|
| 261 |
-
"""
|
| 262 |
-
sample = {
|
| 263 |
-
"picarones_version": "2.0.0",
|
| 264 |
-
"run_date": "2026-05-14T12:00:00Z",
|
| 265 |
-
"corpus": {"source": str(tmp_path / "corpus.zip")},
|
| 266 |
-
"engine_reports": [
|
| 267 |
-
{
|
| 268 |
-
"engine_version": "1.2.3",
|
| 269 |
-
"document_results": [
|
| 270 |
-
{
|
| 271 |
-
"image_path": str(tmp_path / "doc1.png"),
|
| 272 |
-
"duration_seconds": 0.123456,
|
| 273 |
-
"metrics": {"cer": 0.05},
|
| 274 |
-
},
|
| 275 |
-
],
|
| 276 |
-
},
|
| 277 |
-
],
|
| 278 |
-
}
|
| 279 |
-
|
| 280 |
-
once = _normalize_for_snapshot(sample, tmp_path)
|
| 281 |
-
twice = _normalize_for_snapshot(once, tmp_path)
|
| 282 |
-
|
| 283 |
-
assert once == twice
|
| 284 |
-
assert once["picarones_version"] == "PINNED"
|
| 285 |
-
assert once["run_date"] == "PINNED"
|
| 286 |
-
assert once["engine_reports"][0]["engine_version"] == "PINNED"
|
| 287 |
-
assert once["engine_reports"][0]["document_results"][0]["duration_seconds"] == 0.0
|
| 288 |
-
assert "FIXTURES" in once["corpus"]["source"]
|
| 289 |
-
assert "FIXTURES" in once["engine_reports"][0]["document_results"][0]["image_path"]
|
@@ -813,7 +813,7 @@ class TestPartialStoreFingerprint:
|
|
| 813 |
def test_engine_config_for_fingerprint_distinguishes_psm(self) -> None:
|
| 814 |
"""``_engine_config_for_fingerprint`` captures the operational
|
| 815 |
attributes of an OCR adapter (lang, psm, model, …)."""
|
| 816 |
-
from picarones.app.services.
|
| 817 |
_engine_config_for_fingerprint,
|
| 818 |
)
|
| 819 |
|
|
|
|
| 813 |
def test_engine_config_for_fingerprint_distinguishes_psm(self) -> None:
|
| 814 |
"""``_engine_config_for_fingerprint`` captures the operational
|
| 815 |
attributes of an OCR adapter (lang, psm, model, …)."""
|
| 816 |
+
from picarones.app.services._benchmark_helpers import (
|
| 817 |
_engine_config_for_fingerprint,
|
| 818 |
)
|
| 819 |
|
|
@@ -26,7 +26,9 @@ from __future__ import annotations
|
|
| 26 |
|
| 27 |
import pytest
|
| 28 |
|
| 29 |
-
from picarones.app.services.
|
|
|
|
|
|
|
| 30 |
from picarones.interfaces.web.benchmark_utils import (
|
| 31 |
_OCR_KWARGS_BUILDERS,
|
| 32 |
_engine_from_competitor,
|
|
|
|
| 26 |
|
| 27 |
import pytest
|
| 28 |
|
| 29 |
+
from picarones.app.services._benchmark_adapter_resolver import (
|
| 30 |
+
build_adapter_resolver,
|
| 31 |
+
)
|
| 32 |
from picarones.interfaces.web.benchmark_utils import (
|
| 33 |
_OCR_KWARGS_BUILDERS,
|
| 34 |
_engine_from_competitor,
|
|
@@ -734,17 +734,18 @@ class TestCLIServeCommand:
|
|
| 734 |
class TestRunnerProgressCallback:
|
| 735 |
|
| 736 |
def test_callback_signature_accepted(self):
|
| 737 |
-
"""
|
|
|
|
| 738 |
import inspect
|
| 739 |
-
from picarones.app.services
|
| 740 |
-
sig = inspect.signature(
|
| 741 |
assert "progress_callback" in sig.parameters
|
| 742 |
|
| 743 |
def test_callback_is_optional(self):
|
| 744 |
-
"""progress_callback is optional (default value None)."""
|
| 745 |
import inspect
|
| 746 |
-
from picarones.app.services
|
| 747 |
-
sig = inspect.signature(
|
| 748 |
param = sig.parameters["progress_callback"]
|
| 749 |
assert param.default is None
|
| 750 |
|
|
|
|
| 734 |
class TestRunnerProgressCallback:
|
| 735 |
|
| 736 |
def test_callback_signature_accepted(self):
|
| 737 |
+
"""Phase B3-final: ``RunOrchestrator.execute_preset`` accepts
|
| 738 |
+
a ``progress_callback`` kwarg."""
|
| 739 |
import inspect
|
| 740 |
+
from picarones.app.services import RunOrchestrator
|
| 741 |
+
sig = inspect.signature(RunOrchestrator.execute_preset)
|
| 742 |
assert "progress_callback" in sig.parameters
|
| 743 |
|
| 744 |
def test_callback_is_optional(self):
|
| 745 |
+
"""``progress_callback`` is optional (default value None)."""
|
| 746 |
import inspect
|
| 747 |
+
from picarones.app.services import RunOrchestrator
|
| 748 |
+
sig = inspect.signature(RunOrchestrator.execute_preset)
|
| 749 |
param = sig.parameters["progress_callback"]
|
| 750 |
assert param.default is None
|
| 751 |
|
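The two adapted tests above check a contract purely through introspection, without running a benchmark. As a minimal sketch of that technique, the snippet below applies `inspect.signature` to a hypothetical stand-in class; `FakeOrchestrator` and its parameters are assumptions made for illustration and do not reflect the real `RunOrchestrator` signature.

```python
import inspect
from typing import Callable, Optional


class FakeOrchestrator:
    """Hypothetical stand-in for an orchestrator (not the real class)."""

    def execute_preset(
        self,
        preset: str,
        *,
        progress_callback: Optional[Callable[[int, int], None]] = None,
    ) -> None:
        # Report a single completed step when a callback is supplied.
        if progress_callback is not None:
            progress_callback(1, 1)


# Inspect the unbound method, as the tests in the diff do.
sig = inspect.signature(FakeOrchestrator.execute_preset)

# The kwarg must exist and must default to None to count as optional.
has_callback = "progress_callback" in sig.parameters
default_is_none = sig.parameters["progress_callback"].default is None

print(has_callback, default_is_none)
```

Checking the signature rather than calling the method keeps such a contract test fast and free of fixtures, at the cost of not verifying that the callback is actually invoked.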