Ma-Ri-Ba-Ku / Picarones
Claude committed on May 14

Commit 91dc42d (unverified) · 1 Parent(s): 1ef330c

feat(services): Phase B3-final commit 6: remove the 3 purely legacy modules

Phase B3-final commit 6/7. Verified that no active caller still
imports the 3 legacy modules, then removed them.

Modules removed (3, ~1000 net LOC)
- picarones/app/services/benchmark_runner.py
  (entry point ``run_benchmark_via_service``, deprecated in B3)
- picarones/app/services/_benchmark_execution.py
  (internal helper of the legacy orchestration)
- picarones/app/services/_benchmark_orchestration.py
  (run_benchmark_unified / run_benchmark_with_partial)

Tests removed (now obsolete)
- tests/integration/test_migration_invariance.py
- tests/integration/snapshots/migration_invariance.json
  (role: guarantee that the BenchmarkResult stayed invariant during
  the migration; the migration is complete, so the safeguard has
  served its purpose)
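
For readers unfamiliar with this kind of safeguard, here is a minimal sketch of a snapshot-invariance check. It is not the removed test: the function name `check_against_snapshot`, the `update` flag and the dictionary payload are assumptions made for illustration, and the real test compared a full ``BenchmarkResult`` against ``migration_invariance.json``.

```python
import json
import tempfile
from pathlib import Path


def check_against_snapshot(result: dict, snapshot_path: Path, *, update: bool = False) -> bool:
    """Compare ``result`` with a stored JSON snapshot (hypothetical helper).

    The result is serialized with sorted keys so that dict ordering
    cannot cause a spurious difference.
    """
    canonical = json.dumps(result, sort_keys=True, ensure_ascii=False, indent=2)
    if update or not snapshot_path.exists():
        # First run (or explicit refresh): record the reference output.
        snapshot_path.write_text(canonical, encoding="utf-8")
        return True
    return snapshot_path.read_text(encoding="utf-8") == canonical


def demo() -> tuple[bool, bool, bool]:
    """Record a snapshot, re-check it, then check a drifted result."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "snapshot.json"
        first = check_against_snapshot({"cer": 0.12, "wer": 0.30}, path)
        same = check_against_snapshot({"wer": 0.30, "cer": 0.12}, path)
        drift = check_against_snapshot({"cer": 0.13, "wer": 0.30}, path)
    return first, same, drift
```

Once the legacy and modern paths are no longer both present, such a check has nothing left to compare, which is consistent with removing the test together with its snapshot file.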

Tests migrated (imports redirected to the private canonical helpers)
- test_sprint_d2cdef_features.py: _aggregate_ner_metrics from
  _benchmark_ner (instead of the benchmark_runner re-export)
- test_sprint_d2b_partial_dir_resume.py: _engine_config_for_fingerprint
  from _benchmark_helpers
- test_sprint_h2b_canonical_in_runner.py + test_s9_resolver_collision.py
  + test_s9_ocr_engine_naming_contract.py: build_adapter_resolver +
  engine_to_pipeline_spec from _benchmark_adapter_resolver
- test_phase1_post_rewrite_wiring.py: _engine_config_for_fingerprint
  from _benchmark_helpers

Tests adapted (they inspect the modern API instead of the legacy one)
- test_public_api.py: TestRunnerApi (legacy contract test) removed,
  replaced by test_prepare_preset_args_exposed_at_root (contract
  test for the modern API). test_run_benchmark_via_service_still_callable_with_warning
  removed (the function no longer exists).
- test_sprint_a14_s1_normalization_propagation.py: inspects
  RunSpec.normalization_profile + prepare_preset_args (instead of
  run_benchmark_via_service).
- test_sprint6_web_interface.py: inspects
  RunOrchestrator.execute_preset.progress_callback (instead of
  run_benchmark_via_service).
- test_run_spec_b1_extended.py: compares defaults RunSpec ↔
  prepare_preset_args (instead of run_benchmark_via_service).
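
The adapted tests rely on signature inspection rather than on running a benchmark. A minimal sketch of that technique follows, using plain functions. `legacy_entry` and `modern_entry` are hypothetical stand-ins, not the project's API, and the `30.0` default is invented for the example. The real tests target ``RunSpec`` (a Pydantic model) and ``prepare_preset_args``, whose signatures are not shown in this commit.

```python
import inspect


def default_mismatches(func_a, func_b) -> dict:
    """Return {param: (default_a, default_b)} for shared parameters
    whose default values differ between the two callables."""
    params_a = inspect.signature(func_a).parameters
    params_b = inspect.signature(func_b).parameters
    mismatches = {}
    for name in params_a.keys() & params_b.keys():
        default_a = params_a[name].default
        default_b = params_b[name].default
        if default_a != default_b:
            mismatches[name] = (default_a, default_b)
    return mismatches


# Hypothetical stand-ins, for illustration only.
def legacy_entry(corpus, *, profile="standard", timeout_seconds=60.0): ...


def modern_entry(corpus, *, profile="standard", timeout_seconds=30.0,
                 progress_callback=None): ...
```

A contract test built on this can assert that `default_mismatches(...)` is empty, and that a parameter such as `progress_callback` is present in `inspect.signature(modern_entry).parameters`, without ever calling the function under test.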

picarones/__init__.py
- Example docstring: replaces
  ``from picarones.app.services.benchmark_runner import run_benchmark_via_service``
  with
  ``from picarones import RunOrchestrator, RunSpec, load_run_spec_from_yaml``
  + ``from picarones.app.services import prepare_preset_args,
  run_result_to_benchmark_result``.

Verification (grep): no residual import of
``picarones.app.services.benchmark_runner``,
``_benchmark_execution`` or ``_benchmark_orchestration`` in
``picarones/`` or ``tests/``.
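
The commit only states that the check was done with grep; the exact command is not given. As a rough equivalent, the same verification could be scripted with the standard `ast` module, which avoids false positives from comments and strings. The helper names `find_legacy_imports` and `scan_tree` are assumptions for this sketch, and relative imports (``from . import benchmark_runner``) are not detected.

```python
import ast
from pathlib import Path

# The three modules removed by this commit.
LEGACY_MODULES = (
    "picarones.app.services.benchmark_runner",
    "picarones.app.services._benchmark_execution",
    "picarones.app.services._benchmark_orchestration",
)


def find_legacy_imports(source: str) -> list[str]:
    """Return the legacy module names imported by ``source``."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Covers both ``from pkg.mod import x`` and ``from pkg import mod``.
            names = [node.module]
            names += [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        hits.extend(name for name in names if name in LEGACY_MODULES)
    return hits


def scan_tree(root: Path) -> dict[str, list[str]]:
    """Map each Python file under ``root`` to its legacy imports, if any."""
    report: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.py")):
        hits = find_legacy_imports(path.read_text(encoding="utf-8"))
        if hits:
            report[str(path)] = hits
    return report
```

Running `scan_tree(Path("picarones"))` and `scan_tree(Path("tests"))` from the repository root should return empty dictionaries if the verification above holds.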

Tests: 786 passed on the impacted scope (app/web/public_api/
sprint_a14_s1/phase1_security). Full global suite still running.

Files changed (16)

picarones/__init__.py +2 -1
picarones/app/services/_benchmark_execution.py +0 -168
picarones/app/services/_benchmark_orchestration.py +0 -303
picarones/app/services/benchmark_runner.py +0 -335
tests/app/schemas/test_run_spec_b1_extended.py +11 -11
tests/app/test_s9_resolver_collision.py +3 -1
tests/app/test_sprint_d2b_partial_dir_resume.py +1 -1
tests/app/test_sprint_d2cdef_features.py +2 -2
tests/app/test_sprint_h2b_canonical_in_runner.py +1 -1
tests/evaluation/metrics/test_sprint_a14_s1_normalization_propagation.py +21 -5
tests/evaluation/test_public_api.py +21 -63
tests/integration/snapshots/migration_invariance.json +0 -470
tests/integration/test_migration_invariance.py +0 -289
tests/security/test_phase1_post_rewrite_wiring.py +1 -1
tests/web/test_s9_ocr_engine_naming_contract.py +3 -1
tests/web/test_sprint6_web_interface.py +7 -6

picarones/__init__.py CHANGED

@@ -11,7 +11,8 @@ ici pour permettre :
 Pour les implémentations (calcul de métriques, runner, adapters OCR…),
 utiliser les sous-packages explicites :
->>> from picarones.app.services.benchmark_runner import run_benchmark_via_service
+>>> from picarones import RunOrchestrator, RunSpec, load_run_spec_from_yaml
+>>> from picarones.app.services import prepare_preset_args, run_result_to_benchmark_result
 >>> from picarones.evaluation.metrics.text_metrics import compute_metrics
 >>> from picarones.adapters.ocr.tesseract import TesseractAdapter

picarones/app/services/_benchmark_execution.py DELETED

@@ -1,168 +0,0 @@
-"""Orchestration ``BenchmarkService`` — module extrait du god-module
-``benchmark_runner.py`` lors de la Phase 6 (round 4) de l'audit
-code-quality (2026-05).
-.. deprecated:: 2.0.0
-    Module helper interne du chemin legacy
-    ``run_benchmark_via_service``.  Phase B7 (mai 2026) — sera
-    supprimé en Phase B8 quand ``run_benchmark_via_service`` partira.
-    Le ``RunOrchestrator`` implémente sa propre orchestration via
-    ``execute()`` / ``execute_preset()`` sans dépendre de ce module.
-Surface publique (rééxportée par ``benchmark_runner.py`` pour
-préserver les imports internes existants) :
-- :func:`execute_via_benchmark_service` — lance
-  ``BenchmarkService.run`` sur les specs converties.  Wrappe la
-  factory d'inputs + GT + RunContext + cancel_event.
-Les fonctions ``_run_benchmark_unified`` et
-``_run_benchmark_with_partial`` (qui consomment le ``BenchmarkResult``
-final) restent dans ``benchmark_runner.py`` car elles dépendent
-d'un grand nombre d'helpers internes (NER attach, fingerprint,
-partial store, etc.).  Leur extraction nécessiterait d'extraire
-aussi tous ces helpers — chantier reporté.
-"""
-from __future__ import annotations
-import logging
-import threading
-from typing import TYPE_CHECKING, Any, Callable
-from picarones.domain.artifacts import ArtifactType
-from picarones.domain.corpus import CorpusSpec
-from picarones.domain.documents import DocumentRef
-from picarones.domain.errors import PicaronesError
-from picarones.domain.pipeline_spec import PipelineSpec
-if TYPE_CHECKING:
-    pass
-logger = logging.getLogger(__name__)
-def execute_via_benchmark_service(
-    *,
-    corpus_spec: CorpusSpec,
-    pipeline_specs: list[PipelineSpec],
-    adapter_resolver: Callable[[str], Any],
-    workspace_uri: str,
-    code_version: str,
-    timeout_seconds: float,
-    progress_callback: Callable[[str, int, str], None] | None = None,
-    cancel_event: Any | None = None,
-    pipeline_to_engine_name: dict[str, str] | None = None,
-) -> Any:
-    """Lance ``BenchmarkService.run`` sur les specs converties.
-    Vues passées en liste vide — les métriques sont calculées
-    côté converter via ``compute_metrics`` directement sur les
-    hypothèses extraites des artefacts.  Pattern simple, cohérent :
-    on calcule aussi les métriques au moment du benchmark
-    (pas via ``EvaluationView``).
-    """
-    from picarones.app.services.benchmark_service import BenchmarkService
-    from picarones.evaluation.projectors.registry import ProjectorRegistry
-    from picarones.evaluation.registry.registry import MetricRegistry
-    from picarones.evaluation.views.executor import (
-        DefaultEvaluationViewExecutor,
-    )
-    from picarones.pipeline.executor import PipelineExecutor
-    from picarones.pipeline.runner import CorpusRunner
-    from picarones.pipeline.types import RunContext
-    executor = PipelineExecutor(adapter_resolver=adapter_resolver)
-    runner = CorpusRunner(
-        executor,
-        max_in_flight=2,
-        timeout_seconds_per_doc=timeout_seconds,
-    )
-    # ViewExecutor minimal : registres vides.
-    view_executor = DefaultEvaluationViewExecutor.from_registries(
-        metric_registry=MetricRegistry(),
-        projector_registry=ProjectorRegistry(),
-        payload_loader=lambda art: None,
-    )
-    bench = BenchmarkService(
-        corpus_runner=runner,
-        view_executor=view_executor,
-        code_version=code_version,
-    )
-    # Factory pour les inputs initiaux (toujours IMAGE depuis l'URI).
-    def inputs_factory(doc: DocumentRef) -> dict[ArtifactType, Any]:
-        from picarones.domain.artifacts import Artifact
-        if doc.image_uri is None:
-            raise PicaronesError(
-                f"Document {doc.id!r} sans image_uri — la pipeline "
-                "par défaut consomme une IMAGE en entrée.",
-            )
-        return {
-            ArtifactType.IMAGE: Artifact(
-                id=f"{doc.id}:image",
-                document_id=doc.id,
-                type=ArtifactType.IMAGE,
-                uri=doc.image_uri,
-            ),
-        }
-    # GT factory : pas utilisée car ``views=[]``.
-    def gt_factory(doc: DocumentRef, art_type: ArtifactType) -> Any:
-        return None
-    counter_lock = threading.Lock()
-    counter_state = {"doc_idx": 0}
-    def context_factory(
-        doc: DocumentRef, pipeline_name: str,
-    ) -> RunContext:
-        if progress_callback is not None:
-            with counter_lock:
-                idx = counter_state["doc_idx"]
-                counter_state["doc_idx"] = idx + 1
-            engine_name = (
-                pipeline_to_engine_name.get(pipeline_name, pipeline_name)
-                if pipeline_to_engine_name is not None
-                else pipeline_name
-            )
-            try:
-                progress_callback(engine_name, idx, doc.id)
-            except Exception as exc:  # noqa: BLE001
-                # On ignore silencieusement les erreurs du callback ;
-                # un caller qui crashe ne doit pas faire tomber le
-                # benchmark.  Logge en debug pour diagnostic.
-                logger.debug(
-                    "[benchmark_execution] progress_callback raised: %s",
-                    exc,
-                )
-        return RunContext(
-            document_id=doc.id,
-            code_version=code_version,
-            pipeline_name=pipeline_name,
-            workspace_uri=workspace_uri,
-        )
-    # Propagation du cancel_event au CorpusRunner.
-    if cancel_event is not None:
-        original_run = runner.run
-        def _runner_run_with_cancel(*args: Any, **kwargs: Any) -> Any:
-            kwargs.setdefault("cancel_event", cancel_event)
-            return original_run(*args, **kwargs)
-        runner.run = _runner_run_with_cancel  # type: ignore[method-assign]
-    return bench.run(
-        corpus=corpus_spec,
-        pipelines=pipeline_specs,
-        views=[],
-        ground_truth_factory=gt_factory,
-        pipeline_inputs_factory=inputs_factory,
-        context_factory=context_factory,
-    )
-__all__ = ["execute_via_benchmark_service"]

picarones/app/services/_benchmark_orchestration.py DELETED

@@ -1,303 +0,0 @@
-"""Orchestration interne du benchmark : unified vs with_partial.
-Module extrait du god-module ``benchmark_runner.py`` lors de la
-Phase 6 (round 6) de l'audit code-quality (2026-05).
-.. deprecated:: 2.0.0
-    Module helper interne du chemin legacy
-    ``run_benchmark_via_service``.  Phase B7 (mai 2026) — sera
-    supprimé en Phase B8 quand ``run_benchmark_via_service`` partira.
-    Le ``RunOrchestrator`` implémente sa propre reprise sur
-    interruption pivot-par-pipeline via
-    ``picarones.app.services._orchestrator_partial`` qui sérialise
-    des ``PipelineResult`` typés (format JSONL distinct de
-    ``partial_store`` legacy qui sérialise des ``DocumentResult``).
-Surface publique (rééxportée par ``benchmark_runner.py`` avec
-préfixe ``_`` pour préserver l'API privée historique) :
-- :func:`run_benchmark_unified` — chemin rapide sans persistance
-  intermédiaire (un seul ``BenchmarkService.run`` multi-engine).
-- :func:`run_benchmark_with_partial` — chemin reprise per-engine
-  avec NDJSON intermédiaire.  Si le run crashe ou est annulé,
-  les engines déjà traités sont conservés ; la reprise charge
-  les partials et ne re-calcule que les docs manquants.
-La distinction entre les deux est gouvernée par l'argument
-``partial_dir`` de ``run_benchmark_via_service`` :
-- ``None`` → ``run_benchmark_unified`` (workflow demo, CI, smoke).
-- ``Path(...)`` → ``run_benchmark_with_partial`` (workflow long,
-  prod, benchmark institutionnel).
-"""
-from __future__ import annotations
-import logging
-import tempfile
-from pathlib import Path
-from typing import TYPE_CHECKING, Any, Callable
-from picarones.app.services._benchmark_adapter_resolver import (
-    build_adapter_resolver,
-    engine_to_pipeline_spec,
-)
-from picarones.app.services._benchmark_conversions import (
-    corpus_to_corpus_spec,
-)
-from picarones.app.services._benchmark_converter import (
-    run_result_to_benchmark_result,
-)
-from picarones.app.services._benchmark_execution import (
-    execute_via_benchmark_service,
-)
-from picarones.app.services._benchmark_helpers import (
-    _build_pipeline_info,
-    _engine_config_for_fingerprint,
-    _safe_engine_version,
-)
-if TYPE_CHECKING:
-    from picarones.evaluation.corpus import Corpus
-logger = logging.getLogger(__name__)
-def run_benchmark_unified(
-    *,
-    corpus: "Corpus",
-    engines: list[Any],
-    char_exclude: Any | None,
-    normalization_profile: Any | None,
-    profile: str,
-    code_version: str,
-    progress_callback: Callable[[str, int, str], None] | None,
-    timeout_seconds: float,
-    cancel_event: Any | None,
-) -> Any:
-    """Chemin rapide : un seul ``BenchmarkService.run`` multi-engine.
-    Pas de persistance intermédiaire — si le run crashe, tout est
-    perdu.  Utilisé quand ``partial_dir`` est ``None``.
-    """
-    with tempfile.TemporaryDirectory(prefix="picarones_bench_") as ws:
-        workspace = Path(ws)
-        gt_dir = workspace / "gt"
-        gt_dir.mkdir()
-        run_dir = workspace / "run"
-        run_dir.mkdir()
-        corpus_spec = corpus_to_corpus_spec(corpus, workspace_dir=gt_dir)
-        pipeline_specs = [engine_to_pipeline_spec(e) for e in engines]
-        adapter_resolver = build_adapter_resolver(engines)
-        pipeline_to_engine_name = {
-            spec.name: engine.name
-            for spec, engine in zip(pipeline_specs, engines)
-        }
-        run_result = execute_via_benchmark_service(
-            corpus_spec=corpus_spec,
-            pipeline_specs=pipeline_specs,
-            adapter_resolver=adapter_resolver,
-            workspace_uri=str(run_dir),
-            code_version=code_version,
-            timeout_seconds=timeout_seconds,
-            progress_callback=progress_callback,
-            cancel_event=cancel_event,
-            pipeline_to_engine_name=pipeline_to_engine_name,
-        )
-        return run_result_to_benchmark_result(
-            run_result,
-            corpus=corpus,
-            engines=engines,
-            char_exclude=char_exclude,
-            normalization_profile=normalization_profile,
-            profile=profile,
-        )
-def run_benchmark_with_partial(
-    *,
-    corpus: "Corpus",
-    engines: list[Any],
-    partial_dir: Path,
-    char_exclude: Any | None,
-    normalization_profile: Any | None,
-    profile: str,
-    code_version: str,
-    progress_callback: Callable[[str, int, str], None] | None,
-    timeout_seconds: float,
-    cancel_event: Any | None,
-) -> Any:
-    """Chemin reprise : per-engine avec NDJSON intermédiaire.
-    Pour chaque engine, charge le partial existant, filtre les docs
-    déjà traités, lance ``BenchmarkService`` sur les restants,
-    persiste chaque nouveau ``DocumentResult`` au fil de l'eau.
-    """
-    from picarones.app.services.partial_store import (
-        _delete_partial,
-        _load_partial,
-        _save_partial_line,
-        partial_path_for_engine,
-    )
-    from picarones.evaluation.benchmark_result import (
-        BenchmarkResult,
-        EngineReport,
-    )
-    from picarones.evaluation.corpus import Corpus as LegacyCorpus
-    from picarones.evaluation.metric_hooks import run_corpus_aggregators
-    # Force l'auto-enregistrement des hooks builtin (décorateurs).
-    import picarones.evaluation.metrics.builtin_hooks  # noqa: F401
-    from picarones.evaluation.metric_result import aggregate_metrics
-    partial_dir.mkdir(parents=True, exist_ok=True)
-    # Index des docs par ID — permet de ré-ordonner les
-    # DocumentResult rechargés selon l'ordre original du corpus.
-    doc_order = {doc.doc_id: idx for idx, doc in enumerate(corpus.documents)}
-    engine_reports: list[Any] = []
-    for engine in engines:
-        # Vérifier la cancellation entre engines.
-        if cancel_event is not None and getattr(
-            cancel_event, "is_set", lambda: False,
-        )():
-            logger.info(
-                "[partial_dir] benchmark annulé avant l'engine '%s' "
-                "— partials conservés pour reprise.", engine.name,
-            )
-            break
-        # Phase 2.3 — fingerprint inclut config moteur + profil
-        # normalisation + char_exclude + corpus files (mtime/size) +
-        # version code.  Deux runs avec configs différentes →
-        # fichiers partiels distincts → pas de réutilisation
-        # silencieuse de résultats incompatibles.
-        partial_path = partial_path_for_engine(
-            corpus=corpus,
-            engine=engine,
-            partial_dir=partial_dir,
-            engine_config=_engine_config_for_fingerprint(engine),
-            normalization_profile=normalization_profile,
-            char_exclude=char_exclude,
-            profile=profile,
-            code_version=code_version,
-        )
-        loaded_results = _load_partial(partial_path)
-        loaded_doc_ids = {dr.doc_id for dr in loaded_results}
-        if loaded_results:
-            logger.info(
-                "[partial_dir] reprise '%s' : %d/%d docs déjà traités.",
-                engine.name, len(loaded_results), len(corpus.documents),
-            )
-        remaining_docs = [
-            d for d in corpus.documents if d.doc_id not in loaded_doc_ids
-        ]
-        new_doc_results: list[Any] = []
-        if remaining_docs:
-            # Sub-corpus avec uniquement les docs restants.  On
-            # conserve le ``name`` original pour que les chemins de
-            # partial restent cohérents si un re-run arrive.
-            sub_corpus = LegacyCorpus(
-                name=corpus.name,
-                documents=remaining_docs,
-                source_path=corpus.source_path,
-            )
-            with tempfile.TemporaryDirectory(
-                prefix="picarones_bench_partial_",
-            ) as ws:
-                workspace = Path(ws)
-                gt_dir = workspace / "gt"
-                gt_dir.mkdir()
-                run_dir = workspace / "run"
-                run_dir.mkdir()
-                sub_corpus_spec = corpus_to_corpus_spec(
-                    sub_corpus, workspace_dir=gt_dir,
-                )
-                pipeline_spec = engine_to_pipeline_spec(engine)
-                adapter_resolver = build_adapter_resolver([engine])
-                pipeline_to_engine_name = {pipeline_spec.name: engine.name}
-                run_result = execute_via_benchmark_service(
-                    corpus_spec=sub_corpus_spec,
-                    pipeline_specs=[pipeline_spec],
-                    adapter_resolver=adapter_resolver,
-                    workspace_uri=str(run_dir),
-                    code_version=code_version,
-                    timeout_seconds=timeout_seconds,
-                    progress_callback=progress_callback,
-                    cancel_event=cancel_event,
-                    pipeline_to_engine_name=pipeline_to_engine_name,
-                )
-                # Convertir ce sous-RunResult en EngineReport avec
-                # uniquement les docs restants — puis extraire les
-                # ``DocumentResult`` pour append au partial.
-                sub_report = run_result_to_benchmark_result(
-                    run_result,
-                    corpus=sub_corpus,
-                    engines=[engine],
-                    char_exclude=char_exclude,
-                    normalization_profile=normalization_profile,
-                    profile=profile,
-                )
-                new_doc_results = list(
-                    sub_report.engine_reports[0].document_results,
-                )
-                # Append au partial : un cancel mid-engine préserve
-                # ce qui a déjà été calculé.
-                for dr in new_doc_results:
-                    _save_partial_line(partial_path, dr)
-        # Fusion : loaded + new, ré-ordonné selon le corpus original.
-        all_doc_results = list(loaded_results) + new_doc_results
-        all_doc_results.sort(key=lambda dr: doc_order.get(dr.doc_id, 0))
-        aggregated = aggregate_metrics([d.metrics for d in all_doc_results])
-        pipeline_info = _build_pipeline_info(engine)
-        agg_values = run_corpus_aggregators(profile, all_doc_results)
-        engine_reports.append(
-            EngineReport(
-                engine_name=engine.name,
-                engine_version=_safe_engine_version(engine),
-                engine_config=getattr(engine, "config", {}) or {},
-                document_results=all_doc_results,
-                aggregated_metrics=aggregated,
-                pipeline_info=pipeline_info,
-                **agg_values,
-            ),
-        )
-        # Engine traité avec succès → cleanup du partial.  Si on
-        # arrive ici sans exception, tous les docs sont dans
-        # ``all_doc_results``.
-        _delete_partial(partial_path)
-    # Phase 3.2 audit code-quality — consume_fallback_log idempotent.
-    from picarones.adapters.corpus._fallback_log import consume_fallback_log
-    fallbacks = consume_fallback_log()
-    metadata: dict[str, Any] = {}
-    if fallbacks:
-        metadata["importer_fallbacks"] = fallbacks
-    return BenchmarkResult(
-        corpus_name=corpus.name,
-        corpus_source=str(corpus.source_path) if corpus.source_path else None,
-        document_count=len(corpus.documents),
-        engine_reports=engine_reports,
-        metadata=metadata,
-    )
-__all__ = ["run_benchmark_unified", "run_benchmark_with_partial"]

picarones/app/services/benchmark_runner.py DELETED

@@ -1,335 +0,0 @@
-"""Entry point CLI/web — façade ``run_benchmark_via_service``.
-.. deprecated:: 2.0.0
-    Module deprecated en Phase B7 du chantier de migration Option B
-    (mai 2026).  Utiliser :class:`picarones.RunOrchestrator` qui
-    consomme un ``RunSpec`` Pydantic.
-    - La fonction ``run_benchmark_via_service`` émet une
-      ``DeprecationWarning`` à chaque appel.
-    - Aucun call site actif ne subsiste dans ``picarones/`` —
-      CLI/Web utilisent désormais directement le pattern 3 étapes
-      ``prepare_preset_args → execute_preset →
-      run_result_to_benchmark_result`` (cf.
-      :mod:`picarones.app.services.python_helpers`).
-    - Retrait du module prévu **Phase B3-final commit 6** (suivant).
-    Pour migrer votre code, voir le guide
-    ``docs/migration/option_b_user_guide.md``.
-Présente l'API mono-call ``run_benchmark_via_service(corpus,
-engines, ...)`` consommée par ``picarones.interfaces.cli`` et
-``picarones.interfaces.web``.  S'appuie en interne sur le service
-canonique (``BenchmarkService``, ``PipelineExecutor``,
-``CorpusRunner``).
-Pourquoi cette façade
----------------------
-``BenchmarkService`` consomme ``CorpusSpec`` (références
-filesystem, Pydantic, immutable) et ``PipelineSpec`` (déclaratif).
-Les interfaces utilisateur (CLI, web upload) raisonnent en
-``Corpus`` riche en behavior + liste de moteurs OCR/LLM.  Ce
-module fait la conversion entre les deux modèles, expose une API
-mono-call ergonomique et restitue un ``BenchmarkResult``.
-"""
-from __future__ import annotations
-import logging
-from pathlib import Path
-from typing import TYPE_CHECKING, Any, Callable
-if TYPE_CHECKING:
-    from picarones.evaluation.corpus import Corpus
-logger = logging.getLogger(__name__)
-# Le ``OCRLLMPipelineConfig`` (couche 4) est consommé exclusivement
-# par duck typing (``is_pipeline``, ``ocr_adapter``, ``llm_adapter``,
-# ``mode``, ``prompt_template``) pour respecter l'inward-only :
-# ``app/`` ne doit pas importer ``pipeline/llm_pipeline_config``
-# directement.
-# ──────────────────────────────────────────────────────────────────────
-# Mapping Document → DocumentRef
-# ──────────────────────────────────────────────────────────────────────
-# Phase 6 (round 3) audit code-quality (2026-05) — extraction des
-# conversions Document/Corpus + helpers GT vers
-# ``_benchmark_conversions.py``.  Réexport pour préserver l'API
-# publique (CLI/web consomment ces noms).
-from picarones.app.services._benchmark_conversions import (  # noqa: F401
-    _DEFAULT_SUFFIXES,
-    _has_text_gt,
-    _payload_to_text,
-    _resolve_gt_uri,
-    _safe_doc_id,
-    corpus_to_corpus_spec,
-    document_to_document_ref,
-)
-# ──────────────────────────────────────────────────────────────────────
-# Mapping RunResult → BenchmarkResult
-# ──────────────────────────────────────────────────────────────────────
-# Phase 6 (round 6) audit code-quality (2026-05) — converter
-# ``run_result_to_benchmark_result`` extrait vers le module dédié.
-from picarones.app.services._benchmark_converter import (  # noqa: F401
-    run_result_to_benchmark_result,
-)
-# ──────────────────────────────────────────────────────────────────────
-# Helpers privés du converter RunResult → BenchmarkResult
-# ──────────────────────────────────────────────────────────────────────
-# Phase 6 (round 5) audit code-quality (2026-05) — extraction des
-# helpers internes de conversion ``RunResult → BenchmarkResult``
-# vers ``_benchmark_helpers.py`` (~260 LOC).  Réexport pour les
-# appels internes et les tests qui patchent ces symboles.
-from picarones.app.services._benchmark_helpers import (  # noqa: F401
-    _OCRResultLike,
-    _build_pipeline_info,
-    _build_pipeline_metadata,
-    _engine_config_for_fingerprint,
-    _extract_first_error,
-    _extract_text_outputs,
-    _extract_token_confidences,
-    _resolve_corpus_lang,
-    _safe_engine_version,
-)
-# Phase 6 (round 2) — extraction du bloc engine→spec + resolver.
-from picarones.app.services._benchmark_adapter_resolver import (  # noqa: F401
-    _canonical_adapter_to_spec,
-    _is_canonical_adapter,
-    _llm_adapter_name,
-    _ocr_llm_pipeline_to_spec,
-    _safe_pipeline_name,
-    build_adapter_resolver,
-    engine_to_pipeline_spec,
-)
-def run_benchmark_via_service(
-    corpus: "Corpus",
-    engines: list[Any],
-    *,
-    char_exclude: Any | None = None,
-    normalization_profile: Any | None = None,
-    output_json: Any | None = None,
-    code_version: str | None = None,
-    show_progress: bool = True,  # noqa: ARG001
-    progress_callback: Callable[[str, int, str], None] | None = None,
-    timeout_seconds: float = 60.0,
-    cancel_event: Any | None = None,
-    partial_dir: str | Path | None = None,
-    entity_extractor: Callable[[str], list[dict]] | None = None,
-    profile: str = "standard",
-) -> Any:
-    """Façade ``run_benchmark`` →
-    ``BenchmarkService`` rewrite.
-    Présente la signature historique de
-    ``picarones.app.services.benchmark_runner.run_benchmark`` mais s'appuie
-    en interne sur le rewrite (``CorpusSpec``, ``PipelineSpec``,
-    ``PipelineExecutor``, ``BenchmarkService``).  Pivot du Sprint D
-    du plan v2.0.
-    Périmètre actuel (D.1.d, MVP)
-    -----------------------------
-    Cette première version fonctionne pour le cas le plus simple :
-    - Un ou plusieurs ``BaseOCREngine`` (OCR seul ou pipeline OCR+LLM
-      via ``OCRLLMPipeline``).
-    - Un ``Corpus`` avec image_path + ground_truth (TEXT) par doc.
-    - Métriques CER/WER calculées via ``compute_metrics`` sur les
-      hypothèses extraites des artefacts produits.
-    - Conversion en ``BenchmarkResult`` compatible avec les
-      consommateurs historiques (rapport HTML, narrative engine).
-    Périmètre reporté (D.2)
-    -----------------------
-    Les paramètres suivants sont **acceptés mais ignorés** dans
-    cette MVP — le rewrite gère ces aspects nativement :
-    - ``show_progress`` (tqdm).
-    Pour régler le parallélisme corpus-wide, passer par
-    ``CorpusRunner.max_in_flight`` directement (couche pipeline).
-    Profil de mesures (D.2.f)
-    -------------------------
-    ``profile`` est validé au démarrage via
-    ``picarones.evaluation.metric_hooks.validate_profile``.  Un
-    profil inconnu lève ``PicaronesError``.  La valeur n'a pas
-    encore d'effet sur les hooks document-level (ce serait l'objet
-    d'un sprint ultérieur, hors du périmètre v2.0).
-    NER attach (D.2.e)
-    ------------------
-    Si ``entity_extractor`` est fourni, après le calcul des
-    ``DocumentResult``, le service appelle l'extracteur sur chaque
-    hypothèse OCR pour les documents dont la GT possède un niveau
-    ``ENTITIES``, puis attache les métriques NER (``ner_metrics``
-    par document, ``aggregated_ner`` au niveau engine).
-    Reprise sur interruption (D.2.b)
-    --------------------------------
-    Si ``partial_dir`` est fourni, le bench est exécuté en mode
-    **per-engine resumable** :
-    - Pour chaque engine, on cherche un fichier
-      ``{partial_dir}/picarones_{corpus}_{engine}.partial.jsonl``
-      d'une exécution précédente interrompue.
-    - Les ``DocumentResult`` qui y sont déjà persistés sont
-      réutilisés tels quels (pas de recalcul).
-    - Seuls les documents restants sont soumis au ``BenchmarkService``.
-    - Chaque nouveau ``DocumentResult`` est ajouté en append au
-      partial avant de passer au suivant.
-    - À la fin d'un engine traité avec succès, son partial est
-      supprimé.
-    Quand ``partial_dir`` est ``None`` (défaut), une seule passe
-    multi-engine est lancée (chemin rapide, pas de persistance
-    intermédiaire).
-    Parameters
-    ----------
-    corpus:
-        Corpus.
-    engines:
-        Liste d'engines/pipelines à benchmarker.
-    char_exclude:
-        Filtre passé à ``compute_metrics``.
-    normalization_profile:
-        Profil de normalisation passé à ``compute_metrics``.
-    output_json:
-        Si fourni, le ``BenchmarkResult`` est sérialisé en JSON
-        à ce chemin (sérialisation BenchmarkResult).
-    code_version:
-        Version du code injectée dans le ``RunContext`` /
-        ``RunManifest``.  Défaut : ``picarones.__version__``.
-    timeout_seconds:
-        Timeout par document propagé au ``CorpusRunner``.
-    Returns
-    -------
-    BenchmarkResult
-        Format compatible avec les consommateurs historiques.
-    Raises
-    ------
-    PicaronesError
-        Si les engines ne déclarent pas tous un ``name`` unique
-        (cf. ``build_adapter_resolver``).
-    """
-    # Phase B3 migration Option B (mai 2026) — ``run_benchmark_via_service``
-    # est désormais déprécié.  Utiliser ``picarones.RunOrchestrator``
-    # qui consomme un ``RunSpec`` Pydantic et expose nativement les 4
-    # fichiers JSONL.  La fonction sera retirée en Phase B8 (post-
-    # deprecation release) ; cette warning aide à identifier les call
-    # sites à migrer.
-    #
-    # ``stacklevel=2`` pour que la warning pointe sur le caller (et non
-    # cette ligne).  ``stacklevel=3`` ferait pointer sur le caller du
-    # caller (utile si on emballe encore dans un helper privé).
-    import warnings as _warnings
-    _warnings.warn(
-        "run_benchmark_via_service est déprécié depuis Phase B3 de la "
-        "migration Option B.  Utiliser picarones.RunOrchestrator qui "
-        "consomme un RunSpec Pydantic.  Retrait prévu en Phase B8.",
-        DeprecationWarning,
-        stacklevel=2,
-    )
-    # D.2.f: validate ``profile`` early. An unknown name raises
-    # ``PicaronesError`` before the benchmark starts, rather
-    # than degrading silently later on.
-    from picarones.evaluation.metric_hooks import validate_profile
-    validate_profile(profile)
-    if code_version is None:
-        # The architecture scanner rejects ``from picarones import __version__``
-        # because it classifies ``picarones`` (without a subpackage) as a
-        # non-whitelisted external lib for the ``app/`` layer.  We
-        # work around it via importlib (dynamic import).
-        import importlib
-        try:
-            code_version = importlib.import_module("picarones").__version__
-        except (ImportError, AttributeError):
-            code_version = "unknown"
-    if partial_dir is None:
-        benchmark_result = _run_benchmark_unified(
-            corpus=corpus,
-            engines=engines,
-            char_exclude=char_exclude,
-            normalization_profile=normalization_profile,
-            profile=profile,
-            code_version=code_version,
-            progress_callback=progress_callback,
-            timeout_seconds=timeout_seconds,
-            cancel_event=cancel_event,
-        )
-    else:
-        benchmark_result = _run_benchmark_with_partial(
-            corpus=corpus,
-            engines=engines,
-            partial_dir=Path(partial_dir),
-            char_exclude=char_exclude,
-            normalization_profile=normalization_profile,
-            profile=profile,
-            code_version=code_version,
-            progress_callback=progress_callback,
-            timeout_seconds=timeout_seconds,
-            cancel_event=cancel_event,
-        )
-    # D.2.e: NER attach post-process.  Idempotent: recomputed on
-    # every run, even in resume mode (the ner_metrics are not
-    # persisted in the partial NDJSON; this mirrors the earlier flow,
-    # which computed NER after the doc loop).
-    if entity_extractor is not None:
-        _attach_ner_metrics_to_benchmark(
-            benchmark_result, corpus, entity_extractor,
-        )
-    # Optional JSON serialization
-    if output_json is not None:
-        _persist_benchmark_result_json(benchmark_result, Path(output_json))
-    return benchmark_result
-# Phase 6 code-quality audit (2026-05): NER aggregation extracted
-# to ``_benchmark_ner.py``.  The names ``_attach_ner_metrics_to_benchmark``
-# and ``_aggregate_ner_metrics`` remain here as aliases so as not to
-# break internal calls (the runner's other functions refer to
-# them) or the tests that patch these symbols via monkeypatch.
-from picarones.app.services._benchmark_ner import (  # noqa: F401
-    aggregate_ner_metrics as _aggregate_ner_metrics,
-    attach_ner_metrics_to_benchmark as _attach_ner_metrics_to_benchmark,
-)
-# Phase 6 (round 6): orchestration extracted.
-from picarones.app.services._benchmark_orchestration import (  # noqa: F401
-    run_benchmark_unified as _run_benchmark_unified,
-    run_benchmark_with_partial as _run_benchmark_with_partial,
-)
-# Phase 6 (round 4) code-quality audit (2026-05): extraction of
-# ``_execute_via_benchmark_service`` to ``_benchmark_execution.py``.
-# Alias kept for the internal calls from
-# ``_run_benchmark_unified`` and ``_run_benchmark_with_partial``.
-from picarones.app.services._benchmark_execution import (  # noqa: F401
-    execute_via_benchmark_service as _execute_via_benchmark_service,
-)
-from picarones.app.services._benchmark_persistence import (
-    persist_benchmark_result_json as _persist_benchmark_result_json,
-)
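
The ``stacklevel`` comment in the removed function describes general behaviour of the standard ``warnings`` module. A minimal sketch of that pattern follows, using a hypothetical ``legacy_entry_point`` (a stand-in, not the removed Picarones function) and capturing the warning the same way the deleted contract test did:

```python
import warnings


def legacy_entry_point() -> str:
    """Hypothetical deprecated function (stand-in for a removed entry point)."""
    warnings.warn(
        "legacy_entry_point is deprecated; use the new orchestrator instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's frame
    )
    return "ok"


def caller() -> str:
    # With stacklevel=2, the warning is reported against this call site.
    return legacy_entry_point()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    result = caller()

# Keep only the deprecation warnings that were recorded.
deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
```

With ``stacklevel=1`` (the default) the warning would instead be attributed to the ``warnings.warn`` line inside the deprecated function, which is less useful when hunting for call sites to migrate.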

tests/app/schemas/test_run_spec_b1_extended.py CHANGED

@@ -82,19 +82,22 @@ class TestDefaults:
         assert spec.output_json is None
         assert spec.timeout_seconds_per_doc == 60.0
-    def test_defaults_match_run_benchmark_via_service_defaults(
         self, tmp_path: Path,
     ) -> None:
         """The default values of ``RunSpec`` match those of
-        ``run_benchmark_via_service`` to preserve functional
-        equivalence during the migration.
         """
-        from picarones.app.services.benchmark_runner import (
-            run_benchmark_via_service,
-        )
         import inspect
-        sig = inspect.signature(run_benchmark_via_service)
         defaults = {
             name: param.default
             for name, param in sig.parameters.items()
@@ -102,14 +105,11 @@ class TestDefaults:
         }
         spec = load_run_spec_from_yaml(_minimal_yaml(output_dir=tmp_path / "out"))
-        # The names differ slightly (RunSpec.timeout_seconds_per_doc
-        # vs run_benchmark_via_service.timeout_seconds), but the
-        # semantics are identical: per-document timeout.
         assert spec.char_exclude == defaults["char_exclude"]
         assert spec.normalization_profile == defaults["normalization_profile"]
         assert spec.partial_dir == defaults["partial_dir"]
         assert spec.profile == defaults["profile"]
-        assert spec.timeout_seconds_per_doc == defaults["timeout_seconds"]
 # ──────────────────────────────────────────────────────────────────────

         assert spec.output_json is None
         assert spec.timeout_seconds_per_doc == 60.0
+    def test_defaults_match_prepare_preset_args_defaults(
         self, tmp_path: Path,
     ) -> None:
         """The default values of ``RunSpec`` match those of
+        ``prepare_preset_args``, for consistency with the public
+        Python API (callers that instantiate adapters).
+        Phase B3-final (May 2026): this test replaces the former
+        ``test_defaults_match_run_benchmark_via_service_defaults``,
+        which inspected the removed legacy function.
         """
         import inspect
+        from picarones.app.services import prepare_preset_args
+        sig = inspect.signature(prepare_preset_args)
         defaults = {
             name: param.default
             for name, param in sig.parameters.items()
         }
         spec = load_run_spec_from_yaml(_minimal_yaml(output_dir=tmp_path / "out"))
         assert spec.char_exclude == defaults["char_exclude"]
         assert spec.normalization_profile == defaults["normalization_profile"]
         assert spec.partial_dir == defaults["partial_dir"]
         assert spec.profile == defaults["profile"]
+        assert spec.timeout_seconds_per_doc == defaults["timeout_seconds_per_doc"]
 # ──────────────────────────────────────────────────────────────────────
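
The defaults-parity check in this test relies on ``inspect.signature``. The sketch below illustrates the idea with hypothetical stand-ins (``prepare_args`` and ``Spec`` are invented for the example and do not reflect the real ``prepare_preset_args`` signature or ``RunSpec`` fields). The filter on ``inspect.Parameter.empty`` is an assumption, since the corresponding line is hidden by the hunk boundary in the diff.

```python
import inspect
from dataclasses import dataclass


def prepare_args(corpus, engines, *, profile="standard", timeout_seconds_per_doc=60.0):
    """Hypothetical callable; not the real prepare_preset_args signature."""
    return corpus, engines, profile, timeout_seconds_per_doc


@dataclass
class Spec:
    """Hypothetical spec object whose field names mirror the keyword arguments."""
    profile: str = "standard"
    timeout_seconds_per_doc: float = 60.0


sig = inspect.signature(prepare_args)

# Collect only the parameters that actually declare a default value.
defaults = {
    name: param.default
    for name, param in sig.parameters.items()
    if param.default is not inspect.Parameter.empty
}

# Compare each default against the attribute of the same name on the spec.
spec = Spec()
mismatches = {
    name: (getattr(spec, name), value)
    for name, value in defaults.items()
    if getattr(spec, name) != value
}
```

Keeping parameter names identical on both sides (as the new test does with ``timeout_seconds_per_doc``) is what makes this kind of generic comparison possible without a name-mapping table.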

tests/app/test_s9_resolver_collision.py CHANGED

@@ -29,7 +29,9 @@ from __future__ import annotations
 import pytest
-from picarones.app.services.benchmark_runner import build_adapter_resolver
 from picarones.domain.errors import PicaronesError

 import pytest
+from picarones.app.services._benchmark_adapter_resolver import (
+    build_adapter_resolver,
+)
 from picarones.domain.errors import PicaronesError

tests/app/test_sprint_d2b_partial_dir_resume.py CHANGED

@@ -29,7 +29,7 @@ from picarones.app.services.partial_store import (
     _save_partial_line,
     partial_path_for_engine,
 )
-from picarones.app.services.benchmark_runner import (
     _engine_config_for_fingerprint,
 )
 from tests._migration_helpers import run_via_orchestrator

     _save_partial_line,
     partial_path_for_engine,
 )
+from picarones.app.services._benchmark_helpers import (
     _engine_config_for_fingerprint,
 )
 from tests._migration_helpers import run_via_orchestrator

tests/app/test_sprint_d2cdef_features.py CHANGED

@@ -22,8 +22,8 @@ import pytest
 from picarones.adapters.llm.base import BaseLLMAdapter
 from picarones.adapters.ocr.base import BaseOCRAdapter
-from picarones.app.services.benchmark_runner import (
-    _aggregate_ner_metrics,
 )
 from picarones.domain.artifacts import Artifact, ArtifactType
 from picarones.evaluation.corpus import (

 from picarones.adapters.llm.base import BaseLLMAdapter
 from picarones.adapters.ocr.base import BaseOCRAdapter
+from picarones.app.services._benchmark_ner import (
+    aggregate_ner_metrics as _aggregate_ner_metrics,
 )
 from picarones.domain.artifacts import Artifact, ArtifactType
 from picarones.evaluation.corpus import (

tests/app/test_sprint_h2b_canonical_in_runner.py CHANGED

@@ -20,7 +20,7 @@ from picarones.adapters.ocr import (
     PrecomputedTextAdapter,
     ocr_adapter_from_name,
 )
-from picarones.app.services.benchmark_runner import (
     build_adapter_resolver,
     engine_to_pipeline_spec,
 )

     PrecomputedTextAdapter,
     ocr_adapter_from_name,
 )
+from picarones.app.services._benchmark_adapter_resolver import (
     build_adapter_resolver,
     engine_to_pipeline_spec,
 )

tests/evaluation/metrics/test_sprint_a14_s1_normalization_propagation.py CHANGED

@@ -19,7 +19,8 @@ from __future__ import annotations
 import inspect
-from picarones.app.services.benchmark_runner import run_benchmark_via_service
 from picarones.evaluation.metrics.normalization import (
     NORMALIZATION_PROFILES,
     get_builtin_profile,
@@ -27,11 +28,26 @@ from picarones.evaluation.metrics.normalization import (
 class TestRunBenchmarkSignature:
-    def test_run_benchmark_accepts_normalization_profile(self) -> None:
-        """The public signature must expose ``normalization_profile``."""
-        sig = inspect.signature(run_benchmark_via_service)
         assert "normalization_profile" in sig.parameters
-        # And with a safe default value.
         assert sig.parameters["normalization_profile"].default is None

 import inspect
+from picarones.app.schemas.run_spec import RunSpec
+from picarones.app.services import prepare_preset_args
 from picarones.evaluation.metrics.normalization import (
     NORMALIZATION_PROFILES,
     get_builtin_profile,
 class TestRunBenchmarkSignature:
+    """Phase B3-final (May 2026): propagation of
+    ``normalization_profile`` is now carried by ``RunSpec``
+    (Pydantic field) and by ``prepare_preset_args`` (kwarg).
+    ``run_benchmark_via_service`` has been removed."""
+    def test_run_spec_exposes_normalization_profile(self) -> None:
+        """``RunSpec.normalization_profile`` is a documented
+        Pydantic field (cf. Phase B1)."""
+        assert "normalization_profile" in RunSpec.model_fields
+        field = RunSpec.model_fields["normalization_profile"]
+        # Optional field, default None.
+        assert field.default is None
+    def test_prepare_preset_args_accepts_normalization_profile(
+        self,
+    ) -> None:
+        """``prepare_preset_args`` propagates the profile to the RunSpec."""
+        sig = inspect.signature(prepare_preset_args)
         assert "normalization_profile" in sig.parameters
+        # Optional by default.
         assert sig.parameters["normalization_profile"].default is None

tests/evaluation/test_public_api.py CHANGED

@@ -197,44 +197,16 @@ class TestMetricsApi:
 # ──────────────────────────────────────────────────────────────────────────
-# 5. picarones.app.services.benchmark_runner — run_benchmark_via_service
 # ──────────────────────────────────────────────────────────────────────────
-class TestRunnerApi:
-    def test_run_benchmark_via_service_exists(self):
-        """Sprint D of the v2.0 plan: the rewrite adapter replaces
-        ``measurements.runner.run_benchmark`` (legacy removed in D.6)."""
-        _assert_function(
-            "picarones.app.services.benchmark_runner",
-            "run_benchmark_via_service",
-        )
-    def test_run_benchmark_via_service_keyword_args(self):
-        """Key parameters (corpus, engines, profile…) must remain
-        accessible in the rewrite adapter. Adding a required argument is a
-        breaking change."""
-        from picarones.app.services.benchmark_runner import (
-            run_benchmark_via_service,
-        )
-        sig = inspect.signature(run_benchmark_via_service)
-        params = sig.parameters
-        # Contractual arguments: their presence is guaranteed in order
-        # to stay compatible with historical callers.
-        # Phase 4.1 code-quality audit (2026-05): removal of
-        # ``max_workers`` (parameter absorbed with no effect via
-        # ``noqa: ARG001``; the rewrite goes through
-        # ``CorpusRunner.max_in_flight``).  Minor break
-        # documented in CHANGELOG v2.0.
-        for name in [
-            "corpus", "engines", "output_json", "show_progress",
-            "char_exclude", "timeout_seconds",
-            "profile",
-        ]:
-            assert name in params, (
-                f"run_benchmark_via_service : argument '{name}' a disparu "
-                f"(signature : {sig})"
-            )
 # ──────────────────────────────────────────────────────────────────────────
@@ -289,33 +261,19 @@ class TestRunOrchestratorApi:
             f"Phase B3 — '{name}' devrait être dans picarones.__all__"
         )
-    def test_run_benchmark_via_service_still_callable_with_warning(self):
-        """Backward compat: ``run_benchmark_via_service`` is still
-        callable but emits a ``DeprecationWarning``."""
-        import warnings
-        from picarones.evaluation.corpus import Corpus
-        from picarones.app.services.benchmark_runner import (
-            run_benchmark_via_service,
-        )
-        corp = Corpus(name="warn_test", documents=[])
-        with warnings.catch_warnings(record=True) as caught:
-            warnings.simplefilter("always")
-            try:
-                run_benchmark_via_service(corp, [])
-            except Exception:
-                # The benchmark fails on an empty corpus, but that does not
-                # matter: we are only testing that the warning is emitted.
-                pass
-        deprecation_warnings = [
-            w for w in caught if issubclass(w.category, DeprecationWarning)
-        ]
-        assert len(deprecation_warnings) >= 1, (
-            "run_benchmark_via_service devrait émettre une "
-            "DeprecationWarning (Phase B3)"
         )
-        assert "RunOrchestrator" in str(deprecation_warnings[0].message)
 # ──────────────────────────────────────────────────────────────────────────

 # ──────────────────────────────────────────────────────────────────────────
+# 5. (formerly) ``picarones.app.services.benchmark_runner``,
+#    removed in Phase B3-final (May 2026, Option B migration).
 # ──────────────────────────────────────────────────────────────────────────
+# The ``benchmark_runner.py`` module carried the legacy entry point
+# ``run_benchmark_via_service``, which has been replaced by
+# ``picarones.RunOrchestrator`` (consuming a Pydantic ``RunSpec``
+# or pre-built domain objects via ``execute_preset()``).
+# The legacy contract test was removed along with the module.  See
+# ``TestRunOrchestratorApi`` below for the contract of
+# the current canonical entry point.
 # ──────────────────────────────────────────────────────────────────────────
             f"Phase B3 — '{name}' devrait être dans picarones.__all__"
         )
+    def test_prepare_preset_args_exposed_at_root(self):
+        """Phase B3-final: ``prepare_preset_args`` is the public
+        API for Python callers that instantiate their adapters
+        in memory (as opposed to YAML loading via ``RunSpec``).
+        """
+        from picarones.app.services import (
+            PresetArgs,
+            prepare_preset_args,
+            run_result_to_benchmark_result,
         )
+        assert callable(prepare_preset_args)
+        assert callable(run_result_to_benchmark_result)
+        assert inspect.isclass(PresetArgs)
 # ──────────────────────────────────────────────────────────────────────────

tests/integration/snapshots/migration_invariance.json DELETED

@@ -1,470 +0,0 @@
-{
-  "corpus": {
-    "document_count": 2,
-    "name": "invariance_corpus",
-    "source": null
-  },
-  "engine_reports": [
-    {
-      "aggregated_char_scores": {
-        "diacritic": {
-          "correctly_recognized": 0,
-          "score": 1.0,
-          "total_in_gt": 0
-        },
-        "ligature": {
-          "correctly_recognized": 0,
-          "per_ligature": {},
-          "score": 1.0,
-          "total_in_gt": 0
-        }
-      },
-      "aggregated_confusion": {
-        "matrix": {},
-        "total_deletions": 0,
-        "total_insertions": 0,
-        "total_substitutions": 1
-      },
-      "aggregated_hallucination": {
-        "anchor_score_mean": 0.5,
-        "anchor_score_min": 0.0,
-        "document_count": 2,
-        "hallucinating_doc_count": 1,
-        "hallucinating_doc_rate": 0.5,
-        "length_ratio_mean": 1.0,
-        "net_insertion_rate_mean": 0.25
-      },
-      "aggregated_line_metrics": {
-        "catastrophic_rate": {
-          "0.3": 0.0,
-          "0.5": 0.0,
-          "1.0": 0.0
-        },
-        "document_count": 2,
-        "gini_mean": 0.0,
-        "gini_stdev": 0.0,
-        "heatmap": [
-          0.0,
-          0.0,
-          0.0,
-          0.0,
-          0.0,
-          0.0,
-          0.0,
-          0.0,
-          0.0,
-          0.045455
-        ],
-        "mean_cer_mean": 0.045455,
-        "percentiles": {
-          "p50": 0.045455,
-          "p75": 0.045455,
-          "p90": 0.045455,
-          "p95": 0.045455,
-          "p99": 0.045455
-        }
-      },
-      "aggregated_metrics": {
-        "cer": {
-          "max": 0.090909,
-          "mean": 0.045455,
-          "median": 0.045455,
-          "min": 0.0,
-          "stdev": 0.064282
-        },
-        "cer_caseless": {
-          "max": 0.090909,
-          "mean": 0.045455,
-          "median": 0.045455,
-          "min": 0.0,
-          "stdev": 0.064282
-        },
-        "cer_diplomatic": {
-          "max": 0.090909,
-          "mean": 0.045455,
-          "median": 0.045455,
-          "min": 0.0,
-          "profile": "medieval_french",
-          "stdev": 0.064282
-        },
-        "cer_nfc": {
-          "max": 0.090909,
-          "mean": 0.045455,
-          "median": 0.045455,
-          "min": 0.0,
-          "stdev": 0.064282
-        },
-        "document_count": 2,
-        "failed_count": 0,
-        "mer": {
-          "max": 0.5,
-          "mean": 0.25,
-          "median": 0.25,
-          "min": 0.0,
-          "stdev": 0.353553
-        },
-        "wer": {
-          "max": 0.5,
-          "mean": 0.25,
-          "median": 0.25,
-          "min": 0.0,
-          "stdev": 0.353553
-        },
-        "wer_normalized": {
-          "max": 0.5,
-          "mean": 0.25,
-          "median": 0.25,
-          "min": 0.0,
-          "stdev": 0.353553
-        },
-        "wil": {
-          "max": 0.75,
-          "mean": 0.375,
-          "median": 0.375,
-          "min": 0.0,
-          "stdev": 0.53033
-        }
-      },
-      "aggregated_searchability": {
-        "max_distance": 2,
-        "missed_tokens_sample": [],
-        "n_docs": 2,
-        "n_gt_tokens": 5,
-        "n_searchable": 5,
-        "recall": 1.0
-      },
-      "aggregated_structure": {
-        "document_count": 2,
-        "mean_line_accuracy": 1.0,
-        "mean_line_fragmentation_rate": 0.0,
-        "mean_line_fusion_rate": 0.0,
-        "mean_paragraph_conservation": 1.0,
-        "mean_reading_order_score": 0.75
-      },
-      "aggregated_taxonomy": {
-        "class_distribution": {
-          "abbreviation_error": 0.0,
-          "case_error": 0.0,
-          "diacritic_error": 0.0,
-          "hapax": 1.0,
-          "lacuna": 0.0,
-          "ligature_error": 0.0,
-          "oov_character": 0.0,
-          "segmentation_error": 0.0,
-          "visual_confusion": 0.0
-        },
-        "counts": {
-          "abbreviation_error": 0,
-          "case_error": 0,
-          "diacritic_error": 0,
-          "hapax": 1,
-          "lacuna": 0,
-          "ligature_error": 0,
-          "oov_character": 0,
-          "segmentation_error": 0,
-          "visual_confusion": 0
-        },
-        "total_errors": 1
-      },
-      "document_results": [
-        {
-          "char_scores": {
-            "diacritic": {
-              "correctly_recognized": 0,
-              "per_diacritic": {},
-              "score": 1.0,
-              "total_in_gt": 0
-            },
-            "ligature": {
-              "correctly_recognized": 0,
-              "per_ligature": {},
-              "score": 1.0,
-              "total_in_gt": 0
-            }
-          },
-          "confusion_matrix": {
-            "matrix": {},
-            "total_deletions": 0,
-            "total_insertions": 0,
-            "total_substitutions": 0
-          },
-          "doc_id": "doc1",
-          "duration_seconds": 0.0,
-          "engine_error": null,
-          "ground_truth": "Bonjour le monde",
-          "hallucination_metrics": {
-            "anchor_score": 1.0,
-            "anchor_threshold_used": 0.5,
-            "gt_word_count": 3,
-            "hallucinated_blocks": [],
-            "hyp_word_count": 3,
-            "is_hallucinating": false,
-            "length_ratio": 1.0,
-            "length_ratio_threshold_used": 1.2,
-            "net_inserted_words": 0,
-            "net_insertion_rate": 0.0,
-            "ngram_size_used": 3
-          },
-          "hypothesis": "Bonjour le monde",
-          "image_path": "FIXTURES/doc1.png",
-          "line_metrics": {
-            "catastrophic_rate": {
-              "0.3": 0.0,
-              "0.5": 0.0,
-              "1.0": 0.0
-            },
-            "cer_per_line": [
-              0.0
-            ],
-            "gini": 0.0,
-            "heatmap": [
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0
-            ],
-            "line_count": 1,
-            "mean_cer": 0.0,
-            "percentiles": {
-              "p50": 0.0,
-              "p75": 0.0,
-              "p90": 0.0,
-              "p95": 0.0,
-              "p99": 0.0
-            }
-          },
-          "metrics": {
-            "cer": 0.0,
-            "cer_caseless": 0.0,
-            "cer_diplomatic": 0.0,
-            "cer_nfc": 0.0,
-            "diplomatic_profile_name": "medieval_french",
-            "error": null,
-            "hypothesis_length": 16,
-            "mer": 0.0,
-            "reference_length": 16,
-            "wer": 0.0,
-            "wer_normalized": 0.0,
-            "wil": 0.0
-          },
-          "searchability_metrics": {
-            "max_distance": 2,
-            "missed_tokens": [],
-            "n_gt_tokens": 3,
-            "n_searchable": 3,
-            "recall": 1.0
-          },
-          "structure": {
-            "gt_line_count": 1,
-            "line_accuracy": 1.0,
-            "line_fragmentation_count": 0,
-            "line_fragmentation_rate": 0.0,
-            "line_fusion_count": 0,
-            "line_fusion_rate": 0.0,
-            "ocr_line_count": 1,
-            "paragraph_conservation_score": 1.0,
-            "reading_order_score": 1.0
-          },
-          "taxonomy": {
-            "class_distribution": {},
-            "counts": {
-              "abbreviation_error": 0,
-              "case_error": 0,
-              "diacritic_error": 0,
-              "hapax": 0,
-              "lacuna": 0,
-              "ligature_error": 0,
-              "oov_character": 0,
-              "segmentation_error": 0,
-              "visual_confusion": 0
-            },
-            "examples": {
-              "abbreviation_error": [],
-              "case_error": [],
-              "diacritic_error": [],
-              "hapax": [],
-              "lacuna": [],
-              "ligature_error": [],
-              "oov_character": [],
-              "segmentation_error": [],
-              "visual_confusion": []
-            },
-            "total_errors": 0
-          }
-        },
-        {
-          "char_scores": {
-            "diacritic": {
-              "correctly_recognized": 0,
-              "per_diacritic": {},
-              "score": 1.0,
-              "total_in_gt": 0
-            },
-            "ligature": {
-              "correctly_recognized": 0,
-              "per_ligature": {},
-              "score": 1.0,
-              "total_in_gt": 0
-            }
-          },
-          "confusion_matrix": {
-            "matrix": {
-              "l": {
-                "i": 1
-              }
-            },
-            "total_deletions": 0,
-            "total_insertions": 0,
-            "total_substitutions": 1
-          },
-          "doc_id": "doc2",
-          "duration_seconds": 0.0,
-          "engine_error": null,
-          "ground_truth": "Hello world",
-          "hallucination_metrics": {
-            "anchor_score": 0.0,
-            "anchor_threshold_used": 0.5,
-            "gt_word_count": 2,
-            "hallucinated_blocks": [],
-            "hyp_word_count": 2,
-            "is_hallucinating": true,
-            "length_ratio": 1.0,
-            "length_ratio_threshold_used": 1.2,
-            "net_inserted_words": 1,
-            "net_insertion_rate": 0.5,
-            "ngram_size_used": 3
-          },
-          "hypothesis": "Helio world",
-          "image_path": "FIXTURES/doc2.png",
-          "line_metrics": {
-            "catastrophic_rate": {
-              "0.3": 0.0,
-              "0.5": 0.0,
-              "1.0": 0.0
-            },
-            "cer_per_line": [
-              0.090909
-            ],
-            "gini": 0.0,
-            "heatmap": [
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.0,
-              0.090909
-            ],
-            "line_count": 1,
-            "mean_cer": 0.090909,
-            "percentiles": {
-              "p50": 0.090909,
-              "p75": 0.090909,
-              "p90": 0.090909,
-              "p95": 0.090909,
-              "p99": 0.090909
-            }
-          },
-          "metrics": {
-            "cer": 0.090909,
-            "cer_caseless": 0.090909,
-            "cer_diplomatic": 0.090909,
-            "cer_nfc": 0.090909,
-            "diplomatic_profile_name": "medieval_french",
-            "error": null,
-            "hypothesis_length": 11,
-            "mer": 0.5,
-            "reference_length": 11,
-            "wer": 0.5,
-            "wer_normalized": 0.5,
-            "wil": 0.75
-          },
-          "searchability_metrics": {
-            "max_distance": 2,
-            "missed_tokens": [],
-            "n_gt_tokens": 2,
-            "n_searchable": 2,
-            "recall": 1.0
-          },
-          "structure": {
-            "gt_line_count": 1,
-            "line_accuracy": 1.0,
-            "line_fragmentation_count": 0,
-            "line_fragmentation_rate": 0.0,
-            "line_fusion_count": 0,
-            "line_fusion_rate": 0.0,
-            "ocr_line_count": 1,
-            "paragraph_conservation_score": 1.0,
-            "reading_order_score": 0.5
-          },
-          "taxonomy": {
-            "class_distribution": {
-              "abbreviation_error": 0.0,
-              "case_error": 0.0,
-              "diacritic_error": 0.0,
-              "hapax": 1.0,
-              "lacuna": 0.0,
-              "ligature_error": 0.0,
-              "oov_character": 0.0,
-              "segmentation_error": 0.0,
-              "visual_confusion": 0.0
-            },
-            "counts": {
-              "abbreviation_error": 0,
-              "case_error": 0,
-              "diacritic_error": 0,
-              "hapax": 1,
-              "lacuna": 0,
-              "ligature_error": 0,
-              "oov_character": 0,
-              "segmentation_error": 0,
-              "visual_confusion": 0
-            },
-            "examples": {
-              "abbreviation_error": [],
-              "case_error": [],
-              "diacritic_error": [],
-              "hapax": [
-                {
-                  "gt": "Hello",
-                  "ocr": "Helio"
-                }
-              ],
-              "lacuna": [],
-              "ligature_error": [],
-              "oov_character": [],
-              "segmentation_error": [],
-              "visual_confusion": []
-            },
-            "total_errors": 1
-          }
-        }
-      ],
-      "engine_config": {},
-      "engine_name": "precomputed_invariance",
-      "engine_version": "PINNED"
-    }
-  ],
-  "metadata": {},
-  "picarones_version": "PINNED",
-  "ranking": [
-    {
-      "documents": 2,
-      "engine": "precomputed_invariance",
-      "failed": 0,
-      "mean_cer": 0.045455,
-      "mean_wer": 0.25,
-      "median_cer": 0.045455
-    }
-  ],
-  "run_date": "PINNED"
-}
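
The aggregate figures in the deleted snapshot can be cross-checked by hand. The sketch below assumes that CER is character-level edit distance divided by reference length, that WER is the word-level equivalent, and that ``stdev`` is the sample standard deviation. These assumptions reproduce the snapshot's numbers but are not a statement of how Picarones computes them internally.

```python
import statistics


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute: cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


# (ground truth, hypothesis) pairs taken from the snapshot's two documents.
pairs = [("Bonjour le monde", "Bonjour le monde"), ("Hello world", "Helio world")]

# Character error rate: edit distance over characters / reference length.
cers = [levenshtein(gt, hyp) / len(gt) for gt, hyp in pairs]
# Word error rate: edit distance over whitespace tokens / reference word count.
wers = [levenshtein(gt.split(), hyp.split()) / len(gt.split()) for gt, hyp in pairs]

cer_mean = round(statistics.mean(cers), 6)
cer_stdev = round(statistics.stdev(cers), 6)  # sample standard deviation
wer_mean = round(statistics.mean(wers), 6)
wer_stdev = round(statistics.stdev(wers), 6)
```

The single substitution in ``Hello`` → ``Helio`` gives a CER of 1/11 for the second document, which is also the only entry (``l`` → ``i``) in the snapshot's confusion matrix.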

tests/integration/test_migration_invariance.py DELETED

@@ -1,289 +0,0 @@
-"""Run-to-run invariance test for the Option B migration.
-Phase B0 of the migration from ``run_benchmark_via_service`` to
-``RunOrchestrator.execute(RunSpec)``.
-Role
------
-This test runs a **deterministic** benchmark (mini corpus of 2 docs +
-``PrecomputedTextAdapter``) through the current facade
-``run_benchmark_via_service`` and compares its normalized
-``BenchmarkResult`` to a JSON snapshot stored in
-``tests/integration/snapshots/migration_invariance.json``.
-Why
-----
-During the migration to ``RunOrchestrator``, 8 features are ported
-(``progress_callback``, ``cancel_event``, ``partial_dir``,
-``entity_extractor``, ``char_exclude``, ``normalization_profile``,
-``profile``, ``output_json``).  Each port must preserve
-**exactly** the numerical behavior of the existing path.  This test
-acts as a safety net: if an internal refactoring changes the
-result (CER, aggregation, engine order, JSON structure), the
-snapshot diverges and CI fails.
-The test uses **no** external dependency (no Tesseract, no
-network).  The ``PrecomputedTextAdapter`` reads a text file written
-to disk, so its output is 100% deterministic.
-Updating the snapshot
-----------------------
-If an **intentional** change alters the result (e.g. a new
-field in ``BenchmarkResult``), regenerate the snapshot:
-    PICARONES_UPDATE_SNAPSHOT=1 python -m pytest \
-        tests/integration/test_migration_invariance.py
-Then inspect the git diff of the snapshot before committing.
-Normalization
---------------
-Volatile fields are neutralized before comparison:
-- ``picarones_version`` → ``"PINNED"``
-- ``run_date`` → ``"PINNED"``
-- ``corpus.source`` → ``"FIXTURES/corpus"``
-- ``image_path`` → ``"FIXTURES/docN.png"``
-- ``duration_seconds`` → ``0.0``
-- Any other field containing the ``tmp_path`` → replaced by
-  ``"FIXTURES/..."``
-This guarantees that the snapshot stays stable across OSes and runs.
-"""
-from __future__ import annotations
-import json
-import os
-import re
-from pathlib import Path
-from typing import Any
-import pytest
-from picarones.adapters.ocr.precomputed import PrecomputedTextAdapter
-from picarones.app.services.benchmark_runner import run_benchmark_via_service
-from picarones.evaluation.corpus import Corpus, Document
-SNAPSHOT_PATH = (
-    Path(__file__).parent / "snapshots" / "migration_invariance.json"
-)
-# ──────────────────────────────────────────────────────────────────────
-# Deterministic fixtures
-# ──────────────────────────────────────────────────────────────────────
-def _make_invariance_corpus(tmp_path: Path) -> Corpus:
-    """Mini corpus of 2 documents with GT + precomputed text.
-    The precomputed text differs slightly from the GT in order to produce
-    non-trivial CER/WER metrics (and therefore more discriminating ones
-    in the snapshot).
-    """
-    documents: list[Document] = []
-    # Doc 1: GT = "Bonjour le monde", OCR = "Bonjour le monde" → CER 0.0
-    doc1_img = tmp_path / "doc1.png"
-    doc1_img.write_bytes(b"\x89PNG\r\n\x1a\n")  # PNG header minimal
-    doc1_ocr = tmp_path / "doc1.invariance.txt"
-    doc1_ocr.write_text("Bonjour le monde", encoding="utf-8")
-    documents.append(Document(
-        image_path=doc1_img,
-        ground_truth="Bonjour le monde",
-        doc_id="doc1",
-    ))
-    # Doc 2 : GT = "Hello world", OCR = "Helio world" → CER non nul
-    doc2_img = tmp_path / "doc2.png"
-    doc2_img.write_bytes(b"\x89PNG\r\n\x1a\n")
-    doc2_ocr = tmp_path / "doc2.invariance.txt"
-    doc2_ocr.write_text("Helio world", encoding="utf-8")
-    documents.append(Document(
-        image_path=doc2_img,
-        ground_truth="Hello world",
-        doc_id="doc2",
-    ))
-    return Corpus(name="invariance_corpus", documents=documents)
-def _make_invariance_engine() -> PrecomputedTextAdapter:
-    """``PrecomputedTextAdapter`` qui lit ``<stem>.invariance.txt``."""
-    return PrecomputedTextAdapter(source_label="invariance")
-# ──────────────────────────────────────────────────────────────────────
-# Normalisation du snapshot
-# ──────────────────────────────────────────────────────────────────────
-def _normalize_for_snapshot(data: Any, tmp_path: Path) -> Any:
-    """Normalise récursivement les champs volatils du ``BenchmarkResult``.
-    Remplace ``tmp_path`` par ``"FIXTURES"`` dans toutes les valeurs
-    string.  Neutralise les champs explicitement volatils
-    (``duration_seconds``, ``run_date``, ``picarones_version``,
-    ``engine_version``, ``code_version``).
-    """
-    tmp_str = str(tmp_path)
-    # Pattern pour matcher tmp_path/quelque-chose (pour les chemins
-    # absolus qui n'apparaissent pas en clé mais en valeur string).
-    tmp_re = re.compile(re.escape(tmp_str))
-    def _normalize(value: Any, *, key: str | None = None) -> Any:
-        if isinstance(value, dict):
-            return {k: _normalize(v, key=k) for k, v in value.items()}
-        if isinstance(value, list):
-            return [_normalize(item) for item in value]
-        if isinstance(value, str):
-            return tmp_re.sub("FIXTURES", value)
-        if isinstance(value, float):
-            # Neutralise les durées (volatiles d'un run à l'autre).
-            if key == "duration_seconds":
-                return 0.0
-            # Garde les autres floats avec une précision raisonnable
-            # pour absorber le bruit de calcul minimum.
-            return round(value, 6)
-        return value
-    normalized = _normalize(data)
-    # Champs volatils au niveau racine — neutralisés en post-traitement
-    # parce que leur valeur ne contient pas ``tmp_path``.
-    if isinstance(normalized, dict):
-        for volatile_key in ("picarones_version", "run_date"):
-            if volatile_key in normalized:
-                normalized[volatile_key] = "PINNED"
-        # engine_version peut apparaître dans chaque engine_report.
-        for report in normalized.get("engine_reports", []):
-            if "engine_version" in report:
-                report["engine_version"] = "PINNED"
-            # Les pipeline_info portent parfois des chemins ou metadata.
-            pipeline_info = report.get("pipeline_info")
-            if isinstance(pipeline_info, dict):
-                if "code_version" in pipeline_info:
-                    pipeline_info["code_version"] = "PINNED"
-    return normalized
-# ──────────────────────────────────────────────────────────────────────
-# Comparaison snapshot
-# ──────────────────────────────────────────────────────────────────────
-def _load_snapshot() -> dict | None:
-    if not SNAPSHOT_PATH.exists():
-        return None
-    return json.loads(SNAPSHOT_PATH.read_text(encoding="utf-8"))
-def _write_snapshot(data: dict) -> None:
-    SNAPSHOT_PATH.parent.mkdir(parents=True, exist_ok=True)
-    SNAPSHOT_PATH.write_text(
-        json.dumps(data, ensure_ascii=False, indent=2, sort_keys=True),
-        encoding="utf-8",
-    )
-def _should_update_snapshot() -> bool:
-    return os.environ.get("PICARONES_UPDATE_SNAPSHOT") == "1"
-# ──────────────────────────────────────────────────────────────────────
-# Test principal
-# ──────────────────────────────────────────────────────────────────────
-def test_run_benchmark_via_service_invariance(tmp_path: Path) -> None:
-    """Snapshot d'invariance du comportement actuel.
-    Ce test est le filet de sécurité de la migration Option B.  Il doit
-    rester vert à chaque étape du chantier (B1, B2, B3, B4, ...) tant
-    que ``run_benchmark_via_service`` est la façade publique.
-    Quand la migration sera terminée et ``run_benchmark_via_service``
-    supprimée (Phase B8), ce test sera retiré ou migré vers
-    ``RunOrchestrator.execute()``.
-    """
-    corpus = _make_invariance_corpus(tmp_path)
-    engine = _make_invariance_engine()
-    benchmark_result = run_benchmark_via_service(
-        corpus=corpus,
-        engines=[engine],
-        code_version="invariance-test-1.0.0",
-    )
-    actual_normalized = _normalize_for_snapshot(
-        benchmark_result.as_dict(), tmp_path,
-    )
-    snapshot = _load_snapshot()
-    if snapshot is None or _should_update_snapshot():
-        _write_snapshot(actual_normalized)
-        if snapshot is None:
-            pytest.skip(
-                f"Snapshot créé pour la première fois à "
-                f"{SNAPSHOT_PATH.relative_to(Path.cwd())}. "
-                f"Vérifier son contenu puis ré-exécuter le test."
-            )
-        else:
-            # Mode update explicite : on a écrit, le test passe sans
-            # vérification additionnelle.  L'opérateur est responsable
-            # d'inspecter le diff git.
-            return
-    assert actual_normalized == snapshot, (
-        "BenchmarkResult diverge du snapshot d'invariance.\n"
-        f"Snapshot : {SNAPSHOT_PATH}\n"
-        "Si la divergence est intentionnelle, régénérer avec :\n"
-        "    PICARONES_UPDATE_SNAPSHOT=1 python -m pytest "
-        f"{Path(__file__).relative_to(Path.cwd())}\n"
-        "et inspecter le diff git du snapshot avant commit."
-    )
-# ──────────────────────────────────────────────────────────────────────
-# Test annexe — vérifie que la normalisation elle-même est stable
-# ──────────────────────────────────────────────────────────────────────
-def test_normalization_is_idempotent(tmp_path: Path) -> None:
-    """La normalisation d'un dict déjà normalisé ne le change pas.
-    Garantit qu'on peut ré-appliquer la normalisation sans dériver.
-    Test pédagogique de la mécanique du snapshot.
-    """
-    sample = {
-        "picarones_version": "2.0.0",
-        "run_date": "2026-05-14T12:00:00Z",
-        "corpus": {"source": str(tmp_path / "corpus.zip")},
-        "engine_reports": [
-            {
-                "engine_version": "1.2.3",
-                "document_results": [
-                    {
-                        "image_path": str(tmp_path / "doc1.png"),
-                        "duration_seconds": 0.123456,
-                        "metrics": {"cer": 0.05},
-                    },
-                ],
-            },
-        ],
-    }
-    once = _normalize_for_snapshot(sample, tmp_path)
-    twice = _normalize_for_snapshot(once, tmp_path)
-    assert once == twice
-    assert once["picarones_version"] == "PINNED"
-    assert once["run_date"] == "PINNED"
-    assert once["engine_reports"][0]["engine_version"] == "PINNED"
-    assert once["engine_reports"][0]["document_results"][0]["duration_seconds"] == 0.0
-    assert "FIXTURES" in once["corpus"]["source"]
-    assert "FIXTURES" in once["engine_reports"][0]["document_results"][0]["image_path"]
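
Reviewer note: the normalization step of the deleted module (volatile fields pinned, the `tmp_path` prefix rewritten to `FIXTURES`, floats rounded) can be illustrated with a minimal standalone sketch. This is not the removed code. It pins the volatile keys at any depth rather than only at the root and inside `engine_reports`, and the names `normalize` and `VOLATILE_KEYS` are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any

# Hypothetical set of keys whose values change from run to run.
VOLATILE_KEYS = {"picarones_version", "run_date", "engine_version", "code_version"}


def normalize(value: Any, tmp_root: str, key: str | None = None) -> Any:
    """Recursively replace volatile values with stable placeholders."""
    if key in VOLATILE_KEYS:
        return "PINNED"
    if isinstance(value, dict):
        return {k: normalize(v, tmp_root, k) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize(item, tmp_root) for item in value]
    if isinstance(value, str):
        # Absolute temp paths become a stable, OS-independent prefix.
        return value.replace(tmp_root, "FIXTURES")
    if isinstance(value, float):
        # Durations are zeroed; other floats are rounded to absorb noise.
        return 0.0 if key == "duration_seconds" else round(value, 6)
    return value


sample = {
    "run_date": "2026-05-14T12:00:00Z",
    "corpus": {"source": "/tmp/run42/corpus.zip"},
    "reports": [{"duration_seconds": 0.123456, "cer": 0.05}],
}
once = normalize(sample, "/tmp/run42")
twice = normalize(once, "/tmp/run42")
```

Applying the function a second time leaves the result unchanged, which is the property the deleted `test_normalization_is_idempotent` checked.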

tests/security/test_phase1_post_rewrite_wiring.py CHANGED

@@ -813,7 +813,7 @@ class TestPartialStoreFingerprint:
     def test_engine_config_for_fingerprint_distinguishes_psm(self) -> None:
         """``_engine_config_for_fingerprint`` capture les attributs
         opérationnels d'un adapter OCR (lang, psm, model, …)."""
-        from picarones.app.services.benchmark_runner import (
+        from picarones.app.services._benchmark_helpers import (
             _engine_config_for_fingerprint,
         )

tests/web/test_s9_ocr_engine_naming_contract.py CHANGED

@@ -26,7 +26,9 @@ from __future__ import annotations
 import pytest
-from picarones.app.services.benchmark_runner import build_adapter_resolver
+from picarones.app.services._benchmark_adapter_resolver import (
+    build_adapter_resolver,
+)
 from picarones.interfaces.web.benchmark_utils import (
     _OCR_KWARGS_BUILDERS,
     _engine_from_competitor,

tests/web/test_sprint6_web_interface.py CHANGED

@@ -734,17 +734,18 @@ class TestCLIServeCommand:
 class TestRunnerProgressCallback:
     def test_callback_signature_accepted(self):
-        """run_benchmark accepte un paramètre progress_callback."""
+        """Phase B3-final — ``RunOrchestrator.execute_preset`` accepte
+        un kwarg ``progress_callback``."""
         import inspect
-        from picarones.app.services.benchmark_runner import run_benchmark_via_service
-        sig = inspect.signature(run_benchmark_via_service)
+        from picarones.app.services import RunOrchestrator
+        sig = inspect.signature(RunOrchestrator.execute_preset)
         assert "progress_callback" in sig.parameters
     def test_callback_is_optional(self):
-        """progress_callback est optionnel (valeur par défaut None)."""
+        """``progress_callback`` est optionnel (valeur par défaut None)."""
         import inspect
-        from picarones.app.services.benchmark_runner import run_benchmark_via_service
-        sig = inspect.signature(run_benchmark_via_service)
+        from picarones.app.services import RunOrchestrator
+        sig = inspect.signature(RunOrchestrator.execute_preset)
         param = sig.parameters["progress_callback"]
         assert param.default is None
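
Reviewer note: both tests in this hunk rely on `inspect.signature` to verify that a callable exposes an optional `progress_callback` parameter. The sketch below shows the same check in isolation. `FakeOrchestrator` and `has_optional_kwarg` are hypothetical stand-ins, since the real signature of `RunOrchestrator.execute_preset` is not visible in this diff.

```python
import inspect
from typing import Callable, Optional


def has_optional_kwarg(func: Callable, name: str) -> bool:
    """Return True if ``func`` declares ``name`` with a default of None."""
    param = inspect.signature(func).parameters.get(name)
    return param is not None and param.default is None


class FakeOrchestrator:
    """Hypothetical stand-in, used only to exercise the check."""

    def execute_preset(
        self,
        preset: str,
        *,
        progress_callback: Optional[Callable[[int], None]] = None,
    ) -> str:
        # Report a single step when a callback is supplied.
        if progress_callback is not None:
            progress_callback(1)
        return preset


ok = has_optional_kwarg(FakeOrchestrator.execute_preset, "progress_callback")
```

A required parameter such as `preset` has the default `inspect.Parameter.empty` rather than `None`, so the helper reports it as not optional.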