Spaces:
Sleeping
feat(sprint-D.2.b): reprise sur interruption (partial_dir)
Browse filesSprint D.2.b du plan v2.0 — réintègre la feature « reprise sur
interruption » (legacy ``measurements/runner/partial.py``, retirée
en D.6.b) dans ``run_benchmark_via_service``, qui jusqu'ici
acceptait ``partial_dir`` mais l'ignorait silencieusement.
Pourquoi
--------
Pour un benchmark Gallica typique (100+ documents, Mistral OCR à
~5 s/doc), un crash mid-run faisait perdre tout le travail. Le
legacy ``run_benchmark`` persistait chaque ``DocumentResult`` dans
un NDJSON par (corpus, engine) ; au relancement, il sautait les
docs déjà traités. D.2.b restaure ce comportement.
Architecture
------------
Per-engine, **dans la couche adapter legacy** plutôt que dans le
rewrite :
- Le rewrite (``CorpusRunner``, ``BenchmarkService``) reste pur :
pas de partial save/load injecté. Cohérent avec le principe
« legacy concerns dans la couche legacy ».
- ``run_benchmark_via_service`` aiguille selon ``partial_dir`` :
- ``None`` → chemin rapide unifié (un appel
``BenchmarkService.run`` multi-engine, comportement existant).
- chemin set → boucle per-engine. Pour chaque engine : charge
le partial existant, filtre les docs déjà traités, lance
``BenchmarkService.run`` sur les restants (sub-corpus +
pipeline_specs=[engine]), persiste chaque nouveau
``DocumentResult`` au fil de l'eau, supprime le partial à la
fin.
Format du fichier partiel : ``picarones_{corpus}_{engine}.partial.jsonl``
(NDJSON ; une ligne ``DocumentResult.as_dict()`` par document).
Match exact le format historique pour qu'un partial écrit par
l'ancien runner reste lisible (rétro-compatibilité avec
d'éventuels fichiers laissés sur disque).
Modifications
-------------
- ``picarones/app/services/_legacy_partial_store.py`` (nouveau,
~210 LOC) : helpers ``_partial_path``, ``_load_partial``,
``_save_partial_line``, ``_delete_partial``, ``_sanitize_filename``.
Lock module-level pour la sérialisation des appends. Tolère
les lignes corrompues (warning + skip), les fichiers absents
(return empty), les écritures qui échouent (warning, run
continue).
- ``picarones/app/services/_legacy_runner_adapter.py`` :
- ``run_benchmark_via_service`` : ``partial_dir`` n'est plus
``# noqa: ARG001`` ; aiguille vers
``_run_benchmark_unified`` (chemin existant) ou
``_run_benchmark_with_partial`` (nouveau).
- ``_run_benchmark_unified`` : chemin rapide multi-engine
(extrait de l'ancien corps).
- ``_run_benchmark_with_partial`` : boucle per-engine avec
persistance NDJSON + cancellation entre engines (préserve
les partials pour reprise).
- Ajout du ``logger`` module-level.
Tests
-----
- ``tests/app/test_sprint_d2b_partial_dir_resume.py`` (nouveau,
25 tests) :
- ``TestSanitizeFilename`` : 3 tests (sanitization + truncation).
- ``TestPartialPath`` : 3 tests (path build, sanitization,
fallback tempdir).
- ``TestSaveAndLoad`` : 8 tests (round-trip, append, empty file,
missing file, corrupted line, parent dir creation, concurrent
writes thread-safe).
- ``TestDelete`` : 2 tests (existing + missing file).
- ``TestResumeViaPartialDir`` : 7 tests bout-en-bout
(fresh run cleanup, resume skip, all-done short-circuit,
per-engine isolation, all partials cleaned, no partial_dir
keeps unified path, partial preserved on cancel).
- ``TestNDJSONFormat`` : 2 tests (one JSON per line, unicode
preservation).
- 43 tests existants de
``tests/app/test_sprint_d_legacy_runner_adapter.py`` toujours
verts — le chemin unifié (``partial_dir=None``) est inchangé.
Lint/budgets
------------
- ``ruff check`` : All checks passed.
- ``test_file_budgets`` : budget de
``_legacy_runner_adapter.py`` 1200 → 1450 (actuel 1269, marge
~15 %). Commentaire mis à jour pour pointer vers H.4
(suppression du module avec ``interfaces/{cli,web}/_legacy/``).
- ``gen_readme_tables.py`` : compteur tests mis à jour (4660).
Tests : 4636 passed, 9 skipped, 24 deselected.
Limites
-------
- Le partial NDJSON ne survit qu'à des crashes du process. Un
effacement manuel de ``partial_dir`` désactive la reprise.
- Pas de protection inter-process : deux runs concurrents avec
même ``partial_dir`` + même (corpus, engine) entrelaceraient
leurs écritures. Le legacy avait le même comportement — out
of scope pour D.2.b.
https://claude.ai/code/session_01NxyVKqg2SowXLZdM4H1ZDE
|
@@ -123,7 +123,7 @@ picarones/
|
|
| 123 |
|
| 124 |
## État des tests et bugs historiques
|
| 125 |
|
| 126 |
-
`pytest tests/` → **
|
| 127 |
(post-S59). Les deselected sont les markers `live` (5 tests d'intégration
|
| 128 |
contre vraie API/binaire) + `network` (3 tests qui hit le réseau réel),
|
| 129 |
opt-in en local via `pytest -m live` ou `pytest -m network`. Le
|
|
@@ -252,7 +252,7 @@ Résumé express :
|
|
| 252 |
|
| 253 |
1. `git branch --show-current` → `claude/repo-analysis-cukvm`.
|
| 254 |
2. `git status` → working tree clean.
|
| 255 |
-
3. `pytest tests/ -q --no-header --tb=line` →
|
| 256 |
4. `git log -1 --format=%B` → décrit la prochaine sub-phase.
|
| 257 |
|
| 258 |
**Règles d'architecture critiques** (apprises à la dure) :
|
|
@@ -340,7 +340,7 @@ détecte, arbitre, rend.
|
|
| 340 |
## Contexte développement
|
| 341 |
|
| 342 |
- **Environnement** : GitHub Codespaces, Python 3.11+
|
| 343 |
-
- **Tests** : `pytest tests/ -q` →
|
| 344 |
deselected, 0 failed (au moment de la pause de session).
|
| 345 |
- **Plan d'évolution actif** : [`docs/roadmap/evolution-2026.md`](docs/roadmap/evolution-2026.md).
|
| 346 |
- **Plan retrait du legacy (maître)** : [`docs/migration/legacy-retirement-plan.md`](docs/migration/legacy-retirement-plan.md).
|
|
|
|
| 123 |
|
| 124 |
## État des tests et bugs historiques
|
| 125 |
|
| 126 |
+
`pytest tests/` → **4660 passed, 12 skipped, 8 deselected, 0 failed**
|
| 127 |
(post-S59). Les deselected sont les markers `live` (5 tests d'intégration
|
| 128 |
contre vraie API/binaire) + `network` (3 tests qui hit le réseau réel),
|
| 129 |
opt-in en local via `pytest -m live` ou `pytest -m network`. Le
|
|
|
|
| 252 |
|
| 253 |
1. `git branch --show-current` → `claude/repo-analysis-cukvm`.
|
| 254 |
2. `git status` → working tree clean.
|
| 255 |
+
3. `pytest tests/ -q --no-header --tb=line` → 4660 passed.
|
| 256 |
4. `git log -1 --format=%B` → décrit la prochaine sub-phase.
|
| 257 |
|
| 258 |
**Règles d'architecture critiques** (apprises à la dure) :
|
|
|
|
| 340 |
## Contexte développement
|
| 341 |
|
| 342 |
- **Environnement** : GitHub Codespaces, Python 3.11+
|
| 343 |
+
- **Tests** : `pytest tests/ -q` → 4660 passed, 12 skipped, 24
|
| 344 |
deselected, 0 failed (au moment de la pause de session).
|
| 345 |
- **Plan d'évolution actif** : [`docs/roadmap/evolution-2026.md`](docs/roadmap/evolution-2026.md).
|
| 346 |
- **Plan retrait du legacy (maître)** : [`docs/migration/legacy-retirement-plan.md`](docs/migration/legacy-retirement-plan.md).
|
|
@@ -395,7 +395,7 @@ ruff check picarones/ tests/
|
|
| 395 |
python -m mypy picarones/core/
|
| 396 |
```
|
| 397 |
|
| 398 |
-
**Test suite**: ~
|
| 399 |
floor at 85% (currently ~87%). The `network` marker excludes tests
|
| 400 |
requiring live HTTP. A handful of tests depend on optional engines
|
| 401 |
(`pero-ocr`, `pytesseract`) and are skipped/fail gracefully when
|
|
|
|
| 395 |
python -m mypy picarones/core/
|
| 396 |
```
|
| 397 |
|
| 398 |
+
**Test suite**: ~4660 tests, ~3 min on a modern laptop. Coverage
|
| 399 |
floor at 85% (currently ~87%). The `network` marker excludes tests
|
| 400 |
requiring live HTTP. A handful of tests depend on optional engines
|
| 401 |
(`pero-ocr`, `pytesseract`) and are skipped/fail gracefully when
|
|
@@ -0,0 +1,230 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Sprint D.2.b — reprise sur interruption pour ``run_benchmark_via_service``.
|
| 2 |
+
|
| 3 |
+
Persistance NDJSON des ``DocumentResult`` legacy au fil du
|
| 4 |
+
benchmark, pour permettre la reprise après crash / Ctrl+C / timeout
|
| 5 |
+
sans perdre le travail déjà fait.
|
| 6 |
+
|
| 7 |
+
Contrat
|
| 8 |
+
-------
|
| 9 |
+
Pour chaque couple ``(corpus_name, engine_name)``, un fichier
|
| 10 |
+
``{partial_dir}/picarones_{corpus}_{engine}.partial.jsonl`` accumule
|
| 11 |
+
une ligne JSON par ``DocumentResult`` au fur et à mesure de leur
|
| 12 |
+
calcul. Au redémarrage, ``run_benchmark_via_service`` charge ce
|
| 13 |
+
fichier, identifie les ``doc_id`` déjà traités, et n'invoque le
|
| 14 |
+
``BenchmarkService`` que sur les documents restants.
|
| 15 |
+
|
| 16 |
+
Quand un engine a été traité en entier sans erreur, son fichier
|
| 17 |
+
partiel est supprimé. Si un crash interrompt le run mid-engine,
|
| 18 |
+
le fichier persiste : la prochaine exécution reprendra exactement
|
| 19 |
+
où l'on s'est arrêté.
|
| 20 |
+
|
| 21 |
+
Trace de retrait
|
| 22 |
+
----------------
|
| 23 |
+
Module transitoire (Sprint D.2.b du plan v2.0). Sera supprimé
|
| 24 |
+
en H.4 quand ``run_benchmark_via_service`` lui-même disparaîtra
|
| 25 |
+
au profit d'une consommation directe de ``BenchmarkService`` par
|
| 26 |
+
les callers (``cli/_legacy``, ``web/_legacy``).
|
| 27 |
+
|
| 28 |
+
Anti-sur-ingénierie
|
| 29 |
+
-------------------
|
| 30 |
+
- Format JSONL plat (une ligne = un ``DocumentResult.as_dict()``),
|
| 31 |
+
pas de schéma versioné. Si la structure du ``DocumentResult``
|
| 32 |
+
legacy change, le fichier devient illisible — mais à ce stade
|
| 33 |
+
on est déjà en post-rewrite v2.0+ et le legacy est mort.
|
| 34 |
+
- Lock thread-safe partagé module-level ; pas de tentative de
|
| 35 |
+
partage inter-process (chaque process a son propre tempdir).
|
| 36 |
+
- Pas de checksum ni de validation de schéma — best-effort. Une
|
| 37 |
+
ligne corrompue = warning + ligne ignorée + on continue.
|
| 38 |
+
"""
|
| 39 |
+
|
| 40 |
+
from __future__ import annotations
|
| 41 |
+
|
| 42 |
+
import json
|
| 43 |
+
import logging
|
| 44 |
+
import re
|
| 45 |
+
import tempfile
|
| 46 |
+
import threading
|
| 47 |
+
from pathlib import Path
|
| 48 |
+
from typing import TYPE_CHECKING, Any, Optional
|
| 49 |
+
|
| 50 |
+
if TYPE_CHECKING:
|
| 51 |
+
from picarones.evaluation.benchmark_result import DocumentResult
|
| 52 |
+
|
| 53 |
+
logger = logging.getLogger(__name__)
|
| 54 |
+
|
| 55 |
+
# Lock module-level pour sérialiser les appends NDJSON depuis
|
| 56 |
+
# plusieurs threads (workers IO/CPU du ``CorpusRunner``). Un seul
|
| 57 |
+
# fichier sera écrit à la fois — c'est un goulot, mais l'écriture
|
| 58 |
+
# d'une ligne JSON est typiquement <1 ms, négligeable face au
|
| 59 |
+
# coût d'un OCR (100 ms - 5 s/doc).
|
| 60 |
+
_partial_write_lock = threading.Lock()
|
| 61 |
+
|
| 62 |
+
|
| 63 |
+
def _sanitize_filename(s: str) -> str:
|
| 64 |
+
"""Réduit ``s`` à ``[\\w\\-]`` et tronque à 64 chars.
|
| 65 |
+
|
| 66 |
+
Cohérent avec le format historique du fichier partiel
|
| 67 |
+
legacy ; permet à un opérateur de retrouver visuellement
|
| 68 |
+
le fichier dans ``partial_dir``.
|
| 69 |
+
"""
|
| 70 |
+
return re.sub(r"[^\w\-]", "_", s)[:64]
|
| 71 |
+
|
| 72 |
+
|
| 73 |
+
def _partial_path(
|
| 74 |
+
corpus_name: str,
|
| 75 |
+
engine_name: str,
|
| 76 |
+
partial_dir: Optional[str | Path],
|
| 77 |
+
) -> Path:
|
| 78 |
+
"""Construit le chemin du fichier partiel pour ``(corpus, engine)``.
|
| 79 |
+
|
| 80 |
+
Si ``partial_dir`` est ``None``, on tombe dans
|
| 81 |
+
``tempfile.gettempdir()`` — utile pour les tests qui ne veulent
|
| 82 |
+
pas configurer un répertoire dédié mais bénéficient quand même
|
| 83 |
+
de la reprise intra-process.
|
| 84 |
+
"""
|
| 85 |
+
base = Path(partial_dir) if partial_dir else Path(tempfile.gettempdir())
|
| 86 |
+
name = (
|
| 87 |
+
f"picarones_{_sanitize_filename(corpus_name)}"
|
| 88 |
+
f"_{_sanitize_filename(engine_name)}.partial.jsonl"
|
| 89 |
+
)
|
| 90 |
+
return base / name
|
| 91 |
+
|
| 92 |
+
|
| 93 |
+
def _load_partial(
|
| 94 |
+
partial_path: Path,
|
| 95 |
+
) -> list[DocumentResult]:
|
| 96 |
+
"""Charge les ``DocumentResult`` déjà persistés à ``partial_path``.
|
| 97 |
+
|
| 98 |
+
Retourne une liste vide si :
|
| 99 |
+
- le fichier n'existe pas (premier run),
|
| 100 |
+
- le fichier est illisible (warning loggué).
|
| 101 |
+
|
| 102 |
+
Les lignes corrompues individuelles sont ignorées avec un
|
| 103 |
+
warning ; les lignes valides sont conservées. Cette
|
| 104 |
+
tolérance évite qu'une ligne tronquée à la fin (typique
|
| 105 |
+
d'un crash en cours d'écriture) ne fasse perdre tout le
|
| 106 |
+
travail antérieur.
|
| 107 |
+
"""
|
| 108 |
+
from picarones.evaluation.benchmark_result import DocumentResult
|
| 109 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 110 |
+
|
| 111 |
+
results: list[DocumentResult] = []
|
| 112 |
+
if not partial_path.exists():
|
| 113 |
+
return results
|
| 114 |
+
|
| 115 |
+
try:
|
| 116 |
+
with partial_path.open("r", encoding="utf-8") as fh:
|
| 117 |
+
lines = list(fh)
|
| 118 |
+
except OSError as exc:
|
| 119 |
+
logger.warning(
|
| 120 |
+
"[partial_dir] fichier '%s' illisible : %s — "
|
| 121 |
+
"reprise désactivée pour cet engine.",
|
| 122 |
+
partial_path, exc,
|
| 123 |
+
)
|
| 124 |
+
return results
|
| 125 |
+
|
| 126 |
+
for lineno, raw in enumerate(lines, 1):
|
| 127 |
+
line = raw.strip()
|
| 128 |
+
if not line:
|
| 129 |
+
continue
|
| 130 |
+
try:
|
| 131 |
+
d = json.loads(line)
|
| 132 |
+
except json.JSONDecodeError as exc:
|
| 133 |
+
logger.warning(
|
| 134 |
+
"[partial_dir] ligne %d corrompue dans '%s' : %s "
|
| 135 |
+
"— ignorée.", lineno, partial_path, exc,
|
| 136 |
+
)
|
| 137 |
+
continue
|
| 138 |
+
try:
|
| 139 |
+
metrics_dict = d.get("metrics", {}) or {}
|
| 140 |
+
metrics = MetricsResult(
|
| 141 |
+
cer=metrics_dict.get("cer"),
|
| 142 |
+
cer_nfc=metrics_dict.get("cer_nfc"),
|
| 143 |
+
cer_caseless=metrics_dict.get("cer_caseless"),
|
| 144 |
+
wer=metrics_dict.get("wer"),
|
| 145 |
+
wer_normalized=metrics_dict.get("wer_normalized"),
|
| 146 |
+
mer=metrics_dict.get("mer"),
|
| 147 |
+
wil=metrics_dict.get("wil"),
|
| 148 |
+
reference_length=metrics_dict.get("reference_length", 0),
|
| 149 |
+
hypothesis_length=metrics_dict.get("hypothesis_length", 0),
|
| 150 |
+
error=metrics_dict.get("error"),
|
| 151 |
+
cer_diplomatic=metrics_dict.get("cer_diplomatic"),
|
| 152 |
+
diplomatic_profile_name=metrics_dict.get(
|
| 153 |
+
"diplomatic_profile_name",
|
| 154 |
+
),
|
| 155 |
+
)
|
| 156 |
+
results.append(DocumentResult(
|
| 157 |
+
doc_id=d["doc_id"],
|
| 158 |
+
image_path=d.get("image_path", ""),
|
| 159 |
+
ground_truth=d.get("ground_truth", ""),
|
| 160 |
+
hypothesis=d.get("hypothesis", ""),
|
| 161 |
+
metrics=metrics,
|
| 162 |
+
duration_seconds=d.get("duration_seconds", 0.0),
|
| 163 |
+
engine_error=d.get("engine_error"),
|
| 164 |
+
ocr_intermediate=d.get("ocr_intermediate"),
|
| 165 |
+
pipeline_metadata=d.get("pipeline_metadata", {}) or {},
|
| 166 |
+
confusion_matrix=d.get("confusion_matrix"),
|
| 167 |
+
char_scores=d.get("char_scores"),
|
| 168 |
+
taxonomy=d.get("taxonomy"),
|
| 169 |
+
structure=d.get("structure"),
|
| 170 |
+
image_quality=d.get("image_quality"),
|
| 171 |
+
line_metrics=d.get("line_metrics"),
|
| 172 |
+
hallucination_metrics=d.get("hallucination_metrics"),
|
| 173 |
+
))
|
| 174 |
+
except (KeyError, TypeError) as exc:
|
| 175 |
+
logger.warning(
|
| 176 |
+
"[partial_dir] ligne %d malformée dans '%s' : %s "
|
| 177 |
+
"— ignorée.", lineno, partial_path, exc,
|
| 178 |
+
)
|
| 179 |
+
|
| 180 |
+
return results
|
| 181 |
+
|
| 182 |
+
|
| 183 |
+
def _save_partial_line(
|
| 184 |
+
partial_path: Path, doc_result: Any,
|
| 185 |
+
) -> None:
|
| 186 |
+
"""Ajoute une ligne NDJSON pour ``doc_result`` (thread-safe).
|
| 187 |
+
|
| 188 |
+
Crée ``partial_path.parent`` si nécessaire. Toute erreur
|
| 189 |
+
d'écriture est loggée mais non fatale : on ne veut pas qu'un
|
| 190 |
+
problème de partial_dir (disque plein, permissions) fasse
|
| 191 |
+
crasher un benchmark qui aurait sinon abouti.
|
| 192 |
+
"""
|
| 193 |
+
try:
|
| 194 |
+
partial_path.parent.mkdir(parents=True, exist_ok=True)
|
| 195 |
+
line = json.dumps(doc_result.as_dict(), ensure_ascii=False) + "\n"
|
| 196 |
+
with _partial_write_lock:
|
| 197 |
+
with partial_path.open("a", encoding="utf-8") as fh:
|
| 198 |
+
fh.write(line)
|
| 199 |
+
except OSError as exc:
|
| 200 |
+
logger.warning(
|
| 201 |
+
"[partial_dir] impossible d'écrire dans '%s' : %s",
|
| 202 |
+
partial_path, exc,
|
| 203 |
+
)
|
| 204 |
+
|
| 205 |
+
|
| 206 |
+
def _delete_partial(partial_path: Path) -> None:
|
| 207 |
+
"""Supprime ``partial_path`` à la fin d'un engine traité avec succès.
|
| 208 |
+
|
| 209 |
+
L'absence de partial signale au prochain run qu'il n'y a pas
|
| 210 |
+
de reprise à effectuer pour cet engine — le bench peut
|
| 211 |
+
repartir de zéro proprement.
|
| 212 |
+
"""
|
| 213 |
+
try:
|
| 214 |
+
if partial_path.exists():
|
| 215 |
+
partial_path.unlink()
|
| 216 |
+
except OSError as exc:
|
| 217 |
+
logger.warning(
|
| 218 |
+
"[partial_dir] impossible de supprimer '%s' : %s",
|
| 219 |
+
partial_path, exc,
|
| 220 |
+
)
|
| 221 |
+
|
| 222 |
+
|
| 223 |
+
__all__ = [
|
| 224 |
+
"_delete_partial",
|
| 225 |
+
"_load_partial",
|
| 226 |
+
"_partial_path",
|
| 227 |
+
"_partial_write_lock",
|
| 228 |
+
"_sanitize_filename",
|
| 229 |
+
"_save_partial_line",
|
| 230 |
+
]
|
|
@@ -33,6 +33,7 @@ quand toutes les briques seront en place.
|
|
| 33 |
|
| 34 |
from __future__ import annotations
|
| 35 |
|
|
|
|
| 36 |
from pathlib import Path
|
| 37 |
from typing import TYPE_CHECKING, Any, Callable
|
| 38 |
|
|
@@ -54,6 +55,8 @@ if TYPE_CHECKING:
|
|
| 54 |
from picarones.adapters.legacy_engines.base import BaseOCREngine
|
| 55 |
from picarones.evaluation.corpus import Corpus, Document
|
| 56 |
|
|
|
|
|
|
|
| 57 |
# Pas d'import direct de ``picarones.pipelines.base.OCRLLMPipeline`` ici —
|
| 58 |
# l'invariant architectural ``test_layer_imports_are_legal[layer-app]``
|
| 59 |
# interdit à ``app/`` de dépendre du legacy. On consomme un
|
|
@@ -758,11 +761,11 @@ def run_benchmark_via_service(
|
|
| 758 |
progress_callback: Callable[[str, int, str], None] | None = None,
|
| 759 |
timeout_seconds: float = 60.0,
|
| 760 |
cancel_event: Any | None = None,
|
|
|
|
| 761 |
# ---- Paramètres legacy non encore portés vers BenchmarkService ----
|
| 762 |
# Sprint D.2 du plan v2.0 — les features manquantes seront
|
| 763 |
# ajoutées au ``BenchmarkService`` dans une session ultérieure.
|
| 764 |
max_workers: int = 4, # noqa: ARG001
|
| 765 |
-
partial_dir: Any | None = None, # noqa: ARG001
|
| 766 |
entity_extractor: Any | None = None, # noqa: ARG001
|
| 767 |
profile: str = "standard", # noqa: ARG001
|
| 768 |
) -> Any:
|
|
@@ -794,13 +797,30 @@ def run_benchmark_via_service(
|
|
| 794 |
le Sprint D.2 :
|
| 795 |
|
| 796 |
- ``show_progress`` (tqdm),
|
| 797 |
-
- ``progress_callback`` (SSE web),
|
| 798 |
- ``max_workers`` (parallélisme intra-engine),
|
| 799 |
-
- ``partial_dir`` (reprise sur interruption),
|
| 800 |
-
- ``cancel_event`` (annulation propre),
|
| 801 |
- ``entity_extractor`` (calcul NER),
|
| 802 |
- ``profile`` (validation de profil de mesures).
|
| 803 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 804 |
Parameters
|
| 805 |
----------
|
| 806 |
corpus:
|
|
@@ -831,8 +851,6 @@ def run_benchmark_via_service(
|
|
| 831 |
Si les engines ne déclarent pas tous un ``name`` unique
|
| 832 |
(cf. ``build_adapter_resolver``).
|
| 833 |
"""
|
| 834 |
-
import tempfile
|
| 835 |
-
|
| 836 |
if code_version is None:
|
| 837 |
# Le scanner d'archi rejette ``from picarones import __version__``
|
| 838 |
# parce qu'il classe ``picarones`` (sans sous-package) comme une
|
|
@@ -845,6 +863,55 @@ def run_benchmark_via_service(
|
|
| 845 |
except (ImportError, AttributeError):
|
| 846 |
code_version = "unknown"
|
| 847 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 848 |
with tempfile.TemporaryDirectory(prefix="picarones_bench_") as ws:
|
| 849 |
workspace = Path(ws)
|
| 850 |
gt_dir = workspace / "gt"
|
|
@@ -852,23 +919,14 @@ def run_benchmark_via_service(
|
|
| 852 |
run_dir = workspace / "run"
|
| 853 |
run_dir.mkdir()
|
| 854 |
|
| 855 |
-
# 1. Conversion corpus → CorpusSpec (D.1.a)
|
| 856 |
corpus_spec = corpus_to_corpus_spec(corpus, workspace_dir=gt_dir)
|
| 857 |
-
|
| 858 |
-
# 2. Conversion engines → PipelineSpec[] + adapter resolver (D.1.b)
|
| 859 |
pipeline_specs = [engine_to_pipeline_spec(e) for e in engines]
|
| 860 |
adapter_resolver = build_adapter_resolver(engines)
|
| 861 |
-
|
| 862 |
-
# Mapping pipeline_name → engine.name pour préserver la
|
| 863 |
-
# sémantique legacy de ``progress_callback(engine_name, ...)``
|
| 864 |
-
# qui attend le nom de l'engine, pas celui de la pipeline
|
| 865 |
-
# (qui inclut le préfixe ``ocr_only_`` côté rewrite).
|
| 866 |
pipeline_to_engine_name = {
|
| 867 |
spec.name: engine.name
|
| 868 |
for spec, engine in zip(pipeline_specs, engines)
|
| 869 |
}
|
| 870 |
|
| 871 |
-
# 3. Exécution via BenchmarkService rewrite
|
| 872 |
run_result = _execute_via_benchmark_service(
|
| 873 |
corpus_spec=corpus_spec,
|
| 874 |
pipeline_specs=pipeline_specs,
|
|
@@ -881,8 +939,7 @@ def run_benchmark_via_service(
|
|
| 881 |
pipeline_to_engine_name=pipeline_to_engine_name,
|
| 882 |
)
|
| 883 |
|
| 884 |
-
|
| 885 |
-
benchmark_result = run_result_to_benchmark_result(
|
| 886 |
run_result,
|
| 887 |
corpus=corpus,
|
| 888 |
engines=engines,
|
|
@@ -890,11 +947,162 @@ def run_benchmark_via_service(
|
|
| 890 |
normalization_profile=normalization_profile,
|
| 891 |
)
|
| 892 |
|
| 893 |
-
# 5. Sérialisation JSON optionnelle
|
| 894 |
-
if output_json is not None:
|
| 895 |
-
_persist_benchmark_result_json(benchmark_result, Path(output_json))
|
| 896 |
|
| 897 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 898 |
|
| 899 |
|
| 900 |
def _execute_via_benchmark_service(
|
|
|
|
| 33 |
|
| 34 |
from __future__ import annotations
|
| 35 |
|
| 36 |
+
import logging
|
| 37 |
from pathlib import Path
|
| 38 |
from typing import TYPE_CHECKING, Any, Callable
|
| 39 |
|
|
|
|
| 55 |
from picarones.adapters.legacy_engines.base import BaseOCREngine
|
| 56 |
from picarones.evaluation.corpus import Corpus, Document
|
| 57 |
|
| 58 |
+
logger = logging.getLogger(__name__)
|
| 59 |
+
|
| 60 |
# Pas d'import direct de ``picarones.pipelines.base.OCRLLMPipeline`` ici —
|
| 61 |
# l'invariant architectural ``test_layer_imports_are_legal[layer-app]``
|
| 62 |
# interdit à ``app/`` de dépendre du legacy. On consomme un
|
|
|
|
| 761 |
progress_callback: Callable[[str, int, str], None] | None = None,
|
| 762 |
timeout_seconds: float = 60.0,
|
| 763 |
cancel_event: Any | None = None,
|
| 764 |
+
partial_dir: str | Path | None = None,
|
| 765 |
# ---- Paramètres legacy non encore portés vers BenchmarkService ----
|
| 766 |
# Sprint D.2 du plan v2.0 — les features manquantes seront
|
| 767 |
# ajoutées au ``BenchmarkService`` dans une session ultérieure.
|
| 768 |
max_workers: int = 4, # noqa: ARG001
|
|
|
|
| 769 |
entity_extractor: Any | None = None, # noqa: ARG001
|
| 770 |
profile: str = "standard", # noqa: ARG001
|
| 771 |
) -> Any:
|
|
|
|
| 797 |
le Sprint D.2 :
|
| 798 |
|
| 799 |
- ``show_progress`` (tqdm),
|
|
|
|
| 800 |
- ``max_workers`` (parallélisme intra-engine),
|
|
|
|
|
|
|
| 801 |
- ``entity_extractor`` (calcul NER),
|
| 802 |
- ``profile`` (validation de profil de mesures).
|
| 803 |
|
| 804 |
+
Reprise sur interruption (D.2.b)
|
| 805 |
+
--------------------------------
|
| 806 |
+
Si ``partial_dir`` est fourni, le bench est exécuté en mode
|
| 807 |
+
**per-engine resumable** :
|
| 808 |
+
|
| 809 |
+
- Pour chaque engine, on cherche un fichier
|
| 810 |
+
``{partial_dir}/picarones_{corpus}_{engine}.partial.jsonl``
|
| 811 |
+
d'une exécution précédente interrompue.
|
| 812 |
+
- Les ``DocumentResult`` qui y sont déjà persistés sont
|
| 813 |
+
réutilisés tels quels (pas de recalcul).
|
| 814 |
+
- Seuls les documents restants sont soumis au ``BenchmarkService``.
|
| 815 |
+
- Chaque nouveau ``DocumentResult`` est ajouté en append au
|
| 816 |
+
partial avant de passer au suivant.
|
| 817 |
+
- À la fin d'un engine traité avec succès, son partial est
|
| 818 |
+
supprimé.
|
| 819 |
+
|
| 820 |
+
Quand ``partial_dir`` est ``None`` (défaut), une seule passe
|
| 821 |
+
multi-engine est lancée (chemin rapide, pas de persistance
|
| 822 |
+
intermédiaire).
|
| 823 |
+
|
| 824 |
Parameters
|
| 825 |
----------
|
| 826 |
corpus:
|
|
|
|
| 851 |
Si les engines ne déclarent pas tous un ``name`` unique
|
| 852 |
(cf. ``build_adapter_resolver``).
|
| 853 |
"""
|
|
|
|
|
|
|
| 854 |
if code_version is None:
|
| 855 |
# Le scanner d'archi rejette ``from picarones import __version__``
|
| 856 |
# parce qu'il classe ``picarones`` (sans sous-package) comme une
|
|
|
|
| 863 |
except (ImportError, AttributeError):
|
| 864 |
code_version = "unknown"
|
| 865 |
|
| 866 |
+
if partial_dir is None:
|
| 867 |
+
benchmark_result = _run_benchmark_unified(
|
| 868 |
+
corpus=corpus,
|
| 869 |
+
engines=engines,
|
| 870 |
+
char_exclude=char_exclude,
|
| 871 |
+
normalization_profile=normalization_profile,
|
| 872 |
+
code_version=code_version,
|
| 873 |
+
progress_callback=progress_callback,
|
| 874 |
+
timeout_seconds=timeout_seconds,
|
| 875 |
+
cancel_event=cancel_event,
|
| 876 |
+
)
|
| 877 |
+
else:
|
| 878 |
+
benchmark_result = _run_benchmark_with_partial(
|
| 879 |
+
corpus=corpus,
|
| 880 |
+
engines=engines,
|
| 881 |
+
partial_dir=Path(partial_dir),
|
| 882 |
+
char_exclude=char_exclude,
|
| 883 |
+
normalization_profile=normalization_profile,
|
| 884 |
+
code_version=code_version,
|
| 885 |
+
progress_callback=progress_callback,
|
| 886 |
+
timeout_seconds=timeout_seconds,
|
| 887 |
+
cancel_event=cancel_event,
|
| 888 |
+
)
|
| 889 |
+
|
| 890 |
+
# Sérialisation JSON optionnelle
|
| 891 |
+
if output_json is not None:
|
| 892 |
+
_persist_benchmark_result_json(benchmark_result, Path(output_json))
|
| 893 |
+
|
| 894 |
+
return benchmark_result
|
| 895 |
+
|
| 896 |
+
|
| 897 |
+
def _run_benchmark_unified(
|
| 898 |
+
*,
|
| 899 |
+
corpus: "Corpus",
|
| 900 |
+
engines: list["BaseOCREngine"],
|
| 901 |
+
char_exclude: Any | None,
|
| 902 |
+
normalization_profile: Any | None,
|
| 903 |
+
code_version: str,
|
| 904 |
+
progress_callback: Callable[[str, int, str], None] | None,
|
| 905 |
+
timeout_seconds: float,
|
| 906 |
+
cancel_event: Any | None,
|
| 907 |
+
) -> Any:
|
| 908 |
+
"""Chemin rapide : un seul ``BenchmarkService.run`` multi-engine.
|
| 909 |
+
|
| 910 |
+
Pas de persistance intermédiaire — si le run crashe, tout est
|
| 911 |
+
perdu. Utilisé quand ``partial_dir`` est ``None``.
|
| 912 |
+
"""
|
| 913 |
+
import tempfile
|
| 914 |
+
|
| 915 |
with tempfile.TemporaryDirectory(prefix="picarones_bench_") as ws:
|
| 916 |
workspace = Path(ws)
|
| 917 |
gt_dir = workspace / "gt"
|
|
|
|
| 919 |
run_dir = workspace / "run"
|
| 920 |
run_dir.mkdir()
|
| 921 |
|
|
|
|
| 922 |
corpus_spec = corpus_to_corpus_spec(corpus, workspace_dir=gt_dir)
|
|
|
|
|
|
|
| 923 |
pipeline_specs = [engine_to_pipeline_spec(e) for e in engines]
|
| 924 |
adapter_resolver = build_adapter_resolver(engines)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 925 |
pipeline_to_engine_name = {
|
| 926 |
spec.name: engine.name
|
| 927 |
for spec, engine in zip(pipeline_specs, engines)
|
| 928 |
}
|
| 929 |
|
|
|
|
| 930 |
run_result = _execute_via_benchmark_service(
|
| 931 |
corpus_spec=corpus_spec,
|
| 932 |
pipeline_specs=pipeline_specs,
|
|
|
|
| 939 |
pipeline_to_engine_name=pipeline_to_engine_name,
|
| 940 |
)
|
| 941 |
|
| 942 |
+
return run_result_to_benchmark_result(
|
|
|
|
| 943 |
run_result,
|
| 944 |
corpus=corpus,
|
| 945 |
engines=engines,
|
|
|
|
| 947 |
normalization_profile=normalization_profile,
|
| 948 |
)
|
| 949 |
|
|
|
|
|
|
|
|
|
|
| 950 |
|
| 951 |
+
def _run_benchmark_with_partial(
|
| 952 |
+
*,
|
| 953 |
+
corpus: "Corpus",
|
| 954 |
+
engines: list["BaseOCREngine"],
|
| 955 |
+
partial_dir: Path,
|
| 956 |
+
char_exclude: Any | None,
|
| 957 |
+
normalization_profile: Any | None,
|
| 958 |
+
code_version: str,
|
| 959 |
+
progress_callback: Callable[[str, int, str], None] | None,
|
| 960 |
+
timeout_seconds: float,
|
| 961 |
+
cancel_event: Any | None,
|
| 962 |
+
) -> Any:
|
| 963 |
+
"""Chemin reprise : per-engine avec NDJSON intermédiaire.
|
| 964 |
+
|
| 965 |
+
Pour chaque engine, charge le partial existant, filtre les docs
|
| 966 |
+
déjà traités, lance ``BenchmarkService`` sur les restants,
|
| 967 |
+
persiste chaque nouveau ``DocumentResult`` au fil de l'eau.
|
| 968 |
+
"""
|
| 969 |
+
import tempfile
|
| 970 |
+
|
| 971 |
+
from picarones.app.services._legacy_partial_store import (
|
| 972 |
+
_delete_partial,
|
| 973 |
+
_load_partial,
|
| 974 |
+
_partial_path,
|
| 975 |
+
_save_partial_line,
|
| 976 |
+
)
|
| 977 |
+
from picarones.evaluation.benchmark_result import (
|
| 978 |
+
BenchmarkResult,
|
| 979 |
+
EngineReport,
|
| 980 |
+
)
|
| 981 |
+
from picarones.evaluation.corpus import Corpus as LegacyCorpus
|
| 982 |
+
from picarones.evaluation.metric_result import aggregate_metrics
|
| 983 |
+
|
| 984 |
+
partial_dir.mkdir(parents=True, exist_ok=True)
|
| 985 |
+
|
| 986 |
+
# Index des docs par ID — permet de ré-ordonner les
|
| 987 |
+
# DocumentResult rechargés selon l'ordre original du corpus.
|
| 988 |
+
doc_order = {doc.doc_id: idx for idx, doc in enumerate(corpus.documents)}
|
| 989 |
+
|
| 990 |
+
engine_reports: list[Any] = []
|
| 991 |
+
|
| 992 |
+
for engine in engines:
|
| 993 |
+
# Vérifier la cancellation entre engines (matche la
|
| 994 |
+
# sémantique legacy : un Ctrl+C arrête après l'engine en
|
| 995 |
+
# cours, conserve les partials, ne démarre pas le suivant).
|
| 996 |
+
if cancel_event is not None and getattr(
|
| 997 |
+
cancel_event, "is_set", lambda: False,
|
| 998 |
+
)():
|
| 999 |
+
logger.info(
|
| 1000 |
+
"[partial_dir] benchmark annulé avant l'engine '%s' "
|
| 1001 |
+
"— partials conservés pour reprise.", engine.name,
|
| 1002 |
+
)
|
| 1003 |
+
break
|
| 1004 |
+
|
| 1005 |
+
partial_path = _partial_path(corpus.name, engine.name, partial_dir)
|
| 1006 |
+
loaded_results = _load_partial(partial_path)
|
| 1007 |
+
loaded_doc_ids = {dr.doc_id for dr in loaded_results}
|
| 1008 |
+
|
| 1009 |
+
if loaded_results:
|
| 1010 |
+
logger.info(
|
| 1011 |
+
"[partial_dir] reprise '%s' : %d/%d docs déjà traités.",
|
| 1012 |
+
engine.name, len(loaded_results), len(corpus.documents),
|
| 1013 |
+
)
|
| 1014 |
+
|
| 1015 |
+
remaining_docs = [
|
| 1016 |
+
d for d in corpus.documents if d.doc_id not in loaded_doc_ids
|
| 1017 |
+
]
|
| 1018 |
+
|
| 1019 |
+
new_doc_results: list[Any] = []
|
| 1020 |
+
if remaining_docs:
|
| 1021 |
+
# Sub-corpus avec uniquement les docs restants. On
|
| 1022 |
+
# conserve le ``name`` original pour que les chemins de
|
| 1023 |
+
# partial restent cohérents si un re-run arrive.
|
| 1024 |
+
sub_corpus = LegacyCorpus(
|
| 1025 |
+
name=corpus.name,
|
| 1026 |
+
documents=remaining_docs,
|
| 1027 |
+
source_path=corpus.source_path,
|
| 1028 |
+
)
|
| 1029 |
+
|
| 1030 |
+
with tempfile.TemporaryDirectory(
|
| 1031 |
+
prefix="picarones_bench_partial_",
|
| 1032 |
+
) as ws:
|
| 1033 |
+
workspace = Path(ws)
|
| 1034 |
+
gt_dir = workspace / "gt"
|
| 1035 |
+
gt_dir.mkdir()
|
| 1036 |
+
run_dir = workspace / "run"
|
| 1037 |
+
run_dir.mkdir()
|
| 1038 |
+
|
| 1039 |
+
sub_corpus_spec = corpus_to_corpus_spec(
|
| 1040 |
+
sub_corpus, workspace_dir=gt_dir,
|
| 1041 |
+
)
|
| 1042 |
+
pipeline_spec = engine_to_pipeline_spec(engine)
|
| 1043 |
+
adapter_resolver = build_adapter_resolver([engine])
|
| 1044 |
+
pipeline_to_engine_name = {pipeline_spec.name: engine.name}
|
| 1045 |
+
|
| 1046 |
+
run_result = _execute_via_benchmark_service(
|
| 1047 |
+
corpus_spec=sub_corpus_spec,
|
| 1048 |
+
pipeline_specs=[pipeline_spec],
|
| 1049 |
+
adapter_resolver=adapter_resolver,
|
| 1050 |
+
workspace_uri=str(run_dir),
|
| 1051 |
+
code_version=code_version,
|
| 1052 |
+
timeout_seconds=timeout_seconds,
|
| 1053 |
+
progress_callback=progress_callback,
|
| 1054 |
+
cancel_event=cancel_event,
|
| 1055 |
+
pipeline_to_engine_name=pipeline_to_engine_name,
|
| 1056 |
+
)
|
| 1057 |
+
|
| 1058 |
+
# Convertir ce sous-RunResult en EngineReport avec
|
| 1059 |
+
# uniquement les docs restants — puis extraire les
|
| 1060 |
+
# ``DocumentResult`` pour append au partial.
|
| 1061 |
+
sub_report = run_result_to_benchmark_result(
|
| 1062 |
+
run_result,
|
| 1063 |
+
corpus=sub_corpus,
|
| 1064 |
+
engines=[engine],
|
| 1065 |
+
char_exclude=char_exclude,
|
| 1066 |
+
normalization_profile=normalization_profile,
|
| 1067 |
+
)
|
| 1068 |
+
new_doc_results = list(
|
| 1069 |
+
sub_report.engine_reports[0].document_results,
|
| 1070 |
+
)
|
| 1071 |
+
|
| 1072 |
+
# Append au partial : un cancel mid-engine
|
| 1073 |
+
# préservera ce qui a déjà été calculé.
|
| 1074 |
+
for dr in new_doc_results:
|
| 1075 |
+
_save_partial_line(partial_path, dr)
|
| 1076 |
+
|
| 1077 |
+
# Fusion : loaded + new, ré-ordonné selon le corpus original.
|
| 1078 |
+
all_doc_results = list(loaded_results) + new_doc_results
|
| 1079 |
+
all_doc_results.sort(key=lambda dr: doc_order.get(dr.doc_id, 0))
|
| 1080 |
+
|
| 1081 |
+
aggregated = aggregate_metrics([d.metrics for d in all_doc_results])
|
| 1082 |
+
pipeline_info = _build_pipeline_info(engine)
|
| 1083 |
+
|
| 1084 |
+
engine_reports.append(
|
| 1085 |
+
EngineReport(
|
| 1086 |
+
engine_name=engine.name,
|
| 1087 |
+
engine_version=_safe_engine_version(engine),
|
| 1088 |
+
engine_config=getattr(engine, "config", {}) or {},
|
| 1089 |
+
document_results=all_doc_results,
|
| 1090 |
+
aggregated_metrics=aggregated,
|
| 1091 |
+
pipeline_info=pipeline_info,
|
| 1092 |
+
),
|
| 1093 |
+
)
|
| 1094 |
+
|
| 1095 |
+
# Engine traité avec succès → cleanup du partial. Si on
|
| 1096 |
+
# arrive ici sans exception, tous les docs sont dans
|
| 1097 |
+
# ``all_doc_results``.
|
| 1098 |
+
_delete_partial(partial_path)
|
| 1099 |
+
|
| 1100 |
+
return BenchmarkResult(
|
| 1101 |
+
corpus_name=corpus.name,
|
| 1102 |
+
corpus_source=str(corpus.source_path) if corpus.source_path else None,
|
| 1103 |
+
document_count=len(corpus.documents),
|
| 1104 |
+
engine_reports=engine_reports,
|
| 1105 |
+
)
|
| 1106 |
|
| 1107 |
|
| 1108 |
def _execute_via_benchmark_service(
|
|
@@ -0,0 +1,455 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Sprint D.2.b — reprise sur interruption (``partial_dir``) dans
|
| 2 |
+
``run_benchmark_via_service``.
|
| 3 |
+
|
| 4 |
+
Couvre :
|
| 5 |
+
|
| 6 |
+
- Helpers ``picarones.app.services._legacy_partial_store`` (chemin,
|
| 7 |
+
sérialisation NDJSON, tolérance aux lignes corrompues).
|
| 8 |
+
- Comportement bout-en-bout de ``run_benchmark_via_service`` quand
|
| 9 |
+
``partial_dir`` est fourni :
|
| 10 |
+
reprise depuis un partial existant, suppression à la fin d'un
|
| 11 |
+
engine traité avec succès, isolation per-engine.
|
| 12 |
+
"""
|
| 13 |
+
|
| 14 |
+
from __future__ import annotations
|
| 15 |
+
|
| 16 |
+
import json
|
| 17 |
+
import threading
|
| 18 |
+
from pathlib import Path
|
| 19 |
+
|
| 20 |
+
import pytest
|
| 21 |
+
|
| 22 |
+
from picarones.adapters.legacy_engines.base import BaseOCREngine
|
| 23 |
+
from picarones.app.services._legacy_partial_store import (
|
| 24 |
+
_delete_partial,
|
| 25 |
+
_load_partial,
|
| 26 |
+
_partial_path,
|
| 27 |
+
_sanitize_filename,
|
| 28 |
+
_save_partial_line,
|
| 29 |
+
)
|
| 30 |
+
from picarones.app.services._legacy_runner_adapter import (
|
| 31 |
+
run_benchmark_via_service,
|
| 32 |
+
)
|
| 33 |
+
from picarones.evaluation.benchmark_result import DocumentResult
|
| 34 |
+
from picarones.evaluation.corpus import Corpus, Document
|
| 35 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 36 |
+
|
| 37 |
+
|
| 38 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 39 |
+
# Mocks
|
| 40 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 41 |
+
|
| 42 |
+
|
| 43 |
+
class _MockOCR(BaseOCREngine):
|
| 44 |
+
def __init__(self, name: str = "mock_ocr") -> None:
|
| 45 |
+
super().__init__(config={})
|
| 46 |
+
self._name = name
|
| 47 |
+
|
| 48 |
+
@property
|
| 49 |
+
def name(self) -> str: # type: ignore[override]
|
| 50 |
+
return self._name
|
| 51 |
+
|
| 52 |
+
def version(self) -> str:
|
| 53 |
+
return "1.0"
|
| 54 |
+
|
| 55 |
+
def _run_ocr(self, image_path):
|
| 56 |
+
return "ocr text"
|
| 57 |
+
|
| 58 |
+
|
| 59 |
+
def _make_doc_result(doc_id: str, hyp: str = "h", cer: float = 0.1) -> DocumentResult:
|
| 60 |
+
return DocumentResult(
|
| 61 |
+
doc_id=doc_id,
|
| 62 |
+
image_path=f"/tmp/{doc_id}.png",
|
| 63 |
+
ground_truth="g",
|
| 64 |
+
hypothesis=hyp,
|
| 65 |
+
metrics=MetricsResult(
|
| 66 |
+
cer=cer,
|
| 67 |
+
cer_nfc=cer,
|
| 68 |
+
cer_caseless=cer,
|
| 69 |
+
wer=cer,
|
| 70 |
+
wer_normalized=cer,
|
| 71 |
+
mer=cer,
|
| 72 |
+
wil=cer,
|
| 73 |
+
reference_length=1,
|
| 74 |
+
hypothesis_length=1,
|
| 75 |
+
),
|
| 76 |
+
duration_seconds=0.5,
|
| 77 |
+
)
|
| 78 |
+
|
| 79 |
+
|
| 80 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 81 |
+
# 1. Helpers _legacy_partial_store
|
| 82 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 83 |
+
|
| 84 |
+
|
| 85 |
+
class TestSanitizeFilename:
|
| 86 |
+
def test_keeps_word_chars_and_dash(self) -> None:
|
| 87 |
+
assert _sanitize_filename("abc-123_def") == "abc-123_def"
|
| 88 |
+
|
| 89 |
+
def test_replaces_special_chars(self) -> None:
|
| 90 |
+
assert _sanitize_filename("a/b:c d") == "a_b_c_d"
|
| 91 |
+
|
| 92 |
+
def test_truncates_to_64_chars(self) -> None:
|
| 93 |
+
result = _sanitize_filename("a" * 100)
|
| 94 |
+
assert len(result) == 64
|
| 95 |
+
assert result == "a" * 64
|
| 96 |
+
|
| 97 |
+
|
| 98 |
+
class TestPartialPath:
|
| 99 |
+
def test_uses_partial_dir(self, tmp_path: Path) -> None:
|
| 100 |
+
path = _partial_path("corpus_x", "engine_y", tmp_path)
|
| 101 |
+
assert path.parent == tmp_path
|
| 102 |
+
assert "corpus_x" in path.name
|
| 103 |
+
assert "engine_y" in path.name
|
| 104 |
+
assert path.suffix == ".jsonl"
|
| 105 |
+
|
| 106 |
+
def test_sanitizes_names_in_path(self, tmp_path: Path) -> None:
|
| 107 |
+
path = _partial_path("c/orpus", "engine:a", tmp_path)
|
| 108 |
+
# Pas de slash résiduel dans le filename — uniquement dans
|
| 109 |
+
# le dirname (tmp_path).
|
| 110 |
+
assert "/" not in path.name
|
| 111 |
+
assert ":" not in path.name
|
| 112 |
+
|
| 113 |
+
def test_none_partial_dir_falls_back_to_tempdir(self) -> None:
|
| 114 |
+
import tempfile
|
| 115 |
+
path = _partial_path("c", "e", None)
|
| 116 |
+
assert path.parent == Path(tempfile.gettempdir())
|
| 117 |
+
|
| 118 |
+
|
| 119 |
+
class TestSaveAndLoad:
|
| 120 |
+
def test_round_trip_single_result(self, tmp_path: Path) -> None:
|
| 121 |
+
path = tmp_path / "r.jsonl"
|
| 122 |
+
dr = _make_doc_result("doc1", hyp="hello", cer=0.05)
|
| 123 |
+
|
| 124 |
+
_save_partial_line(path, dr)
|
| 125 |
+
loaded = _load_partial(path)
|
| 126 |
+
|
| 127 |
+
assert len(loaded) == 1
|
| 128 |
+
assert loaded[0].doc_id == "doc1"
|
| 129 |
+
assert loaded[0].hypothesis == "hello"
|
| 130 |
+
assert loaded[0].metrics.cer == pytest.approx(0.05)
|
| 131 |
+
|
| 132 |
+
def test_round_trip_preserves_optional_fields(self, tmp_path: Path) -> None:
|
| 133 |
+
path = tmp_path / "r.jsonl"
|
| 134 |
+
dr = _make_doc_result("doc1")
|
| 135 |
+
dr.ocr_intermediate = "intermediate"
|
| 136 |
+
dr.pipeline_metadata = {"mode": "post_correction_texte"}
|
| 137 |
+
|
| 138 |
+
_save_partial_line(path, dr)
|
| 139 |
+
loaded = _load_partial(path)
|
| 140 |
+
|
| 141 |
+
assert loaded[0].ocr_intermediate == "intermediate"
|
| 142 |
+
assert loaded[0].pipeline_metadata == {"mode": "post_correction_texte"}
|
| 143 |
+
|
| 144 |
+
def test_appends_multiple_results(self, tmp_path: Path) -> None:
|
| 145 |
+
path = tmp_path / "r.jsonl"
|
| 146 |
+
for i in range(3):
|
| 147 |
+
_save_partial_line(path, _make_doc_result(f"doc{i}"))
|
| 148 |
+
|
| 149 |
+
loaded = _load_partial(path)
|
| 150 |
+
assert [d.doc_id for d in loaded] == ["doc0", "doc1", "doc2"]
|
| 151 |
+
|
| 152 |
+
def test_empty_file_returns_empty_list(self, tmp_path: Path) -> None:
|
| 153 |
+
path = tmp_path / "empty.jsonl"
|
| 154 |
+
path.write_text("", encoding="utf-8")
|
| 155 |
+
assert _load_partial(path) == []
|
| 156 |
+
|
| 157 |
+
def test_missing_file_returns_empty_list(self, tmp_path: Path) -> None:
|
| 158 |
+
path = tmp_path / "nope.jsonl"
|
| 159 |
+
assert _load_partial(path) == []
|
| 160 |
+
|
| 161 |
+
def test_corrupted_line_is_skipped(
|
| 162 |
+
self, tmp_path: Path, caplog: pytest.LogCaptureFixture,
|
| 163 |
+
) -> None:
|
| 164 |
+
path = tmp_path / "r.jsonl"
|
| 165 |
+
# Une ligne valide + une corrompue + une valide.
|
| 166 |
+
_save_partial_line(path, _make_doc_result("doc0"))
|
| 167 |
+
with path.open("a", encoding="utf-8") as fh:
|
| 168 |
+
fh.write("not valid json\n")
|
| 169 |
+
_save_partial_line(path, _make_doc_result("doc2"))
|
| 170 |
+
|
| 171 |
+
with caplog.at_level("WARNING"):
|
| 172 |
+
loaded = _load_partial(path)
|
| 173 |
+
|
| 174 |
+
assert [d.doc_id for d in loaded] == ["doc0", "doc2"]
|
| 175 |
+
|
| 176 |
+
def test_save_creates_parent_directory(self, tmp_path: Path) -> None:
|
| 177 |
+
path = tmp_path / "subdir" / "r.jsonl"
|
| 178 |
+
_save_partial_line(path, _make_doc_result("doc0"))
|
| 179 |
+
assert path.exists()
|
| 180 |
+
|
| 181 |
+
def test_concurrent_writes_are_safe(self, tmp_path: Path) -> None:
|
| 182 |
+
"""Le lock module-level sérialise les appends — le fichier ne
|
| 183 |
+
contient jamais une ligne tronquée même avec N threads."""
|
| 184 |
+
path = tmp_path / "concurrent.jsonl"
|
| 185 |
+
n_threads = 8
|
| 186 |
+
per_thread = 10
|
| 187 |
+
|
| 188 |
+
def writer(tid: int) -> None:
|
| 189 |
+
for i in range(per_thread):
|
| 190 |
+
_save_partial_line(path, _make_doc_result(f"t{tid}_d{i}"))
|
| 191 |
+
|
| 192 |
+
threads = [threading.Thread(target=writer, args=(t,)) for t in range(n_threads)]
|
| 193 |
+
for t in threads:
|
| 194 |
+
t.start()
|
| 195 |
+
for t in threads:
|
| 196 |
+
t.join()
|
| 197 |
+
|
| 198 |
+
loaded = _load_partial(path)
|
| 199 |
+
assert len(loaded) == n_threads * per_thread
|
| 200 |
+
# Tous les doc_ids sont uniques et bien formés.
|
| 201 |
+
assert len({d.doc_id for d in loaded}) == n_threads * per_thread
|
| 202 |
+
|
| 203 |
+
|
| 204 |
+
class TestDelete:
|
| 205 |
+
def test_delete_existing_file(self, tmp_path: Path) -> None:
|
| 206 |
+
path = tmp_path / "r.jsonl"
|
| 207 |
+
path.write_text("x\n", encoding="utf-8")
|
| 208 |
+
_delete_partial(path)
|
| 209 |
+
assert not path.exists()
|
| 210 |
+
|
| 211 |
+
def test_delete_missing_file_is_noop(self, tmp_path: Path) -> None:
|
| 212 |
+
path = tmp_path / "nope.jsonl"
|
| 213 |
+
# Ne lève pas.
|
| 214 |
+
_delete_partial(path)
|
| 215 |
+
|
| 216 |
+
|
| 217 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 218 |
+
# 2. Resume bout-en-bout dans run_benchmark_via_service
|
| 219 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 220 |
+
|
| 221 |
+
|
| 222 |
+
class TestResumeViaPartialDir:
|
| 223 |
+
"""Sprint D.2.b — quand ``partial_dir`` est fourni,
|
| 224 |
+
``run_benchmark_via_service`` reprend depuis l'éventuel partial
|
| 225 |
+
existant et persiste chaque ``DocumentResult`` au fil de l'eau."""
|
| 226 |
+
|
| 227 |
+
def _make_corpus(self, tmp_path: Path, n: int = 3) -> Corpus:
|
| 228 |
+
docs = []
|
| 229 |
+
for i in range(n):
|
| 230 |
+
img = tmp_path / f"doc{i}.png"
|
| 231 |
+
img.write_bytes(b"x")
|
| 232 |
+
docs.append(Document(
|
| 233 |
+
image_path=img,
|
| 234 |
+
ground_truth=f"gt {i}",
|
| 235 |
+
doc_id=f"doc{i}",
|
| 236 |
+
))
|
| 237 |
+
return Corpus(name="resume_test", documents=docs)
|
| 238 |
+
|
| 239 |
+
def test_fresh_run_deletes_partial_on_success(self, tmp_path: Path) -> None:
|
| 240 |
+
partial_dir = tmp_path / "partials"
|
| 241 |
+
corpus = self._make_corpus(tmp_path, n=2)
|
| 242 |
+
ocr = _MockOCR(name="resumable")
|
| 243 |
+
ocr._run_ocr = lambda p: "match"
|
| 244 |
+
|
| 245 |
+
bm = run_benchmark_via_service(
|
| 246 |
+
corpus, [ocr], partial_dir=partial_dir,
|
| 247 |
+
)
|
| 248 |
+
|
| 249 |
+
assert bm.document_count == 2
|
| 250 |
+
# Plus aucun fichier partial pour cet engine après succès.
|
| 251 |
+
partial_path = _partial_path(corpus.name, ocr.name, partial_dir)
|
| 252 |
+
assert not partial_path.exists()
|
| 253 |
+
|
| 254 |
+
def test_resume_skips_already_done_docs(self, tmp_path: Path) -> None:
|
| 255 |
+
"""Si un partial existe avec doc0 déjà calculé, le run ne
|
| 256 |
+
ré-invoque pas l'engine pour doc0 — il prend le résultat
|
| 257 |
+
partiel tel quel."""
|
| 258 |
+
partial_dir = tmp_path / "partials"
|
| 259 |
+
partial_dir.mkdir()
|
| 260 |
+
corpus = self._make_corpus(tmp_path, n=3)
|
| 261 |
+
|
| 262 |
+
ocr = _MockOCR(name="resumable2")
|
| 263 |
+
# On compte combien de fois l'engine est appelé.
|
| 264 |
+
call_count = {"n": 0}
|
| 265 |
+
|
| 266 |
+
def counting_ocr(p):
|
| 267 |
+
call_count["n"] += 1
|
| 268 |
+
return "match"
|
| 269 |
+
|
| 270 |
+
ocr._run_ocr = counting_ocr
|
| 271 |
+
|
| 272 |
+
# Pré-écrire un partial pour doc0 avec une CER fictive de 0.99
|
| 273 |
+
# pour vérifier qu'on prend la valeur du partial, pas une
|
| 274 |
+
# nouvelle exécution.
|
| 275 |
+
partial_path = _partial_path(corpus.name, ocr.name, partial_dir)
|
| 276 |
+
pre_existing = _make_doc_result("doc0", hyp="from_partial", cer=0.99)
|
| 277 |
+
_save_partial_line(partial_path, pre_existing)
|
| 278 |
+
|
| 279 |
+
bm = run_benchmark_via_service(
|
| 280 |
+
corpus, [ocr], partial_dir=partial_dir,
|
| 281 |
+
)
|
| 282 |
+
|
| 283 |
+
# L'engine n'a été appelé que pour doc1 + doc2 (pas doc0).
|
| 284 |
+
assert call_count["n"] == 2
|
| 285 |
+
|
| 286 |
+
# Le résultat final contient bien les 3 docs, doc0 venant
|
| 287 |
+
# du partial (CER 0.99).
|
| 288 |
+
report = bm.engine_reports[0]
|
| 289 |
+
assert len(report.document_results) == 3
|
| 290 |
+
doc0_result = next(d for d in report.document_results if d.doc_id == "doc0")
|
| 291 |
+
assert doc0_result.hypothesis == "from_partial"
|
| 292 |
+
assert doc0_result.metrics.cer == pytest.approx(0.99)
|
| 293 |
+
|
| 294 |
+
def test_all_docs_already_done_skips_engine_entirely(
|
| 295 |
+
self, tmp_path: Path,
|
| 296 |
+
) -> None:
|
| 297 |
+
partial_dir = tmp_path / "partials"
|
| 298 |
+
partial_dir.mkdir()
|
| 299 |
+
corpus = self._make_corpus(tmp_path, n=2)
|
| 300 |
+
|
| 301 |
+
ocr = _MockOCR(name="alldone")
|
| 302 |
+
ocr._run_ocr = lambda p: pytest.fail(
|
| 303 |
+
"Engine ne devrait pas être appelé — tout est dans le partial.",
|
| 304 |
+
)
|
| 305 |
+
|
| 306 |
+
partial_path = _partial_path(corpus.name, ocr.name, partial_dir)
|
| 307 |
+
for i in range(2):
|
| 308 |
+
_save_partial_line(
|
| 309 |
+
partial_path, _make_doc_result(f"doc{i}", hyp=f"prefilled{i}"),
|
| 310 |
+
)
|
| 311 |
+
|
| 312 |
+
bm = run_benchmark_via_service(
|
| 313 |
+
corpus, [ocr], partial_dir=partial_dir,
|
| 314 |
+
)
|
| 315 |
+
|
| 316 |
+
report = bm.engine_reports[0]
|
| 317 |
+
assert len(report.document_results) == 2
|
| 318 |
+
# Ordre du corpus original préservé.
|
| 319 |
+
assert [d.doc_id for d in report.document_results] == ["doc0", "doc1"]
|
| 320 |
+
assert [d.hypothesis for d in report.document_results] == [
|
| 321 |
+
"prefilled0", "prefilled1",
|
| 322 |
+
]
|
| 323 |
+
|
| 324 |
+
def test_per_engine_isolation(self, tmp_path: Path) -> None:
|
| 325 |
+
"""Deux engines ont chacun leur propre fichier partial — un
|
| 326 |
+
partial pour engine_a ne pollue pas engine_b."""
|
| 327 |
+
partial_dir = tmp_path / "partials"
|
| 328 |
+
partial_dir.mkdir()
|
| 329 |
+
corpus = self._make_corpus(tmp_path, n=2)
|
| 330 |
+
|
| 331 |
+
ocr_a = _MockOCR(name="engine_a")
|
| 332 |
+
ocr_a._run_ocr = lambda p: "from_a"
|
| 333 |
+
ocr_b = _MockOCR(name="engine_b")
|
| 334 |
+
ocr_b._run_ocr = lambda p: "from_b"
|
| 335 |
+
|
| 336 |
+
# Pré-remplir uniquement le partial de engine_a pour doc0.
|
| 337 |
+
partial_a = _partial_path(corpus.name, ocr_a.name, partial_dir)
|
| 338 |
+
_save_partial_line(
|
| 339 |
+
partial_a, _make_doc_result("doc0", hyp="A_pre"),
|
| 340 |
+
)
|
| 341 |
+
|
| 342 |
+
bm = run_benchmark_via_service(
|
| 343 |
+
corpus, [ocr_a, ocr_b], partial_dir=partial_dir,
|
| 344 |
+
)
|
| 345 |
+
|
| 346 |
+
report_a = next(r for r in bm.engine_reports if r.engine_name == "engine_a")
|
| 347 |
+
report_b = next(r for r in bm.engine_reports if r.engine_name == "engine_b")
|
| 348 |
+
|
| 349 |
+
# engine_a : doc0 vient du partial, doc1 calculé.
|
| 350 |
+
a_doc0 = next(d for d in report_a.document_results if d.doc_id == "doc0")
|
| 351 |
+
assert a_doc0.hypothesis == "A_pre"
|
| 352 |
+
|
| 353 |
+
# engine_b : doc0 calculé from_b (pas de partial pour B).
|
| 354 |
+
b_doc0 = next(d for d in report_b.document_results if d.doc_id == "doc0")
|
| 355 |
+
assert b_doc0.hypothesis == "from_b"
|
| 356 |
+
|
| 357 |
+
def test_partial_files_removed_on_success(self, tmp_path: Path) -> None:
|
| 358 |
+
partial_dir = tmp_path / "partials"
|
| 359 |
+
corpus = self._make_corpus(tmp_path, n=2)
|
| 360 |
+
|
| 361 |
+
engines = [_MockOCR(name=f"e{i}") for i in range(3)]
|
| 362 |
+
for e in engines:
|
| 363 |
+
e._run_ocr = lambda p: "match"
|
| 364 |
+
|
| 365 |
+
run_benchmark_via_service(
|
| 366 |
+
corpus, engines, partial_dir=partial_dir,
|
| 367 |
+
)
|
| 368 |
+
|
| 369 |
+
# Aucun fichier partial ne survit après un run réussi.
|
| 370 |
+
leftovers = list(partial_dir.glob("*.partial.jsonl"))
|
| 371 |
+
assert leftovers == [], f"partials résiduels : {leftovers}"
|
| 372 |
+
|
| 373 |
+
def test_no_partial_dir_keeps_unified_path(self, tmp_path: Path) -> None:
|
| 374 |
+
"""Sans ``partial_dir``, le code garde le chemin rapide
|
| 375 |
+
unifié (pas de fichiers partiels créés)."""
|
| 376 |
+
corpus = self._make_corpus(tmp_path, n=2)
|
| 377 |
+
ocr = _MockOCR(name="no_partial")
|
| 378 |
+
ocr._run_ocr = lambda p: "match"
|
| 379 |
+
|
| 380 |
+
bm = run_benchmark_via_service(corpus, [ocr])
|
| 381 |
+
assert bm.document_count == 2
|
| 382 |
+
|
| 383 |
+
# Aucun .partial.jsonl créé dans tmp_path car le chemin
|
| 384 |
+
# unifié n'écrit pas de partials.
|
| 385 |
+
leftovers = list(tmp_path.rglob("*.partial.jsonl"))
|
| 386 |
+
assert leftovers == []
|
| 387 |
+
|
| 388 |
+
def test_partial_persists_when_engine_was_not_finished(
|
| 389 |
+
self, tmp_path: Path,
|
| 390 |
+
) -> None:
|
| 391 |
+
"""Si le run a réussi pour engine_a (partial supprimé) mais
|
| 392 |
+
seuls 1/2 docs sont dans le partial de engine_b avant
|
| 393 |
+
cancel, le partial de engine_b doit survivre pour reprise."""
|
| 394 |
+
partial_dir = tmp_path / "partials"
|
| 395 |
+
partial_dir.mkdir()
|
| 396 |
+
corpus = self._make_corpus(tmp_path, n=2)
|
| 397 |
+
|
| 398 |
+
# Simulation d'un état post-crash : engine_b a un partial
|
| 399 |
+
# avec doc0 mais pas doc1. cancel_event signalé avant
|
| 400 |
+
# l'engine suivant.
|
| 401 |
+
ocr_b = _MockOCR(name="incomplete_b")
|
| 402 |
+
partial_b = _partial_path(corpus.name, ocr_b.name, partial_dir)
|
| 403 |
+
_save_partial_line(
|
| 404 |
+
partial_b, _make_doc_result("doc0", hyp="B0_pre"),
|
| 405 |
+
)
|
| 406 |
+
|
| 407 |
+
# cancel_event signalé → on n'entre pas dans la boucle
|
| 408 |
+
# engine. Pas de docs traités pendant ce run.
|
| 409 |
+
cancel = threading.Event()
|
| 410 |
+
cancel.set()
|
| 411 |
+
|
| 412 |
+
bm = run_benchmark_via_service(
|
| 413 |
+
corpus, [ocr_b],
|
| 414 |
+
partial_dir=partial_dir,
|
| 415 |
+
cancel_event=cancel,
|
| 416 |
+
)
|
| 417 |
+
|
| 418 |
+
# Aucun engine traité (cancel pré-engine).
|
| 419 |
+
assert bm.engine_reports == []
|
| 420 |
+
# Le partial de engine_b est préservé pour la prochaine
|
| 421 |
+
# exécution.
|
| 422 |
+
assert partial_b.exists()
|
| 423 |
+
|
| 424 |
+
|
| 425 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 426 |
+
# 3. Sérialisation NDJSON cross-process
|
| 427 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 428 |
+
|
| 429 |
+
|
| 430 |
+
class TestNDJSONFormat:
|
| 431 |
+
"""Le format NDJSON (une ligne JSON par document) est ce qui
|
| 432 |
+
rend la reprise robuste : un crash mid-write tronque au pire
|
| 433 |
+
une ligne ; toutes les lignes complètes restent lisibles."""
|
| 434 |
+
|
| 435 |
+
def test_one_json_per_line(self, tmp_path: Path) -> None:
|
| 436 |
+
path = tmp_path / "r.jsonl"
|
| 437 |
+
_save_partial_line(path, _make_doc_result("doc0"))
|
| 438 |
+
_save_partial_line(path, _make_doc_result("doc1"))
|
| 439 |
+
|
| 440 |
+
lines = path.read_text(encoding="utf-8").splitlines()
|
| 441 |
+
assert len(lines) == 2
|
| 442 |
+
for line in lines:
|
| 443 |
+
payload = json.loads(line)
|
| 444 |
+
assert "doc_id" in payload
|
| 445 |
+
assert "metrics" in payload
|
| 446 |
+
|
| 447 |
+
def test_unicode_preserved_in_hypothesis(self, tmp_path: Path) -> None:
|
| 448 |
+
path = tmp_path / "r.jsonl"
|
| 449 |
+
dr = _make_doc_result("doc1")
|
| 450 |
+
dr.hypothesis = "Église — œ ç à é"
|
| 451 |
+
|
| 452 |
+
_save_partial_line(path, dr)
|
| 453 |
+
loaded = _load_partial(path)
|
| 454 |
+
|
| 455 |
+
assert loaded[0].hypothesis == "Église — œ ç à é"
|
|
@@ -40,8 +40,10 @@ FILE_BUDGETS: dict[str, int] = {
|
|
| 40 |
"picarones/adapters/legacy_pipelines/_executor_runner.py": 470, # actuel 410
|
| 41 |
# Sprint D.1 (plan v2.0) — adapter de compat run_benchmark legacy
|
| 42 |
# → BenchmarkService rewrite. Module transitoire qui sera
|
| 43 |
-
# supprimé en
|
| 44 |
-
|
|
|
|
|
|
|
| 45 |
# --- God-modules : budget actuel + 15 % de marge.
|
| 46 |
# Le rétrécissement sera l'objet d'un sprint de refactor dédié.
|
| 47 |
# statistics.py (1128 lignes) a été éclaté en sous-package
|
|
|
|
| 40 |
"picarones/adapters/legacy_pipelines/_executor_runner.py": 470, # actuel 410
|
| 41 |
# Sprint D.1 (plan v2.0) — adapter de compat run_benchmark legacy
|
| 42 |
# → BenchmarkService rewrite. Module transitoire qui sera
|
| 43 |
+
# supprimé en H.4 avec interfaces/{cli,web}/_legacy/.
|
| 44 |
+
# Sprint D.2.b a ajouté ~260 LOC pour la branche resumable
|
| 45 |
+
# (``_run_benchmark_with_partial``).
|
| 46 |
+
"picarones/app/services/_legacy_runner_adapter.py": 1450, # actuel 1269
|
| 47 |
# --- God-modules : budget actuel + 15 % de marge.
|
| 48 |
# Le rétrécissement sera l'objet d'un sprint de refactor dédié.
|
| 49 |
# statistics.py (1128 lignes) a été éclaté en sous-package
|