Ma-Ri-Ba-Ku/Picarones

Claude committed 28 days ago

Commit bb9f9b6 (unverified) · 1 parent: d83b13a

test(rename): de-sprint tests/evaluation (53 files, git mv)

+ generic sweep of inter-test imports in the script (fixes the
order dependency that broke the sprint23→sprint19 import, caught
by the test suite). docs/ references patched in lockstep.

https://claude.ai/code/session_01EmLiMPJJuB44QHEFzDWUvF

This view is limited to 50 files because it contains too many changes.

Files changed (50)

docs/explanation/narrative-engine.en.md +3 -3
docs/explanation/narrative-engine.md +3 -3
docs/migration/option_b_test_inventory.md +6 -6
docs/reference/comparing-views.md +1 -1
scripts/rename_sprint_tests.py +21 -14
tests/evaluation/metrics/{test_sprint56_abbreviations.py → test_abbreviations.py} +0 -0
tests/evaluation/metrics/{test_sprint23_anti_hallucination.py → test_anti_hallucination.py} +1 -1
tests/evaluation/metrics/{test_sprint73_baseline_comparison.py → test_baseline_comparison.py} +0 -0
tests/evaluation/metrics/{test_sprint39_calibration.py → test_calibration.py} +0 -0
tests/evaluation/metrics/{test_sprint79_cost_projection.py → test_cost_projection.py} +0 -0
tests/evaluation/metrics/{test_sprint29_detector_registry.py → test_detector_registry.py} +0 -0
tests/evaluation/metrics/{test_sprint58_early_modern.py → test_early_modern.py} +0 -0
tests/evaluation/metrics/{test_sprint36_ensemble_narrative.py → test_ensemble_narrative.py} +0 -0
tests/evaluation/metrics/{test_sprint78_equivalence_profile.py → test_equivalence_profile.py} +0 -0
tests/evaluation/metrics/{test_sprint10_error_distribution.py → test_error_distribution.py} +0 -0
tests/evaluation/metrics/{test_s5_extreme_inputs.py → test_extreme_inputs.py} +0 -0
tests/evaluation/metrics/{test_sprint18_friedman_nemenyi_cdd.py → test_friedman_nemenyi_cdd.py} +0 -0
tests/evaluation/metrics/{test_sprint93_image_predictive.py → test_image_predictive.py} +0 -0
tests/evaluation/metrics/{test_sprint96_incremental_comparison.py → test_incremental_comparison.py} +0 -0
tests/evaluation/metrics/{test_sprint35_inter_engine.py → test_inter_engine.py} +0 -0
tests/evaluation/metrics/{test_sprint54_layout.py → test_layout.py} +0 -0
tests/evaluation/metrics/{test_sprint15_llm_pipeline_bugs.py → test_llm_pipeline_bugs.py} +0 -0
tests/evaluation/metrics/{test_sprint8_longitudinal_robustness.py → test_longitudinal_robustness.py} +0 -0
tests/evaluation/metrics/{test_sprint44_median_default.py → test_median_default.py} +0 -0
tests/evaluation/metrics/{test_sprint59_modern_archives.py → test_modern_archives.py} +0 -0
tests/evaluation/metrics/{test_sprint97_module_policy.py → test_module_policy.py} +0 -0
tests/evaluation/metrics/{test_sprint57_mufi.py → test_mufi.py} +0 -0
tests/evaluation/metrics/{test_sprint19_narrative_engine.py → test_narrative_engine.py} +0 -0
tests/evaluation/metrics/{test_sprint16_narrative_foundations.py → test_narrative_foundations.py} +0 -0
tests/evaluation/metrics/{test_sprint38_ner_metrics.py → test_ner_metrics.py} +0 -0
tests/evaluation/metrics/{test_sprint_a14_s1_normalization_propagation.py → test_normalization_propagation.py} +0 -0
tests/evaluation/metrics/{test_sprint12_nouvelles_fonctionnalites.py → test_nouvelles_fonctionnalites.py} +0 -0
tests/evaluation/metrics/{test_sprint85_numerical_sequences.py → test_numerical_sequences.py} +0 -0
tests/evaluation/metrics/{test_sprint20_pareto_pricing.py → test_pareto_pricing.py} +0 -0
tests/evaluation/metrics/{test_sprint71_rare_tokens.py → test_rare_tokens.py} +0 -0
tests/evaluation/metrics/{test_sprint52_readability.py → test_readability.py} +0 -0
tests/evaluation/metrics/{test_sprint53_reading_order.py → test_reading_order.py} +0 -0
tests/evaluation/metrics/{test_sprint83_reliability.py → test_reliability.py} +0 -0
tests/evaluation/metrics/{test_sprint81_robustness_projection.py → test_robustness_projection.py} +0 -0
tests/evaluation/metrics/{test_sprint60_roman_numerals.py → test_roman_numerals.py} +0 -0
tests/evaluation/metrics/{test_sprint84_searchability.py → test_searchability.py} +0 -0
tests/evaluation/metrics/{test_sprint45_stratification.py → test_stratification.py} +0 -0
tests/evaluation/metrics/{test_sprint55_unicode_blocks.py → test_unicode_blocks.py} +0 -0
tests/evaluation/{test_sprint_a14_s1_compact_optin.py → test_compact_optin.py} +0 -0
tests/evaluation/{test_s8_corpus_gt_levels.py → test_corpus_gt_levels.py} +0 -0
tests/evaluation/{test_sprint_a14_s27_engines.py → test_engines.py} +0 -0
tests/evaluation/{test_sprint34_metric_registry.py → test_metric_registry.py} +0 -0
tests/evaluation/{test_sprint_a14_s1_metrics_error_returns_none.py → test_metrics_error_returns_none.py} +0 -0
tests/evaluation/{test_sprint32_multi_level_gt.py → test_multi_level_gt.py} +0 -0
tests/evaluation/{test_sprint_a14_s25_projector_payload.py → test_projector_payload.py} +0 -0
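
Every rename listed above drops a sprint marker (`sprintNN_`, `sN_`, or `sprint_aNN_sNN_`) that follows `test_`. As a rough illustration only, that pattern can be expressed as a single regular expression. The names `SPRINT_PREFIX` and `strip_sprint_prefix` are hypothetical, and the pattern is inferred from the file list, not taken from `scripts/rename_sprint_tests.py`, whose `build_map()` body is not shown in this diff.

```python
import re

# Hypothetical pattern inferred from the renames listed in this commit;
# it is NOT the mapping used by scripts/rename_sprint_tests.py.
SPRINT_PREFIX = re.compile(r"^test_(?:sprint_a\d+_s\d+|sprint\d+|s\d+)_")


def strip_sprint_prefix(filename: str) -> str:
    """Return the filename without its sprint marker, if it has one."""
    return SPRINT_PREFIX.sub("test_", filename)


examples = [
    "test_sprint56_abbreviations.py",
    "test_s5_extreme_inputs.py",
    "test_sprint_a14_s1_compact_optin.py",
    "test_results.py",  # no sprint marker: left unchanged
]
for name in examples:
    print(f"{name} -> {strip_sprint_prefix(name)}")
```

A pattern like this can map two different files to the same name, so a real rename script would also need a collision check before calling `git mv`.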

docs/explanation/narrative-engine.en.md CHANGED

@@ -92,7 +92,7 @@ In `tests/measurements/`:
   output where every number is in the payload.
 Update `tests/integration/test_chantier5.py` and
-`tests/measurements/test_sprint29_detector_registry.py` to bump
+`tests/measurements/test_detector_registry.py` to bump
 the detector count.
 ## Editorial rules
@@ -110,8 +110,8 @@ the detector count.
 ## Testing the synthesis
 ```bash
-pytest tests/measurements/test_sprint19_narrative_engine.py
-pytest tests/measurements/test_sprint23_anti_hallucination.py
+pytest tests/measurements/test_narrative_engine.py
+pytest tests/measurements/test_anti_hallucination.py
 ```
 The anti-hallucination test parses the rendered synthesis and

docs/explanation/narrative-engine.md CHANGED

@@ -163,7 +163,7 @@ In `arbiter.py`, two things to consider:
 Add at least:
-- A unit test in `tests/test_sprint19_narrative_engine.py` (or
   a new file):
 ```python
@@ -297,7 +297,7 @@ as a numerical tolerance). This whitelist is now empty:
 **If you add a detector whose template references a constant
 number** (e.g. *"threshold α = 0.05"*), you must **always**
 put it in the `payload`. The test
-`test_sprint19_narrative_engine.py::test_every_number_in_synthesis_is_traceable`
 and the test
-`test_sprint23_anti_hallucination.py::TestTemplatesNoHardcodedLiterals`
 will fail otherwise.

 Add at least:
+- A unit test in `tests/test_narrative_engine.py` (or
   a new file):
 ```python
 **If you add a detector whose template references a constant
 number** (e.g. *"threshold α = 0.05"*), you must **always**
 put it in the `payload`. The test
+`test_narrative_engine.py::test_every_number_in_synthesis_is_traceable`
 and the test
+`test_anti_hallucination.py::TestTemplatesNoHardcodedLiterals`
 will fail otherwise.

docs/migration/option_b_test_inventory.md CHANGED

@@ -28,8 +28,8 @@ adapter instances in memory.
 | A4 | `tests/app/test_character_analysis_in_runner.py` | 246 LOC | 12 | Medium | Tests character analysis per engine. Mechanical conversion. |
 | A5 | `tests/app/test_sprint_h2b_canonical_in_runner.py` | 191 LOC | 9 | Medium | Tests extraction of the `CANONICAL_DOCUMENT`. To be adapted to the new ViewExecutor. |
 | A6 | `tests/evaluation/test_public_api.py` | — | 7 | Medium | Public API. Will include a presence test for `RunOrchestrator`. |
-| A7 | `tests/evaluation/metrics/test_sprint12_nouvelles_fonctionnalites.py` | 288 LOC | 4 | Low | Mechanical conversion. |
-| A8 | `tests/evaluation/metrics/test_sprint_a14_s1_normalization_propagation.py` | — | 2 | Low | Checks `normalization_profile`; validates Phase B2.5 (propagation via `EvaluationView`). |
+| A7 | `tests/evaluation/metrics/test_nouvelles_fonctionnalites.py` | 288 LOC | 4 | Low | Mechanical conversion. |
+| A8 | `tests/evaluation/metrics/test_normalization_propagation.py` | — | 2 | Low | Checks `normalization_profile`; validates Phase B2.5 (propagation via `EvaluationView`). |
 | A9 | `tests/evaluation/test_metric_hooks.py` | — | 1 | Low | Trivial. One-line conversion. |
 | A10 | `tests/architecture/test_file_budgets.py` | — | (reference only) | Low | Budgets of the `_benchmark_*.py` modules, to be updated after Phase B2/B7. |
@@ -52,10 +52,10 @@ done in a shared fixture.
 | B3 | `tests/reports/test_extra_metrics.py` | Additional metrics attached to the report. |
 | B4 | `tests/reports/test_sprint72_worst_lines.py` | Worst-N lines (consumes a non-compacted `BenchmarkResult`). |
 | B5 | `tests/evaluation/metrics/test_results.py` | `MetricsResult` / `aggregate_metrics` API. |
-| B6 | `tests/evaluation/metrics/test_sprint36_ensemble_narrative.py` | Narrative engine. Reads the `benchmark_data` dict. |
-| B7 | `tests/evaluation/metrics/test_sprint44_median_default.py` | Median/Pareto. |
-| B8 | `tests/evaluation/metrics/test_sprint45_stratification.py` | Corpus stratification. |
-| B9 | `tests/evaluation/test_sprint14_robust_filtering.py` | Robustness filter. |
+| B6 | `tests/evaluation/metrics/test_ensemble_narrative.py` | Narrative engine. Reads the `benchmark_data` dict. |
+| B7 | `tests/evaluation/metrics/test_median_default.py` | Median/Pareto. |
+| B8 | `tests/evaluation/metrics/test_stratification.py` | Corpus stratification. |
+| B9 | `tests/evaluation/test_robust_filtering.py` | Robustness filter. |
 | B10 | `tests/adapters/corpus/test_sprint8_escriptorium_gallica.py` | eScriptorium / Gallica importer. |
 | B11 | `tests/integration/test_importer_fallback_wiring.py` | Importer fallback. Integration test. |
 | B12 | `tests/integration/test_s5_disk_full_simulation.py` | Disk full. |

docs/reference/comparing-views.md CHANGED

@@ -26,7 +26,7 @@ would mask critical information.
 ### Pattern 1: excellent CER, catastrophic numerical searchability
 Demonstrated in the test
-`tests/evaluation/test_sprint_a14_s16_views_consistency.py::TestDivergencePattern::test_year_corruption_invisible_to_cer_visible_to_search`:
+`tests/evaluation/test_views_consistency.py::TestDivergencePattern::test_year_corruption_invisible_to_cer_visible_to_search`:
 - **GT**: *"Charte signée à Paris le 14 juillet 1789 en présence du roi"*
 - **Hypothesis**: *"Charte signée à Paris le 14 juillet 1798 en présence du roi"*
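
The divergence in Pattern 1 can be reproduced with a few lines of arithmetic. The sketch below assumes the usual definition of CER (Levenshtein distance divided by the reference length) and uses a plain substring check as a stand-in for searchability. It is an illustration written for this example, not Picarones' metric code, and the `levenshtein` helper is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


gt = "Charte signée à Paris le 14 juillet 1789 en présence du roi"
hyp = "Charte signée à Paris le 14 juillet 1798 en présence du roi"

distance = levenshtein(gt, hyp)   # only the two swapped digits differ
cer = distance / len(gt)          # assumed definition: edits / reference length
year_found = "1789" in hyp        # naive stand-in for a search on the year

print(f"edit distance={distance}, CER={cer:.3f}, year found={year_found}")
```

Two substituted digits barely move the CER, yet a search for the year 1789 no longer finds the document.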

scripts/rename_sprint_tests.py CHANGED

@@ -70,12 +70,6 @@ EXTERNAL_REF_FILES = [
 ]
 # Docs: many ``test_s*`` refs, patched in batches via a targeted
 # grep when the corresponding directory is renamed (cf. --apply).
-# Known inter-test import (same batch, tests/evaluation/metrics):
-INTRA_TEST_IMPORT = (
-    "tests/evaluation/metrics/test_sprint23_anti_hallucination.py",
-    "tests.evaluation.metrics.test_sprint19_narrative_engine",
-    "tests.evaluation.metrics.test_narrative_engine",
-)
 def build_map() -> dict[str, str]:
@@ -158,14 +152,27 @@ def apply_dir(target_dir: str) -> int:
         if txt != orig:
             sp.write_text(txt, encoding="utf-8")
             print(f"patché refs : {src}")
-    # Known inter-test import.
-    if target_dir.rstrip("/") == "tests/evaluation/metrics":
-        ip = REPO / INTRA_TEST_IMPORT[0]
-        if ip.exists():
-            t = ip.read_text(encoding="utf-8").replace(
-                INTRA_TEST_IMPORT[1], INTRA_TEST_IMPORT[2])
-            ip.write_text(t, encoding="utf-8")
-            print(f"patché import inter-tests : {INTRA_TEST_IMPORT[0]}")
+    # GENERIC sweep of inter-test imports: any module renamed in
+    # this batch and referenced by dotted path from any file under
+    # ``tests/`` (``from tests.x.y.<old_stem> import`` or
+    # ``import tests.x.y.<old_stem>``) is repointed to the new
+    # stem.  Replaces the old, fragile hardcoded case (order-dependent).
+    stem_map = {
+        Path(old_name).stem: Path(new_name).stem
+        for old_name, new_name in renamed
+    }
+    for tp in TESTS.rglob("*.py"):
+        t = tp.read_text(encoding="utf-8")
+        orig = t
+        for old_stem, new_stem in stem_map.items():
+            # Bounded by ``.`` (dotted import): no partial match
+            # on a prefix of a longer name.
+            t = re.sub(rf"(?<=\.){re.escape(old_stem)}(?=\s|$|\.| import)",
+                       new_stem, t)
+        if t != orig:
+            tp.write_text(t, encoding="utf-8")
+            print(f"patché import inter-tests : "
+                  f"{tp.relative_to(REPO).as_posix()}")
     return 0
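
The effect of the generic sweep can be checked in isolation on a single import line. The sketch below reuses the regular expression from the diff on an in-memory string. The helper name `repoint_imports` is hypothetical, and the shape of `renamed` (a list of `(old_name, new_name)` pairs) is an assumption, since its definition is outside the hunks shown here.

```python
import re
from pathlib import Path


def repoint_imports(source: str, stem_map: dict[str, str]) -> str:
    """Rewrite dotted-path references to renamed test modules."""
    for old_stem, new_stem in stem_map.items():
        # Same pattern as in the diff: the stem must follow a "." and be
        # followed by whitespace, end of text, or another ".".
        pattern = rf"(?<=\.){re.escape(old_stem)}(?=\s|$|\.| import)"
        source = re.sub(pattern, new_stem, source)
    return source


# Assumed shape of `renamed`: (old file name, new file name) pairs.
renamed = [("test_sprint19_narrative_engine.py", "test_narrative_engine.py")]
stem_map = {Path(old).stem: Path(new).stem for old, new in renamed}

before = ("from tests.evaluation.metrics.test_sprint19_narrative_engine "
          "import _numbers_in_payload\n")
after = repoint_imports(before, stem_map)
print(after, end="")

# A longer module name that merely starts with the old stem is untouched.
longer = "import tests.evaluation.metrics.test_sprint19_narrative_engine_extra\n"
untouched = repoint_imports(longer, stem_map)
```

Because the stem map is built from whatever was renamed in the batch, the rewrite does not depend on which of the importing or imported files was renamed first, which is the order dependency the commit message describes.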

tests/evaluation/metrics/{test_sprint56_abbreviations.py → test_abbreviations.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint23_anti_hallucination.py → test_anti_hallucination.py} RENAMED

@@ -163,7 +163,7 @@ class TestEndToEndWithEmptyWhitelist:
     def test_every_number_traceable_with_empty_whitelist(self, lang):
         from picarones.reports.narrative import extract_numbers
-        from tests.evaluation.metrics.test_sprint19_narrative_engine import _numbers_in_payload
+        from tests.evaluation.metrics.test_narrative_engine import _numbers_in_payload
         result = build_synthesis(_full_data(), lang)
         allowed: set[str] = set()

tests/evaluation/metrics/{test_sprint73_baseline_comparison.py → test_baseline_comparison.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint39_calibration.py → test_calibration.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint79_cost_projection.py → test_cost_projection.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint29_detector_registry.py → test_detector_registry.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint58_early_modern.py → test_early_modern.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint36_ensemble_narrative.py → test_ensemble_narrative.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint78_equivalence_profile.py → test_equivalence_profile.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint10_error_distribution.py → test_error_distribution.py} RENAMED

File without changes

tests/evaluation/metrics/{test_s5_extreme_inputs.py → test_extreme_inputs.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint18_friedman_nemenyi_cdd.py → test_friedman_nemenyi_cdd.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint93_image_predictive.py → test_image_predictive.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint96_incremental_comparison.py → test_incremental_comparison.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint35_inter_engine.py → test_inter_engine.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint54_layout.py → test_layout.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint15_llm_pipeline_bugs.py → test_llm_pipeline_bugs.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint8_longitudinal_robustness.py → test_longitudinal_robustness.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint44_median_default.py → test_median_default.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint59_modern_archives.py → test_modern_archives.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint97_module_policy.py → test_module_policy.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint57_mufi.py → test_mufi.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint19_narrative_engine.py → test_narrative_engine.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint16_narrative_foundations.py → test_narrative_foundations.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint38_ner_metrics.py → test_ner_metrics.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint_a14_s1_normalization_propagation.py → test_normalization_propagation.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint12_nouvelles_fonctionnalites.py → test_nouvelles_fonctionnalites.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint85_numerical_sequences.py → test_numerical_sequences.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint20_pareto_pricing.py → test_pareto_pricing.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint71_rare_tokens.py → test_rare_tokens.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint52_readability.py → test_readability.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint53_reading_order.py → test_reading_order.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint83_reliability.py → test_reliability.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint81_robustness_projection.py → test_robustness_projection.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint60_roman_numerals.py → test_roman_numerals.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint84_searchability.py → test_searchability.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint45_stratification.py → test_stratification.py} RENAMED

File without changes

tests/evaluation/metrics/{test_sprint55_unicode_blocks.py → test_unicode_blocks.py} RENAMED

File without changes

tests/evaluation/{test_sprint_a14_s1_compact_optin.py → test_compact_optin.py} RENAMED

File without changes

tests/evaluation/{test_s8_corpus_gt_levels.py → test_corpus_gt_levels.py} RENAMED

File without changes

tests/evaluation/{test_sprint_a14_s27_engines.py → test_engines.py} RENAMED

File without changes

tests/evaluation/{test_sprint34_metric_registry.py → test_metric_registry.py} RENAMED

File without changes

tests/evaluation/{test_sprint_a14_s1_metrics_error_returns_none.py → test_metrics_error_returns_none.py} RENAMED

File without changes

tests/evaluation/{test_sprint32_multi_level_gt.py → test_multi_level_gt.py} RENAMED

File without changes

tests/evaluation/{test_sprint_a14_s25_projector_payload.py → test_projector_payload.py} RENAMED

File without changes