# Narrative rendering templates: English.
# Anti-hallucination rule: never introduce a number or entity name that is not
# already in the Fact ``payload``. Tests verify traceability of every number
# appearing in the rendered synthesis.
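The traceability rule stated above can be illustrated with a minimal sketch. The project's actual tests are not shown here, so the helper name `untraceable_numbers`, the number regex, and the sample payload (including the engine name) are all assumptions made for illustration only.

```python
import re

# Hypothetical pattern: integers and decimals written with "." or ",".
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)?")


def untraceable_numbers(rendered: str, payload: dict) -> list[str]:
    """Return numbers found in `rendered` that appear in no payload value."""
    allowed = set()
    for value in payload.values():
        # Compare on the string form of each value, as it would be rendered.
        allowed.update(NUMBER_RE.findall(str(value)))
    return [n for n in NUMBER_RE.findall(rendered) if n not in allowed]


# Illustrative data only; these values are not taken from any real run.
template = (
    "On this corpus of {n_docs} documents, {engine} achieves the lowest "
    "mean CER ({cer_pct} %)."
)
payload = {"n_docs": 120, "engine": "tesseract", "cer_pct": "3.2"}
rendered = template.format(**payload)

print(untraceable_numbers(rendered, payload))
print(untraceable_numbers(rendered + " That is 15 % better.", payload))
```

One limitation of this sketch: a field rendered through a format spec (for example `{p_value:.4f}`) may print a rounded value that differs from the raw payload string, so a real check would probably need to compare against the formatted value instead.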
global_leader_cer: >-
On this corpus of {n_docs} documents, {engine} achieves the lowest mean CER
({cer_pct} %).
statistical_tie: >-
Engines {engines_list} are not statistically distinguishable
(Friedman-Nemenyi, α = {alpha}, n = {n_blocks} documents, CD = {critical_distance}).
significant_gap: >-
The gap between {leader} and {runner_up} is statistically significant
(Wilcoxon, p = {p_value:.4f}, Δ CER = {delta_cer_pct} points over {n_pairs} pairs).
stratum_winner: >-
On stratum "{stratum}" ({n_docs_stratum} documents), {engine} achieves
the lowest CER ({cer_pct} % vs. {second_cer_pct} % for {second_engine}).
stratum_collapse: >-
{engine} is globally competitive ({global_cer_pct} %) but collapses on
stratum "{stratum}" ({local_cer_pct} % over {n_docs_stratum} documents,
i.e. {delta_cer_pct} points above its own average).
error_profile_outlier: >-
{engine} has an atypical error profile: {proportion_pct} % of errors fall
into class "{error_class}", vs. a median of {median_proportion_pct} % across
other engines (×{ratio_to_median} the median).
llm_hallucination_flag: >-
Hallucination signal on {engine} ({reasons_list}):
{hallucinating_rate_pct} % of documents above alert thresholds.
robustness_fragile: >-
{engine} is fragile under "{degradation}" degradation: its CER rises from
{cer_baseline_pct} % to {cer_degraded_pct} % at maximum level (×{ratio}).
speed_winner: >-
{engine} is the fastest ({mean_duration} s/doc, ×{speedup} faster than the
median) for comparable quality (CER {cer_pct} %).
confidence_warning: >-
High statistical uncertainty: the {confidence_level} % confidence interval of
{engine} spans {ci_width_pct} CER points, compared with a gap of
{gap_to_runner_up_pct} points to the runner-up.
pareto_alternative: >-
At much lower cost, {engine} offers an interesting trade-off ({cer_pct} %
CER for {cost} €/{cost_unit_pages} pages, vs {leader_cer_pct} % / {leader_cost} € for
{leader}, i.e. ×{cost_saving_ratio} cheaper).
cost_outlier: >-
Disproportionate cost for {engine} ({cost} €/{cost_unit_pages} pages, ×{ratio_to_median}
the median) without a compensating quality advantage (CER {cer_pct} %).
ensemble_opportunity: >-
Engines {pair_a} and {pair_b} have divergent error profiles
({divergence_metric}={divergence}). On this corpus of {doc_count} documents,
{best_engine} preserves {best_recall_pct} % of tokens; a majority vote
among the engines would preserve {oracle_recall_pct} %, i.e.
{absolute_gap_pct} points recoverable ({relative_gap_pct} % of the best
engine's errors).
median_mean_gap_warning: >-
Asymmetric distribution for {engine}: median CER {median_cer_pct} %
vs mean {mean_cer_pct} % across {n_docs} documents (relative gap
{relative_gap_pct} %). The mean is pulled by a few catastrophic
documents; the median (now used for default ranking) is more
representative.
stratification_recommended: >-
Heterogeneous corpus ({n_strata} strata): {leader} performs very
differently depending on document type: median CER
{min_stratum_cer_pct} % on "{min_stratum}" vs
{max_stratum_cer_pct} % on "{max_stratum}", a gap of {gap_pct}
points. The global ranking hides this disparity; consult the
stratified view.
engine_off_baseline: >-
{engine} achieved {cer_current_pct} % CER here, vs {cer_historical_mean_pct} %
on average over the last {n_runs} runs of your institution on this
same corpus (relative delta {relative_delta_pct} %). This corpus is
harder for it than usual.
engine_unstable: >-
Over {n_runs} successive runs, {engine} produces variable outputs
(CER CV {cer_cv_pct} %, identical-run pair rate {identical_run_rate_pct} %).
Reproducibility is limited; interpret the average CER with caution.
regression_in_history: >-
Over the {n_runs} historical runs for {engine}, the average CER
moved from {first_cer_pct} % to {last_cer_pct} %
(cumulative change {absolute_delta_pct} points). Investigate what
changed in the pipeline or the models.
# Sprint A3 (item B-3): importer fallback incidents.
# The payload contains `importer`, `operation` and `incidents_for_importer`.
importer_fallback_triggered: >-
The "{importer}" importer fell back to degraded mode during the
"{operation}" operation ({incidents_for_importer} incident(s) this
run). Imported data may be incomplete or may come from a fallback; check
the logs for details.
# Sprint A8 (m-14): pricing table expired.
# Payload: ``valid_until``, ``days_overdue``, ``today``.
pricing_staleness_warning: >-
The pricing table used expired on {valid_until} ({days_overdue}
days ago). Cost/€ and CO₂ analyses reflect outdated rates; pass
``pricing=`` to ``ReportGenerator`` for fresh figures.
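The placeholders in these templates look like Python `str.format` fields (note the `{p_value:.4f}` format spec). As a hedged sketch of how a template might be rendered against a Fact payload, the following uses only the standard library; `required_fields` and `render` are hypothetical helpers, not part of the project's API, and the payload values are invented for illustration.

```python
from string import Formatter


def required_fields(template: str) -> set[str]:
    """Collect the field names a format-style template expects."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template: str, payload: dict) -> str:
    """Render a template, failing loudly if the payload lacks a field."""
    missing = required_fields(template) - set(payload)
    if missing:
        raise KeyError(f"payload is missing fields: {sorted(missing)}")
    return template.format(**payload)


# Template text copied from the `significant_gap` entry, joined on one line.
significant_gap = (
    "The gap between {leader} and {runner_up} is statistically significant "
    "(Wilcoxon, p = {p_value:.4f}, Δ CER = {delta_cer_pct} points over "
    "{n_pairs} pairs)."
)

# Illustrative payload; engine names and figures are made up.
payload = {
    "leader": "engine_a",
    "runner_up": "engine_b",
    "p_value": 0.01234,
    "delta_cer_pct": "1.8",
    "n_pairs": 120,
}

print(render(significant_gap, payload))
```

Checking the field set before formatting gives a clearer error than the bare `KeyError` raised by `str.format`, which names only the first missing field.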