	<!-- translation: machine + human review pending -->
	<!-- canonical: docs/explanation/narrative-engine.md (FR) -->

	# Extending the narrative engine

	> 🇫🇷 [Version française](narrative-engine.md)

	The narrative engine produces the factual synthesis at the top
	of each report (Sprint 19). It detects salient facts via 20+
	deterministic detectors, arbitrates them (importance,
	anti-contradiction), and renders them through YAML templates with
	`str.format_map`. No text is generated freely, so every rendered
	value traces back to a Fact payload and nothing is hallucinated.
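To make the rendering step concrete, here is a minimal sketch of template filling with `str.format_map`. The `TEMPLATES` dict, the `render_fact` helper, and the sample sentence are assumptions for illustration; the real engine loads its templates from YAML and may wrap this call differently.

```python
# Hypothetical template table; the real templates live in fr.yaml / en.yaml.
TEMPLATES = {
    "your_new_fact": "{engine} reaches a CER of {value_pct} % on this corpus.",
}


def render_fact(fact_type: str, payload: dict) -> str:
    """Fill the template for fact_type using only the values in payload."""
    template = TEMPLATES[fact_type]
    # With a plain dict, format_map raises KeyError when the template names
    # a key the payload lacks, so a template cannot cite a missing value.
    return template.format_map(payload)


sentence = render_fact("your_new_fact", {"engine": "engine_a", "value_pct": 5.2})
print(sentence)
```

Since the payload is the only source of values, a sentence can only contain numbers that a detector put there.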

	## Add a new detector in 6 steps

	### 1. Add a `FactType` in `picarones/domain/facts.py`

	```python
	class FactType(str, Enum):
	    # ... existing ...
	    YOUR_NEW_FACT = "your_new_fact"
	    """Short docstring describing what triggers this fact."""
	```

	### 2. Add the FR + EN templates

	`picarones/reports/narrative/templates/fr.yaml`:

	```yaml
	your_new_fact: >-
	  Phrase factuelle citant {engine} et {value_pct} % : pas de chiffres
	  en dur, tous viennent du payload du Fact.
	```

	Same in `en.yaml` with the English version.
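Because both language files are rendered from the same payload, it can help to check that the FR and EN templates use the same placeholders. The sketch below does this with the standard library's `string.Formatter`; the two template strings and the `placeholders` helper are hypothetical, not part of Picarones.

```python
from string import Formatter


def placeholders(template: str) -> set[str]:
    """Return the field names used in a str.format-style template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text.
    return {name for _, name, _, _ in Formatter().parse(template) if name}


# Hypothetical template pair standing in for the fr.yaml / en.yaml entries.
fr_template = "Le moteur {engine} atteint {value_pct} %."
en_template = "Engine {engine} reaches {value_pct} %."

missing_in_en = placeholders(fr_template) - placeholders(en_template)
missing_in_fr = placeholders(en_template) - placeholders(fr_template)
print(missing_in_en, missing_in_fr)
```

Two empty sets mean the templates agree on which payload keys they need.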

	### 3. Implement the detector

	In an existing detector module (e.g.
	`picarones/reports/narrative/detectors/quality.py` for
	quality-related facts) or a new one if a new family is justified:

	```python
	@register_detector(
	    FactType.YOUR_NEW_FACT,
	    priority=85,  # ordering in the synthesis
	    importance=FactImportance.MEDIUM,
	)
	def detect_your_new_fact(benchmark_data: dict) -> list[Fact]:
	    """Decide whether to emit Facts based on benchmark_data.

	    Read the keys you need from benchmark_data; never invent values.
	    """
	    # ... your logic ...
	    return [Fact(
	        type=FactType.YOUR_NEW_FACT,
	        importance=FactImportance.MEDIUM,
	        payload={"engine": engine_name, "value_pct": round(value * 100, 2)},
	        engines_involved=(engine_name,),
	    )]
	```

	Rule: every value in `payload` MUST come from `benchmark_data`.
	Never compute a derived metric here that isn't already in the
	input: the anti-hallucination test would catch it.
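As an illustration of this rule, here is a self-contained sketch of a detector whose payload is built only from values read out of `benchmark_data`. Everything in it is an assumption made for the example: the simplified `Fact` dataclass, the `benchmark_data` shape (`{"engines": [{"name": ..., "cer": ...}]}`), and the `THRESHOLD` value. The real classes and input keys in Picarones may differ.

```python
from dataclasses import dataclass
from enum import Enum


class FactType(str, Enum):
    YOUR_NEW_FACT = "your_new_fact"


@dataclass
class Fact:
    """Simplified stand-in for the real Fact class (assumption)."""
    type: FactType
    payload: dict
    engines_involved: tuple


THRESHOLD = 0.05  # hypothetical; visible in source and the same for every engine


def detect_your_new_fact(benchmark_data: dict) -> list[Fact]:
    """Emit one Fact per engine whose CER exceeds THRESHOLD."""
    facts = []
    # Assumed input shape: {"engines": [{"name": str, "cer": float}, ...]}
    for engine in benchmark_data.get("engines", []):
        value = engine.get("cer")
        if value is None or value <= THRESHOLD:
            continue  # nothing to report for this engine
        facts.append(Fact(
            type=FactType.YOUR_NEW_FACT,
            # Both payload values are read from benchmark_data.
            payload={"engine": engine["name"], "value_pct": round(value * 100, 2)},
            engines_involved=(engine["name"],),
        ))
    return facts


data = {"engines": [{"name": "engine_a", "cer": 0.052},
                    {"name": "engine_b", "cer": 0.031}]}
facts = detect_your_new_fact(data)
print(facts)
```

Missing keys yield an empty list rather than a guessed value, which keeps the detector on the safe side of the rule.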

	### 4. Register the detector in the package `__init__`

	`picarones/reports/narrative/detectors/__init__.py`:

	```python
	from picarones.reports.narrative.detectors.quality import (
	    # ...
	    detect_your_new_fact,
	)
	```

	And add it to `__all__`.

	### 5. Update the arbiter ordering

	In `picarones/reports/narrative/arbiter.py`, insert your new
	type into `_FALLBACK_TYPE_ORDER` at the right position.
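One plausible reading of a fallback order list is that facts are sorted by their type's position in it. The sketch below illustrates that idea only; the list contents, the string type names, and the `order_key` helper are hypothetical, and the real arbiter may combine this order with priority and importance.

```python
# Hypothetical stand-in for the list in arbiter.py.
_FALLBACK_TYPE_ORDER = ["best_engine", "your_new_fact", "slowest_engine"]


def order_key(fact_type: str) -> int:
    """Position of fact_type in the fallback order; unknown types sort last."""
    try:
        return _FALLBACK_TYPE_ORDER.index(fact_type)
    except ValueError:
        return len(_FALLBACK_TYPE_ORDER)


emitted = ["slowest_engine", "unlisted_fact", "your_new_fact", "best_engine"]
ordered = sorted(emitted, key=order_key)
print(ordered)
```

Under this assumption, a type missing from the list would drift to the end of the synthesis, so the position you choose decides where the new sentence appears.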

	### 6. Write tests

	In `tests/measurements/`:

	- A unit test of your detector (3+ canonical cases: triggers,
	doesn't trigger, edge case).
	- A traceability test (FR + EN): `build_synthesis(...)` produces
	output where every number is in the payload.

	Update `tests/integration/test_chantier5.py` and
	`tests/measurements/test_sprint29_detector_registry.py` to bump
	the detector count.

	## Editorial rules

	- Factual only: no recommendation, no value judgment. "Engine
	X has a CER of 5.2%" — yes. "Engine X is the best for archives" —
	no.
	- Symmetric thresholds: thresholds are public in the detector
	source code, not hidden. They apply equally to all engines.
	- Anti-contradiction: if your detector contradicts another
	(e.g., Wilcoxon-uncorrected gap vs Nemenyi-corrected tie), the
	arbiter handles it via the `_COMPLEMENTARY_PAIRS` mechanism — add
	your pair if needed.

	## Testing the synthesis

	```bash
	pytest tests/measurements/test_sprint19_narrative_engine.py
	pytest tests/measurements/test_sprint23_anti_hallucination.py
	```

	The anti-hallucination test parses the rendered synthesis and
	verifies that every number is traceable to a Fact payload. If it
	fails after your changes, you've likely cited a value not present
	in `benchmark_data`.
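The check described above can be approximated in a few lines. This is a simplified sketch, not the project's test: the `untraceable_numbers` helper, the regular expression, and the sample sentences are assumptions, and it would wrongly flag digits that belong to an engine name.

```python
import re

# Integers and decimals; accepts a decimal comma for French output.
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def untraceable_numbers(synthesis: str, payloads: list[dict]) -> list[str]:
    """Return the numbers in synthesis that match no numeric payload value."""
    allowed = set()
    for payload in payloads:
        for value in payload.values():
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                allowed.add(float(value))
    found = NUMBER.findall(synthesis)
    # Normalise the decimal comma before comparing.
    return [n for n in found if float(n.replace(",", ".")) not in allowed]


payloads = [{"engine": "engine_a", "value_pct": 5.2}]
ok = untraceable_numbers("engine_a has a CER of 5.2 %.", payloads)
bad = untraceable_numbers(
    "engine_a has a CER of 5.2 %, 3 points above average.", payloads
)
print(ok, bad)
```

The second sentence cites a number that no payload contains, which is the kind of failure the anti-hallucination test is meant to surface.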