"""Enregistrement des hooks de mΓ©triques natifs de Picarones.

Workstream 2 (chantier 2) of the post-Sprint 97 evolution plan.

This module **migrates** the 12 document-level hooks and 12
corpus-level aggregators that were previously hard-coded in
``picarones.app.services.benchmark_runner._compute_document_result`` and around the
aggregation loop (lines 794-827 of the pre-chantier-2 runner).

Additive approach, strict backward compatibility
------------------------------------------------
All hooks are registered on the ``standard``, ``philological``,
``diagnostics`` and ``full`` profiles (i.e. enabled by default when
the runner is called without a ``profile`` argument). The ``minimal``
profile enables no hooks (for large-scale benchmarks where only
CER/WER matter). The ``economics`` and ``pipeline`` profiles are
reserved for future hooks.

Importing this module is **enough** to populate the registries:
:mod:`picarones.evaluation.metric_hooks` merely exposes the
decorators, and the runner depends on a single function,
``select_document_hooks(profile)``, to discover the active hooks.
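
The import-time registration described above can be pictured with the
following minimal sketch. It is not the actual implementation of
:mod:`picarones.evaluation.metric_hooks`: the names ``_HOOKS``,
``register_hook``, ``select_hooks`` and the ``length_ratio`` hook are
hypothetical and only illustrate the decorator-plus-profile pattern.

```python
# Hypothetical, simplified registry: NOT the real metric_hooks module.
_HOOKS = []


def register_hook(*, name, attribute, profiles):
    # Return a decorator that records the hook as soon as it is applied,
    # i.e. when the defining module is imported.
    def decorator(func):
        _HOOKS.append({
            "name": name,
            "attribute": attribute,
            "profiles": frozenset(profiles),
            "func": func,
        })
        return func
    return decorator


def select_hooks(profile):
    # Keep only the hooks registered for the requested profile.
    return [hook for hook in _HOOKS if profile in hook["profiles"]]


@register_hook(name="length_ratio", attribute="length_ratio",
               profiles=("standard", "full"))
def _length_ratio_hook(*, ground_truth, hypothesis, **_):
    # Extra keyword arguments (image path, language, ...) are ignored.
    return len(hypothesis) / max(len(ground_truth), 1)


# Run every hook that is active for the "standard" profile on one document.
results = {
    hook["attribute"]: hook["func"](
        ground_truth="abcd", hypothesis="ab", image_path=None,
    )
    for hook in select_hooks("standard")
}
print(results)
print(select_hooks("minimal"))
```

In this sketch ``results`` holds a single ``length_ratio`` entry equal
to 0.5, and a profile on which nothing was registered (``minimal``
here) selects an empty list, so no hook runs.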

Full list of hooks (with originating sprint)
--------------------------------------------
**Document-level** (12), each entry giving the hook name, its
originating sprint and the result attribute it fills:

- ``confusion``           (Sprint 5)  -> ``confusion_matrix``
- ``char_scores``         (Sprint 5)  -> ``char_scores``
- ``taxonomy``            (Sprint 5)  -> ``taxonomy``
- ``structure``           (Sprint 5)  -> ``structure``
- ``image_quality``       (Sprint 5)  -> ``image_quality``
- ``line_metrics``        (Sprint 10) -> ``line_metrics``
- ``hallucination``       (Sprint 10) -> ``hallucination_metrics``
- ``calibration``         (Sprint 42) -> ``calibration_metrics``
- ``philological``        (Sprint 61) -> ``philological_metrics``
- ``searchability``       (Sprint 86) -> ``searchability_metrics``
- ``numerical_sequences`` (Sprint 86) -> ``numerical_sequence_metrics``
- ``readability``         (Sprint 87) -> ``readability_metrics``

**Corpus-level** (12): one aggregator per document-level hook,
filling the corresponding ``aggregated_*`` field of the
``EngineReport``.

The ``ner`` hook (Sprint 40) stays outside this mechanism: it depends
on an ``EntityExtractor`` injected by hand by the user, which does
not fit the semantics of profiles.
"""

from __future__ import annotations

import logging
from collections import Counter
from typing import Optional

from picarones.evaluation.metric_hooks import (
    PROFILE_DIAGNOSTICS,
    PROFILE_FULL,
    PROFILE_PHILOLOGICAL,
    PROFILE_STANDARD,
    register_corpus_aggregator,
    register_document_metric,
)

logger = logging.getLogger(__name__)


# Profiles in which the 12 "standard" hooks are active. By
# construction this matches the pre-chantier-2 runner behavior; the
# ``minimal`` profile is deliberately absent.
_STANDARD_PROFILES = (
    PROFILE_STANDARD,
    PROFILE_PHILOLOGICAL,
    PROFILE_DIAGNOSTICS,
    PROFILE_FULL,
)


# ──────────────────────────────────────────────────────────────────────────
# Helper de calibration (dΓ©placΓ© depuis runner.py β€” chantier 2)
# ──────────────────────────────────────────────────────────────────────────


def calibration_from_engine_result(
    ground_truth: str,
    token_confidences: list,
) -> Optional[dict]:
    """Align the engine's ``token_confidences`` with the GT (bag-of-words)
    to build the parallel lists ``confidences`` / ``is_correct``,
    then call ``compute_calibration_metrics`` (Sprint 39).

    Alignment convention (bag-of-words proxy with multiplicity, like
    ``oracle_token_recall`` from Sprint 35): a hypothesis token is
    "correct" if the GT still contains an unused occurrence of that token.

    Confidences ``> 1.0`` are assumed to be percentages and are
    normalized to ``[0, 1]``; values still out of range afterwards are
    dropped. Negative confidences (Tesseract uses -1 for non-words) are
    ignored. Returns ``None`` when no usable confidence remains.
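
    The alignment convention can be illustrated with a small
    self-contained sketch. The helper name ``align_bag_of_words`` and
    the sample tokens are hypothetical; the loop mirrors the one in this
    function but stops before the call to
    ``compute_calibration_metrics`` and assumes well-formed entries.

    ```python
    from collections import Counter


    def align_bag_of_words(ground_truth, token_confidences):
        # Each GT token can validate at most one hypothesis token.
        remaining = Counter(ground_truth.split())
        confidences, is_correct = [], []
        for entry in token_confidences:
            conf = float(entry["confidence"])
            if conf < 0:
                continue  # e.g. Tesseract's -1 for non-words
            if conf > 1.0:
                conf = conf / 100.0  # assumed to be a percentage
            if not 0.0 <= conf <= 1.0:
                continue
            token = entry["token"]
            if remaining[token] > 0:
                remaining[token] -= 1
                is_correct.append(1)
            else:
                is_correct.append(0)
            confidences.append(conf)
        return confidences, is_correct


    sample = [
        {"token": "le", "confidence": 0.5},
        {"token": "le", "confidence": 75},     # percentage form
        {"token": "chat", "confidence": -1},   # skipped
        {"token": "chien", "confidence": 0.25},
    ]
    confidences, is_correct = align_bag_of_words("le chat", sample)
    print(confidences, is_correct)
    ```

    With the GT ``"le chat"``, the second ``le`` counts as incorrect
    because the only ``le`` of the GT was already consumed, and the
    token with confidence -1 contributes to neither list.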
    """
    from picarones.evaluation.metrics.calibration import compute_calibration_metrics

    if not token_confidences:
        return None

    gt_counter = Counter((ground_truth or "").split())
    confidences: list[float] = []
    is_correct: list[int] = []

    for tc in token_confidences:
        if not isinstance(tc, dict):
            continue
        token = str(tc.get("token", ""))
        if not token:
            continue
        try:
            conf = float(tc.get("confidence"))
        except (TypeError, ValueError):
            continue
        if conf < 0:
            continue
        if conf > 1.0:
            conf = conf / 100.0
        if not 0.0 <= conf <= 1.0:
            continue
        if gt_counter[token] > 0:
            is_correct.append(1)
            gt_counter[token] -= 1
        else:
            is_correct.append(0)
        confidences.append(conf)

    if not confidences:
        return None
    return compute_calibration_metrics(confidences, is_correct)


# ──────────────────────────────────────────────────────────────────────────
# Document-level hooks (12)
# ──────────────────────────────────────────────────────────────────────────


@register_document_metric(
    name="confusion",
    attribute="confusion_matrix",
    profiles=_STANDARD_PROFILES,
    requires_success=True,
)
def _confusion_hook(*, ground_truth, hypothesis, **_):
    from picarones.evaluation.metrics.confusion import build_confusion_matrix
    return build_confusion_matrix(ground_truth, hypothesis).as_dict()


@register_document_metric(
    name="char_scores",
    attribute="char_scores",
    profiles=_STANDARD_PROFILES,
    requires_success=True,
)
def _char_scores_hook(*, ground_truth, hypothesis, **_):
    from picarones.evaluation.metrics.char_scores import (
        compute_diacritic_score,
        compute_ligature_score,
    )
    lig = compute_ligature_score(ground_truth, hypothesis)
    diac = compute_diacritic_score(ground_truth, hypothesis)
    return {"ligature": lig.as_dict(), "diacritic": diac.as_dict()}


@register_document_metric(
    name="taxonomy",
    attribute="taxonomy",
    profiles=_STANDARD_PROFILES,
    requires_success=True,
)
def _taxonomy_hook(*, ground_truth, hypothesis, **_):
    from picarones.evaluation.metrics.taxonomy import classify_errors
    return classify_errors(ground_truth, hypothesis).as_dict()


@register_document_metric(
    name="structure",
    attribute="structure",
    profiles=_STANDARD_PROFILES,
    requires_success=True,
)
def _structure_hook(*, ground_truth, hypothesis, **_):
    from picarones.evaluation.metrics.structure import analyze_structure
    return analyze_structure(ground_truth, hypothesis).as_dict()


@register_document_metric(
    name="line_metrics",
    attribute="line_metrics",
    profiles=_STANDARD_PROFILES,
    requires_success=True,
)
def _line_metrics_hook(*, ground_truth, hypothesis, **_):
    from picarones.evaluation.metrics.line_metrics import compute_line_metrics
    return compute_line_metrics(ground_truth, hypothesis).as_dict()


@register_document_metric(
    name="hallucination",
    attribute="hallucination_metrics",
    profiles=_STANDARD_PROFILES,
    requires_success=True,
)
def _hallucination_hook(*, ground_truth, hypothesis, **_):
    from picarones.evaluation.metrics.hallucination import compute_hallucination_metrics
    return compute_hallucination_metrics(ground_truth, hypothesis).as_dict()


@register_document_metric(
    name="calibration",
    attribute="calibration_metrics",
    profiles=_STANDARD_PROFILES,
    requires_token_confidences=True,
)
def _calibration_hook(*, ground_truth, ocr_result, **_):
    return calibration_from_engine_result(
        ground_truth, ocr_result.token_confidences,
    )


@register_document_metric(
    name="image_quality",
    attribute="image_quality",
    profiles=_STANDARD_PROFILES,
    # No requires_success: the image is analyzed whatever the OCR
    # outcome (so an OCR failure can be compared with image quality).
)
def _image_quality_hook(*, image_path, **_):
    from picarones.evaluation.metrics.image_quality import analyze_image_quality
    iq = analyze_image_quality(image_path)
    if iq.error is not None:
        return None
    return iq.as_dict()


@register_document_metric(
    name="philological",
    attribute="philological_metrics",
    profiles=_STANDARD_PROFILES,
    # No requires_success: the pre-chantier-2 runner computed these
    # even on OCR failure (with hyp=""). The philological modules
    # return ``None`` when the GT has no usable signal, so the
    # adaptive behavior is unchanged.
)
def _philological_hook(*, ground_truth, hypothesis, **_):
    from picarones.evaluation.metrics.philological_hooks import compute_philological_metrics
    return compute_philological_metrics(ground_truth, hypothesis)


@register_document_metric(
    name="searchability",
    attribute="searchability_metrics",
    profiles=_STANDARD_PROFILES,
)
def _searchability_hook(*, ground_truth, hypothesis, **_):
    from picarones.evaluation.metrics.searchability_hooks import compute_searchability_metrics
    return compute_searchability_metrics(ground_truth, hypothesis)


@register_document_metric(
    name="numerical_sequences",
    attribute="numerical_sequence_metrics",
    profiles=_STANDARD_PROFILES,
)
def _numerical_sequences_hook(*, ground_truth, hypothesis, **_):
    from picarones.evaluation.metrics.numerical_sequences_hooks import (
        compute_numerical_sequence_metrics_adaptive,
    )
    return compute_numerical_sequence_metrics_adaptive(ground_truth, hypothesis)


@register_document_metric(
    name="readability",
    attribute="readability_metrics",
    profiles=_STANDARD_PROFILES,
)
def _readability_hook(*, ground_truth, hypothesis, corpus_lang, **_):
    from picarones.evaluation.metrics.readability_hooks import compute_readability_metrics
    return compute_readability_metrics(ground_truth, hypothesis, lang=corpus_lang)


# ──────────────────────────────────────────────────────────────────────────
# Corpus-level aggregators (12)
# ──────────────────────────────────────────────────────────────────────────


@register_corpus_aggregator(
    name="confusion",
    attribute="aggregated_confusion",
    profiles=_STANDARD_PROFILES,
)
def _aggregate_confusion(doc_results: list) -> Optional[dict]:
    from picarones.evaluation.metrics.confusion import (
        ConfusionMatrix, aggregate_confusion_matrices,
    )
    try:
        matrices = [
            ConfusionMatrix(**dr.confusion_matrix)
            for dr in doc_results
            if dr.confusion_matrix is not None
        ]
        if not matrices:
            return None
        return aggregate_confusion_matrices(matrices).as_compact_dict(min_count=2)
    except Exception as exc:  # noqa: BLE001
        logger.warning(
            "[runner] aggregate_confusion: aggregation unavailable (%s); "
            "confusion matrix omitted from the report for this engine",
            exc,
        )
        return None


@register_corpus_aggregator(
    name="char_scores",
    attribute="aggregated_char_scores",
    profiles=_STANDARD_PROFILES,
)
def _aggregate_char_scores(doc_results: list) -> Optional[dict]:
    from picarones.evaluation.metrics.char_scores import (
        DiacriticScore,
        LigatureScore,
        aggregate_diacritic_scores,
        aggregate_ligature_scores,
    )
    lig_scores = [
        LigatureScore(**dr.char_scores["ligature"])
        for dr in doc_results
        if dr.char_scores is not None
    ]
    diac_scores = [
        DiacriticScore(**dr.char_scores["diacritic"])
        for dr in doc_results
        if dr.char_scores is not None
    ]
    if not lig_scores:
        return None
    return {
        "ligature": aggregate_ligature_scores(lig_scores),
        "diacritic": aggregate_diacritic_scores(diac_scores),
    }


@register_corpus_aggregator(
    name="taxonomy",
    attribute="aggregated_taxonomy",
    profiles=_STANDARD_PROFILES,
)
def _aggregate_taxonomy(doc_results: list) -> Optional[dict]:
    from picarones.evaluation.metrics.taxonomy import TaxonomyResult, aggregate_taxonomy
    results = [
        TaxonomyResult.from_dict(dr.taxonomy)
        for dr in doc_results
        if dr.taxonomy is not None
    ]
    if not results:
        return None
    return aggregate_taxonomy(results)


@register_corpus_aggregator(
    name="structure",
    attribute="aggregated_structure",
    profiles=_STANDARD_PROFILES,
)
def _aggregate_structure(doc_results: list) -> Optional[dict]:
    from picarones.evaluation.metrics.structure import StructureResult, aggregate_structure
    results = [
        StructureResult.from_dict(dr.structure)
        for dr in doc_results
        if dr.structure is not None
    ]
    if not results:
        return None
    return aggregate_structure(results)


@register_corpus_aggregator(
    name="image_quality",
    attribute="aggregated_image_quality",
    profiles=_STANDARD_PROFILES,
)
def _aggregate_image_quality(doc_results: list) -> Optional[dict]:
    from picarones.evaluation.metrics.image_quality import (
        ImageQualityResult, aggregate_image_quality,
    )
    results = [
        ImageQualityResult.from_dict(dr.image_quality)
        for dr in doc_results
        if dr.image_quality is not None
    ]
    if not results:
        return None
    return aggregate_image_quality(results)


@register_corpus_aggregator(
    name="line_metrics",
    attribute="aggregated_line_metrics",
    profiles=_STANDARD_PROFILES,
)
def _aggregate_line_metrics(doc_results: list) -> Optional[dict]:
    from picarones.evaluation.metrics.line_metrics import (
        LineMetrics, aggregate_line_metrics,
    )
    results = [
        LineMetrics.from_dict(dr.line_metrics)
        for dr in doc_results
        if dr.line_metrics is not None
    ]
    if not results:
        return None
    return aggregate_line_metrics(results)


@register_corpus_aggregator(
    name="hallucination",
    attribute="aggregated_hallucination",
    profiles=_STANDARD_PROFILES,
)
def _aggregate_hallucination(doc_results: list) -> Optional[dict]:
    from picarones.evaluation.metrics.hallucination import (
        HallucinationMetrics, aggregate_hallucination_metrics,
    )
    results = [
        HallucinationMetrics.from_dict(dr.hallucination_metrics)
        for dr in doc_results
        if dr.hallucination_metrics is not None
    ]
    if not results:
        return None
    return aggregate_hallucination_metrics(results)


@register_corpus_aggregator(
    name="calibration",
    attribute="aggregated_calibration",
    profiles=_STANDARD_PROFILES,
)
def _aggregate_calibration(doc_results: list) -> Optional[dict]:
    """Micro-aggregate calibration over all documents.

    Recompute ECE/MCE from the **sum of the bins** of each
    document: for each bin, sum ``count``, then aggregate the
    count-weighted mean confidence and the count-weighted
    accuracy. The micro ECE is then the count-weighted mean
    over bins of ``|conf - acc|``, and the MCE is the largest
    such gap.

    Behavior moved verbatim from ``runner._aggregate_calibration``
    (chantier 2: byte-for-byte backward compatibility of the serialized output).
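
    As a worked illustration of the pooling described above, the
    sketch below recomputes ECE/MCE for two documents with two bins
    each. The function name ``micro_ece`` and the sample bins are
    hypothetical, and the sketch assumes every document uses the same
    binning; it omits the ``n_bins`` checks and the per-bin output
    built by this aggregator.

    ```python
    def micro_ece(per_doc_bins):
        n_bins = len(per_doc_bins[0])
        counts = [0] * n_bins
        sum_conf = [0.0] * n_bins
        sum_acc = [0.0] * n_bins
        # Pool the bins: counts add up, means are re-weighted by count.
        for bins in per_doc_bins:
            for k, b in enumerate(bins):
                counts[k] += b["count"]
                sum_conf[k] += b["avg_confidence"] * b["count"]
                sum_acc[k] += b["accuracy"] * b["count"]
        total = sum(counts)
        ece = 0.0
        mce = 0.0
        for k in range(n_bins):
            if counts[k] == 0:
                continue
            gap = abs(sum_conf[k] / counts[k] - sum_acc[k] / counts[k])
            ece += (counts[k] / total) * gap
            mce = max(mce, gap)
        return ece, mce


    doc_a = [
        {"count": 1, "avg_confidence": 0.25, "accuracy": 0.0},
        {"count": 3, "avg_confidence": 0.75, "accuracy": 1.0},
    ]
    doc_b = [
        {"count": 3, "avg_confidence": 0.25, "accuracy": 0.0},
        {"count": 1, "avg_confidence": 0.75, "accuracy": 0.0},
    ]
    ece, mce = micro_ece([doc_a, doc_b])
    print(ece, mce)
    ```

    In this example the pooled first bin has a gap of 0.25 and the
    pooled second bin a gap of 0, each holding 4 of the 8
    predictions, which gives a micro ECE of 0.125 and an MCE of 0.25.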
    """
    relevant = [
        dr for dr in doc_results
        if dr.calibration_metrics is not None
        and (dr.calibration_metrics.get("bins") or [])
    ]
    if not relevant:
        return None

    n_bins = relevant[0].calibration_metrics.get("n_bins", 10)
    sum_conf: list[float] = [0.0] * n_bins
    sum_acc: list[float] = [0.0] * n_bins
    counts: list[int] = [0] * n_bins
    bin_lows: list[float] = [
        b["bin_low"] for b in relevant[0].calibration_metrics["bins"]
    ]
    bin_highs: list[float] = [
        b["bin_high"] for b in relevant[0].calibration_metrics["bins"]
    ]

    for dr in relevant:
        m = dr.calibration_metrics
        if m.get("n_bins") != n_bins:
            logger.warning(
                "[aggregate_calibration] %s: n_bins=%s ≠ %s, ignored",
                dr.doc_id, m.get("n_bins"), n_bins,
            )
            continue
        for k, b in enumerate(m["bins"]):
            n = int(b.get("count") or 0)
            if n == 0:
                continue
            counts[k] += n
            sum_conf[k] += float(b.get("avg_confidence") or 0.0) * n
            sum_acc[k] += float(b.get("accuracy") or 0.0) * n

    total = sum(counts)
    if total == 0:
        return None

    bins: list[dict] = []
    ece = 0.0
    mce = 0.0
    for k in range(n_bins):
        n = counts[k]
        if n == 0:
            bins.append({
                "bin_low": bin_lows[k] if k < len(bin_lows) else k / n_bins,
                "bin_high": bin_highs[k] if k < len(bin_highs) else (k + 1) / n_bins,
                "avg_confidence": None,
                "accuracy": None,
                "count": 0,
                "gap": None,
            })
            continue
        avg_conf = sum_conf[k] / n
        accuracy = sum_acc[k] / n
        gap = abs(avg_conf - accuracy)
        bins.append({
            "bin_low": bin_lows[k] if k < len(bin_lows) else k / n_bins,
            "bin_high": bin_highs[k] if k < len(bin_highs) else (k + 1) / n_bins,
            "avg_confidence": avg_conf,
            "accuracy": accuracy,
            "count": n,
            "gap": gap,
        })
        ece += (n / total) * gap
        if gap > mce:
            mce = gap

    overall_acc = sum(sum_acc) / total
    overall_conf = sum(sum_conf) / total

    return {
        "ece": ece,
        "mce": mce,
        "n_bins": n_bins,
        "n_predictions": total,
        "overall_accuracy": overall_acc,
        "overall_confidence": overall_conf,
        "bins": bins,
        "doc_count": len(relevant),
    }


@register_corpus_aggregator(
    name="philological",
    attribute="aggregated_philological",
    profiles=_STANDARD_PROFILES,
)
def _aggregate_philological(doc_results: list) -> Optional[dict]:
    from picarones.evaluation.metrics.philological_hooks import aggregate_philological_metrics
    return aggregate_philological_metrics(
        [dr.philological_metrics for dr in doc_results],
    )


@register_corpus_aggregator(
    name="searchability",
    attribute="aggregated_searchability",
    profiles=_STANDARD_PROFILES,
)
def _aggregate_searchability(doc_results: list) -> Optional[dict]:
    from picarones.evaluation.metrics.searchability_hooks import aggregate_searchability_metrics
    return aggregate_searchability_metrics(
        [dr.searchability_metrics for dr in doc_results],
    )


@register_corpus_aggregator(
    name="numerical_sequences",
    attribute="aggregated_numerical_sequences",
    profiles=_STANDARD_PROFILES,
)
def _aggregate_numerical_sequences(doc_results: list) -> Optional[dict]:
    from picarones.evaluation.metrics.numerical_sequences_hooks import (
        aggregate_numerical_sequence_metrics,
    )
    return aggregate_numerical_sequence_metrics(
        [dr.numerical_sequence_metrics for dr in doc_results],
    )


@register_corpus_aggregator(
    name="readability",
    attribute="aggregated_readability",
    profiles=_STANDARD_PROFILES,
)
def _aggregate_readability(doc_results: list) -> Optional[dict]:
    from picarones.evaluation.metrics.readability_hooks import aggregate_readability_metrics
    return aggregate_readability_metrics(
        [dr.readability_metrics for dr in doc_results],
    )


__all__ = ["calibration_from_engine_result"]