title: Picarones
emoji: 📜
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
Picarones
Heritage OCR / HTR / VLM and post-correction benchmarking platform
Banc d'essai d'OCR / HTR / VLM et de post-correction pour documents patrimoniaux
What is Picarones?
Picarones is an open-source benchmarking platform for OCR, HTR, VLM and post-correction pipelines on heritage documents (manuscripts, early printed books, archives).
The input is a folder of (image, ground truth) pairs — ground truth
in plain text, ALTO XML, or PAGE XML. Picarones runs the AIs you plug
in (OCR engines, VLMs, OCR+LLM pipelines, ALTO mappers, ensembles…) on
every page, compares each output to the ground truth at every relevant
level (text, ALTO, PAGE, entities, reading order), and produces a
self-contained HTML report with factual numbers, statistical tests
and a reproducibility snapshot.
Without ground truth, no benchmark — Picarones measures how well an AI matches a known reference, not how it transcribes an arbitrary document.
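As a rough illustration of this input contract, the sketch below pairs each image with a ground-truth file that shares its stem and sets aside images that have none. The naming convention (`page.png` next to `page.gt.txt`, `page.txt` or `page.xml`) and the function name `find_pairs` are assumptions made for this example, not Picarones' documented corpus layout.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}
# Hypothetical ground-truth naming: same stem as the image, one of these endings.
GT_SUFFIXES = (".gt.txt", ".txt", ".xml")


def find_pairs(corpus_dir):
    """Return (pairs, orphans): images with and without a ground-truth file."""
    pairs, orphans = [], []
    for image in sorted(Path(corpus_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        candidates = [image.with_name(image.stem + s) for s in GT_SUFFIXES]
        truth = next((c for c in candidates if c.is_file()), None)
        if truth is None:
            orphans.append(image)  # no ground truth: cannot be benchmarked
        else:
            pairs.append((image, truth))
    return pairs, orphans
```

Listing orphans separately makes the "no ground truth, no benchmark" rule visible before any engine is run.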
Version française ci-dessous.
Use case
A digital library plans to OCR a production corpus — say, several thousand 17th-century parish registers, 19th-century newspapers, or medieval glossed manuscripts. Several pipelines are on the table (alternative OCR, LLM correction, ALTO mappers, ensembles); the question is which one to deploy.
The candidates cannot be benchmarked on the production corpus itself (no ground truth). A small golden dataset matching the target profile is assembled; Picarones runs each candidate on it and reports CER, recovered fuzzy searchability, preserved numerical sequences, errors introduced by post-correctors, and statistical significance. The numbers inform the deployment decision.
En français
Picarones est une plateforme open-source de banc d'essai pour des IA d'OCR, HTR, VLM et des pipelines de post-correction sur documents patrimoniaux.
L'entrée est un dossier de paires (image, vérité terrain) — VT en
texte brut, ALTO XML ou PAGE XML. Picarones exécute les IA que vous
branchez sur chaque page, compare la sortie à la VT à tous les
niveaux pertinents et produit un rapport HTML autonome avec chiffres
factuels, tests statistiques et snapshot de reproductibilité. Sans
vérité terrain, pas de benchmark.
Features
Heritage-specific metrics
Three families of metrics calibrated for historical documents:
- Classical OCR/HTR: CER (raw, NFC, caseless, diplomatic), WER, MER, WIL via jiwer; 10-class error taxonomy; bootstrap 95% CIs; line-level Gini distribution.
- Philological: MUFI coverage, abbreviation expansion (Cappelli), early-modern typography (long-s, ligatures, tilde nasals), modern archive markers, Roman numerals, Unicode block accuracy, NER precision (HIPE), reading-order F1 (ICDAR 2015), layout F1.
- Comparison & decision: Friedman + Nemenyi + Critical Difference Diagram (Demšar 2006); cross-engine taxonomic divergence + oracle complementarity; cost / speed / CO₂ Pareto front; multi-run stability (Cohen κ, Krippendorff α); longitudinal trend with change-point detection; controlled per-slot ANOVA-like comparison.
For the full list with definitions, see docs/views.md
and the contextual glossary embedded in every report (25 bilingual
entries).
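To make the raw / NFC / caseless distinction and the bootstrap interval concrete, here is a minimal standard-library sketch. It is not the project's implementation (which relies on jiwer): the function names, the handling of an empty reference, and the choice of a percentile bootstrap over per-line scores are all assumptions made for illustration.

```python
import random
import unicodedata


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(reference, hypothesis, nfc=False, caseless=False):
    """Character error rate: edit distance divided by reference length."""
    if nfc:
        reference = unicodedata.normalize("NFC", reference)
        hypothesis = unicodedata.normalize("NFC", hypothesis)
    if caseless:
        reference, hypothesis = reference.casefold(), hypothesis.casefold()
    if not reference:
        # Assumed convention for an empty reference.
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(values, k=len(values))) / len(values)
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

A precomposed "é" compared with "e" followed by a combining acute accent gives a non-zero raw CER but a zero NFC CER, which is why reporting several variants side by side is informative on historical text.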
OCR+LLM pipelines
Composable chains: tesseract -> gpt-4o, pero_ocr -> claude-sonnet,
zero-shot VLM, etc. Three pipeline modes: text-only post-correction,
image+text post-correction, and zero-shot. Over-normalisation
detection flags LLMs that silently modernise historical spellings.
A composed-pipeline benchmarking layer (Sprint 63+) runs N candidate
pipelines on the same corpus and ranks them by a chosen metric.
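One way to think about over-normalisation detection is to look for ground-truth tokens that the OCR step reproduced correctly and the LLM step then changed. The sketch below does this with `difflib` on whitespace tokens; the function names and the token-level granularity are assumptions for this example, and the detector shipped in Picarones may work differently.

```python
from difflib import SequenceMatcher


def matched_reference_indices(reference_tokens, output_tokens):
    """Indices of reference tokens reproduced exactly by an output."""
    matcher = SequenceMatcher(a=reference_tokens, b=output_tokens, autojunk=False)
    indices = set()
    for block in matcher.get_matching_blocks():
        indices.update(range(block.a, block.a + block.size))
    return indices


def introduced_errors(ground_truth, ocr_text, llm_text):
    """Ground-truth tokens the OCR got right but the LLM step changed."""
    gt = ground_truth.split()
    ok_ocr = matched_reference_indices(gt, ocr_text.split())
    ok_llm = matched_reference_indices(gt, llm_text.split())
    return [gt[i] for i in sorted(ok_ocr - ok_llm)]


flagged = introduced_errors(
    "le roy estoit en sa maiſon",   # ground truth
    "le roy estoit en sa maifon",   # OCR output: one real error (maifon)
    "le roi était en sa maison",    # LLM output: modernised spellings
)
print(flagged)
```

Here the LLM modernised "roy" and "estoit", two tokens the OCR had right, so both are flagged as introduced errors.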
Corpus import
| Source | Method |
|---|---|
| Local folder | picarones run --corpus ./corpus/ |
| IIIF manifests (any institutional repository) | picarones import iiif <manifest-url> |
| Gallica API (BnF SRU + IIIF) | GallicaClient / picarones import iiif |
| HuggingFace Datasets | Web UI: POST /api/huggingface/import |
| HTR-United catalogue | Web UI: POST /api/htr-united/import |
| eScriptorium | EScriptoriumClient |
| ZIP upload (browser) | Web upload endpoint |
Supported corpus formats: plain text pairs, ALTO XML, PAGE XML.
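For readers unfamiliar with IIIF, the sketch below shows roughly what an importer has to extract from a manifest: one image URL per canvas. It assumes a IIIF Presentation API 2.x manifest already parsed into a dict; the function name `image_urls` is hypothetical and this is not the code behind `picarones import iiif`.

```python
def image_urls(manifest):
    """Collect (canvas label, image URL) pairs from a IIIF Presentation 2.x manifest."""
    urls = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                resource = image.get("resource", {})
                if "@id" in resource:
                    urls.append((canvas.get("label", ""), resource["@id"]))
                    break  # keep only the first image of each canvas
    return urls


# Tiny hand-written manifest with made-up URLs, for illustration only.
sample = {
    "sequences": [{
        "canvases": [
            {"label": "f1", "images": [{"resource": {"@id": "https://example.org/iiif/f1/full/full/0/native.jpg"}}]},
            {"label": "f2", "images": []},
        ]
    }]
}
print(image_urls(sample))
```

Canvases without an image (like `f2` above) are skipped silently in this sketch; a real importer would probably want to report them.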
Interactive HTML report
A single self-contained HTML file (a --lazy-images option is available
for large corpora). Five views:
- Ranking — sortable table of all engines and metrics.
- Gallery — color-coded CER badges per document.
- Document — synchronized N-way diff, triple diff for OCR+LLM.
- Analyses — distribution charts, Pareto, calibration, robustness projection, philological profile, longitudinal trends, levers.
- Characters — Unicode confusion matrix, ligature analysis.
Above the views: a factual narrative synthesis (20+ deterministic detectors; every number is traceable to the input, a property checked by the test suite), a Critical Difference Diagram, and a Pareto front. Side panels provide the contextual glossary and an Advanced mode (visible columns, strata filters, opt-in personal composite score).
Web interface
FastAPI application with real-time SSE progress streaming, ZIP
upload from the browser, dynamic engine and normalization profile
selectors, browse and re-download generated reports, bilingual
French/English UI. Deployable on HuggingFace Spaces (Docker, port
7860) and on institutional infrastructure (see
docs/operations/deployment-institutional.md).
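A client that wants to follow a run over SSE has to split the text stream into events. The sketch below parses the standard Server-Sent Events wire format from an iterable of decoded lines. The event name `progress` and the JSON payload in the example are invented; the actual event schema emitted by Picarones is not described here.

```python
def iter_sse_events(lines):
    """Yield (event, data) tuples from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                  # a blank line terminates the event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):      # comment line, often used as keep-alive
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]       # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream content, for illustration only.
stream = [
    "event: progress\n",
    'data: {"done": 3, "total": 10}\n',
    "\n",
    ": keep-alive\n",
    "data: finished\n",
    "\n",
]
events = list(iter_sse_events(stream))
```

An event that is not followed by a blank line is dropped, which matches how the SSE format treats an incomplete trailing event.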
Longitudinal tracking & robustness
Optional SQLite database recording benchmark history across runs. CER evolution curves per engine, automatic regression detection between consecutive runs (Pettitt change-point analysis, Sprint 92). Robustness analysis measures engine resilience to noise, blur, rotation, resolution and binarization, projected on the real corpus quality profile (Sprint 81).
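For intuition about the change-point analysis, here is a textbook version of the Pettitt test applied to a series of CER values, one per run. It is a plain cubic-time sketch with the usual approximate p-value, written for short histories; it is not the code Picarones runs, and the function name and return shape are assumptions.

```python
import math


def _sign(value):
    return (value > 0) - (value < 0)


def pettitt(series):
    """Return (index, K, p_approx) for the most likely single change point.

    `index` is the position of the first observation after the change.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    best_t, best_u = 0, 0
    for t in range(1, n):
        # Rank-based statistic comparing the first t values with the rest.
        u = sum(_sign(series[i] - series[j]) for i in range(t) for j in range(t, n))
        if abs(u) > abs(best_u):
            best_t, best_u = t, u
    k = abs(best_u)
    p_approx = min(1.0, 2.0 * math.exp(-6.0 * k * k / (n ** 3 + n ** 2)))
    return best_t, k, p_approx


# Made-up CER history: a regression appears at the seventh run.
history = [0.05] * 6 + [0.12] * 6
index, k, p_value = pettitt(history)
```

Because the statistic only uses the sign of pairwise differences, a single outlier run moves it much less than it would move a mean-based test.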
Quick start
# Install
pip install -e ".[dev,web]"
# Tesseract (system binary, required for the Tesseract engine)
sudo apt install tesseract-ocr tesseract-ocr-fra tesseract-ocr-lat # Debian/Ubuntu
brew install tesseract tesseract-lang # macOS
# Generate a demo report (no engine needed)
picarones demo --output demo_report.html
# Run a benchmark
picarones run --corpus ./corpus/ --engines tesseract --output results.json
picarones report --results results.json --output report.html
# Web UI
picarones serve --port 8080
For Docker, institutional deployment, or HuggingFace Spaces, see
INSTALL.md and
docs/operations/deployment-institutional.md.
Supported engines
| Engine | Type | Installation |
|---|---|---|
| Azure Doc Intelligence | Cloud API | AZURE_DOC_INTEL_ENDPOINT + AZURE_DOC_INTEL_KEY |
| Google Vision | Cloud API | GOOGLE_APPLICATION_CREDENTIALS env var |
| Mistral OCR | Cloud API | MISTRAL_API_KEY env var |
| Pero OCR | Local Python | pip install -e .[pero] |
| Tesseract 5 | Local CLI | pip install pytesseract + system binary |
LLM/VLM adapters (used through pipelines, not as standalone OCR
engines): GPT-4o, Claude, Mistral Large, Ollama (local). See
docs/cli-workflows.md.
The Engine table is regenerated automatically by
scripts/gen_readme_tables.py — adding a new adapter under
picarones/engines/ makes the next CI run update this table or
fail.
CLI commands
| Command | Description |
|---|---|
| picarones compare | Compare two benchmark JSON runs and flag regressions (Sprint 28) |
| picarones demo | Generate a demo report with synthetic data (no engine required) |
| picarones diagnose | Pre-wired workflow: bench + improvement levers + factual recommendations |
| picarones economics | Pre-wired workflow: bench + effective throughput + cost projection |
| picarones edition | Pre-wired workflow: bench + philological metrics for critical editing |
| picarones engines | List available OCR engines and LLM adapters |
| picarones history | Query longitudinal benchmark history (SQLite) |
| picarones import | Import a corpus from a remote source (IIIF, HF, HTR-United) |
| picarones info | Display version and system information |
| picarones metrics | Compute CER/WER between two text files |
| picarones pipeline | Run / compare composed pipelines from a YAML spec (Sprint 70) |
| picarones report | Generate an HTML report from JSON results |
| picarones robustness | Run robustness analysis with degraded images |
| picarones run | Run a full benchmark on a corpus |
| picarones serve | Launch the FastAPI web interface |
Each command supports --help for full options. See
docs/cli-workflows.md for end-to-end
examples.
Web API endpoints
The web app exposes a documented OpenAPI spec at /docs (Swagger UI)
when running. Summary:
| Method | Endpoint | Summary |
|---|---|---|
| GET | / | Index |
| POST | /api/benchmark/run | Api Benchmark Run |
| POST | /api/benchmark/start | Api Benchmark Start |
| POST | /api/benchmark/{job_id}/cancel | Api Benchmark Cancel |
| GET | /api/benchmark/{job_id}/status | Api Benchmark Status |
| GET | /api/benchmark/{job_id}/stream | Api Benchmark Stream |
| GET | /api/benchmark/{job_id}/synthesis_preview | Api Benchmark Synthesis Preview |
| POST | /api/config/load | Api Config Load |
| POST | /api/config/save | Api Config Save |
| GET | /api/corpus/browse | Api Corpus Browse |
| GET | /api/corpus/image/{upload_id}/{filename} | Api Corpus Image |
| POST | /api/corpus/upload | Api Corpus Upload |
| GET | /api/corpus/uploads | Api Corpus Uploads |
| DELETE | /api/corpus/uploads/{corpus_id} | Api Corpus Delete |
| GET | /api/csrf/token | Api Csrf Token |
| GET | /api/engines | Api Engines |
| GET | /api/history/regressions | Api History Regressions |
| GET | /api/htr-united/catalogue | Api Htr United Catalogue |
| POST | /api/htr-united/import | Api Htr United Import |
| POST | /api/huggingface/import | Api Huggingface Import |
| GET | /api/huggingface/search | Api Huggingface Search |
| GET | /api/lang | Api Get Lang |
| POST | /api/lang/{lang_code} | Api Set Lang |
| GET | /api/models/{provider} | Api Models |
| GET | /api/normalization/profiles | Api Normalization Profiles |
| GET | /api/reports | Api Reports |
| GET | /api/status | Api Status |
| GET | /health | Health |
| GET | /reports/{filename} | Serve Report |
The complete OpenAPI JSON is also exposed at /openapi.json for
client generation.
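For quick scripting without a generated client, the endpoints above can be reached with the standard library. The sketch below only builds URLs from the path templates in the table and fetches a JSON body; the helper names are hypothetical, and it assumes (without confirmation from this README) that the endpoint being called returns JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def endpoint(base_url, path, **params):
    """Fill {placeholders} in an endpoint path and join it to the base URL."""
    filled = path.format(**{k: quote(str(v), safe="") for k, v in params.items()})
    return base_url.rstrip("/") + filled


def get_json(url, opener=urlopen, timeout=10):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example URL for a locally served instance; "abc 1" is a made-up job id.
status_url = endpoint("http://localhost:8080/", "/api/benchmark/{job_id}/status", job_id="abc 1")
```

Passing `opener` as a parameter keeps the helper testable without a running server.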
Normalization profiles
Picarones ships 11 built-in normalization profiles for historical
text comparison (defined in
picarones/measurements/normalization.py,
exposed via /api/normalization/profiles):
nfc, caseless, minimal, medieval_french,
early_modern_french, medieval_latin, medieval_english,
early_modern_english, secretary_hand, sans_ponctuation,
sans_apostrophes.
Custom profiles can be loaded from YAML files with user-defined
diplomatic tables and exclude_chars sets. See
docs/profiles.md.
A traceability table mapping each profile to its source standard (MUFI v4.0, TEI P5, DEAF) will ship in Sprint A12 (B-6).
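To show what a diplomatic table and an exclude_chars set do to a string before comparison, here is a small sketch. The profile is written as a Python dict rather than YAML, and its keys, the sample mappings, and the NFC step are assumptions for illustration; docs/profiles.md remains the reference for the real schema.

```python
import unicodedata

# Hypothetical profile: character equivalences plus characters to ignore.
PROFILE = {
    "diplomatic": {"ſ": "s", "v": "u", "j": "i", "&": "et"},
    "exclude_chars": set(".,;:!?'’"),
}


def normalize(text, profile):
    """Apply NFC, drop excluded characters, then map diplomatic equivalents."""
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch in profile["exclude_chars"]:
            continue
        out.append(profile["diplomatic"].get(ch, ch))
    return "".join(out)


print(normalize("vne maiſon, & jardin.", PROFILE))
```

Applying the same profile to both the ground truth and the engine output means that differences such as u/v or long-s no longer count as errors under that profile.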
Project structure
picarones/
├── core/ Circle 1: pure abstractions (7 modules)
├── measurements/ Circle 2: official metrics (~70 modules + narrative engine)
├── engines/ Circle 2: 5 OCR adapters
├── llm/ Circle 2: 4 LLM adapters
├── pipelines/ Circle 2: OCR+LLM pipelines
├── modules/ Circle 2: official BaseModule modules
├── extras/ Circle 3: plugins (importers, historical)
├── report/ Circle 3: HTML rendering
├── cli/ Circle 3: Click CLI (15 commands)
├── web/ Circle 3: FastAPI app + 11 routers
├── prompts/ 8 versioned prompt templates
└── data/ Indicative tables (pricing.yaml)
Strict 3-circle architecture: imports flow only from outer to inner.
Enforced by tests/core/test_circle_dependencies.py (Sprint A3).
See docs/architecture.md for the full
manifesto.
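The dependency rule can be checked statically. The sketch below is not the project's test; it is a simplified illustration that parses a module's source with `ast` and reports absolute imports that point from an inner circle to an outer one. The circle numbers are taken from the tree above; the function name is hypothetical, relative imports are ignored, and `prompts/` and `data/` are left out because the tree assigns them no circle.

```python
import ast

# Circle of each top-level package, as listed in the project tree.
CIRCLES = {
    "core": 1,
    "measurements": 2, "engines": 2, "llm": 2, "pipelines": 2, "modules": 2,
    "extras": 3, "report": 3, "cli": 3, "web": 3,
}


def violations(source, package):
    """List imports in `source` (a module of `package`) that point outward."""
    own = CIRCLES[package]
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            parts = name.split(".")
            if len(parts) > 1 and parts[0] == "picarones":
                target = CIRCLES.get(parts[1])
                if target is not None and target > own:
                    found.append(name)  # inner circle importing an outer one
    return found


# A core module importing the web layer breaks the rule.
print(violations("from picarones.web import app\n", "core"))
```

Run over every file in the package, a check like this turns the architecture rule into a failing test rather than a convention that erodes over time.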
Environment variables
See .env.example for the complete list. Key
variables:
# Security & mode (cf. SECURITY.md)
PICARONES_PUBLIC_MODE= # 1/true/yes for HF Space (no cloud OCR)
PICARONES_CSRF_REQUIRED= # 1 for institutional deployment
PICARONES_BROWSE_ROOTS= # restrict browse to specific paths
# Cloud API keys (optional)
MISTRAL_API_KEY=
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
GOOGLE_APPLICATION_CREDENTIALS=
AZURE_DOC_INTEL_ENDPOINT=
AZURE_DOC_INTEL_KEY=
# RGPD retention (Sprint A11)
PICARONES_UPLOAD_RETENTION_DAYS=7
For HuggingFace Spaces, set these in Settings → Variables and secrets.
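As an illustration of how such variables are typically read, the sketch below interprets a boolean flag using the 1/true/yes values listed above and parses the retention period. The helper names, the case-insensitive matching, and the fallback of 7 days are assumptions for this example, not a description of Picarones' own configuration code.

```python
import os

TRUTHY = {"1", "true", "yes"}


def env_flag(name, environ=None):
    """True when the variable is set to 1/true/yes (case-insensitive here)."""
    environ = os.environ if environ is None else environ
    return environ.get(name, "").strip().lower() in TRUTHY


def retention_days(environ=None, default=7):
    """Upload retention in days; `default` is an assumed fallback."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PICARONES_UPLOAD_RETENTION_DAYS", "").strip()
    return int(raw) if raw else default


public_mode = env_flag("PICARONES_PUBLIC_MODE")
```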
CI/CD
GitHub Actions: .github/workflows/
- ci.yml: tests on Python 3.11/3.12/3.13 × Linux/macOS/Windows, ruff, mypy strict on core/, security scanners (bandit + pip-audit + trivy), coverage gate --cov-fail-under=85, pytest-timeout 300s.
- precommit.yml: replays pre-commit hooks (catches --no-verify bypass).
- release.yml: on tag v*.*.*, PyPI + ghcr.io multi-arch + GitHub Release with notes from CHANGELOG.
- perf_regression.yml: weekly cron + PR-triggered, CER anti-regression on a synthetic reference corpus.
- sync_to_huggingface.yml: auto-syncs main to the HF Space.
Development
pip install -e ".[dev,web]"
pre-commit install
pytest tests/ -q
ruff check picarones/ tests/
python -m mypy picarones/core/
Test suite: ~3763 tests, ~3 min on a modern laptop. Coverage
floor at 85% (currently ~87%). Tests that require live HTTP carry
the network marker so they can be deselected.
For end-to-end developer guides, see
docs/developer/index.md (FR) /
docs/developer/index.en.md (EN).
Conventions
- Never except Exception: pass; use logger.warning("[module] degraded feature: %s", e).
- One canonical home per module; circle dependency direction enforced by tests.
- Engines declare execution_mode ("io" or "cpu") so the runner picks ThreadPoolExecutor vs ProcessPoolExecutor appropriately.
- Hardcoded UI strings forbidden; always go through i18n (cf. docs/developer/extending-i18n.md).
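Two of these conventions can be sketched together: choosing an executor from an engine's `execution_mode`, and logging a degraded result instead of swallowing the exception. `FakeEngine`, `executor_for` and `run_all` are hypothetical names; only the `execution_mode` attribute, its two values, and the log message format come from the conventions above.

```python
import logging
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

logger = logging.getLogger(__name__)


class FakeEngine:
    """Stand-in adapter used only for this example."""
    execution_mode = "io"

    def run(self, page):
        return page.upper()


def executor_for(engine, max_workers=4):
    """Threads for I/O-bound engines, processes for CPU-bound ones."""
    if engine.execution_mode == "cpu":
        return ProcessPoolExecutor(max_workers=max_workers)
    return ThreadPoolExecutor(max_workers=max_workers)


def run_all(engine, pages):
    """Run the engine on every page, keeping results in input order."""
    results = []
    with executor_for(engine) as pool:
        futures = [pool.submit(engine.run, page) for page in pages]
        for future in futures:
            try:
                results.append(future.result())
            except Exception as e:  # degrade, but never silently
                logger.warning("[runner] degraded feature: %s", e)
                results.append(None)
    return results


outputs = run_all(FakeEngine(), ["page one", "page two"])
```

Cloud API adapters spend their time waiting on the network, so threads are enough; a local CPU-bound engine benefits from separate processes, at the cost of requiring picklable arguments.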
Roadmap
Detailed history and current direction live in:
- CHANGELOG.md: Keep a Changelog format, one entry per sprint up to the latest release.
- docs/roadmap/evolution-2026.md: technical evolution roadmap (axes A and B for 2026+).
- docs/audits/: institutional readiness audit and remediation plan (sprints A1–A15).
Phase 1 of the institutional readiness plan (sprints A1–A11) is complete as of May 2026. It covered CI hardening, doc consistency gates, the 3-circle refactor, web hardening, perf+concurrency tests, WCAG 2.1 AA accessibility, reproducibility ops (lock files, Docker pinning), the PyPI/ghcr.io release pipeline, governance & COI policies, and the institutional deployment guide & RGPD documentation.
Remaining: scientific publication track (CITATION + JOSS, sprint A12), README/SPECS final polish (this sprint and A14), external audits (RGAA + security pentest, A15).
Documentation
| Audience | Entry point |
|---|---|
| End user | docs/user/reading-a-report.md (EN) |
| Developer | docs/developer/index.md (FR) / docs/developer/index.en.md (EN) |
| Operations / DSI | docs/operations/deployment-institutional.md, docs/operations/data-retention-rgpd.md, docs/operations/release-process.md |
| Architect | docs/architecture.md, docs/api-stable.md |
| Researcher | docs/case-studies/, docs/reproducibility-snapshots.md |
| Contributor | CONTRIBUTING.md, GOVERNANCE.md, CODE_OF_CONDUCT.md |
| Security | SECURITY.md |
| Accessibility | ACCESSIBILITY.md |
The complete functional specification is in
SPECS.md (full refresh planned in Sprint A14).
Citation
A CITATION.cff file and a Zenodo DOI will land in Sprint A12
(scientific publication track). Until then, cite the GitHub repo
with the commit SHA used in your benchmark — every Picarones report
embeds the commit and full snapshot for reproducibility (cf.
docs/reproducibility-snapshots.md).
Contributing
See CONTRIBUTING.md (FR) /
CONTRIBUTING.en.md (EN).
Code of conduct: CODE_OF_CONDUCT.md
(Contributor Covenant 2.1).
Governance & maintainership: GOVERNANCE.md.
License
Copyright 2024–2026 Picarones contributors.