Ma-Ri-Ba-Ku/Picarones

Claude committed 16 days ago

Commit f4efc9d (unverified) · 1 parent: 9127230

feat(adapters): atomic_write eliminates partial files on kill

Phase 9 of the ADR-0001 workstream: guarantees atomic disk
writes in the OCR/LLM/VLM adapters. Today, an adapter killed
during ``path.write_text(content)`` leaves a partial or
corrupted file (the first N bytes written, the rest missing).
For an institutional benchmark that reads these files downstream
(validation, metrics, report), an inconsistent state is
unacceptable.
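
The failure mode can be reproduced without killing anything. The sketch below is only an illustration: it simulates the kill by raising inside the write, and the names ``naive_write_interrupted`` and ``fail_after`` are made up for the example, not taken from the codebase.

```python
import tempfile
from pathlib import Path


def naive_write_interrupted(path: Path, content: str, fail_after: int) -> None:
    """Simulate a worker dying after ``fail_after`` characters were written."""
    # Opening in "w" mode truncates ``path`` immediately: the previous
    # content is already gone before a single character is written.
    with open(path, "w", encoding="utf-8") as f:
        f.write(content[:fail_after])
        f.flush()
        raise RuntimeError("simulated kill mid-write")


with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "out.txt"
    target.write_text("previous complete result", encoding="utf-8")
    try:
        naive_write_interrupted(target, "new complete result", fail_after=3)
    except RuntimeError:
        pass
    # Neither the old nor the new content survives, only a fragment.
    leftover = target.read_text(encoding="utf-8")
    print(repr(leftover))  # 'new'
```

A downstream reader has no way to tell this fragment from a legitimate short result, which is the situation the helper is meant to rule out.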

## New helper ``adapters/_atomic_io.py``

- ``atomic_write_text(path, content, encoding="utf-8")``:
"write-to-tmp + fsync + rename" pattern which guarantees that
``path`` exists with the COMPLETE content, or not at all.
- ``atomic_write_bytes(path, content)``: bytes variant.
- Tmp file named ``<path>.<pid>.<tid>.tmp`` (collision-safe
across workers).
- ``os.fsync`` forces the OS to flush to disk before the rename.
- ``os.replace`` (atomic on POSIX and on Windows, since
Python 3.3) performs the final swap.
- Best-effort cleanup of the tmp file on failure; the original
``path`` is preserved.
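
To illustrate the collision-safe naming, here is a small standalone sketch. ``tmp_path_for`` mirrors the naming scheme described above; ``tmp_names_across_threads`` is a hypothetical driver written for this example only.

```python
import os
import threading
from pathlib import Path


def tmp_path_for(path: Path) -> Path:
    """Same naming scheme as the helper: ``<path>.<pid>.<tid>.tmp``."""
    suffix = f".{os.getpid()}.{threading.get_ident()}.tmp"
    return path.parent / (path.name + suffix)


def tmp_names_across_threads(target: Path, workers: int = 4) -> set[str]:
    """Collect the tmp name each of ``workers`` live threads would use."""
    names: set[str] = set()
    lock = threading.Lock()
    barrier = threading.Barrier(workers)

    def worker() -> None:
        # Wait until every thread has started: thread idents are only
        # guaranteed distinct among threads that are alive together.
        barrier.wait()
        name = tmp_path_for(target).name
        with lock:
            names.add(name)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return names


names = tmp_names_across_threads(Path("out.txt"))
print(len(names))  # 4
```

Because each live thread has its own identifier, two workers targeting the same ``path`` never share a tmp file; only the final ``os.replace`` is contended.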

## Mechanical migration

A Python script migrated 11 ``write_text`` sites in the
pipeline-context adapters (where cross-thread kill is real):

- 9 OCR sites across 8 adapters (tesseract×2, pero, kraken,
calamari, mistral, google, azure, precomputed)
- 1 LLM (base: site shared by the 4 concrete adapters)
- 1 VLM (base: site shared by the 4 concrete adapters)

The linter applied the changes; the audit caught one broken
multi-line import in ``llm/base.py`` (the script had inserted
the import in the middle of a ``from ... import (\n...\n)``),
which was then fixed by hand.
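
The commit does not say how the audit spotted the broken import. One plausible check, sketched here purely as an assumption, is to parse every migrated file with ``ast.parse``. The ``BROKEN`` snippet is a hypothetical reconstruction of what an import inserted inside a parenthesised ``from ... import (`` would look like.

```python
import ast

# Hypothetical reconstruction: the new import landed inside the
# parentheses of an existing multi-line import.
BROKEN = """\
from picarones.adapters._retry import (
from picarones.adapters._atomic_io import atomic_write_text
    DEFAULT_BACKOFF_BASE as _DEFAULT_BACKOFF_BASE,
)
"""

# Same lines with the new import placed before the multi-line import.
FIXED = """\
from picarones.adapters._atomic_io import atomic_write_text
from picarones.adapters._retry import (
    DEFAULT_BACKOFF_BASE as _DEFAULT_BACKOFF_BASE,
)
"""


def parses(source: str) -> bool:
    """Return True if ``source`` is syntactically valid Python."""
    try:
        ast.parse(source)  # parses only, nothing is imported
    except SyntaxError:
        return False
    return True


print(parses(BROKEN), parses(FIXED))  # False True
```

Running such a check over the files touched by a migration script is cheap, since it needs no dependencies installed.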

## Out of scope

The **corpus importers** (IIIF, Gallica, HTR-United, eScript,
HuggingFace; 6 ``write_text``/``write_bytes`` sites) are left
unchanged. They run one-shot, outside the killable pipeline
context (a caller running ``picarones import gallica`` is not
exposed to cross-thread cancellation). Deferred to Phase 9b if a
concrete need arises.

The ``ArtifactStore`` already uses a manual tmp pattern (not
migrated, to avoid duplication).

## Validation

- 14 atomic_io tests (basic write, overwrite, unicode, empty,
error handling with simulated disk full, atomicity: path does
not exist during the write, tmp includes PID+TID, concurrent
writes don't corrupt)
- 564 adapter tests all green (non-regression: the migration
is transparent with respect to the contract)
- **6151 tests in total**, 0 regressions (vs 6137 before Phase 9)
- ``ruff`` clean, architecture 184 green, sprint narrative
stable at 477

## Left for Phase 9b (future)

Richer resource reclamation:
- Tracking of the artifacts written per task (via
``ArtifactStore``, which timestamps each write).
- Cleanup of the artifacts of a zombie that completes late
(already flagged ``DEADLINE_EXCEEDED_ZOMBIE`` on the outcome
side: the information is there, the action is missing).
- Cleanup of subprocess scratch dirs on kill.
- Server-side ``SDK.cancel`` when available (OpenAI, Anthropic)
to save billed tokens.

https://claude.ai/code/session_01B93huMjNh4CG2rNcexgDeL

Files changed (12)

picarones/adapters/_atomic_io.py +151 -0
picarones/adapters/llm/base.py +2 -1
picarones/adapters/ocr/azure_doc_intel.py +2 -1
picarones/adapters/ocr/calamari.py +2 -1
picarones/adapters/ocr/google_vision.py +2 -1
picarones/adapters/ocr/kraken.py +2 -1
picarones/adapters/ocr/mistral_ocr.py +2 -1
picarones/adapters/ocr/pero_ocr.py +2 -1
picarones/adapters/ocr/precomputed.py +2 -1
picarones/adapters/ocr/tesseract.py +3 -2
picarones/adapters/vlm/base.py +2 -1
tests/adapters/test_atomic_io.py +245 -0

picarones/adapters/_atomic_io.py ADDED

@@ -0,0 +1,151 @@

+"""Atomic-write helpers for the adapters.
+
+Problem addressed
+-----------------
+The naive ``path.write_text(content)`` pattern is not atomic:
+if the worker is killed during the write (cross-thread cancel,
+SIGKILL of a subprocess, OS crash, power loss), the resulting
+file can be:
+
+- **partial** (the first N bytes written, the rest missing);
+- **empty** (the file was truncated to 0 by open(..., "w") but
+  the write never happened);
+- **mixed** with earlier content (rare case where the OS has not
+  flushed the old content).
+
+A consumer reading this file (validation, metric, report) sees
+an inconsistent state; for an institutional benchmark, that is
+unacceptable.
+
+Solution
+--------
+"Write to tmp + rename" pattern:
+
+1. Write the complete content to a temporary file next to the
+   target path (``<path>.<pid>.<tid>.tmp``).
+2. ``fsync`` to force the flush to disk.
+3. Atomic ``rename`` to the target path.
+
+Guarantees:
+
+- **All or nothing**: either ``path`` exists with the complete
+  content, or ``path`` does not exist (and any orphaned ``.tmp``
+  can be cleaned up on the next run).
+- On POSIX, ``os.replace()`` is atomic (guaranteed by
+  ``rename(2)``).  On Windows, ``os.replace()`` is also atomic
+  since Python 3.3.
+- If the rename fails, the ``.tmp`` is deleted on a best-effort
+  basis so that no orphan is left behind.
+
+Cases not covered
+-----------------
+- The filesystem does not survive a power loss during ``fsync``
+  (rare, FS-dependent).  Out of scope: this is an OS-level
+  guarantee.
+- Several processes/threads write the same ``path``
+  simultaneously: the last write wins (normal POSIX semantics).
+  The caller must avoid this case through its orchestration.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import threading
+from pathlib import Path
+from typing import Union
+
+logger = logging.getLogger(__name__)
+
+
+def _tmp_path_for(path: Path) -> Path:
+    """Build a temporary path next to ``path``.
+
+    Includes PID + thread ID to avoid collisions if several
+    workers write to the same directory (a hypothetical case,
+    but let's guard against it).
+    """
+    suffix = f".{os.getpid()}.{threading.get_ident()}.tmp"
+    return path.parent / (path.name + suffix)
+
+
+def atomic_write_text(
+    path: Union[str, Path],
+    content: str,
+    *,
+    encoding: str = "utf-8",
+) -> None:
+    """Write ``content`` to ``path`` atomically.
+
+    Pattern: write-to-tmp + fsync + rename.
+
+    If writing the tmp fails (disk full, permission, etc.), the
+    exception propagates and the tmp is deleted on a best-effort
+    basis.  The original ``path`` (if it existed) is left unchanged.
+
+    If the final rename fails (unlikely in practice, except for
+    permission denied on the directory), same behaviour: tmp
+    deleted, ``path`` unchanged.
+    """
+    path = Path(path)
+    tmp_path = _tmp_path_for(path)
+    try:
+        # The ``with`` guarantees the close before the rename, which
+        # matters on Windows where an open file cannot be renamed.
+        with open(tmp_path, "w", encoding=encoding, newline="") as f:
+            f.write(content)
+            f.flush()
+            # ``os.fsync`` forces the OS buffer to be flushed to disk.
+            # Without it, a hardware crash between flush and rename can
+            # leave an empty tmp file.  Cost: a few ms per write,
+            # negligible next to benchmarks that take several seconds
+            # per document.
+            os.fsync(f.fileno())
+        # ``os.replace`` (rather than ``rename``) because it
+        # atomically overwrites any existing ``path`` and works
+        # cross-OS (POSIX + Windows since Python 3.3).
+        os.replace(tmp_path, path)
+    except Exception:
+        # Best-effort cleanup of the tmp on failure.
+        try:
+            if tmp_path.exists():
+                tmp_path.unlink()
+        except OSError as cleanup_exc:
+            logger.warning(
+                "[atomic_io] échec cleanup du tmp %s : %s",
+                tmp_path, cleanup_exc,
+            )
+        raise
+
+
+def atomic_write_bytes(
+    path: Union[str, Path],
+    content: bytes,
+) -> None:
+    """Bytes variant of :func:`atomic_write_text`.
+
+    Same guarantee: ``path`` exists with the complete content, or
+    not at all.  Useful for non-text artifacts (intermediate
+    images, binary JSON blobs).
+    """
+    path = Path(path)
+    tmp_path = _tmp_path_for(path)
+    try:
+        with open(tmp_path, "wb") as f:
+            f.write(content)
+            f.flush()
+            os.fsync(f.fileno())
+        os.replace(tmp_path, path)
+    except Exception:
+        try:
+            if tmp_path.exists():
+                tmp_path.unlink()
+        except OSError as cleanup_exc:
+            logger.warning(
+                "[atomic_io] échec cleanup du tmp %s : %s",
+                tmp_path, cleanup_exc,
+            )
+        raise
+
+
+__all__ = ["atomic_write_text", "atomic_write_bytes"]
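
The module docstring notes that a ``.tmp`` orphaned by a hard kill can be cleaned up on the next run, but the commit does not include such a cleanup. A minimal sketch of what it could look like follows; ``sweep_orphan_tmps`` is a hypothetical helper, and it assumes it runs at startup, before any other process writes to the directory.

```python
import os
import re
import tempfile
from pathlib import Path

# Matches the ``<name>.<pid>.<tid>.tmp`` naming used by the helper.
_TMP_NAME = re.compile(r"\.(\d+)\.(\d+)\.tmp$")


def sweep_orphan_tmps(directory: Path) -> list[Path]:
    """Delete leftover tmp files in ``directory`` (hypothetical helper).

    Assumption: no other process is writing here while the sweep runs.
    """
    removed: list[Path] = []
    for entry in sorted(directory.iterdir()):
        match = _TMP_NAME.search(entry.name)
        if match is None or not entry.is_file():
            continue
        if int(match.group(1)) == os.getpid():
            # Could be an in-flight write of this very process: keep it.
            continue
        entry.unlink(missing_ok=True)
        removed.append(entry)
    return removed


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "out.txt").write_text("done", encoding="utf-8")
    (root / "out.txt.0.123.tmp").write_text("partial", encoding="utf-8")
    swept = [p.name for p in sweep_orphan_tmps(root)]
    print(swept)  # ['out.txt.0.123.tmp']
```

Skipping files that carry the current PID is a conservative choice: it avoids deleting a tmp that another thread of the same process is still writing.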

picarones/adapters/llm/base.py CHANGED

@@ -11,6 +11,7 @@ from typing import Any, Optional
 logger = logging.getLogger(__name__)
+from picarones.adapters._atomic_io import atomic_write_text
 from picarones.adapters._retry import (
     DEFAULT_BACKOFF_BASE as _DEFAULT_BACKOFF_BASE,
 )
@@ -561,7 +562,7 @@ class BaseLLMAdapter(ABC):
             suffix="corrected.txt",
             context=context,
         )
-        out_path.write_text(result.text, encoding="utf-8")
+        atomic_write_text(out_path, result.text, encoding="utf-8")
         return {
             ArtifactType.CORRECTED_TEXT: Artifact(

picarones/adapters/ocr/azure_doc_intel.py CHANGED

@@ -65,6 +65,7 @@ from pathlib import Path
 from typing import Any
 from picarones.adapters._retry import call_with_retry
+from picarones.adapters._atomic_io import atomic_write_text
 from picarones.adapters.ocr.base import BaseOCRAdapter, OCRAdapterError
 from picarones.adapters.output_paths import resolve_output_path
 from picarones.domain.artifacts import Artifact, ArtifactType
@@ -239,7 +240,7 @@ class AzureDocIntelAdapter(BaseOCRAdapter):
             suffix="txt",
             context=context,
         )
-        text_path.write_text(text, encoding="utf-8")
+        atomic_write_text(text_path, text, encoding="utf-8")
         return {
             ArtifactType.RAW_TEXT: Artifact(

picarones/adapters/ocr/calamari.py CHANGED

@@ -54,6 +54,7 @@ from pathlib import Path
 from typing import Any
 from picarones.adapters.ocr.base import BaseOCRAdapter, OCRAdapterError
+from picarones.adapters._atomic_io import atomic_write_text
 from picarones.adapters.output_paths import resolve_output_path
 from picarones.domain.artifacts import Artifact, ArtifactType
 from picarones.pipeline.run_control import RunControl
@@ -235,7 +236,7 @@ class CalamariAdapter(BaseOCRAdapter):
             suffix="txt",
             context=context,
         )
-        text_path.write_text(text, encoding="utf-8")
+        atomic_write_text(text_path, text, encoding="utf-8")
         return {
             ArtifactType.RAW_TEXT: Artifact(

picarones/adapters/ocr/google_vision.py CHANGED

@@ -48,6 +48,7 @@ from pathlib import Path
 from typing import Any
 from picarones.adapters._retry import call_with_retry
+from picarones.adapters._atomic_io import atomic_write_text
 from picarones.adapters.ocr.base import BaseOCRAdapter, OCRAdapterError
 logger = logging.getLogger(__name__)
@@ -207,7 +208,7 @@ class GoogleVisionAdapter(BaseOCRAdapter):
             suffix="txt",
             context=context,
         )
-        text_path.write_text(text, encoding="utf-8")
+        atomic_write_text(text_path, text, encoding="utf-8")
         return {
             ArtifactType.RAW_TEXT: Artifact(

picarones/adapters/ocr/kraken.py CHANGED

@@ -54,6 +54,7 @@ from pathlib import Path
 from typing import Any
 from picarones.adapters.ocr.base import BaseOCRAdapter, OCRAdapterError
+from picarones.adapters._atomic_io import atomic_write_text
 from picarones.adapters.output_paths import resolve_output_path
 from picarones.domain.artifacts import Artifact, ArtifactType
 from picarones.pipeline.run_control import RunControl
@@ -222,7 +223,7 @@ class KrakenAdapter(BaseOCRAdapter):
             suffix="txt",
             context=context,
         )
-        text_path.write_text(text, encoding="utf-8")
+        atomic_write_text(text_path, text, encoding="utf-8")
         return {
             ArtifactType.RAW_TEXT: Artifact(

picarones/adapters/ocr/mistral_ocr.py CHANGED

@@ -58,6 +58,7 @@ from pathlib import Path
 from typing import Any
 from picarones.adapters._retry import call_with_retry
+from picarones.adapters._atomic_io import atomic_write_text
 from picarones.adapters.ocr.base import BaseOCRAdapter, OCRAdapterError
 from picarones.adapters.output_paths import resolve_output_path
 from picarones.domain.artifacts import Artifact, ArtifactType
@@ -245,7 +246,7 @@ class MistralOCRAdapter(BaseOCRAdapter):
             suffix="txt",
             context=context,
         )
-        text_path.write_text(text, encoding="utf-8")
+        atomic_write_text(text_path, text, encoding="utf-8")
         return {
             ArtifactType.RAW_TEXT: Artifact(

picarones/adapters/ocr/pero_ocr.py CHANGED

@@ -47,6 +47,7 @@ from pathlib import Path
 from typing import Any
 from picarones.adapters.ocr.base import BaseOCRAdapter, OCRAdapterError
+from picarones.adapters._atomic_io import atomic_write_text
 from picarones.adapters.output_paths import resolve_output_path
 from picarones.domain.artifacts import Artifact, ArtifactType
 from picarones.pipeline.run_control import RunControl
@@ -210,7 +211,7 @@ class PeroOCRAdapter(BaseOCRAdapter):
             suffix="txt",
             context=context,
         )
-        text_path.write_text(text, encoding="utf-8")
+        atomic_write_text(text_path, text, encoding="utf-8")
         return {
             ArtifactType.RAW_TEXT: Artifact(

picarones/adapters/ocr/precomputed.py CHANGED

@@ -97,6 +97,7 @@ from pathlib import Path
 from typing import Any, Literal
 from picarones.adapters.ocr.base import BaseOCRAdapter, OCRAdapterError
+from picarones.adapters._atomic_io import atomic_write_text
 from picarones.domain.artifacts import Artifact, ArtifactType
 from picarones.pipeline.run_control import RunControl
@@ -188,7 +189,7 @@ class PrecomputedTextAdapter(BaseOCRAdapter):
                 # Create the empty file to stay consistent: every
                 # ``Artifact`` produced has a URI pointing to a
                 # readable file.
-                text_path.write_text("", encoding="utf-8")
+                atomic_write_text(text_path, "", encoding="utf-8")
             else:
                 raise OCRAdapterError(
                     f"{self.name} : fichier pré-calculé introuvable "

picarones/adapters/ocr/tesseract.py CHANGED

@@ -61,6 +61,7 @@ from pathlib import Path
 from typing import Any
 from picarones.adapters.ocr.base import BaseOCRAdapter, OCRAdapterError
+from picarones.adapters._atomic_io import atomic_write_text
 from picarones.adapters.output_paths import resolve_output_path
 from picarones.domain.artifacts import Artifact, ArtifactType
 from picarones.pipeline.run_control import RunControl
@@ -328,7 +329,7 @@ class TesseractAdapter(BaseOCRAdapter):
             suffix="txt",
             context=context,
         )
-        text_path.write_text(text, encoding="utf-8")
+        atomic_write_text(text_path, text, encoding="utf-8")
         outputs: dict = {
             ArtifactType.RAW_TEXT: Artifact(
@@ -528,7 +529,7 @@ class TesseractAdapter(BaseOCRAdapter):
         # ``write_confidences_sidecar`` : ``<stem>.<name>.alto.xml``).
         alto_path = text_path.with_suffix(".alto.xml")
         try:
-            alto_path.write_text(alto_xml, encoding="utf-8")
+            atomic_write_text(alto_path, alto_xml, encoding="utf-8")
         except OSError as exc:
             logger.warning(
                 "[%s] ALTO non persisté (%s) — ALTO sauté.",

picarones/adapters/vlm/base.py CHANGED

@@ -33,6 +33,7 @@ from pathlib import Path
 from typing import Any
 from picarones.adapters.llm.base import BaseLLMAdapter
+from picarones.adapters._atomic_io import atomic_write_text
 from picarones.domain.artifacts import Artifact, ArtifactType
 from picarones.domain.errors import AdapterStepError
 from picarones.pipeline.run_control import RunControl
@@ -228,7 +229,7 @@ class BaseVLMAdapter(BaseLLMAdapter):
             suffix="txt",
             context=context,
         )
-        out_path.write_text(result.text, encoding="utf-8")
+        atomic_write_text(out_path, result.text, encoding="utf-8")
         return {
             ArtifactType.RAW_TEXT: Artifact(

tests/adapters/test_atomic_io.py ADDED

@@ -0,0 +1,245 @@

+"""Tests for ``picarones.adapters._atomic_io`` (Phase 9 ADR-0001).
+
+Ensures that ``atomic_write_text`` / ``atomic_write_bytes``:
+
+- Write the complete content.
+- Survive a mid-write kill without leaving a partial file.
+- Clean up the tmp on error (simulated disk full).
+- Are compatible with an existing ``path`` (rename replaces it).
+- Work with unicode + bytes.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from unittest.mock import patch
+
+import pytest
+
+from picarones.adapters._atomic_io import (
+    atomic_write_bytes,
+    atomic_write_text,
+)
+
+
+class TestBasicWrite:
+    def test_write_text_creates_file(self, tmp_path: Path) -> None:
+        target = tmp_path / "out.txt"
+        atomic_write_text(target, "hello world")
+        assert target.read_text(encoding="utf-8") == "hello world"
+
+    def test_write_bytes_creates_file(self, tmp_path: Path) -> None:
+        target = tmp_path / "out.bin"
+        atomic_write_bytes(target, b"\x00\x01\x02\xff")
+        assert target.read_bytes() == b"\x00\x01\x02\xff"
+
+    def test_write_text_unicode(self, tmp_path: Path) -> None:
+        target = tmp_path / "unicode.txt"
+        content = "café — médiéval (œuvre du XIVᵉ siècle) — ⚜"
+        atomic_write_text(target, content)
+        assert target.read_text(encoding="utf-8") == content
+
+    def test_write_text_empty(self, tmp_path: Path) -> None:
+        """An empty file is valid content."""
+        target = tmp_path / "empty.txt"
+        atomic_write_text(target, "")
+        assert target.exists()
+        assert target.read_text() == ""
+
+    def test_write_text_accepts_str_path(self, tmp_path: Path) -> None:
+        """``path`` can be a str or a Path (path-like
+        consistency)."""
+        target = tmp_path / "from_str.txt"
+        atomic_write_text(str(target), "ok")
+        assert target.read_text() == "ok"
+
+
+class TestOverwrite:
+    def test_overwrite_existing_file(self, tmp_path: Path) -> None:
+        """``atomic_write_text`` replaces an existing file."""
+        target = tmp_path / "existing.txt"
+        target.write_text("old content")
+        atomic_write_text(target, "new content")
+        assert target.read_text() == "new content"
+
+    def test_overwrite_does_not_leak_tmp(self, tmp_path: Path) -> None:
+        """After an overwrite, no orphaned ``.tmp`` file is left in
+        the directory."""
+        target = tmp_path / "out.txt"
+        atomic_write_text(target, "v1")
+        atomic_write_text(target, "v2")
+        tmp_files = [
+            p for p in tmp_path.iterdir() if ".tmp" in p.name
+        ]
+        assert tmp_files == [], (
+            f"fichiers tmp orphelins : {tmp_files}"
+        )
+
+
+class TestErrorHandling:
+    def test_write_to_nonexistent_dir_raises(self, tmp_path: Path) -> None:
+        """Writing into a nonexistent directory must raise: the
+        helper does not create intermediate directories (creating
+        them is the caller's job)."""
+        target = tmp_path / "nonexistent" / "out.txt"
+        with pytest.raises(FileNotFoundError):
+            atomic_write_text(target, "x")
+
+    def test_target_unchanged_after_write_failure(
+        self, tmp_path: Path,
+    ) -> None:
+        """If writing the tmp fails, the original ``path`` (if it
+        existed) is left unchanged.  Guarantees the "all or
+        nothing" semantics."""
+        target = tmp_path / "existing.txt"
+        target.write_text("original content")
+        # Simulate a failure by substituting an ``open`` that raises.
+        original_open = open
+
+        def failing_open(file, *args, **kwargs):
+            # Fail only on the tmp; let every other open through.
+            if str(file).endswith(".tmp"):
+                raise OSError("disk full simulated")
+            return original_open(file, *args, **kwargs)
+
+        with patch("builtins.open", side_effect=failing_open):
+            with pytest.raises(OSError, match="disk full"):
+                atomic_write_text(target, "new content")
+        # The original content MUST be preserved.
+        assert target.read_text() == "original content"
+
+    def test_tmp_cleaned_after_write_failure(
+        self, tmp_path: Path,
+    ) -> None:
+        """If the write fails, the tmp must be deleted so that no
+        orphan is left on the filesystem."""
+        target = tmp_path / "out.txt"
+        # Make ``os.fsync`` raise to simulate a disk error after the
+        # tmp has been created and written.
+        from picarones.adapters import _atomic_io
+
+        def failing_fsync(fd):
+            raise OSError("fsync failed")
+
+        with patch.object(_atomic_io.os, "fsync", side_effect=failing_fsync):
+            with pytest.raises(OSError, match="fsync failed"):
+                atomic_write_text(target, "content")
+        # No orphaned tmp.
+        tmp_files = [
+            p for p in tmp_path.iterdir() if ".tmp" in p.name
+        ]
+        assert tmp_files == [], (
+            f"tmp orphelin après échec : {tmp_files}"
+        )
+
+
+class TestAtomicity:
+    """The key contract: a kill between write and rename never
+    leaves ``path`` in a partial state.
+
+    Hard to test directly (one cannot cleanly SIGKILL a Python
+    subprocess from pytest), but the invariant can be tested:
+    during the write, ``path`` does not exist (the content is in
+    the tmp), and after the rename, ``path`` exists with the
+    complete content.
+    """
+
+    def test_path_does_not_exist_during_write(
+        self, tmp_path: Path,
+    ) -> None:
+        """During the ``open(tmp)``, ``path`` must not exist.
+        Checked via a side_effect that asserts when the tmp is
+        opened."""
+        target = tmp_path / "out.txt"
+        assert not target.exists()
+        original_open = open
+
+        def open_with_check(file, *args, **kwargs):
+            f = original_open(file, *args, **kwargs)
+            if str(file).endswith(".tmp"):
+                # At this point, ``target`` must still not exist.
+                assert not target.exists(), (
+                    f"target {target} existe pendant l'écriture du tmp"
+                )
+            return f
+
+        with patch("builtins.open", side_effect=open_with_check):
+            atomic_write_text(target, "content")
+        assert target.exists()
+        assert target.read_text() == "content"
+
+    def test_tmp_path_is_in_same_dir(self, tmp_path: Path) -> None:
+        """The tmp must be in the SAME directory as the target
+        ``path``; otherwise the rename could cross filesystems
+        (rename(2) is atomic only within the same FS)."""
+        from picarones.adapters._atomic_io import _tmp_path_for
+        target = tmp_path / "subdir" / "out.txt"
+        target.parent.mkdir()
+        tmp = _tmp_path_for(target)
+        assert tmp.parent == target.parent
+
+    def test_tmp_path_includes_pid_and_tid(self) -> None:
+        """The tmp includes PID + thread ID to avoid collisions
+        between workers of the same pool."""
+        import os
+        import threading
+        from picarones.adapters._atomic_io import _tmp_path_for
+        target = Path("/tmp/test_collision.txt")
+        tmp = _tmp_path_for(target)
+        assert str(os.getpid()) in tmp.name
+        assert str(threading.get_ident()) in tmp.name
+
+
+class TestConcurrentWrites:
+    """Without blocking the caller: we only check that concurrent
+    writes to the SAME path do not corrupt the result
+    (semantics: the last one wins).
+    """
+
+    def test_concurrent_writes_yield_one_of_the_contents(
+        self, tmp_path: Path,
+    ) -> None:
+        import threading
+        target = tmp_path / "racy.txt"
+        contents = [f"writer-{i}" for i in range(8)]
+        threads = [
+            threading.Thread(
+                target=atomic_write_text,
+                args=(target, c),
+            )
+            for c in contents
+        ]
+        for t in threads:
+            t.start()
+        for t in threads:
+            t.join(timeout=5.0)
+        # The final content must be one of the written contents.
+        # No orphaned tmp.
+        final = target.read_text()
+        assert final in contents
+        tmp_files = [
+            p for p in tmp_path.iterdir() if ".tmp" in p.name
+        ]
+        assert tmp_files == [], f"tmp orphelins : {tmp_files}"
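
The ``TestAtomicity`` docstring says a real kill is hard to test from pytest. One deterministic stand-in, sketched here as an assumption and not part of the commit, is a child process that writes and fsyncs its tmp file and then calls ``os._exit`` just before ``os.replace``. The ``.child.tmp`` suffix and the function name are made up for the example.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Child script: same steps as the helper, but the process dies
# abruptly (no cleanup handlers run) right before the rename.
CHILD = textwrap.dedent(
    """
    import os, sys
    target = sys.argv[1]
    tmp = target + ".child.tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write("new content")
        f.flush()
        os.fsync(f.fileno())
    os._exit(9)              # simulated hard kill
    os.replace(tmp, target)  # never reached
    """
)


def target_after_simulated_kill() -> tuple[str, int]:
    """Run the child against an existing file; return (content, exit code)."""
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "out.txt"
        target.write_text("original content", encoding="utf-8")
        proc = subprocess.run([sys.executable, "-c", CHILD, str(target)])
        return target.read_text(encoding="utf-8"), proc.returncode


content, code = target_after_simulated_kill()
print(content, code)  # original content 9
```

The target keeps its previous content because the only operation that touches it, ``os.replace``, never ran. The orphaned tmp file stays behind, as the module docstring anticipates.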