---
library_name: joblib
tags:
- security
- joblib
- modelscan
- picklescan
- scanner-bypass
- proof-of-concept
license: apache-2.0
---

# JobLib Raw-Array Tail Scanner Bypass PoC

Benign security proof-of-concept for a current JobLib scanner/runtime parser mismatch.

## Summary

A valid `.joblib` artifact can hide a dangerous pickle tail behind a JobLib `NumpyArrayWrapper` numeric array payload. `joblib.load()` reads the numeric array as raw bytes, then resumes unpickling and executes the later payload. ModelScan 0.8.8 and Picklescan 1.0.4 parse the same bytes as a plain pickle stream, so a crafted raw array beginning with a `BINBYTES` opcode makes the scanners treat the later dangerous tail as inert byte-string data.

This is not a new claim that pickle or JobLib loading is safe. JobLib already warns against loading untrusted files. The reportable issue is the parser disagreement: scanners report a valid JobLib model artifact as clean while the normal JobLib runtime reaches the hidden tail during `joblib.load()`.

## Severity

High, CVSS 8.1.

Rationale: scanner-clean supported JobLib artifact with load-time code execution when a user or automated model pipeline trusts scanner output before calling `joblib.load()`. This is constrained by the known unsafe deserialization semantics of JobLib, so the novelty is the scanner bypass rather than a new deserialization primitive.

## Tested Versions

- Python 3.12.3
- `joblib==1.5.3`
- `scikit-learn==1.8.0`
- `numpy==2.4.4`
- `modelscan==0.8.8`
- `picklescan==1.0.4`

## Files

- `sklearn_nopad_swallow_tail_payload.joblib` - valid JobLib artifact carrying a `sklearn.preprocessing.FunctionTransformer`.
- `verify_poc.py` - verifies hash and demonstrates benign marker creation on `joblib.load()`.
- `modelscan_sklearn_nopad_swallow_tail.json` - ModelScan 0.8.8 output.
- `picklescan_sklearn_nopad_swallow_tail.txt` - Picklescan 1.0.4 output.
- `pickletools_sklearn_nopad_swallow_summary.txt` - confirms `pickletools` does not see `posix.system` as a global while the raw bytes contain it.
- `runtime_output.txt` - local runtime validation output.
- `verify_output.txt` - staging verifier output.
- `requirements.txt` - tested dependency versions.
- `SHA256SUMS` - hashes for the staged core files.

## Artifact

SHA256:

```text
141d2d0b175dc53671dae11994500e0cb82633ba305381b56c6af22cbbbdd5c4  sklearn_nopad_swallow_tail_payload.joblib
```

## Reproduction

```bash
python -m venv .venv
. .venv/bin/activate
pip install joblib scikit-learn modelscan picklescan
modelscan scan -p sklearn_nopad_swallow_tail_payload.joblib -r json --show-skipped
picklescan -p sklearn_nopad_swallow_tail_payload.joblib
python verify_poc.py
```

Expected scanner result:

- ModelScan: zero issues, zero errors, one scanned file.
- Picklescan: one scanned file, zero infected files, zero dangerous globals.

Expected runtime result:

- `joblib.load()` reconstructs a `FunctionTransformer`.
- A local marker file named `joblib_inline_array_tail_marker.txt` is created.

## Duplicate Boundary

Known public reports cover generic unsafe `joblib.load()`, compressed JobLib scanner bypasses, object-array/double-pickle scanner evasion, extension mismatch bypasses, and legacy `NDArrayWrapper` traversal. This PoC is distinct: it uses a current valid JobLib numeric raw-array payload with `numpy_array_alignment_bytes=None` plus a `BINBYTES` tail-swallow layout so scanners miss the later `posix.system` tail while JobLib executes it.

## Impact

The PoC demonstrates a scanner false negative for a dangerous model artifact that loads through the normal JobLib runtime. A model registry, ingestion pipeline, or notebook workflow that treats ModelScan/Picklescan clean output as sufficient for `.joblib` safety can still execute the artifact-carried payload at load time.
The payload is benign and only writes `joblib inline-array tail payload executed` to a marker file in the local PoC directory.

## Limitations

- Not a new unsafe-deserialization primitive in JobLib.
- Requires a victim workflow that loads scanner-clean JobLib artifacts.
- The benign payload writes only a local marker file; no network access, credential access, persistence, or destructive behavior is used.

## Mitigation Ideas

- Scanners should implement JobLib-aware parsing for `NumpyArrayWrapper` raw ndarray payloads instead of scanning the whole file as plain pickle.
- Treat scanner parse disagreement or embedded raw payload regions as suspicious unless the scanner can advance exactly as the JobLib loader does.
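
As a rough illustration of the second idea, the sketch below walks a plain pickle stream with the standard-library `pickletools.genops` and flags byte-string arguments that contain marker substrings. The function name `suspicious_byte_blobs` and the marker list are hypothetical, chosen only for this example; they are not taken from ModelScan or Picklescan. The demo input is an ordinary pickled `bytes` object, not a JobLib artifact, and the heuristic is a stopgap rather than a replacement for JobLib-aware parsing.

```python
import pickle
import pickletools

# Hypothetical marker list for illustration; a real scanner would need a
# maintained denylist and JobLib-aware parsing on top of this.
SUSPICIOUS_MARKERS = (b"posix", b"system", b"subprocess", b"builtins")


def suspicious_byte_blobs(data: bytes):
    """Yield (offset, opcode_name, marker) for byte-string opcode arguments
    that contain one of the marker substrings."""
    for opcode, arg, pos in pickletools.genops(data):
        if isinstance(arg, (bytes, bytearray)):
            for marker in SUSPICIOUS_MARKERS:
                if marker in arg:
                    yield pos, opcode.name, marker
                    break  # one finding per blob is enough


# Benign demo: a bytes object longer than 255 bytes is pickled with a
# BINBYTES opcode, so its contents are data to a plain pickle parser.
blob = pickle.dumps(b"\x00" * 300 + b"posix\nsystem\n", protocol=4)

opcode_names = [opcode.name for opcode, _, _ in pickletools.genops(blob)]
findings = list(suspicious_byte_blobs(blob))

print("opcodes:", opcode_names)
print("findings:", findings)
```

In this demo the opcode listing contains no `GLOBAL` or `STACK_GLOBAL` entry, which is why an opcode-only scanner would pass the stream, while the blob check still flags the byte string for review.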