---
license: other
license_name: lfm1.0
license_link: LICENSE
base_model: LiquidAI/LFM2.5-ColBERT-350M
library_name: mlx
pipeline_tag: sentence-similarity
language:
- en
- es
- de
- fr
- it
- pt
- ar
- sv
- 'no'
- ja
- ko
tags:
- mlx
- lfm2
- lfm2.5
- ColBERT
- late-interaction
- sentence-similarity
- feature-extraction
- retrieval
---

# LFM2.5-ColBERT-350M — MLX (bf16)

MLX build of [**LiquidAI/LFM2.5-ColBERT-350M**](https://huggingface.co/LiquidAI/LFM2.5-ColBERT-350M), a multilingual late-interaction retriever (128-dim vector **per token**, scored with MaxSim), for local inference on Apple Silicon with [MLX](https://github.com/ml-explore/mlx).

All weights, architecture, and behavior are LiquidAI's. This repository changes only the file format (PyTorch/safetensors → MLX) and keeps the original **bf16** precision; it is **not quantized**. Quantized variants (8-bit / 4-bit) are available as sibling repos; see the Evaluation tables below. See the [original model card](https://huggingface.co/LiquidAI/LFM2.5-ColBERT-350M) for training details and intended use.

## Conversion details

- Converted with `mlx`; weights unchanged apart from tensor layout (bf16 → MLX bf16). Includes the 1024→128 `Dense` projection head (`dense.weight`).
- The architecture is `Lfm2BidirectionalModel` (a bidirectional LFM2 encoder), which `mlx-lm` / `mlx-embeddings` do not support out of the box, so a small self-contained MLX implementation is included as [`lfm2_bidirectional.py`](lfm2_bidirectional.py).
- **Verified** against the original (PyTorch, float32, identical token ids): worst-case cosine similarity of the per-token projected vectors ≈ **1.0** across short prompts and a 130-token passage.

## Evaluation

Retrieval quality of this checkpoint (and its sibling precisions), measured as **NDCG@10 / Recall@10** on judged pools. *Retention* = metric ÷ bf16 metric, computed per dataset and then averaged over datasets.

**Setup.** English = the four **NanoBEIR** sets (full small corpora, ~2–5k passages, 50 queries each).
Multilingual = **MIRACL** dev (the real queries and relevance judgments) for Spanish, German, Japanese, and Arabic, each scored over a reduced pool of ~6k passages (judged positives + hard-mined negatives + sampled distractors, from `mteb/MIRACLRetrievalHardNegatives`), 100 queries each. Reduced pools make the task easier than full-corpus MIRACL, so the *absolute* scores are not leaderboard-comparable. Every precision searches the identical pool, however, so the **retention** numbers (the point of this table) remain valid. ColBERT scoring uses brute-force MaxSim with no query augmentation, so its absolute scores sit slightly below a full PLAID setup.

### Summary (mean over 8 datasets)

| precision | NDCG@10 | NDCG retention | Recall@10 | Recall retention | size |
|---|---|---|---|---|---|
| **bf16** ◄ | 0.740 | 100.0% | 0.780 | 100.0% | 707 MB |
| 8-bit | 0.741 | 100.0% | 0.779 | 99.4% | 376 MB |
| 4-bit | 0.731 | 98.7% | 0.780 | 99.7% | 199 MB |
| mxfp4 | 0.730 | 98.5% | 0.773 | 98.8% | — |

### NDCG@10 by dataset

| dataset | **bf16** ◄ | 8-bit | 4-bit | mxfp4 |
|---|---|---|---|---|
| NanoNQ · en | 0.757 | 0.751 | 0.716 | 0.742 |
| NanoFiQA2018 · en | 0.528 | 0.512 | 0.524 | 0.520 |
| NanoSciFact · en | 0.693 | 0.712 | 0.702 | 0.682 |
| NanoNFCorpus · en | 0.345 | 0.342 | 0.335 | 0.334 |
| MIRACL · es | 0.900 | 0.901 | 0.899 | 0.900 |
| MIRACL · de | 0.823 | 0.837 | 0.826 | 0.811 |
| MIRACL · ja | 0.934 | 0.933 | 0.923 | 0.926 |
| MIRACL · ar | 0.938 | 0.941 | 0.924 | 0.926 |

## License & attribution

Redistributed under the **LFM Open License v1.0** ([`LICENSE`](LICENSE)), the same license as the original model. Per Section 4, this notice records that the files were **modified (format conversion to MLX)**. The original work is by **Liquid AI**; this repository is an independent conversion, **not affiliated with or endorsed by Liquid AI**. The license includes a **commercial-use threshold (Section 5)**; review it for your use case.

**Base model:** [LiquidAI/LFM2.5-ColBERT-350M](https://huggingface.co/LiquidAI/LFM2.5-ColBERT-350M)
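
## Reproducing the NDCG retention column

The Evaluation section defines retention as metric ÷ bf16 metric, averaged per dataset. The following is a minimal sketch of that arithmetic, using the rounded values from the "NDCG@10 by dataset" table as input. The names `NDCG_AT_10`, `retention`, and `summarize` are illustrative assumptions and are not part of this repository or of the evaluation code that produced the tables.

```python
from statistics import mean

# NDCG@10 per dataset, copied from the "NDCG@10 by dataset" table.
# Column order: NanoNQ, NanoFiQA2018, NanoSciFact, NanoNFCorpus,
# MIRACL es, MIRACL de, MIRACL ja, MIRACL ar.
NDCG_AT_10 = {
    "bf16":  [0.757, 0.528, 0.693, 0.345, 0.900, 0.823, 0.934, 0.938],
    "8-bit": [0.751, 0.512, 0.712, 0.342, 0.901, 0.837, 0.933, 0.941],
    "4-bit": [0.716, 0.524, 0.702, 0.335, 0.899, 0.826, 0.923, 0.924],
    "mxfp4": [0.742, 0.520, 0.682, 0.334, 0.900, 0.811, 0.926, 0.926],
}


def retention(scores, baseline):
    """Mean of the per-dataset ratios score / baseline.

    This is the mean of ratios, not the ratio of the two means.
    """
    if len(scores) != len(baseline):
        raise ValueError("scores and baseline must cover the same datasets")
    return mean(s / b for s, b in zip(scores, baseline))


def summarize(table, baseline_key="bf16"):
    """Return {precision: (mean metric, retention vs. the baseline precision)}."""
    baseline = table[baseline_key]
    return {
        name: (mean(scores), retention(scores, baseline))
        for name, scores in table.items()
    }


for name, (avg, ret) in summarize(NDCG_AT_10).items():
    print(f"{name:>5}  NDCG@10={avg:.3f}  retention={ret:.1%}")
```

Run on the rounded table values, this gives the NDCG@10 and NDCG retention figures in the summary table (0.740 / 0.741 / 0.731 / 0.730 and 100.0% / 100.0% / 98.7% / 98.5%). Averaging the ratios per dataset weights each dataset equally, so a low-scoring set such as NanoNFCorpus counts as much as a high-scoring MIRACL language.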