LFM2.5-ColBERT-350M: MLX (bf16)

MLX build of LiquidAI/LFM2.5-ColBERT-350M, a multilingual late-interaction retriever (128-dim vector per token, scored with MaxSim), for local inference on Apple Silicon with MLX.
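MaxSim (late interaction) scores a query against a passage by taking each query token vector's best dot product with any passage token vector, then summing over query tokens. The sketch below is a minimal pure-Python illustration under two assumptions: the per-token vectors are already L2-normalized, so each dot product is a cosine, and toy 2-dim vectors stand in for the model's 128-dim outputs. The helper names (`dot`, `normalize`, `maxsim`) are hypothetical and not part of this repository.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> list[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token, keep its best match in the
    document, then sum those maxima. Assumes unit-norm vectors (dot == cosine)."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy 2-dim "token embeddings" (the real model emits one 128-dim vector per token).
query = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
doc = [normalize([1.0, 0.0]), normalize([1.0, 1.0])]
print(round(maxsim(query, doc), 4))  # 1.7071
```

Because each query token picks its own best match, a passage only needs some token close to each query token; token order inside the passage does not affect the score.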

All weights, architecture, and behavior are LiquidAI's. This repository changes the file format (PyTorch/safetensors → MLX) and keeps the original bf16 precision; it is not quantized. Quantized variants (8-bit / 4-bit) are available as sibling repos; see the evaluation tables below. See the original model card for training details and intended use.

Conversion details

  • Converted with MLX; weights are unchanged apart from tensor layout (bf16 → MLX bf16). Includes the 1024→128 Dense projection head (dense.weight).
  • The architecture is Lfm2BidirectionalModel (a bidirectional LFM2 encoder), which mlx-lm / mlx-embeddings do not support out of the box, so a small self-contained MLX implementation is included as lfm2_bidirectional.py.
  • Verified against the original (PyTorch, float32, identical token ids): worst-case cosine of the per-token projected vectors ≈ 1.0 across short prompts and a 130-token passage.
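The verification above reduces each token position to one cosine between the reference (PyTorch) vector and the MLX vector, then reports the minimum over positions. A minimal sketch of that check, assuming both implementations have already produced per-token projected vectors for the same token ids; the function names and the toy numbers are hypothetical, not the actual verification script or its data.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def worst_case_cosine(ref: Sequence[Vector], test: Sequence[Vector]) -> float:
    """Minimum per-token cosine between two [num_tokens, dim] outputs.

    Both inputs must come from identical token ids, so position i lines up with
    position i; a length mismatch usually means tokenization differed.
    """
    if len(ref) != len(test):
        raise ValueError("token counts differ; were the same token ids used?")
    return min(cosine(r, t) for r, t in zip(ref, test))


# Hypothetical 3-token, 4-dim outputs: "mlx_out" is "torch_out" plus tiny noise.
torch_out = [[0.5, -0.2, 0.1, 0.8], [0.3, 0.3, -0.6, 0.1], [-0.9, 0.05, 0.2, 0.4]]
mlx_out = [[0.501, -0.199, 0.1, 0.799], [0.3, 0.301, -0.6, 0.1], [-0.9, 0.05, 0.201, 0.4]]
print(f"worst-case cosine: {worst_case_cosine(torch_out, mlx_out):.6f}")
```

Taking the minimum rather than the mean makes a single badly converted token position visible instead of letting it average out.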

Evaluation

Retrieval quality of this checkpoint and its sibling precisions, measured as NDCG@10 and Recall@10 on judged pools. Retention = metric ÷ bf16 metric, computed per dataset and then averaged across datasets.
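For reference, NDCG@10 and Recall@10 are computed per query from the ranked passage list and the judged positives, then averaged over queries. The card does not say which gain variant or evaluation tooling was used, so this is a minimal sketch of the standard binary-relevance definitions with the usual log2 discount, not the exact harness. The passage ids are hypothetical.

```python
import math
from typing import Sequence, Set


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k: DCG of the top k divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets log2(2) = 1
        for rank, doc_id in enumerate(ranked[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of judged-relevant passages that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


ranked = ["p7", "p2", "p9", "p4"]  # hypothetical retrieval order for one query
relevant = {"p2", "p4"}            # hypothetical judged positives
print(round(ndcg_at_k(ranked, relevant), 4), recall_at_k(ranked, relevant))  # 0.6509 1.0
```

The example shows why the two metrics can move independently: both positives are in the top 10 (Recall@10 = 1.0), but neither is at rank 1, so NDCG@10 is penalized.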

Setup. English = the four NanoBEIR sets (full small corpora, ~2–5k passages, 50 queries each). Multilingual = MIRACL dev (the real queries and relevance judgments) for Spanish, German, Japanese, and Arabic, each scored over a reduced pool of ~6k passages (judged positives + hard-mined negatives + sampled distractors, from mteb/MIRACLRetrievalHardNegatives), 100 queries each. Reduced pools make absolute scores easier than on full-corpus MIRACL and not leaderboard-comparable, but every precision searches the identical pool, so the retention numbers (the point of this table) are sound. ColBERT scoring here uses brute-force MaxSim with no query augmentation, so its absolute scores sit slightly below those of a full PLAID setup.
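Brute-force MaxSim means every passage in the pool is scored against the query and the top 10 are kept, with no index such as PLAID in between. A minimal sketch, assuming the pool's per-token vectors fit in memory as a dict from passage id to token vectors; that layout and the function names are hypothetical, and a real run would batch the scoring with MLX arrays rather than Python lists.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    # Sum over query tokens of the best dot product with any passage token.
    return sum(
        max(sum(x * y for x, y in zip(q, d)) for d in doc_tokens)
        for q in query_tokens
    )


def brute_force_top_k(
    query_tokens: Sequence[Vector],
    pool: Dict[str, Sequence[Vector]],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Score every passage in the pool exhaustively and return the k best."""
    scored = ((pid, maxsim(query_tokens, toks)) for pid, toks in pool.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Hypothetical pool of three passages with 2-dim token vectors.
pool = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[0.6, 0.8]],
    "c": [[0.0, -1.0]],
}
query = [[1.0, 0.0], [0.0, 1.0]]
print(brute_force_top_k(query, pool, k=2))
```

Exhaustive scoring is exact for a given set of vectors, which is why it is practical on ~6k-passage pools; the cost grows linearly with pool size, which is what index structures are designed to avoid.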

Summary (mean over 8 datasets)

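| precision | NDCG@10 | NDCG retention | Recall@10 | Recall retention | size |
|---|---|---|---|---|---|
| bf16 ◄ | 0.740 | 100.0% | 0.780 | 100.0% | 707 MB |
| 8-bit | 0.741 | 100.0% | 0.779 | 99.4% | 376 MB |
| 4-bit | 0.731 | 98.7% | 0.780 | 99.7% | 199 MB |
| mxfp4 | 0.730 | 98.5% | 0.773 | 98.8% | n/a |

◄ marks this repository's precision (bf16).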

NDCG@10 by dataset

| dataset | bf16 ◄ | 8-bit | 4-bit | mxfp4 |
|---|---|---|---|---|
| NanoNQ · en | 0.757 | 0.751 | 0.716 | 0.742 |
| NanoFiQA2018 · en | 0.528 | 0.512 | 0.524 | 0.520 |
| NanoSciFact · en | 0.693 | 0.712 | 0.702 | 0.682 |
| NanoNFCorpus · en | 0.345 | 0.342 | 0.335 | 0.334 |
| MIRACL · es | 0.900 | 0.901 | 0.899 | 0.900 |
| MIRACL · de | 0.823 | 0.837 | 0.826 | 0.811 |
| MIRACL · ja | 0.934 | 0.933 | 0.923 | 0.926 |
| MIRACL · ar | 0.938 | 0.941 | 0.924 | 0.926 |
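Plugging the per-dataset NDCG@10 values above into the retention definition (per-dataset ratio to bf16, then averaged) reproduces the NDCG columns of the summary table. A short check, with the numbers copied from the table; Recall@10 is not broken out per dataset here, so only NDCG is checked.

```python
from statistics import mean

# NDCG@10 per dataset, copied from the table above (same dataset order).
ndcg = {
    "bf16":  [0.757, 0.528, 0.693, 0.345, 0.900, 0.823, 0.934, 0.938],
    "8-bit": [0.751, 0.512, 0.712, 0.342, 0.901, 0.837, 0.933, 0.941],
    "4-bit": [0.716, 0.524, 0.702, 0.335, 0.899, 0.826, 0.923, 0.924],
    "mxfp4": [0.742, 0.520, 0.682, 0.334, 0.900, 0.811, 0.926, 0.926],
}


def retention(variant: list[float], baseline: list[float]) -> float:
    """Mean over datasets of (variant metric / bf16 metric)."""
    return mean(v / b for v, b in zip(variant, baseline))


for name, scores in ndcg.items():
    print(
        f"{name:6s} mean NDCG@10 = {mean(scores):.3f}  "
        f"retention = {retention(scores, ndcg['bf16']):.1%}"
    )
```

The averaging order matters: dividing the mean 4-bit score by the mean bf16 score gives 98.8%, while the per-dataset average reported in the summary is 98.7%.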

License & attribution

Redistributed under the LFM Open License v1.0 (LICENSE), the same license as the original model. Per Section 4, this notice records that the files were modified (format conversion to MLX). The original work is by Liquid AI; this repository is an independent conversion, not affiliated with or endorsed by Liquid AI. The license includes a commercial-use threshold (Section 5); review it for your use case.

Base model: LiquidAI/LFM2.5-ColBERT-350M

This repository: ronaldmannak/LFM2.5-ColBERT-350M-bf16