metadata
license: other
license_name: lfm1.0
license_link: LICENSE
base_model: LiquidAI/LFM2.5-ColBERT-350M
library_name: mlx
pipeline_tag: sentence-similarity
language:
  - en
  - es
  - de
  - fr
  - it
  - pt
  - ar
  - sv
  - 'no'
  - ja
  - ko
tags:
  - mlx
  - lfm2
  - lfm2.5
  - ColBERT
  - late-interaction
  - sentence-similarity
  - feature-extraction
  - retrieval

LFM2.5-ColBERT-350M — MLX (bf16)

MLX build of LiquidAI/LFM2.5-ColBERT-350M, a multilingual late-interaction retriever (128-dim vector per token, scored with MaxSim), for local inference on Apple Silicon with MLX.
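As a rough illustration of the MaxSim scoring mentioned above: each query token vector is matched to its most similar document token vector, and those per-token maxima are summed. The sketch below is a pure-Python stand-in and not the code shipped in this repository. `maxsim_score` and the toy 2-dim vectors are hypothetical, and it assumes the per-token vectors are already L2-normalized so that the dot product equals cosine similarity.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best dot product against any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-dim vectors for readability (the real model emits one 128-dim vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # contains an exact match for each query token
doc_b = [[0.6, 0.8]]                          # one token, partially similar to both

scores = {
    "doc_a": maxsim_score(query, doc_a),
    "doc_b": maxsim_score(query, doc_b),
}
# Rank documents by descending MaxSim score.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In practice the same computation would be done on MLX arrays as a matrix product followed by a max over document tokens and a sum over query tokens; the nested loops here only spell out the definition.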

All weights, architecture, and behavior are LiquidAI's. This repository changes the file format (PyTorch/safetensors → MLX) and keeps the original bf16 precision; it is not quantized. Quantized variants (8-bit / 4-bit) are available as sibling repos; see the table below. See the original model card for training details and intended use.

Conversion details

  • Converted with mlx; weights unchanged apart from tensor layout (bf16 → MLX bf16). Includes the 1024→128 Dense projection head (dense.weight).
  • The architecture is Lfm2BidirectionalModel (a bidirectional LFM2 encoder), which mlx-lm / mlx-embeddings do not support out of the box, so a small self-contained MLX implementation is included as lfm2_bidirectional.py.
  • Verified against the original (PyTorch, float32, identical token ids): worst-case cosine of the per-token projected vectors ≈ 1.0 across short prompts and a 130-token passage.
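The verification in the last bullet can be pictured as follows: compute the cosine similarity between the reference and converted vector for every token, then report the minimum. This is a minimal sketch under stated assumptions and not the script actually used. `worst_case_cosine` is a hypothetical helper, and the small vectors are made-up stand-ins for the 128-dim projected outputs of the PyTorch and MLX models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def worst_case_cosine(ref_tokens, test_tokens):
    """Minimum per-token cosine between two aligned sequences of vectors."""
    if len(ref_tokens) != len(test_tokens):
        raise ValueError("sequences must have the same number of tokens")
    return min(cosine(r, t) for r, t in zip(ref_tokens, test_tokens))


# Made-up example: the second sequence differs from the first by tiny numeric noise.
ref = [[0.6, 0.8], [1.0, 0.0], [0.0, 1.0]]
converted = [[0.6001, 0.7999], [0.9999, 0.0001], [0.0, 1.0]]

worst = worst_case_cosine(ref, converted)
print(f"worst-case cosine: {worst:.6f}")
```

Reporting the minimum rather than the mean is the stricter check, since a single badly converted token would pull it down.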

Evaluation

Retrieval quality of this checkpoint (and its sibling precisions), measured as NDCG@10 / Recall@10 on judged pools. Retention = metric ÷ bf16 metric, computed per dataset and then averaged over the datasets.

Setup. English = the four NanoBEIR sets (full small corpora, ~2–5k passages, 50 queries each). Multilingual = MIRACL dev (the real queries and relevance judgments) for Spanish, German, Japanese, Arabic, each scored over a reduced pool of ~6k passages (judged positives + hard-mined negatives + sampled distractors, from mteb/MIRACLRetrievalHardNegatives), 100 queries each. Reduced pools make the task easier than full-corpus MIRACL, so the absolute scores are not leaderboard-comparable. Every precision searches the identical pool, however, so the retention numbers (the point of this table) are sound. ColBERT uses brute-force MaxSim with no query augmentation, so its absolute scores sit slightly below those of a full PLAID setup.
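For readers unfamiliar with the two metrics, here is a minimal sketch of how NDCG@10 and Recall@10 are commonly computed for one query. It assumes linear gain and a log2 rank discount. The exact variant used by the evaluation behind these tables is not stated here, and `ndcg_at_k`, `recall_at_k`, and the document ids are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount (ranks start at 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k with linear gain; `relevance` maps doc id -> judged relevance."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall_at_k(ranked_ids, relevance, k=10):
    """Fraction of the relevant documents that appear in the top k."""
    relevant = {doc_id for doc_id, rel in relevance.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)


# Made-up query: two relevant passages, retrieved at ranks 1 and 4.
relevance = {"d1": 1, "d4": 1}
ranked = ["d1", "d2", "d3", "d4"]

ndcg = ndcg_at_k(ranked, relevance)
recall = recall_at_k(ranked, relevance)
print(f"NDCG@10={ndcg:.3f} Recall@10={recall:.3f}")
```

Per-dataset scores are then the mean of these per-query values over all queries in that dataset.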

Summary (mean over 8 datasets)

| precision | NDCG@10 | NDCG retention | Recall@10 | Recall retention | size |
|---|---|---|---|---|---|
| bf16 | 0.740 | 100.0% | 0.780 | 100.0% | 707 MB |
| 8-bit | 0.741 | 100.0% | 0.779 | 99.4% | 376 MB |
| 4-bit | 0.731 | 98.7% | 0.780 | 99.7% | 199 MB |
| mxfp4 | 0.730 | 98.5% | 0.773 | 98.8% | |

NDCG@10 by dataset

| dataset | bf16 | 8-bit | 4-bit | mxfp4 |
|---|---|---|---|---|
| NanoNQ · en | 0.757 | 0.751 | 0.716 | 0.742 |
| NanoFiQA2018 · en | 0.528 | 0.512 | 0.524 | 0.520 |
| NanoSciFact · en | 0.693 | 0.712 | 0.702 | 0.682 |
| NanoNFCorpus · en | 0.345 | 0.342 | 0.335 | 0.334 |
| MIRACL · es | 0.900 | 0.901 | 0.899 | 0.900 |
| MIRACL · de | 0.823 | 0.837 | 0.826 | 0.811 |
| MIRACL · ja | 0.934 | 0.933 | 0.923 | 0.926 |
| MIRACL · ar | 0.938 | 0.941 | 0.924 | 0.926 |
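The retention figures in the summary can be recomputed from the per-dataset NDCG@10 scores above. The sketch below takes the ratio to bf16 for each dataset and then averages those ratios, which is how the retention definition in the Evaluation section reads. `mean_retention` is a hypothetical helper, and the inputs are the rounded three-decimal values from the table, so results can differ from the originals in the last digit.

```python
# Per-dataset NDCG@10, in table order:
# NanoNQ, NanoFiQA2018, NanoSciFact, NanoNFCorpus, MIRACL es, de, ja, ar
ndcg_by_precision = {
    "bf16":  [0.757, 0.528, 0.693, 0.345, 0.900, 0.823, 0.934, 0.938],
    "8-bit": [0.751, 0.512, 0.712, 0.342, 0.901, 0.837, 0.933, 0.941],
    "4-bit": [0.716, 0.524, 0.702, 0.335, 0.899, 0.826, 0.923, 0.924],
    "mxfp4": [0.742, 0.520, 0.682, 0.334, 0.900, 0.811, 0.926, 0.926],
}


def mean_retention(scores, baseline):
    """Average of per-dataset ratios score / baseline, as a percentage."""
    ratios = [s / b for s, b in zip(scores, baseline)]
    return 100.0 * sum(ratios) / len(ratios)


baseline = ndcg_by_precision["bf16"]
retention = {
    name: mean_retention(scores, baseline)
    for name, scores in ndcg_by_precision.items()
}
for name, pct in retention.items():
    print(f"{name}: {pct:.1f}%")
```

Averaging the ratios is not the same as dividing the mean scores: for 8-bit, the ratio of means would be slightly above 100%, while the mean of per-dataset ratios rounds to the 100.0% shown in the summary.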

License & attribution

Redistributed under the LFM Open License v1.0 (LICENSE) — the same license as the original model. Per Section 4, this notice records that the files were modified (format conversion to MLX). The original work is by Liquid AI; this repository is an independent conversion, not affiliated with or endorsed by Liquid AI. The license includes a commercial-use threshold (Section 5) — review it for your use case.

Base model: LiquidAI/LFM2.5-ColBERT-350M