LFM2.5-Embedding-350M - MLX (bf16)

MLX build of LiquidAI/LFM2.5-Embedding-350M, a multilingual dense bi-encoder (1024-dim CLS embedding, cosine similarity), for local inference on Apple Silicon with MLX.

All weights, architecture, and behavior are LiquidAI's. This repository changes only the file format (PyTorch/safetensors → MLX) and keeps the original bf16 precision; it is not quantized. Quantized variants (8-bit / 4-bit) are available as sibling repos; see the table below. See the original model card for training details and intended use.

Conversion details

  • Converted with mlx; weights unchanged apart from tensor layout (bf16 → MLX bf16).
  • The architecture is Lfm2BidirectionalModel (a bidirectional LFM2 encoder), which mlx-lm / mlx-embeddings do not support out of the box, so a small self-contained MLX implementation is included as lfm2_bidirectional.py.
  • Verified against the original (PyTorch, float32, identical token ids): worst-case cosine of the CLS embedding ≈ 1.0 across short prompts and a 130-token passage.
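The parity check in the last bullet can be sketched as follows. This is a minimal illustration, not the script used for this conversion: the function names and the toy vectors are hypothetical stand-ins for the 1024-dim CLS embeddings that the PyTorch reference and the MLX port produce for the same token ids.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def worst_case_cosine(reference, candidate):
    """Minimum per-prompt cosine between two aligned lists of embeddings."""
    return min(cosine(r, c) for r, c in zip(reference, candidate))

# Toy 4-dim stand-ins (hypothetical values); the real embeddings are 1024-dim.
reference_embeddings = [[0.10, -0.30, 0.50, 0.20], [0.40, 0.10, -0.20, 0.70]]
mlx_embeddings = [[0.10, -0.30, 0.50, 0.20], [0.401, 0.099, -0.20, 0.70]]

worst = worst_case_cosine(reference_embeddings, mlx_embeddings)
print(f"worst-case cosine: {worst:.6f}")
```

Reporting the minimum rather than the mean means a single badly converted prompt cannot be hidden by many good ones.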

Evaluation

Retrieval quality of this checkpoint (and its sibling precisions), measured as NDCG@10 / Recall@10 on judged pools. Retention = metric ÷ bf16 metric, computed per dataset and then averaged.

Setup. English = the four NanoBEIR sets (full small corpora, ~2–5k passages, 50 queries each). Multilingual = MIRACL dev (the real queries and relevance judgments) for Spanish, German, Japanese, and Arabic, each scored over a reduced pool of ~6k passages (judged positives + hard-mined negatives + sampled distractors, from mteb/MIRACLRetrievalHardNegatives), 100 queries each. Reduced pools make the task easier than full-corpus MIRACL, so absolute scores are not leaderboard-comparable. Every precision searches the identical pool, however, so the retention numbers (the point of this table) are sound. ColBERT uses brute-force MaxSim with no query augmentation, so its absolute scores sit a touch below a full PLAID setup.
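For readers unfamiliar with the two metrics, here is a minimal per-query sketch of NDCG@10 and Recall@10. It assumes the common linear-gain, log2-discount form of NDCG; the evaluation tooling actually used for the numbers in this card is not stated here, and the document ids and judgments below are hypothetical.

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k gains (rank 1 is divided by log2(2) = 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))

def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k for one query; `relevance` maps doc id -> graded relevance (missing ids count as 0)."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def recall_at_k(ranked_ids, relevance, k=10):
    """Fraction of the relevant docs (relevance > 0) that appear in the top k."""
    relevant = {doc_id for doc_id, rel in relevance.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)

# Hypothetical query: three judged positives, two of them retrieved.
ranking = ["d3", "d1", "d7", "d2"]
qrels = {"d1": 1, "d2": 1, "d9": 1}

ndcg = ndcg_at_k(ranking, qrels)
recall = recall_at_k(ranking, qrels)
print(f"NDCG@10 = {ndcg:.3f}, Recall@10 = {recall:.3f}")
```

Dataset-level scores are then the mean of these per-query values over the 50 or 100 queries of each set.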

Summary (mean over 8 datasets)

| precision | NDCG@10 | NDCG retention | Recall@10 | Recall retention | size |
|---|---|---|---|---|---|
| bf16 ◄ | 0.728 | 100.0% | 0.775 | 100.0% | 709 MB |
| 8-bit | 0.729 | 100.1% | 0.775 | 100.0% | 377 MB |
| 4-bit | 0.730 | 100.0% | 0.766 | 98.6% | 200 MB |
| mxfp4 | 0.725 | 99.8% | 0.764 | 98.4% | n/a |

NDCG@10 by dataset

| dataset | bf16 ◄ | 8-bit | 4-bit | mxfp4 |
|---|---|---|---|---|
| NanoNQ · en | 0.704 | 0.704 | 0.703 | 0.703 |
| NanoFiQA2018 · en | 0.504 | 0.511 | 0.502 | 0.498 |
| NanoSciFact · en | 0.716 | 0.717 | 0.714 | 0.712 |
| NanoNFCorpus · en | 0.342 | 0.340 | 0.335 | 0.345 |
| MIRACL · es | 0.891 | 0.892 | 0.895 | 0.893 |
| MIRACL · de | 0.809 | 0.810 | 0.819 | 0.812 |
| MIRACL · ja | 0.929 | 0.928 | 0.940 | 0.922 |
| MIRACL · ar | 0.926 | 0.926 | 0.928 | 0.916 |
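
The NDCG retention column in the summary can be approximately recomputed from the per-dataset values above. The sketch below takes the ratio to bf16 for each dataset and then averages the ratios, which is how the retention definition reads. The helper name is hypothetical, and because the inputs are the published values rounded to three decimals, the result can differ from the summary column in the last digit.

```python
# NDCG@10 values copied from the per-dataset table (already rounded to 3 decimals).
ndcg_by_dataset = {
    "NanoNQ":       {"bf16": 0.704, "8-bit": 0.704, "4-bit": 0.703, "mxfp4": 0.703},
    "NanoFiQA2018": {"bf16": 0.504, "8-bit": 0.511, "4-bit": 0.502, "mxfp4": 0.498},
    "NanoSciFact":  {"bf16": 0.716, "8-bit": 0.717, "4-bit": 0.714, "mxfp4": 0.712},
    "NanoNFCorpus": {"bf16": 0.342, "8-bit": 0.340, "4-bit": 0.335, "mxfp4": 0.345},
    "MIRACL-es":    {"bf16": 0.891, "8-bit": 0.892, "4-bit": 0.895, "mxfp4": 0.893},
    "MIRACL-de":    {"bf16": 0.809, "8-bit": 0.810, "4-bit": 0.819, "mxfp4": 0.812},
    "MIRACL-ja":    {"bf16": 0.929, "8-bit": 0.928, "4-bit": 0.940, "mxfp4": 0.922},
    "MIRACL-ar":    {"bf16": 0.926, "8-bit": 0.926, "4-bit": 0.928, "mxfp4": 0.916},
}

def mean_retention(table, precision, baseline="bf16"):
    """Average over datasets of (metric at `precision`) / (metric at `baseline`)."""
    ratios = [row[precision] / row[baseline] for row in table.values()]
    return sum(ratios) / len(ratios)

for precision in ("8-bit", "4-bit", "mxfp4"):
    print(f"{precision}: {mean_retention(ndcg_by_dataset, precision):.1%}")
```

Averaging per-dataset ratios weights every dataset equally, so a low-scoring set such as NanoNFCorpus influences retention as much as a high-scoring one. This is also why a precision can show a slightly higher mean NDCG@10 without a correspondingly higher retention.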

License & attribution

Redistributed under the LFM Open License v1.0 (LICENSE), the same license as the original model. Per Section 4, this notice records that the files were modified (format conversion to MLX). The original work is by Liquid AI; this repository is an independent conversion, not affiliated with or endorsed by Liquid AI. The license includes a commercial-use threshold (Section 5); review it for your use case.

Base model: LiquidAI/LFM2.5-Embedding-350M
