metadata
library_name: lf4
tags:
- lf4
- static-embedding
- 4-bit
- quantized
- sentence-similarity
- code-search
- tool-search
- sentence-transformers
- embedding
language: en
license: mit
pipeline_tag: sentence-similarity
VTXAI/Vortex-Embed-4.7M
Native 4-bit quantized static sentence embedding model.
Generates 256-dimensional sentence embeddings via mean-pooling of a learned 4-bit quantized embedding table.
Weighs only 4.7 MB on disk: no transformers, no torch, no GPU needed.
Model Size
| Format | Size | Compression |
|---|---|---|
| FP32 (original) | 28.8 MB | 1.0× |
| LF4 (this model) | 4.7 MB | 6.4× |
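The compression figure can be checked from the tensor shapes listed under Architecture. The following is a back-of-the-envelope sketch, assuming the weights file holds only the three documented tensors (packed codes, scales, zero-points) and ignoring any file-header overhead; the variable names are illustrative.

```python
# Back-of-the-envelope size check. ASSUMPTION: only the three documented
# tensors are stored, and file-header overhead is ignored.
VOCAB, DIM, BLOCK_SIZE = 29528, 256, 32

fp32_bytes = VOCAB * DIM * 4                 # float32: 4 bytes per value
packed_bytes = VOCAB * (DIM // 2)            # 4-bit codes: 2 values per byte
blocks_per_row = DIM // BLOCK_SIZE           # 8 blocks per embedding row
scale_bytes = VOCAB * blocks_per_row * 2     # one float16 scale per block
zero_bytes = VOCAB * blocks_per_row * 2      # one float16 zero-point per block

lf4_bytes = packed_bytes + scale_bytes + zero_bytes
ratio = fp32_bytes / lf4_bytes
bits_per_weight = lf4_bytes * 8 / (VOCAB * DIM)

print(f"FP32: {fp32_bytes:,} bytes")
print(f"LF4:  {lf4_bytes:,} bytes")
print(f"Compression: {ratio:.1f}x, {bits_per_weight:.1f} bits per weight")
```

Under these assumptions the quantized tensors total about 4.7 MB (decimal) and the ratio comes out at 6.4×. The effective cost is 5 bits per weight: 4 bits of code plus 1 bit of amortized scale and zero-point storage.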
Architecture
Learned static embedding table with 4-bit per-block quantization (LF4):
LF4StaticEmbedding(
vocab=29528, dim=256, bits=4,
block_size=32, size=4.7MB
)
Encoding: tokenize → lookup dequantized embeddings → mean pool → L2 normalize
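The last two steps of that pipeline can be sketched in a few lines of NumPy. This is a minimal illustration, not the library's code: `encode_ids` is a hypothetical helper, tokenization is skipped (token ids are given directly), and a random float table stands in for the dequantized embeddings.

```python
import numpy as np

def encode_ids(table: np.ndarray, token_ids: list[int]) -> np.ndarray:
    """Mean-pool the rows for token_ids, then L2-normalize the result."""
    vectors = table[token_ids]            # (num_tokens, dim) lookup
    pooled = vectors.mean(axis=0)         # mean pool over tokens
    norm = np.linalg.norm(pooled)
    return pooled / max(norm, 1e-12)      # guard against a zero vector

# Toy stand-in for a dequantized table: 10 tokens, 256 dims (hypothetical data).
rng = np.random.default_rng(0)
toy_table = rng.standard_normal((10, 256)).astype(np.float32)
sentence_vec = encode_ids(toy_table, [1, 4, 7])
print(sentence_vec.shape, float(np.linalg.norm(sentence_vec)))
```

Because the output has unit length, a plain dot product between two such vectors is their cosine similarity.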
Weights stored as:
- embedding_packed: uint8 (29528 × 128), 4-bit packed, 2 values/byte
- embedding_scales: float16 (29528 × 8), per-block scale
- embedding_zeros: float16 (29528 × 8), per-block zero-point
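To make the layout concrete, here is a round-trip sketch of per-block 4-bit quantization using the same three tensors. The exact LF4 conventions are not stated above, so two things are assumptions: the low nibble of each byte holds the even-indexed value, and values are reconstructed as `(q - zero) * scale`. The `quantize` helper is hypothetical and exists only to build toy inputs.

```python
import numpy as np

BLOCK = 32  # block_size from the model summary

def quantize(x):
    """Hypothetical quantizer, used only to build toy inputs for dequantize."""
    rows = x.shape[0]
    blocks = x.reshape(rows, -1, BLOCK)
    lo = blocks.min(axis=-1, keepdims=True)
    hi = blocks.max(axis=-1, keepdims=True)
    scale = np.maximum((hi - lo) / 15.0, 1e-8)    # 16 levels per block
    zero = -lo / scale                            # maps the block minimum to code 0
    q = np.clip(np.round(blocks / scale + zero), 0, 15).astype(np.uint8)
    q = q.reshape(rows, -1)
    packed = q[:, 0::2] | (q[:, 1::2] << 4)       # two 4-bit codes per byte
    return packed, scale[..., 0].astype(np.float16), zero[..., 0].astype(np.float16)

def dequantize(packed, scales, zeros):
    """Unpack 4-bit codes and rescale per block.

    ASSUMPTIONS (not confirmed by the model card): low nibble first,
    and x ~= (q - zero) * scale.
    """
    rows = packed.shape[0]
    low, high = packed & 0x0F, packed >> 4
    codes = np.stack([low, high], axis=-1).reshape(rows, -1)   # (rows, dim)
    codes = codes.reshape(rows, -1, BLOCK).astype(np.float32)
    s = scales.astype(np.float32)[:, :, None]
    z = zeros.astype(np.float32)[:, :, None]
    return ((codes - z) * s).reshape(rows, -1)

rng = np.random.default_rng(0)
toy = rng.standard_normal((4, 256)).astype(np.float32)
packed, scales, zeros = quantize(toy)
restored = dequantize(packed, scales, zeros)
print(packed.shape, scales.shape, zeros.shape, restored.shape)
print("max abs error:", float(np.abs(restored - toy).max()))
```

With dim=256 and block_size=32, each row yields 128 packed bytes and 8 scale/zero-point pairs, which matches the shapes listed above.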
Usage
Python inference (lightweight, no torch)
from lf4_model import LF4StaticEmbedding
model = LF4StaticEmbedding.from_pretrained("VTXAI/Vortex-Embed-4.7M")
print(model) # LF4StaticEmbedding(vocab=29528, dim=256, bits=4, size=4.7MB)
# Encode sentences to 256-dim vectors
doc_emb = model.encode(["search the web for news", "read file contents"])
query_emb = model.encode(["find today's headlines"])
# Cosine similarity search: rank doc_emb rows against each query
scores, indices = model.search(query_emb, doc_emb, top_k=10)
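The internals of `model.search` are not documented here. As a rough idea of what a brute-force cosine search over normalized embeddings looks like, the sketch below uses a hypothetical `cosine_search` function; it assumes both inputs are 2-D, L2-normalized arrays and is not taken from the library.

```python
import numpy as np

def cosine_search(query_emb, doc_emb, top_k=10):
    """Return (scores, indices) of the top_k documents for each query.

    ASSUMPTION: rows of both inputs are L2-normalized, so the dot
    product equals cosine similarity.
    """
    sims = query_emb @ doc_emb.T                    # (n_queries, n_docs)
    k = min(top_k, doc_emb.shape[0])                # cap at the number of docs
    indices = np.argsort(-sims, axis=1)[:, :k]      # best match first
    scores = np.take_along_axis(sims, indices, axis=1)
    return scores, indices

# Toy unit vectors (hypothetical data).
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
query = np.array([[0.8, 0.6]])
top_scores, top_indices = cosine_search(query, docs, top_k=2)
print(top_indices, top_scores)
```

A full sort is fine for a few thousand chunks; for much larger collections, `np.argpartition` or an approximate nearest-neighbor index would avoid sorting every score.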
With sentence-transformers (torch)
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("VTXAI/Vortex-Embed-4.7M", backend="static")
embeddings = model.encode(["search the web for news", "read file contents"])
Quality
- Cosine preservation vs FP32: 0.9969
- MSE: 0.256990
- Tool search accuracy: 100% (15/15, benchmarks)
- Codebase indexing: 12.5 s to index, 14.6 ms P50 search latency (JARVIS codebase, 2707 chunks)
- Trained on: CornStack (Python/JS/Java) + Glaive function-calling
- Base: VTXAI/Vortex-Embed → fine-tuned → LF4 quantized
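The list above does not say how cosine preservation and MSE were measured. One plausible reading, sketched below, compares the FP32 table with the dequantized table row by row. Both helper functions are hypothetical, and the data is random noise standing in for quantization error, so the printed numbers are not comparable to the figures reported above.

```python
import numpy as np

def cosine_preservation(reference, approx):
    """Mean row-wise cosine similarity between two (rows, dim) matrices."""
    dots = np.sum(reference * approx, axis=1)
    norms = np.linalg.norm(reference, axis=1) * np.linalg.norm(approx, axis=1)
    return float(np.mean(dots / np.maximum(norms, 1e-12)))

def mean_squared_error(reference, approx):
    """Element-wise mean squared error between two matrices."""
    return float(np.mean((reference - approx) ** 2))

# Toy data: a random "FP32" table plus small noise simulating quantization error.
rng = np.random.default_rng(0)
reference = rng.standard_normal((100, 256))
approx = reference + 0.05 * rng.standard_normal((100, 256))

cos = cosine_preservation(reference, approx)
mse = mean_squared_error(reference, approx)
print(f"cosine preservation: {cos:.4f}, MSE: {mse:.6f}")
```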
Why Static Embedding?
| Feature | Static (this) | Transformer (BERT) |
|---|---|---|
| Inference speed | 0.15 ms | ~50 ms |
| Load time | 144 ms | ~5 s |
| Disk size | 4.7 MB | ~400 MB |
| GPU needed | No | Recommended |
| Accuracy | Comparable* | Higher for complex semantics |
* For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.
No Heavy Dependencies
pip install numpy safetensors tokenizers
The model loads and runs with just numpy, safetensors, and Hugging Face tokenizers.
No PyTorch, no transformers, no sentence-transformers required for basic inference.