---
library_name: lf4
tags:
- lf4
- static-embedding
- 4-bit
- quantized
- sentence-similarity
- code-search
- tool-search
- sentence-transformers
- embedding
language: en
license: mit
pipeline_tag: sentence-similarity
---

# VTXAI/Vortex-Embed-4.7M

A **native 4-bit quantized** static sentence embedding model. It generates 256-dimensional sentence embeddings by mean-pooling rows of a learned 4-bit quantized embedding table. It weighs only **4.7 MB** on disk and needs no transformers, no torch, and no GPU.

## Model Size

| Format | Size | Compression |
|--------|------|-------------|
| FP32 (original) | 28.8 MB | 1.0× |
| **LF4 (this model)** | **4.7 MB** | **6.4×** |

## Architecture

Learned static embedding table with 4-bit per-block quantization (LF4):

```
LF4StaticEmbedding(
  vocab=29528, dim=256, bits=4,
  block_size=32, size=4.7MB
)
```

Encoding: `tokenize → lookup dequantized embeddings → mean pool → L2 normalize`

Weights are stored as:

- `embedding_packed`: uint8 (29528 × 128), 4-bit packed, 2 values per byte
- `embedding_scales`: float16 (29528 × 8), per-block scale
- `embedding_zeros`: float16 (29528 × 8), per-block zero-point

## Usage

### Python inference (lightweight, no torch)

```python
from lf4_model import LF4StaticEmbedding

model = LF4StaticEmbedding.from_pretrained("VTXAI/Vortex-Embed-4.7M")
print(model)
# LF4StaticEmbedding(vocab=29528, dim=256, bits=4, size=4.7MB)

# Encode documents and a query to 256-dim vectors
doc_emb = model.encode(["search the web for news", "read file contents"])
query_emb = model.encode(["find recent news articles"])

# Cosine similarity search (top_k is capped here at the number of documents)
scores, indices = model.search(query_emb, doc_emb, top_k=2)
```

### With sentence-transformers (torch)

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("VTXAI/Vortex-Embed-4.7M", backend="static")
embeddings = model.encode(["search the web for news", "read file contents"])
```

## Quality

- **Cosine preservation vs FP32**: 0.9969
- **MSE**: 0.256990
- **Tool search accuracy**: 100% (15/15, benchmarks)
- **Codebase indexing**: 12.5 s to index, 14.6 ms P50 search latency (JARVIS codebase, 2,707 chunks)
- Trained on: CornStack (Python/JS/Java) + Glaive function-calling
- Base: **VTXAI/Vortex-Embed** → fine-tuned → LF4 quantized

## Why Static Embedding?

| Feature | Static (this) | Transformer (BERT) |
|---|---|---|
| Inference speed | **0.15 ms** | ~50 ms |
| Load time | **144 ms** | ~5 s |
| Disk size | **4.7 MB** | ~400 MB |
| GPU needed | **No** | Recommended |
| Accuracy | Comparable\* | Higher for complex semantics |

\* For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.

## Minimal Dependencies

```bash
pip install numpy safetensors tokenizers
```

The model loads and runs with just `numpy`, `safetensors`, and HuggingFace `tokenizers`. Basic inference requires no PyTorch, no transformers, and no sentence-transformers.
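
Since basic inference needs little more than NumPy, the storage layout listed under Architecture (packed uint8 nibbles plus float16 per-block scales and zero-points, block size 32) can be illustrated with a short round-trip sketch. This is not the `lf4_model` implementation: the function names are hypothetical, and both the nibble order (even index in the low nibble) and the dequantization rule `w ≈ q * scale + zero` are assumptions, since the model card does not specify them.

```python
import numpy as np

BLOCK = 32  # block_size reported by the model card


def quantize_rows(w, block=BLOCK):
    """Quantize each block of `block` values to 4 bits (assumed scheme)."""
    n, d = w.shape
    blocks = w.astype(np.float32).reshape(n, d // block, block)
    lo = blocks.min(axis=2)
    hi = blocks.max(axis=2)
    zeros = lo.astype(np.float16)                   # assumed: zero-point = block minimum
    scales = ((hi - lo) / 15.0).astype(np.float16)  # 4 bits give 16 levels, 0..15
    s = scales.astype(np.float32)[:, :, None]
    z = zeros.astype(np.float32)[:, :, None]
    s_safe = np.where(s > 0, s, 1.0)                # avoid dividing by zero on flat blocks
    q = np.clip(np.rint((blocks - z) / s_safe), 0, 15).astype(np.uint8)
    q = q.reshape(n, d)
    # Two 4-bit codes per byte; low nibble holds the even index (assumed order).
    packed = q[:, 0::2] | (q[:, 1::2] << 4)
    return packed, scales, zeros


def dequantize_rows(packed, scales, zeros, block=BLOCK):
    """Unpack nibbles and apply w ≈ q * scale + zero per block (assumed rule)."""
    n = packed.shape[0]
    q = np.empty((n, packed.shape[1] * 2), dtype=np.uint8)
    q[:, 0::2] = packed & 0x0F
    q[:, 1::2] = packed >> 4
    q = q.reshape(n, -1, block).astype(np.float32)
    s = scales.astype(np.float32)[:, :, None]
    z = zeros.astype(np.float32)[:, :, None]
    return (q * s + z).reshape(n, -1)


def embed(token_ids, packed, scales, zeros):
    """Look up token rows, dequantize, mean-pool, then L2-normalize."""
    ids = np.asarray(token_ids)
    vecs = dequantize_rows(packed[ids], scales[ids], zeros[ids])
    pooled = vecs.mean(axis=0)
    return pooled / np.linalg.norm(pooled)


def lf4_nbytes(vocab, dim, block=BLOCK):
    """Bytes used by the three tensors: packed codes + float16 scales + float16 zeros."""
    packed = vocab * dim // 2
    per_block = vocab * (dim // block) * 2
    return packed + 2 * per_block


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = rng.standard_normal((16, 256)).astype(np.float32)  # toy 16-token vocabulary
    packed, scales, zeros = quantize_rows(table)
    print(packed.shape, scales.shape, zeros.shape)
    print(embed([1, 5, 7], packed, scales, zeros).shape)
    print(lf4_nbytes(29528, 256))
```

Under this layout each 256-dimensional row takes 128 + 16 + 16 = 160 bytes instead of 1024 bytes in FP32, which matches the 6.4× compression ratio in the Model Size table; for the 29528-token vocabulary the three tensors total 4,724,480 bytes.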