---
library_name: lf4
tags:
- lf4
- static-embedding
- 4-bit
- quantized
- sentence-similarity
- code-search
- tool-search
- sentence-transformers
- embedding
language: en
license: mit
pipeline_tag: sentence-similarity
---
# VTXAI/Vortex-Embed-4.7M
**Native 4-bit quantized** static sentence embedding model.
Generates 256-dimensional sentence embeddings via mean-pooling of a learned 4-bit quantized embedding table.
Weighs only **4.7 MB** on disk: no transformers, no torch, no GPU needed.
## Model Size
| Format | Size | Compression |
|--------|------|-------------|
| FP32 (original) | 28.8 MB | 1.0× |
| **LF4 (this model)** | **4.7 MB** | **6.4×** |
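
The figures in the table can be cross-checked from the tensor shapes listed under Architecture. The sketch below is a back-of-the-envelope calculation, not the author's measurement script. It assumes the on-disk size is dominated by the three embedding tensors and ignores tokenizer and metadata files.

```python
# Size check from the published tensor shapes (vocab=29528, dim=256, block_size=32).
VOCAB, DIM, BLOCK_SIZE = 29528, 256, 32

fp32_bytes = VOCAB * DIM * 4                  # float32: 4 bytes per value
packed_bytes = VOCAB * (DIM // 2)             # 4-bit codes: 2 values per byte
blocks_per_row = DIM // BLOCK_SIZE            # 8 blocks per embedding row
scale_bytes = VOCAB * blocks_per_row * 2      # float16 scale per block
zero_bytes = VOCAB * blocks_per_row * 2       # float16 zero-point per block
lf4_bytes = packed_bytes + scale_bytes + zero_bytes

ratio = fp32_bytes / lf4_bytes
print(f"FP32: {fp32_bytes / 2**20:.1f} MiB ({fp32_bytes / 1e6:.1f} MB)")
print(f"LF4:  {lf4_bytes / 1e6:.1f} MB")
print(f"Compression: {ratio:.1f}x")
```

Under these assumptions, the FP32 figure of 28.8 corresponds to MiB (about 30.2 MB in decimal units), while 4.7 corresponds to decimal MB. The 6.4× ratio is computed from raw byte counts, so it holds in either unit.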
## Architecture
Learned static embedding table with 4-bit per-block quantization (LF4):
```
LF4StaticEmbedding(
vocab=29528, dim=256, bits=4,
block_size=32, size=4.7MB
)
```
Encoding: `tokenize → lookup dequantized embeddings → mean pool → L2 normalize`
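
The pooling half of this pipeline can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation. `encode_ids` and `toy_table` are hypothetical names, and a random float table stands in for the dequantized embedding matrix; tokenization is skipped by passing token ids directly.

```python
import numpy as np

def encode_ids(token_ids, table):
    """Mean-pool the table rows for token_ids, then L2-normalize the result."""
    vectors = table[np.asarray(token_ids)]    # lookup: (n_tokens, dim)
    pooled = vectors.mean(axis=0)             # mean pool: (dim,)
    norm = np.linalg.norm(pooled)
    return pooled / max(norm, 1e-12)          # guard against a zero vector

# Toy table standing in for the dequantized embedding matrix (assumption).
rng = np.random.default_rng(0)
toy_table = rng.standard_normal((100, 256)).astype(np.float32)

sentence_vec = encode_ids([3, 17, 42], toy_table)
print(sentence_vec.shape, float(np.linalg.norm(sentence_vec)))
```

Because the output is unit-length, cosine similarity between two sentence vectors reduces to a dot product.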
Weights stored as:
- `embedding_packed`: uint8 (29528 × 128), 4-bit packed, 2 values/byte
- `embedding_scales`: float16 (29528 × 8), per-block scale
- `embedding_zeros`: float16 (29528 × 8), per-block zero-point
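
To make the storage layout concrete, here is a sketch of per-block 4-bit quantization and dequantization that produces tensors with the same shapes and dtypes as those listed above. Two details are assumptions, since the card does not specify them: the nibble order (low nibble first) and the affine form `value = code * scale + zero`. The actual LF4 format may differ on both counts, and the function names are hypothetical.

```python
import numpy as np

BLOCK = 32  # values per quantization block, as stated in the model card

def quantize_rows(x):
    """Quantize float rows to packed 4-bit codes with per-block scale and zero."""
    rows, dim = x.shape
    blocks = x.reshape(rows, dim // BLOCK, BLOCK)
    lo = blocks.min(axis=2, keepdims=True)
    hi = blocks.max(axis=2, keepdims=True)
    scale = np.maximum((hi - lo) / 15.0, 1e-8)         # 4 bits -> codes 0..15
    codes = np.clip(np.rint((blocks - lo) / scale), 0, 15).astype(np.uint8)
    codes = codes.reshape(rows, dim)
    packed = codes[:, 0::2] | (codes[:, 1::2] << 4)    # assumed: low nibble first
    return packed, scale[..., 0].astype(np.float16), lo[..., 0].astype(np.float16)

def dequantize_rows(packed, scales, zeros):
    """Unpack 4-bit codes and apply the assumed affine map code * scale + zero."""
    rows, half = packed.shape
    codes = np.empty((rows, half * 2), dtype=np.float32)
    codes[:, 0::2] = packed & 0x0F                     # low nibble
    codes[:, 1::2] = packed >> 4                       # high nibble
    blocks = codes.reshape(rows, -1, BLOCK)
    s = scales.astype(np.float32)[:, :, None]
    z = zeros.astype(np.float32)[:, :, None]
    return (blocks * s + z).reshape(rows, half * 2)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 256)).astype(np.float32)
packed, scales, zeros = quantize_rows(x)
restored = dequantize_rows(packed, scales, zeros)
print(packed.shape, scales.shape, zeros.shape, float(np.abs(restored - x).max()))
```

For a 256-dimensional row this yields 128 packed bytes plus 8 scales and 8 zero-points, matching the shapes in the list.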
## Usage
### Python inference (lightweight, no torch)
```python
from lf4_model import LF4StaticEmbedding
model = LF4StaticEmbedding.from_pretrained("VTXAI/Vortex-Embed-4.7M")
print(model) # LF4StaticEmbedding(vocab=29528, dim=256, bits=4, size=4.7MB)
# Encode documents and a query to 256-dim vectors
doc_emb = model.encode(["search the web for news", "read file contents"])
query_emb = model.encode(["find recent headlines"])
# Cosine similarity search over the encoded documents
scores, indices = model.search(query_emb, doc_emb, top_k=10)
```
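
The internals of `model.search` are not documented here. For readers who want to see what a top-k cosine search typically does, the following is a minimal NumPy sketch. `cosine_top_k` is a hypothetical stand-in, not the library's function, and it assumes 2-D inputs of shape `(n, dim)`.

```python
import numpy as np

def cosine_top_k(query_emb, doc_emb, top_k=10):
    """Return (scores, indices) of the top_k documents per query, best first."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sims = q @ d.T                               # (n_queries, n_docs)
    k = min(top_k, sims.shape[1])                # never ask for more docs than exist
    idx = np.argsort(-sims, axis=1)[:, :k]       # descending by similarity
    return np.take_along_axis(sims, idx, axis=1), idx

# Tiny 2-D example so the ranking is easy to verify by hand.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([[0.9, 0.1]])
scores, indices = cosine_top_k(query, docs, top_k=2)
print(indices, scores)
```

A full sort is fine for a few thousand documents; for much larger collections, `np.argpartition` avoids sorting every score.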
### With sentence-transformers (torch)
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("VTXAI/Vortex-Embed-4.7M", backend="static")
embeddings = model.encode(["search the web for news", "read file contents"])
```
## Quality
- **Cosine preservation vs FP32**: 0.9969
- **MSE**: 0.256990
- **Tool search accuracy**: 100% (15/15, benchmarks)
- **Codebase indexing**: 12.5s index, 14.6ms P50 search (JARVIS codebase, 2707 chunks)
- Trained on: CornStack (Python/JS/Java) + Glaive function-calling
- Base: **VTXAI/Vortex-Embed** → fine-tuned → LF4 quantized
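
The card does not say how cosine preservation and MSE were measured, for example whether on embedding-table rows or on sentence embeddings. As one plausible reading, the sketch below compares a reference matrix with an approximation row by row. The function names are hypothetical, and the data is synthetic noise rather than real model weights, so the printed numbers are not comparable to those above.

```python
import numpy as np

def cosine_preservation(reference, approx):
    """Mean row-wise cosine similarity between two equally shaped matrices."""
    num = np.sum(reference * approx, axis=1)
    den = np.linalg.norm(reference, axis=1) * np.linalg.norm(approx, axis=1)
    return float(np.mean(num / np.maximum(den, 1e-12)))

def mean_squared_error(reference, approx):
    """Mean of squared element-wise differences."""
    return float(np.mean((reference - approx) ** 2))

# Synthetic stand-ins: a random "FP32" matrix and a noisy copy of it.
rng = np.random.default_rng(0)
reference = rng.standard_normal((1000, 256))
approx = reference + 0.05 * rng.standard_normal((1000, 256))

cos_score = cosine_preservation(reference, approx)
mse_score = mean_squared_error(reference, approx)
print(f"cosine={cos_score:.4f} mse={mse_score:.6f}")
```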
## Why Static Embedding?
| Feature | Static (this) | Transformer (BERT) |
|---|---|---|
| Inference speed | **0.15ms** | ~50ms |
| Load time | **144ms** | ~5s |
| Disk size | **4.7 MB** | ~400 MB |
| GPU needed | **No** | Recommended |
| Accuracy | Comparable* | Higher for complex semantics |
\* For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.
## Minimal Dependencies
```bash
pip install numpy safetensors tokenizers
```
The model loads and runs with just `numpy`, `safetensors`, and HuggingFace `tokenizers`.
No PyTorch, no transformers, no sentence-transformers required for basic inference.