---
library_name: lf4
tags:
- lf4
- static-embedding
- 4-bit
- quantized
- sentence-similarity
- code-search
- tool-search
- sentence-transformers
- embedding
language: en
license: mit
pipeline_tag: sentence-similarity
---

# VTXAI/Vortex-Embed-4.7M

**Native 4-bit quantized** static sentence embedding model.  
Generates 256-dimensional sentence embeddings via mean-pooling of a learned 4-bit quantized embedding table.

Weighs only **4.7 MB** on disk — no transformers, no torch, no GPU needed.

## Model Size

| Format | Size | Compression |
|--------|------|-------------|
| FP32 (original) | 30.2 MB (28.8 MiB) | 1.0× |
| **LF4 (this model)** | **4.7 MB** (4.5 MiB) | **6.4×** |
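The compression ratio follows from the tensor shapes listed under Architecture. The arithmetic below is a quick check. It counts raw tensor bytes only; a safetensors file adds a small JSON header, and the tokenizer is not included.

```python
# Raw tensor sizes implied by the LF4 layout (vocab=29528, dim=256, block_size=32)
VOCAB, DIM, BLOCK = 29528, 256, 32

fp32_bytes = VOCAB * DIM * 4                    # one float32 per value
packed_bytes = VOCAB * DIM // 2                 # two 4-bit codes per uint8
scale_bytes = VOCAB * (DIM // BLOCK) * 2        # one float16 scale per block
zero_bytes = VOCAB * (DIM // BLOCK) * 2         # one float16 zero-point per block
lf4_bytes = packed_bytes + scale_bytes + zero_bytes

ratio = fp32_bytes / lf4_bytes
bits_per_value = lf4_bytes * 8 / (VOCAB * DIM)  # 4 bits + 32 bits of metadata per 32 values

print(f"FP32: {fp32_bytes:,} B = {fp32_bytes / 1e6:.1f} MB = {fp32_bytes / 2**20:.1f} MiB")
print(f"LF4:  {lf4_bytes:,} B = {lf4_bytes / 1e6:.1f} MB = {lf4_bytes / 2**20:.1f} MiB")
print(f"compression {ratio:.1f}x, {bits_per_value:.1f} bits/value")
```

This gives 30.2 MB to 4.7 MB in decimal units (28.8 MiB to 4.5 MiB in binary units), a ratio of exactly 6.4×. The per-block scale and zero-point bring the effective cost to 5 bits per value.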

## Architecture

Learned static embedding table with 4-bit per-block quantization (LF4):

```
LF4StaticEmbedding(
  vocab=29528, dim=256, bits=4,
  block_size=32, size=4.7MB
)
```

Encoding: `tokenize → lookup dequantized embeddings → mean pool → L2 normalize`
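The encoding steps can be sketched in NumPy as follows. This is an illustrative sketch, not the shipped `lf4_model` code. It assumes the token IDs have already been produced by the tokenizer and that the embedding table has been dequantized to float32. Whether special tokens take part in the pool is also an assumption. The helper name `encode_token_ids` is hypothetical.

```python
import numpy as np

def encode_token_ids(token_ids: list[list[int]], table: np.ndarray) -> np.ndarray:
    """Hypothetical helper: mean-pool token vectors, then L2-normalize each sentence.

    token_ids: one list of vocabulary indices per sentence (tokenizer output).
    table: dequantized (vocab, dim) float32 embedding matrix.
    """
    out = np.zeros((len(token_ids), table.shape[1]), dtype=np.float32)
    for i, ids in enumerate(token_ids):
        if not ids:
            continue  # an empty input stays a zero vector
        vec = table[np.asarray(ids)].mean(axis=0)  # lookup + mean pool
        norm = np.linalg.norm(vec)
        out[i] = vec / norm if norm > 0 else vec   # L2 normalize
    return out
```

With the Hugging Face `tokenizers` library, each ID list would come from `tokenizer.encode(text).ids`. Because every non-empty output row has unit length, cosine similarity between two sentences reduces to a dot product.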

Weights stored as:
- `embedding_packed`: uint8 (29528 × 128) — 4-bit packed, 2 values/byte
- `embedding_scales`: float16 (29528 × 8) — per-block scale
- `embedding_zeros`: float16 (29528 × 8) — per-block zero-point
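The layout above implies 256 / 32 = 8 blocks per row and 128 bytes of packed codes per row. The sketch below shows one way such a table can be quantized, packed, and dequantized with NumPy. Two details are assumptions that this card does not confirm: the low nibble holds the even-indexed value, and dequantization is asymmetric, `w ≈ (q - zero) * scale`. The helper names are hypothetical.

```python
import numpy as np

BLOCK = 32  # block_size from the architecture summary

def quantize_rows(w: np.ndarray, block: int = BLOCK):
    """Hypothetical 4-bit per-block asymmetric quantizer (min/max range, 16 levels)."""
    n, dim = w.shape
    blocks = w.reshape(n, dim // block, block).astype(np.float32)
    lo, hi = blocks.min(axis=2), blocks.max(axis=2)
    scale = np.where(hi > lo, (hi - lo) / 15.0, 1.0).astype(np.float16)
    zero = (-lo / scale.astype(np.float32)).astype(np.float16)
    s, z = scale.astype(np.float32)[..., None], zero.astype(np.float32)[..., None]
    codes = np.clip(np.rint(blocks / s + z), 0, 15).astype(np.uint8).reshape(n, dim)
    # Pack two codes per byte: even index -> low nibble, odd index -> high nibble (assumed order)
    packed = (codes[:, 0::2] | (codes[:, 1::2] << 4)).astype(np.uint8)
    return packed, scale, zero

def unpack_nibbles(packed: np.ndarray) -> np.ndarray:
    """Split each uint8 into its two 4-bit codes, restoring the original order."""
    codes = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.uint8)
    codes[:, 0::2] = packed & 0x0F
    codes[:, 1::2] = packed >> 4
    return codes

def dequantize_rows(packed, scales, zeros, block: int = BLOCK) -> np.ndarray:
    """Rebuild float32 rows: w = (q - zero) * scale, applied per block."""
    codes = unpack_nibbles(packed).astype(np.float32)
    n, dim = codes.shape
    blocks = codes.reshape(n, dim // block, block)
    s = scales.astype(np.float32)[..., None]
    z = zeros.astype(np.float32)[..., None]
    return ((blocks - z) * s).reshape(n, dim)
```

Under this scheme the reconstruction error of each value is at most about half its block's scale. The full table can be dequantized once at load time, or only the rows for a sentence's tokens can be dequantized on the fly; which strategy `lf4_model` uses is not stated here.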

## Usage

### Python inference (lightweight, no torch)

```python
from lf4_model import LF4StaticEmbedding

model = LF4StaticEmbedding.from_pretrained("VTXAI/Vortex-Embed-4.7M")
print(model)  # LF4StaticEmbedding(vocab=29528, dim=256, bits=4, size=4.7MB)

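# Encode documents and a query to 256-dim vectors
docs = ["search the web for news", "read file contents"]
doc_emb = model.encode(docs)
query_emb = model.encode(["find the latest headlines"])

# Cosine similarity search of the query against the documents
scores, indices = model.search(query_emb, doc_emb, top_k=2)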
```
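For readers who want to see what a cosine top-k search involves, here is a minimal NumPy sketch. It is not the implementation behind `model.search`; the function name `cosine_top_k` and its return convention (scores and indices, best first) are illustrative assumptions that mirror the call above.

```python
import numpy as np

def cosine_top_k(query_emb, doc_emb, top_k=10):
    """Hypothetical helper: return (scores, indices) of the top_k most similar docs per query."""
    q = np.atleast_2d(query_emb).astype(np.float32)  # (n_queries, dim), copied
    d = np.atleast_2d(doc_emb).astype(np.float32)    # (n_docs, dim), copied
    # Normalize defensively; this is a no-op for vectors that are already unit length
    q /= np.linalg.norm(q, axis=1, keepdims=True) + 1e-12
    d /= np.linalg.norm(d, axis=1, keepdims=True) + 1e-12
    sims = q @ d.T                                   # cosine similarity matrix
    k = min(top_k, d.shape[0])                       # never ask for more docs than exist
    idx = np.argsort(-sims, axis=1)[:, :k]           # highest similarity first
    scores = np.take_along_axis(sims, idx, axis=1)
    return scores, idx
```

Since this model's embeddings are already L2-normalized, the score is effectively a plain dot product. For very large corpora, `np.argpartition` avoids sorting every score.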

### With sentence-transformers (torch)

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("VTXAI/Vortex-Embed-4.7M")
embeddings = model.encode(["search the web for news", "read file contents"])
```

## Quality

- **Cosine preservation vs FP32**: 0.9969
- **MSE**: 0.256990
- **Tool search accuracy**: 100% (15/15 benchmark queries)
- **Codebase indexing**: 12.5 s to index, 14.6 ms P50 search latency (JARVIS codebase, 2,707 chunks)
- Trained on: CornStack (Python/JS/Java) + Glaive function-calling data
- Base: **VTXAI/Vortex-Embed** → fine-tuned → LF4 quantized
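The card does not define how the cosine-preservation figure is computed. One plausible reading, shown here as an assumption, is the mean cosine similarity between each sentence's FP32 embedding and its LF4 embedding over an evaluation set. The function name is hypothetical.

```python
import numpy as np

def cosine_preservation(fp32_emb: np.ndarray, quant_emb: np.ndarray) -> float:
    """Mean row-wise cosine similarity between two (n, dim) embedding matrices.

    Assumed definition of "cosine preservation"; the card does not specify it.
    """
    a = fp32_emb / np.linalg.norm(fp32_emb, axis=1, keepdims=True)
    b = quant_emb / np.linalg.norm(quant_emb, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))
```

A value of 1.0 means the quantized embeddings point in exactly the same directions as the FP32 ones; scale differences do not affect the score.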

## Why Static Embedding?

| Feature | Static (this) | Transformer (BERT) |
|---|---|---|
| Inference latency | **0.15 ms** | ~50 ms |
| Load time | **144 ms** | ~5 s |
| Disk size | **4.7 MB** | ~400 MB |
| GPU needed | **No** | Recommended |
| Accuracy | Comparable* | Higher for complex semantics |

\* For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.

## Minimal Dependencies

```bash
pip install numpy safetensors tokenizers
```

The model loads and runs with just `numpy`, `safetensors`, and Hugging Face `tokenizers`.  
No PyTorch, no transformers, no sentence-transformers required for basic inference.