---
library_name: lf4
tags:
- lf4
- static-embedding
- 4-bit
- quantized
- sentence-similarity
- code-search
- tool-search
- sentence-transformers
- embedding
language: en
license: mit
pipeline_tag: sentence-similarity
---

# VTXAI/Vortex-Embed-4.7M

**Native 4-bit quantized** static sentence embedding model.  
Generates 256-dimensional sentence embeddings via mean-pooling of a learned 4-bit quantized embedding table.

Weighs only **4.7 MB** on disk: no transformers, no torch, no GPU needed.

## Model Size

| Format | Size | Compression |
|--------|------|-------------|
| FP32 (original) | 28.8 MB | 1.0× |
| **LF4 (this model)** | **4.7 MB** | **6.4×** |
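
The figures in this table can be reproduced from the tensor shapes listed under Architecture. The sketch below is plain arithmetic on those shapes. It assumes the three tensors are the only stored weights and ignores file-format overhead. Under that assumption the 28.8 figure appears to be in MiB and the 4.7 figure in decimal MB, while the 6.4× ratio follows from the raw byte counts.

```python
# Storage arithmetic from the tensor shapes listed in this card.
VOCAB, DIM, BLOCK_SIZE = 29528, 256, 32

blocks_per_row = DIM // BLOCK_SIZE            # 8 blocks per embedding row

fp32_bytes = VOCAB * DIM * 4                  # 4 bytes per float32 value
packed_bytes = VOCAB * (DIM // 2)             # two 4-bit codes per uint8
scale_bytes = VOCAB * blocks_per_row * 2      # one float16 scale per block
zero_bytes = VOCAB * blocks_per_row * 2       # one float16 zero-point per block
lf4_bytes = packed_bytes + scale_bytes + zero_bytes

print(f"FP32: {fp32_bytes:,} bytes ({fp32_bytes / 2**20:.1f} MiB)")
print(f"LF4:  {lf4_bytes:,} bytes ({lf4_bytes / 1e6:.1f} MB)")
print(f"Compression: {fp32_bytes / lf4_bytes:.1f}x")
print(f"Effective bits per weight: {lf4_bytes * 8 / (VOCAB * DIM):.1f}")
```

The per-block scale and zero-point add one bit per weight on top of the 4-bit codes, which is why the ratio is 6.4× rather than 8×.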

## Architecture

Learned static embedding table with 4-bit per-block quantization (LF4):

```
LF4StaticEmbedding(
  vocab=29528, dim=256, bits=4,
  block_size=32, size=4.7MB
)
```

Encoding: `tokenize → lookup dequantized embeddings → mean pool → L2 normalize`

Weights stored as:
- `embedding_packed`: uint8 (29528 × 128), 4-bit packed, 2 values/byte
- `embedding_scales`: float16 (29528 × 8), per-block scale
- `embedding_zeros`: float16 (29528 × 8), per-block zero-point
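
To make the encoding pipeline and storage layout concrete, here is a minimal NumPy sketch of dequantization followed by mean pooling and L2 normalization. It is not the library's implementation. The nibble order (low nibble first) and the affine formula `value = code * scale + zero` are assumptions, and the helper names `unpack_nibbles`, `dequantize_rows`, and `encode_ids` are hypothetical. The toy table is random data with the same layout, not the real weights.

```python
import numpy as np

BLOCK_SIZE = 32  # matches block_size=32 in the model repr


def unpack_nibbles(packed: np.ndarray) -> np.ndarray:
    """Unpack uint8 (rows, dim/2) into 4-bit codes (rows, dim).

    Assumption: the low nibble holds the even-indexed value.
    """
    low = packed & 0x0F
    high = packed >> 4
    return np.stack([low, high], axis=-1).reshape(packed.shape[0], -1)


def dequantize_rows(packed, scales, zeros):
    """Assumed affine scheme: value = code * scale + zero, per block."""
    codes = unpack_nibbles(packed).astype(np.float32)
    rows = codes.shape[0]
    blocks = codes.reshape(rows, -1, BLOCK_SIZE)
    scale = scales.astype(np.float32)[:, :, None]
    zero = zeros.astype(np.float32)[:, :, None]
    return (blocks * scale + zero).reshape(rows, -1)


def encode_ids(token_ids, packed, scales, zeros):
    """Look up rows for one sentence, mean pool, then L2 normalize."""
    ids = np.asarray(token_ids)
    vectors = dequantize_rows(packed[ids], scales[ids], zeros[ids])
    pooled = vectors.mean(axis=0)
    return pooled / max(np.linalg.norm(pooled), 1e-12)


# Toy table with the same layout as the real tensors (random, not real weights)
rng = np.random.default_rng(0)
toy_vocab, dim = 100, 256
n_blocks = dim // BLOCK_SIZE
packed = rng.integers(0, 256, size=(toy_vocab, dim // 2), dtype=np.uint8)
scales = rng.uniform(0.01, 0.1, size=(toy_vocab, n_blocks)).astype(np.float16)
zeros = rng.uniform(-0.5, 0.0, size=(toy_vocab, n_blocks)).astype(np.float16)

sentence_vec = encode_ids([3, 17, 42], packed, scales, zeros)
print(sentence_vec.shape, float(np.linalg.norm(sentence_vec)))
```

Only the rows for the tokens in a sentence are dequantized, so the full FP32 table never needs to be materialized in memory.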

## Usage

### Python inference (lightweight, no torch)

```python
from lf4_model import LF4StaticEmbedding

model = LF4StaticEmbedding.from_pretrained("VTXAI/Vortex-Embed-4.7M")
print(model)  # LF4StaticEmbedding(vocab=29528, dim=256, bits=4, size=4.7MB)

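# Encode documents and a query to 256-dim vectors
doc_emb = model.encode(["search the web for news", "read file contents"])
query_emb = model.encode(["find the latest headlines"])  # illustrative query

# Cosine similarity search over the encoded documents
scores, indices = model.search(query_emb, doc_emb, top_k=2)  # only two documents here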
```
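
The internals of `model.search` are not documented in this card. Because the embeddings are L2-normalized, a cosine top-k search reduces to a matrix product followed by a sort. The sketch below shows that idea in plain NumPy. The function `cosine_top_k` is a hypothetical stand-in, not the library's code.

```python
import numpy as np


def cosine_top_k(query_emb, doc_emb, top_k=10):
    """Return (scores, indices) of the top_k documents per query.

    Assumes both inputs are 2-D and already L2-normalized, so the dot
    product equals cosine similarity.
    """
    sims = query_emb @ doc_emb.T                  # (n_queries, n_docs)
    k = min(top_k, doc_emb.shape[0])              # never ask for more than exist
    order = np.argsort(-sims, axis=1)[:, :k]      # best match first
    scores = np.take_along_axis(sims, order, axis=1)
    return scores, order


# Three unit-length toy documents and one unit-length query
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
query = np.array([[0.8, 0.6]])
scores, indices = cosine_top_k(query, docs, top_k=2)
print(indices.tolist(), scores.round(2).tolist())
```

A full sort is fine for a few thousand chunks. For much larger indexes, `np.argpartition` avoids sorting every score.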

### With sentence-transformers (torch)

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("VTXAI/Vortex-Embed-4.7M", backend="static")
embeddings = model.encode(["search the web for news", "read file contents"])
```

## Quality

- **Cosine preservation vs FP32**: 0.9969
- **MSE**: 0.256990
- **Tool search accuracy**: 100% (15/15, benchmarks)
- **Codebase indexing**: 12.5s index, 14.6ms P50 search (JARVIS codebase, 2707 chunks)
- Trained on: CornStack (Python/JS/Java) + Glaive function-calling
- Base: **VTXAI/Vortex-Embed** → fine-tuned → LF4 quantized
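
This card does not say how the cosine preservation and MSE figures were computed. One common way to measure them is to compare each FP32 row with its quantize-then-dequantize round trip. The sketch below does that on random data. The per-block min/max quantizer and the helper names are assumptions for illustration, so the printed numbers will not match the values reported above.

```python
import numpy as np


def quantize_blocks(table, block_size=32, bits=4):
    """Round-trip a float table through per-block min/max affine quantization."""
    levels = 2**bits - 1
    rows, dim = table.shape
    blocks = table.reshape(rows, dim // block_size, block_size)
    lo = blocks.min(axis=2, keepdims=True)
    hi = blocks.max(axis=2, keepdims=True)
    scale = np.maximum((hi - lo) / levels, 1e-8)   # avoid division by zero
    codes = np.clip(np.round((blocks - lo) / scale), 0, levels)
    return (codes * scale + lo).reshape(rows, dim)


def rowwise_cosine(a, b):
    """Cosine similarity between matching rows of a and b."""
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, 1e-12)


# Random stand-in for the FP32 embedding table (not the real weights)
rng = np.random.default_rng(0)
fp32_table = rng.standard_normal((1000, 256)).astype(np.float32)
restored = quantize_blocks(fp32_table)

cosine_preservation = float(rowwise_cosine(fp32_table, restored).mean())
mse = float(((fp32_table - restored) ** 2).mean())
print(f"cosine preservation: {cosine_preservation:.4f}, MSE: {mse:.6f}")
```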

## Why Static Embedding?

| Feature | Static (this) | Transformer (BERT) |
|---|---|---|
| Inference speed | **0.15 ms** | ~50 ms |
| Load time | **144 ms** | ~5 s |
| Disk size | **4.7 MB** | ~400 MB |
| GPU needed | **No** | Recommended |
| Accuracy | Comparable* | Higher for complex semantics |

\* For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.

## Lightweight Dependencies

```bash
pip install numpy safetensors tokenizers
```

The model loads and runs with just `numpy`, `safetensors`, and Hugging Face `tokenizers`.  
No PyTorch, no transformers, no sentence-transformers required for basic inference.