Abhaykoul committed · Commit ff0e8ed · verified · 1 Parent(s): ab61753

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,104 @@
+ ---
+ library_name: lf4
+ tags:
+ - lf4
+ - static-embedding
+ - 4-bit
+ - quantized
+ - sentence-similarity
+ - code-search
+ - tool-search
+ - sentence-transformers
+ - embedding
+ language: en
+ license: mit
+ pipeline_tag: sentence-similarity
+ ---
+
+ # VTXAI/Vortex-Embed-4.7M
+
+ **Native 4-bit quantized** static sentence embedding model.
+ Generates 256-dimensional sentence embeddings via mean-pooling of a learned 4-bit quantized embedding table.
+
+ Weighs only **4.7 MB** on disk: no transformers, no torch, no GPU needed.
+
+ ## Model Size
+
+ | Format | Size | Compression |
+ |--------|------|-------------|
+ | FP32 (original) | 30.2 MB (28.8 MiB) | 1.0× |
+ | **LF4 (this model)** | **4.7 MB** | **6.4×** |
+
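The sizes in the table can be reproduced from the tensor shapes listed in this README. The following is a quick check, assuming decimal megabytes (1 MB = 10^6 bytes) and ignoring the small safetensors file header; the constant names are chosen here for illustration.

```python
# Size arithmetic for the LF4 layout, using the shapes stated in this README.
VOCAB, DIM, BLOCK_SIZE = 29528, 256, 32
NUM_BLOCKS = DIM // BLOCK_SIZE            # 8 blocks per row

fp32_bytes = VOCAB * DIM * 4              # one float32 per value
packed_bytes = VOCAB * (DIM // 2)         # two 4-bit codes per uint8
scale_bytes = VOCAB * NUM_BLOCKS * 2      # one float16 scale per block
zero_bytes = VOCAB * NUM_BLOCKS * 2       # one float16 zero-point per block
lf4_bytes = packed_bytes + scale_bytes + zero_bytes

print(f"FP32: {fp32_bytes / 1e6:.1f} MB")
print(f"LF4:  {lf4_bytes / 1e6:.1f} MB")
print(f"Compression: {fp32_bytes / lf4_bytes:.1f}x")
```

Each 256-value row costs 128 + 16 + 16 = 160 bytes, or 5 bits per value once scales and zero-points are counted, which is where the 6.4× ratio comes from (1024 / 160).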
+ ## Architecture
+
+ Learned static embedding table with 4-bit per-block quantization (LF4):
+
+ ```
+ LF4StaticEmbedding(
+     vocab=29528, dim=256, bits=4,
+     block_size=32, size=4.7MB
+ )
+ ```
+
+ Encoding: `tokenize → lookup dequantized embeddings → mean pool → L2 normalize`
+
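The lookup, mean-pool, and normalize steps of that pipeline can be sketched on a toy table. This is only an illustration: the table values and the `embed` helper are made up here, and real token IDs would come from the tokenizer.

```python
import numpy as np

# Toy stand-in for the dequantized table: 5 tokens, 4 dimensions (hypothetical values).
table = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
], dtype=np.float32)

def embed(token_ids, table):
    """Look up rows, mean-pool them, then L2-normalize the result."""
    if len(token_ids) == 0:
        # No tokens: return a zero vector rather than dividing by zero.
        return np.zeros(table.shape[1], dtype=np.float32)
    pooled = table[token_ids].mean(axis=0)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled

vec = embed([0, 1, 3], table)   # token IDs would come from the tokenizer
```

Because the output is unit-length, the cosine similarity between two sentence vectors reduces to a plain dot product.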
+ Weights stored as:
+ - `embedding_packed`: uint8 (29528 × 128), 4-bit packed, 2 values per byte (low nibble first)
+ - `embedding_scales`: float16 (29528 × 8), per-block scale
+ - `embedding_zeros`: float16 (29528 × 8), per-block zero-point
+
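To make the storage layout concrete, here is a round-trip sketch for a single 256-value row. The dequantization mirrors the arithmetic in `lf4_model.py` (`value = code * scale + zero`, low nibble first). The `quantize_row` helper is a hypothetical min/max quantizer written for this example; the commit does not show how the real LF4 weights were produced.

```python
import numpy as np

BLOCK = 32

def quantize_row(row):
    """Hypothetical min/max 4-bit quantizer; the real LF4 recipe may differ."""
    blocks = row.reshape(-1, BLOCK).astype(np.float32)
    zeros = blocks.min(axis=1, keepdims=True)
    scales = (blocks.max(axis=1, keepdims=True) - zeros) / 15.0
    scales = np.where(scales == 0, 1.0, scales)   # avoid divide-by-zero on flat blocks
    q = np.clip(np.rint((blocks - zeros) / scales), 0, 15).astype(np.uint8).reshape(-1)
    packed = q[0::2] | (q[1::2] << 4)             # even index goes to the low nibble
    return packed, scales.astype(np.float16).ravel(), zeros.astype(np.float16).ravel()

def dequantize_row(packed, scales, zeros):
    """Unpack nibbles, then apply value = code * scale + zero per block."""
    q = np.empty(packed.size * 2, dtype=np.float32)
    q[0::2] = packed & 0x0F
    q[1::2] = packed >> 4
    blocks = q.reshape(-1, BLOCK)
    s = scales.astype(np.float32)[:, None]
    z = zeros.astype(np.float32)[:, None]
    return (blocks * s + z).reshape(-1)

rng = np.random.default_rng(0)
row = rng.normal(size=256).astype(np.float32)
packed, scales, zeros = quantize_row(row)
restored = dequantize_row(packed, scales, zeros)
max_err = float(np.abs(restored - row).max())
```

With this scheme the per-value error is bounded by roughly half a quantization step of the block it belongs to, so blocks with a narrow value range are reconstructed more precisely than wide ones.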
+ ## Usage
+
+ ### Python inference (lightweight, no torch)
+
+ ```python
+ from lf4_model import LF4StaticEmbedding  # lf4_model.py ships in this repo
+
+ model = LF4StaticEmbedding.from_pretrained("VTXAI/Vortex-Embed-4.7M")
+ print(model)  # LF4StaticEmbedding(vocab=29528, dim=256, bits=4, size=4.7MB, block_size=32)
+
+ # Encode sentences to 256-dim vectors
+ doc_emb = model.encode(["search the web for news", "read file contents"])
+ query_emb = model.encode("find recent headlines")
+
+ # Cosine similarity search (returns all documents if top_k exceeds their count)
+ scores, indices = model.search(query_emb, doc_emb, top_k=10)
+ ```
+
+ ### With sentence-transformers (torch)
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("VTXAI/Vortex-Embed-4.7M", backend="static")
+ embeddings = model.encode(["search the web for news", "read file contents"])
+ ```
+
+ ## Quality
+
+ - **Cosine preservation vs FP32**: 0.9969
+ - **MSE**: 0.256990
+ - **Tool search accuracy**: 100% (15/15 on benchmarks)
+ - **Codebase indexing**: 12.5 s to index, 14.6 ms P50 search latency (JARVIS codebase, 2707 chunks)
+ - Trained on: CornStack (Python/JS/Java) + Glaive function-calling
+ - Base: **VTXAI/Vortex-Embed** → fine-tuned → LF4 quantized
+
+ ## Why Static Embedding?
+
+ | Feature | Static (this) | Transformer (BERT) |
+ |---|---|---|
+ | Inference speed | **0.15 ms** | ~50 ms |
+ | Load time | **144 ms** | ~5 s |
+ | Disk size | **4.7 MB** | ~400 MB |
+ | GPU needed | **No** | Recommended |
+ | Accuracy | Comparable* | Higher for complex semantics |
+
+ \* For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.
+
+ ## Lightweight Dependencies
+
+ ```bash
+ pip install numpy safetensors tokenizers
+ ```
+
+ The model loads and runs with just `numpy`, `safetensors`, and HuggingFace `tokenizers`.
+ Loading by Hub ID instead of from a local directory additionally imports `huggingface_hub`.
+ No PyTorch, no transformers, no sentence-transformers required for basic inference.
__pycache__/lf4_model.cpython-314.pyc ADDED
Binary file (10.5 kB)
config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "model_type": "lf4-static-embedding",
+   "architectures": [
+     "LF4StaticEmbedding"
+   ],
+   "vocab_size": 29528,
+   "embedding_dim": 256,
+   "block_size": 32,
+   "num_blocks": 8,
+   "quantization": "lf4",
+   "bits": 4,
+   "compression_vs_fp32": 6.4,
+   "original_model": "VTXAI/Vortex-Embed",
+   "base_model": "VTXAI/Vortex-Embed"
+ }
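Several of these fields are redundant with each other, so a loader could sanity-check them before touching the tensors. The sketch below uses a hypothetical `check_config` helper that is not part of `lf4_model.py`. It also assumes `embedding_dim` is an exact multiple of `block_size`, which holds for this config but is stricter than the loader, whose dequantization code tolerates padded rows.

```python
import json

# Subset of the config.json fields relevant to the tensor layout.
config = json.loads("""
{
  "vocab_size": 29528,
  "embedding_dim": 256,
  "block_size": 32,
  "num_blocks": 8,
  "bits": 4
}
""")

def check_config(cfg):
    """Return the per-row layout implied by the config, or raise if inconsistent."""
    dim, block = cfg["embedding_dim"], cfg["block_size"]
    if dim % block != 0:
        raise ValueError("embedding_dim must be a multiple of block_size")
    if cfg["num_blocks"] != dim // block:
        raise ValueError("num_blocks does not match embedding_dim / block_size")
    if (dim * cfg["bits"]) % 8 != 0:
        raise ValueError("packed row does not fill whole bytes")
    return {
        "packed_bytes_per_row": dim * cfg["bits"] // 8,
        "scale_values_per_row": cfg["num_blocks"],
    }

layout = check_config(config)
```

The resulting layout (128 packed bytes and 8 scale values per row) matches the tensor shapes given in the README.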
lf4_model.py ADDED
@@ -0,0 +1,174 @@
+ """
+ LF4 Static Embedding Model - Native 4-bit quantized sentence embeddings.
+ =========================================================================
+ Usage:
+     from lf4_model import LF4StaticEmbedding
+     model = LF4StaticEmbedding.from_pretrained("VTXAI/Vortex-Embed-4.7M")
+     index_emb = model.encode(["find python json parser", "weather API tool"])
+     query_emb = model.encode("parse json in python")
+
+     # Search
+     scores, indices = model.search(query_emb, index_emb, top_k=10)
+ """
+ import json
+ import numpy as np
+ from pathlib import Path
+ from typing import List, Union, Tuple
+
+
+ class LF4StaticEmbedding:
+     """Native LF4 4-bit static embedding model.
+
+     Weights are stored as packed 4-bit integers with per-block FP16 scales/zeros.
+     Total model size: ~4.7 MB (vs ~30 MB FP32).
+     """
+
+     def __init__(self, packed, scales, zeros, tokenizer_data, config):
+         self.packed = packed  # uint8 (vocab, dim/2)
+         self.scales = scales  # float16 (vocab, num_blocks)
+         self.zeros = zeros  # float16 (vocab, num_blocks)
+         self.config = config
+         self.vocab_size = config["vocab_size"]
+         self.dim = config["embedding_dim"]
+         self.block_size = config["block_size"]
+         self._tokenizer_data = tokenizer_data
+         self._tokenizer = None
+
+         # Pre-dequantize embedding table for fast lookup
+         self._embedding_table = self._dequantize_all()
+
+     def _dequantize_all(self) -> np.ndarray:
+         """Dequantize full embedding table to FP32 for fast token lookup."""
+         N = self.packed.shape[0]
+         D = self.dim
+         B = self.block_size
+
+         # Each byte holds two 4-bit codes: low nibble first, then high nibble.
+         low = (self.packed & 0x0F).astype(np.float32)
+         high = ((self.packed >> 4) & 0x0F).astype(np.float32)
+         D_padded = self.packed.shape[1] * 2
+
+         unpacked = np.empty((N, D_padded), dtype=np.float32)
+         unpacked[:, 0::2] = low
+         unpacked[:, 1::2] = high
+
+         # Apply the per-block affine map: value = code * scale + zero.
+         num_blocks = D_padded // B
+         blocked = unpacked.reshape(N, num_blocks, B)
+         s = self.scales.astype(np.float32)[:, :, None]
+         z = self.zeros.astype(np.float32)[:, :, None]
+
+         return (blocked * s + z).reshape(N, D_padded)[:, :D]
+
+     @property
+     def tokenizer(self):
+         """Lazily load the tokenizer from a file path or a raw JSON string."""
+         if self._tokenizer is None:
+             from tokenizers import Tokenizer
+             data = str(self._tokenizer_data)
+             # A serialized tokenizer is a JSON object; anything else is treated as a path.
+             if data.lstrip().startswith("{"):
+                 self._tokenizer = Tokenizer.from_str(data)
+             else:
+                 self._tokenizer = Tokenizer.from_file(data)
+         return self._tokenizer
+
+     @classmethod
+     def from_pretrained(cls, path_or_id: str) -> "LF4StaticEmbedding":
+         """Load model from local path or HuggingFace Hub."""
+         p = Path(path_or_id)
+         if p.is_dir():
+             model_path = str(p / "model.safetensors")
+             config_path = p / "config.json"
+             tok_path = str(p / "tokenizer.json")
+         else:
+             # Hub download needs the optional huggingface_hub package.
+             from huggingface_hub import hf_hub_download
+             model_path = hf_hub_download(path_or_id, "model.safetensors")
+             config_path = Path(hf_hub_download(path_or_id, "config.json"))
+             tok_path = hf_hub_download(path_or_id, "tokenizer.json")
+
+         from safetensors.numpy import load_file
+         tensors = load_file(model_path)
+         config = json.loads(config_path.read_text())
+
+         return cls(
+             packed=tensors["embedding_packed"],
+             scales=tensors["embedding_scales"],
+             zeros=tensors["embedding_zeros"],
+             tokenizer_data=tok_path,
+             config=config,
+         )
+
+     def encode(self, texts: Union[str, List[str]], normalize: bool = True) -> np.ndarray:
+         """Encode texts to embeddings.
+
+         Args:
+             texts: single string or list of strings
+             normalize: L2-normalize output embeddings (default True for cosine sim)
+
+         Returns:
+             np.ndarray of shape (N, dim)
+         """
+         if isinstance(texts, str):
+             texts = [texts]
+
+         embeddings = np.zeros((len(texts), self.dim), dtype=np.float32)
+
+         for i, text in enumerate(texts):
+             encoded = self.tokenizer.encode(text)
+             token_ids = encoded.ids
+
+             # Mean pooling over token embeddings
+             valid_ids = [tid for tid in token_ids if 0 <= tid < self.vocab_size]
+             if valid_ids:
+                 token_embs = self._embedding_table[valid_ids]
+                 embeddings[i] = token_embs.mean(axis=0)
+
+         if normalize:
+             norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
+             norms = np.where(norms == 0, 1.0, norms)
+             embeddings = embeddings / norms
+
+         return embeddings
+
+     def search(
+         self,
+         queries: np.ndarray,
+         index: np.ndarray,
+         top_k: int = 10
+     ) -> Tuple[np.ndarray, np.ndarray]:
+         """Cosine similarity search.
+
+         Args:
+             queries: (Q, D) query embeddings
+             index: (N, D) document embeddings
+             top_k: number of results
+
+         Returns:
+             (scores, indices) arrays
+         """
+         queries = np.asarray(queries, dtype=np.float32)
+         index = np.asarray(index, dtype=np.float32)
+         if queries.ndim == 1:
+             queries = queries[None, :]
+
+         # Normalize
+         qn = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + 1e-8)
+         dn = index / (np.linalg.norm(index, axis=1, keepdims=True) + 1e-8)
+
+         scores = qn @ dn.T
+
+         if top_k >= scores.shape[1]:
+             idx = np.argsort(-scores, axis=1)
+             return np.take_along_axis(scores, idx, 1), idx
+
+         idx = np.argpartition(-scores, top_k, axis=1)[:, :top_k]
+         s = np.take_along_axis(scores, idx, 1)
+         order = np.argsort(-s, axis=1)
+         return np.take_along_axis(s, order, 1), np.take_along_axis(idx, order, 1)
+
+     @property
+     def model_size_mb(self) -> float:
+         return (self.packed.nbytes + self.scales.nbytes + self.zeros.nbytes) / 1e6
+
+     def __repr__(self):
+         return (f"LF4StaticEmbedding(vocab={self.vocab_size}, dim={self.dim}, "
+                 f"bits=4, size={self.model_size_mb:.1f}MB, "
+                 f"block_size={self.block_size})")
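The `search` method above avoids a full sort when only a few results are needed. The selection step can be isolated as follows, as a standalone sketch with a helper name (`top_k_desc`) chosen for this example: `np.argpartition` gathers the k best candidates in arbitrary order, and only those k are then sorted.

```python
import numpy as np

def top_k_desc(scores, k):
    """Return (values, indices) of the k largest entries per row, sorted descending."""
    n = scores.shape[1]
    if k >= n:
        # Asking for everything: a full sort is the simplest option.
        idx = np.argsort(-scores, axis=1)
    else:
        # Partition so the k largest scores occupy the first k slots (unordered).
        part = np.argpartition(-scores, k, axis=1)[:, :k]
        # Sort just those k candidates by descending score.
        order = np.argsort(-np.take_along_axis(scores, part, axis=1), axis=1)
        idx = np.take_along_axis(part, order, axis=1)
    return np.take_along_axis(scores, idx, axis=1), idx

scores = np.array([[0.1, 0.9, 0.3, 0.7, 0.5]], dtype=np.float32)
values, indices = top_k_desc(scores, 2)
```

For N documents this costs about O(N + k log k) per query instead of O(N log N), which matters mostly when the index is large and k is small.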
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f62f5ea97f10d6c9c66eb469143aff968aa856288a41b6fc1c84703b3abb951
+ size 4724744
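This pointer records the expected size and SHA-256 digest of the real weights file, so a downloaded `model.safetensors` can be checked against it. A minimal sketch follows, with helper names (`parse_lfs_pointer`, `matches_pointer`) invented for this example and the pointer text taken from the diff above:

```python
import hashlib
import os

def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}

def matches_pointer(path, pointer):
    """Check a local file's size and hash against a parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large files are not read into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]

pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:9f62f5ea97f10d6c9c66eb469143aff968aa856288a41b6fc1c84703b3abb951\n"
    "size 4724744\n"
)
```

Calling `matches_pointer("model.safetensors", pointer)` on a local copy would then confirm that the download is complete and unmodified. The recorded size is 264 bytes larger than the raw tensor payload of 4,724,480 bytes, which is consistent with a small file header.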
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff