Abhaykoul

library_name: lf4
tags:
  - lf4
  - static-embedding
  - 4-bit
  - quantized
  - sentence-similarity
  - code-search
  - tool-search
  - sentence-transformers
  - embedding
language: en
license: mit
pipeline_tag: sentence-similarity

VTXAI/Vortex-Embed-4.7M

Native 4-bit quantized static sentence embedding model.
Generates 256-dimensional sentence embeddings by mean-pooling token vectors looked up from a learned 4-bit quantized embedding table, then L2-normalizing the result.

Weighs only 4.7 MB on disk and needs no transformers, no torch, and no GPU.

Model Size

Format	Size	Compression
FP32 (original)	30.2 MB (28.8 MiB)	1.0×
LF4 (this model)	4.7 MB	6.4×
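
The figures in this table can be reproduced from the tensor shapes listed under Architecture. The following is a back-of-the-envelope check, assuming only the three weight tensors are counted (tokenizer and metadata files are ignored); the variable names are illustrative.

```python
# Back-of-the-envelope size check from the tensor shapes in this card.
# Assumption: only the weight tensors are counted (no tokenizer or metadata).
VOCAB, DIM, BLOCK_SIZE = 29528, 256, 32

fp32_bytes = VOCAB * DIM * 4                  # float32: 4 bytes per value
packed_bytes = VOCAB * (DIM // 2)             # two 4-bit values per byte
blocks_per_row = DIM // BLOCK_SIZE            # 8 blocks per row
scale_bytes = VOCAB * blocks_per_row * 2      # one float16 scale per block
zero_bytes = VOCAB * blocks_per_row * 2       # one float16 zero-point per block
lf4_bytes = packed_bytes + scale_bytes + zero_bytes

print(f"FP32: {fp32_bytes / 1e6:.1f} MB ({fp32_bytes / 2**20:.1f} MiB)")
print(f"LF4:  {lf4_bytes / 1e6:.1f} MB")
print(f"Compression: {fp32_bytes / lf4_bytes:.1f}x")
```

Per row this is 1024 bytes in FP32 versus 128 + 16 + 16 = 160 bytes in LF4, i.e. an effective 5 bits per weight once the per-block scale and zero-point are included.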

Architecture

Learned static embedding table with 4-bit per-block quantization (LF4):

LF4StaticEmbedding(
  vocab=29528, dim=256, bits=4,
  block_size=32, size=4.7MB
)

Encoding: tokenize → lookup dequantized embeddings → mean pool → L2 normalize
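
A minimal sketch of the lookup, mean-pool, and normalize steps, using a toy float table and a whitespace "tokenizer" as stand-ins. The function name `encode_ids`, the toy vocabulary, and the 4-dimensional table are hypothetical; the real model uses its own tokenizer and a dequantized 29528 × 256 table.

```python
import numpy as np

def encode_ids(token_ids, table):
    """Mean-pool the rows of `table` selected by `token_ids`, then L2-normalize."""
    vectors = table[np.asarray(token_ids)]     # lookup: (num_tokens, dim)
    pooled = vectors.mean(axis=0)              # mean pool over tokens
    norm = np.linalg.norm(pooled)
    return pooled / max(norm, 1e-12)           # L2 normalize, guarding against zero

# Toy stand-ins (hypothetical): a 6-word vocabulary and a random 6 x 4 table.
vocab = {"search": 0, "the": 1, "web": 2, "read": 3, "file": 4, "contents": 5}
rng = np.random.default_rng(0)
table = rng.standard_normal((len(vocab), 4)).astype(np.float32)

token_ids = [vocab[w] for w in "search the web".split()]
embedding = encode_ids(token_ids, table)
print(embedding.shape, float(np.linalg.norm(embedding)))
```

Because the pooling step is a plain mean, the result does not depend on token order: "search the web" and "web the search" map to the same vector.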

Weights stored as:

embedding_packed: uint8 (29528 × 128), 4-bit packed, 2 values/byte (256 dims / 2 = 128 bytes per row)
embedding_scales: float16 (29528 × 8), per-block scale (256 dims / block_size 32 = 8 blocks per row)
embedding_zeros: float16 (29528 × 8), per-block zero-point
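
To illustrate how tensors with these shapes could be turned back into float rows, here is a hedged NumPy sketch. Two details are assumptions not confirmed by this card: that the low nibble of each byte holds the first of the two values, and that weights are rebuilt as `code * scale + zero`. The function name `dequantize_rows` is hypothetical.

```python
import numpy as np

DIM, BLOCK_SIZE = 256, 32

def dequantize_rows(packed, scales, zeros):
    """Unpack 4-bit codes and rebuild float32 rows.

    Assumptions (not confirmed by this card): the low nibble holds the
    first value of each byte, and weights are code * scale + zero.
    """
    rows = packed.shape[0]
    low = packed & 0x0F                                    # first value in each byte
    high = packed >> 4                                     # second value in each byte
    codes = np.stack([low, high], axis=-1).reshape(rows, -1)        # (rows, 256)
    blocks = codes.reshape(rows, -1, BLOCK_SIZE).astype(np.float32) # (rows, 8, 32)
    scale = scales[..., None].astype(np.float32)           # (rows, 8, 1)
    zero = zeros[..., None].astype(np.float32)             # (rows, 8, 1)
    return (blocks * scale + zero).reshape(rows, -1)

# Toy tensors with the per-row shapes from the list above (3 rows instead of 29528).
rng = np.random.default_rng(0)
packed = rng.integers(0, 256, size=(3, DIM // 2), dtype=np.uint8)
scales = rng.uniform(0.01, 0.1, size=(3, DIM // BLOCK_SIZE)).astype(np.float16)
zeros = rng.uniform(-1.0, 0.0, size=(3, DIM // BLOCK_SIZE)).astype(np.float16)

rows = dequantize_rows(packed, scales, zeros)
print(rows.shape)  # (3, 256)
```

Only the rows for the tokens in a sentence need to be dequantized at encode time, so the full float table never has to be materialized.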

Usage

Python inference (lightweight, no torch)

from lf4_model import LF4StaticEmbedding

model = LF4StaticEmbedding.from_pretrained("VTXAI/Vortex-Embed-4.7M")
print(model)  # LF4StaticEmbedding(vocab=29528, dim=256, bits=4, size=4.7MB)

# Encode sentences to 256-dim vectors
embeddings = model.encode(["search the web for news", "read file contents"])

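# Cosine similarity search (query_emb and doc_emb are arrays produced by model.encode)
query_emb = model.encode(["find the latest headlines"])
doc_emb = embeddings
scores, indices = model.search(query_emb, doc_emb, top_k=2)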
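
For readers who want to see what a cosine top-k search involves, here is a standalone NumPy sketch. It is not the implementation of `model.search`; the function name `cosine_top_k` and the returned shapes (one row of scores and indices per query) are assumptions made for illustration.

```python
import numpy as np

def cosine_top_k(query_emb, doc_emb, top_k=10):
    """Return (scores, indices) of the top_k documents per query by cosine similarity."""
    q = np.atleast_2d(query_emb).astype(np.float32)
    d = np.atleast_2d(doc_emb).astype(np.float32)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)   # normalize so dot = cosine
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    sims = q @ d.T                                     # (num_queries, num_docs)
    k = min(top_k, d.shape[0])                         # never ask for more than exist
    indices = np.argsort(-sims, axis=1)[:, :k]         # best match first
    scores = np.take_along_axis(sims, indices, axis=1)
    return scores, indices

# Tiny hand-made example: the query points mostly along the first axis.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([2.0, 0.1])
scores, indices = cosine_top_k(query, docs, top_k=2)
print(indices[0], scores[0])
```

If the embeddings are already L2-normalized, as in the encoding pipeline described in this card, the two normalization lines are redundant and the search reduces to a matrix product followed by a sort.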

With sentence-transformers (torch)

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("VTXAI/Vortex-Embed-4.7M", backend="static")
embeddings = model.encode(["search the web for news", "read file contents"])

Quality

Cosine preservation vs FP32: 0.9969
MSE: 0.256990
Tool search accuracy: 100% (15/15, benchmarks)
Codebase indexing: 12.5s index, 14.6ms P50 search (JARVIS codebase, 2707 chunks)
Trained on: CornStack (Python/JS/Java) + Glaive function-calling
Base: VTXAI/Vortex-Embed → fine-tuned → LF4 quantized
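
One plausible way to compute metrics like the cosine preservation and MSE above is to compare each FP32 row with its quantize-then-dequantize reconstruction. The sketch below does this on random data with a simple per-block min/max 4-bit quantizer. Both the quantizer and the metric definitions are assumptions: this card does not state how LF4 calibrates its scales or how the reported numbers were measured, so the printed values will not match them.

```python
import numpy as np

def quantize_dequantize(table, block_size=32, bits=4):
    """Round-trip a float table through per-block min/max quantization (assumed scheme)."""
    rows, dim = table.shape
    levels = 2**bits - 1
    blocks = table.reshape(rows, dim // block_size, block_size)
    lo = blocks.min(axis=-1, keepdims=True)            # per-block zero-point
    hi = blocks.max(axis=-1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / levels         # per-block scale
    codes = np.clip(np.round((blocks - lo) / scale), 0, levels)
    return (codes * scale + lo).reshape(rows, dim)

rng = np.random.default_rng(0)
fp32 = rng.standard_normal((1000, 256)).astype(np.float32)  # stand-in for the FP32 table
recon = quantize_dequantize(fp32)

# Row-wise cosine similarity between original and reconstructed vectors.
cos = np.sum(fp32 * recon, axis=1) / (
    np.linalg.norm(fp32, axis=1) * np.linalg.norm(recon, axis=1)
)
mse = float(np.mean((fp32 - recon) ** 2))
print(f"mean cosine: {cos.mean():.4f}, MSE: {mse:.6f}")
```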

Why Static Embedding?

Feature	Static (this)	Transformer (BERT)
Inference latency	0.15 ms	~50 ms
Load time	144 ms	~5 s
Disk size	4.7 MB	~400 MB
GPU needed	No	Recommended
Accuracy	Comparable*	Higher for complex semantics

* For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.

Minimal Dependencies (No PyTorch)

pip install numpy safetensors tokenizers

The model loads and runs with just numpy, safetensors, and HuggingFace tokenizers.
No PyTorch, no transformers, no sentence-transformers required for basic inference.