metadata
library_name: lf4
tags:
  - lf4
  - static-embedding
  - 4-bit
  - quantized
  - sentence-similarity
  - code-search
  - tool-search
  - sentence-transformers
  - embedding
language: en
license: mit
pipeline_tag: sentence-similarity

VTXAI/Vortex-Embed-4.7M

Native 4-bit quantized static sentence embedding model.
Generates 256-dimensional sentence embeddings via mean-pooling of a learned 4-bit quantized embedding table.

Weighs only 4.7 MB on disk: no transformers, no torch, no GPU needed.

Model Size

| Format | Size | Compression |
|---|---|---|
| FP32 (original) | 28.8 MB | 1.0× |
| LF4 (this model) | 4.7 MB | 6.4× |
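The compression ratio can be checked from the tensor shapes and dtypes listed in this README. The sketch below counts raw weight bytes only and ignores file headers and the tokenizer. Under that assumption the FP32 table is 30,236,672 bytes (about 30.2 MB, or 28.8 MiB) and the LF4 tensors total 4,724,480 bytes (about 4.7 MB, or 4.5 MiB), so the two sizes in the table appear to follow different unit conventions while the 6.4× ratio matches the byte counts.

```python
# Shapes and dtypes as listed in this README; raw weight bytes only.
VOCAB, DIM, BLOCK_SIZE = 29528, 256, 32
blocks_per_row = DIM // BLOCK_SIZE            # 8 blocks per embedding row

fp32_bytes = VOCAB * DIM * 4                  # float32: 4 bytes per value
packed_bytes = VOCAB * (DIM // 2)             # 4-bit: two values per uint8 byte
scale_bytes = VOCAB * blocks_per_row * 2      # one float16 scale per block
zero_bytes = VOCAB * blocks_per_row * 2       # one float16 zero-point per block
lf4_bytes = packed_bytes + scale_bytes + zero_bytes

bits_per_value = lf4_bytes * 8 / (VOCAB * DIM)

print(f"FP32: {fp32_bytes:,} bytes")
print(f"LF4:  {lf4_bytes:,} bytes")
print(f"ratio: {fp32_bytes / lf4_bytes:.1f}x, {bits_per_value:.1f} bits per value")
```

The per-block scale and zero-point add one extra bit per value on top of the 4-bit codes, which is why the ratio is 6.4× rather than 8×.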

Architecture

Learned static embedding table with 4-bit per-block quantization (LF4):

LF4StaticEmbedding(
  vocab=29528, dim=256, bits=4,
  block_size=32, size=4.7MB
)

Encoding: tokenize → lookup dequantized embeddings → mean pool → L2 normalize
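The last two steps of that pipeline can be sketched in NumPy as follows. This is a minimal illustration, not the lf4_model implementation: `encode_ids` and `toy_table` are hypothetical names, the table is random data standing in for the dequantized embedding matrix, and tokenization is assumed to have already produced integer ids.

```python
import numpy as np

def encode_ids(table: np.ndarray, token_ids: list[int]) -> np.ndarray:
    """Mean-pool the rows selected by token_ids, then L2-normalize."""
    vectors = table[np.asarray(token_ids, dtype=np.int64)]   # (n_tokens, dim)
    pooled = vectors.mean(axis=0)                             # (dim,)
    norm = np.linalg.norm(pooled)
    return pooled / max(norm, 1e-12)                          # guard against a zero vector

# Toy table standing in for the dequantized 29528 x 256 matrix (assumption).
rng = np.random.default_rng(0)
toy_table = rng.standard_normal((100, 256)).astype(np.float32)

sentence_vec = encode_ids(toy_table, [3, 17, 42])
print(sentence_vec.shape, float(np.linalg.norm(sentence_vec)))
```

Because the output has unit length, cosine similarity between two sentence vectors reduces to a dot product. An empty id list is not handled in this sketch.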

Weights stored as:

  • embedding_packed: uint8 (29528 × 128), 4-bit packed, 2 values/byte
  • embedding_scales: float16 (29528 × 8), per-block scale
  • embedding_zeros: float16 (29528 × 8), per-block zero-point
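To make the storage layout concrete, here is a hedged NumPy sketch of unpacking and dequantizing tensors with these shapes. Two details are assumptions, since this README does not specify them: the nibble order (low nibble holds the even-indexed value) and the affine form `w = (q - zero) * scale`. The `quantize_rows` helper is hypothetical and exists only so the round trip can be checked on random data.

```python
import numpy as np

BLOCK_SIZE = 32

def dequantize_rows(packed, scales, zeros):
    """packed: uint8 (n, dim/2); scales, zeros: float16 (n, dim/BLOCK_SIZE)."""
    n = packed.shape[0]
    low = packed & 0x0F                       # assumed: low nibble = even index
    high = packed >> 4                        # assumed: high nibble = odd index
    q = np.stack([low, high], axis=-1).reshape(n, -1)          # (n, dim) codes 0..15
    q = q.astype(np.float32).reshape(n, -1, BLOCK_SIZE)        # (n, blocks, 32)
    s = scales.astype(np.float32)[:, :, None]
    z = zeros.astype(np.float32)[:, :, None]
    return ((q - z) * s).reshape(n, -1)       # assumed affine form

def quantize_rows(weights):
    """Toy min/max quantizer (hypothetical), the inverse of dequantize_rows."""
    n, dim = weights.shape
    blocks = weights.reshape(n, dim // BLOCK_SIZE, BLOCK_SIZE)
    lo = blocks.min(axis=-1, keepdims=True)
    hi = blocks.max(axis=-1, keepdims=True)
    scale = np.maximum((hi - lo) / 15.0, 1e-8)
    zero = -lo / scale
    q = np.clip(np.rint(blocks / scale + zero), 0, 15).astype(np.uint8).reshape(n, dim)
    packed = q[:, 0::2] | (q[:, 1::2] << 4)   # two 4-bit codes per byte
    return packed, scale[..., 0].astype(np.float16), zero[..., 0].astype(np.float16)

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 256)).astype(np.float32)
packed, scales, zeros = quantize_rows(weights)
restored = dequantize_rows(packed, scales, zeros)

print(packed.shape, scales.shape, restored.shape)
print("max abs error:", float(np.abs(restored - weights).max()))
```

With 256 dimensions and a block size of 32, each row carries 8 scales and 8 zero-points, which matches the `29528 × 8` shapes above.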

Usage

Python inference (lightweight, no torch)

from lf4_model import LF4StaticEmbedding

model = LF4StaticEmbedding.from_pretrained("VTXAI/Vortex-Embed-4.7M")
print(model)  # LF4StaticEmbedding(vocab=29528, dim=256, bits=4, size=4.7MB)

# Encode the documents and a query to 256-dim vectors
doc_emb = model.encode(["search the web for news", "read file contents"])
query_emb = model.encode(["find the latest headlines"])

# Cosine similarity search: rank the documents against the query
scores, indices = model.search(query_emb, doc_emb, top_k=2)
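For readers who want to see what a cosine top-k search involves, the following is a minimal NumPy sketch. It is not the `model.search` implementation, whose internals this README does not describe; `cosine_top_k` is a hypothetical helper that takes a 2-D array of query vectors and a 2-D array of document vectors.

```python
import numpy as np

def cosine_top_k(query_emb, doc_emb, top_k=10):
    """Return (scores, indices) of the top_k most similar documents per query."""
    q = query_emb / np.linalg.norm(query_emb, axis=-1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=-1, keepdims=True)
    sims = q @ d.T                                    # (n_queries, n_docs)
    k = min(top_k, d.shape[0])                        # never ask for more than exist
    indices = np.argsort(-sims, axis=1)[:, :k]        # best match first
    scores = np.take_along_axis(sims, indices, axis=1)
    return scores, indices

# Tiny 2-D example so the ranking is easy to follow by hand.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([[2.0, 0.1]])
scores, indices = cosine_top_k(query, docs, top_k=2)
print(indices.tolist())
```

For large collections, `np.argpartition` avoids the full sort, at the cost of a second sort over the k selected entries.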

With sentence-transformers (torch)

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("VTXAI/Vortex-Embed-4.7M", backend="static")
embeddings = model.encode(["search the web for news", "read file contents"])

Quality

  • Cosine preservation vs FP32: 0.9969
  • MSE: 0.256990
  • Tool search accuracy: 100% (15/15, benchmarks)
  • Codebase indexing: 12.5s index, 14.6ms P50 search (JARVIS codebase, 2707 chunks)
  • Trained on: CornStack (Python/JS/Java) + Glaive function-calling
  • Base: VTXAI/Vortex-Embed → fine-tuned → LF4 quantized
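The README does not state how the cosine preservation and MSE figures were computed. One plausible reading, shown here purely as an assumption, is the mean row-wise cosine similarity and the mean squared error between the FP32 table and its dequantized counterpart. `preservation_metrics` is a hypothetical helper, and the arrays below are synthetic stand-ins, so the printed numbers are unrelated to the values reported above.

```python
import numpy as np

def preservation_metrics(reference, approx):
    """Mean row-wise cosine similarity and MSE between two (n, dim) arrays."""
    ref = reference.astype(np.float64)
    app = approx.astype(np.float64)
    dots = np.sum(ref * app, axis=1)
    norms = np.linalg.norm(ref, axis=1) * np.linalg.norm(app, axis=1)
    mean_cos = float(np.mean(dots / norms))           # assumes no all-zero rows
    mse = float(np.mean((ref - app) ** 2))
    return mean_cos, mse

# Synthetic stand-ins: a random "FP32" table and a slightly perturbed copy.
rng = np.random.default_rng(0)
reference = rng.standard_normal((1000, 256)).astype(np.float32)
approx = reference + 0.05 * rng.standard_normal((1000, 256)).astype(np.float32)

mean_cos, mse = preservation_metrics(reference, approx)
print(f"cosine preservation: {mean_cos:.4f}, MSE: {mse:.6f}")
```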

Why Static Embedding?

| Feature | Static (this) | Transformer (BERT) |
|---|---|---|
| Inference speed | 0.15 ms | ~50 ms |
| Load time | 144 ms | ~5 s |
| Disk size | 4.7 MB | ~400 MB |
| GPU needed | No | Recommended |
| Accuracy | Comparable* | Higher for complex semantics |

* For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.

Minimal Dependencies

pip install numpy safetensors tokenizers

The model loads and runs with just numpy, safetensors, and Hugging Face tokenizers.
Basic inference requires no PyTorch, transformers, or sentence-transformers.