Instructions to use nomic-ai/nomic-embed-text-v1.5-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use nomic-ai/nomic-embed-text-v1.5-GGUF with llama-cpp-python:
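# !pip install llama-cpp-python
from llama_cpp import Llama

# embedding=True is needed so llama-cpp-python returns embeddings
llm = Llama.from_pretrained(
    repo_id="nomic-ai/nomic-embed-text-v1.5-GGUF",
    filename="nomic-embed-text-v1.5.Q2_K.gguf",
    embedding=True,
)

# This is an embedding model, so embed a prefixed string rather than generating text
output = llm.create_embedding("search_query: What is TSNE?")
print(output["data"][0]["embedding"])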
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use nomic-ai/nomic-embed-text-v1.5-GGUF with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf nomic-ai/nomic-embed-text-v1.5-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf nomic-ai/nomic-embed-text-v1.5-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf nomic-ai/nomic-embed-text-v1.5-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf nomic-ai/nomic-embed-text-v1.5-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf nomic-ai/nomic-embed-text-v1.5-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf nomic-ai/nomic-embed-text-v1.5-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf nomic-ai/nomic-embed-text-v1.5-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf nomic-ai/nomic-embed-text-v1.5-GGUF:Q4_K_M
Use Docker
docker model run hf.co/nomic-ai/nomic-embed-text-v1.5-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use nomic-ai/nomic-embed-text-v1.5-GGUF with Ollama:
ollama run hf.co/nomic-ai/nomic-embed-text-v1.5-GGUF:Q4_K_M
- Unsloth Studio
How to use nomic-ai/nomic-embed-text-v1.5-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for nomic-ai/nomic-embed-text-v1.5-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for nomic-ai/nomic-embed-text-v1.5-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for nomic-ai/nomic-embed-text-v1.5-GGUF to start chatting
- Docker Model Runner
How to use nomic-ai/nomic-embed-text-v1.5-GGUF with Docker Model Runner:
docker model run hf.co/nomic-ai/nomic-embed-text-v1.5-GGUF:Q4_K_M
- Lemonade
How to use nomic-ai/nomic-embed-text-v1.5-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull nomic-ai/nomic-embed-text-v1.5-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.nomic-embed-text-v1.5-GGUF-Q4_K_M
List all available models
lemonade list
nomic-embed-text-v1.5 - GGUF
Original model: nomic-embed-text-v1.5
Usage
Embedding text with nomic-embed-text requires task instruction prefixes at the beginning of each string.
For example, the llama.cpp commands under "Example llama.cpp Command" use the search_query prefix to embed user questions, e.g. in a RAG application.
To see the full set of task instructions and how they are designed to be used, visit the model card for nomic-embed-text-v1.5.
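As a minimal sketch of the prefixing step described above, the helper below prepends a task instruction to a string before it is embedded. The function name `with_task_prefix` is hypothetical, and the set of prefixes other than `search_query` is an assumption taken from the upstream nomic-embed-text-v1.5 model card; check that card before relying on it.

```python
# Sketch: prepend a task instruction prefix to a string before embedding it.
# TASK_PREFIXES is an assumption based on the upstream model card; only
# search_query is confirmed by this README.
TASK_PREFIXES = ("search_query", "search_document", "classification", "clustering")


def with_task_prefix(text: str, task: str = "search_query") -> str:
    """Return `text` with `task: ` prepended (hypothetical helper)."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task prefix: {task!r}")
    # Strip surrounding whitespace so the prefix is followed by exactly one space.
    return f"{task}: {text.strip()}"


queries = ["What is TSNE?", "Who is Laurens Van der Maaten?"]
prefixed = [with_task_prefix(q) for q in queries]
print(prefixed)
```

The resulting strings can be passed as-is to whichever runtime computes the embedding. Validating the task name up front is a design choice so that a typo in the prefix fails loudly instead of silently producing lower-quality embeddings.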
Description
This repo contains llama.cpp-compatible files for nomic-embed-text-v1.5 in GGUF format.
llama.cpp will default to 2048 tokens of context with these files. For the full 8192 token context length, you will have to choose a context extension method. The 🤗 Transformers model uses Dynamic NTK-Aware RoPE scaling, but that is not currently available in llama.cpp.
Example llama.cpp Command
Compute a single embedding:
./embedding -ngl 99 -m nomic-embed-text-v1.5.f16.gguf -c 8192 -b 8192 --rope-scaling yarn --rope-freq-scale .75 -p 'search_query: What is TSNE?'
You can also submit a batch of texts to embed, as long as the total number of tokens does not exceed the context length. Only the first three embeddings are shown by the embedding example.
texts.txt:
search_query: What is TSNE?
search_query: Who is Laurens Van der Maaten?
Compute multiple embeddings:
./embedding -ngl 99 -m nomic-embed-text-v1.5.f16.gguf -c 8192 -b 8192 --rope-scaling yarn --rope-freq-scale .75 -f texts.txt
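A batch file like texts.txt can also be generated from a list of raw strings. The sketch below assumes one prefixed text per line, as in the example above; `write_batch_file` is a hypothetical helper, and it does not count tokens, so staying within the context length is still up to the caller.

```python
import tempfile
from pathlib import Path


def write_batch_file(texts, path, task="search_query"):
    """Write one `task: text` line per input text (hypothetical helper)."""
    lines = []
    for text in texts:
        # Collapse newlines and repeated spaces so each text stays on one line.
        single_line = " ".join(text.split())
        lines.append(f"{task}: {single_line}")
    # No trailing newline, to avoid a possible empty final entry
    # (an assumption about how the file is split into texts).
    Path(path).write_text("\n".join(lines), encoding="utf-8")
    return lines


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    batch_path = Path(tmp) / "texts.txt"
    written = write_batch_file(
        ["What is TSNE?", "Who is Laurens\nVan der Maaten?"], batch_path
    )
    content = batch_path.read_text(encoding="utf-8")

print(content)
```

The file written this way can then be passed with -f as in the command above.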
Compatibility
These files are compatible with llama.cpp as of commit 4524290e8 from 2/15/2024.
Provided Files
The table below shows the mean squared error of the embeddings produced by these quantizations of Nomic Embed relative to the Sentence Transformers implementation.
| Name | Quant | Size | MSE |
|---|---|---|---|
| nomic-embed-text-v1.5.Q2_K.gguf | Q2_K | 48 MiB | 2.33e-03 |
| nomic-embed-text-v1.5.Q3_K_S.gguf | Q3_K_S | 57 MiB | 1.19e-03 |
| nomic-embed-text-v1.5.Q3_K_M.gguf | Q3_K_M | 65 MiB | 8.26e-04 |
| nomic-embed-text-v1.5.Q3_K_L.gguf | Q3_K_L | 69 MiB | 7.93e-04 |
| nomic-embed-text-v1.5.Q4_0.gguf | Q4_0 | 75 MiB | 6.32e-04 |
| nomic-embed-text-v1.5.Q4_K_S.gguf | Q4_K_S | 75 MiB | 6.71e-04 |
| nomic-embed-text-v1.5.Q4_K_M.gguf | Q4_K_M | 81 MiB | 2.42e-04 |
| nomic-embed-text-v1.5.Q5_0.gguf | Q5_0 | 91 MiB | 2.35e-04 |
| nomic-embed-text-v1.5.Q5_K_S.gguf | Q5_K_S | 91 MiB | 2.00e-04 |
| nomic-embed-text-v1.5.Q5_K_M.gguf | Q5_K_M | 95 MiB | 6.55e-05 |
| nomic-embed-text-v1.5.Q6_K.gguf | Q6_K | 108 MiB | 5.58e-05 |
| nomic-embed-text-v1.5.Q8_0.gguf | Q8_0 | 140 MiB | 5.79e-06 |
| nomic-embed-text-v1.5.f16.gguf | F16 | 262 MiB | 4.21e-10 |
| nomic-embed-text-v1.5.f32.gguf | F32 | 262 MiB | 6.08e-11 |
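One way to read the table is as a size versus error trade-off. The sketch below copies the Quant, Size, and MSE columns from the table and picks the smallest file whose MSE stays under a chosen tolerance. The helper name `smallest_within` and the example tolerances are illustrative assumptions, not a recommendation from the model authors.

```python
# (quant, size in MiB, MSE) copied from the table above.
QUANTS = [
    ("Q2_K", 48, 2.33e-03),
    ("Q3_K_S", 57, 1.19e-03),
    ("Q3_K_M", 65, 8.26e-04),
    ("Q3_K_L", 69, 7.93e-04),
    ("Q4_0", 75, 6.32e-04),
    ("Q4_K_S", 75, 6.71e-04),
    ("Q4_K_M", 81, 2.42e-04),
    ("Q5_0", 91, 2.35e-04),
    ("Q5_K_S", 91, 2.00e-04),
    ("Q5_K_M", 95, 6.55e-05),
    ("Q6_K", 108, 5.58e-05),
    ("Q8_0", 140, 5.79e-06),
    ("F16", 262, 4.21e-10),
    ("F32", 262, 6.08e-11),
]


def smallest_within(max_mse):
    """Return the smallest (quant, size, mse) with mse <= max_mse, or None."""
    candidates = [q for q in QUANTS if q[2] <= max_mse]
    if not candidates:
        return None
    # Smallest file wins; ties keep the earlier table row.
    return min(candidates, key=lambda q: q[1])


for tolerance in (1e-3, 1e-4):
    print(tolerance, smallest_within(tolerance))
```

An acceptable tolerance depends on the downstream task, so it is worth checking retrieval quality on your own data before settling on a quantization.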