Quantized Llama-3.1-Tulu-3-8B Models
This repository contains Q4_K_M and Q5_K_M quantized versions of the allenai/Llama-3.1-Tulu-3-8B model in GGUF format. These quantized variants offer smaller, more efficient alternatives while preserving the core capabilities of Tülu3, a leading instruction-following model family.
Model Overview
- Original Model: Llama-3.1-Tulu-3-8B
- Quantized Versions:
  - Q4_K_M (4-bit quantization)
  - Q5_K_M (5-bit quantization)
- Base Architecture: 8B parameter instruction-following model
- Developer: Allen Institute for AI
- License: Llama 3.1 Community License Agreement
- Language: Primarily English
- Finetuned From: allenai/Llama-3.1-Tulu-3-8B-DPO
Quantization Details
Q4_K_M Version
- Model size reduction: ~75% smaller than original
- Memory footprint: 4.92 GB
- Optimized for deployment in resource-constrained environments
- Maintains core functionality with minimal performance impact
Q5_K_M Version
- Model size reduction: ~69% smaller than original
- Memory footprint: 5.73 GB
- Higher precision than Q4_K_M
- Better preservation of model quality
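The figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the parameter count of about 8.03 billion is an assumption not stated in this card, 1 GB is taken as 10**9 bytes, and the baseline size behind the percentage reductions is left as a parameter because the card does not say which full-precision format it compares against.

```python
def size_reduction_pct(original_gb: float, quantized_gb: float) -> float:
    """Percentage by which the quantized file is smaller than the original."""
    if original_gb <= 0:
        raise ValueError("original_gb must be positive")
    return (1.0 - quantized_gb / original_gb) * 100.0


def bits_per_weight(file_gb: float, n_params: float) -> float:
    """Average bits stored per parameter, treating 1 GB as 10**9 bytes."""
    return file_gb * 1e9 * 8 / n_params


# Assumption (not stated in this card): roughly 8.03e9 parameters.
ASSUMED_PARAMS = 8.03e9

# File sizes are the memory footprints listed above.
for name, size_gb in [("Q4_K_M", 4.92), ("Q5_K_M", 5.73)]:
    bpw = bits_per_weight(size_gb, ASSUMED_PARAMS)
    print(f"{name}: ~{bpw:.2f} bits per weight")
```

An average somewhat above the nominal 4 or 5 bits is expected, since K-quant formats also store per-block scales and may keep some tensors at higher precision. To reproduce a percentage reduction, pass your own baseline size to `size_reduction_pct`.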
Key Features
Both quantized versions are intended to retain most of Tülu3's performance on:
- Instruction following tasks
- Mathematical reasoning (MATH dataset)
- Grade school math problems (GSM8K)
- General instruction following (IFEval)
- Chat-based interactions
- Complex reasoning tasks
Usage
from llama_cpp import Llama

# Load a local GGUF file; adjust the path to wherever you saved the model.
llm = Llama(
    model_path="./models/Llama-3.1-Tulu-3-8B-Q4_K_M.gguf",
    verbose=False,
    # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
    # n_ctx=2048,  # Uncomment to increase the context window
)

# Send a system prompt and a user request in chat format.
output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an AI assistant that helps answer user questions."},
        {"role": "user", "content": "Write Python code to find prime numbers."},
    ]
)

# Print only the assistant's reply text.
print(output["choices"][0]["message"]["content"])
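As an alternative to a local path, llama-cpp-python can fetch the file from the Hub with `Llama.from_pretrained`, and `create_chat_completion` can stream the reply when `stream=True`. The following is a minimal sketch, not a definitive recipe: the file name follows this repository's Q4_K_M naming and should be checked against the file list, and `collect_stream` and `main` are hypothetical helper names introduced for illustration.

```python
from typing import Any, Dict, Iterable


def collect_stream(chunks: Iterable[Dict[str, Any]]) -> str:
    """Concatenate the text pieces from OpenAI-style streaming chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        # Some chunks carry only a role or are empty, so content may be absent.
        parts.append(delta.get("content") or "")
    return "".join(parts)


def main() -> None:
    # Imported lazily so the helper above works without llama-cpp-python installed.
    from llama_cpp import Llama

    # Downloads the file on first use; this needs the huggingface-hub package.
    llm = Llama.from_pretrained(
        repo_id="SandLogicTechnologies/Llama-3.1-Tulu-3-8B-GGUF",
        filename="Llama-3.1-Tulu-3-8B-Q4_K_M.gguf",  # assumed file name, verify in the repo
        verbose=False,
    )
    stream = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain what a prime number is."}],
        stream=True,
    )
    print(collect_stream(stream))
```

Calling `main()` downloads the model if needed and prints the full reply once streaming finishes. To show tokens as they arrive, print each chunk's content inside the loop instead of joining at the end.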
Training Data
The model was trained on a diverse mix of:
- Publicly available datasets
- Synthetic data
- Human-created datasets
Bias, Risks, and Limitations
These quantized models inherit the limitations of the original Tülu3 model:
- Limited safety training compared to models with active filtering
- Can produce problematic outputs, especially when prompted to do so
- Unknown composition of the base Llama 3.1 training corpus
- Additional considerations for quantized versions:
  - Slight degradation in performance compared to full-precision model
  - May show increased variance in mathematical reasoning tasks
  - Q4_K_M may exhibit more pronounced quality loss in complex scenarios
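One way to get a rough feel for the quantization effects listed above is to run the same small prompt set against each file and compare how often the expected answer appears. The sketch below uses a simple substring match, not a rigorous benchmark. The prompt set and the helper names `answer_hit_rate` and `make_generate` are hypothetical, and it assumes an already loaded `llama_cpp.Llama` instance.

```python
from typing import Callable, List, Tuple


def answer_hit_rate(generate: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Fraction of cases whose expected answer appears in the generated text."""
    if not cases:
        raise ValueError("cases must not be empty")
    hits = sum(1 for prompt, expected in cases if expected in generate(prompt))
    return hits / len(cases)


# Hypothetical spot-check set; a real comparison would use many more prompts.
CASES = [
    ("What is 17 * 6? Answer with the number only.", "102"),
    ("What is 144 / 12? Answer with the number only.", "12"),
]


def make_generate(llm) -> Callable[[str], str]:
    """Wrap a llama_cpp.Llama instance as a prompt -> text function."""
    def generate(prompt: str) -> str:
        out = llm.create_chat_completion(
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,  # low temperature keeps runs more repeatable
        )
        return out["choices"][0]["message"]["content"]
    return generate
```

With two loaded models, `answer_hit_rate(make_generate(llm_q4), CASES)` and `answer_hit_rate(make_generate(llm_q5), CASES)` give comparable numbers. A handful of prompts can only hint at differences, so treat the result as a smoke test.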
Recommended Use Cases
- Research and development
- Educational applications
- Resource-constrained deployments
- Edge computing scenarios
- Prototyping and testing
- Applications requiring faster inference
Acknowledgments
These quantized models are based on the work of the Allen Institute for AI and the Llama 3.1 team. Special thanks to Georgi Gerganov and the entire llama.cpp development team for their outstanding contributions.
Contact
For any inquiries or support, please contact us at support@sandlogic.com or visit our Website.