Instructions to use j30231/Llama-3.3-70B-Instruct_Q2_K.gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use j30231/Llama-3.3-70B-Instruct_Q2_K.gguf with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="j30231/Llama-3.3-70B-Instruct_Q2_K.gguf",
	filename="Llama-3.3-70B-Instruct_Q2_K.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use j30231/Llama-3.3-70B-Instruct_Q2_K.gguf with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K
# Run inference directly in the terminal:
llama-cli -hf j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K
# Run inference directly in the terminal:
llama-cli -hf j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K
# Run inference directly in the terminal:
./llama-cli -hf j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K

Use Docker

docker model run hf.co/j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K

LM Studio
Jan
Ollama
How to use j30231/Llama-3.3-70B-Instruct_Q2_K.gguf with Ollama:
```
ollama run hf.co/j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K
```

Unsloth Studio new

How to use j30231/Llama-3.3-70B-Instruct_Q2_K.gguf with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for j30231/Llama-3.3-70B-Instruct_Q2_K.gguf to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for j30231/Llama-3.3-70B-Instruct_Q2_K.gguf to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for j30231/Llama-3.3-70B-Instruct_Q2_K.gguf to start chatting

Pi new

How to use j30231/Llama-3.3-70B-Instruct_Q2_K.gguf with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use j30231/Llama-3.3-70B-Instruct_Q2_K.gguf with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K

Run Hermes

hermes

Docker Model Runner
How to use j30231/Llama-3.3-70B-Instruct_Q2_K.gguf with Docker Model Runner:
```
docker model run hf.co/j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K
```

Lemonade

How to use j30231/Llama-3.3-70B-Instruct_Q2_K.gguf with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull j30231/Llama-3.3-70B-Instruct_Q2_K.gguf:Q2_K

Run and chat with the model

lemonade run user.Llama-3.3-70B-Instruct_Q2_K.gguf-Q2_K

List all available models

lemonade list

Quantization : Q2_K (using Llama.cpp)

llm_load_print_meta: model type = 70B
llm_load_print_meta: model ftype = Q2_K - Medium
llm_load_print_meta: model params = 70.55 B
llm_load_print_meta: model size = 24.56 GiB (2.99 BPW)
llama_model_loader: - type f32: 162 tensors
llama_model_loader: - type q2_K: 321 tensors
llama_model_loader: - type q3_K: 160 tensors
llama_model_loader: - type q5_K: 80 tensors
llama_model_loader: - type q6_K: 1 tensors

MMLU Result : 74.89%

Category STEM: 66.09% (18 subjects)

high_school_chemistry: 64.04%
high_school_mathematics: 46.67%
abstract_algebra: 48.00%
computer_security: 84.00%
college_computer_science: 61.62%
college_chemistry: 53.00%
conceptual_physics: 74.89%
high_school_statistics: 68.06%
college_mathematics: 44.00%
college_biology: 88.19%
college_physics: 52.94%
elementary_mathematics: 64.81%
high_school_biology: 88.71%
high_school_physics: 57.62%
machine_learning: 56.25%
astronomy: 88.16%
electrical_engineering: 69.66%
high_school_computer_science: 79.00%

Category humanities: 79.28% (13 subjects)

world_religions: 84.80%
high_school_us_history: 89.71%
moral_disputes: 77.75%
high_school_world_history: 88.61%
formal_logic: 62.70%
international_law: 85.12%
jurisprudence: 76.85%
professional_law: 59.58%
logical_fallacies: 83.44%
philosophy: 74.28%
moral_scenarios: 78.66%
prehistory: 84.26%
high_school_european_history: 84.85%

Category social sciences: 82.11% (12 subjects)

high_school_geography: 86.36%
high_school_psychology: 91.19%
sociology: 87.56%
high_school_microeconomics: 86.55%
professional_psychology: 76.80%
security_studies: 77.55%
us_foreign_policy: 91.00%
public_relations: 70.91%
high_school_government_and_politics: 93.78%
econometrics: 61.40%
human_sexuality: 81.68%
high_school_macroeconomics: 80.51%

Category other (business, health, misc.): 75.95% (14 subjects)

virology: 53.61%
college_medicine: 72.25%
global_facts: 62.00%
miscellaneous: 87.36%
medical_genetics: 84.00%
human_aging: 78.48%
nutrition: 83.33%
marketing: 88.89%
anatomy: 71.85%
professional_medicine: 88.24%
professional_accounting: 56.03%
management: 82.52%
clinical_knowledge: 80.75%
business_ethics: 74.00%

Overall correct rate: 74.89% Total subjects evaluated: 57

Perplexity 6.6865 +/- 0.04336

(using wikitext-2-raw/wiki.test.raw)

Downloads last month: 33

GGUF

Model size

71B params

Architecture

llama

Hardware compatibility

2-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for j30231/Llama-3.3-70B-Instruct_Q2_K.gguf

Base model

meta-llama/Llama-3.1-70B

Finetuned

meta-llama/Llama-3.3-70B-Instruct

Quantized

(148)

this model