Instructions to use remiai3/gpt_oss_20b_GGUF_project_guide with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use remiai3/gpt_oss_20b_GGUF_project_guide with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="remiai3/gpt_oss_20b_GGUF_project_guide")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("remiai3/gpt_oss_20b_GGUF_project_guide", dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use remiai3/gpt_oss_20b_GGUF_project_guide with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "remiai3/gpt_oss_20b_GGUF_project_guide"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "remiai3/gpt_oss_20b_GGUF_project_guide",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/remiai3/gpt_oss_20b_GGUF_project_guide

SGLang

How to use remiai3/gpt_oss_20b_GGUF_project_guide with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "remiai3/gpt_oss_20b_GGUF_project_guide" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "remiai3/gpt_oss_20b_GGUF_project_guide",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "remiai3/gpt_oss_20b_GGUF_project_guide" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "remiai3/gpt_oss_20b_GGUF_project_guide",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use remiai3/gpt_oss_20b_GGUF_project_guide with Docker Model Runner:
```
docker model run hf.co/remiai3/gpt_oss_20b_GGUF_project_guide
```

A newer version of this model is available: openai/gpt-oss-20b

Local GGUF Chat (Q2_K_L) — Run on CPU (16GB RAM)

This repository shows how to:

Download a single GGUF quantized weight (*Q2_K_L.gguf) from Hugging Face by pasting your token into a file.
Run a small local Flask chat UI that talks to the model using llama-cpp-python.

Files

download_model.py — edit & paste your HF token, then run to download only the Q2_K_L gguf file.
app.py — Flask server + model loader + chat endpoints.
templates/index.html — Chat UI (ChatGPT-like).
requirements.txt — Python dependencies.

Requirements

Python 3.10.9 (recommend)
~16 GB RAM (CPU-only); speed depends on quantization & CPU cores.

Quick start

Create & activate a virtual environment:

python -m venv oss_env
# Windows
oss_env\Scripts\activate
# Linux / macOS
source oss_env/bin/activate

Install Python dependencies: pip install -r requirements.txt
Edit download_model.py: Paste your Hugging Face token into HUGGINGFACE_TOKEN. If your model repo is different, update REPO_ID.
Download the Q2_K_L GGUF: python download_model.py The script will print the full path to the downloaded .gguf file.
(Optional) Edit app.py: If you want to explicitly set the exact .gguf path, set MODEL_PATH at top of app.py. Otherwise app.py will auto-detect the first .gguf under models/.
Run the Flask app: python app.py

Open http://localhost:5000

in your browser.

If need you can run the inference.py code for the single stage demo without chat loop

Trouble shooting

May be you can face issues while installing the libraries in your laptop to solve them follow the below steps
Go through this link and install the Visual Studio Build Tools in your laptop https://visualstudio.microsoft.com/visual-cpp-build-tools/?utm_source=chatgpt.com
After completion of download select Desktop development with C++ workload the Visual Studio Build Tools
After that check the boxes of MSVC v142/143, C++ CMake tools for Windows, Windows SDK
Wait for 40 minutes for to installing all the packages
Then once again run the pip install -r requirements.txt command then all the libraries will download without any errors or issues.

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for remiai3/gpt_oss_20b_GGUF_project_guide

Base model

openai/gpt-oss-20b

Finetuned

(535)

this model