Instructions for using AIGym/gpt-oss-20B-jail-broke with libraries and local apps.

Libraries

How to use AIGym/gpt-oss-20B-jail-broke with Transformers:

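```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# torch_dtype="auto" keeps the checkpoint's BF16 weights instead of upcasting;
# device_map="auto" (requires the accelerate package) spreads them over available GPUs.
pipe = pipeline(
    "text-generation",
    model="AIGym/gpt-oss-20B-jail-broke",
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages, max_new_tokens=40))
```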
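In recent Transformers releases, passing a list of chat messages to a `text-generation` pipeline returns the whole conversation, with the model's turn appended as the last message under `generated_text`. The helper below is a minimal sketch that assumes this output shape; `last_reply` is a hypothetical name, not part of the Transformers API.

```python
from typing import Any


def last_reply(pipeline_output: list[dict[str, Any]]) -> str:
    """Return the assistant text from a chat-style text-generation pipeline result.

    Assumes the shape [{"generated_text": [..., {"role": "assistant", "content": ...}]}].
    """
    conversation = pipeline_output[0]["generated_text"]
    if isinstance(conversation, str):
        # Plain-string prompts yield a plain string (prompt plus completion by default).
        return conversation
    # The newly generated turn is appended after the input messages.
    final = conversation[-1]
    if final.get("role") != "assistant":
        raise ValueError("last message is not an assistant turn")
    return final["content"]


# Usage with the pipeline above (downloads and runs the model):
# print(last_reply(pipe(messages, max_new_tokens=40)))
```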

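```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "AIGym/gpt-oss-20B-jail-broke"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Keep the stored BF16 dtype and let accelerate place the weights on available devices.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one step; return_dict=True also returns the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Slice off the prompt tokens so only the newly generated text is decoded.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```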

Local Apps

vLLM

How to use AIGym/gpt-oss-20B-jail-broke with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (OpenAI-compatible API on port 8000 by default):
vllm serve "AIGym/gpt-oss-20B-jail-broke"
# In a second terminal, call the server using curl:
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AIGym/gpt-oss-20B-jail-broke",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
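The same request can be sent from Python without extra dependencies. This is a minimal sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000` and accepts the usual OpenAI-style `max_tokens` field; `build_chat_request`, `extract_reply`, and `chat` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any

# Assumed defaults: `vllm serve` listens on port 8000 unless --port is given.
BASE_URL = "http://localhost:8000"
MODEL_ID = "AIGym/gpt-oss-20B-jail-broke"


def build_chat_request(prompt: str, model: str = MODEL_ID, max_tokens: int = 128) -> dict[str, Any]:
    """Build an OpenAI-style /v1/chat/completions request body for a single user turn."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def extract_reply(response: dict[str, Any]) -> str:
    """Return the message text of the first choice in a chat-completions response."""
    return response["choices"][0]["message"]["content"]


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST one chat request to the server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_reply(json.load(resp))


# Usage, once the server is up:
# print(chat("What is the capital of France?"))
```

Keeping request construction and response parsing in separate functions lets them be checked without a running server.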


SGLang

How to use AIGym/gpt-oss-20B-jail-broke with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AIGym/gpt-oss-20B-jail-broke" \
    --host 0.0.0.0 \
    --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AIGym/gpt-oss-20B-jail-broke",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
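Loading a model of this size can take minutes, so a script that starts the server and then sends requests should wait until the server answers. A minimal sketch, assuming the server exposes the OpenAI-compatible `GET /v1/models` route on port 30000 as configured above; `wait_for_server` is a hypothetical helper, and the polling interval and timeout are arbitrary defaults.

```python
import http.client
import time
import urllib.request


def wait_for_server(
    base_url: str = "http://localhost:30000",
    timeout: float = 600.0,
    interval: float = 5.0,
) -> bool:
    """Poll GET {base_url}/v1/models until it answers HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, HTTP error status, or timeout: not ready yet.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))


# Usage: if not wait_for_server(): raise SystemExit("server did not come up")
```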

Use Docker images

```bash
# Replace <secret> with your Hugging Face access token.
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AIGym/gpt-oss-20B-jail-broke" \
        --host 0.0.0.0 \
        --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AIGym/gpt-oss-20B-jail-broke",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
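OpenAI-compatible servers also accept `"stream": true` in the request body, in which case the reply arrives as server-sent events: one `data: {json}` line per chunk, terminated by `data: [DONE]`. The sketch below assembles the reply text from such lines; it assumes the standard chunk shape `choices[0].delta.content`, and `collect_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate choices[0].delta.content from OpenAI-style SSE lines until [DONE]."""
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            # Skip blank separator lines and SSE comments.
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            # Some chunks (for example usage summaries) carry no choices.
            continue
        # The first chunk often carries only the role; later chunks carry text.
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            parts.append(text)
    return "".join(parts)
```

When reading a streamed HTTP response with `urllib.request`, each `bytes` line can be decoded with `.decode("utf-8")` before being passed in.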

Docker Model Runner
How to use AIGym/gpt-oss-20B-jail-broke with Docker Model Runner:
```
docker model run hf.co/AIGym/gpt-oss-20B-jail-broke
```
Quantized versions of this model can be used in llama.cpp, Ollama, LM Studio, or any compatible app.

Model Card for GPT-OSS-20B-Jail-Broke (Freedom)

Model Overview

GPT-OSS-20B-Jail-Broke (Freedom) is a red-teamed variant of OpenAI's open-weight GPT-OSS-20B model, developed as part of the Kaggle GPT-OSS Red Teaming Challenge. The model was systematically stress-tested for safety, robustness, and misuse potential, using adaptations and prompts designed to probe its boundaries. This release illustrates both the capability and the fragility of large language models when confronted with adversarial inputs.

Architecture: Decoder-only Mixture-of-Experts Transformer, about 21B total parameters (roughly 3.6B active per token).
Base Model: GPT-OSS-20B
Variant Name: Jail-Broke / Freedom
Primary Use Case: Safety evaluation, red-teaming experiments, adversarial prompting research.
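The parameter count translates directly into the memory needed just to hold the weights: bytes ≈ parameters × bytes per parameter. For about 21B parameters stored in BF16 (2 bytes each, as the Safetensors metadata lists), that is roughly 42 GB before activations and KV cache. A small sketch of the arithmetic, assuming every tensor uses the same dtype; the function name is illustrative.

```python
# Storage width in bytes per parameter for common dtypes.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "bf16") -> float:
    """Approximate memory in decimal GB for the weights alone (no activations or KV cache)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "bf16", "int4"):
    print(f"{dtype}: {weight_memory_gb(21e9, dtype):.1f} GB")
# fp32: 84.0 GB
# bf16: 42.0 GB
# int4: 10.5 GB
```

This is why loading without `torch_dtype="auto"` (which can upcast to FP32) roughly doubles the memory footprint.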

Intended Use

This model is not intended for production deployment. Instead, it is released to:

Provide a case study for adversarial robustness evaluation.
Enable researchers to explore prompt engineering attacks and failure modes.
Contribute to discussions of alignment, safety, and governance in open-source LLMs.

Applications & Examples

The model demonstrates how structured adversarial prompting can influence outputs. Below are illustrative examples:

Bypass of Content Filters
- Example: Queries framed as “historical analysis” or “fictional roleplay” can elicit otherwise restricted responses.
Creative/Constructive Applications
- When redirected toward benign domains, adversarial prompting can generate:
  - Satirical writing highlighting model weaknesses.
  - Stress-test datasets for automated safety pipelines.
  - Training curricula for prompt-injection defenses.
Red-Teaming Utility
- Researchers may use this model to simulate malicious actors in controlled environments.
- Security teams can benchmark defensive strategies such as reinforcement learning from human feedback (RLHF) or rule-based moderation.
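When benchmarking defenses this way, results are usually summarized as rates over a labeled evaluation set. A minimal sketch, assuming each judged response records whether the prompt was adversarial, whether the output was flagged unsafe, and whether the model refused; the `EvalRecord` fields and the two metric names (attack success rate, over-refusal rate) are illustrative choices, not part of the challenge's official scoring.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """One judged model response (hypothetical schema)."""
    adversarial: bool    # prompt came from the red-team set
    unsafe_output: bool  # a judge flagged the response as policy-violating
    refused: bool        # the model declined to answer


def summarize(records: list[EvalRecord]) -> dict[str, float]:
    """Attack success rate on adversarial prompts; over-refusal rate on benign prompts."""
    attacks = [r for r in records if r.adversarial]
    benign = [r for r in records if not r.adversarial]
    asr = sum(r.unsafe_output for r in attacks) / len(attacks) if attacks else 0.0
    over_refusal = sum(r.refused for r in benign) / len(benign) if benign else 0.0
    return {"attack_success_rate": asr, "over_refusal_rate": over_refusal}
```

Tracking both numbers matters: a filter that blocks everything drives attack success to zero but makes over-refusal unacceptable.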

Limitations

Outputs may contain hallucinations, unsafe recommendations, or offensive material when pushed into adversarial contexts.
Model behavior is highly sensitive to framing — subtle changes in prompts can bypass safety guardrails.
As a derivative of GPT-OSS-20B, it inherits the biases and limitations of its base model, along with those common to large autoregressive transformers.

Ethical Considerations

Releasing adversarially tested models provides transparency for the research community but also carries dual-use risk. To mitigate it:

This model card explicitly states non-production, research-only usage.
Examples are framed to support safety analysis, not exploitation.
Documentation emphasizes educational and evaluative value.

Citation

If you use or reference this work in academic or applied contexts, please cite the Kaggle challenge and this model card:

```bibtex
@misc{gptoss20b_jailbroke,
  title = {GPT-OSS-20B-Jail-Broke (Freedom): Red-Teamed Variant for Adversarial Evaluation},
  author = {Anonymous Participants of the GPT-OSS Red Teaming Challenge},
  year = {2025},
  url = {https://www.kaggle.com/competitions/openai-gpt-oss-20b-red-teaming}
}
```


Safetensors

Model size: 21B params
Tensor type: BF16
Quantizations: 2 models