Instructions to use AIGym/gpt-oss-20B-jail-broke with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use AIGym/gpt-oss-20B-jail-broke with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="AIGym/gpt-oss-20B-jail-broke") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("AIGym/gpt-oss-20B-jail-broke") model = AutoModelForCausalLM.from_pretrained("AIGym/gpt-oss-20B-jail-broke") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use AIGym/gpt-oss-20B-jail-broke with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "AIGym/gpt-oss-20B-jail-broke" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "AIGym/gpt-oss-20B-jail-broke", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/AIGym/gpt-oss-20B-jail-broke
- SGLang
How to use AIGym/gpt-oss-20B-jail-broke with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "AIGym/gpt-oss-20B-jail-broke" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "AIGym/gpt-oss-20B-jail-broke", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "AIGym/gpt-oss-20B-jail-broke" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "AIGym/gpt-oss-20B-jail-broke", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use AIGym/gpt-oss-20B-jail-broke with Docker Model Runner:
docker model run hf.co/AIGym/gpt-oss-20B-jail-broke
Model Card for GPT-OSS-20B-Jail-Broke (Freedom)
Model Overview
GPT-OSS-20B-Jail-Broke (Freedom) is a red-teamed variant of the Open Source GPT-OSS-20B model, developed as part of the Kaggle GPT-OSS Red Teaming Challenge. The model was systematically stress-tested for safety, robustness, and misuse potential, with adaptations and prompts that probe its boundaries. This release illustrates both the power and fragility of large-scale language models when confronted with adversarial inputs.
- Architecture: Decoder-only Transformer, 20B parameters.
- Base Model: GPT-OSS-20B
- Variant Name: Jail-Broke / Freedom
- Primary Use Case: Safety evaluation, red-teaming experiments, adversarial prompting research.
Intended Use
This model is not intended for production deployment. Instead, it is released to:
- Provide a case study for adversarial robustness evaluation.
- Enable researchers to explore prompt engineering attacks and failure modes.
- Contribute to discussions of alignment, safety, and governance in open-source LLMs.
Applications & Examples
The model demonstrates how structured adversarial prompting can influence outputs. Below are illustrative examples:
Bypass of Content Filters
- Example: Queries framed as “historical analysis” or “fictional roleplay” can elicit otherwise restricted responses.
Creative/Constructive Applications
When redirected toward benign domains, adversarial prompting can generate:
- Satirical writing highlighting model weaknesses.
- Stress-test datasets for automated safety pipelines.
- Training curricula for prompt-injection defenses.
Red-Teaming Utility
- Researchers may use this model to simulate malicious actors in controlled environments.
- Security teams can benchmark defensive strategies such as reinforcement learning with human feedback (RLHF) or rule-based moderation.
Limitations
- Outputs may contain hallucinations, unsafe recommendations, or offensive material when pushed into adversarial contexts.
- Model behavior is highly sensitive to framing — subtle changes in prompts can bypass safety guardrails.
- As a derivative of GPT-OSS-20B, it inherits all scaling-related biases and limitations of large autoregressive transformers.
Ethical Considerations
Releasing adversarially tested models provides transparency for the research community but also risks dual-use misuse. To mitigate:
- This model card explicitly states non-production, research-only usage.
- Examples are framed to support safety analysis, not exploitation.
- Documentation emphasizes educational and evaluative value.
Citation
If you use or reference this work in academic or applied contexts, please cite the Kaggle challenge and this model card:
@misc{gptoss20b_jailbroke,
title = {GPT-OSS-20B-Jail-Broke (Freedom): Red-Teamed Variant for Adversarial Evaluation},
author = {Anonymous Participants of the GPT-OSS Red Teaming Challenge},
year = {2025},
url = {https://www.kaggle.com/competitions/openai-gpt-oss-20b-red-teaming}
}
- Downloads last month
- 70
