Instructions for using Madras1/Jade-20B with libraries and local apps.

Libraries

How to use Madras1/Jade-20B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Madras1/Jade-20B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Madras1/Jade-20B")
model = AutoModelForCausalLM.from_pretrained("Madras1/Jade-20B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the new tokens (the slice skips the prompt)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Madras1/Jade-20B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Madras1/Jade-20B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Madras1/Jade-20B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
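
The same request can also be issued from Python. The sketch below uses only the standard library and assumes the server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="Madras1/Jade-20B",
                       base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` should return the reply text. For a server on another port, pass a different `base_url`, for example `base_url="http://localhost:30000"`.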

Use Docker

docker model run hf.co/Madras1/Jade-20B

SGLang

How to use Madras1/Jade-20B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Madras1/Jade-20B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Madras1/Jade-20B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Madras1/Jade-20B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Madras1/Jade-20B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use Madras1/Jade-20B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Madras1/Jade-20B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Madras1/Jade-20B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Madras1/Jade-20B to start chatting

Load model with FastModel

# Install Unsloth first (shell command): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Madras1/Jade-20B",
    max_seq_length=2048,
)

Docker Model Runner

How to use Madras1/Jade-20B with Docker Model Runner:

```
docker model run hf.co/Madras1/Jade-20B
```


	---
	language:
	- pt
	- en
	license: apache-2.0
	base_model:
	- unsloth/gpt-oss-20b
	- openai/gpt-oss-20b
	base_model_relation: finetune
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- pt-br
	- portuguese
	- brazilian-portuguese
	- conversational
	- chatbot
	- persona
	- unsloth
	- 4-bit
	- bitsandbytes
	- qwen3
	---
	![Total Downloads All Time](https://img.shields.io/badge/dynamic/json?color=brightgreen&label=Total%20Downloads&query=%24.downloadsAllTime&url=https%3A%2F%2Fhuggingface.co%2Fapi%2Fmodels%2FMadras1%2FJade-20B%3Fexpand%3DdownloadsAllTime)

	# Jade-20b

	Jade-20b is a Brazilian Portuguese conversational finetune of gpt-oss-20b built to express a strong, persistent persona. This model is designed for PT-BR chat, chatbot use cases, and character-style interaction, with colloquial language, abbreviations, slang, and a WhatsApp-like tone.

	## Model Summary

	Jade-20b is a persona-first model. It was intentionally finetuned so the model speaks like Jade even without a strong `system prompt`. Because of that, the model often answers in PT-BR with informal phrasing such as `vc`, slang, and a friendly conversational tone from the very first turn.

	## Model Details

	- Developed by: `Madras1`
	- Base model: `unsloth/gpt-oss-20b`
	- Model type: conversational text-generation finetune
	- Primary language: Brazilian Portuguese (`pt-BR`)
	- License: `apache-2.0`

	## Intended Behavior

	This model was trained to:

	- speak naturally in Brazilian Portuguese
	- maintain a consistent Jade persona
	- sound informal, friendly, and chat-oriented
	- work well in casual assistant and conversational use cases

	Typical behavior includes:

	- abbreviations like `vc`
	- light slang and colloquial wording
	- short expressions such as `tmj`, `mano`, `tlgd`
	- a more human and less robotic tone

	If Jade already sounds like a recurring character during inference, that is expected behavior, not an error.
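
For a quick sanity check that a reply carries the persona, one rough heuristic is to look for the markers listed above. This is a hypothetical helper, not part of the model's tooling, and the marker list only covers the examples named in this section.

```python
import re

# Markers taken from the examples in this section; extend as needed.
PERSONA_MARKERS = ("vc", "tmj", "mano", "tlgd")


def find_persona_markers(reply):
    """Return the markers that appear as whole words in the reply."""
    words = set(re.findall(r"\w+", reply.lower()))
    return [marker for marker in PERSONA_MARKERS if marker in words]


print(find_persona_markers("Oi! vc ta bem, mano?"))  # ['vc', 'mano']
```

Whole-word matching keeps `você` from counting as `vc`. An empty result does not mean the persona is missing, only that none of these particular markers showed up.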

	## Training Intent

	The finetune objective was to make the persona live in the weights, not only in prompting.

	High-level training approach:

	- synthetic PT-BR prompt generation for chat-like situations
	- persona-driven response distillation
	- supervised finetuning on conversational data
	- removal of `system` persona instructions during SFT so the model directly internalizes the Jade style

	This is why the model can already answer with personality, abbreviations, and slang even with a simple user-only prompt.
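
The last step in the list, removing `system` persona instructions, can be sketched as a filter over chat-format examples. The actual data pipeline is not described here, so this is a minimal illustration that assumes each example is a list of `role`/`content` dictionaries; `drop_system_turns` and the sample conversation are made up for the example.

```python
def drop_system_turns(conversation):
    """Return a copy of the conversation without `system` messages."""
    return [turn for turn in conversation if turn["role"] != "system"]


# Made-up conversation, for illustration only.
example = [
    {"role": "system", "content": "Você é a Jade, fale de forma informal."},
    {"role": "user", "content": "oi jade, tudo bem?"},
    {"role": "assistant", "content": "oii, tudo sim! e vc?"},
]

sft_example = drop_system_turns(example)
print([turn["role"] for turn in sft_example])  # ['user', 'assistant']
```

Training on the filtered conversation means the assistant turn keeps the persona style while nothing in the input asks for it, which is how the style ends up in the weights rather than in the prompt.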

	## Training Setup

	High-level setup used for this finetune:

	- around `25,000` examples
	- `3` epochs
	- Unsloth-based SFT pipeline
	- chat-style data in Portuguese
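
As a rough sense of scale, the numbers above translate into the following amount of training. The batch size is not stated in this README, so `effective_batch_size` below is a purely hypothetical value used to show the arithmetic.

```python
import math

num_examples = 25_000  # "around 25,000" from the list above
num_epochs = 3

# Total training examples processed across all epochs.
examples_seen = num_examples * num_epochs

# Hypothetical value, not stated in this README.
effective_batch_size = 16

steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * num_epochs

print(examples_seen)  # 75000
print(total_steps)    # 4689
```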

	## Recommended Use

	Best fit:

	- PT-BR chat assistants
	- persona bots
	- WhatsApp-style conversational agents
	- lightweight entertainment or social AI experiences

	Less ideal for:

	- formal writing
	- highly neutral assistant behavior
	- high-stakes legal, medical, or financial contexts

	## Prompting Tips

	For the strongest Jade behavior:

	- use a simple user message
	- avoid a formal system prompt that fights the finetune
	- keep prompts conversational when possible

	Example prompts:

	- `oi jade, tudo bem?`
	- `jade, me explica isso de um jeito simples`
	- `vc acha que vale a pena estudar python hoje?`
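
In code, these tips amount to building the message list without a `system` entry. A minimal sketch, where `build_messages` is a hypothetical helper and not part of any library:

```python
def build_messages(user_text, history=None):
    """Build a chat message list with no system turn."""
    messages = list(history or [])
    messages.append({"role": "user", "content": user_text})
    return messages


messages = build_messages("oi jade, tudo bem?")
print(messages)  # [{'role': 'user', 'content': 'oi jade, tudo bem?'}]
```

The resulting list can be passed to `tokenizer.apply_chat_template`. For multi-turn chats, pass the earlier user and assistant turns as `history`.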

	## Example Inference

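	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "Madras1/Jade-20B"

	# Load the tokenizer and model; device_map="auto" places weights on available devices.
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)

	# A user-only prompt is enough; no system prompt is needed for the persona.
	messages = [
	    {"role": "user", "content": "oi jade, tudo bem?"},
	]

	# Render the chat template as text, ending with the assistant generation prompt.
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)

	inputs = tokenizer(text, return_tensors="pt").to(model.device)
	outputs = model.generate(
	    **inputs,
	    max_new_tokens=256,
	    do_sample=True,  # temperature and top_p only take effect when sampling
	    temperature=0.7,
	    top_p=0.9,
	)

	# Prints the prompt followed by the generated reply.
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```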
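
The example above prints the prompt together with the reply. To keep only the reply, slice off the prompt tokens before decoding. The helper below is an illustrative sketch (`reply_token_ids` is not a library function) and works on plain lists as well as 1-D tensors.

```python
def reply_token_ids(output_ids, prompt_length):
    """Return only the token ids generated after the prompt."""
    return output_ids[prompt_length:]


# Toy ids standing in for real tokenizer output.
prompt_ids = [101, 102, 103]
output_ids = prompt_ids + [7, 8, 9]
print(reply_token_ids(output_ids, len(prompt_ids)))  # [7, 8, 9]
```

With the variables from the example, this would be `tokenizer.decode(reply_token_ids(outputs[0], inputs["input_ids"].shape[-1]), skip_special_tokens=True)`.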

	## Limitations

	Because this is a persona-oriented finetune:

	- it may sound informal in contexts where a neutral tone would be better
	- it may over-index on chat style depending on the prompt
	- it is optimized more for persona consistency than strict formality

	## Links

	https://github.com/MadrasLe/JadeLLMV-1