Libraries

How to use thu-nmrc/thaihao-100m-gpt-3b-sft6 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="thu-nmrc/thaihao-100m-gpt-3b-sft6")

# Load model directly (the checkpoint is a GPT-2 style causal LM)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("thu-nmrc/thaihao-100m-gpt-3b-sft6")
model = AutoModelForCausalLM.from_pretrained("thu-nmrc/thaihao-100m-gpt-3b-sft6")

vLLM

How to use thu-nmrc/thaihao-100m-gpt-3b-sft6 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "thu-nmrc/thaihao-100m-gpt-3b-sft6"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "thu-nmrc/thaihao-100m-gpt-3b-sft6",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
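
The same request can be issued from Python. The sketch below mirrors the curl call using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the response parsing assumes the usual OpenAI-style completions shape (`choices[0].text`).

```python
import json
import urllib.request

MODEL_ID = "thu-nmrc/thaihao-100m-gpt-3b-sft6"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request equivalent to the curl example (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request to a running server and return the generated text.

    Assumes an OpenAI-style response body with choices[0].text.
    """
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` returns the continuation. Passing `base_url="http://localhost:30000"` targets a server started on port 30000 instead.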

Use Docker

docker model run hf.co/thu-nmrc/thaihao-100m-gpt-3b-sft6

SGLang

How to use thu-nmrc/thaihao-100m-gpt-3b-sft6 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "thu-nmrc/thaihao-100m-gpt-3b-sft6" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "thu-nmrc/thaihao-100m-gpt-3b-sft6",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "thu-nmrc/thaihao-100m-gpt-3b-sft6" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "thu-nmrc/thaihao-100m-gpt-3b-sft6",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Thaihao 100M GPT 3B SFT6

This is a roughly 100M-parameter Chinese industrial business assistant prototype exported in Hugging Face Transformers GPT-2 causal language model format.

Intended Use

The model is intended for industrial business assistant scenarios such as:

field troubleshooting triage
acceptance checklist drafting
bid and tender technical response drafting
parameter verification reminders
safety and compliance reminders

It is not a general chat model and does not have live access to customer systems, weather, current work orders, quotes, contracts, or unpublished product data.

Model Details

Architecture: GPT-2 style causal LM
Parameters: about 95M, counted with tied weights
Context length: 512
Layers: 8
Attention heads: 8
Hidden size: 768
Vocabulary size: 50,000
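
As a sanity check, the parameter count can be reproduced from the figures above. This is a rough sketch that assumes the standard GPT-2 layout (learned position embeddings, biases on every linear layer, a 4x MLP expansion, and an output head tied to the token embedding); none of these details are stated in this card beyond "GPT-2 style", and `gpt2_param_count` is a hypothetical helper.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Count parameters of a standard GPT-2 stack with tied input/output embeddings."""
    d = n_embd
    token_emb = vocab_size * d           # shared with the output head (tied weights)
    pos_emb = n_positions * d            # learned position embeddings
    attn = (d * 3 * d + 3 * d) + (d * d + d)          # QKV projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)       # up-projection + down-projection
    layer_norms = 2 * (2 * d)            # two LayerNorms per block (weight + bias)
    per_layer = attn + mlp + layer_norms
    final_ln = 2 * d                     # final LayerNorm
    return token_emb + pos_emb + n_layer * per_layer + final_ln


total = gpt2_param_count(vocab_size=50_000, n_positions=512, n_embd=768, n_layer=8)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# prints: 95,497,728 parameters (~95.5M)
```

The number of attention heads does not enter the count, since the heads only split the 768-dimensional projections.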

Prompt Format

<|bos|>
<|user|> 用户：ATS 不切换怎么排查？
<|assistant|> 助手：
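
A small helper can keep prompts consistent with this template. The sketch below covers a single user turn only, since the card does not document a multi-turn format; `build_prompt` is a hypothetical name. The example question translates roughly to "How do I troubleshoot an ATS that does not switch over?", and 用户 / 助手 mean "user" / "assistant".

```python
def build_prompt(user_message: str) -> str:
    """Wrap one user message in the single-turn template shown above."""
    return f"<|bos|>\n<|user|> 用户：{user_message}\n<|assistant|> 助手："


prompt = build_prompt("ATS 不切换怎么排查？")
print(prompt)
```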

Python Usage

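from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "thu-nmrc/thaihao-100m-gpt-3b-sft6"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Follow the prompt format; generation continues after the assistant marker.
prompt = "<|bos|>\n<|user|> 用户：ATS 不切换怎么排查？\n<|assistant|> 助手："
inputs = tokenizer(prompt, return_tensors="pt")

# Fall back to the EOS token for padding if the tokenizer defines no pad token.
pad_id = tokenizer.pad_token_id
if pad_id is None:
    pad_id = tokenizer.eos_token_id

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=160,  # prompt plus new tokens should fit the 512-token context
        do_sample=True,
        temperature=0.35,
        top_k=20,
        pad_token_id=pad_id,
    )

# Special tokens are kept so the role markers remain visible in the output.
print(tokenizer.decode(output[0], skip_special_tokens=False))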
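
Because the decoded string includes the prompt and role markers, callers usually want only the assistant's reply. The following is a minimal sketch, assuming the model may run on into a further `<|user|>` turn after answering; `extract_reply` is a hypothetical helper, and the sample text is a placeholder, not real model output.

```python
ASSISTANT_MARKER = "<|assistant|> 助手："
USER_MARKER = "<|user|>"


def extract_reply(decoded: str) -> str:
    """Return the text after the first assistant marker, cut at any later user turn."""
    _, found, reply = decoded.partition(ASSISTANT_MARKER)
    if not found:
        # No marker present: return the text unchanged apart from whitespace.
        return decoded.strip()
    # Drop anything the model generated past the start of a new user turn.
    reply = reply.split(USER_MARKER, 1)[0]
    return reply.strip()


sample = "<|bos|>\n<|user|> 用户：ATS 不切换怎么排查？\n<|assistant|> 助手：sample reply text\n<|user|> 用户：next"
print(extract_reply(sample))
```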

Safety Notes

The model should not be used to disclose customer purchase quantities, quotations, contracts, contacts, work-order status, unpublished product parameters, military-sensitive details, or dangerous electrical operation steps. Use an authorized RAG or business system for current and permissioned data.

Safetensors

Model size: 95.5M params
Tensor type: F32