Instructions for using RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3 with libraries and local apps.

Libraries

How to use RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Print the result so the output is visible when run as a script
print(pipe(text=messages))

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3")
model = AutoModelForMultimodalLM.from_pretrained("RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
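
The same request can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the default base URL assumes the vLLM server started above is listening on port 8000 (for the SGLang commands on this page the port would be 30000).

```python
import json
import urllib.request

MODEL_ID = "RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3"


def build_chat_request(prompt, image_url, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the text of the first choice.

    Assumes the server replies in the OpenAI chat-completions format.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage, once the server is running:
#   req = build_chat_request("Describe this image in one sentence.", "<image URL>")
#   print(send_chat_request(req))
```

Building the request separately from sending it makes it possible to inspect the payload without a running server.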

Use Docker

docker model run hf.co/RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3

SGLang

How to use RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3 with Docker Model Runner:
```
docker model run hf.co/RangerX/Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3
```

Qwen3.6-35B-PreREAP-BNB4-Pruned-ratio-0.3 / reap_args.yaml

RangerX

Upload pre-REAP bnb4 ratio 0.3 pruned checkpoint

9692517 verified 2 months ago

cluster_args:
  cluster_description: null
  cluster_method: agglomerative
  compression_ratio: 0.3
  expert_sim: ttm
  frequency_penalty: true
  linkage_method: average
  max_cluster_size: null
  multi_layer: null
  num_clusters: null
  singleton_outlier_experts: false
  singleton_super_experts: false
  softmax_temperature: null
ds_args:
  dataset_config_name: null
  dataset_name: theblackcat102/evol-codealpaca-v1:171,Salesforce/xlam-function-calling-60k:171,open-r1/Mixture-of-Thoughts[code]:171,open-r1/Mixture-of-Thoughts[math]:171,open-r1/Mixture-of-Thoughts[science]:170,SWE-bench/SWE-smith-trajectories(tool):170
  dataset_test_split: test
  shuffle: true
  split: train
eval_args:
  evalplus_tasks:
  - mbpp
  - humaneval
  greedy: true
  lm_eval_tasks:
  - winogrande
  - arc_challenge
  - arc_easy
  - boolq
  - hellaswag
  - mmlu
  - openbookqa
  - rte
  min_p: 0.0
  parallel_tasks: 32
  results_dir: null
  run_evalplus: true
  run_livecodebench: true
  run_lm_eval: true
  run_math: false
  run_wildbench: false
  server_log_file_name: server.log
  temperature: 0.7
  top_k: 20
  top_p: 0.8
  use_server: true
  vllm_port: 8000
model_args:
  model_name: /disk1/rongxiao/hf_cache/hub/models--Qwen--Qwen3.6-35B-A3B/snapshots/995ad96eacd98c81ed38be0c5b274b04031597b0
  num_experts_per_tok_override: null
obs_args:
  batch_size: 1
  batches_per_category: 1024
  distance_measure: angular
  model_max_length: 2048
  output_file_name: observations_qwen36_pre_reap_bnb4_paper_1024_2048_standard_streaming-pre_reap-bnb_4bit-nf4-bfloat16-dq_true.pt
  overwrite_observations: false
  record_pruning_metrics_only: true
  renormalize_router_weights: true
  return_vllm_tokens_prompt: false
  select_only_categories: null
  split_by_category: false
  truncate: false
pre_reap_quant_args:
  pre_reap_bnb_4bit_compute_dtype: bfloat16
  pre_reap_bnb_4bit_quant_type: nf4
  pre_reap_bnb_4bit_use_double_quant: true
  pre_reap_quantization_method: bnb_4bit
prune_args:
  n_experts_to_prune: null
  overwrite_pruned_model: true
  perserve_outliers: false
  perserve_super_experts: false
  prune_method: reap
reap_args:
  debug: false
  do_eval: false
  plot_clusters: true
  profile: false
  run_observer_only: false
  seed: 42
  smoke_test: false
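
Two of the settings above are compact enough to be worth unpacking. The sketch below is illustrative only and is not taken from the REAP code. It assumes that each `dataset_name` entry has the form `name:count`, with the count being the number of calibration samples drawn from that dataset, and that the `pre_reap_bnb_4bit_*` keys correspond to the identically named arguments of `transformers.BitsAndBytesConfig`. The helper names `parse_dataset_mix` and `bnb_kwargs` are hypothetical.

```python
# Values copied from ds_args.dataset_name and pre_reap_quant_args in the config.
DATASET_NAME = (
    "theblackcat102/evol-codealpaca-v1:171,"
    "Salesforce/xlam-function-calling-60k:171,"
    "open-r1/Mixture-of-Thoughts[code]:171,"
    "open-r1/Mixture-of-Thoughts[math]:171,"
    "open-r1/Mixture-of-Thoughts[science]:170,"
    "SWE-bench/SWE-smith-trajectories(tool):170"
)

PRE_REAP_QUANT_ARGS = {
    "pre_reap_bnb_4bit_compute_dtype": "bfloat16",
    "pre_reap_bnb_4bit_quant_type": "nf4",
    "pre_reap_bnb_4bit_use_double_quant": True,
    "pre_reap_quantization_method": "bnb_4bit",
}


def parse_dataset_mix(dataset_name):
    """Split 'name:count,name:count,...' into a list of (name, count) pairs.

    Assumption: the trailing ':N' is a per-dataset sample count.
    """
    mix = []
    for entry in dataset_name.split(","):
        # Split on the last colon only, so the dataset name is kept whole.
        name, count = entry.rsplit(":", 1)
        mix.append((name.strip(), int(count)))
    return mix


def bnb_kwargs(quant_args):
    """Map the pre_reap_* keys to BitsAndBytesConfig-style keyword arguments.

    Assumption: stripping the 'pre_reap_' prefix yields the argument name.
    """
    prefix = "pre_reap_"
    kwargs = {
        key[len(prefix):]: value
        for key, value in quant_args.items()
        if key.startswith(prefix + "bnb_4bit_")
    }
    kwargs["load_in_4bit"] = quant_args.get("pre_reap_quantization_method") == "bnb_4bit"
    return kwargs


mix = parse_dataset_mix(DATASET_NAME)
total_samples = sum(count for _, count in mix)
print(total_samples)
print(bnb_kwargs(PRE_REAP_QUANT_ARGS))
```

Under these assumptions the six counts sum to 1024, the same value as `batches_per_category`. The file itself does not state how the two are related. With `transformers` and `bitsandbytes` installed, the resulting dictionary could be passed as `BitsAndBytesConfig(**bnb_kwargs(PRE_REAP_QUANT_ARGS))`.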