tomofusa committed on
Commit f6145ee · verified · 1 Parent(s): 433d5e5

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +26 -13
README.md CHANGED
@@ -1,21 +1,34 @@
  ---
- base_model: tomofusa/exp020-simpo-merged
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - qwen3
- license: apache-2.0
  language:
  - en
  ---

- # Uploaded finetuned model

- - **Developed by:** tomofusa
- - **License:** apache-2.0
- - **Finetuned from model :** tomofusa/exp020-simpo-merged

- This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

  ---
+ base_model: Qwen/Qwen3-4B-Instruct-2507
+ datasets:
+ - u-10bei/dpo-dataset-qwen-cot
  language:
  - en
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - cpo
+ - simpo
+ - unsloth
+ - qwen
+ - alignment
  ---

+ # exp020-simpo-merged

+ SFT + CPO/SimPO merged model. Full 16-bit weights, no adapter loading required.

+ ## Training Pipeline
+ 1. **SFT**: tomofusa/exp015-blend-h-lora
+ 2. **CPO/SimPO**: u-10bei/dpo-dataset-qwen-cot (1 epoch, lr=5e-07, beta=2.5)

+ ## CPO/SimPO Configuration
+ - **Trainer**: CPOTrainer (reference-free)
+ - **Loss type**: simpo
+ - **Learning rate**: 5e-07
+ - **Beta**: 2.5 (SimPO scale, NOT DPO scale)
+ - **SimPO gamma**: 1.375
+ - **CPO alpha**: 0.0
+ - **LoRA**: r=64, alpha=128
+ - **Max length**: 1024
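
Note on the configuration added above: the new README says beta is on the "SimPO scale, NOT DPO scale". A minimal sketch of why the scales differ, assuming the standard SimPO objective (length-averaged log-probabilities, a target margin gamma, no reference model). This is not the author's training code. The function names and sample log-probabilities are hypothetical; only the beta, gamma, and alpha values come from the diff.

```python
import math

# Hyperparameters copied from the "CPO/SimPO Configuration" list in the diff.
BETA = 2.5           # SimPO-scale beta (multiplies length-averaged log-probs)
SIMPO_GAMMA = 1.375  # target reward margin
CPO_ALPHA = 0.0      # weight of the NLL term on the chosen response; 0.0 disables it


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def simpo_loss(
    avg_logp_chosen: float,
    avg_logp_rejected: float,
    nll_chosen: float = 0.0,
    beta: float = BETA,
    gamma: float = SIMPO_GAMMA,
    cpo_alpha: float = CPO_ALPHA,
) -> float:
    """Per-pair loss (hypothetical helper, assumed form of the objective).

    avg_logp_* are per-token averages of the policy log-probabilities,
    which is why beta is much larger than typical DPO values: averaged
    log-probs are far smaller in magnitude than summed ones.
    """
    margin = beta * (avg_logp_chosen - avg_logp_rejected) - gamma
    return -log_sigmoid(margin) + cpo_alpha * nll_chosen


# Illustrative values only: the margin term reaches zero when the averaged
# log-prob gap equals gamma / beta = 0.55.
at_margin = simpo_loss(-1.0, -1.55)
wide_gap = simpo_loss(-1.0, -3.0)
```

With these settings the loss for a pair falls below log 2 only once the chosen response's averaged log-probability exceeds the rejected one's by more than gamma / beta = 0.55. Since CPO alpha is 0.0, the NLL term contributes nothing.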