# DPO Output Memo

- Stage: `dpo`
- Written: `2026-03-10T11:09:05+00:00`
- Output dir: `/data/screenwriter/training/outputs/sgs-smoke/dpo`
- train_pairs: `8`
- eval_pairs: `2`
- input_adapter: `/data/screenwriter/training/outputs/sgs-smoke/sft`
- base_model: `unsloth/Qwen3.5-4B`

## Args

```json
{
  "base_model": null,
  "batch_size": 1,
  "beta": 0.1,
  "data": "/data/screenwriter/training/outputs/_smoke_data/dpo_train.jsonl",
  "epochs": 1,
  "eval": "/data/screenwriter/training/outputs/_smoke_data/dpo_test.jsonl",
  "grad_accum": 1,
  "load_in_4bit": false,
  "logging_steps": 1,
  "lr": 2e-05,
  "max_steps": 1,
  "model": "/data/screenwriter/training/outputs/sgs-smoke/sft",
  "output": "/data/screenwriter/training/outputs/sgs-smoke/dpo",
  "save_steps": 1,
  "seed": 3407,
  "seq_len": 2048,
  "wandb": false,
  "warmup_steps": 0
}
```

## Contents

- `README.md` (2.3 KB)
- `adapter_config.json` (1.2 KB)
- `adapter_model.safetensors` (82977.0 KB)
- `chat_template.jinja` (7.6 KB)
- `checkpoint-1/`
- `tokenizer.json` (19520.8 KB)
- `tokenizer_config.json` (1.1 KB)
- `training_args.json` (0.5 KB)
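
## Step-count sanity check

The args recorded above imply a one-step smoke run, which would match the single `checkpoint-1/` directory in the contents list. The sketch below works through that arithmetic. It is illustrative only: `planned_steps` is a hypothetical helper, not part of the training script, and it assumes a single device and that a positive `max_steps` overrides `epochs` (the behaviour of Hugging Face `Trainer`-style loops, which this memo does not confirm the script uses).

```python
import json
import math

# Subset of the Args block from this memo, copied verbatim.
ARGS_JSON = """
{
  "batch_size": 1,
  "epochs": 1,
  "grad_accum": 1,
  "max_steps": 1,
  "save_steps": 1
}
"""

TRAIN_PAIRS = 8  # train_pairs from the memo header


def planned_steps(args: dict, train_pairs: int) -> dict:
    """Estimate optimizer steps and checkpoint steps for a run.

    Hypothetical helper. Assumes one device and that a positive
    max_steps caps the run regardless of the epoch count.
    """
    effective_batch = args["batch_size"] * args["grad_accum"]
    steps_per_epoch = math.ceil(train_pairs / effective_batch)
    epoch_steps = steps_per_epoch * args["epochs"]
    # Assumption: max_steps > 0 takes precedence over epochs.
    total_steps = args["max_steps"] if args["max_steps"] > 0 else epoch_steps
    # Steps at which a checkpoint would be written.
    checkpoints = list(range(args["save_steps"], total_steps + 1, args["save_steps"]))
    return {
        "effective_batch": effective_batch,
        "epoch_steps": epoch_steps,
        "total_steps": total_steps,
        "checkpoints": checkpoints,
    }


plan = planned_steps(json.loads(ARGS_JSON), TRAIN_PAIRS)
print(plan)
```

Under these assumptions a full epoch over the 8 training pairs would take 8 optimizer steps, but `max_steps: 1` stops the run after the first one, so only one of the 8 pairs is seen before the adapter is saved.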