# Qwen3-Omni Full-Parameter Feasibility Gates

Generated: `2026-06-18T12:53:13+00:00`

These gates demonstrate that full-parameter FSDP training of Qwen3-Omni can load the model, prepare it for training, run backward passes and optimizer steps, and complete guarded pilot runs of up to 256 optimizer steps on an 8-GPU remote worker. They do not constitute a production full-parameter fine-tune, and by design they save neither full checkpoints nor public weights.

## Summary

- Status: `pass`
- Decision: `full_parameter_feasible_for_guarded_short_runs_not_promoted`
- Passed runs: `6`
- Preempted runs: `1`
- Review/missing runs: `0`
- Completed full-parameter optimizer steps: `489`
- Longest passed run: `xperience10m_qwen3_omni_128ep_fullparam_pilot256_after_qwen_v6_preemptible_8gpu_20260611` (256 steps)
- Checkpoint saved: `False`
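The summary counts come straight from the runs table below. Six runs passed. The opportunistic 128-step pilot was preempted before it completed any optimizer steps. The completed steps therefore sum to 1 + 8 + 32 + 64 + 128 + 256 = 489.

The sketch below shows that aggregation. It is a minimal version and rests on a few assumptions:

- The run names and record layout are illustrative, not the pipeline's actual summary schema.
- The `pass`/`review` rule is also an assumption, because the report does not state how its overall status is derived.

```python
from dataclasses import dataclass


# Illustrative per-run record; the real summary JSON schema may differ.
@dataclass(frozen=True)
class GateRun:
    name: str
    status: str
    steps: int  # completed optimizer steps


# Transcribed from the runs table (short names are hypothetical labels).
RUNS = [
    GateRun("smoke_1step", "passed", 1),
    GateRun("shorttrain8", "passed", 8),
    GateRun("pilot32", "passed", 32),
    GateRun("pilot64", "passed", 64),
    GateRun("pilot128", "preempted_for_qwen_v5_handoff", 0),
    GateRun("pilot128_after_qwen_v5", "passed", 128),
    GateRun("pilot256_after_qwen_v6", "passed", 256),
]


def summarize(runs):
    passed = [r for r in runs if r.status == "passed"]
    preempted = [r for r in runs if r.status.startswith("preempted")]
    # Anything neither passed nor preempted would need manual review.
    review = [
        r for r in runs
        if r.status != "passed" and not r.status.startswith("preempted")
    ]
    longest = max(passed, key=lambda r: r.steps)
    return {
        # Assumed rule: pass if at least one run passed and none need review.
        "status": "pass" if passed and not review else "review",
        "passed_runs": len(passed),
        "preempted_runs": len(preempted),
        "review_or_missing_runs": len(review),
        # Preempted runs contribute their (zero) completed steps.
        "completed_steps": sum(r.steps for r in runs),
        "longest_passed_run": longest.name,
        "longest_passed_steps": longest.steps,
    }


summary = summarize(RUNS)
print(summary)
```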

## Runs

| run | status | steps | samples | final loss | epoch/train loss | policy | source |
| --- | --- | ---: | ---: | ---: | ---: | --- | --- |
| Full-Parameter 1-Step Feasibility Smoke | passed | 1 | 8 | 1.2726 | 1.2726 | no weights/checkpoints | `results/omni_finetune/xperience10m_qwen3_omni_128ep_fullparam_smoke_preemptible_8gpu_20260609/fullparam_feasibility_summary.json` |
| Full-Parameter 8-Step Short Train | passed | 8 | 64 | 1.1805 | 1.2190 | no weights/checkpoints | `results/omni_finetune/xperience10m_qwen3_omni_128ep_fullparam_shorttrain8_preemptible_8gpu_20260609/fullparam_shorttrain8_summary.json` |
| Full-Parameter 32-Step Pilot | passed | 32 | 256 | 0.2206 | 0.8451 | no weights/checkpoints | `results/omni_finetune/xperience10m_qwen3_omni_128ep_fullparam_pilot32_preemptible_8gpu_20260609/fullparam_pilot32_summary.json` |
| Full-Parameter 64-Step Pilot | passed | 64 | 512 | 0.0112 | 0.4434 | no weights/checkpoints | `results/omni_finetune/xperience10m_qwen3_omni_128ep_fullparam_pilot64_preemptible_8gpu_20260609/fullparam_pilot64_summary.json` |
| Full-Parameter 128-Step Opportunistic Pilot | preempted_for_qwen_v5_handoff | 0 | 1024 | | | no weights/checkpoints | `results/omni_finetune/xperience10m_qwen3_omni_128ep_fullparam_pilot128_preemptible_8gpu_20260609/fullparam_pilot128_summary.json` |
| Full-Parameter 128-Step Post-Qwen-v5 Pilot | passed | 128 | 1024 | 0.0137 | 0.2158 | no weights/checkpoints | `results/omni_finetune/xperience10m_qwen3_omni_128ep_fullparam_pilot128_after_qwen_v5_preemptible_8gpu_20260609/training_metadata.json` |
| Full-Parameter 256-Step Post-Qwen-v6 Pilot | passed | 256 | 2048 | 0.0096 | 0.1158 | no weights/checkpoints | `results/omni_finetune/xperience10m_qwen3_omni_128ep_fullparam_pilot256_after_qwen_v6_preemptible_8gpu_20260611/training_metadata.json` |
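In every passed run, the samples column is exactly eight times the steps column, which means eight samples per optimizer step. On the 8-GPU worker, that is consistent with one sample per GPU per step and no gradient accumulation. The report does not state the per-device batch size or the accumulation settings, so treat that split as an inference. The preempted row is left out because it lists 1024 samples against 0 completed steps.

A small consistency check over the passed rows (run names are illustrative labels):

```python
# (steps, samples) pairs copied from the passed rows of the runs table;
# the short run names are illustrative labels.
PASSED_ROWS = {
    "smoke_1step": (1, 8),
    "shorttrain8": (8, 64),
    "pilot32": (32, 256),
    "pilot64": (64, 512),
    "pilot128_after_qwen_v5": (128, 1024),
    "pilot256_after_qwen_v6": (256, 2048),
}
WORLD_SIZE = 8  # the report's 8-GPU worker


def samples_per_step(rows):
    """Return the set of distinct samples-per-optimizer-step ratios."""
    ratios = set()
    for name, (steps, samples) in rows.items():
        if steps <= 0:
            raise ValueError(f"{name}: no completed optimizer steps")
        if samples % steps:
            raise ValueError(f"{name}: {samples} samples not divisible by {steps} steps")
        ratios.add(samples // steps)
    return ratios


ratios = samples_per_step(PASSED_ROWS)
assert len(ratios) == 1, "passed runs disagree on effective global batch"
(global_batch,) = ratios
# Assumes pure data parallelism: every rank consumes distinct samples, so
# global_batch = per_device_batch * grad_accum_steps * WORLD_SIZE.
per_device_times_accum = global_batch // WORLD_SIZE
print(global_batch, per_device_times_accum)  # 8 1
```

If a later run changes the batch configuration, this check fails loudly. That is better than silently comparing losses across different effective batch sizes.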

## Publication Policy

- Public summary allowed: `true`
- Publish full-parameter weights: `false`
- Publish full checkpoints: `false`
- Reason: every completed full-parameter gate run used `save_mode=none`, and the preempted pilot saved nothing. These runs are feasibility evidence only.
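The policy above follows from two conditions. Weights or checkpoints can be released only if a run actually wrote them, and only if the result has been promoted beyond feasibility. Neither condition holds here.

The sketch below is a minimal, hypothetical encoding of that rule, with these assumptions:

- `RunRecord`, `artifacts_written` and the `promoted` flag are illustrative names, not fields from the report's metadata.
- Weights and checkpoints are treated identically, which a real policy might not do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    name: str
    save_mode: str          # hypothetical field mirroring save_mode=none
    artifacts_written: int  # hypothetical count of weight/checkpoint files


def publication_policy(runs, promoted):
    """Decide what may be published from a set of gate runs.

    Weights or checkpoints are releasable only if some run wrote them
    AND the result has been promoted; a metrics-only summary always is.
    """
    wrote_artifacts = any(
        r.save_mode != "none" or r.artifacts_written > 0 for r in runs
    )
    releasable = promoted and wrote_artifacts
    if not wrote_artifacts:
        reason = "all runs used save_mode=none; nothing to release"
    elif not promoted:
        reason = "artifacts exist but the result is not promoted"
    else:
        reason = "promoted result with saved artifacts"
    return {
        "public_summary_allowed": True,
        "publish_full_parameter_weights": releasable,
        "publish_full_checkpoints": releasable,
        "reason": reason,
    }


runs = [
    RunRecord("pilot64", "none", 0),
    RunRecord("pilot128", "none", 0),  # preempted; saved nothing
    RunRecord("pilot256_after_qwen_v6", "none", 0),
]
policy = publication_policy(runs, promoted=False)
print(policy)
```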

## Next Steps

- Keep the verified Qwen3-Omni LoRA adapter as the published production result for the 128-episode suite.
- Before launching any long production full-parameter run, add a sharded checkpoint/resume plan.
- Run a separate, checkpointed full-parameter pilot only when the GPUs are not needed for verified LoRA evaluation or publication work.
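Because these runs use preemptible workers, a checkpoint/resume plan has to cope with preemption arriving in the middle of a save. One common pattern is to write a completion marker only after all shards are flushed. On restart, the job resumes from the newest step whose marker exists and whose shard count matches the world size.

The sketch below illustrates that pattern with plain files standing in for shards. Its assumptions:

- The directory layout, file names and marker are all hypothetical.
- In a real FSDP job, the shards would come from a sharded writer such as PyTorch's `torch.distributed.checkpoint`, and rank 0 would write the marker after a barrier.
- It covers only choosing the resume step. A full plan would also restore optimizer state and the data sampler position.

```python
import os
import re
import tempfile
from typing import Optional

# Hypothetical layout: <root>/step_<N>/ holds one shard file per rank plus a
# COMPLETE marker written only after every shard has been flushed. A
# preemption mid-save leaves a directory without the marker, which is skipped.
STEP_DIR = re.compile(r"^step_(\d+)$")
MARKER = "COMPLETE"


def mark_complete(step_dir: str) -> None:
    # Write the marker via a temp file + rename so it appears atomically.
    tmp = os.path.join(step_dir, MARKER + ".tmp")
    with open(tmp, "w") as f:
        f.write("ok\n")
    os.replace(tmp, os.path.join(step_dir, MARKER))


def latest_complete_step(root: str, world_size: int) -> Optional[int]:
    """Return the highest step whose checkpoint is complete, or None."""
    best = None
    for entry in os.listdir(root):
        m = STEP_DIR.match(entry)
        path = os.path.join(root, entry)
        if not m or not os.path.isdir(path):
            continue
        shards = [f for f in os.listdir(path) if f.startswith("shard_rank")]
        if os.path.exists(os.path.join(path, MARKER)) and len(shards) == world_size:
            step = int(m.group(1))
            best = step if best is None else max(best, step)
    return best


# Usage: simulate an 8-rank run preempted while saving step 128.
root = tempfile.mkdtemp()
for step, finished in [(64, True), (96, True), (128, False)]:
    d = os.path.join(root, f"step_{step:06d}")
    os.makedirs(d)
    for rank in range(8 if finished else 3):
        open(os.path.join(d, f"shard_rank{rank}.pt"), "w").close()
    if finished:
        mark_complete(d)

resume_from = latest_complete_step(root, world_size=8)
print(resume_from)  # 96
```

Requiring both the marker and the full shard count is deliberately conservative. A partially written step is never mistaken for a usable checkpoint, at the cost of occasionally redoing work since the last complete save.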