Instructions to use jalasoft/nemotron-mini-4B-it-ft-typ with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use jalasoft/nemotron-mini-4B-it-ft-typ with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="jalasoft/nemotron-mini-4B-it-ft-typ")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("jalasoft/nemotron-mini-4B-it-ft-typ")
model = AutoModelForCausalLM.from_pretrained("jalasoft/nemotron-mini-4B-it-ft-typ")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use jalasoft/nemotron-mini-4B-it-ft-typ with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "jalasoft/nemotron-mini-4B-it-ft-typ"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jalasoft/nemotron-mini-4B-it-ft-typ",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/jalasoft/nemotron-mini-4B-it-ft-typ

SGLang

How to use jalasoft/nemotron-mini-4B-it-ft-typ with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "jalasoft/nemotron-mini-4B-it-ft-typ" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jalasoft/nemotron-mini-4B-it-ft-typ",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "jalasoft/nemotron-mini-4B-it-ft-typ" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jalasoft/nemotron-mini-4B-it-ft-typ",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use jalasoft/nemotron-mini-4B-it-ft-typ with Docker Model Runner:
```
docker model run hf.co/jalasoft/nemotron-mini-4B-it-ft-typ
```

nemotron-mini-4B-it-ft-typ / debug.log

skomadinajs

End of training

e8b2fea verified 7 months ago

Raw

History Blame Contribute Delete

47.4 kB

	[2025-12-16 22:40:42,182] [DEBUG] [axolotl.utils.config.resolve_dtype:66] [PID:27] bf16 support detected, enabling for this configuration.
	[2025-12-16 22:40:42,529] [DEBUG] [axolotl.utils.config.log_gpu_memory_usage:127] [PID:27] baseline 0.000GB ()
	[2025-12-16 22:40:42,530] [INFO] [axolotl.cli.config.load_cfg:248] [PID:27] config:
	{
	"activation_offloading": false,
	"adapter": "qlora",
	"axolotl_config_path": "/workspace-data/config/config.yml",
	"base_model": "nvidia/Nemotron-Mini-4B-Instruct",
	"base_model_config": "nvidia/Nemotron-Mini-4B-Instruct",
	"batch_size": 16,
	"bf16": true,
	"capabilities": {
	"bf16": true,
	"compute_capability": "sm_80",
	"fp8": false,
	"n_gpu": 1,
	"n_node": 1
	},
	"context_parallel_size": 1,
	"dataloader_num_workers": 1,
	"dataloader_pin_memory": true,
	"dataloader_prefetch_factor": 256,
	"dataset_num_proc": 20,
	"datasets": [
	{
	"message_property_mappings": {
	"content": "content",
	"role": "role"
	},
	"path": "jalasoft/typst-instruct",
	"trust_remote_code": false,
	"type": {
	"field_instruction": "prompt",
	"field_output": "completion",
	"format": "<extra_id_0>System\n{system}\n\n<extra_id_1>User\n{instruction}\n<extra_id_1>Assistant\n",
	"system_prompt": "You are an expert in Typst markup language. Generate clean, well-formatted Typst code based on user instructions."
	}
	}
	],
	"ddp": false,
	"device": "cuda:0",
	"dion_rank_fraction": 1.0,
	"dion_rank_multiple_of": 1,
	"env_capabilities": {
	"torch_version": "2.8.0"
	},
	"eot_tokens": [
	"<extra_id_1>"
	],
	"eval_batch_size": 4,
	"eval_causal_lm_metrics": [
	"sacrebleu",
	"comet",
	"ter",
	"chrf"
	],
	"eval_max_new_tokens": 128,
	"eval_sample_packing": true,
	"eval_steps": 0.05,
	"eval_table_size": 0,
	"evals_per_epoch": 4,
	"experimental_skip_move_to_device": true,
	"flash_attention": true,
	"fp16": false,
	"gradient_accumulation_steps": 4,
	"gradient_checkpointing": true,
	"gradient_checkpointing_kwargs": {
	"use_reentrant": false
	},
	"hub_model_id": "jalasoft/nemotron-mini-4B-it-ft-typ",
	"include_tkps": true,
	"is_falcon_derived_model": false,
	"is_llama_derived_model": false,
	"is_mistral_derived_model": false,
	"learning_rate": 0.0002,
	"lisa_layers_attribute": "model.layers",
	"load_best_model_at_end": false,
	"load_in_4bit": true,
	"load_in_8bit": false,
	"local_rank": 0,
	"logging_steps": 2,
	"lora_alpha": 128,
	"lora_dropout": 0.1,
	"lora_r": 64,
	"lora_target_linear": true,
	"loraplus_lr_embedding": 1e-06,
	"lr_scheduler": "cosine",
	"mean_resizing_embeddings": false,
	"micro_batch_size": 4,
	"model_config_type": "nemotron",
	"multipack_real_batches": false,
	"num_epochs": 5.0,
	"optimizer": "adamw_torch_fused",
	"output_dir": "/workspace-data/output",
	"pad_to_sequence_len": true,
	"pretrain_multipack_attn": true,
	"profiler_steps_start": 0,
	"qlora_sharded_model_loading": false,
	"ray_num_workers": 1,
	"resources_per_worker": {
	"GPU": 1
	},
	"sample_packing": true,
	"sample_packing_bin_size": 200,
	"sample_packing_group_size": 100000,
	"save_only_model": false,
	"save_safetensors": true,
	"save_steps": 0.1,
	"saves_per_epoch": 2,
	"sequence_len": 4096,
	"shuffle_before_merging_datasets": false,
	"shuffle_merged_datasets": true,
	"skip_prepare_dataset": false,
	"special_tokens": {
	"pad_token": "<extra_id_1>"
	},
	"streaming_multipack_buffer_size": 10000,
	"strict": false,
	"tensor_parallel_size": 1,
	"tf32": true,
	"tiled_mlp_use_original_mlp": true,
	"tokenizer_config": "nvidia/Nemotron-Mini-4B-Instruct",
	"tokenizer_save_jinja_files": true,
	"tokenizer_type": "AutoTokenizer",
	"torch_dtype": "torch.bfloat16",
	"train_on_inputs": false,
	"trl": {
	"log_completions": false,
	"mask_truncated_completions": false,
	"ref_model_mixup_alpha": 0.9,
	"ref_model_sync_steps": 64,
	"scale_rewards": true,
	"sync_ref_model": false,
	"use_vllm": false,
	"vllm_server_host": "0.0.0.0",
	"vllm_server_port": 8000
	},
	"type_of_model": "AutoModelForCausalLM",
	"use_ray": false,
	"use_wandb": true,
	"val_set_size": 0.1,
	"vllm": {
	"device": "auto",
	"dtype": "auto",
	"gpu_memory_utilization": 0.9,
	"host": "0.0.0.0",
	"port": 8000
	},
	"wandb_project": "nemotron-mini-4B-it-ft-typ",
	"warmup_ratio": 0.1,
	"warmup_steps": 0,
	"weight_decay": 0.01,
	"world_size": 1
	}
	[2025-12-16 22:40:45,256] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:278] [PID:27] EOS: 3 / </s>
	[2025-12-16 22:40:45,257] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:279] [PID:27] BOS: 2 / <s>
	[2025-12-16 22:40:45,257] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:280] [PID:27] PAD: 5 / <extra_id_1>
	[2025-12-16 22:40:45,257] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:281] [PID:27] UNK: None / None
	[2025-12-16 22:40:45,258] [INFO] [axolotl.utils.data.shared.load_preprocessed_dataset:481] [PID:27] Unable to find prepared dataset in last_run_prepared/438fce615f8256908523b2639d484352
	[2025-12-16 22:40:45,259] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:27] Loading raw datasets...
	[2025-12-16 22:40:45,259] [WARNING] [axolotl.utils.data.sft._load_raw_datasets:322] [PID:27] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset using `axolotl preprocess path/to/config.yml`.
	[2025-12-16 22:40:46,167] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:27] Loading dataset: jalasoft/typst-instruct with base_type: None and prompt_style: None
	[2025-12-16 22:40:55,889] [INFO] [axolotl.utils.data.utils.handle_long_seq_in_dataset:218] [PID:27] min_input_len: 144
	[2025-12-16 22:40:55,893] [INFO] [axolotl.utils.data.utils.handle_long_seq_in_dataset:220] [PID:27] max_input_len: 4356
	[2025-12-16 22:40:57,188] [WARNING] [axolotl.utils.data.utils.handle_long_seq_in_dataset:260] [PID:27] Dropped 3 samples from dataset
	[2025-12-16 22:41:00,352] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:406] [PID:27] total_num_tokens: 142_890
	[2025-12-16 22:41:00,357] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:424] [PID:27] `total_supervised_tokens: 123_911`
	[2025-12-16 22:41:00,364] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially.
	[2025-12-16 22:41:01,114] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially.
	[2025-12-16 22:41:01,267] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.15308165550231934
	[2025-12-16 22:41:01,268] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially.
	[2025-12-16 22:41:01,421] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.15324163436889648
	[2025-12-16 22:41:01,421] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially.
	[2025-12-16 22:41:01,581] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.15950298309326172
	[2025-12-16 22:41:01,581] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially.
	[2025-12-16 22:41:01,739] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.15772557258605957
	[2025-12-16 22:41:01,767] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	[2025-12-16 22:41:01,767] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:483] [PID:27] data_loader_len: 2
	[2025-12-16 22:41:01,768] [INFO] [axolotl.utils.trainer.calc_sample_packing_eff_est:499] [PID:27] sample_packing_eff_est across ranks: [0.9690348307291666]
	[2025-12-16 22:41:01,768] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:511] [PID:27] sample_packing_eff_est: None
	[2025-12-16 22:41:01,768] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:522] [PID:27] total_num_steps: 10
	[2025-12-16 22:41:01,776] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:406] [PID:27] total_num_tokens: 1_144_624
	[2025-12-16 22:41:01,785] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:424] [PID:27] `total_supervised_tokens: 979_724`
	[2025-12-16 22:41:01,798] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially.
	[2025-12-16 22:41:01,955] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially.
	[2025-12-16 22:41:02,107] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.15265655517578125
	[2025-12-16 22:41:02,108] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially.
	[2025-12-16 22:41:02,322] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.21393156051635742
	[2025-12-16 22:41:02,322] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially.
	[2025-12-16 22:41:02,483] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.16082262992858887
	[2025-12-16 22:41:02,484] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially.
	[2025-12-16 22:41:02,640] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.1570568084716797
	[2025-12-16 22:41:02,641] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [71]
	[2025-12-16 22:41:02,641] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:483] [PID:27] data_loader_len: 17
	[2025-12-16 22:41:02,641] [INFO] [axolotl.utils.trainer.calc_sample_packing_eff_est:499] [PID:27] sample_packing_eff_est across ranks: [0.9839761223591549]
	[2025-12-16 22:41:02,642] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:511] [PID:27] sample_packing_eff_est: 0.99
	[2025-12-16 22:41:02,642] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:522] [PID:27] total_num_steps: 85
	[2025-12-16 22:41:02,642] [INFO] [axolotl.utils.data.sft._prepare_standard_dataset:121] [PID:27] Maximum number of steps set at 85
	[2025-12-16 22:41:02,685] [DEBUG] [axolotl.train.setup_model_and_tokenizer:65] [PID:27] Loading tokenizer... nvidia/Nemotron-Mini-4B-Instruct
	[2025-12-16 22:41:03,685] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:278] [PID:27] EOS: 3 / </s>
	[2025-12-16 22:41:03,686] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:279] [PID:27] BOS: 2 / <s>
	[2025-12-16 22:41:03,686] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:280] [PID:27] PAD: 5 / <extra_id_1>
	[2025-12-16 22:41:03,686] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:281] [PID:27] UNK: None / None
	[2025-12-16 22:41:03,687] [DEBUG] [axolotl.train.setup_model_and_tokenizer:74] [PID:27] Loading model
	[2025-12-16 22:41:03,732] [DEBUG] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_evaluation_loop:87] [PID:27] Patched Trainer.evaluation_loop with nanmean loss calculation
	[2025-12-16 22:41:03,734] [DEBUG] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_maybe_log_save_evaluate:138] [PID:27] Patched Trainer._maybe_log_save_evaluate with nanmean loss calculation
	[2025-12-16 22:41:03,735] [INFO] [axolotl.loaders.patch_manager._apply_multipack_patches:301] [PID:27] Applying multipack dataloader patch for sample packing...
	[2025-12-16 22:41:56,068] [INFO] [axolotl.loaders.model._prepare_model_for_quantization:851] [PID:27] converting PEFT model w/ prepare_model_for_kbit_training
	[2025-12-16 22:41:56,072] [INFO] [axolotl.loaders.model._configure_embedding_dtypes:345] [PID:27] Converting modules to torch.bfloat16
	[2025-12-16 22:41:56,077] [DEBUG] [axolotl.loaders.model.log_gpu_memory_usage:127] [PID:27] Memory usage after model load 8.584GB (+8.584GB allocated, +10.197GB reserved)
	[2025-12-16 22:41:56,078] [INFO] [axolotl.loaders.adapter.load_lora:80] [PID:27] found linear modules: ['down_proj', 'k_proj', 'o_proj', 'q_proj', 'up_proj', 'v_proj']
	trainable params: 92,274,688 \|\| all params: 4,282,783,744 \|\| trainable%: 2.1545
	[2025-12-16 22:41:57,304] [DEBUG] [axolotl.loaders.model.log_gpu_memory_usage:127] [PID:27] after adapters 4.533GB (+4.533GB allocated, +10.400GB reserved)
	[2025-12-16 22:41:59,850] [WARNING] [py.warnings._showwarnmsg:110] [PID:27] /root/miniconda3/envs/py3.11/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'repr' attribute with value False was provided to the `Field()` function, which has no effect in the context it was used. 'repr' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
	warnings.warn(

	[2025-12-16 22:41:59,850] [WARNING] [py.warnings._showwarnmsg:110] [PID:27] /root/miniconda3/envs/py3.11/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'frozen' attribute with value True was provided to the `Field()` function, which has no effect in the context it was used. 'frozen' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
	warnings.warn(

	[2025-12-16 22:42:07,321] [WARNING] [accelerate.utils.other.check_os_kernel:512] [PID:27] Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
	[2025-12-16 22:42:11,621] [INFO] [axolotl.train.save_initial_configs:398] [PID:27] Pre-saving adapter config to /workspace-data/output...
	[2025-12-16 22:42:11,624] [INFO] [axolotl.train.save_initial_configs:402] [PID:27] Pre-saving tokenizer to /workspace-data/output...
	[2025-12-16 22:42:11,962] [INFO] [axolotl.train.save_initial_configs:407] [PID:27] Pre-saving model config to /workspace-data/output...
	[2025-12-16 22:42:11,967] [INFO] [axolotl.train.execute_training:196] [PID:27] Starting trainer...
	[2025-12-16 22:42:13,670] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6428604125976562
	[2025-12-16 22:42:14,293] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6221218109130859
	[2025-12-16 22:42:14,918] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6248421669006348
	[2025-12-16 22:42:15,544] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6253554821014404
	[2025-12-16 22:42:15,545] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [71]
	wandb: Currently logged in as: santiago-komadina (santiago-komadina-jalasoft) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
	wandb: setting up run 0upn0naz
	wandb: Tracking run with wandb version 0.22.2
	wandb: Run data is saved locally in /root/wandb/run-20251216_224215-0upn0naz
	wandb: Run `wandb offline` to turn off syncing.
	wandb: Syncing run warm-cherry-1
	wandb: ⭐️ View project at https://wandb.ai/santiago-komadina-jalasoft/nemotron-mini-4B-it-ft-typ
	wandb: 🚀 View run at https://wandb.ai/santiago-komadina-jalasoft/nemotron-mini-4B-it-ft-typ/runs/0upn0naz
	wandb: Detected [huggingface_hub.inference] in use.
	wandb: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script.
	wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
	wandb: WARNING Saving files without folders. If you want to preserve subdirectories pass base_path to wandb.save, i.e. wandb.save("/mnt/folder/file.h5", base_path="/mnt")
	[2025-12-16 22:42:17,312] [INFO] [axolotl.utils.callbacks.on_train_begin:757] [PID:27] The Axolotl config has been saved to the WandB run under files.
	[2025-12-16 22:42:17,318] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 22:42:18,520] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5991508960723877
	[2025-12-16 22:42:19,464] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5996401309967041
	[2025-12-16 22:42:20,083] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6178188323974609
	[2025-12-16 22:42:20,697] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6140866279602051
	[2025-12-16 22:42:20,698] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 1.2447901964187622, 'eval_runtime': 11.3439, 'eval_samples_per_second': 8.992, 'eval_steps_per_second': 2.292, 'memory/max_active (GiB)': 43.87, 'memory/max_allocated (GiB)': 43.87, 'memory/device_reserved (GiB)': 44.33, 'epoch': 0}
	{'loss': 1.2263, 'grad_norm': 1.0431029796600342, 'learning_rate': 2.5e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.99, 'tokens_per_second_per_gpu': 9742.9, 'epoch': 0.11}
	{'loss': 1.216, 'grad_norm': 0.5182428359985352, 'learning_rate': 7.500000000000001e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.99, 'tokens_per_second_per_gpu': 4473.57, 'epoch': 0.23}
	[2025-12-16 22:43:35,611] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 22:43:36,859] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5805962085723877
	[2025-12-16 22:43:37,450] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5903089046478271
	[2025-12-16 22:43:38,008] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.557340145111084
	[2025-12-16 22:43:38,613] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6043202877044678
	[2025-12-16 22:43:38,614] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 1.1019586324691772, 'eval_runtime': 11.5084, 'eval_samples_per_second': 8.863, 'eval_steps_per_second': 2.259, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.99, 'epoch': 0.28}
	{'loss': 1.0577, 'grad_norm': 0.3546995520591736, 'learning_rate': 0.000125, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 2269.03, 'epoch': 0.34}
	{'loss': 1.0454, 'grad_norm': 0.440667986869812, 'learning_rate': 0.000175, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4557.72, 'epoch': 0.45}
	[2025-12-16 22:44:39,676] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-9
	{'loss': 1.0001, 'grad_norm': 0.3028908669948578, 'learning_rate': 0.0001999167799344583, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4434.28, 'epoch': 0.56}
	[2025-12-16 22:44:55,206] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 22:44:56,738] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.760429859161377
	[2025-12-16 22:44:57,595] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.8560998439788818
	[2025-12-16 22:44:58,448] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.8528122901916504
	[2025-12-16 22:44:59,290] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.8404862880706787
	[2025-12-16 22:44:59,290] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.9463796019554138, 'eval_runtime': 12.2941, 'eval_samples_per_second': 8.297, 'eval_steps_per_second': 2.115, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.48, 'epoch': 0.56}
	{'loss': 0.8968, 'grad_norm': 0.2737530767917633, 'learning_rate': 0.00019925185024910277, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4502.37, 'epoch': 0.68}
	{'loss': 0.8833, 'grad_norm': 0.2401425987482071, 'learning_rate': 0.00019792641587574212, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4414.93, 'epoch': 0.79}
	[2025-12-16 22:46:13,827] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 22:46:15,308] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.7395086288452148
	[2025-12-16 22:46:16,063] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.7533645629882812
	[2025-12-16 22:46:16,835] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.7701354026794434
	[2025-12-16 22:46:17,693] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.8576903343200684
	[2025-12-16 22:46:17,694] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.8602458834648132, 'eval_runtime': 11.8591, 'eval_samples_per_second': 8.601, 'eval_steps_per_second': 2.192, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'epoch': 0.85}
	{'loss': 0.8453, 'grad_norm': 0.23792307078838348, 'learning_rate': 0.00019594929736144976, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 2303.62, 'epoch': 0.9}
	{'loss': 0.7881, 'grad_norm': 0.2736661732196808, 'learning_rate': 0.0001933336521037367, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 6103.21, 'epoch': 1.0}
	[2025-12-16 22:47:01,434] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-18
	{'loss': 0.7723, 'grad_norm': 0.2326890528202057, 'learning_rate': 0.0001900968867902419, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 4474.71, 'epoch': 1.11}
	[2025-12-16 22:47:31,173] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 22:47:32,438] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6057627201080322
	[2025-12-16 22:47:33,133] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6939413547515869
	[2025-12-16 22:47:33,788] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6543259620666504
	[2025-12-16 22:47:34,427] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.638103723526001
	[2025-12-16 22:47:34,427] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.8047566413879395, 'eval_runtime': 11.5173, 'eval_samples_per_second': 8.856, 'eval_steps_per_second': 2.257, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.49, 'epoch': 1.11}
	{'loss': 0.7225, 'grad_norm': 0.20799821615219116, 'learning_rate': 0.00018626054156009806, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4517.53, 'epoch': 1.23}
	{'loss': 0.7031, 'grad_norm': 0.2009187638759613, 'learning_rate': 0.00018185014665785936, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4433.33, 'epoch': 1.34}
	[2025-12-16 22:48:47,768] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 22:48:48,952] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.587979793548584
	[2025-12-16 22:48:49,541] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5882759094238281
	[2025-12-16 22:48:50,108] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5667750835418701
	[2025-12-16 22:48:50,671] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5623953342437744
	[2025-12-16 22:48:50,672] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.7722324132919312, 'eval_runtime': 11.4867, 'eval_samples_per_second': 8.88, 'eval_steps_per_second': 2.263, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'epoch': 1.39}
	{'loss': 0.719, 'grad_norm': 0.21798835694789886, 'learning_rate': 0.0001768950525339362, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 2290.77, 'epoch': 1.45}
	[2025-12-16 22:49:26,967] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-27
	{'loss': 0.7063, 'grad_norm': 0.21198105812072754, 'learning_rate': 0.00017142823452219038, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4481.31, 'epoch': 1.56}
	{'loss': 0.6842, 'grad_norm': 0.20669737458229065, 'learning_rate': 0.00016548607339452853, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4551.98, 'epoch': 1.68}
	[2025-12-16 22:50:07,271] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 22:50:08,411] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5734491348266602
	[2025-12-16 22:50:08,964] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5519306659698486
	[2025-12-16 22:50:09,542] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5771615505218506
	[2025-12-16 22:50:10,111] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5690045356750488
	[2025-12-16 22:50:10,112] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.7488055229187012, 'eval_runtime': 11.4272, 'eval_samples_per_second': 8.926, 'eval_steps_per_second': 2.275, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.48, 'epoch': 1.68}
	{'loss': 0.7048, 'grad_norm': 0.1963784545660019, 'learning_rate': 0.00015910811325286768, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4556.99, 'epoch': 1.79}
	{'loss': 0.7134, 'grad_norm': 0.19664043188095093, 'learning_rate': 0.00015233679836966122, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4555.74, 'epoch': 1.9}
	[2025-12-16 22:51:23,266] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 22:51:24,618] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6728262901306152
	[2025-12-16 22:51:25,291] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6715450286865234
	[2025-12-16 22:51:25,928] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6365022659301758
	[2025-12-16 22:51:26,584] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6554775238037109
	[2025-12-16 22:51:26,584] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.7312036156654358, 'eval_runtime': 11.6385, 'eval_samples_per_second': 8.764, 'eval_steps_per_second': 2.234, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'epoch': 1.96}
	{'loss': 0.6924, 'grad_norm': 0.27984705567359924, 'learning_rate': 0.00014521719072826858, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 2164.48, 'epoch': 2.0}
	[2025-12-16 22:51:45,260] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-36
	{'loss': 0.6301, 'grad_norm': 0.20510244369506836, 'learning_rate': 0.00013779667014289065, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 4521.82, 'epoch': 2.11}
	{'loss': 0.6013, 'grad_norm': 0.2081460803747177, 'learning_rate': 0.00013012461895372344, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 4480.82, 'epoch': 2.23}
	[2025-12-16 22:52:39,901] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 22:52:41,086] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5562489032745361
	[2025-12-16 22:52:41,631] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5433411598205566
	[2025-12-16 22:52:42,175] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5441112518310547
	[2025-12-16 22:52:42,714] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5376300811767578
	[2025-12-16 22:52:42,714] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.7242206931114197, 'eval_runtime': 11.4052, 'eval_samples_per_second': 8.943, 'eval_steps_per_second': 2.28, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.49, 'epoch': 2.23}
	{'loss': 0.6345, 'grad_norm': 0.20908962190151215, 'learning_rate': 0.00012225209339563145, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4464.71, 'epoch': 2.34}
	{'loss': 0.606, 'grad_norm': 0.21036918461322784, 'learning_rate': 0.00011423148382732853, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4610.24, 'epoch': 2.45}
	[2025-12-16 22:53:55,802] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 22:53:56,924] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5670294761657715
	[2025-12-16 22:53:57,495] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5697734355926514
	[2025-12-16 22:53:58,102] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.606992244720459
	[2025-12-16 22:53:58,655] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5524764060974121
	[2025-12-16 22:53:58,656] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.7133870720863342, 'eval_runtime': 11.4191, 'eval_samples_per_second': 8.932, 'eval_steps_per_second': 2.277, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'epoch': 2.51}
	[2025-12-16 22:54:10,083] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-45
	{'loss': 0.5699, 'grad_norm': 0.22012656927108765, 'learning_rate': 0.00010611616608218429, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 2257.77, 'epoch': 2.56}
	{'loss': 0.5996, 'grad_norm': 0.21454322338104248, 'learning_rate': 9.79601462608595e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4404.17, 'epoch': 2.68}
	{'loss': 0.5736, 'grad_norm': 0.21601669490337372, 'learning_rate': 8.981770132961649e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4449.49, 'epoch': 2.79}
	[2025-12-16 22:55:14,460] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 22:55:15,571] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5595951080322266
	[2025-12-16 22:55:16,117] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5449485778808594
	[2025-12-16 22:55:16,674] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5567653179168701
	[2025-12-16 22:55:17,218] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5431113243103027
	[2025-12-16 22:55:17,219] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.7081390023231506, 'eval_runtime': 11.4267, 'eval_samples_per_second': 8.926, 'eval_steps_per_second': 2.275, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.48, 'epoch': 2.79}
	{'loss': 0.581, 'grad_norm': 0.2167098969221115, 'learning_rate': 8.174301791606385e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4608.25, 'epoch': 2.9}
	{'loss': 0.6049, 'grad_norm': 0.3036252558231354, 'learning_rate': 7.378983170608982e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 6340.01, 'epoch': 3.0}
	[2025-12-16 22:56:12,732] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-54
	[2025-12-16 22:56:29,903] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 22:56:31,105] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5644886493682861
	[2025-12-16 22:56:31,651] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5451717376708984
	[2025-12-16 22:56:32,229] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5778226852416992
	[2025-12-16 22:56:32,817] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5867812633514404
	[2025-12-16 22:56:32,817] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.7016817927360535, 'eval_runtime': 11.386, 'eval_samples_per_second': 8.958, 'eval_steps_per_second': 2.284, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'epoch': 3.06}
	{'loss': 0.5623, 'grad_norm': 0.23229679465293884, 'learning_rate': 6.601106984173835e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 2249.63, 'epoch': 3.11}
	{'loss': 0.5422, 'grad_norm': 0.20941545069217682, 'learning_rate': 5.845849869981137e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4662.79, 'epoch': 3.23}
	{'loss': 0.5023, 'grad_norm': 0.22417426109313965, 'learning_rate': 5.11823793951719e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4506.08, 'epoch': 3.34}
	[2025-12-16 22:57:45,685] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 22:57:46,797] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5598804950714111
	[2025-12-16 22:57:47,358] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5603907108306885
	[2025-12-16 22:57:47,944] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5853786468505859
	[2025-12-16 22:57:48,556] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6114795207977295
	[2025-12-16 22:57:48,557] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.7072933316230774, 'eval_runtime': 11.3794, 'eval_samples_per_second': 8.964, 'eval_steps_per_second': 2.285, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.48, 'epoch': 3.34}
	{'loss': 0.5672, 'grad_norm': 0.22416777908802032, 'learning_rate': 4.423113330131707e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4504.23, 'epoch': 3.45}
	[2025-12-16 22:58:36,901] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-63
	{'loss': 0.5452, 'grad_norm': 0.24252445995807648, 'learning_rate': 3.7651019814126654e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4601.17, 'epoch': 3.56}
	[2025-12-16 22:59:04,397] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 22:59:05,525] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5673832893371582
	[2025-12-16 22:59:06,092] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.566525936126709
	[2025-12-16 22:59:06,659] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.566476583480835
	[2025-12-16 22:59:07,243] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.582695722579956
	[2025-12-16 22:59:07,243] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.7047077417373657, 'eval_runtime': 11.4286, 'eval_samples_per_second': 8.925, 'eval_steps_per_second': 2.275, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'epoch': 3.62}
	{'loss': 0.5324, 'grad_norm': 0.2310749590396881, 'learning_rate': 3.1485828503215585e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 2222.68, 'epoch': 3.68}
	{'loss': 0.5322, 'grad_norm': 0.22174930572509766, 'learning_rate': 2.5776587699573006e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4499.22, 'epoch': 3.79}
	{'loss': 0.5051, 'grad_norm': 0.22046604752540588, 'learning_rate': 2.0561291458788733e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4445.6, 'epoch': 3.9}
	[2025-12-16 23:00:20,447] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 23:00:21,577] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5424764156341553
	[2025-12-16 23:00:22,125] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5474584102630615
	[2025-12-16 23:00:22,678] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5522243976593018
	[2025-12-16 23:00:23,237] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5580763816833496
	[2025-12-16 23:00:23,237] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.7018095850944519, 'eval_runtime': 11.3624, 'eval_samples_per_second': 8.977, 'eval_steps_per_second': 2.288, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.48, 'epoch': 3.9}
	{'loss': 0.5013, 'grad_norm': 0.3766796588897705, 'learning_rate': 1.587464671688187e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 6091.58, 'epoch': 4.0}
	[2025-12-16 23:00:53,957] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-72
	{'loss': 0.4905, 'grad_norm': 0.2340284287929535, 'learning_rate': 1.1747842321367886e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 4514.34, 'epoch': 4.11}
	[2025-12-16 23:01:36,261] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 23:01:37,429] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.553159236907959
	[2025-12-16 23:01:37,981] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5507926940917969
	[2025-12-16 23:01:38,572] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5903477668762207
	[2025-12-16 23:01:39,119] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5467860698699951
	[2025-12-16 23:01:39,120] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.7022753357887268, 'eval_runtime': 11.3827, 'eval_samples_per_second': 8.961, 'eval_steps_per_second': 2.284, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'epoch': 4.17}
	{'loss': 0.5351, 'grad_norm': 0.21539896726608276, 'learning_rate': 8.208341474624071e-06, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 2274.18, 'epoch': 4.23}
	{'loss': 0.5053, 'grad_norm': 0.21500085294246674, 'learning_rate': 5.27969897080901e-06, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4638.83, 'epoch': 4.34}
	{'loss': 0.5374, 'grad_norm': 0.24092087149620056, 'learning_rate': 2.9814044425935606e-06, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4547.21, 'epoch': 4.45}
	[2025-12-16 23:02:52,287] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 23:02:53,601] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6551158428192139
	[2025-12-16 23:02:54,241] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6379897594451904
	[2025-12-16 23:02:54,909] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6671915054321289
	[2025-12-16 23:02:55,571] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6615691184997559
	[2025-12-16 23:02:55,572] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.7037167549133301, 'eval_runtime': 11.6258, 'eval_samples_per_second': 8.774, 'eval_steps_per_second': 2.236, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.48, 'epoch': 4.45}
	[2025-12-16 23:03:19,526] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-81
	{'loss': 0.495, 'grad_norm': 0.22582735121250153, 'learning_rate': 1.3287526608711131e-06, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4492.37, 'epoch': 4.56}
	{'loss': 0.494, 'grad_norm': 0.22348730266094208, 'learning_rate': 3.3274175058067846e-07, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4489.26, 'epoch': 4.68}
	[2025-12-16 23:04:13,132] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
	[2025-12-16 23:04:14,299] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5913589000701904
	[2025-12-16 23:04:14,901] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6010839939117432
	[2025-12-16 23:04:15,471] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5690572261810303
	[2025-12-16 23:04:16,044] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5720643997192383
	[2025-12-16 23:04:16,044] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
	{'eval_loss': 0.7037572264671326, 'eval_runtime': 11.4254, 'eval_samples_per_second': 8.927, 'eval_steps_per_second': 2.276, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'epoch': 4.73}
	[2025-12-16 23:04:27,478] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-85
	{'train_runtime': 1334.7133, 'train_samples_per_second': 1.019, 'train_steps_per_second': 0.064, 'train_loss': 0.679589033126831, 'memory/max_active (GiB)': 5.24, 'memory/max_allocated (GiB)': 5.24, 'memory/device_reserved (GiB)': 47.45, 'epoch': 4.73}
	[2025-12-16 23:04:36,294] [INFO] [axolotl.train.save_trained_model:218] [PID:27] Training completed! Saving trained model to /workspace-data/output.
	[2025-12-16 23:04:36,950] [INFO] [axolotl.train.save_trained_model:336] [PID:27] Model successfully saved to /workspace-data/output
	[2025-12-16 23:04:37,226] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output