Text Generation
Transformers
Safetensors
English
Spanish
nemotron
typst
code-generation
qlora
fine-tuned
experimental
conversational
4-bit precision
bitsandbytes
Instructions to use jalasoft/nemotron-mini-4B-it-ft-typ with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use jalasoft/nemotron-mini-4B-it-ft-typ with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="jalasoft/nemotron-mini-4B-it-ft-typ")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("jalasoft/nemotron-mini-4B-it-ft-typ")
    model = AutoModelForCausalLM.from_pretrained("jalasoft/nemotron-mini-4B-it-ft-typ")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use jalasoft/nemotron-mini-4B-it-ft-typ with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "jalasoft/nemotron-mini-4B-it-ft-typ"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "jalasoft/nemotron-mini-4B-it-ft-typ",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
docker model run hf.co/jalasoft/nemotron-mini-4B-it-ft-typ
- SGLang
How to use jalasoft/nemotron-mini-4B-it-ft-typ with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "jalasoft/nemotron-mini-4B-it-ft-typ" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "jalasoft/nemotron-mini-4B-it-ft-typ",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "jalasoft/nemotron-mini-4B-it-ft-typ" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "jalasoft/nemotron-mini-4B-it-ft-typ",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Docker Model Runner
How to use jalasoft/nemotron-mini-4B-it-ft-typ with Docker Model Runner:
docker model run hf.co/jalasoft/nemotron-mini-4B-it-ft-typ
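The training configuration logged further down records the prompt layout (`format`) and the `system_prompt` used for fine-tuning. The sketch below copies both strings from that config and shows how a request could be shaped to match them. It is an illustration, not the authors' inference code: `build_prompt` and `build_messages` are hypothetical helper names, and it is an assumption (not verified here) that the tokenizer's chat template renders a system/user message pair into exactly this layout.

```python
# Layout and system prompt copied from the "datasets" entry of the logged
# Axolotl config. The helper names below are hypothetical.
TRAIN_FORMAT = (
    "<extra_id_0>System\n{system}\n\n"
    "<extra_id_1>User\n{instruction}\n"
    "<extra_id_1>Assistant\n"
)
SYSTEM_PROMPT = (
    "You are an expert in Typst markup language. "
    "Generate clean, well-formatted Typst code based on user instructions."
)


def build_prompt(instruction: str, system: str = SYSTEM_PROMPT) -> str:
    """Return a raw prompt string in the layout used during fine-tuning."""
    return TRAIN_FORMAT.format(system=system, instruction=instruction)


def build_messages(instruction: str, system: str = SYSTEM_PROMPT) -> list[dict]:
    """Return chat messages carrying the same system prompt and instruction."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": instruction},
    ]


if __name__ == "__main__":
    # Print the raw prompt for a sample instruction.
    print(build_prompt("Create a two-column table with a header row."))
```

The list returned by `build_messages` can be passed in place of `messages` in the Transformers snippets above, or as the `"messages"` field of the curl payloads. The raw string from `build_prompt` is only relevant when calling a plain completion interface that does not apply a chat template.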
| [2025-12-16 22:40:42,182] [DEBUG] [axolotl.utils.config.resolve_dtype:66] [PID:27] bf16 support detected, enabling for this configuration. | |
| [2025-12-16 22:40:42,529] [DEBUG] [axolotl.utils.config.log_gpu_memory_usage:127] [PID:27] baseline 0.000GB () | |
| [2025-12-16 22:40:42,530] [INFO] [axolotl.cli.config.load_cfg:248] [PID:27] config: | |
| { | |
| "activation_offloading": false, | |
| "adapter": "qlora", | |
| "axolotl_config_path": "/workspace-data/config/config.yml", | |
| "base_model": "nvidia/Nemotron-Mini-4B-Instruct", | |
| "base_model_config": "nvidia/Nemotron-Mini-4B-Instruct", | |
| "batch_size": 16, | |
| "bf16": true, | |
| "capabilities": { | |
| "bf16": true, | |
| "compute_capability": "sm_80", | |
| "fp8": false, | |
| "n_gpu": 1, | |
| "n_node": 1 | |
| }, | |
| "context_parallel_size": 1, | |
| "dataloader_num_workers": 1, | |
| "dataloader_pin_memory": true, | |
| "dataloader_prefetch_factor": 256, | |
| "dataset_num_proc": 20, | |
| "datasets": [ | |
| { | |
| "message_property_mappings": { | |
| "content": "content", | |
| "role": "role" | |
| }, | |
| "path": "jalasoft/typst-instruct", | |
| "trust_remote_code": false, | |
| "type": { | |
| "field_instruction": "prompt", | |
| "field_output": "completion", | |
| "format": "<extra_id_0>System\n{system}\n\n<extra_id_1>User\n{instruction}\n<extra_id_1>Assistant\n", | |
| "system_prompt": "You are an expert in Typst markup language. Generate clean, well-formatted Typst code based on user instructions." | |
| } | |
| } | |
| ], | |
| "ddp": false, | |
| "device": "cuda:0", | |
| "dion_rank_fraction": 1.0, | |
| "dion_rank_multiple_of": 1, | |
| "env_capabilities": { | |
| "torch_version": "2.8.0" | |
| }, | |
| "eot_tokens": [ | |
| "<extra_id_1>" | |
| ], | |
| "eval_batch_size": 4, | |
| "eval_causal_lm_metrics": [ | |
| "sacrebleu", | |
| "comet", | |
| "ter", | |
| "chrf" | |
| ], | |
| "eval_max_new_tokens": 128, | |
| "eval_sample_packing": true, | |
| "eval_steps": 0.05, | |
| "eval_table_size": 0, | |
| "evals_per_epoch": 4, | |
| "experimental_skip_move_to_device": true, | |
| "flash_attention": true, | |
| "fp16": false, | |
| "gradient_accumulation_steps": 4, | |
| "gradient_checkpointing": true, | |
| "gradient_checkpointing_kwargs": { | |
| "use_reentrant": false | |
| }, | |
| "hub_model_id": "jalasoft/nemotron-mini-4B-it-ft-typ", | |
| "include_tkps": true, | |
| "is_falcon_derived_model": false, | |
| "is_llama_derived_model": false, | |
| "is_mistral_derived_model": false, | |
| "learning_rate": 0.0002, | |
| "lisa_layers_attribute": "model.layers", | |
| "load_best_model_at_end": false, | |
| "load_in_4bit": true, | |
| "load_in_8bit": false, | |
| "local_rank": 0, | |
| "logging_steps": 2, | |
| "lora_alpha": 128, | |
| "lora_dropout": 0.1, | |
| "lora_r": 64, | |
| "lora_target_linear": true, | |
| "loraplus_lr_embedding": 1e-06, | |
| "lr_scheduler": "cosine", | |
| "mean_resizing_embeddings": false, | |
| "micro_batch_size": 4, | |
| "model_config_type": "nemotron", | |
| "multipack_real_batches": false, | |
| "num_epochs": 5.0, | |
| "optimizer": "adamw_torch_fused", | |
| "output_dir": "/workspace-data/output", | |
| "pad_to_sequence_len": true, | |
| "pretrain_multipack_attn": true, | |
| "profiler_steps_start": 0, | |
| "qlora_sharded_model_loading": false, | |
| "ray_num_workers": 1, | |
| "resources_per_worker": { | |
| "GPU": 1 | |
| }, | |
| "sample_packing": true, | |
| "sample_packing_bin_size": 200, | |
| "sample_packing_group_size": 100000, | |
| "save_only_model": false, | |
| "save_safetensors": true, | |
| "save_steps": 0.1, | |
| "saves_per_epoch": 2, | |
| "sequence_len": 4096, | |
| "shuffle_before_merging_datasets": false, | |
| "shuffle_merged_datasets": true, | |
| "skip_prepare_dataset": false, | |
| "special_tokens": { | |
| "pad_token": "<extra_id_1>" | |
| }, | |
| "streaming_multipack_buffer_size": 10000, | |
| "strict": false, | |
| "tensor_parallel_size": 1, | |
| "tf32": true, | |
| "tiled_mlp_use_original_mlp": true, | |
| "tokenizer_config": "nvidia/Nemotron-Mini-4B-Instruct", | |
| "tokenizer_save_jinja_files": true, | |
| "tokenizer_type": "AutoTokenizer", | |
| "torch_dtype": "torch.bfloat16", | |
| "train_on_inputs": false, | |
| "trl": { | |
| "log_completions": false, | |
| "mask_truncated_completions": false, | |
| "ref_model_mixup_alpha": 0.9, | |
| "ref_model_sync_steps": 64, | |
| "scale_rewards": true, | |
| "sync_ref_model": false, | |
| "use_vllm": false, | |
| "vllm_server_host": "0.0.0.0", | |
| "vllm_server_port": 8000 | |
| }, | |
| "type_of_model": "AutoModelForCausalLM", | |
| "use_ray": false, | |
| "use_wandb": true, | |
| "val_set_size": 0.1, | |
| "vllm": { | |
| "device": "auto", | |
| "dtype": "auto", | |
| "gpu_memory_utilization": 0.9, | |
| "host": "0.0.0.0", | |
| "port": 8000 | |
| }, | |
| "wandb_project": "nemotron-mini-4B-it-ft-typ", | |
| "warmup_ratio": 0.1, | |
| "warmup_steps": 0, | |
| "weight_decay": 0.01, | |
| "world_size": 1 | |
| } | |
| [2025-12-16 22:40:45,256] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:278] [PID:27] EOS: 3 / </s> | |
| [2025-12-16 22:40:45,257] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:279] [PID:27] BOS: 2 / <s> | |
| [2025-12-16 22:40:45,257] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:280] [PID:27] PAD: 5 / <extra_id_1> | |
| [2025-12-16 22:40:45,257] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:281] [PID:27] UNK: None / None | |
| [2025-12-16 22:40:45,258] [INFO] [axolotl.utils.data.shared.load_preprocessed_dataset:481] [PID:27] Unable to find prepared dataset in last_run_prepared/438fce615f8256908523b2639d484352 | |
| [2025-12-16 22:40:45,259] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:27] Loading raw datasets... | |
| [2025-12-16 22:40:45,259] [WARNING] [axolotl.utils.data.sft._load_raw_datasets:322] [PID:27] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset using `axolotl preprocess path/to/config.yml`. | |
| [2025-12-16 22:40:46,167] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:27] Loading dataset: jalasoft/typst-instruct with base_type: None and prompt_style: None | |
| [2025-12-16 22:40:55,889] [INFO] [axolotl.utils.data.utils.handle_long_seq_in_dataset:218] [PID:27] min_input_len: 144 | |
| [2025-12-16 22:40:55,893] [INFO] [axolotl.utils.data.utils.handle_long_seq_in_dataset:220] [PID:27] max_input_len: 4356 | |
| [2025-12-16 22:40:57,188] [WARNING] [axolotl.utils.data.utils.handle_long_seq_in_dataset:260] [PID:27] Dropped 3 samples from dataset | |
| [2025-12-16 22:41:00,352] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:406] [PID:27] total_num_tokens: 142_890 | |
| [2025-12-16 22:41:00,357] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:424] [PID:27] `total_supervised_tokens: 123_911` | |
| [2025-12-16 22:41:00,364] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially. | |
| [2025-12-16 22:41:01,114] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially. | |
| [2025-12-16 22:41:01,267] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.15308165550231934 | |
| [2025-12-16 22:41:01,268] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially. | |
| [2025-12-16 22:41:01,421] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.15324163436889648 | |
| [2025-12-16 22:41:01,421] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially. | |
| [2025-12-16 22:41:01,581] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.15950298309326172 | |
| [2025-12-16 22:41:01,581] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially. | |
| [2025-12-16 22:41:01,739] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.15772557258605957 | |
| [2025-12-16 22:41:01,767] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| [2025-12-16 22:41:01,767] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:483] [PID:27] data_loader_len: 2 | |
| [2025-12-16 22:41:01,768] [INFO] [axolotl.utils.trainer.calc_sample_packing_eff_est:499] [PID:27] sample_packing_eff_est across ranks: [0.9690348307291666] | |
| [2025-12-16 22:41:01,768] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:511] [PID:27] sample_packing_eff_est: None | |
| [2025-12-16 22:41:01,768] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:522] [PID:27] total_num_steps: 10 | |
| [2025-12-16 22:41:01,776] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:406] [PID:27] total_num_tokens: 1_144_624 | |
| [2025-12-16 22:41:01,785] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:424] [PID:27] `total_supervised_tokens: 979_724` | |
| [2025-12-16 22:41:01,798] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially. | |
| [2025-12-16 22:41:01,955] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially. | |
| [2025-12-16 22:41:02,107] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.15265655517578125 | |
| [2025-12-16 22:41:02,108] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially. | |
| [2025-12-16 22:41:02,322] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.21393156051635742 | |
| [2025-12-16 22:41:02,322] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially. | |
| [2025-12-16 22:41:02,483] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.16082262992858887 | |
| [2025-12-16 22:41:02,484] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:27] Using single process for pack_parallel, running sequentially. | |
| [2025-12-16 22:41:02,640] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.1570568084716797 | |
| [2025-12-16 22:41:02,641] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [71] | |
| [2025-12-16 22:41:02,641] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:483] [PID:27] data_loader_len: 17 | |
| [2025-12-16 22:41:02,641] [INFO] [axolotl.utils.trainer.calc_sample_packing_eff_est:499] [PID:27] sample_packing_eff_est across ranks: [0.9839761223591549] | |
| [2025-12-16 22:41:02,642] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:511] [PID:27] sample_packing_eff_est: 0.99 | |
| [2025-12-16 22:41:02,642] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:522] [PID:27] total_num_steps: 85 | |
| [2025-12-16 22:41:02,642] [INFO] [axolotl.utils.data.sft._prepare_standard_dataset:121] [PID:27] Maximum number of steps set at 85 | |
| [2025-12-16 22:41:02,685] [DEBUG] [axolotl.train.setup_model_and_tokenizer:65] [PID:27] Loading tokenizer... nvidia/Nemotron-Mini-4B-Instruct | |
| [2025-12-16 22:41:03,685] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:278] [PID:27] EOS: 3 / </s> | |
| [2025-12-16 22:41:03,686] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:279] [PID:27] BOS: 2 / <s> | |
| [2025-12-16 22:41:03,686] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:280] [PID:27] PAD: 5 / <extra_id_1> | |
| [2025-12-16 22:41:03,686] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:281] [PID:27] UNK: None / None | |
| [2025-12-16 22:41:03,687] [DEBUG] [axolotl.train.setup_model_and_tokenizer:74] [PID:27] Loading model | |
| [2025-12-16 22:41:03,732] [DEBUG] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_evaluation_loop:87] [PID:27] Patched Trainer.evaluation_loop with nanmean loss calculation | |
| [2025-12-16 22:41:03,734] [DEBUG] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_maybe_log_save_evaluate:138] [PID:27] Patched Trainer._maybe_log_save_evaluate with nanmean loss calculation | |
| [2025-12-16 22:41:03,735] [INFO] [axolotl.loaders.patch_manager._apply_multipack_patches:301] [PID:27] Applying multipack dataloader patch for sample packing... | |
| [2025-12-16 22:41:56,068] [INFO] [axolotl.loaders.model._prepare_model_for_quantization:851] [PID:27] converting PEFT model w/ prepare_model_for_kbit_training | |
| [2025-12-16 22:41:56,072] [INFO] [axolotl.loaders.model._configure_embedding_dtypes:345] [PID:27] Converting modules to torch.bfloat16 | |
| [2025-12-16 22:41:56,077] [DEBUG] [axolotl.loaders.model.log_gpu_memory_usage:127] [PID:27] Memory usage after model load 8.584GB (+8.584GB allocated, +10.197GB reserved) | |
| [2025-12-16 22:41:56,078] [INFO] [axolotl.loaders.adapter.load_lora:80] [PID:27] found linear modules: ['down_proj', 'k_proj', 'o_proj', 'q_proj', 'up_proj', 'v_proj'] | |
| trainable params: 92,274,688 || all params: 4,282,783,744 || trainable%: 2.1545 | |
| [2025-12-16 22:41:57,304] [DEBUG] [axolotl.loaders.model.log_gpu_memory_usage:127] [PID:27] after adapters 4.533GB (+4.533GB allocated, +10.400GB reserved) | |
| [2025-12-16 22:41:59,850] [WARNING] [py.warnings._showwarnmsg:110] [PID:27] /root/miniconda3/envs/py3.11/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'repr' attribute with value False was provided to the `Field()` function, which has no effect in the context it was used. 'repr' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type. | |
| warnings.warn( | |
| [2025-12-16 22:41:59,850] [WARNING] [py.warnings._showwarnmsg:110] [PID:27] /root/miniconda3/envs/py3.11/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'frozen' attribute with value True was provided to the `Field()` function, which has no effect in the context it was used. 'frozen' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type. | |
| warnings.warn( | |
| [2025-12-16 22:42:07,321] [WARNING] [accelerate.utils.other.check_os_kernel:512] [PID:27] Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. | |
| [2025-12-16 22:42:11,621] [INFO] [axolotl.train.save_initial_configs:398] [PID:27] Pre-saving adapter config to /workspace-data/output... | |
| [2025-12-16 22:42:11,624] [INFO] [axolotl.train.save_initial_configs:402] [PID:27] Pre-saving tokenizer to /workspace-data/output... | |
| [2025-12-16 22:42:11,962] [INFO] [axolotl.train.save_initial_configs:407] [PID:27] Pre-saving model config to /workspace-data/output... | |
| [2025-12-16 22:42:11,967] [INFO] [axolotl.train.execute_training:196] [PID:27] Starting trainer... | |
| [2025-12-16 22:42:13,670] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6428604125976562 | |
| [2025-12-16 22:42:14,293] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6221218109130859 | |
| [2025-12-16 22:42:14,918] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6248421669006348 | |
| [2025-12-16 22:42:15,544] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6253554821014404 | |
| [2025-12-16 22:42:15,545] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [71] | |
| wandb: Currently logged in as: santiago-komadina (santiago-komadina-jalasoft) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin | |
| wandb: setting up run 0upn0naz | |
| wandb: Tracking run with wandb version 0.22.2 | |
| wandb: Run data is saved locally in /root/wandb/run-20251216_224215-0upn0naz | |
| wandb: Run `wandb offline` to turn off syncing. | |
| wandb: Syncing run warm-cherry-1 | |
| wandb: ⭐️ View project at https://wandb.ai/santiago-komadina-jalasoft/nemotron-mini-4B-it-ft-typ | |
| wandb: 🚀 View run at https://wandb.ai/santiago-komadina-jalasoft/nemotron-mini-4B-it-ft-typ/runs/0upn0naz | |
| wandb: Detected [huggingface_hub.inference] in use. | |
| wandb: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script. | |
| wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/ | |
| wandb: WARNING Saving files without folders. If you want to preserve subdirectories pass base_path to wandb.save, i.e. wandb.save("/mnt/folder/file.h5", base_path="/mnt") | |
| [2025-12-16 22:42:17,312] [INFO] [axolotl.utils.callbacks.on_train_begin:757] [PID:27] The Axolotl config has been saved to the WandB run under files. | |
| [2025-12-16 22:42:17,318] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 22:42:18,520] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5991508960723877 | |
| [2025-12-16 22:42:19,464] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5996401309967041 | |
| [2025-12-16 22:42:20,083] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6178188323974609 | |
| [2025-12-16 22:42:20,697] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6140866279602051 | |
| [2025-12-16 22:42:20,698] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 1.2447901964187622, 'eval_runtime': 11.3439, 'eval_samples_per_second': 8.992, 'eval_steps_per_second': 2.292, 'memory/max_active (GiB)': 43.87, 'memory/max_allocated (GiB)': 43.87, 'memory/device_reserved (GiB)': 44.33, 'epoch': 0} | |
| {'loss': 1.2263, 'grad_norm': 1.0431029796600342, 'learning_rate': 2.5e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.99, 'tokens_per_second_per_gpu': 9742.9, 'epoch': 0.11} | |
| {'loss': 1.216, 'grad_norm': 0.5182428359985352, 'learning_rate': 7.500000000000001e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.99, 'tokens_per_second_per_gpu': 4473.57, 'epoch': 0.23} | |
| [2025-12-16 22:43:35,611] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 22:43:36,859] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5805962085723877 | |
| [2025-12-16 22:43:37,450] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5903089046478271 | |
| [2025-12-16 22:43:38,008] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.557340145111084 | |
| [2025-12-16 22:43:38,613] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6043202877044678 | |
| [2025-12-16 22:43:38,614] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 1.1019586324691772, 'eval_runtime': 11.5084, 'eval_samples_per_second': 8.863, 'eval_steps_per_second': 2.259, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.99, 'epoch': 0.28} | |
| {'loss': 1.0577, 'grad_norm': 0.3546995520591736, 'learning_rate': 0.000125, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 2269.03, 'epoch': 0.34} | |
| {'loss': 1.0454, 'grad_norm': 0.440667986869812, 'learning_rate': 0.000175, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4557.72, 'epoch': 0.45} | |
| [2025-12-16 22:44:39,676] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-9 | |
| {'loss': 1.0001, 'grad_norm': 0.3028908669948578, 'learning_rate': 0.0001999167799344583, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4434.28, 'epoch': 0.56} | |
| [2025-12-16 22:44:55,206] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 22:44:56,738] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.760429859161377 | |
| [2025-12-16 22:44:57,595] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.8560998439788818 | |
| [2025-12-16 22:44:58,448] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.8528122901916504 | |
| [2025-12-16 22:44:59,290] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.8404862880706787 | |
| [2025-12-16 22:44:59,290] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.9463796019554138, 'eval_runtime': 12.2941, 'eval_samples_per_second': 8.297, 'eval_steps_per_second': 2.115, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.48, 'epoch': 0.56} | |
| {'loss': 0.8968, 'grad_norm': 0.2737530767917633, 'learning_rate': 0.00019925185024910277, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4502.37, 'epoch': 0.68} | |
| {'loss': 0.8833, 'grad_norm': 0.2401425987482071, 'learning_rate': 0.00019792641587574212, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4414.93, 'epoch': 0.79} | |
| [2025-12-16 22:46:13,827] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 22:46:15,308] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.7395086288452148 | |
| [2025-12-16 22:46:16,063] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.7533645629882812 | |
| [2025-12-16 22:46:16,835] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.7701354026794434 | |
| [2025-12-16 22:46:17,693] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.8576903343200684 | |
| [2025-12-16 22:46:17,694] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.8602458834648132, 'eval_runtime': 11.8591, 'eval_samples_per_second': 8.601, 'eval_steps_per_second': 2.192, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'epoch': 0.85} | |
| {'loss': 0.8453, 'grad_norm': 0.23792307078838348, 'learning_rate': 0.00019594929736144976, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 2303.62, 'epoch': 0.9} | |
| {'loss': 0.7881, 'grad_norm': 0.2736661732196808, 'learning_rate': 0.0001933336521037367, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 6103.21, 'epoch': 1.0} | |
| [2025-12-16 22:47:01,434] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-18 | |
| {'loss': 0.7723, 'grad_norm': 0.2326890528202057, 'learning_rate': 0.0001900968867902419, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 4474.71, 'epoch': 1.11} | |
| [2025-12-16 22:47:31,173] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 22:47:32,438] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6057627201080322 | |
| [2025-12-16 22:47:33,133] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6939413547515869 | |
| [2025-12-16 22:47:33,788] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6543259620666504 | |
| [2025-12-16 22:47:34,427] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.638103723526001 | |
| [2025-12-16 22:47:34,427] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.8047566413879395, 'eval_runtime': 11.5173, 'eval_samples_per_second': 8.856, 'eval_steps_per_second': 2.257, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.49, 'epoch': 1.11} | |
| {'loss': 0.7225, 'grad_norm': 0.20799821615219116, 'learning_rate': 0.00018626054156009806, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4517.53, 'epoch': 1.23} | |
| {'loss': 0.7031, 'grad_norm': 0.2009187638759613, 'learning_rate': 0.00018185014665785936, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4433.33, 'epoch': 1.34} | |
| [2025-12-16 22:48:47,768] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 22:48:48,952] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.587979793548584 | |
| [2025-12-16 22:48:49,541] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5882759094238281 | |
| [2025-12-16 22:48:50,108] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5667750835418701 | |
| [2025-12-16 22:48:50,671] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5623953342437744 | |
| [2025-12-16 22:48:50,672] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.7722324132919312, 'eval_runtime': 11.4867, 'eval_samples_per_second': 8.88, 'eval_steps_per_second': 2.263, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'epoch': 1.39} | |
| {'loss': 0.719, 'grad_norm': 0.21798835694789886, 'learning_rate': 0.0001768950525339362, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 2290.77, 'epoch': 1.45} | |
| [2025-12-16 22:49:26,967] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-27 | |
| {'loss': 0.7063, 'grad_norm': 0.21198105812072754, 'learning_rate': 0.00017142823452219038, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4481.31, 'epoch': 1.56} | |
| {'loss': 0.6842, 'grad_norm': 0.20669737458229065, 'learning_rate': 0.00016548607339452853, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4551.98, 'epoch': 1.68} | |
| [2025-12-16 22:50:07,271] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 22:50:08,411] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5734491348266602 | |
| [2025-12-16 22:50:08,964] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5519306659698486 | |
| [2025-12-16 22:50:09,542] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5771615505218506 | |
| [2025-12-16 22:50:10,111] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5690045356750488 | |
| [2025-12-16 22:50:10,112] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.7488055229187012, 'eval_runtime': 11.4272, 'eval_samples_per_second': 8.926, 'eval_steps_per_second': 2.275, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.48, 'epoch': 1.68} | |
| {'loss': 0.7048, 'grad_norm': 0.1963784545660019, 'learning_rate': 0.00015910811325286768, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4556.99, 'epoch': 1.79} | |
| {'loss': 0.7134, 'grad_norm': 0.19664043188095093, 'learning_rate': 0.00015233679836966122, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4555.74, 'epoch': 1.9} | |
| [2025-12-16 22:51:23,266] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 22:51:24,618] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6728262901306152 | |
| [2025-12-16 22:51:25,291] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6715450286865234 | |
| [2025-12-16 22:51:25,928] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6365022659301758 | |
| [2025-12-16 22:51:26,584] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6554775238037109 | |
| [2025-12-16 22:51:26,584] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.7312036156654358, 'eval_runtime': 11.6385, 'eval_samples_per_second': 8.764, 'eval_steps_per_second': 2.234, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'epoch': 1.96} | |
| {'loss': 0.6924, 'grad_norm': 0.27984705567359924, 'learning_rate': 0.00014521719072826858, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 2164.48, 'epoch': 2.0} | |
| [2025-12-16 22:51:45,260] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-36 | |
| {'loss': 0.6301, 'grad_norm': 0.20510244369506836, 'learning_rate': 0.00013779667014289065, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 4521.82, 'epoch': 2.11} | |
| {'loss': 0.6013, 'grad_norm': 0.2081460803747177, 'learning_rate': 0.00013012461895372344, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 4480.82, 'epoch': 2.23} | |
| [2025-12-16 22:52:39,901] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 22:52:41,086] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5562489032745361 | |
| [2025-12-16 22:52:41,631] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5433411598205566 | |
| [2025-12-16 22:52:42,175] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5441112518310547 | |
| [2025-12-16 22:52:42,714] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5376300811767578 | |
| [2025-12-16 22:52:42,714] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.7242206931114197, 'eval_runtime': 11.4052, 'eval_samples_per_second': 8.943, 'eval_steps_per_second': 2.28, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.49, 'epoch': 2.23} | |
| {'loss': 0.6345, 'grad_norm': 0.20908962190151215, 'learning_rate': 0.00012225209339563145, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4464.71, 'epoch': 2.34} | |
| {'loss': 0.606, 'grad_norm': 0.21036918461322784, 'learning_rate': 0.00011423148382732853, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4610.24, 'epoch': 2.45} | |
| [2025-12-16 22:53:55,802] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 22:53:56,924] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5670294761657715 | |
| [2025-12-16 22:53:57,495] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5697734355926514 | |
| [2025-12-16 22:53:58,102] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.606992244720459 | |
| [2025-12-16 22:53:58,655] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5524764060974121 | |
| [2025-12-16 22:53:58,656] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.7133870720863342, 'eval_runtime': 11.4191, 'eval_samples_per_second': 8.932, 'eval_steps_per_second': 2.277, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'epoch': 2.51} | |
| [2025-12-16 22:54:10,083] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-45 | |
| {'loss': 0.5699, 'grad_norm': 0.22012656927108765, 'learning_rate': 0.00010611616608218429, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 2257.77, 'epoch': 2.56} | |
| {'loss': 0.5996, 'grad_norm': 0.21454322338104248, 'learning_rate': 9.79601462608595e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4404.17, 'epoch': 2.68} | |
| {'loss': 0.5736, 'grad_norm': 0.21601669490337372, 'learning_rate': 8.981770132961649e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4449.49, 'epoch': 2.79} | |
| [2025-12-16 22:55:14,460] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 22:55:15,571] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5595951080322266 | |
| [2025-12-16 22:55:16,117] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5449485778808594 | |
| [2025-12-16 22:55:16,674] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5567653179168701 | |
| [2025-12-16 22:55:17,218] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5431113243103027 | |
| [2025-12-16 22:55:17,219] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.7081390023231506, 'eval_runtime': 11.4267, 'eval_samples_per_second': 8.926, 'eval_steps_per_second': 2.275, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.48, 'epoch': 2.79} | |
| {'loss': 0.581, 'grad_norm': 0.2167098969221115, 'learning_rate': 8.174301791606385e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4608.25, 'epoch': 2.9} | |
| {'loss': 0.6049, 'grad_norm': 0.3036252558231354, 'learning_rate': 7.378983170608982e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 6340.01, 'epoch': 3.0} | |
| [2025-12-16 22:56:12,732] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-54 | |
| [2025-12-16 22:56:29,903] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 22:56:31,105] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5644886493682861 | |
| [2025-12-16 22:56:31,651] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5451717376708984 | |
| [2025-12-16 22:56:32,229] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5778226852416992 | |
| [2025-12-16 22:56:32,817] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5867812633514404 | |
| [2025-12-16 22:56:32,817] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.7016817927360535, 'eval_runtime': 11.386, 'eval_samples_per_second': 8.958, 'eval_steps_per_second': 2.284, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'epoch': 3.06} | |
| {'loss': 0.5623, 'grad_norm': 0.23229679465293884, 'learning_rate': 6.601106984173835e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 2249.63, 'epoch': 3.11} | |
| {'loss': 0.5422, 'grad_norm': 0.20941545069217682, 'learning_rate': 5.845849869981137e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4662.79, 'epoch': 3.23} | |
| {'loss': 0.5023, 'grad_norm': 0.22417426109313965, 'learning_rate': 5.11823793951719e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4506.08, 'epoch': 3.34} | |
| [2025-12-16 22:57:45,685] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 22:57:46,797] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5598804950714111 | |
| [2025-12-16 22:57:47,358] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5603907108306885 | |
| [2025-12-16 22:57:47,944] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5853786468505859 | |
| [2025-12-16 22:57:48,556] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6114795207977295 | |
| [2025-12-16 22:57:48,557] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.7072933316230774, 'eval_runtime': 11.3794, 'eval_samples_per_second': 8.964, 'eval_steps_per_second': 2.285, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.48, 'epoch': 3.34} | |
| {'loss': 0.5672, 'grad_norm': 0.22416777908802032, 'learning_rate': 4.423113330131707e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4504.23, 'epoch': 3.45} | |
| [2025-12-16 22:58:36,901] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-63 | |
| {'loss': 0.5452, 'grad_norm': 0.24252445995807648, 'learning_rate': 3.7651019814126654e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4601.17, 'epoch': 3.56} | |
| [2025-12-16 22:59:04,397] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 22:59:05,525] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5673832893371582 | |
| [2025-12-16 22:59:06,092] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.566525936126709 | |
| [2025-12-16 22:59:06,659] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.566476583480835 | |
| [2025-12-16 22:59:07,243] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.582695722579956 | |
| [2025-12-16 22:59:07,243] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.7047077417373657, 'eval_runtime': 11.4286, 'eval_samples_per_second': 8.925, 'eval_steps_per_second': 2.275, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'epoch': 3.62} | |
| {'loss': 0.5324, 'grad_norm': 0.2310749590396881, 'learning_rate': 3.1485828503215585e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 2222.68, 'epoch': 3.68} | |
| {'loss': 0.5322, 'grad_norm': 0.22174930572509766, 'learning_rate': 2.5776587699573006e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4499.22, 'epoch': 3.79} | |
| {'loss': 0.5051, 'grad_norm': 0.22046604752540588, 'learning_rate': 2.0561291458788733e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4445.6, 'epoch': 3.9} | |
| [2025-12-16 23:00:20,447] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 23:00:21,577] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5424764156341553 | |
| [2025-12-16 23:00:22,125] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5474584102630615 | |
| [2025-12-16 23:00:22,678] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5522243976593018 | |
| [2025-12-16 23:00:23,237] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5580763816833496 | |
| [2025-12-16 23:00:23,237] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.7018095850944519, 'eval_runtime': 11.3624, 'eval_samples_per_second': 8.977, 'eval_steps_per_second': 2.288, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.48, 'epoch': 3.9} | |
| {'loss': 0.5013, 'grad_norm': 0.3766796588897705, 'learning_rate': 1.587464671688187e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 6091.58, 'epoch': 4.0} | |
| [2025-12-16 23:00:53,957] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-72 | |
| {'loss': 0.4905, 'grad_norm': 0.2340284287929535, 'learning_rate': 1.1747842321367886e-05, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'tokens_per_second_per_gpu': 4514.34, 'epoch': 4.11} | |
| [2025-12-16 23:01:36,261] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 23:01:37,429] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.553159236907959 | |
| [2025-12-16 23:01:37,981] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5507926940917969 | |
| [2025-12-16 23:01:38,572] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5903477668762207 | |
| [2025-12-16 23:01:39,119] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5467860698699951 | |
| [2025-12-16 23:01:39,120] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.7022753357887268, 'eval_runtime': 11.3827, 'eval_samples_per_second': 8.961, 'eval_steps_per_second': 2.284, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.49, 'epoch': 4.17} | |
| {'loss': 0.5351, 'grad_norm': 0.21539896726608276, 'learning_rate': 8.208341474624071e-06, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 2274.18, 'epoch': 4.23} | |
| {'loss': 0.5053, 'grad_norm': 0.21500085294246674, 'learning_rate': 5.27969897080901e-06, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4638.83, 'epoch': 4.34} | |
| {'loss': 0.5374, 'grad_norm': 0.24092087149620056, 'learning_rate': 2.9814044425935606e-06, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4547.21, 'epoch': 4.45} | |
| [2025-12-16 23:02:52,287] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 23:02:53,601] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6551158428192139 | |
| [2025-12-16 23:02:54,241] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6379897594451904 | |
| [2025-12-16 23:02:54,909] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6671915054321289 | |
| [2025-12-16 23:02:55,571] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6615691184997559 | |
| [2025-12-16 23:02:55,572] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.7037167549133301, 'eval_runtime': 11.6258, 'eval_samples_per_second': 8.774, 'eval_steps_per_second': 2.236, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.48, 'epoch': 4.45} | |
| [2025-12-16 23:03:19,526] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-81 | |
| {'loss': 0.495, 'grad_norm': 0.22582735121250153, 'learning_rate': 1.3287526608711131e-06, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4492.37, 'epoch': 4.56} | |
| {'loss': 0.494, 'grad_norm': 0.22348730266094208, 'learning_rate': 3.3274175058067846e-07, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4489.26, 'epoch': 4.68} | |
| [2025-12-16 23:04:13,132] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |
| [2025-12-16 23:04:14,299] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5913589000701904 | |
| [2025-12-16 23:04:14,901] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6010839939117432 | |
| [2025-12-16 23:04:15,471] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5690572261810303 | |
| [2025-12-16 23:04:16,044] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5720643997192383 | |
| [2025-12-16 23:04:16,044] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9] | |
| {'eval_loss': 0.7037572264671326, 'eval_runtime': 11.4254, 'eval_samples_per_second': 8.927, 'eval_steps_per_second': 2.276, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'epoch': 4.73} | |
| [2025-12-16 23:04:27,478] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-85 | |
| {'train_runtime': 1334.7133, 'train_samples_per_second': 1.019, 'train_steps_per_second': 0.064, 'train_loss': 0.679589033126831, 'memory/max_active (GiB)': 5.24, 'memory/max_allocated (GiB)': 5.24, 'memory/device_reserved (GiB)': 47.45, 'epoch': 4.73} | |
| [2025-12-16 23:04:36,294] [INFO] [axolotl.train.save_trained_model:218] [PID:27] Training completed! Saving trained model to /workspace-data/output. | |
| [2025-12-16 23:04:36,950] [INFO] [axolotl.train.save_trained_model:336] [PID:27] Model successfully saved to /workspace-data/output | |
| [2025-12-16 23:04:37,226] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output | |
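The metric records in the log above are printed as Python dict literals between the timestamped `[INFO]`/`[DEBUG]` lines, so they can be pulled out and compared without any training framework installed. The following is a minimal sketch, assuming the log is available as plain text lines; `parse_metric_lines` and `best_eval` are hypothetical helper names, not part of Axolotl or Transformers. In the portion of the log shown here, the lowest `eval_loss` is about 0.7017 at epoch 3.06. Later evaluations stay between roughly 0.702 and 0.707 while the training loss keeps falling.

```python
import ast
import re

# Matches the Python-dict metric records the trainer prints between log lines.
METRIC_RE = re.compile(r"\{.*\}")


def parse_metric_lines(lines):
    """Return the metric dicts found in an iterable of raw log lines."""
    records = []
    for line in lines:
        match = METRIC_RE.search(line)
        if match is None:
            continue  # timestamped [INFO]/[DEBUG] lines carry no metric dict
        try:
            record = ast.literal_eval(match.group(0))
        except (ValueError, SyntaxError):
            continue  # skip anything that is not a valid Python literal
        if isinstance(record, dict):
            records.append(record)
    return records


def best_eval(records):
    """Return the evaluation record with the lowest eval_loss, or None."""
    evals = [r for r in records if "eval_loss" in r]
    return min(evals, key=lambda r: r["eval_loss"]) if evals else None


# Records copied from the log above; memory fields are trimmed for brevity.
sample = [
    "| {'eval_loss': 0.7016817927360535, 'eval_runtime': 11.386, 'epoch': 3.06} | |",
    "| [2025-12-16 22:57:45,685] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step... | |",
    "| {'loss': 0.5023, 'grad_norm': 0.22417426109313965, 'epoch': 3.34} | |",
    "| {'eval_loss': 0.7072933316230774, 'eval_runtime': 11.3794, 'epoch': 3.34} | |",
]

records = parse_metric_lines(sample)
best = best_eval(records)
print(best["epoch"], round(best["eval_loss"], 4))  # prints: 3.06 0.7017
```

To run this over a whole log, pass the open file object instead of `sample`. Picking a checkpoint by lowest `eval_loss` is only one possible criterion and is used here purely for illustration.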