Instructions to use codezakh/EFAGen-Llama-3.1-Instruct-8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use codezakh/EFAGen-Llama-3.1-Instruct-8B with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="codezakh/EFAGen-Llama-3.1-Instruct-8B")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("codezakh/EFAGen-Llama-3.1-Instruct-8B", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use codezakh/EFAGen-Llama-3.1-Instruct-8B with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "codezakh/EFAGen-Llama-3.1-Instruct-8B"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "codezakh/EFAGen-Llama-3.1-Instruct-8B",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
    docker model run hf.co/codezakh/EFAGen-Llama-3.1-Instruct-8B
- SGLang
How to use codezakh/EFAGen-Llama-3.1-Instruct-8B with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "codezakh/EFAGen-Llama-3.1-Instruct-8B" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "codezakh/EFAGen-Llama-3.1-Instruct-8B",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "codezakh/EFAGen-Llama-3.1-Instruct-8B" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "codezakh/EFAGen-Llama-3.1-Instruct-8B",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Docker Model Runner
How to use codezakh/EFAGen-Llama-3.1-Instruct-8B with Docker Model Runner:
    docker model run hf.co/codezakh/EFAGen-Llama-3.1-Instruct-8B
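The curl calls shown for the vLLM and SGLang servers can also be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `send` are hypothetical and not part of any library. The default base URL assumes the vLLM port (8000) used above; pass `http://localhost:30000/v1` for the SGLang server. The response parsing assumes the usual OpenAI-style chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "codezakh/EFAGen-Llama-3.1-Instruct-8B"


def build_chat_request(prompt, model=MODEL_ID, base_url="http://localhost:8000/v1"):
    """Build a POST request equivalent to the curl example (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the assistant's reply text.

    Assumes a server is already running and that it answers in the
    OpenAI-compatible chat completion format.
    """
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    req = build_chat_request("What is the capital of France?")
    # Only print the target URL here; call send(req) once a server is up.
    print(req.full_url)
```

Keeping request construction separate from sending makes it possible to inspect the payload without a running server.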
| { | |
| "best_metric": null, | |
| "best_model_checkpoint": null, | |
| "epoch": 2.9968586387434555, | |
| "eval_steps": 500, | |
| "global_step": 1431, | |
| "is_hyper_param_search": false, | |
| "is_local_process_zero": true, | |
| "is_world_process_zero": true, | |
| "log_history": [ | |
| { | |
| "epoch": 0.020942408376963352, | |
| "grad_norm": 0.5137882828712463, | |
| "learning_rate": 6.944444444444445e-06, | |
| "loss": 0.341, | |
| "step": 10 | |
| }, | |
| { | |
| "epoch": 0.041884816753926704, | |
| "grad_norm": 0.5445541739463806, | |
| "learning_rate": 1.388888888888889e-05, | |
| "loss": 0.3137, | |
| "step": 20 | |
| }, | |
| { | |
| "epoch": 0.06282722513089005, | |
| "grad_norm": 0.34443390369415283, | |
| "learning_rate": 2.0833333333333336e-05, | |
| "loss": 0.2399, | |
| "step": 30 | |
| }, | |
| { | |
| "epoch": 0.08376963350785341, | |
| "grad_norm": 0.36772170662879944, | |
| "learning_rate": 2.777777777777778e-05, | |
| "loss": 0.1832, | |
| "step": 40 | |
| }, | |
| { | |
| "epoch": 0.10471204188481675, | |
| "grad_norm": 0.3646106719970703, | |
| "learning_rate": 3.472222222222222e-05, | |
| "loss": 0.151, | |
| "step": 50 | |
| }, | |
| { | |
| "epoch": 0.1256544502617801, | |
| "grad_norm": 0.538939356803894, | |
| "learning_rate": 4.166666666666667e-05, | |
| "loss": 0.129, | |
| "step": 60 | |
| }, | |
| { | |
| "epoch": 0.14659685863874344, | |
| "grad_norm": 0.36220353841781616, | |
| "learning_rate": 4.8611111111111115e-05, | |
| "loss": 0.1248, | |
| "step": 70 | |
| }, | |
| { | |
| "epoch": 0.16753926701570682, | |
| "grad_norm": 0.4773642420768738, | |
| "learning_rate": 5.555555555555556e-05, | |
| "loss": 0.1142, | |
| "step": 80 | |
| }, | |
| { | |
| "epoch": 0.18848167539267016, | |
| "grad_norm": 0.45749950408935547, | |
| "learning_rate": 6.25e-05, | |
| "loss": 0.1103, | |
| "step": 90 | |
| }, | |
| { | |
| "epoch": 0.2094240837696335, | |
| "grad_norm": 0.4229575991630554, | |
| "learning_rate": 6.944444444444444e-05, | |
| "loss": 0.0985, | |
| "step": 100 | |
| }, | |
| { | |
| "epoch": 0.23036649214659685, | |
| "grad_norm": 0.4683126211166382, | |
| "learning_rate": 7.638888888888889e-05, | |
| "loss": 0.0929, | |
| "step": 110 | |
| }, | |
| { | |
| "epoch": 0.2513089005235602, | |
| "grad_norm": 0.3871113359928131, | |
| "learning_rate": 8.333333333333334e-05, | |
| "loss": 0.0955, | |
| "step": 120 | |
| }, | |
| { | |
| "epoch": 0.27225130890052357, | |
| "grad_norm": 0.45953118801116943, | |
| "learning_rate": 9.027777777777779e-05, | |
| "loss": 0.0941, | |
| "step": 130 | |
| }, | |
| { | |
| "epoch": 0.2931937172774869, | |
| "grad_norm": 0.4296354651451111, | |
| "learning_rate": 9.722222222222223e-05, | |
| "loss": 0.0906, | |
| "step": 140 | |
| }, | |
| { | |
| "epoch": 0.31413612565445026, | |
| "grad_norm": 0.4266631007194519, | |
| "learning_rate": 9.999463737538053e-05, | |
| "loss": 0.0815, | |
| "step": 150 | |
| }, | |
| { | |
| "epoch": 0.33507853403141363, | |
| "grad_norm": 0.4313138425350189, | |
| "learning_rate": 9.996186994612176e-05, | |
| "loss": 0.0803, | |
| "step": 160 | |
| }, | |
| { | |
| "epoch": 0.35602094240837695, | |
| "grad_norm": 0.36333203315734863, | |
| "learning_rate": 9.989933382359422e-05, | |
| "loss": 0.0797, | |
| "step": 170 | |
| }, | |
| { | |
| "epoch": 0.3769633507853403, | |
| "grad_norm": 0.7311431765556335, | |
| "learning_rate": 9.980706626858607e-05, | |
| "loss": 0.0768, | |
| "step": 180 | |
| }, | |
| { | |
| "epoch": 0.39790575916230364, | |
| "grad_norm": 0.4651825428009033, | |
| "learning_rate": 9.96851222567126e-05, | |
| "loss": 0.0818, | |
| "step": 190 | |
| }, | |
| { | |
| "epoch": 0.418848167539267, | |
| "grad_norm": 0.5768024325370789, | |
| "learning_rate": 9.953357444566039e-05, | |
| "loss": 0.0717, | |
| "step": 200 | |
| }, | |
| { | |
| "epoch": 0.4397905759162304, | |
| "grad_norm": 0.41442790627479553, | |
| "learning_rate": 9.935251313189564e-05, | |
| "loss": 0.0698, | |
| "step": 210 | |
| }, | |
| { | |
| "epoch": 0.4607329842931937, | |
| "grad_norm": 0.5525192618370056, | |
| "learning_rate": 9.914204619686314e-05, | |
| "loss": 0.0745, | |
| "step": 220 | |
| }, | |
| { | |
| "epoch": 0.4816753926701571, | |
| "grad_norm": 0.4769018590450287, | |
| "learning_rate": 9.890229904270731e-05, | |
| "loss": 0.0719, | |
| "step": 230 | |
| }, | |
| { | |
| "epoch": 0.5026178010471204, | |
| "grad_norm": 0.4063033163547516, | |
| "learning_rate": 9.86334145175542e-05, | |
| "loss": 0.0751, | |
| "step": 240 | |
| }, | |
| { | |
| "epoch": 0.5235602094240838, | |
| "grad_norm": 0.3966362476348877, | |
| "learning_rate": 9.833555283039842e-05, | |
| "loss": 0.0696, | |
| "step": 250 | |
| }, | |
| { | |
| "epoch": 0.5445026178010471, | |
| "grad_norm": 0.4347456395626068, | |
| "learning_rate": 9.800889145564617e-05, | |
| "loss": 0.0594, | |
| "step": 260 | |
| }, | |
| { | |
| "epoch": 0.5654450261780105, | |
| "grad_norm": 0.3599953055381775, | |
| "learning_rate": 9.765362502737097e-05, | |
| "loss": 0.0732, | |
| "step": 270 | |
| }, | |
| { | |
| "epoch": 0.5863874345549738, | |
| "grad_norm": 0.3424774706363678, | |
| "learning_rate": 9.730960252267743e-05, | |
| "loss": 0.0649, | |
| "step": 280 | |
| }, | |
| { | |
| "epoch": 0.6073298429319371, | |
| "grad_norm": 0.48636892437934875, | |
| "learning_rate": 9.690058365011607e-05, | |
| "loss": 0.0679, | |
| "step": 290 | |
| }, | |
| { | |
| "epoch": 0.6282722513089005, | |
| "grad_norm": 0.5237702131271362, | |
| "learning_rate": 9.646362008512602e-05, | |
| "loss": 0.0787, | |
| "step": 300 | |
| }, | |
| { | |
| "epoch": 0.6492146596858639, | |
| "grad_norm": 0.3805506229400635, | |
| "learning_rate": 9.599897218294122e-05, | |
| "loss": 0.072, | |
| "step": 310 | |
| }, | |
| { | |
| "epoch": 0.6701570680628273, | |
| "grad_norm": 0.4530421197414398, | |
| "learning_rate": 9.550691679390558e-05, | |
| "loss": 0.065, | |
| "step": 320 | |
| }, | |
| { | |
| "epoch": 0.6910994764397905, | |
| "grad_norm": 0.4006820321083069, | |
| "learning_rate": 9.498774709851779e-05, | |
| "loss": 0.0644, | |
| "step": 330 | |
| }, | |
| { | |
| "epoch": 0.7120418848167539, | |
| "grad_norm": 0.3857646584510803, | |
| "learning_rate": 9.444177243274618e-05, | |
| "loss": 0.0624, | |
| "step": 340 | |
| }, | |
| { | |
| "epoch": 0.7329842931937173, | |
| "grad_norm": 0.4317072331905365, | |
| "learning_rate": 9.386931810371742e-05, | |
| "loss": 0.066, | |
| "step": 350 | |
| }, | |
| { | |
| "epoch": 0.7539267015706806, | |
| "grad_norm": 0.3598201274871826, | |
| "learning_rate": 9.327072519588954e-05, | |
| "loss": 0.0642, | |
| "step": 360 | |
| }, | |
| { | |
| "epoch": 0.774869109947644, | |
| "grad_norm": 0.39206311106681824, | |
| "learning_rate": 9.264635036782405e-05, | |
| "loss": 0.064, | |
| "step": 370 | |
| }, | |
| { | |
| "epoch": 0.7958115183246073, | |
| "grad_norm": 0.40854617953300476, | |
| "learning_rate": 9.199656563967875e-05, | |
| "loss": 0.0635, | |
| "step": 380 | |
| }, | |
| { | |
| "epoch": 0.8167539267015707, | |
| "grad_norm": 0.4052296578884125, | |
| "learning_rate": 9.132175817154763e-05, | |
| "loss": 0.0617, | |
| "step": 390 | |
| }, | |
| { | |
| "epoch": 0.837696335078534, | |
| "grad_norm": 0.5453737378120422, | |
| "learning_rate": 9.062233003277983e-05, | |
| "loss": 0.0612, | |
| "step": 400 | |
| }, | |
| { | |
| "epoch": 0.8586387434554974, | |
| "grad_norm": 0.4285629391670227, | |
| "learning_rate": 8.989869796241559e-05, | |
| "loss": 0.0609, | |
| "step": 410 | |
| }, | |
| { | |
| "epoch": 0.8795811518324608, | |
| "grad_norm": 0.42577216029167175, | |
| "learning_rate": 8.915129312088112e-05, | |
| "loss": 0.0634, | |
| "step": 420 | |
| }, | |
| { | |
| "epoch": 0.900523560209424, | |
| "grad_norm": 0.44957348704338074, | |
| "learning_rate": 8.838056083309118e-05, | |
| "loss": 0.0569, | |
| "step": 430 | |
| }, | |
| { | |
| "epoch": 0.9214659685863874, | |
| "grad_norm": 0.3643034100532532, | |
| "learning_rate": 8.758696032311192e-05, | |
| "loss": 0.0583, | |
| "step": 440 | |
| }, | |
| { | |
| "epoch": 0.9424083769633508, | |
| "grad_norm": 0.5593224167823792, | |
| "learning_rate": 8.677096444054213e-05, | |
| "loss": 0.0637, | |
| "step": 450 | |
| }, | |
| { | |
| "epoch": 0.9633507853403142, | |
| "grad_norm": 0.4963381886482239, | |
| "learning_rate": 8.593305937877614e-05, | |
| "loss": 0.0607, | |
| "step": 460 | |
| }, | |
| { | |
| "epoch": 0.9842931937172775, | |
| "grad_norm": 0.5490202307701111, | |
| "learning_rate": 8.507374438531607e-05, | |
| "loss": 0.0617, | |
| "step": 470 | |
| }, | |
| { | |
| "epoch": 1.0052356020942408, | |
| "grad_norm": 0.33712923526763916, | |
| "learning_rate": 8.419353146430609e-05, | |
| "loss": 0.0563, | |
| "step": 480 | |
| }, | |
| { | |
| "epoch": 1.0261780104712042, | |
| "grad_norm": 0.4473194181919098, | |
| "learning_rate": 8.329294507146579e-05, | |
| "loss": 0.0415, | |
| "step": 490 | |
| }, | |
| { | |
| "epoch": 1.0471204188481675, | |
| "grad_norm": 0.43435972929000854, | |
| "learning_rate": 8.23725218016048e-05, | |
| "loss": 0.0402, | |
| "step": 500 | |
| }, | |
| { | |
| "epoch": 1.0471204188481675, | |
| "eval_loss": 0.058112796396017075, | |
| "eval_runtime": 97.7069, | |
| "eval_samples_per_second": 4.35, | |
| "eval_steps_per_second": 1.095, | |
| "step": 500 | |
| }, | |
| { | |
| "epoch": 1.068062827225131, | |
| "grad_norm": 0.41959723830223083, | |
| "learning_rate": 8.143281006890433e-05, | |
| "loss": 0.0388, | |
| "step": 510 | |
| }, | |
| { | |
| "epoch": 1.0890052356020943, | |
| "grad_norm": 0.46331778168678284, | |
| "learning_rate": 8.047436978015649e-05, | |
| "loss": 0.0467, | |
| "step": 520 | |
| }, | |
| { | |
| "epoch": 1.1099476439790577, | |
| "grad_norm": 0.4408092200756073, | |
| "learning_rate": 7.949777200115616e-05, | |
| "loss": 0.0414, | |
| "step": 530 | |
| }, | |
| { | |
| "epoch": 1.130890052356021, | |
| "grad_norm": 0.5561596751213074, | |
| "learning_rate": 7.850359861644368e-05, | |
| "loss": 0.0413, | |
| "step": 540 | |
| }, | |
| { | |
| "epoch": 1.1518324607329844, | |
| "grad_norm": 0.6036704778671265, | |
| "learning_rate": 7.749244198260175e-05, | |
| "loss": 0.0448, | |
| "step": 550 | |
| }, | |
| { | |
| "epoch": 1.1727748691099475, | |
| "grad_norm": 0.333761602640152, | |
| "learning_rate": 7.646490457531257e-05, | |
| "loss": 0.0379, | |
| "step": 560 | |
| }, | |
| { | |
| "epoch": 1.193717277486911, | |
| "grad_norm": 0.4401955306529999, | |
| "learning_rate": 7.54215986303858e-05, | |
| "loss": 0.0375, | |
| "step": 570 | |
| }, | |
| { | |
| "epoch": 1.2146596858638743, | |
| "grad_norm": 0.3552649915218353, | |
| "learning_rate": 7.436314577897126e-05, | |
| "loss": 0.0373, | |
| "step": 580 | |
| }, | |
| { | |
| "epoch": 1.2356020942408377, | |
| "grad_norm": 0.2970343828201294, | |
| "learning_rate": 7.329017667717339e-05, | |
| "loss": 0.042, | |
| "step": 590 | |
| }, | |
| { | |
| "epoch": 1.256544502617801, | |
| "grad_norm": 0.45380938053131104, | |
| "learning_rate": 7.220333063028872e-05, | |
| "loss": 0.0379, | |
| "step": 600 | |
| }, | |
| { | |
| "epoch": 1.2774869109947644, | |
| "grad_norm": 0.4114688038825989, | |
| "learning_rate": 7.110325521188949e-05, | |
| "loss": 0.0444, | |
| "step": 610 | |
| }, | |
| { | |
| "epoch": 1.2984293193717278, | |
| "grad_norm": 0.37951603531837463, | |
| "learning_rate": 6.999060587798128e-05, | |
| "loss": 0.0422, | |
| "step": 620 | |
| }, | |
| { | |
| "epoch": 1.3193717277486912, | |
| "grad_norm": 0.5121111869812012, | |
| "learning_rate": 6.886604557646356e-05, | |
| "loss": 0.0384, | |
| "step": 630 | |
| }, | |
| { | |
| "epoch": 1.3403141361256545, | |
| "grad_norm": 0.3808182179927826, | |
| "learning_rate": 6.773024435212678e-05, | |
| "loss": 0.0352, | |
| "step": 640 | |
| }, | |
| { | |
| "epoch": 1.3612565445026177, | |
| "grad_norm": 0.3297405540943146, | |
| "learning_rate": 6.658387894742071e-05, | |
| "loss": 0.0397, | |
| "step": 650 | |
| }, | |
| { | |
| "epoch": 1.3821989528795813, | |
| "grad_norm": 0.41979941725730896, | |
| "learning_rate": 6.542763239923215e-05, | |
| "loss": 0.0375, | |
| "step": 660 | |
| }, | |
| { | |
| "epoch": 1.4031413612565444, | |
| "grad_norm": 0.5469591021537781, | |
| "learning_rate": 6.426219363191224e-05, | |
| "loss": 0.0443, | |
| "step": 670 | |
| }, | |
| { | |
| "epoch": 1.4240837696335078, | |
| "grad_norm": 0.3828868865966797, | |
| "learning_rate": 6.308825704679596e-05, | |
| "loss": 0.0334, | |
| "step": 680 | |
| }, | |
| { | |
| "epoch": 1.4450261780104712, | |
| "grad_norm": 0.49810731410980225, | |
| "learning_rate": 6.190652210845815e-05, | |
| "loss": 0.0401, | |
| "step": 690 | |
| }, | |
| { | |
| "epoch": 1.4659685863874345, | |
| "grad_norm": 0.4368995130062103, | |
| "learning_rate": 6.0717692927952744e-05, | |
| "loss": 0.0367, | |
| "step": 700 | |
| }, | |
| { | |
| "epoch": 1.486910994764398, | |
| "grad_norm": 0.4042227566242218, | |
| "learning_rate": 5.952247784328351e-05, | |
| "loss": 0.0375, | |
| "step": 710 | |
| }, | |
| { | |
| "epoch": 1.5078534031413613, | |
| "grad_norm": 0.49127721786499023, | |
| "learning_rate": 5.8321588997356326e-05, | |
| "loss": 0.0339, | |
| "step": 720 | |
| }, | |
| { | |
| "epoch": 1.5287958115183247, | |
| "grad_norm": 0.3174681067466736, | |
| "learning_rate": 5.7115741913664264e-05, | |
| "loss": 0.0381, | |
| "step": 730 | |
| }, | |
| { | |
| "epoch": 1.5497382198952878, | |
| "grad_norm": 0.4839365780353546, | |
| "learning_rate": 5.59056550699585e-05, | |
| "loss": 0.0355, | |
| "step": 740 | |
| }, | |
| { | |
| "epoch": 1.5706806282722514, | |
| "grad_norm": 0.30561134219169617, | |
| "learning_rate": 5.469204947015897e-05, | |
| "loss": 0.0374, | |
| "step": 750 | |
| }, | |
| { | |
| "epoch": 1.5916230366492146, | |
| "grad_norm": 0.3758664131164551, | |
| "learning_rate": 5.3475648214759896e-05, | |
| "loss": 0.0327, | |
| "step": 760 | |
| }, | |
| { | |
| "epoch": 1.6125654450261782, | |
| "grad_norm": 0.3118516206741333, | |
| "learning_rate": 5.2257176069986036e-05, | |
| "loss": 0.0403, | |
| "step": 770 | |
| }, | |
| { | |
| "epoch": 1.6335078534031413, | |
| "grad_norm": 0.33437207341194153, | |
| "learning_rate": 5.103735903595658e-05, | |
| "loss": 0.0364, | |
| "step": 780 | |
| }, | |
| { | |
| "epoch": 1.6544502617801047, | |
| "grad_norm": 0.54002845287323, | |
| "learning_rate": 4.981692391411366e-05, | |
| "loss": 0.0371, | |
| "step": 790 | |
| }, | |
| { | |
| "epoch": 1.675392670157068, | |
| "grad_norm": 0.2974444031715393, | |
| "learning_rate": 4.859659787417362e-05, | |
| "loss": 0.0406, | |
| "step": 800 | |
| }, | |
| { | |
| "epoch": 1.6963350785340314, | |
| "grad_norm": 0.3456689417362213, | |
| "learning_rate": 4.737710802085875e-05, | |
| "loss": 0.0353, | |
| "step": 810 | |
| }, | |
| { | |
| "epoch": 1.7172774869109948, | |
| "grad_norm": 0.4571535885334015, | |
| "learning_rate": 4.615918096066766e-05, | |
| "loss": 0.0325, | |
| "step": 820 | |
| }, | |
| { | |
| "epoch": 1.738219895287958, | |
| "grad_norm": 0.3281984031200409, | |
| "learning_rate": 4.4943542368942746e-05, | |
| "loss": 0.0318, | |
| "step": 830 | |
| }, | |
| { | |
| "epoch": 1.7591623036649215, | |
| "grad_norm": 0.2916335165500641, | |
| "learning_rate": 4.373091655749225e-05, | |
| "loss": 0.0374, | |
| "step": 840 | |
| }, | |
| { | |
| "epoch": 1.7801047120418847, | |
| "grad_norm": 0.4069044888019562, | |
| "learning_rate": 4.252202604302476e-05, | |
| "loss": 0.0362, | |
| "step": 850 | |
| }, | |
| { | |
| "epoch": 1.8010471204188483, | |
| "grad_norm": 0.39771515130996704, | |
| "learning_rate": 4.131759111665349e-05, | |
| "loss": 0.038, | |
| "step": 860 | |
| }, | |
| { | |
| "epoch": 1.8219895287958114, | |
| "grad_norm": 0.38277313113212585, | |
| "learning_rate": 4.011832941472641e-05, | |
| "loss": 0.0393, | |
| "step": 870 | |
| }, | |
| { | |
| "epoch": 1.8429319371727748, | |
| "grad_norm": 0.2795620560646057, | |
| "learning_rate": 3.8924955491238216e-05, | |
| "loss": 0.0336, | |
| "step": 880 | |
| }, | |
| { | |
| "epoch": 1.8638743455497382, | |
| "grad_norm": 0.3264184594154358, | |
| "learning_rate": 3.7738180392078937e-05, | |
| "loss": 0.033, | |
| "step": 890 | |
| }, | |
| { | |
| "epoch": 1.8848167539267016, | |
| "grad_norm": 0.5280706286430359, | |
| "learning_rate": 3.6558711231372704e-05, | |
| "loss": 0.035, | |
| "step": 900 | |
| }, | |
| { | |
| "epoch": 1.905759162303665, | |
| "grad_norm": 0.3607095181941986, | |
| "learning_rate": 3.538725077015915e-05, | |
| "loss": 0.0357, | |
| "step": 910 | |
| }, | |
| { | |
| "epoch": 1.9267015706806283, | |
| "grad_norm": 0.3703671991825104, | |
| "learning_rate": 3.422449699766851e-05, | |
| "loss": 0.0313, | |
| "step": 920 | |
| }, | |
| { | |
| "epoch": 1.9476439790575917, | |
| "grad_norm": 0.4830826222896576, | |
| "learning_rate": 3.307114271543999e-05, | |
| "loss": 0.0345, | |
| "step": 930 | |
| }, | |
| { | |
| "epoch": 1.9685863874345548, | |
| "grad_norm": 0.42555513978004456, | |
| "learning_rate": 3.192787512453105e-05, | |
| "loss": 0.0333, | |
| "step": 940 | |
| }, | |
| { | |
| "epoch": 1.9895287958115184, | |
| "grad_norm": 0.3647818863391876, | |
| "learning_rate": 3.079537541606349e-05, | |
| "loss": 0.0317, | |
| "step": 950 | |
| }, | |
| { | |
| "epoch": 2.0104712041884816, | |
| "grad_norm": 0.27432000637054443, | |
| "learning_rate": 2.9674318365350685e-05, | |
| "loss": 0.0235, | |
| "step": 960 | |
| }, | |
| { | |
| "epoch": 2.031413612565445, | |
| "grad_norm": 0.2879980802536011, | |
| "learning_rate": 2.8565371929847284e-05, | |
| "loss": 0.0158, | |
| "step": 970 | |
| }, | |
| { | |
| "epoch": 2.0523560209424083, | |
| "grad_norm": 0.30074015259742737, | |
| "learning_rate": 2.7469196851161373e-05, | |
| "loss": 0.0179, | |
| "step": 980 | |
| }, | |
| { | |
| "epoch": 2.073298429319372, | |
| "grad_norm": 0.48433229327201843, | |
| "learning_rate": 2.638644626136587e-05, | |
| "loss": 0.0174, | |
| "step": 990 | |
| }, | |
| { | |
| "epoch": 2.094240837696335, | |
| "grad_norm": 0.3053944706916809, | |
| "learning_rate": 2.531776529384407e-05, | |
| "loss": 0.0204, | |
| "step": 1000 | |
| }, | |
| { | |
| "epoch": 2.094240837696335, | |
| "eval_loss": 0.05247209593653679, | |
| "eval_runtime": 97.7528, | |
| "eval_samples_per_second": 4.348, | |
| "eval_steps_per_second": 1.095, | |
| "step": 1000 | |
| }, | |
| { | |
| "epoch": 2.115183246073298, | |
| "grad_norm": 0.3178795576095581, | |
| "learning_rate": 2.426379069890098e-05, | |
| "loss": 0.0178, | |
| "step": 1010 | |
| }, | |
| { | |
| "epoch": 2.136125654450262, | |
| "grad_norm": 0.30507609248161316, | |
| "learning_rate": 2.3225150464369312e-05, | |
| "loss": 0.0145, | |
| "step": 1020 | |
| }, | |
| { | |
| "epoch": 2.157068062827225, | |
| "grad_norm": 0.3095521330833435, | |
| "learning_rate": 2.2202463441436884e-05, | |
| "loss": 0.0186, | |
| "step": 1030 | |
| }, | |
| { | |
| "epoch": 2.1780104712041886, | |
| "grad_norm": 0.40178781747817993, | |
| "learning_rate": 2.1196338975917358e-05, | |
| "loss": 0.0185, | |
| "step": 1040 | |
| }, | |
| { | |
| "epoch": 2.1989528795811517, | |
| "grad_norm": 0.3371562063694, | |
| "learning_rate": 2.0207376545184893e-05, | |
| "loss": 0.0148, | |
| "step": 1050 | |
| }, | |
| { | |
| "epoch": 2.2198952879581153, | |
| "grad_norm": 0.5400903820991516, | |
| "learning_rate": 1.9236165400988638e-05, | |
| "loss": 0.0188, | |
| "step": 1060 | |
| }, | |
| { | |
| "epoch": 2.2408376963350785, | |
| "grad_norm": 0.1960514336824417, | |
| "learning_rate": 1.8283284218359782e-05, | |
| "loss": 0.0167, | |
| "step": 1070 | |
| }, | |
| { | |
| "epoch": 2.261780104712042, | |
| "grad_norm": 0.29054176807403564, | |
| "learning_rate": 1.734930075082076e-05, | |
| "loss": 0.0156, | |
| "step": 1080 | |
| }, | |
| { | |
| "epoch": 2.282722513089005, | |
| "grad_norm": 0.31542539596557617, | |
| "learning_rate": 1.6434771492101485e-05, | |
| "loss": 0.0168, | |
| "step": 1090 | |
| }, | |
| { | |
| "epoch": 2.303664921465969, | |
| "grad_norm": 0.30312708020210266, | |
| "learning_rate": 1.5540241344564915e-05, | |
| "loss": 0.0178, | |
| "step": 1100 | |
| }, | |
| { | |
| "epoch": 2.324607329842932, | |
| "grad_norm": 0.22572675347328186, | |
| "learning_rate": 1.46662432945386e-05, | |
| "loss": 0.0159, | |
| "step": 1110 | |
| }, | |
| { | |
| "epoch": 2.345549738219895, | |
| "grad_norm": 0.32737213373184204, | |
| "learning_rate": 1.3813298094746491e-05, | |
| "loss": 0.0155, | |
| "step": 1120 | |
| }, | |
| { | |
| "epoch": 2.3664921465968587, | |
| "grad_norm": 0.29300010204315186, | |
| "learning_rate": 1.2981913954029784e-05, | |
| "loss": 0.0158, | |
| "step": 1130 | |
| }, | |
| { | |
| "epoch": 2.387434554973822, | |
| "grad_norm": 0.3531718850135803, | |
| "learning_rate": 1.2172586234541644e-05, | |
| "loss": 0.0162, | |
| "step": 1140 | |
| }, | |
| { | |
| "epoch": 2.4083769633507854, | |
| "grad_norm": 0.3117247521877289, | |
| "learning_rate": 1.1385797156596506e-05, | |
| "loss": 0.0157, | |
| "step": 1150 | |
| }, | |
| { | |
| "epoch": 2.4293193717277486, | |
| "grad_norm": 0.3147590756416321, | |
| "learning_rate": 1.062201551134957e-05, | |
| "loss": 0.0146, | |
| "step": 1160 | |
| }, | |
| { | |
| "epoch": 2.450261780104712, | |
| "grad_norm": 0.3247630298137665, | |
| "learning_rate": 9.88169638147784e-06, | |
| "loss": 0.0143, | |
| "step": 1170 | |
| }, | |
| { | |
| "epoch": 2.4712041884816753, | |
| "grad_norm": 0.3280116319656372, | |
| "learning_rate": 9.16528087002892e-06, | |
| "loss": 0.017, | |
| "step": 1180 | |
| }, | |
| { | |
| "epoch": 2.492146596858639, | |
| "grad_norm": 0.30790087580680847, | |
| "learning_rate": 8.473195837599418e-06, | |
| "loss": 0.0155, | |
| "step": 1190 | |
| }, | |
| { | |
| "epoch": 2.513089005235602, | |
| "grad_norm": 0.4461917579174042, | |
| "learning_rate": 7.805853647999362e-06, | |
| "loss": 0.0176, | |
| "step": 1200 | |
| }, | |
| { | |
| "epoch": 2.5340314136125652, | |
| "grad_norm": 0.3395947515964508, | |
| "learning_rate": 7.163651922554149e-06, | |
| "loss": 0.0194, | |
| "step": 1210 | |
| }, | |
| { | |
| "epoch": 2.554973821989529, | |
| "grad_norm": 0.30847620964050293, | |
| "learning_rate": 6.5469733031905515e-06, | |
| "loss": 0.0147, | |
| "step": 1220 | |
| }, | |
| { | |
| "epoch": 2.5759162303664924, | |
| "grad_norm": 0.2089480310678482, | |
| "learning_rate": 5.956185224447841e-06, | |
| "loss": 0.0146, | |
| "step": 1230 | |
| }, | |
| { | |
| "epoch": 2.5968586387434556, | |
| "grad_norm": 0.36750879883766174, | |
| "learning_rate": 5.391639694549943e-06, | |
| "loss": 0.0164, | |
| "step": 1240 | |
| }, | |
| { | |
| "epoch": 2.6178010471204187, | |
| "grad_norm": 0.41677984595298767, | |
| "learning_rate": 4.853673085668947e-06, | |
| "loss": 0.0148, | |
| "step": 1250 | |
| }, | |
| { | |
| "epoch": 2.6387434554973823, | |
| "grad_norm": 0.32629698514938354, | |
| "learning_rate": 4.342605933505084e-06, | |
| "loss": 0.0154, | |
| "step": 1260 | |
| }, | |
| { | |
| "epoch": 2.6596858638743455, | |
| "grad_norm": 0.34532350301742554, | |
| "learning_rate": 3.858742746302535e-06, | |
| "loss": 0.0153, | |
| "step": 1270 | |
| }, | |
| { | |
| "epoch": 2.680628272251309, | |
| "grad_norm": 0.21778731048107147, | |
| "learning_rate": 3.402371823414774e-06, | |
| "loss": 0.0136, | |
| "step": 1280 | |
| }, | |
| { | |
| "epoch": 2.701570680628272, | |
| "grad_norm": 0.28639301657676697, | |
| "learning_rate": 2.9737650835276853e-06, | |
| "loss": 0.0137, | |
| "step": 1290 | |
| }, | |
| { | |
| "epoch": 2.7225130890052354, | |
| "grad_norm": 0.44888314604759216, | |
| "learning_rate": 2.573177902642726e-06, | |
| "loss": 0.0204, | |
| "step": 1300 | |
| }, | |
| { | |
| "epoch": 2.743455497382199, | |
| "grad_norm": 0.3165195882320404, | |
| "learning_rate": 2.200848961916718e-06, | |
| "loss": 0.0147, | |
| "step": 1310 | |
| }, | |
| { | |
| "epoch": 2.7643979057591626, | |
| "grad_norm": 0.3899117112159729, | |
| "learning_rate": 1.8570001054488362e-06, | |
| "loss": 0.0175, | |
| "step": 1320 | |
| }, | |
| { | |
| "epoch": 2.7853403141361257, | |
| "grad_norm": 0.2718731164932251, | |
| "learning_rate": 1.5418362080996507e-06, | |
| "loss": 0.0153, | |
| "step": 1330 | |
| }, | |
| { | |
| "epoch": 2.806282722513089, | |
| "grad_norm": 0.4670896828174591, | |
| "learning_rate": 1.2555450534208978e-06, | |
| "loss": 0.0138, | |
| "step": 1340 | |
| }, | |
| { | |
| "epoch": 2.8272251308900525, | |
| "grad_norm": 0.26648226380348206, | |
| "learning_rate": 9.98297221768718e-07, | |
| "loss": 0.0142, | |
| "step": 1350 | |
| }, | |
| { | |
| "epoch": 2.8481675392670156, | |
| "grad_norm": 0.3085860311985016, | |
| "learning_rate": 7.702459886670788e-07, | |
| "loss": 0.0141, | |
| "step": 1360 | |
| }, | |
| { | |
| "epoch": 2.869109947643979, | |
| "grad_norm": 0.46135184168815613, | |
| "learning_rate": 5.715272334818944e-07, | |
| "loss": 0.0162, | |
| "step": 1370 | |
| }, | |
| { | |
| "epoch": 2.8900523560209423, | |
| "grad_norm": 0.3128398060798645, | |
| "learning_rate": 4.02259358460233e-07, | |
| "loss": 0.0159, | |
| "step": 1380 | |
| }, | |
| { | |
| "epoch": 2.9109947643979055, | |
| "grad_norm": 0.44724029302597046, | |
| "learning_rate": 2.6254321818295345e-07, | |
| "loss": 0.0178, | |
| "step": 1390 | |
| }, | |
| { | |
| "epoch": 2.931937172774869, | |
| "grad_norm": 0.35600796341896057, | |
| "learning_rate": 1.5246205947265779e-07, | |
| "loss": 0.0151, | |
| "step": 1400 | |
| }, | |
| { | |
| "epoch": 2.9528795811518327, | |
| "grad_norm": 0.2872493863105774, | |
| "learning_rate": 7.208147179291192e-08, | |
| "loss": 0.0156, | |
| "step": 1410 | |
| }, | |
| { | |
| "epoch": 2.973821989528796, | |
| "grad_norm": 0.27174440026283264, | |
| "learning_rate": 2.1449348168167682e-08, | |
| "loss": 0.0162, | |
| "step": 1420 | |
| }, | |
| { | |
| "epoch": 2.994764397905759, | |
| "grad_norm": 0.42477887868881226, | |
| "learning_rate": 5.95856647772619e-10, | |
| "loss": 0.0136, | |
| "step": 1430 | |
| } | |
| ], | |
| "logging_steps": 10, | |
| "max_steps": 1431, | |
| "num_input_tokens_seen": 0, | |
| "num_train_epochs": 3, | |
| "save_steps": 500, | |
| "stateful_callbacks": { | |
| "TrainerControl": { | |
| "args": { | |
| "should_epoch_stop": false, | |
| "should_evaluate": false, | |
| "should_log": false, | |
| "should_save": true, | |
| "should_training_stop": true | |
| }, | |
| "attributes": {} | |
| } | |
| }, | |
| "total_flos": 1.717501382279299e+18, | |
| "train_batch_size": 1, | |
| "trial_name": null, | |
| "trial_params": null | |
| } | |
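The `log_history` list above mixes two kinds of records: training records (with `loss`, `grad_norm`, `learning_rate`) logged every 10 steps, and evaluation records (with `eval_loss`) logged every 500 steps. The sketch below shows one way to separate them and find the evaluation with the lowest loss. The function names are hypothetical, and the embedded excerpt contains only four records copied from the log above; on the real file one would instead load the JSON with `json.load` from wherever it is stored.

```python
import json


def split_log_history(state):
    """Split log_history into training records and evaluation records."""
    train = [rec for rec in state["log_history"] if "loss" in rec]
    evals = [rec for rec in state["log_history"] if "eval_loss" in rec]
    return train, evals


def best_eval(evals):
    """Return the evaluation record with the lowest eval_loss."""
    return min(evals, key=lambda rec: rec["eval_loss"])


# Excerpt copied from the log above (steps 500 and 1000 only).
excerpt = json.loads("""
{
  "log_history": [
    {"epoch": 1.0471204188481675, "grad_norm": 0.43435972929000854,
     "learning_rate": 8.23725218016048e-05, "loss": 0.0402, "step": 500},
    {"epoch": 1.0471204188481675, "eval_loss": 0.058112796396017075,
     "eval_runtime": 97.7069, "eval_samples_per_second": 4.35,
     "eval_steps_per_second": 1.095, "step": 500},
    {"epoch": 2.094240837696335, "grad_norm": 0.3053944706916809,
     "learning_rate": 2.531776529384407e-05, "loss": 0.0204, "step": 1000},
    {"epoch": 2.094240837696335, "eval_loss": 0.05247209593653679,
     "eval_runtime": 97.7528, "eval_samples_per_second": 4.348,
     "eval_steps_per_second": 1.095, "step": 1000}
  ]
}
""")

train_records, eval_records = split_log_history(excerpt)
best = best_eval(eval_records)

if __name__ == "__main__":
    print(len(train_records), "training records,", len(eval_records), "eval records")
    print("lowest eval_loss at step", best["step"], "=", best["eval_loss"])
```

The key test `"loss" in rec` matches only the exact key `loss`, so evaluation records, whose key is `eval_loss`, are not picked up as training records.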