deepspeed / accelerate

by fblgit - opened Feb 11, 2024

Discussion

fblgit

Shinoji Research org Feb 11, 2024

Do you mind sharing the DeepSpeed/Accelerate config? I tried to replicate this training, but it ran out of memory (OOM) on 8xH100 :D

alicecomfy

Shinoji Research org Feb 11, 2024

This model was trained on a single H100 using QLoRA. I am currently doing a LoRA on 5xA100 to fix the prompting issues (this one uses a mix of ChatML and Mistral prompt formats), so be careful with that setting if you are finetuning a similar model.

To get the 5xA100s working, I used https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/deepspeed_configs/zero3_bf16.json and enabled ZeRO init (the accelerate config dialog asks you about it). Nothing else special.
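For anyone following along, here is a rough sketch of what a ZeRO stage 3 + bf16 DeepSpeed config looks like when built from Python. This is not a copy of the axolotl file linked above; the function names and the output filename are made up for illustration, and the specific set of keys is an assumption based on standard DeepSpeed config options. The "auto" values assume a launcher (such as the Hugging Face Trainer integration) that fills them in at runtime.

```python
import json


def build_zero3_bf16_config():
    """Return a minimal ZeRO stage 3 + bf16 DeepSpeed config (illustrative only)."""
    return {
        "zero_optimization": {
            # Stage 3 shards parameters, gradients, and optimizer state across GPUs.
            "stage": 3,
            "overlap_comm": True,
            "contiguous_gradients": True,
            # Gather the sharded 16-bit weights into one copy when saving.
            "stage3_gather_16bit_weights_on_model_save": True,
        },
        "bf16": {"enabled": True},
        # "auto" is resolved by the training integration, not by DeepSpeed itself.
        "gradient_accumulation_steps": "auto",
        "train_micro_batch_size_per_gpu": "auto",
        "train_batch_size": "auto",
        "wall_clock_breakdown": False,
    }


def write_config(path):
    """Write the sketch config to `path` as JSON and return the dict."""
    config = build_zero3_bf16_config()
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config


if __name__ == "__main__":
    # Hypothetical output filename, chosen only for this example.
    write_config("zero3_bf16_sketch.json")
```

The ZeRO init option is separate from this JSON. As far as I know, in the accelerate config file it shows up as `zero3_init_flag`, but check your own generated config to be sure.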

I tried doing it on 8x 40GB A100s out of curiosity and could not get any configuration to work.

I also posted a bit about it here: https://twitter.com/Alice_comfy/status/1756675929181467098

fblgit

Shinoji Research org Feb 13, 2024

Confirmed. Works, 0.4.0

fblgit changed discussion status to closed Feb 13, 2024
