Question on FP8 training

#1
by 5fp - opened

Hi, was FP8 training also applied to the dense models at 27B and below?
Does the FP8 pipeline quantize most of the model, so that the BF16 checkpoint is effectively an upcast of the FP8 checkpoint? Thanks!
