What is the purpose of this new 57B variant?

#1
by sebastienbo - opened

What is the purpose of this new 57B variant?

It's kind of weird because the official 4.7 flash model was 30B parameters, so where are the other parameters coming from?
And what kind of difference does that make in the active parameters? Is it still 3B?

Such models are usually made by self-merging: layers of the smaller model are stacked on top of each other to form a larger one. Doing so makes the model more intelligent but also more expensive to run. If someone really likes a certain model and has the resources to run a larger one, then doing so is worth it. A popular example is the self-merge chain that resulted in FATLLAMA-1.7T-Instruct, which to this day is the largest publicly available model on HuggingFace.
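To make the self-merging idea a bit more concrete, here is a minimal sketch of where the extra parameters can come from. It assumes the merge is done by stacking overlapping slices of the original layers, which is one common way to do it; I don't know the exact recipe used for this 57B variant. The function names, the slice sizes, and all the numbers below are hypothetical and only illustrate the arithmetic.

```python
def self_merge_layout(num_layers, slice_size, overlap):
    """Return, for each layer of the merged model, the index of the
    original layer it is copied from (overlapping-slice stacking)."""
    if not 0 <= overlap < slice_size:
        raise ValueError("overlap must satisfy 0 <= overlap < slice_size")
    step = slice_size - overlap
    layout = []
    start = 0
    while start < num_layers:
        end = min(start + slice_size, num_layers)
        layout.extend(range(start, end))
        if end == num_layers:
            break
        start += step
    return layout


def param_count(num_stacked_layers, params_per_layer, shared_params):
    """Rough total: per-layer weights times depth, plus weights that are
    not duplicated (embeddings, output head). Units are arbitrary."""
    return shared_params + num_stacked_layers * params_per_layer


# Hypothetical toy model: 8 layers, merged in slices of 4 with overlap 2.
layout = self_merge_layout(num_layers=8, slice_size=4, overlap=2)
print(layout)

# Hypothetical sizes: 10 units per layer, 5 units of shared weights.
original_size = param_count(8, params_per_layer=10, shared_params=5)
merged_size = param_count(len(layout), params_per_layer=10, shared_params=5)
print(original_size, merged_size)
```

Under this assumption no new weights are trained; the additional parameters are copies of existing layers. Every token also passes through more layers, so in a merge like this the active parameter count would be expected to grow along with the total rather than stay the same.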
