Can this model be layer pruned?

#12
by Rnake - opened

I want to run this model on my phone, but when I apply layer pruning followed by distillation, the model's quality drops dramatically. Is this related to the model's use of RoPE?
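For readers unfamiliar with the setup, here is a rough sketch of the two steps, assuming a model with a stack of N transformer layers and a cosine-based embedding distillation objective. The question doesn't say which layers were dropped or which loss was used, so both choices are assumptions. `layers_to_keep` and `distill_loss` are hypothetical helpers, not part of any library. In practice you would apply the index list to the model's layer stack and compute the loss on framework tensors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def layers_to_keep(num_layers: int, keep: int) -> List[int]:
    """Pick `keep` layer indices spread evenly over the stack,
    always retaining the first and the last layer."""
    if not 1 <= keep <= num_layers:
        raise ValueError("keep must be between 1 and num_layers")
    if keep == 1:
        return [num_layers - 1]
    # Integer floor division keeps indices exact and strictly increasing.
    return [i * (num_layers - 1) // (keep - 1) for i in range(keep)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def distill_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Mean (1 - cosine similarity) between paired student/teacher embeddings."""
    return sum(1.0 - cosine(s, t) for s, t in zip(student, teacher)) / len(student)
```

For a hypothetical 12-layer model, `layers_to_keep(12, 6)` returns `[0, 2, 4, 6, 8, 11]`. The fewer layers a model starts with, the larger the fraction each removed layer represents, which is one reason a small model can be fragile under this kind of pruning.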

Jina AI org

Hey, I don't think it's related to RoPE. It could be that the nano model is already quite compact, so removing layers can hurt a lot depending on how aggressively you pruned, or that the distillation setup needs some tuning. Either way, if your goal is on-device inference, I'd suggest trying the quantized GGUF versions instead. Q4_K_M (157 MB, down from 424 MB) is probably the best tradeoff between size and quality.
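To illustrate why quantization shrinks the file without removing any layers, here is a simplified symmetric 4-bit block quantizer. This is not the actual Q4_K_M layout: llama.cpp's K-quant formats group weights into super-blocks with their own quantized scales and minimums, and the `_M` variants keep some tensors at higher precision. The block size of 32 and the 16-bit scale below are assumptions chosen for illustration, and all function names are hypothetical.

```python
from typing import List, Tuple


def quantize_block(block: List[float]) -> Tuple[float, List[int]]:
    """Map one block of floats to 4-bit signed codes in [-8, 7] plus one float scale."""
    amax = max(abs(x) for x in block)
    scale = amax / 7 if amax > 0 else 1.0  # largest magnitude maps to +/-7
    codes = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, codes


def dequantize_block(scale: float, codes: List[int]) -> List[float]:
    """Approximate the original weights from the codes and the block scale."""
    return [scale * c for c in codes]


def quantize(weights: List[float], block_size: int = 32) -> List[Tuple[float, List[int]]]:
    """Split the weights into fixed-size blocks and quantize each block independently."""
    return [quantize_block(weights[i:i + block_size])
            for i in range(0, len(weights), block_size)]


def storage_bits(num_weights: int, block_size: int = 32, scale_bits: int = 16) -> int:
    """Bits needed: 4 per weight plus one scale per (possibly partial) block."""
    num_blocks = -(-num_weights // block_size)  # ceiling division
    return 4 * num_weights + scale_bits * num_blocks
```

With these assumptions, a 32-weight block costs `storage_bits(32) == 144` bits, or 4.5 bits per weight versus 16 for FP16, and each reconstructed weight is off by at most half the block's scale.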

Are there any metrics for the performance of Q4_K_M?
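Without published numbers, one rough check is to embed the same texts with both the full-precision and the Q4_K_M model and compare the results. The sketch below assumes you have already obtained the embeddings from whatever runtime you use. `mean_agreement` and `top1_overlap` are hypothetical helpers, not part of any Jina or llama.cpp API. This only measures how closely the quantized model tracks the original, not absolute retrieval quality.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_agreement(full: List[Vector], quant: List[Vector]) -> float:
    """Average cosine similarity between the two models' embeddings of the same texts."""
    return sum(cosine(f, q) for f, q in zip(full, quant)) / len(full)


def top1(query: Vector, docs: List[Vector]) -> int:
    """Index of the document most similar to the query."""
    return max(range(len(docs)), key=lambda i: cosine(query, docs[i]))


def top1_overlap(full_q: List[Vector], full_d: List[Vector],
                 quant_q: List[Vector], quant_d: List[Vector]) -> float:
    """Fraction of queries whose best-matching document is the same under both models."""
    hits = sum(top1(fq, full_d) == top1(qq, quant_d)
               for fq, qq in zip(full_q, quant_q))
    return hits / len(full_q)
```

Values close to 1.0 for both functions suggest the quantized model ranks documents much like the original on your data.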
