Heads-up: BF16 weights appear to produce degenerate outputs (logits collapsed)

#1
by sakamakismile - opened

Hi huihui-ai team, long-time fan of the abliterated line here. I wanted to flag something we ran into while preparing an NVFP4 variant of this release for Blackwell. Posting it here as a friendly heads-up, not a complaint; it's totally up to you whether to investigate.

What we observed

When loading the BF16 weights via AutoModelForCausalLM.from_pretrained(..., dtype=torch.bfloat16, trust_remote_code=True, device_map="auto"), the model appears to load cleanly (no missing/unexpected keys), but:

  1. Output logit magnitudes are collapsed to roughly [-0.08, +0.08] (healthy Qwen3-Next-80B BF16 logits typically span at least ±10).
  2. Greedy generation produces only ! tokens (and occasional fragments like whole, BUFFER, journal, InlineData):
Prompt: "Hello, who are you?"
Output: "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!cribe!!! whole! whole!!!module!now!Le! whole! whole! whole! whole! whole"

Prompt: "Write a Python function that computes the factorial of n."
Output: "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! journal!!!!!!!!!!!!!!!!!!!!!!!!!BUFFER!InlineData!BUFFER!!!!!!!!!"

Prompt: "List the noble gases:"
Output: "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\x1a!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
  3. NVFP4 calibration via nvidia-modelopt fails with a NaN amax at the very first attention o_proj input. This is consistent with the upstream activations being near-zero and the o_proj input collapsing once any quantization scale tries to track it.
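For anyone who wants to check the first two symptoms on their own load, here is a minimal sketch in plain Python. The function names and the thresholds (`min_span`, `max_char_fraction`) are hypothetical choices for illustration, not part of transformers or of our actual test harness.

```python
import math
from collections import Counter


def summarize_logits(values):
    """Return min, max and span of a flat sequence of logits, plus a NaN/inf count."""
    values = list(values)
    finite = [v for v in values if math.isfinite(v)]
    non_finite = len(values) - len(finite)
    if not finite:
        return {"min": None, "max": None, "span": None, "non_finite": non_finite}
    lo, hi = min(finite), max(finite)
    return {"min": lo, "max": hi, "span": hi - lo, "non_finite": non_finite}


def dominant_char_fraction(text):
    """Fraction of the text taken up by its single most common character."""
    if not text:
        return 0.0
    return Counter(text).most_common(1)[0][1] / len(text)


def looks_degenerate(values, text, min_span=1.0, max_char_fraction=0.5):
    """Heuristic flag: collapsed logit span, non-finite logits, or one character dominating.

    min_span and max_char_fraction are assumed thresholds; tune them for your model.
    """
    stats = summarize_logits(values)
    collapsed = stats["span"] is None or stats["span"] < min_span
    repetitive = dominant_char_fraction(text) > max_char_fraction
    return collapsed or repetitive or stats["non_finite"] > 0
```

With a loaded model, the last-position logits can be passed in as a plain list, for example `model(**inputs).logits[0, -1].float().tolist()`, alongside the decoded generation. The `!`-only outputs above would trip both the span check and the repetition check.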

Test environment (clean room, fast path active)

  • Container: nvidia/cuda:13.0 base, torch 2.11, transformers 5.5.4
  • flash-linear-attention + causal-conv1d installed (no fast-path warning printed during forward)
  • 3× RTX PRO 6000 Blackwell, device_map="auto" sharding
  • BF16 dtype, no quantization, no abliteration step from us β€” just from_pretrained + generate

So the "fast path is not available" fallback is not in play, and the issue is reproducible from a clean transformers load.
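If you want to compare environments, a small stdlib-only sketch like the one below prints the installed versions of the packages listed above. The distribution names in `PACKAGES` are my assumption of how they are registered with pip; adjust them if your install uses different names.

```python
from importlib import metadata

# Assumed pip distribution names for the packages mentioned in this post.
PACKAGES = ("torch", "transformers", "flash-linear-attention", "causal-conv1d")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

This only confirms that the packages are installed; it does not prove the fast path was actually taken, which we inferred from the absence of the warning during forward.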

What we don't know

We only tested this single release, so we can't tell whether the cause sits in:

  • the abliteration pass over samuelcardillo/Qwen3-Coder-Next-Opus-4.6-Reasoning-Distilled (your step), or
  • something already present in the Reasoning-Distilled base, or
  • some interaction between the two.

We didn't run a comparison against samuelcardillo/Qwen3-Coder-Next-Opus-4.6-Reasoning-Distilled to bisect the cause, but we're happy to do that if it would help.
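The bisect we have in mind could be sketched roughly as follows: walk the tensors shared by both checkpoints and report any whose magnitude collapsed, or that contain NaN, in the abliterated copy. This is an illustrative pure-Python version that assumes each checkpoint is already available as a dict of tensor name to flat list of floats; `compare_checkpoints` and `collapse_ratio` are hypothetical names, and a real run on an 80B model would need to stream tensors shard by shard (for example with safetensors) rather than hold them as Python lists.

```python
import math


def max_abs(values):
    """Largest absolute value in a sequence, ignoring NaN entries (0.0 if none remain)."""
    return max((abs(v) for v in values if not math.isnan(v)), default=0.0)


def compare_checkpoints(base, candidate, collapse_ratio=1e-3):
    """Report tensors in `candidate` that contain NaN or shrank sharply relative to `base`.

    base, candidate: dicts mapping tensor name -> flat list of floats.
    collapse_ratio: assumed threshold; a tensor is flagged when its max |value|
    drops below collapse_ratio times the base tensor's max |value|.
    Returns a list of (name, base_max_abs, candidate_max_abs, has_nan) tuples.
    """
    report = []
    for name in sorted(base.keys() & candidate.keys()):
        b = max_abs(base[name])
        c = max_abs(candidate[name])
        has_nan = any(math.isnan(v) for v in candidate[name])
        collapsed = b > 0 and c < collapse_ratio * b
        if has_nan or collapsed:
            report.append((name, b, c, has_nan))
    return report
```

An empty report would suggest the weights themselves are not obviously damaged and that the problem more likely lies in the base model or in how the checkpoint is loaded.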

Why we mention it

Repos at the 35K-download tier like yours are often the entry point for the local-LLM crowd, and BF16 weights that generate only ! are the kind of thing that will create a wave of confused issues. We wanted to surface it early so you have the option to investigate before that happens. We've paused our NVFP4 path on this release accordingly.

Always grateful for the abliterated line; it's been the foundation of much of our Blackwell fast-path work this year. Let me know if there's any diagnostic data I can share that would speed up triage.

- Tonoken3 / Lna-Lab

We are very grateful for your support and feedback. We have not tested the method you mentioned, but we have added
code testing to our process. You may want to give it a try.
https://huggingface.co/huihui-ai/Huihui-Qwen3-Coder-Next-Opus-4.6-Reasoning-Distilled-abliterated#usage
