PLAA - Multiplicative FiLM
Multiplicative FiLM modulation for Qwen3-4B: per-layer feature amplitude gating via hs * (1 + alpha * tanh(W_l * S_t)).
Architecture
Injects modulation layers into Qwen3-4B layers 16-28. Each layer has an independent scale projection from the latent state S_t.
hs = hs * (1 + alpha * tanh(W_l * S_t))
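The gating formula above can be illustrated with a small pure-Python sketch. This is not the code in modeling_plaa.py: the toy sizes, the value of `alpha`, and the helper names `film_gate` and `film_modulate` are assumptions made for illustration, and `W_l * S_t` is read as a matrix-vector product.

```python
import math

def film_gate(W_l, s_t, alpha):
    """Per-feature gate 1 + alpha * tanh(W_l @ s_t), as a plain list."""
    return [
        1.0 + alpha * math.tanh(sum(w * s for w, s in zip(row, s_t)))
        for row in W_l
    ]

def film_modulate(hs, W_l, s_t, alpha):
    """Scale each hidden feature by its gate: hs * (1 + alpha * tanh(W_l @ s_t))."""
    return [h * g for h, g in zip(hs, film_gate(W_l, s_t, alpha))]

# Toy sizes (assumed): 3 hidden features, 2 state dimensions.
W_l = [[0.5, -1.0], [2.0, 0.0], [0.0, 0.0]]
hs = [1.0, -2.0, 4.0]
alpha = 0.1  # assumed value; the card does not state alpha

# A zero state gives tanh(0) = 0, so the gate is 1 and hs passes through unchanged.
identity = film_modulate(hs, W_l, [0.0, 0.0], alpha)

# A non-zero state rescales each feature by a factor within [1 - alpha, 1 + alpha].
modulated = film_modulate(hs, W_l, [1.0, 1.0], alpha)
```

Because tanh is bounded, the gate can never move a feature's amplitude by more than a factor of alpha in either direction, and a zero state reduces the layer to the unmodified base model.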
Contents
- adapter_model.safetensors: PEFT LoRA adapter (Phase 2.5 persona alignment)
- plaa_full.pt: PlaaCore GRU + FiLM scale_proj weights
- modeling_plaa.py: FiLMLayer + PlaaCore definition
- config.json: PEFT adapter config
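Before loading, it can help to confirm that a local copy of the repository contains all four files listed above. A minimal sketch, assuming the files sit together in one directory; the helper name `missing_files` is hypothetical:

```python
from pathlib import Path

# File names taken from the contents list of this card.
EXPECTED_FILES = (
    "adapter_model.safetensors",
    "plaa_full.pt",
    "modeling_plaa.py",
    "config.json",
)

def missing_files(directory):
    """Return the expected file names that are not present in `directory`."""
    root = Path(directory)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_files("./")` from the repository root should return an empty list when the download is complete.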
Loading
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
# Load base model
bnb = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B",
    quantization_config=bnb, device_map="auto",
    trust_remote_code=True, torch_dtype=torch.bfloat16)
# Inject FiLM layers
from modeling_plaa import FiLMLayer, PlaaCore
for i in range(16, 29):  # layers 16-28 inclusive
    base.model.layers[i] = FiLMLayer(base.model.layers[i])
# Load PEFT adapter
peft = PeftModel.from_pretrained(base, "./", adapter_name="plaa")
peft.set_adapter("plaa")
# Load FiLM weights
ckpt = torch.load("./plaa_full.pt", map_location="cpu")
plaa_core = PlaaCore()
plaa_core.load_state_dict(ckpt["plaa_core"])
for i in range(16, 29):
    peft.base_model.model.model.layers[i].scale_proj.load_state_dict(
        ckpt["scale_proj"][i])
    peft.base_model.model.model.layers[i].cuda()
# Inference
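tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
S = plaa_core.init_state(1)  # initial latent state for batch size 1
for i in range(16, 29):
    peft.base_model.model.model.layers[i]._s = S
inp = tokenizer(["Hello"], return_tensors="pt").to("cuda")
out = peft.generate(**inp, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))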
Results
| Condition | Pure LM Loss |
|---|---|
| Vanilla Qwen3-4B | 3.53 |
| Trained mFiLM | 2.70 |
| FiLM removed | 2.74 |
| State frozen | 2.70 |
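Causal ablation Δ = 0.044: removing FiLM from the trained model raises the loss from 2.70 to 2.74 (table values are rounded), while freezing the state leaves it at 2.70. See paper for details.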
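The gaps between conditions can be recomputed from the rounded table values. The sketch below also converts each loss to perplexity, which assumes the reported loss is a mean per-token cross-entropy in nats; the card does not state this.

```python
import math

# Rounded losses copied from the results table.
losses = {
    "Vanilla Qwen3-4B": 3.53,
    "Trained mFiLM": 2.70,
    "FiLM removed": 2.74,
    "State frozen": 2.70,
}

trained = losses["Trained mFiLM"]

# Loss gap of each condition relative to the trained mFiLM model.
delta_vs_trained = {name: round(loss - trained, 2) for name, loss in losses.items()}

# Perplexity, ASSUMING the loss is mean cross-entropy in nats.
perplexity = {name: math.exp(loss) for name, loss in losses.items()}

for name in losses:
    print(f"{name}: delta={delta_vs_trained[name]:+.2f}, ppl={perplexity[name]:.2f}")
```

From the rounded values the FiLM-removed gap comes out as 0.04, so the reported Δ = 0.044 was presumably computed from unrounded losses.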