Model Card for qwen3-4b-mascarade-emc-lora

This model is a fine-tuned version of Qwen/Qwen3-4B-Instruct-2507. It was trained with supervised fine-tuning (SFT) using TRL on an EMC compliance corpus, as part of the Ailiance mascarade LoRA family.

Quick start

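```python
from transformers import pipeline

question = "What decoupling capacitor strategy minimizes conducted emissions on a switching regulator?"
# The adapter repo is loaded on top of its base model; this needs the `peft` package installed.
generator = pipeline("text-generation", model="Ailiance-fr/qwen3-4b-mascarade-emc-lora", device="cuda")
# Chat-format input; return_full_text=False keeps only the newly generated answer.
output = generator([{"role": "user", "content": question}], max_new_tokens=256, return_full_text=False)[0]
print(output["generated_text"])
```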
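If you prefer to load the base model and the adapter explicitly, for example to use greedy decoding as in the benchmarks reported in this card, the sketch below uses `peft.PeftModel`. It assumes `peft` is installed, a CUDA GPU and Hub access are available, and that the base is Qwen/Qwen3-4B-Instruct-2507 as listed in the model tree. The helper name `generate_answer` is hypothetical, and the benchmark's exact `max_tokens` from GEN_PARAMS is not reproduced here.

```python
def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    """Greedy generation with the EMC adapter applied to its base model (hypothetical helper)."""
    # Imported lazily so this helper can be defined without loading the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_id = "Qwen/Qwen3-4B-Instruct-2507"
    adapter_id = "Ailiance-fr/qwen3-4b-mascarade-emc-lora"
    device = "cuda"

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the LoRA adapter weights, then move it to the GPU.
    model = PeftModel.from_pretrained(base, adapter_id).to(device)
    model.eval()

    messages = [{"role": "user", "content": question}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
    ).to(device)
    # do_sample=False selects greedy decoding.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the newly generated answer.
    new_tokens = output_ids[0, inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Call it as `generate_answer("What decoupling capacitor strategy minimizes conducted emissions on a switching regulator?")`. Reloading the model on every call keeps the sketch short; for repeated queries, load the model once and reuse it.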

Bench results — ailiance-bench Phase 7 (CUDA, 2026-05-11)

Functional evaluation using the parsers and scorers from ailiance/ailiance-bench Phase 1 (bench_kicad_functional), ported to CUDA (transformers + PEFT) for the Qwen3-4B-Instruct-2507 base.

Dataset	n	Composite score	Duration
`emc-dsp-power`	10	0.646	1221.9s

The composite score combines structural-parse-ok, component-count match, ground-node presence, and further checks; see bench_kicad_functional.score_* for the exact formula. Decoding is greedy, with max_tokens set per GEN_PARAMS.

Upstream base model — official evaluations

These are the official scores for the unmodified base model Qwen/Qwen3-4B-Instruct-2507, as reported by the Alibaba Qwen team. They represent the floor of capability that this LoRA inherits before the hardware-domain fine-tune adapts its behavior.

Category	Benchmark	Qwen3-4B-Instruct-2507
Knowledge	MMLU-Pro	69.6
Knowledge	MMLU-Redux	84.2
Knowledge	GPQA	62.0
Knowledge	SuperGPQA	42.8
Reasoning	AIME25	47.4
Reasoning	HMMT25	31.0
Reasoning	ZebraLogic	80.2
Reasoning	LiveBench 2024-11-25	63.0
Coding	LiveCodeBench v6	35.1
Coding	MultiPL-E	76.8
Coding	Aider-Polyglot	12.9
Alignment	IFEval	83.4
Alignment	Arena-Hard v2	43.4
Alignment	Creative Writing v3	83.5
Alignment	WritingBench	83.4
Agent	BFCL-v3	61.9
Agent	TAU1-Retail	48.7
Agent	TAU1-Airline	32.0
Agent	TAU2-Retail	40.4
Multilingual	MultiIF	69.0
Multilingual	MMLU-ProX	61.6
Multilingual	INCLUDE	60.1
Multilingual	PolyMATH	31.1

Source: official Qwen3-4B-Instruct-2507 model card.

To read these numbers alongside the Phase 6 scoreboard below: the upstream scores measure general capability (knowledge, reasoning, coding, alignment), whereas the Phase 6 deltas measure hardware-domain specialization (KiCad, SPICE, schematic extraction). A rank-16 LoRA adapter trains additional parameters amounting to less than 1% of the base parameter count, so the upstream scores remain approximately the floor; this LoRA adds the Phase 6 deltas on top of these inherited capabilities.
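The sub-1% figure can be checked with back-of-the-envelope arithmetic: a rank-r LoRA on a linear layer with input size d_in and output size d_out trains two matrices, A (r × d_in) and B (d_out × r), i.e. r(d_in + d_out) parameters. The sketch below assumes the Qwen3-4B layer shapes (hidden size 2560, 36 layers, 32 query heads and 8 key/value heads of dimension 128, MLP intermediate size 9728), a base of roughly 4.0B parameters, and that the adapter targets all seven attention and MLP projections; this card does not state the actual target modules.

```python
# Assumed Qwen3-4B shapes (check the base model's config.json before relying on them).
HIDDEN = 2560
INTERMEDIATE = 9728
N_LAYERS = 36
Q_DIM = 32 * 128    # 32 query heads x head_dim 128
KV_DIM = 8 * 128    # 8 key/value heads x head_dim 128
BASE_PARAMS = 4.0e9  # approximate total parameter count of the base model

# (d_in, d_out) for each projection the adapter is ASSUMED to target.
TARGETS = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int, d_in: int, d_out: int) -> int:
    """Parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_params(rank: int) -> int:
    """Total LoRA parameters across all assumed target projections and layers."""
    per_layer = sum(lora_params(rank, d_in, d_out) for d_in, d_out in TARGETS.values())
    return per_layer * N_LAYERS


total = adapter_params(16)
print(f"rank-16 adapter: {total:,} params, {total / BASE_PARAMS:.2%} of the base")
```

Under these assumptions it prints 33,030,144 parameters, about 0.83% of the base; restricting the adapter to the four attention projections would give about 11.8M parameters (roughly 0.29%).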

Training procedure

This model was trained with SFT on an EMC compliance corpus.

Framework versions

TRL: 1.4.0
Transformers: 5.8.0
Pytorch: 2.11.0
Datasets: 4.8.5
Tokenizers: 0.22.2

Bench results — held-out token-overlap (eval_mascarade_lora, n=10)

Evaluated on 10 random held-out prompts from Ailiance-fr/mascarade-emc-dataset (seed=101 ≠ train seed 42).

Metric	Value
Avg Jaccard token-overlap	0.06
Avg generation tokens	143.5
Avg latency (per sample, RTX 4090)	7.8s

Token-overlap is a coarse quality proxy: high overlap (>0.4) suggests the LoRA reproduces domain vocabulary, while low overlap indicates either domain shift or stylistic divergence from the reference. See ailiance/ailiance-bench for richer functional evaluations (KiCad DRC, SPICE convergence, etc.) on the same family.
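For reference, the Jaccard token overlap between a generation and its reference is |A ∩ B| / |A ∪ B| over their token sets. The sketch below is a minimal version that assumes lowercase whitespace tokenization and a plain per-sample mean; the exact tokenization and averaging used by eval_mascarade_lora are not documented here and may differ. The function names are illustrative.

```python
def token_set(text: str) -> set[str]:
    """Lowercased whitespace tokens (assumed tokenization)."""
    return set(text.lower().split())


def jaccard(generated: str, reference: str) -> float:
    """Size of the shared token set divided by the size of the combined token set."""
    a, b = token_set(generated), token_set(reference)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def mean_jaccard(pairs: list[tuple[str, str]]) -> float:
    """Unweighted mean Jaccard over (generated, reference) pairs."""
    return sum(jaccard(g, r) for g, r in pairs) / len(pairs)


gen = "Place a ceramic capacitor close to the switch node"
ref = "Place the input capacitor close to the switching node"
print(round(jaccard(gen, ref), 3))
```

This prints 0.545: the two sentences share 6 of their 11 distinct tokens. Note that "switch" and "switching" count as different tokens, which is one reason a set-overlap score undervalues paraphrased but correct answers.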

Citations

@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}

Bench (vs base Qwen3-4B)

Consolidated comparison of this LoRA against a base model, drawing on two complementary evaluation streams. The reference base used for cross-adapter comparison in Phase 6 is gemma-e4b-eu-kiki-base (legacy Gemma-4 ancestor). A dedicated Qwen3-4B-Instruct-2507 baseline run for the Phase 7 functional eval is not yet in our pipeline, so those rows are n/a.

Phase 6 — cross-adapter scoreboard (reference base: `gemma-e4b-eu-kiki-base`)

Phase	iact-bench task	Base	Tuned (+mascarade)	Δ
P3	kicad-sch-extract (cross-domain)	0.308	0.785	+0.477

Phase 7 — CUDA functional eval on Qwen3-4B base (production-aligned)

Dataset	n	Base (Qwen3-4B)	Tuned (this LoRA)	Δ
`emc-dsp-power`	10	n/a	0.646	n/a

Methodology: iact-bench v0.2.0 (audit-grade Docker validators), greedy decoding, max_tokens per GEN_PARAMS. NDJSON audit trail in ailiance/ailiance-bench. Scoring date: 2026-05-11 (commit 46801af).

Phase 6 numbers reflect adapter behavior on a Gemma-4 reference base; domain semantics transfer to the Qwen3-4B production base served via Tower Ollama :8004, but absolute scores may shift. A Qwen3-4B baseline run is tracked for a future bench refresh.

Cross-domain forgetting check (Phase 9, 2026-05-11)

For each domain's held-out eval set (seed=101, n samples), compare this LoRA's Jaccard token-overlap with that of the Qwen3-4B-Instruct-2507 baseline (no adapter) on the SAME prompts. A negative Δ means the LoRA degrades base behaviour on that domain.

Eval domain	LoRA Jaccard	Δ vs base
`kicad`	0.09	+0.003
`spice`	0.007	+0.002
`stm32`	0.055	+0.005
`emc`	0.061	-0.005 ⬅ in-domain
`embedded`	0.072	-0.002
`platformio`	0.047	+0.005
`freecad`	0.03	+0.009
`dsp`	0.098	-0.003
`iot`	0.05	-0.018
`power`	0.078	+0.010

In-domain Δ: -0.005
Out-of-domain mean Δ: +0.001
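The two summary figures follow from the table: the in-domain Δ is the `emc` row, and the out-of-domain mean is taken here as the unweighted average of the other nine Δ values (the card does not state the weighting, so this is an assumption). A small sketch that recomputes both from the rounded values shown:

```python
# Δ vs base for each eval domain, copied from the table above.
DELTAS = {
    "kicad": 0.003,
    "spice": 0.002,
    "stm32": 0.005,
    "emc": -0.005,
    "embedded": -0.002,
    "platformio": 0.005,
    "freecad": 0.009,
    "dsp": -0.003,
    "iot": -0.018,
    "power": 0.010,
}
IN_DOMAIN = "emc"

in_domain_delta = DELTAS[IN_DOMAIN]
# Every domain other than the adapter's own training domain counts as out-of-domain.
out_of_domain = [d for name, d in DELTAS.items() if name != IN_DOMAIN]
ood_mean = sum(out_of_domain) / len(out_of_domain)  # unweighted mean (assumed)

print(f"In-domain Δ: {in_domain_delta:+.3f}")
print(f"Out-of-domain mean Δ: {ood_mean:+.3f}")
```

The out-of-domain Δ values sum to +0.011, so the mean is about +0.0012, which rounds to the reported +0.001.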

Model tree for Ailiance-fr/qwen3-4b-mascarade-emc-lora

Base model: Qwen/Qwen3-4B-Instruct-2507
Adapter: this model