Model Card for qwen3-4b-mascarade-emc-lora

This model is a fine-tuned version of Qwen/Qwen3-4B. It has been trained using TRL with SFT on an EMC compliance corpus as part of the Ailiance mascarade LoRA family.

Quick start

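from transformers import pipeline

# Loading an adapter repository through pipeline() relies on the PEFT integration
# in transformers, so the `peft` package needs to be installed.
question = "What decoupling capacitor strategy minimizes conducted emissions on a switching regulator?"
generator = pipeline("text-generation", model="Ailiance-fr/qwen3-4b-mascarade-emc-lora", device="cuda")
# Chat-style input (a list of role/content messages); return_full_text=False keeps only the reply.
output = generator([{"role": "user", "content": question}], max_new_tokens=256, return_full_text=False)[0]
print(output["generated_text"])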

Bench results — ailiance-bench Phase 7 (CUDA, 2026-05-11)

Functional eval via the parsers/scorers from ailiance/ailiance-bench Phase 1 (bench_kicad_functional), ported to CUDA / transformers + PEFT for the Qwen3-4B-Instruct-2507 base.

Dataset | n | Composite score | Duration
emc-dsp-power | 10 | 0.646 | 1221.9 s

Composite score combines structural-parse-ok, component-count match, ground-node presence, etc. — see bench_kicad_functional.score_* for the exact formula. Greedy decoding, max_tokens per GEN_PARAMS.

Upstream base model — official evaluations

These are the official scores for the unmodified base model Qwen/Qwen3-4B-Instruct-2507, as reported by the Alibaba Qwen team. They represent the floor of capability that this LoRA inherits before the hardware-domain fine-tune adapts behavior.

Category | Benchmark | Qwen3-4B-Instruct-2507
Knowledge | MMLU-Pro | 69.6
Knowledge | MMLU-Redux | 84.2
Knowledge | GPQA | 62.0
Knowledge | SuperGPQA | 42.8
Reasoning | AIME25 | 47.4
Reasoning | HMMT25 | 31.0
Reasoning | ZebraLogic | 80.2
Reasoning | LiveBench 2024-11-25 | 63.0
Coding | LiveCodeBench v6 | 35.1
Coding | MultiPL-E | 76.8
Coding | Aider-Polyglot | 12.9
Alignment | IFEval | 83.4
Alignment | Arena-Hard v2 | 43.4
Alignment | Creative Writing v3 | 83.5
Alignment | WritingBench | 83.4
Agent | BFCL-v3 | 61.9
Agent | TAU1-Retail | 48.7
Agent | TAU1-Airline | 32.0
Agent | TAU2-Retail | 40.4
Multilingual | MultiIF | 69.0
Multilingual | MMLU-ProX | 61.6
Multilingual | INCLUDE | 60.1
Multilingual | PolyMATH | 31.1

Source: official Qwen3-4B-Instruct-2507 model card.

Reading these numbers alongside the Phase 6 bench reported in this card: the upstream scores measure general capability (knowledge, reasoning, coding, alignment), while the Phase 6 deltas measure hardware-domain specialization (KiCad, SPICE, schematic extraction). A rank-16 LoRA adapter adds trainable parameters amounting to less than 1% of the base weight count and leaves the base weights themselves frozen, so the upstream scores remain approximately the floor; this LoRA adds the Phase 6 deltas on top of these inherited capabilities.
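The "less than 1%" figure can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the layer shapes, the layer count, the choice of target modules, and the nominal base parameter count are all assumptions that do not come from this card, and should be checked against the base model's config.json and this adapter's adapter_config.json.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# ASSUMPTION: shapes of a Qwen3-4B-style decoder layer (not stated in this card).
HIDDEN, Q_OUT, KV_OUT, MLP, LAYERS = 2560, 4096, 1024, 9728, 36

# ASSUMPTION: every linear projection in each layer is a LoRA target.
TARGETS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

RANK = 16            # rank quoted in the card
BASE_PARAMS = 4.0e9  # ASSUMPTION: nominal size of a "4B" base model

per_layer = sum(lora_param_count(d_in, d_out, RANK) for d_in, d_out in TARGETS.values())
adapter_params = per_layer * LAYERS
fraction = adapter_params / BASE_PARAMS

print(f"adapter params: {adapter_params:,} ({fraction:.2%} of base)")
```

Under these assumptions the adapter stays below 1% of the base parameter count even when all seven projections are targeted; a narrower target list would make the fraction smaller still.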

Training procedure

This model was trained with SFT on an EMC compliance corpus.

Framework versions

  • TRL: 1.4.0
  • Transformers: 5.8.0
  • PyTorch: 2.11.0
  • Datasets: 4.8.5
  • Tokenizers: 0.22.2

Bench results — held-out token-overlap (eval_mascarade_lora, n=10)

Evaluated on 10 random held-out prompts from Ailiance-fr/mascarade-emc-dataset (eval seed 101, distinct from the training seed 42).

Metric | Value
Avg Jaccard token-overlap | 0.06
Avg generation tokens | 143.5
Avg latency (per sample, RTX 4090) | 7.8 s

Token-overlap is a coarse quality proxy — high overlap (>0.4) suggests the LoRA reproduces domain vocabulary; low overlap indicates either domain-shift or stylistic divergence from the reference. See ailiance/ailiance-bench for richer functional evaluations (KiCad DRC, SPICE convergence, etc.) on the same family.
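For readers unfamiliar with the metric, here is a minimal sketch of a Jaccard token-overlap score. It assumes lowercase word-level tokenization with a simple regular expression; the actual tokenization used by eval_mascarade_lora is not documented in this card and may differ, and the function names below are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """ASSUMPTION: lowercase word tokens; the real evaluator may tokenize differently."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard_overlap(generated: str, reference: str) -> float:
    """Size of the token-set intersection divided by the size of the union."""
    gen_tokens, ref_tokens = tokenize(generated), tokenize(reference)
    if not gen_tokens and not ref_tokens:
        return 0.0  # convention chosen here for two empty texts
    return len(gen_tokens & ref_tokens) / len(gen_tokens | ref_tokens)


generated = "Place a ferrite bead and a 100 nF capacitor near the regulator input"
reference = "Use a 100 nF ceramic capacitor close to the regulator input pin"
print(round(jaccard_overlap(generated, reference), 3))
```

Because the score works on sets of tokens, it ignores word order and repetition, which is one reason a correct answer phrased differently from the reference can still score low.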

Citations

@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}

Bench (vs base Qwen3-4B)

Consolidated comparison of this LoRA against its base, drawing on two complementary evaluation streams. The reference base used for cross-adapter comparison in Phase 6 is gemma-e4b-eu-kiki-base (legacy Gemma-4 ancestor). A dedicated Qwen3-4B-Instruct-2507 baseline run is not in our pipeline yet — those rows are n/a.

Phase 6 — cross-adapter scoreboard (reference base: gemma-e4b-eu-kiki-base)

Phase | iact-bench task | Base | Tuned (+mascarade) | Δ
P3 | kicad-sch-extract (cross-domain) | 0.308 | 0.785 | +0.477

Phase 7 — CUDA functional eval on Qwen3-4B base (production-aligned)

Dataset | n | Base (Qwen3-4B) | Tuned (this LoRA) | Δ
emc-dsp-power | 10 | n/a | 0.646 | n/a

Methodology: iact-bench v0.2.0 (audit-grade Docker validators), greedy decoding, max_tokens per GEN_PARAMS. NDJSON audit trail in ailiance/ailiance-bench. Scoring date: 2026-05-11 (commit 46801af).

Phase 6 numbers reflect adapter behavior on a Gemma-4 reference base; domain semantics transfer to the Qwen3-4B production base served via Tower Ollama :8004, but absolute scores may shift. A Qwen3-4B baseline run is tracked for a future bench refresh.

Cross-domain forgetting check (Phase 9, 2026-05-11)

For each domain's eval set (seed=101, n samples held-out), compare this LoRA's Jaccard token-overlap vs the Qwen3-4B-Instruct-2507 baseline (no adapter) on the SAME prompts. Negative Δ = the LoRA degrades base behaviour on that domain.

Eval domain | LoRA Jaccard | Δ vs base
kicad | 0.09 | +0.003
spice | 0.007 | +0.002
stm32 | 0.055 | +0.005
emc | 0.061 | -0.005 ⬅ in-domain
embedded | 0.072 | -0.002
platformio | 0.047 | +0.005
freecad | 0.03 | +0.009
dsp | 0.098 | -0.003
iot | 0.05 | -0.018
power | 0.078 | +0.010

In-domain Δ: -0.005
Out-of-domain mean Δ: +0.001
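The two summary figures can be reproduced from the per-domain deltas in the table. The sketch below copies the Δ values from this card and assumes the out-of-domain figure is an unweighted mean over the nine non-EMC domains, which matches the reported value after rounding; the variable names are illustrative.

```python
from statistics import mean

IN_DOMAIN = "emc"

# Δ vs base per eval domain, copied from the forgetting-check table.
delta_vs_base = {
    "kicad": 0.003,
    "spice": 0.002,
    "stm32": 0.005,
    "emc": -0.005,
    "embedded": -0.002,
    "platformio": 0.005,
    "freecad": 0.009,
    "dsp": -0.003,
    "iot": -0.018,
    "power": 0.010,
}

in_domain_delta = delta_vs_base[IN_DOMAIN]
# ASSUMPTION: unweighted mean over every domain except the in-domain one.
out_of_domain_mean = mean(d for name, d in delta_vs_base.items() if name != IN_DOMAIN)
# Domains where the adapter scores below the base on the same prompts.
degraded = sorted(name for name, d in delta_vs_base.items() if d < 0)

print(f"in-domain delta: {in_domain_delta:+.3f}")
print(f"out-of-domain mean delta: {out_of_domain_mean:+.3f}")
print("domains with negative delta:", degraded)
```

With n=10 samples per domain, deltas of this size are small relative to the likely sampling noise, so the sign of an individual row should be read with caution.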
