--- base_model: mlx-community/Qwen2.5-Coder-7B-Instruct-4bit language: - en library_name: mlx license: apache-2.0 license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/blob/main/LICENSE pipeline_tag: text-generation tags: - code - codeqwen - chat - qwen - qwen-coder - mlx - uvl - feature-models - software-product-lines --- # UVL feature-model generator — fused SFT model (MLX 4-bit) A standalone, **MLX 4-bit quantized** model (Qwen2.5-Coder-7B-Instruct) fine-tuned to generate **UVL** (Universal Variability Language) feature models from a domain description and structural constraints (feature count, cross-tree constraints, depth, UVL level). This is the **fused** model: the SFT LoRA adapter has been merged into the base, so it runs directly with [mlx-lm](https://github.com/ml-explore/mlx-lm) on Apple Silicon — no separate adapter or base download needed. For a portable adapter you can apply to the full-precision base, use the LoRA repos linked below. Trained with [fine-tunning-uvl](https://github.com/jagalindo/fine-tunning-uvl) on the good/bad generations of the `UVL_LLM_Guided_Generation` benchmark, supervised on the *valid* (flamapy-parseable) outputs with prompt masking. ## Results Valid-UVL rate on a 62-prompt held-out test set (flamapy validator): - single-shot, raw: **77.42%** - single-shot + dedup: **96.77%** - SFT + retry-until-valid(5) + light post-processing: **96.77%** ## Usage ```bash pip install mlx-lm ``` ```python from mlx_lm import load, generate model, tokenizer = load("jagalindo/uvl-qwen2.5-coder-7b-sft") messages = [ {"role": "system", "content": "You are an expert in Software Product Lines ... generate valid UVL ..."}, {"role": "user", "content": "## Your task\nGenerate a valid UVL feature model for ..."}, ] prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True) print(generate(model, tokenizer, prompt=prompt, max_tokens=1536, verbose=True)) ``` For best results, sample a few candidates and keep the first that parses with flamapy (retry-until-valid), then apply the project's light post-processing. ## Limitations Small/medium model on a niche DSL; not every generation parses on the first try. Use retry-until-valid + the project's light post-processing (dedup feature names, drop dangling constraints, quote non-identifier namespaces) to reach the headline rate. The 4-bit quantization trades a little fidelity for size/speed; the full-precision LoRA adapter is available if you need it. ## Related models Part of the [fine-tunning-uvl](https://github.com/jagalindo/fine-tunning-uvl) project — three artifacts from the same training run, all based on `Qwen/Qwen2.5-Coder-7B-Instruct`: - [`jagalindo/uvl-qwen2.5-coder-7b-sft`](https://huggingface.co/jagalindo/uvl-qwen2.5-coder-7b-sft) — **this model**: fused full model (MLX 4-bit, runs standalone via mlx-lm) - [`jagalindo/uvl-qwen2.5-coder-7b-sft-merged`](https://huggingface.co/jagalindo/uvl-qwen2.5-coder-7b-sft-merged) — merged bf16 full model (CUDA-ready, transformers) - [`jagalindo/uvl-qwen2.5-coder-7b-sft-lora`](https://huggingface.co/jagalindo/uvl-qwen2.5-coder-7b-sft-lora) — SFT **LoRA adapter** (apply on the full-precision base; **recommended** checkpoint) - [`jagalindo/uvl-qwen2.5-coder-7b-dpo-lora`](https://huggingface.co/jagalindo/uvl-qwen2.5-coder-7b-dpo-lora) — DPO LoRA adapter (experimental; regressed vs SFT)