---
base_model: mlx-community/Qwen2.5-Coder-7B-Instruct-4bit
language:
- en
library_name: mlx
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
tags:
- code
- codeqwen
- chat
- qwen
- qwen-coder
- mlx
- uvl
- feature-models
- software-product-lines
---

# UVL feature-model generator — fused SFT model (MLX 4-bit)

A standalone, **MLX 4-bit quantized** model (Qwen2.5-Coder-7B-Instruct) fine-tuned
to generate **UVL** (Universal Variability Language) feature models from a domain
description and structural constraints (feature count, cross-tree constraints,
depth, UVL level).

This is the **fused** model: the SFT LoRA adapter has been merged into the base,
so it runs directly with [mlx-lm](https://github.com/ml-explore/mlx-lm) on Apple
Silicon — no separate adapter or base download needed. For a portable adapter you
can apply to the full-precision base, use the LoRA repos linked below.

Trained with [fine-tunning-uvl](https://github.com/jagalindo/fine-tunning-uvl) on
the good/bad generations of the `UVL_LLM_Guided_Generation` benchmark, supervised
on the *valid* (flamapy-parseable) outputs with prompt masking.

## Results

Valid-UVL rate on a 62-prompt held-out test set (flamapy validator):

- single-shot, raw: **77.42%**
- single-shot + dedup: **96.77%**
- SFT + retry-until-valid(5) + light post-processing: **96.77%**

## Usage

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("jagalindo/uvl-qwen2.5-coder-7b-sft")

messages = [
    {"role": "system", "content": "You are an expert in Software Product Lines ... generate valid UVL ..."},
    {"role": "user",   "content": "## Your task\nGenerate a valid UVL feature model for ..."},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(generate(model, tokenizer, prompt=prompt, max_tokens=1536, verbose=True))
```

For best results, sample a few candidates and keep the first that parses with
flamapy (retry-until-valid), then apply the project's light post-processing.

## Limitations

Small/medium model on a niche DSL; not every generation parses on the first try.
Use retry-until-valid + the project's light post-processing (dedup feature names,
drop dangling constraints, quote non-identifier namespaces) to reach the headline
rate. The 4-bit quantization trades a little fidelity for size/speed; the
full-precision LoRA adapter is available if you need it.

## Related models

Part of the [fine-tunning-uvl](https://github.com/jagalindo/fine-tunning-uvl)
project — three artifacts from the same training run, all based on
`Qwen/Qwen2.5-Coder-7B-Instruct`:

- [`jagalindo/uvl-qwen2.5-coder-7b-sft`](https://huggingface.co/jagalindo/uvl-qwen2.5-coder-7b-sft) — **this model**: fused full model (MLX 4-bit, runs standalone via mlx-lm)
- [`jagalindo/uvl-qwen2.5-coder-7b-sft-merged`](https://huggingface.co/jagalindo/uvl-qwen2.5-coder-7b-sft-merged) — merged bf16 full model (CUDA-ready, transformers)
- [`jagalindo/uvl-qwen2.5-coder-7b-sft-lora`](https://huggingface.co/jagalindo/uvl-qwen2.5-coder-7b-sft-lora) — SFT **LoRA adapter** (apply on the full-precision base; **recommended** checkpoint)
- [`jagalindo/uvl-qwen2.5-coder-7b-dpo-lora`](https://huggingface.co/jagalindo/uvl-qwen2.5-coder-7b-dpo-lora) — DPO LoRA adapter (experimental; regressed vs SFT)