---
base_model: mlx-community/Qwen2.5-Coder-7B-Instruct-4bit
language:
- en
library_name: mlx
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
tags:
- code
- qwen-coder
- mlx
- cybersecurity
- nist
- fine-tuned
---

# HackIDLE-NIST-Coder (MLX 4-bit)

This is the first MLX build of HackIDLE-NIST-Coder, a NIST-focused local model built from Qwen2.5-Coder-7B-Instruct and fine-tuned on a NIST cybersecurity corpus.

This repo is kept for reproducibility. For new testing, start with the v1.1 MLX build:

- [ethanolivertroy/HackIDLE-NIST-Coder-v1.1-MLX-4bit](https://huggingface.co/ethanolivertroy/HackIDLE-NIST-Coder-v1.1-MLX-4bit)

Use this model as a helper. Do not treat it as a source of truth for exact control names, RMF step lists, or reference-architecture component names without checking the source publication.

## Training data

This first build used `523,706` examples from `568` NIST cybersecurity documents.

Training dataset:

- [ethanolivertroy/nist-cybersecurity-training](https://huggingface.co/datasets/ethanolivertroy/nist-cybersecurity-training)

## Current eval status

The smoke eval dated April 22, 2026 was run against the Ollama `latest` tag, which corresponded to the v1.1 line in the local install used for that check. I have not rerun that eval against this older MLX build.

The v1.1 result matters for this older build too because it sets the right expectation for the model family: the model can stay in-domain while still missing exact NIST structure.

Be careful with:

- exact control names
- exact RMF step ordering
- exact SP 800-207 component naming
- source-level answers that need to be right on the first pass
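
One way to act on that list is to pull every identifier out of a model response and check each one against the source publication. The snippet below is a minimal sketch of that idea. The helper name `identifiers_to_verify` and both regular expressions are assumptions made for illustration; they are not part of this repo or of mlx-lm. The patterns only cover SP 800-53 style control IDs (a two-letter family code plus a number) and `SP 800-x` publication numbers, so they will miss other naming schemes.

```python
import re

# Hypothetical helper, not part of mlx-lm or this repo.
# Control IDs: two-letter family, dash, number, optional enhancement, e.g. AC-17(1).
CONTROL_ID = re.compile(r"\b[A-Z]{2}-\d{1,2}(?:\(\d{1,2}\))?(?!\d)")
# Publication numbers such as "SP 800-207" or "SP 800-63B".
SP_NUMBER = re.compile(r"\bSP\s+800-\d+[A-Z]?\b")


def identifiers_to_verify(text):
    """Return the sorted, unique control IDs and SP numbers mentioned in text."""
    found = set(CONTROL_ID.findall(text)) | set(SP_NUMBER.findall(text))
    return sorted(found)


# Stand-in for a model response; replace with real output.
sample = "Start with SP 800-207 and SP 800-53 controls AC-2 and AC-17(1)."
for item in identifiers_to_verify(sample):
    print("verify against source:", item)
```

The helper only lists what to look up. It does not confirm that any identifier exists or is named correctly.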

## Installation

```bash
pip install mlx-lm
```

## Usage

```python
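from mlx_lm import load, generate

# Download (or reuse the cached copy of) the 4-bit weights and tokenizer.
model, tokenizer = load("ethanolivertroy/HackIDLE-NIST-Coder-MLX-4bit")

# Wrap the question in the model's chat template before generating.
prompt = "Which NIST docs would you start with for contractor remote access?"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate up to 500 new tokens and print the decoded text.
response = generate(model, tokenizer, prompt=prompt, max_tokens=500)
print(response)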
```
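
To ask several questions without reloading the weights, the calls above can be wrapped in a small helper. This is a sketch under stated assumptions: `SYSTEM_PROMPT`, `build_messages`, and `ask` are hypothetical names, and the system prompt wording is an illustration, not a setting this model was trained or evaluated with. It uses only the `generate` and `apply_chat_template` calls already shown in this card.

```python
# Hypothetical wrapper; SYSTEM_PROMPT and both function names are assumptions.
SYSTEM_PROMPT = (
    "Name the NIST publication number for each recommendation "
    "so the answer can be checked against the source."
)


def build_messages(question, system_prompt=SYSTEM_PROMPT):
    """Build a chat message list with a system turn ahead of the user turn."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def ask(model, tokenizer, question, max_tokens=500):
    """Generate one answer with an already loaded model and tokenizer."""
    # Imported lazily so build_messages works without mlx-lm installed.
    from mlx_lm import generate

    prompt = tokenizer.apply_chat_template(
        build_messages(question), add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

Load the model once with `load(...)` as in the snippet above, then call `ask(model, tokenizer, "...")` for each question.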

## License

The base model is Qwen2.5-Coder-7B-Instruct, released under Apache 2.0. The NIST source publications used for the dataset are public domain U.S. government works. The model artifact in this repo is released under Apache 2.0, and the NIST data source is documented separately.