---
base_model: mlx-community/Qwen2.5-Coder-7B-Instruct-4bit
language:
- en
library_name: mlx
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
tags:
- code
- qwen-coder
- mlx
- cybersecurity
- nist
- fine-tuned
---

# HackIDLE-NIST-Coder (MLX 4-bit)

This is the first MLX build of HackIDLE-NIST-Coder, a NIST-focused local model built from Qwen2.5-Coder-7B-Instruct and fine-tuned on a NIST cybersecurity corpus.

This repo is kept for reproducibility. For new testing, start with the v1.1 MLX build:

- [ethanolivertroy/HackIDLE-NIST-Coder-v1.1-MLX-4bit](https://huggingface.co/ethanolivertroy/HackIDLE-NIST-Coder-v1.1-MLX-4bit)

Use this model as a helper. Do not treat it as a source of truth for exact control names, RMF step lists, or reference-architecture component names without checking the source publication.

## Training data

This first build used `523,706` examples from `568` NIST cybersecurity documents.

Training dataset:

- [ethanolivertroy/nist-cybersecurity-training](https://huggingface.co/datasets/ethanolivertroy/nist-cybersecurity-training)

## Current eval status

The dated smoke eval from April 22, 2026 was run against the Ollama `latest` tag, which matched the v1.1 line in the local install used for that check. I have not rerun that exact eval against this older MLX build.

The v1.1 result matters for this older build too because it sets the right expectation for the model family: the model can stay in-domain while still missing exact NIST structure.

Be careful with:

- exact control names
- exact RMF step ordering
- exact SP 800-207 component naming
- source-level answers that need to be right on the first pass

## Installation

```bash
pip install mlx-lm
```

## Usage

```python
from mlx_lm import load, generate

model, tokenizer = load("ethanolivertroy/HackIDLE-NIST-Coder-MLX-4bit")

prompt = "Which NIST docs would you start with for contractor remote access?"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=500)
print(response)
```

## License

The base model is Qwen2.5-Coder-7B-Instruct, released under Apache 2.0. The NIST source publications used for the dataset are public domain U.S. government works. This model card uses Apache 2.0 for the model artifact and documents the NIST data source separately.
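
## Checking responses against the source

Since exact control names and publication references should be checked against the source publication, one option is to pull every identifier out of a response and verify each one by hand. The following is a minimal sketch and not part of this repo: the helper name `items_to_verify`, the regular expressions, and the sample text are all illustrative assumptions. It only lists what to look up; it does not confirm that anything is correct.

```python
import re

# Two-letter control family abbreviations from SP 800-53 Rev. 5.
FAMILIES = {
    "AC", "AT", "AU", "CA", "CM", "CP", "IA", "IR", "MA", "MP",
    "PE", "PL", "PM", "PS", "PT", "RA", "SA", "SC", "SI", "SR",
}

# Assumed patterns: "AC-17" or "IA-2(1)" for controls, "SP 800-207" for publications.
CONTROL_RE = re.compile(r"\b([A-Z]{2})-(\d{1,2})(?!\d)(\(\d{1,2}\))?")
PUB_RE = re.compile(r"\bSP\s+800-\d{1,3}[A-Z]?\b")


def items_to_verify(response: str) -> dict:
    """Hypothetical helper: collect identifiers a reader should look up."""
    controls = []
    for match in CONTROL_RE.finditer(response):
        family, number, enhancement = match.group(1), match.group(2), match.group(3) or ""
        # Skip strings that look like control IDs but use an unknown family.
        if family in FAMILIES:
            controls.append(f"{family}-{number}{enhancement}")
    publications = PUB_RE.findall(response)
    return {
        "controls": sorted(set(controls)),
        "publications": sorted(set(publications)),
    }


# Made-up sample text standing in for a model response.
sample = (
    "Start with SP 800-46 and SP 800-53 Rev. 5. "
    "Relevant controls include AC-17, IA-2(1), and ZZ-9."
)
print(items_to_verify(sample))
```

In practice, `sample` would be the `response` string returned by `generate` in the usage example. The family filter keeps the list short, but a well-formed identifier can still carry the wrong control name, so each item needs a manual lookup.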