general-slm-300m-20260424-p5qcva

Trained on ? documents (~? tokens) from the domain.

Model Details

  • Parameters: ~1.5B (same architecture as the Qwen2.5-1.5B base model)
  • Base model: Qwen/Qwen2.5-1.5B
  • Training regime: lora-sft
  • Training framework: huggingface-trainer
  • Chat template: qwen2
  • License: apache-2.0
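The qwen2 chat template listed above uses the ChatML turn layout, which is also where the <|im_start|> and <|im_end|> stop strings recommended later in this card come from. The sketch below renders a conversation in that layout so you can see what the model is actually asked to continue. It is an illustration, not the exact template shipped with the tokenizer: the official Qwen2.5 template may also inject a default system prompt when none is given, which this sketch omits. `to_chatml` is a hypothetical helper name.

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render chat messages in the plain ChatML layout (qwen2 template family)."""
    parts = []
    for message in messages:
        # Each turn is: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave an assistant turn open for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Hello"},
])
print(prompt)
```

A well-behaved generation ends by emitting <|im_end|>; if that marker is detokenized as text instead of being treated as a stop, you get the leaked-marker artifacts described under the sampling settings.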

Intended Use

Training Data

Curated corpus (forge ID: forge-2026-04-24-publications-v2-p5qcva)

Domain: ``. Corpus shape: ? tokens, ? documents.

Evaluation

Full evaluation report: eval/comparison-vs-baseline.md.

Domain-specific performance

(see eval/comparison-vs-baseline.md)

Generic performance

(M5 v1: generic evaluation is deferred to the hardening phase; no generic results yet)

Baseline comparison

(see eval/comparison-vs-baseline.md)

Limitations

  • Small model: capabilities are limited compared with larger general-purpose models
  • Domain specialization may degrade out-of-domain performance relative to the baseline
  • Not a general-purpose assistant; performs best on tasks within its training domain

Recommended sampling settings

This is a small model fine-tuned on a narrow domain. Without the settings below, it can collapse into repetition or emit garbled GGUF detokenization artifacts (e.g. the literal strings tti and ttiassistant, left over from Qwen ChatML markers). Always use:

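| Parameter | Value | Reason |
|---|---|---|
| temperature | 0.5 | 0.7 is too high for a small model trained on a narrow corpus |
| top_p | 0.9 | trims the low-probability tail |
| top_k | 40 | trims the low-probability tail |
| repeat_penalty | 1.18 | suppresses repeated-paragraph loops (the main failure mode) |
| repeat_last_n | 256 | number of recent tokens the penalty looks back over |
| max_tokens | 320 | keeps responses short |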
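To make the table concrete, here is a simplified, pure-Python sketch of how these knobs act on one step of next-token sampling. It is illustrative only: the repeat penalty follows the llama.cpp-style rule (divide positive logits, multiply negative ones), but the stage order here is one reasonable choice, and real samplers such as llama.cpp's sampler chain use their own configurable order. `sample_next` is a hypothetical name; `max_tokens` is not shown because it simply caps how many times this step runs.

```python
import math
import random
from typing import List, Optional, Sequence


def sample_next(
    logits: Sequence[float],
    history: Sequence[int],
    *,
    temperature: float = 0.5,
    top_k: int = 40,
    top_p: float = 0.9,
    repeat_penalty: float = 1.18,
    repeat_last_n: int = 256,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick one token id from `logits` using the card's recommended settings."""
    rng = rng or random.Random()
    scores: List[float] = list(logits)

    # 1. Repeat penalty: for each distinct token in the last `repeat_last_n`
    #    positions, shrink a positive logit or push a negative one further down.
    #    In this sketch, repeat_last_n <= 0 disables the penalty.
    recent = history[-repeat_last_n:] if repeat_last_n > 0 else []
    for tok in set(recent):
        s = scores[tok]
        scores[tok] = s / repeat_penalty if s > 0 else s * repeat_penalty

    # 2. Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]

    # 3. Top-k: keep only the k highest-scoring token ids.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]

    # 4. Softmax over the survivors (subtract the max for numerical stability).
    peak = scores[ranked[0]]
    weights = [math.exp(scores[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # 5. Top-p: keep the smallest prefix whose cumulative probability >= top_p.
    kept_ids: List[int] = []
    kept_probs: List[float] = []
    cumulative = 0.0
    for tok, p in zip(ranked, probs):
        kept_ids.append(tok)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    # 6. Draw from what is left (random.choices normalizes the weights itself).
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]
```

For example, `sample_next([2.0, 1.9], [0], top_k=1)` returns 1: token 0 was seen recently, so the penalty lowers its logit from 2.0 to about 1.69, below token 1. This is the mechanism that breaks paragraph loops.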

LM Studio: open the right-hand "Advanced configuration" panel and set the values above. Also add these stop strings under "Stop strings": <|im_end|>, <|im_start|>, <|endoftext|>, ttiuser, ttiassistant.
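If a frontend or backend does not honor stop strings, the same list can be applied client-side as a fallback. A minimal sketch, assuming you have the raw generated text as a Python string; `truncate_at_stop` is a hypothetical helper, not part of any of the tools named here.

```python
from typing import Iterable

# The stop strings recommended above for LM Studio.
STOP_STRINGS = ("<|im_end|>", "<|im_start|>", "<|endoftext|>", "ttiuser", "ttiassistant")


def truncate_at_stop(text: str, stops: Iterable[str] = STOP_STRINGS) -> str:
    """Cut `text` at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1 and index < cut:
            cut = index
    # Drop trailing whitespace left before the marker.
    return text[:cut].rstrip()


print(truncate_at_stop("The answer is 4.<|im_end|>\n<|im_start|>user"))
```

Truncating at the earliest match (rather than the first stop string in the list) matters because a runaway generation can contain several different markers.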

Ollama: the published Modelfile already sets these defaults, so ollama create ... -f Modelfile is all you need.
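The published Modelfile is not reproduced in this card. As a rough, hypothetical reconstruction, the sketch below renders a Modelfile that encodes the table above using Ollama's PARAMETER names; note that Ollama calls max_tokens `num_predict`. The real file may differ (for example, it may also carry a TEMPLATE block), and `render_modelfile` and the GGUF path are assumptions for illustration.

```python
from typing import Dict, List

# Table values under Ollama PARAMETER names (max_tokens -> num_predict).
PARAMETERS: Dict[str, float] = {
    "temperature": 0.5,
    "top_p": 0.9,
    "top_k": 40,
    "repeat_penalty": 1.18,
    "repeat_last_n": 256,
    "num_predict": 320,
}
STOP_STRINGS: List[str] = ["<|im_end|>", "<|im_start|>", "<|endoftext|>", "ttiuser", "ttiassistant"]


def render_modelfile(gguf_path: str) -> str:
    """Return Modelfile text: a FROM line, one PARAMETER per setting, one per stop string."""
    lines = [f"FROM {gguf_path}"]
    lines += [f"PARAMETER {name} {value}" for name, value in PARAMETERS.items()]
    # Ollama accepts repeated `stop` parameters, one string each.
    lines += [f'PARAMETER stop "{stop}"' for stop in STOP_STRINGS]
    return "\n".join(lines) + "\n"


print(render_modelfile("./gguf/model-Q4_K_M.gguf"))
```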

llama.cpp / llama-cpp-python: see the Space app.py for a known-working configuration.
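For reference alongside the Space, here is a minimal llama-cpp-python sketch. It assumes the llama-cpp-python package and a downloaded GGUF file. The per-call arguments of `create_chat_completion` mirror the table directly; the penalty window is assumed to be set on the `Llama` object through `last_n_tokens_size` (check this against your installed version). `n_ctx=2048` is an arbitrary assumption, chosen because the library default is small, and `llama_cpp_kwargs` and `chat` are hypothetical helper names.

```python
from typing import Any, Dict, Tuple

STOP_STRINGS = ["<|im_end|>", "<|im_start|>", "<|endoftext|>", "ttiuser", "ttiassistant"]


def llama_cpp_kwargs() -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split the recommended settings into Llama() kwargs and per-call kwargs."""
    # Assumption: repeat_last_n maps to the constructor's last_n_tokens_size.
    init_kwargs = {"last_n_tokens_size": 256, "n_ctx": 2048}
    call_kwargs = {
        "temperature": 0.5,
        "top_p": 0.9,
        "top_k": 40,
        "repeat_penalty": 1.18,
        "max_tokens": 320,
        "stop": list(STOP_STRINGS),
    }
    return init_kwargs, call_kwargs


def chat(model_path: str, user_message: str) -> str:
    """Run one chat turn against a local GGUF file."""
    # Imported lazily so llama_cpp_kwargs() works without the package installed.
    from llama_cpp import Llama

    init_kwargs, call_kwargs = llama_cpp_kwargs()
    llm = Llama(model_path=model_path, verbose=False, **init_kwargs)
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": user_message}], **call_kwargs
    )
    return result["choices"][0]["message"]["content"]
```

Usage: `chat("gguf/model-Q4_K_M.gguf", "your question")`. In a long-running app you would create the `Llama` object once and reuse it rather than reloading per call as this sketch does.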

How to Use

Download the GGUF for local inference (LM Studio, llama.cpp, text-generation-webui)

Quantizations available:

  • gguf/model-Q4_K_M.gguf — ~986.0 MB — best size/quality trade-off
  • gguf/model-Q8_0.gguf — ~1646.6 MB — near-lossless
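Either file can be fetched programmatically. A minimal sketch, assuming the repository ID used in the Transformers example below and the huggingface_hub package for the download itself; `gguf_url` and `download_gguf` are hypothetical helper names. The URL follows the Hub's `resolve/<revision>/<path>` direct-download pattern.

```python
from typing import Dict

REPO_ID = "Nexless/general-slm-300m-20260424-p5qcva"
QUANTS: Dict[str, str] = {
    "Q4_K_M": "gguf/model-Q4_K_M.gguf",  # ~986 MB, best size/quality trade-off
    "Q8_0": "gguf/model-Q8_0.gguf",  # ~1.6 GB, near-lossless
}


def gguf_url(quant: str, repo_id: str = REPO_ID) -> str:
    """Direct-download URL for one quantization (usable with curl or wget)."""
    return f"https://huggingface.co/{repo_id}/resolve/main/{QUANTS[quant]}"


def download_gguf(quant: str = "Q4_K_M") -> str:
    """Download into the local Hugging Face cache and return the file path."""
    from huggingface_hub import hf_hub_download  # imported lazily

    return hf_hub_download(repo_id=REPO_ID, filename=QUANTS[quant])
```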

Load via HuggingFace Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Nexless/general-slm-300m-20260424-p5qcva")
model = AutoModelForCausalLM.from_pretrained("Nexless/general-slm-300m-20260424-p5qcva")
```
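Loading the weights is only half the job. A minimal generation sketch, assuming the tokenizer ships the qwen2 chat template listed under Model Details, with the recommended sampling values translated to transformers `generate()` names. One caveat: transformers' `repetition_penalty` has no look-back window (it applies to every token already in the sequence), so `repeat_last_n` has no direct equivalent here. `generate_kwargs` and `ask` are hypothetical helper names.

```python
from typing import Any, Dict


def generate_kwargs() -> Dict[str, Any]:
    """Recommended sampling settings under transformers generate() names."""
    return {
        "do_sample": True,  # required, otherwise temperature/top_p/top_k are ignored
        "temperature": 0.5,
        "top_p": 0.9,
        "top_k": 40,
        "repetition_penalty": 1.18,  # applies to the whole sequence, no window
        "max_new_tokens": 320,
    }


def ask(tokenizer, model, question: str) -> str:
    """Format one user turn with the chat template and return only the reply."""
    messages = [{"role": "user", "content": question}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, **generate_kwargs())
    # Slice off the prompt tokens so only newly generated text is decoded.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as above: `print(ask(tokenizer, model, "your question"))`.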

Run in a terminal with Ollama

```bash
curl https://huggingface.co/Nexless/general-slm-300m-20260424-p5qcva/raw/main/Modelfile -o Modelfile
ollama create general-slm-300m-20260424-p5qcva -f Modelfile
ollama run general-slm-300m-20260424-p5qcva
```
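Once the model is created, it can also be called from code through Ollama's local REST API. A minimal standard-library sketch, assuming Ollama is running on its default port (11434) and the model was created under the name used in the commands above; `build_request` and `generate` are hypothetical helper names. No sampling options are sent because the Modelfile defaults apply; you could override them with an `options` field in the payload.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "general-slm-300m-20260424-p5qcva"


def build_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for one prompt."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=120) as response:
        # With stream=False, Ollama returns a single JSON object with a "response" field.
        return json.loads(response.read())["response"]
```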

Try it in a browser

See the live demo Space.

Training Details

  • Training hardware: g5.xlarge
  • Training duration: 63 minutes
  • Training tokens: ?
  • Training cost: $0 on AWS (NVIDIA Inception credits)
  • Reproducibility hash: forge-2026-04-24-publications-v2-p5qcva

Citation

If you use this model, please cite:

```bibtex
@misc{general-slm-300m-20260424-p5qcva_2026,
  title={general-slm-300m-20260424-p5qcva: a domain-specialized SLM},
  author={Nexless},
  year={2026},
  url={https://huggingface.co/Nexless/general-slm-300m-20260424-p5qcva}
}
```

Forged with the SLM-Forge skill tree.
