---
license: apache-2.0
language:
- en
tags:
- biology
- protein
- bioinformatics
- mixture-of-experts
- gguf
- llama.cpp
base_model: dnagpt/OmniGene-4-CPT-v2-merged
quantized_by: Liang Wang
---

# OmniGene-4-CPT-v2-GGUF

**GGUF format models for OmniGene-4-CPT-v2** (continued pretraining checkpoint)

GGUF builds of OmniGene-4-CPT-v2 (an F16 conversion and a Q4_K_M quantization) for efficient inference on consumer GPUs and CPUs with llama.cpp, llama-cpp-python, Ollama, LM Studio, and other GGUF-compatible runtimes.

## Available Quantizations

| Quantization | File | Size | RAM Required | Quality |
|---|---|---|---|---|
| **F16** | `OmniGene-4-CPT-v2-f16.gguf` | 50.6 GB | ~52 GB | Best quality |
| **Q4_K_M** | `OmniGene-4-CPT-v2-Q4_K_M.gguf` | 16 GB | ~17 GB | Recommended balance |

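The files in the table can also be fetched programmatically. The sketch below uses `hf_hub_download` from `huggingface_hub`; the repo id is an assumption inferred from this card's title and the sibling `dnagpt/...` links, and `gguf_filename` / `download_gguf` are hypothetical helper names used only for illustration.

```python
# Assumption: the repo id is inferred from this card's title and the
# sibling "dnagpt/..." links; adjust it if the files live elsewhere.
REPO_ID = "dnagpt/OmniGene-4-CPT-v2-GGUF"

# File names exactly as listed in the Available Quantizations table.
GGUF_FILES = {
    "F16": "OmniGene-4-CPT-v2-f16.gguf",
    "Q4_K_M": "OmniGene-4-CPT-v2-Q4_K_M.gguf",
}


def gguf_filename(quant: str) -> str:
    """Return the GGUF file name for a quantization label from the table."""
    if quant not in GGUF_FILES:
        raise ValueError(
            f"unknown quantization {quant!r}; expected one of {sorted(GGUF_FILES)}"
        )
    return GGUF_FILES[quant]


def download_gguf(quant: str = "Q4_K_M", local_dir: str = ".") -> str:
    """Download one GGUF file and return its local path."""
    # Imported lazily so gguf_filename() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(quant),
        local_dir=local_dir,
    )
```

Calling `download_gguf("Q4_K_M")` should place the 16 GB file in the current directory, ready to pass as `model_path` to any of the runtimes below.
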
## Hardware Requirements

| Quantization | GPU | CPU + RAM |
|---|---|---|
| **F16** | RTX A6000 (48GB) with partial offload (the 50.6 GB file exceeds 48 GB of VRAM) | 64GB+ system RAM |
| **Q4_K_M** | RTX 5090 (32GB) / RTX 4090 (24GB) / RTX 3090 (24GB) | 32GB+ system RAM |

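As a rough guide, the memory figures above can be turned into a small selection helper. This is a minimal sketch: `pick_quant` is a hypothetical name, the thresholds are the approximate "RAM Required" values from the Available Quantizations table, and the estimate ignores the extra memory that longer contexts need.

```python
# (label, approximate memory needed in GB), taken from the tables above
# and ordered from highest to lowest quality.
QUANTS = [("F16", 52.0), ("Q4_K_M", 17.0)]


def pick_quant(available_gb: float):
    """Return the highest-quality quantization that fits, or None if none does."""
    for label, required_gb in QUANTS:
        if available_gb >= required_gb:
            return label
    return None


for mem_gb in (24, 64, 8):
    print(f"{mem_gb} GB -> {pick_quant(mem_gb)}")
```

With these thresholds a 24 GB card selects Q4_K_M, a 64 GB machine selects F16, and 8 GB fits neither file.
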
## Quick Start

### Option 1: llama-cpp-python

```bash
pip install llama-cpp-python
```

```python
from llama_cpp import Llama

llm = Llama(
    model_path="OmniGene-4-CPT-v2-Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # Offload all layers to GPU
)

output = llm("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK", max_tokens=100)
print(output['choices'][0]['text'])
```

### Option 2: llama.cpp Command Line

```bash
# -ngl 99 offloads all layers to the GPU; the meaning of -1 varies between llama.cpp versions
./llama-cli -m OmniGene-4-CPT-v2-Q4_K_M.gguf -p "MKTAYIAKQRQISFVKSHFSRQLEERL" -n 100 -ngl 99
```

### Option 3: Ollama

```bash
# Create Modelfile
cat > Modelfile <<EOF
FROM ./OmniGene-4-CPT-v2-Q4_K_M.gguf
EOF

ollama create omnigene-4-cpt -f Modelfile
ollama run omnigene-4-cpt
```

### Option 4: LM Studio

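1. Download `OmniGene-4-CPT-v2-Q4_K_M.gguf`
2. Place it in the LM Studio models folder
3. Load the model in LM Studio
4. Enter a sequence prompt to complete (this is a continued-pretraining checkpoint, not a chat model; for chat, use the instruction-tuned GGUF listed under Other Versions)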

## Model Description

OmniGene-4-CPT-v2 is a biological foundation model with:
- **Base**: Gemma-4-26B-A4B-Instruct (MoE, 128 experts, top-8 routing)
- **Vocabulary**: 290,048 tokens (262,020 original + 28,028 bio tokens)
- **CPT data**: 32.5 GB mixed corpus (DNA, Protein, OpenWebText, Structure)
- **Training**: 0.6 epochs, 2,806 steps, 8×H20 GPUs

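To illustrate what "128 experts, top-8 routing" means, here is a generic top-k routing sketch in plain Python. It is not the model's actual router: the random logits stand in for learned router scores, `route_token` is a hypothetical name, and renormalizing with a softmax over only the selected experts is an assumption (implementations differ on where the softmax is applied).

```python
import math
import random

NUM_EXPERTS = 128  # from the model description
TOP_K = 8          # from the model description


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return {expert_index: weight}.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    ranked = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )
    chosen = ranked[:top_k]
    peak = max(router_logits[i] for i in chosen)  # subtract max for stability
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


random.seed(0)
# Stand-in router scores for a single token (assumption: real scores are learned).
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(logits)
print(len(weights), round(sum(weights.values()), 6))
```

Only 8 of the 128 expert networks run for each token, which is why the number of active parameters per token is much smaller than the total parameter count.
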
## Biological Tokens

The model includes 28,028 additional biological tokens:
- **DNA BPE**: 20,000 tokens (optimized for genomic sequences)
- **Protein BPE**: 8,000 tokens (optimized for amino acid sequences)
- **3Di alphabet**: 20 tokens (Foldseek structural alphabet)
- **DSSP**: 8 tokens (secondary structure: H, E, C, etc.)

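The token counts above add up to the vocabulary size given in the Model Description. A quick arithmetic check, using only the numbers stated in this card:

```python
# Counts as listed in the Biological Tokens and Model Description sections.
BIO_TOKENS = {
    "DNA BPE": 20_000,
    "Protein BPE": 8_000,
    "3Di alphabet": 20,
    "DSSP": 8,
}
ORIGINAL_VOCAB = 262_020

bio_total = sum(BIO_TOKENS.values())
vocab_total = ORIGINAL_VOCAB + bio_total
print(bio_total, vocab_total)  # 28028 290048
```
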
## Other Versions

- **Full BF16** (HuggingFace transformers): https://huggingface.co/dnagpt/OmniGene-4-CPT-v2-merged
- **LoRA adapter** (requires base model): https://huggingface.co/dnagpt/OmniGene-4-CPT-v2
- **4-bit auto-quantize**: https://huggingface.co/dnagpt/OmniGene-4-CPT-v2-4bit
- **Instruction-tuned GGUF**: https://huggingface.co/dnagpt/OmniGene-4-SFT-v3-GGUF

## Citation

```bibtex
@article{wang2026omnigene4,
  title={OmniGene-4: A Unified Bio-Language MoE Model with Router-Level Interpretability},
  author={Wang, Liang},
  journal={bioRxiv},
  year={2026}
}
```

## Paper

Full paper: https://github.com/maris205/omnigene4

## License

Apache 2.0

## Contact

Liang Wang (wangliang.f@gmail.com)  
School of Artificial Intelligence and Automation  
Huazhong University of Science and Technology