# OmniGene-4-CPT-v2-4bit
**BF16 model with automatic 4-bit quantization for RTX 5090 (32GB)**
This model automatically quantizes to 4-bit when loaded, requiring only ~13GB GPU memory.
## Model Description
OmniGene-4-CPT-v2-4bit is a biological foundation model with:
- **Base**: Gemma-4-26B-A4B-Instruct (MoE, 128 experts, top-8 routing)
- **Vocabulary**: 290,048 tokens (262,020 original + 28,028 bio tokens)
- **CPT data**: 32.5 GB mixed corpus (DNA, Protein, OpenWebText, Structure)
- **Training**: 0.6 epoch, 2,806 steps, 8×H20 GPUs
- **Storage**: BF16 (~49 GB, 32 shards of ~1.5GB each)
- **Runtime**: Automatic 4-bit quantization (~13GB GPU memory)
## Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model; the 4-bit quantization settings ship with the repository
model = AutoModelForCausalLM.from_pretrained(
    "dnagpt/OmniGene-4-CPT-v2-4bit",
    device_map="auto",  # place the weights on the available GPU(s)
)
tokenizer = AutoTokenizer.from_pretrained("dnagpt/OmniGene-4-CPT-v2-4bit")

# Generate a continuation of a protein sequence
prompt = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQTLGQHDFSAGEGLYTHMKALRPDEDRLSPLHSVYVDQWDWERVMGDGERQFSTLKSTVEAIWAGIKATEAAVSEEFGLAPFLPDQIHFVHSQELLSRYPDLDAKGRERAIAKDLGAVFLVGIGGKLSDGHRHDVRAPDYDDWSTPSELGHAGLNGDILVWNPVLEDAFELSSMGIRVDADTLKHQLALTGDEDRLELEWHQALLRGEMPQTIGGGIGQSRLTMLLLQLPHIGQVQAGVWPAAVRESVPSLL"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Hardware Requirements
- **GPU Memory**: ~13-15GB (after automatic 4-bit quantization)
- **Recommended**: RTX 5090 (32GB), RTX 4090 (24GB), or better
- **Minimum**: RTX 3090 (24GB)
## Quantization Details
This model uses **bitsandbytes NF4 quantization** with double quantization:
- **Method**: NF4 (Normal Float 4-bit)
- **Compute dtype**: bfloat16
- **Double quantization**: Yes
- **Quality**: Minimal accuracy loss compared to BF16
The quantization happens automatically when you load the model thanks to the included `quantization_config.json`.
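The settings listed above map onto the `BitsAndBytesConfig` class in `transformers`, for readers who prefer to pass them explicitly rather than rely on the bundled configuration. The following is a minimal sketch, assuming `transformers`, `bitsandbytes`, and `accelerate` are installed. The helper name `load_nf4` and the `NF4_SETTINGS` dictionary are illustrative and not part of this repository.

```python
# Settings from the "Quantization Details" list, written out as keyword
# arguments for transformers.BitsAndBytesConfig.
NF4_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",          # NF4 (Normal Float 4-bit)
    "bnb_4bit_compute_dtype": "bfloat16",  # resolved to torch.bfloat16 below
    "bnb_4bit_use_double_quant": True,     # double quantization
}


def load_nf4(repo_id="dnagpt/OmniGene-4-CPT-v2-4bit"):
    """Hypothetical helper: load the model with an explicit NF4 config."""
    # Imported lazily so the settings above can be inspected without a GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    settings = dict(NF4_SETTINGS)
    settings["bnb_4bit_compute_dtype"] = getattr(
        torch, settings["bnb_4bit_compute_dtype"]
    )
    return AutoModelForCausalLM.from_pretrained(
        repo_id,
        quantization_config=BitsAndBytesConfig(**settings),
        device_map="auto",
    )
```

Calling `model = load_nf4()` should behave like the Quick Start snippet if the bundled configuration uses the same values. This has not been checked against the repository.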
## Download Size vs Runtime Size
- **Download**: ~49GB (BF16 weights, 32 shards)
- **Disk**: ~49GB
- **GPU Memory**: ~13GB (after automatic quantization)
The model is stored in BF16 for maximum quality, then quantized to 4-bit at load time.
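The gap between download size and GPU memory follows from the number of bits stored per weight. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes the rounded "~26B" parameter count from this card and an overhead of about 0.127 bits per weight for the quantization constants, the figure reported for double quantization in the QLoRA paper. The function name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 26e9  # "~26B" from this card; the exact count is not stated

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)
# Assumption: 4 bits per weight plus about 0.127 bits of quantization
# constants per weight when double quantization is enabled.
nf4_gb = weight_memory_gb(TOTAL_PARAMS, 4 + 0.127)

print(f"BF16 weights: ~{bf16_gb:.0f} GB")
print(f"NF4 weights:  ~{nf4_gb:.1f} GB")
```

Both estimates land in the same range as the figures quoted above. Actual usage will be somewhat higher, because activations, the KV cache, and any layers left unquantized also occupy GPU memory.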
## Model Architecture
- **Layers**: 30 transformer layers
- **Experts**: 128 experts per layer (top-8 routing)
- **Hidden size**: 2816
- **Attention heads**: 22
- **Active parameters**: ~3.8B per token
- **Total parameters**: ~26B
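Top-8 routing means that only 8 of the 128 experts in a layer process any given token, which is why the active parameter count is much smaller than the total. The following is a generic illustration of top-k expert selection in plain Python. It is not the model's actual router: the card does not specify the ordering of softmax and selection or the renormalization, so those are assumptions here, and `route_top_k` is a hypothetical name.

```python
import math


def route_top_k(router_logits, k=8):
    """Pick the k highest-scoring experts and renormalize their weights."""
    # Softmax over all experts (shifted by the max for numerical stability).
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the k largest probabilities.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # Renormalize so the selected experts' weights sum to 1.
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}


# Toy router scores for 128 experts; real scores come from a learned router.
logits = [math.sin(i) for i in range(128)]
weights = route_top_k(logits, k=8)
print(len(weights), "of", len(logits), "experts active")
```

Each token's output is then a weighted sum of the selected experts' outputs, using the returned weights.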
## Biological Tokens
The model includes 28,028 additional biological tokens:
- **DNA BPE**: 20,000 tokens (optimized for genomic sequences)
- **Protein BPE**: 8,000 tokens (optimized for amino acid sequences)
- **3Di alphabet**: 20 tokens (Foldseek structural alphabet)
- **DSSP**: 8 tokens (secondary structure: H, E, C, etc.)
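The four token groups above add up to the 28,028 bio tokens and, together with the original vocabulary, to the 290,048 total stated in the Model Description. A short check of that arithmetic, using only numbers from this card (the variable names are illustrative):

```python
BIO_TOKENS = {
    "DNA BPE": 20_000,
    "Protein BPE": 8_000,
    "3Di alphabet": 20,
    "DSSP": 8,
}
ORIGINAL_VOCAB = 262_020  # base vocabulary size stated in Model Description

bio_total = sum(BIO_TOKENS.values())
full_vocab = ORIGINAL_VOCAB + bio_total

assert bio_total == 28_028
assert full_vocab == 290_048
print(f"{bio_total:,} bio tokens, {full_vocab:,} tokens in total")
```

After loading the tokenizer, `len(tokenizer)` can be compared with `full_vocab`. Whether the two match exactly is an assumption, since special or padding tokens may be counted differently.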
## Training Data
| Source | Size | Tokens | Proportion |
|---|---|---|---|
| DNA (human genome) | 8.0 GB | 2.1B | 24.6% |
| Protein (UniProt) | 8.0 GB | 2.1B | 24.6% |
| Protein (LucaOne) | 7.5 GB | 2.0B | 23.1% |
| OpenWebText | 8.0 GB | 2.1B | 24.6% |
| Structure (3Di + DSSP) | 0.4 GB | 0.1B | 1.2% |
| Instruction replay | 0.6 GB | 0.4B | 1.9% |
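The proportions in the table can be recomputed from the listed sizes, which also confirms that the sources sum to the 32.5 GB corpus size given in the Model Description. A small sketch using only the values from the table:

```python
SIZES_GB = {
    "DNA (human genome)": 8.0,
    "Protein (UniProt)": 8.0,
    "Protein (LucaOne)": 7.5,
    "OpenWebText": 8.0,
    "Structure (3Di + DSSP)": 0.4,
    "Instruction replay": 0.6,
}

total_gb = sum(SIZES_GB.values())
shares = {name: 100 * size / total_gb for name, size in SIZES_GB.items()}

print(f"Total: {total_gb:.1f} GB")
for name, pct in shares.items():
    print(f"{name:<24} {pct:5.1f}%")
```

Because the sizes in the table are rounded to one decimal place, a recomputed share can differ from the published one by about 0.1 percentage points.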
## Other Versions
- **Full BF16** (no quantization): https://huggingface.co/dnagpt/OmniGene-4-CPT-v2-merged
- **LoRA adapter** (requires base model): https://huggingface.co/dnagpt/OmniGene-4-CPT-v2
- **Instruction-tuned**: https://huggingface.co/dnagpt/OmniGene-4-SFT-v3-4bit
## Citation
```bibtex
@article{wang2026omnigene4,
title={OmniGene-4: A Unified Bio-Language MoE Model with Router-Level Interpretability},
author={Wang, Liang},
journal={bioRxiv},
year={2026}
}
```
## Paper
Full paper: https://github.com/maris205/omnigene4
## License
Apache 2.0
## Contact
Liang Wang (wangliang.f@gmail.com)
School of Artificial Intelligence and Automation
Huazhong University of Science and Technology