# OmniGene-4-CPT-v2-4bit

**BF16 model with automatic 4-bit quantization for RTX 5090 (32GB)**

This model automatically quantizes to 4-bit when loaded, requiring only ~13GB GPU memory.

## Model Description

OmniGene-4-CPT-v2-4bit is a biological foundation model with:

- **Base**: Gemma-4-26B-A4B-Instruct (MoE, 128 experts, top-8 routing)
- **Vocabulary**: 290,048 tokens (262,020 original + 28,028 bio tokens)
- **CPT data**: 32.5 GB mixed corpus (DNA, Protein, OpenWebText, Structure)
- **Training**: 0.6 epoch, 2,806 steps, 8×H20 GPUs
- **Storage**: BF16 (~49 GB, 32 shards of ~1.5GB each)
- **Runtime**: Automatic 4-bit quantization (~13GB GPU memory)

## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model (automatically quantizes to 4-bit)
model = AutoModelForCausalLM.from_pretrained(
    "dnagpt/OmniGene-4-CPT-v2-4bit",
    device_map="auto",
    # Automatically applies quantization_config.json
)
tokenizer = AutoTokenizer.from_pretrained("dnagpt/OmniGene-4-CPT-v2-4bit")

# Generate
prompt = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQTLGQHDFSAGEGLYTHMKALRPDEDRLSPLHSVYVDQWDWERVMGDGERQFSTLKSTVEAIWAGIKATEAAVSEEFGLAPFLPDQIHFVHSQELLSRYPDLDAKGRERAIAKDLGAVFLVGIGGKLSDGHRHDVRAPDYDDWSTPSELGHAGLNGDILVWNPVLEDAFELSSMGIRVDADTLKHQLALTGDEDRLELEWHQALLRGEMPQTIGGGIGQSRLTMLLLQLPHIGQVQAGVWPAAVRESVPSLL"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Hardware Requirements

- **GPU Memory**: ~13-15GB (after automatic 4-bit quantization)
- **Recommended**: RTX 5090 (32GB), RTX 4090 (24GB), or better
- **Minimum**: RTX 3090 (24GB)

## Quantization Details

This model uses **bitsandbytes NF4 quantization** with double quantization:

- **Method**: NF4 (Normal Float 4-bit)
- **Compute dtype**: bfloat16
- **Double quantization**: Yes
- **Quality**: Minimal accuracy loss compared to BF16

The quantization happens automatically when you load the model thanks to the included `quantization_config.json`.

## Download Size vs Runtime Size

- **Download**: ~49GB (BF16 weights, 32 shards)
- **Disk**: ~49GB
- **GPU Memory**: ~13GB (after automatic quantization)

The model is stored in BF16 for maximum quality, then quantized to 4-bit at load time.

## Model Architecture

- **Layers**: 30 transformer layers
- **Experts**: 128 experts per layer (top-8 routing)
- **Hidden size**: 2816
- **Attention heads**: 22
- **Active parameters**: ~3.8B per token
- **Total parameters**: ~26B

## Biological Tokens

The model includes 28,028 additional biological tokens:

- **DNA BPE**: 20,000 tokens (optimized for genomic sequences)
- **Protein BPE**: 8,000 tokens (optimized for amino acid sequences)
- **3Di alphabet**: 20 tokens (Foldseek structural alphabet)
- **DSSP**: 8 tokens (secondary structure: H, E, C, etc.)

## Training Data

| Source | Size | Tokens | Proportion |
|---|---|---|---|
| DNA (human genome) | 8.0 GB | 2.1B | 24.6% |
| Protein (UniProt) | 8.0 GB | 2.1B | 24.6% |
| Protein (LucaOne) | 7.5 GB | 2.0B | 23.1% |
| OpenWebText | 8.0 GB | 2.1B | 24.6% |
| Structure (3Di + DSSP) | 0.4 GB | 0.1B | 1.2% |
| Instruction replay | 0.6 GB | 0.4B | 1.9% |

## Other Versions

- **Full BF16** (no quantization): https://huggingface.co/dnagpt/OmniGene-4-CPT-v2-merged
- **LoRA adapter** (requires base model): https://huggingface.co/dnagpt/OmniGene-4-CPT-v2
- **Instruction-tuned**: https://huggingface.co/dnagpt/OmniGene-4-SFT-v3-4bit

## Citation

```bibtex
@article{wang2026omnigene4,
  title={OmniGene-4: A Unified Bio-Language MoE Model with Router-Level Interpretability},
  author={Wang, Liang},
  journal={bioRxiv},
  year={2026}
}
```

## Paper

Full paper: https://github.com/maris205/omnigene4

## License

Apache 2.0

## Contact

Liang Wang (wangliang.f@gmail.com)

School of Artificial Intelligence and Automation, Huazhong University of Science and Technology
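
## Appendix: Rough Memory Estimate

The download and runtime sizes quoted in this card can be sanity-checked with simple arithmetic. The sketch below is not part of the model's tooling: `weight_memory_gb` is a hypothetical helper, and it assumes exactly 26B parameters (the card says "~26B"), 16 bits per weight for BF16, and 4 bits per weight for NF4. It ignores quantization metadata, activations, and the KV cache, which is one plausible reason the card lists ~13-15GB at runtime rather than a flat 13GB.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Counts only the weights themselves; activations, KV cache, and
    quantization constants are deliberately left out.
    """
    return num_params * bits_per_param / 8 / 1e9


# Assumption: "~26B" total parameters is taken as exactly 26e9.
TOTAL_PARAMS = 26e9

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # stored BF16 weights
nf4_gb = weight_memory_gb(TOTAL_PARAMS, 4)    # 4-bit weights at runtime

print(f"BF16 weights: ~{bf16_gb:.0f} GB")
print(f"NF4 weights:  ~{nf4_gb:.0f} GB")
```

Under these assumptions the 4-bit figure matches the ~13GB quoted in this card. The BF16 estimate comes out slightly above the listed ~49GB; that gap is consistent with rounding in "~26B" or with sizes being reported in GiB, though the card does not say which.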