---
license: apache-2.0
language:
- en
tags:
- biology
- protein
- bioinformatics
- mixture-of-experts
- gguf
- llama.cpp
base_model: dnagpt/OmniGene-4-CPT-v2-merged
quantized_by: Liang Wang
---

# OmniGene-4-CPT-v2-GGUF

**GGUF-format models for OmniGene-4-CPT-v2** (continued-pretraining checkpoint)

These are F16 and Q4_K_M GGUF builds of OmniGene-4-CPT-v2 for efficient inference on consumer GPUs and CPUs with llama.cpp, llama-cpp-python, Ollama, LM Studio, and other GGUF-compatible runtimes.

## Available Quantizations

| Quantization | File | Size | RAM Required | Quality |
|---|---|---|---|---|
| **F16** | `OmniGene-4-CPT-v2-f16.gguf` | 50.6 GB | ~52 GB | Best quality |
| **Q4_K_M** | `OmniGene-4-CPT-v2-Q4_K_M.gguf` | 16 GB | ~17 GB | Recommended balance |

## Hardware Requirements

| Quantization | GPU | CPU + RAM |
|---|---|---|
| **F16** | RTX A6000 (48 GB) | 64 GB+ system RAM |
| **Q4_K_M** | RTX 5090 (32 GB) / RTX 4090 (24 GB) / RTX 3090 (24 GB) | 32 GB+ system RAM |

## Quick Start

### Option 1: llama-cpp-python

```bash
pip install llama-cpp-python
```

```python
from llama_cpp import Llama

# Load the quantized model
llm = Llama(
    model_path="OmniGene-4-CPT-v2-Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # Offload all layers to GPU
)

# Continue a protein sequence given as the prompt
output = llm("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK", max_tokens=100)
print(output["choices"][0]["text"])
```

### Option 2: llama.cpp Command Line

```bash
# -ngl 99 offloads all layers to the GPU (any value above the layer count works)
./llama-cli -m OmniGene-4-CPT-v2-Q4_K_M.gguf -p "MKTAYIAKQRQISFVKSHFSRQLEERL" -n 100 -ngl 99
```

### Option 3: Ollama

```bash
# Create Modelfile
cat > Modelfile <