--- license: apache-2.0 base_model: unsloth/gemma-4-26B-A4B-it tags: - poker - blackjack - card-games - vision - game-ai - reinforcement-learning - lora - gemma4 datasets: - custom pipeline_tag: image-text-to-text --- # Poker & Blackjack Vision AI — Gemma 4 26B-A4B LoRA Fine-tuned **Gemma 4 26B-A4B** (vision MoE: 26B total params, 4B active per token) for poker and blackjack decision-making. ## What This Model Does Given a poker or blackjack game state, the model outputs the optimal action (fold, call, raise, hit, stand, etc.) as JSON. **This is a vision model** — it can process card images directly, enabling camera-to-decision pipelines (e.g., smart glasses → see cards → optimal play). ## Training Details - **Base model**: `unsloth/gemma-4-26B-A4B-it` (MoE with vision encoder) - **Method**: LoRA (r=16, alpha=32) on q/k/v/o/gate/up/down projections - **Data**: 12,848 examples (3,072 poker + 9,776 blackjack) - Poker: Winning decisions from TAGBot, EquityBot, ExploitBot - Blackjack: Counter (I18) optimal strategy - **Training**: 3 epochs, 2,289 steps on NVIDIA A6000 48GB - **Final metrics**: Loss 0.109, Token accuracy 95.95% - **Cost**: ~$3.73 on RunPod ## Poker Prompt Format ``` You are a specialist in playing 6-handed No Limit Texas Holdem. Do not explain your answer. Game summary: - Small blind: 5 chips, Big blind: 10 chips - Your position: BTN, Your holding: As Ks - Board: 7c 4d 2h - Pot: 75, To call: 0, Your stack: 970 - Equity: 73%, Pot odds: 0% - Legal actions: check, raise, all_in Respond with ONLY valid JSON: {"action": "fold|check|call|raise|all_in", "amount": 0} ``` ## Blackjack Prompt Format ``` You are a blackjack expert. Decide the best action. Your cards: Ace, 6 (total: 17 soft) Dealer showing: 9 Available actions: hit, stand, double Respond with ONLY the action word. ``` ## Usage with llama.cpp (recommended for Mac) ```bash # Merge LoRA → GGUF Q3_K_M (~11GB, fits 16GB Mac) # Then serve: llama-server --model gemma4-poker-26b-q3_k_m.gguf --port 8080 --n-gpu-layers 999 --ctx-size 2048 --jinja ``` **Important**: Disable thinking mode for fast responses: ```json {"chat_template_kwargs": {"enable_thinking": false}} ``` ## Arena Results (E4B version, 1000 hands) | Metric | Value | |--------|-------| | BB/100 | -0.1 (breakeven) | | VPIP | 80.5% | | Style | LAG (loose-aggressive) | | vs CallingStation | +9.3 bb/100 | *Note: Model plays too many hands (VPIP too high). GRPO reinforcement learning is the planned fix.* ## Part of the Flywheel This model is part of an iterative training loop: 1. Bots play → generate winning decisions → SFT training (this model) 2. Model plays in arena → find weaknesses → GRPO with reward functions 3. Retrain → better model → repeat ## Links - [E4B version (smaller, 7.5B)](https://huggingface.co/waltgrace/poker-gemma4-e4b-lora) - Built with the poker/blackjack arena platform