---
license: apache-2.0
base_model: unsloth/gemma-4-26B-A4B-it
tags:
  - poker
  - blackjack
  - card-games
  - vision
  - game-ai
  - reinforcement-learning
  - lora
  - gemma4
datasets:
  - custom
pipeline_tag: image-text-to-text
---

# Poker & Blackjack Vision AI — Gemma 4 26B-A4B LoRA

Fine-tuned **Gemma 4 26B-A4B** (vision MoE: 26B total params, 4B active per token) for poker and blackjack decision-making.

## What This Model Does

Given a poker or blackjack game state, the model outputs the optimal action (fold, call, raise, hit, stand, etc.) as JSON.

**This is a vision model** — it can process card images directly, enabling camera-to-decision pipelines (e.g., smart glasses → see cards → optimal play).

## Training Details

- **Base model**: `unsloth/gemma-4-26B-A4B-it` (MoE with vision encoder)
- **Method**: LoRA (r=16, alpha=32) on q/k/v/o/gate/up/down projections
- **Data**: 12,848 examples (3,072 poker + 9,776 blackjack)
  - Poker: Winning decisions from TAGBot, EquityBot, ExploitBot
  - Blackjack: Counter (I18) optimal strategy
- **Training**: 3 epochs, 2,289 steps on NVIDIA A6000 48GB
- **Final metrics**: Loss 0.109, Token accuracy 95.95%
- **Cost**: ~$3.73 on RunPod

## Poker Prompt Format

```
You are a specialist in playing 6-handed No Limit Texas Holdem. Do not explain your answer.

Game summary:
- Small blind: 5 chips, Big blind: 10 chips
- Your position: BTN, Your holding: As Ks
- Board: 7c 4d 2h
- Pot: 75, To call: 0, Your stack: 970
- Equity: 73%, Pot odds: 0%
- Legal actions: check, raise, all_in

Respond with ONLY valid JSON: {"action": "fold|check|call|raise|all_in", "amount": 0}
```

## Blackjack Prompt Format

```
You are a blackjack expert. Decide the best action.
Your cards: Ace, 6 (total: 17 soft)
Dealer showing: 9
Available actions: hit, stand, double
Respond with ONLY the action word.
```

## Usage with llama.cpp (recommended for Mac)

```bash
# Merge LoRA → GGUF Q3_K_M (~11GB, fits 16GB Mac)
# Then serve:
llama-server --model gemma4-poker-26b-q3_k_m.gguf --port 8080 --n-gpu-layers 999 --ctx-size 2048 --jinja
```

**Important**: Disable thinking mode for fast responses:
```json
{"chat_template_kwargs": {"enable_thinking": false}}
```

## Arena Results (E4B version, 1000 hands)

| Metric | Value |
|--------|-------|
| BB/100 | -0.1 (breakeven) |
| VPIP | 80.5% |
| Style | LAG (loose-aggressive) |
| vs CallingStation | +9.3 bb/100 |

*Note: Model plays too many hands (VPIP too high). GRPO reinforcement learning is the planned fix.*

## Part of the Flywheel

This model is part of an iterative training loop:
1. Bots play → generate winning decisions → SFT training (this model)
2. Model plays in arena → find weaknesses → GRPO with reward functions
3. Retrain → better model → repeat

## Links

- [E4B version (smaller, 7.5B)](https://huggingface.co/waltgrace/poker-gemma4-e4b-lora)
- Built with the poker/blackjack arena platform