---
language:
- en
license: apache-2.0
tags:
- gin-rummy
- card-games
- behavioral-cloning
- reinforcement-learning
- game-ai
base_model: Qwen/Qwen3.5-0.8B
datasets:
- GoodStartLabs/gin-rummy-trajectories-32k
metrics:
- accuracy
pipeline_tag: text-generation
---

# Gin Rummy HBC - Qwen3.5 0.8B

**Behavioral cloning model for Gin Rummy trained via supervised fine-tuning on expert trajectories.**

This model was trained on 32,000 stratified expert game states to imitate the expert's Gin Rummy decisions. It serves as the initialization for subsequent GRPO (Group Relative Policy Optimization) self-play training.

## Model Details

- **Model type:** Causal language model (decoder-only transformer)
- **Base model:** [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B)
- **Parameters:** 0.8B
- **Training method:** LoRA (Low-Rank Adaptation) fine-tuning
- **Task:** Gin Rummy move prediction
- **License:** Apache 2.0

## Training Data

**Dataset:** [GoodStartLabs/gin-rummy-trajectories-32k](https://huggingface.co/datasets/GoodStartLabs/gin-rummy-trajectories-32k)

- **Training samples:** 32,000 (stratified sampling, minimum 1,000 per action type)
- **Validation samples:** 1,000 (perfectly balanced, 200 per action type)
- **Source:** Expert agent gameplay using Monte Carlo Tree Search (MCTS)

**Action distribution (training set):**
- `-[card]` (discard a card): 44.6%
- `draw` (draw from stock): 33.1%
- `+discard` (pick from discard pile): 14.9%
- `KNOCK-[card]` (knock and discard): 4.0%
- `pass` (pass on upcard): 3.5%

**Validation set:** Exactly 200 samples for each of the five action types, so overall accuracy is not dominated by the frequent discard and draw actions.
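
As a quick sanity check on the training distribution above, the percentages can be converted into approximate per-action sample counts. This is a minimal sketch: the listed percentages are rounded (they sum to 100.1%), so the counts are estimates and not the dataset's exact figures. The helper name `approx_counts` is illustrative.

```python
# Approximate per-action sample counts implied by the training distribution.
TRAIN_SIZE = 32_000
ACTION_SHARE = {  # percentages copied from the list above
    "-[card]": 44.6,
    "draw": 33.1,
    "+discard": 14.9,
    "KNOCK-[card]": 4.0,
    "pass": 3.5,
}

def approx_counts(total, shares):
    """Convert percentage shares into rounded sample counts."""
    return {action: round(total * pct / 100) for action, pct in shares.items()}

counts = approx_counts(TRAIN_SIZE, ACTION_SHARE)
for action, n in counts.items():
    print(f"{action:<14}{n:>7,}")
```

Under these rounded shares, the rarest action (`pass`) still comes out at roughly 1,120 samples, which is consistent with the stated minimum of 1,000 per action type.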

## Training Procedure

**Fine-tuning platform:** Together AI (serverless LoRA training)

**Hyperparameters:**
- LoRA rank: 16 (0.8B, 2B) / 32 (4B)
- LoRA alpha: 16 (0.8B, 2B) / 32 (4B)
- LoRA dropout: 0.05
- LoRA modules: all-linear
- Learning rate: 1e-4 (0.8B) / 5e-5 (2B, 4B)
- Batch size: 8
- Epochs: 3
- Warmup ratio: 0.1
- Weight decay: 0.01
- Max gradient norm: 1.0
- **Train on inputs:** False (loss calculated only on assistant response tokens)
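
The effect of `Train on inputs: False` can be illustrated with label masking. The sketch below shows the common convention of setting prompt labels to `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. It is an assumption that the training platform masks in this way internally; the token ids and the helper name `build_labels` are made up for illustration.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

def build_labels(prompt_ids, response_ids):
    """Return (input_ids, labels) where only response tokens contribute to the loss."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Toy token ids, made up for illustration only
prompt_ids = [101, 7, 8, 9]  # stands in for the system and user turns
response_ids = [42, 43]      # stands in for the assistant action, e.g. "-H7"

input_ids, labels = build_labels(prompt_ids, response_ids)
print(labels)  # [-100, -100, -100, -100, 42, 43]
```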

**Training duration:** ~2-4 hours per model
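
To put the hyperparameters in perspective, the number of optimizer steps can be estimated from the dataset size. This sketch assumes that the batch size of 8 is the global batch size with no gradient accumulation, and that each of the 32,000 samples is one training example; neither detail is confirmed above.

```python
import math

def schedule(num_samples, batch_size, epochs, warmup_ratio):
    """Estimate steps per epoch, total steps, and warmup steps."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    warmup_steps = round(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps

steps_per_epoch, total_steps, warmup_steps = schedule(
    num_samples=32_000, batch_size=8, epochs=3, warmup_ratio=0.1
)
print(steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the run covers 4,000 steps per epoch and 12,000 steps in total, with the first 1,200 spent on learning-rate warmup.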

**Infrastructure:**
- Platform: Together AI
- GPUs: NVIDIA H100 (serverless)
- Precision: bfloat16

## Intended Use

### Primary Use Case

This model serves as the **warm-start initialization** for GRPO self-play training:

1. **HBC (Behavioral Cloning)** ← *This model*
   - Learn from expert trajectories
   - Acquire strong baseline policy
   - Fast convergence to competent play

2. **GRPO (Group Relative Policy Optimization)** ← *Next stage*
   - Self-play reinforcement learning
   - Discover novel strategies
   - Optimize for win rate
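
For readers unfamiliar with the second stage, GRPO is usually described as scoring each sampled completion relative to the other completions drawn for the same prompt. The sketch below shows that group-relative normalization on toy rewards. The reward values (1 for a win, 0 for a loss) and the use of the population standard deviation are assumptions for illustration, not a description of the actual training setup.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Toy group of four rollouts from the same game state (made-up outcomes)
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Rollouts that beat the group average receive a positive advantage and the rest a negative one, so no separate value network is needed to form a baseline.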

### Inference

The model predicts the next action given the current game state formatted as a chat conversation:

**Input format:**
```
[SYSTEM]
You are an expert Gin Rummy player. Your goal is to minimize deadwood and form melds.

[USER]
History:
1. You: +D6x -C3
2. Opp: draw -CK

Now:
Hand: CK D2 D3 D4 D5 D6 D9 H7 HK HQ S9
Stock: 28 | Deadwood: 45 | Phase: discard_or_knock
YOUR TURN | Can: no

[ASSISTANT]
```

**Output (predicted action):**
```
-H7
```

**Action format:**
- `draw` - Draw from stock pile
- `+discard` - Pick from discard pile
- `-[CARD]` - Discard a card (e.g., `-H7` = discard 7 of Hearts)
- `KNOCK-[CARD]` - Knock and discard (e.g., `KNOCK-C3`)
- `pass` - Pass on the initial upcard

**Card notation:** Suit (C/D/H/S) + Rank (A/2-9/T/J/Q/K)
- Example: `H7` = 7 of Hearts, `CK` = King of Clubs, `SA` = Ace of Spades
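
Because the model emits free text, it can help to validate each prediction against the action and card formats above before passing it to a game engine. The following is a minimal sketch: the function name `parse_action` and the internal type labels are hypothetical, and it covers only the five output actions listed here, not the richer notation that appears in the history lines.

```python
import re

SUITS = "CDHS"
RANKS = "A23456789TJQK"
CARD = f"[{SUITS}][{RANKS}]"  # suit first, then rank, e.g. H7

ACTION_PATTERNS = {
    "draw": re.compile(r"draw"),
    "pick_discard": re.compile(r"\+discard"),
    "pass": re.compile(r"pass"),
    "discard": re.compile(rf"-({CARD})"),
    "knock": re.compile(rf"KNOCK-({CARD})"),
}

def parse_action(text):
    """Return (action_type, card_or_None), or None if text is not a valid action."""
    text = text.strip()
    for kind, pattern in ACTION_PATTERNS.items():
        match = pattern.fullmatch(text)
        if match:
            card = match.group(1) if match.groups() else None
            return kind, card
    return None

print(parse_action("-H7"))       # ('discard', 'H7')
print(parse_action("KNOCK-C3"))  # ('knock', 'C3')
print(parse_action("-7H"))       # None (rank before suit is rejected)
```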

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "GoodStartLabs/gin-rummy-hbc-qwen3.5-0.8b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
)

# Format game state as chat
messages = [
    {
        "role": "system",
        "content": "You are an expert Gin Rummy player. Your goal is to minimize deadwood and form melds."
    },
    {
        "role": "user",
        "content": '''History:
1. Opp: draw -SQ
2. You: draw(DT) -DT

Now:
Hand: C9 D3 D9 H3 H6 HJ HQ HT S6 S9
Stock: 22 | Deadwood: 18 | Phase: draw
YOUR TURN | Can: no'''
    }
]

# Generate prediction
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=False,  # Greedy decoding for deterministic play; temperature is not used
)

action = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True).strip()
print(f"Predicted action: {action}")
```
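
The user message in the example above can also be assembled from structured game state. This is a minimal sketch with a hypothetical helper, `format_state`. It sorts the hand as plain strings, which matches the card order in both examples in this card, and it passes the `Can:` field through verbatim because its semantics are not documented here.

```python
def format_state(history, hand, stock, deadwood, phase, can):
    """Build the user-message text in the layout used by the examples in this card."""
    lines = ["History:"]
    lines += [f"{i}. {entry}" for i, entry in enumerate(history, start=1)]
    lines += [
        "",
        "Now:",
        f"Hand: {' '.join(sorted(hand))}",
        f"Stock: {stock} | Deadwood: {deadwood} | Phase: {phase}",
        f"YOUR TURN | Can: {can}",
    ]
    return "\n".join(lines)

user_content = format_state(
    history=["Opp: draw -SQ", "You: draw(DT) -DT"],
    hand=["H6", "C9", "S9", "D3", "HT", "D9", "HQ", "H3", "S6", "HJ"],
    stock=22,
    deadwood=18,
    phase="draw",
    can="no",
)
print(user_content)
```

The resulting string can be used as the `content` of the `user` message in place of the hand-written literal.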

## Limitations

- **Behavioral cloning ceiling:** The model is limited by the quality of the expert demonstrations and is unlikely to exceed expert performance without RL.
- **Distribution shift:** The model may struggle on game states that are not represented in the training data.
- **Stochastic policy:** The model predicts a distribution over actions; greedy decoding makes play deterministic but never explores lower-probability actions.
- **No opponent modeling:** The model does not explicitly model opponent strategy, though it may learn implicit patterns from the game history.
- **Fixed strategy:** The policy is fixed at inference time and does not adapt to a particular opponent during a game.
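
The difference between greedy and sampled play can be sketched at the level of whole actions. This is a simplification: the real model scores tokens, not complete actions, and the logits below are made up for illustration.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(actions, logits):
    """Always pick the highest-scoring action."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return actions[best]

def sample(actions, logits, temperature, rng):
    """Draw one action in proportion to its temperature-scaled probability."""
    probs = softmax(logits, temperature)
    return rng.choices(actions, weights=probs, k=1)[0]

actions = ["-H7", "-D9", "-S9"]  # candidate discards (toy example)
logits = [2.0, 1.5, 0.5]         # made-up scores

rng = random.Random(0)
print(greedy(actions, logits))
print(sample(actions, logits, temperature=1.0, rng=rng))
```

Greedy decoding returns `-H7` on every call, while sampling can occasionally return one of the other discards, which is the kind of exploration a later RL stage relies on.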

## Evaluation

**Validation accuracy (on balanced 1K validation set):**
- Overall: *TBD* (check W&B: [good-start-labs/gin-rummy-hbc](https://wandb.ai/good-start-labs/gin-rummy-hbc))
- Per action type: *TBD*
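
Until the figures above are filled in, overall and per-action-type accuracy can be computed from a list of predictions and reference actions along the following lines. This sketch assumes exact string match as the correctness criterion, which is not stated in this card; the function names are hypothetical and the toy data is made up and is not a result.

```python
from collections import Counter

def action_type(action):
    """Map an action string to one of the five action types."""
    if action.startswith("KNOCK-"):
        return "knock"
    if action.startswith("-"):
        return "discard"
    if action in ("draw", "+discard", "pass"):
        return action
    return "invalid"

def evaluate(predictions, references):
    """Return (overall_accuracy, per_type_accuracy) using exact string match."""
    correct, total = Counter(), Counter()
    for pred, ref in zip(predictions, references):
        kind = action_type(ref)
        total[kind] += 1
        correct[kind] += int(pred.strip() == ref)
    per_type = {kind: correct[kind] / total[kind] for kind in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_type

# Toy data for illustration only; these are not model outputs
references = ["draw", "-H7", "KNOCK-C3", "pass", "+discard", "-D9"]
predictions = ["draw", "-H7", "KNOCK-C4", "pass", "draw", "-D9"]

overall, per_type = evaluate(predictions, references)
print(f"overall={overall:.3f}", per_type)
```

On a balanced validation set, the overall figure equals the unweighted mean of the per-type accuracies, so a weak rare action such as knocking is visible in the headline number.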

**Win rate vs. baselines:**
- Random policy: *TBD*
- Greedy heuristic: *TBD*
- Expert policy: *TBD*

## Ethical Considerations

This model is trained for the game of Gin Rummy and should only be used for:
- Game AI research
- Educational purposes
- Entertainment (single-player practice, AI opponents)

**Not intended for:**
- Real-money gambling
- Cheating in online games
- Deceptive or manipulative applications

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{gin-rummy-hbc-0.8b,
  author = {Good Start Labs},
  title = {Gin Rummy HBC - Qwen3.5 0.8B},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/GoodStartLabs/gin-rummy-hbc-qwen3.5-0.8b}},
}
```

## Model Card Authors

- Good Start Labs
- Contact: [GitHub](https://github.com/GoodStartLabs)

## Model Card Contact

For questions or issues with this model:
- Open an issue on the [model repository](https://huggingface.co/GoodStartLabs/gin-rummy-hbc-qwen3.5-0.8b)
- Check [W&B training logs](https://wandb.ai/good-start-labs/gin-rummy-hbc)

---

*Model trained on Together AI • Base model: Qwen3.5 • Training date: March 2026*