---
language:
- en
license: apache-2.0
tags:
- gin-rummy
- card-games
- behavioral-cloning
- reinforcement-learning
- game-ai
base_model: Qwen/Qwen3.5-0.8B
datasets:
- GoodStartLabs/gin-rummy-trajectories-32k
metrics:
- accuracy
pipeline_tag: text-generation
---

# Gin Rummy HBC - Qwen3.5 0.8B

**Behavioral cloning model for Gin Rummy trained via supervised fine-tuning on expert trajectories.**

This model was trained on 32,000 stratified expert game states to imitate expert Gin Rummy decision-making. It serves as the initialization for subsequent GRPO (Group Relative Policy Optimization) self-play training.

## Model Details

- **Model type:** Causal language model (decoder-only transformer)
- **Base model:** [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B)
- **Parameters:** 0.8B
- **Training method:** LoRA (Low-Rank Adaptation) fine-tuning
- **Task:** Gin Rummy move prediction
- **License:** Apache 2.0

## Training Data

**Dataset:** [GoodStartLabs/gin-rummy-trajectories-32k](https://huggingface.co/datasets/GoodStartLabs/gin-rummy-trajectories-32k)

- **Training samples:** 32,000 (stratified sampling, minimum 1,000 per action type)
- **Validation samples:** 1,000 (balanced, 200 per action type)
- **Source:** Expert agent gameplay using Monte Carlo Tree Search (MCTS)

**Action distribution (training set):**
- `discard` (discard a card): 44.6%
- `draw` (draw from stock): 33.1%
- `+discard` (pick from discard pile): 14.9%
- `KNOCK-[card]` (knock and discard): 4.0%
- `pass` (pass on upcard): 3.5%

**Validation set:** Balanced at exactly 200 samples per action type, so overall validation accuracy weights every action type equally instead of following the skewed training distribution.
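The card does not describe how the stratified sample was drawn. As a rough illustration only, the sketch below shows one way to sample with a per-action-type minimum: reserve the minimum for each type, then fill the remaining budget uniformly at random from what is left. The function name `stratified_sample`, the `action_type` record key, and the fill strategy are assumptions for this example, not the actual dataset pipeline.

```python
import random
from collections import defaultdict


def stratified_sample(records, total, min_per_type, key="action_type", seed=0):
    """Draw `total` records with at least `min_per_type` of every action type.

    Hypothetical helper: the real sampling procedure is not documented here.
    """
    rng = random.Random(seed)

    # Group the pool by action type.
    by_type = defaultdict(list)
    for rec in records:
        by_type[rec[key]].append(rec)

    chosen, leftover = [], []
    for action, group in by_type.items():
        if len(group) < min_per_type:
            raise ValueError(f"only {len(group)} records for action type {action!r}")
        rng.shuffle(group)
        chosen.extend(group[:min_per_type])    # guaranteed minimum
        leftover.extend(group[min_per_type:])  # candidates for the remainder

    remaining = total - len(chosen)
    if remaining < 0 or remaining > len(leftover):
        raise ValueError("total is incompatible with the pool size and minimum")

    # Fill the rest uniformly, which roughly follows the natural distribution.
    chosen.extend(rng.sample(leftover, remaining))
    rng.shuffle(chosen)
    return chosen
```

With a fill step like this, rare action types such as `pass` and `KNOCK-[card]` are guaranteed a floor, while common types such as `discard` still dominate the final sample.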
## Training Procedure

**Fine-tuning platform:** Together AI (serverless LoRA training)

**Hyperparameters:**
- LoRA rank: 16 (0.8B, 2B) / 32 (4B)
- LoRA alpha: 16 (0.8B, 2B) / 32 (4B)
- LoRA dropout: 0.05
- LoRA modules: all-linear
- Learning rate: 1e-4 (0.8B) / 5e-5 (2B, 4B)
- Batch size: 8
- Epochs: 3
- Warmup ratio: 0.1
- Weight decay: 0.01
- Max gradient norm: 1.0
- **Train on inputs:** False (loss calculated only on assistant response tokens)

**Training duration:** ~2-4 hours per model

**Infrastructure:**
- Platform: Together AI
- GPUs: NVIDIA H100 (serverless)
- Precision: bfloat16

## Intended Use

### Primary Use Case

This model serves as the **warm-start initialization** for GRPO self-play training:

1. **HBC (Behavioral Cloning)** ← *This model*
   - Learn from expert trajectories
   - Acquire strong baseline policy
   - Fast convergence to competent play

2. **GRPO (Group Relative Policy Optimization)** ← *Next stage*
   - Self-play reinforcement learning
   - Discover novel strategies
   - Optimize for win rate

### Inference

The model predicts the next action given the current game state formatted as a chat conversation:

**Input format:**
```
[SYSTEM]
You are an expert Gin Rummy player. Your goal is to minimize deadwood and form melds.
[USER]
History:
1. You: +D6x -C3
2. Opp: draw -CK
Now:
Hand: CK D2 D3 D4 D5 D6 D9 H7 HK HQ S9
Stock: 28 | Deadwood: 45 | Phase: discard_or_knock
YOUR TURN | Can: no
[ASSISTANT]
```

**Output (predicted action):**
```
-H7
```

**Action format:**
- `draw` - Draw from stock pile
- `+discard` - Pick from discard pile
- `-[CARD]` - Discard a card (e.g., `-H7` = discard 7 of Hearts)
- `KNOCK-[CARD]` - Knock and discard (e.g., `KNOCK-C3`)
- `pass` - Pass on the initial upcard

**Card notation:** Suit (C/D/H/S) + Rank (A/2-9/T/J/Q/K)
- Example: `H7` = 7 of Hearts, `CK` = King of Clubs, `SA` = Ace of Spades

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "GoodStartLabs/gin-rummy-hbc-qwen3.5-0.8b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
)

# Format game state as chat
messages = [
    {
        "role": "system",
        "content": "You are an expert Gin Rummy player. Your goal is to minimize deadwood and form melds."
    },
    {
        "role": "user",
        "content": '''History:
1. Opp: draw -SQ
2. You: draw(DT) -DT
Now:
Hand: C9 D3 D9 H3 H6 HJ HQ HT S6 S9
Stock: 22 | Deadwood: 18 | Phase: draw
YOUR TURN | Can: no'''
    }
]

# Generate prediction
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=False,  # Greedy decoding for deterministic play (no temperature needed)
)

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
action = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
print(f"Predicted action: {action}")
```

## Limitations

- **Behavioral cloning ceiling:** The model is limited by the quality of the expert demonstrations and is unlikely to exceed expert performance without RL.
- **Distribution shift:** May struggle on game states not represented in training data.
- **Stochastic policy:** The model predicts a distribution over actions; greedy decoding gives deterministic play but does no exploration.
- **No opponent modeling:** Does not explicitly model opponent strategy (though it may learn implicit patterns from game history).
- **Fixed strategy:** Cannot adapt during a game; uses the same policy throughout.

## Evaluation

**Validation accuracy (on balanced 1K validation set):**
- Overall: *TBD* (check W&B: [good-start-labs/gin-rummy-hbc](https://wandb.ai/good-start-labs/gin-rummy-hbc))
- Per action type: *TBD*

**Win rate vs. baselines:**
- Random policy: *TBD*
- Greedy heuristic: *TBD*
- Expert policy: *TBD*

## Ethical Considerations

This model is trained for the game of Gin Rummy and should only be used for:
- Game AI research
- Educational purposes
- Entertainment (single-player practice, AI opponents)

**Not intended for:**
- Real-money gambling
- Cheating in online games
- Deceptive or manipulative applications

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{gin-rummy-hbc-0.8b,
  author = {Good Start Labs},
  title = {Gin Rummy HBC - Qwen3.5 0.8B},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/GoodStartLabs/gin-rummy-hbc-qwen3.5-0.8b}},
}
```

## Model Card Authors

- Good Start Labs
- Contact: [GitHub](https://github.com/GoodStartLabs)

## Model Card Contact

For questions or issues with this model:
- Open an issue on the [model repository](https://huggingface.co/GoodStartLabs/gin-rummy-hbc-qwen3.5-0.8b)
- Check [W&B training logs](https://wandb.ai/good-start-labs/gin-rummy-hbc)

---

*Model trained on Together AI • Base model: Qwen3.5 • Training date: March 2026*
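
## Parsing Predicted Actions

Because the model emits free text, callers will probably want to validate a prediction before applying it to a game engine. The sketch below parses the action strings listed under "Action format", using the suit-then-rank card codes that appear in the examples (`H7`, `CK`, `SA`). The function name `parse_action` and the returned type labels `"discard"` and `"knock"` are assumptions made for this example; they are not part of the model or its dataset.

```python
import re

SUITS = "CDHS"
RANKS = "A23456789TJQK"

# A card is one suit letter followed by one rank character, e.g. "H7".
_CARD = rf"[{SUITS}][{RANKS}]"
_DISCARD = re.compile(rf"-({_CARD})")
_KNOCK = re.compile(rf"KNOCK-({_CARD})")


def parse_action(text):
    """Return (action_type, card_or_None); raise ValueError if malformed.

    Hypothetical helper for illustration only.
    """
    text = text.strip()
    if text in ("draw", "+discard", "pass"):
        return text, None

    match = _DISCARD.fullmatch(text)
    if match:
        return "discard", match.group(1)

    match = _KNOCK.fullmatch(text)
    if match:
        return "knock", match.group(1)

    raise ValueError(f"unrecognized action: {text!r}")


def card_in_hand(action, hand):
    """Check that a discard or knock names a card present in `hand`.

    `hand` is a space-separated string such as "C9 D3 D9".
    """
    _, card = action
    return card is None or card in hand.split()
```

A caller could fall back to a default move, or re-sample, whenever `parse_action` raises or `card_in_hand` returns `False`.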