# ContextFlow RL Training Guide

This guide explains how to train the RL model and upload it to Hugging Face.

## Quick Start

### 1. Install Dependencies

```bash
cd research-app/backend
# pickle is part of the Python standard library and does not need to be installed
pip install torch numpy
pip install huggingface_hub  # For uploading
```

### 2. Generate Training Data & Train

```bash
python train_rl.py --mode train --epochs 10 --samples 1000
```

### 3. Upload to Hugging Face

```bash
python train_rl.py --mode upload --hf_token YOUR_TOKEN --repo_name your-username/contextflow-rl
```

### 4. Or Do Both at Once

```bash
python train_rl.py --mode full --epochs 10 --hf_token YOUR_TOKEN --repo_name your-username/contextflow-rl
```

## Training Options

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--epochs` | Number of training epochs | 10 |
| `--samples` | Number of training samples to generate | 1000 |
| `--batch_size` | Training batch size | 32 |
| `--checkpoint_path` | Path to save/load checkpoint | checkpoint.pkl |

## Model Architecture

The RL model uses:

- **Q-Network**: 3-layer neural network (64 → 128 → 128 → 10)
- **State Dimension**: 64 features
- **Action Dimension**: 10 doubt prediction actions
- **Training Algorithm**: GRPO (Group Relative Policy Optimization)

## Hugging Face Upload

After training, the model is uploaded as:

- **Repository**: `your-username/contextflow-rl`
- **Files**:
  - `checkpoint.pkl` - Model weights
  - `README.md` - Model documentation
  - `training_stats.json` - Training history

## Using the Model

```python
import pickle

# Load checkpoint.
# Only unpickle files you trust: pickle.load can execute arbitrary code.
with open("checkpoint.pkl", "rb") as f:
    checkpoint = pickle.load(f)

print(f"Policy version: {checkpoint.policy_version}")
print(f"Training samples: {checkpoint.training_stats['total_samples']}")
```

## Citation

```bibtex
@software{contextflow_rl,
  title={ContextFlow RL Doubt Predictor},
  author={ContextFlow Team},
  year={2026},
  url={https://github.com/contextflow/research-app}
}
```
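
## Appendix: Group-Relative Advantages

The Model Architecture section names GRPO as the training algorithm. As a rough illustration of the "group relative" part, the sketch below normalizes each reward in a group against that group's mean and standard deviation. This is a minimal sketch under stated assumptions, not the code in `train_rl.py`: the function name `group_relative_advantages`, the `eps` parameter, the use of the population standard deviation, and the sample rewards are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Return each reward's advantage relative to its own group.

    Hypothetical helper: subtracts the group mean and divides by the
    group's population standard deviation (plus eps to avoid dividing
    by zero when all rewards in the group are equal).
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for one group of four sampled actions.
group_rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(group_rewards)

for reward, advantage in zip(group_rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

Actions that score above the group average receive a positive advantage and those below receive a negative one, so no separate value baseline is needed for this step. Whether the actual training loop normalizes by the standard deviation at all is an assumption here.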