# ContextFlow RL Training Guide

This guide explains how to train the RL model and upload it to Hugging Face.

## Quick Start

### 1. Install Dependencies

```bash
cd research-app/backend
pip install torch numpy  # pickle ships with Python's standard library; no install needed
pip install huggingface_hub  # For uploading
```

### 2. Generate Training Data & Train

```bash
python train_rl.py --mode train --epochs 10 --samples 1000
```

### 3. Upload to Hugging Face

```bash
python train_rl.py --mode upload --hf_token YOUR_TOKEN --repo_name your-username/contextflow-rl
```

### 4. Or Do Both at Once

```bash
python train_rl.py --mode full --epochs 10 --hf_token YOUR_TOKEN --repo_name your-username/contextflow-rl
```

## Training Options

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--epochs` | Number of training epochs | 10 |
| `--samples` | Number of training samples to generate | 1000 |
| `--batch_size` | Training batch size | 32 |
| `--checkpoint_path` | Path to save/load checkpoint | checkpoint.pkl |
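
The flags and defaults in the table above can be mirrored in a small `argparse` parser. This is only a sketch: it assumes `train_rl.py` parses its arguments with `argparse`, and the function name `build_parser` is hypothetical. The `--mode`, `--hf_token`, and `--repo_name` flags are taken from the Quick Start commands; their defaults are not documented here, so `--mode` is marked required and the other two default to `None`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the actual train_rl.py code)."""
    parser = argparse.ArgumentParser(description="Train and/or upload the ContextFlow RL model")
    # Modes seen in the Quick Start commands: train, upload, full.
    parser.add_argument("--mode", choices=["train", "upload", "full"], required=True)
    # Defaults below are copied from the Training Options table.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--samples", type=int, default=1000)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--checkpoint_path", default="checkpoint.pkl")
    # Upload-related flags; defaults are an assumption.
    parser.add_argument("--hf_token", default=None)
    parser.add_argument("--repo_name", default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--mode", "train", "--epochs", "20"])
    print(args.mode, args.epochs, args.samples, args.batch_size, args.checkpoint_path)
```

Flags that are omitted on the command line fall back to the table defaults, so `--mode train --epochs 20` still trains on 1000 samples with a batch size of 32.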

## Model Architecture

The RL model uses:
- **Q-Network**: neural network with three linear layers (64 → 128 → 128 → 10), i.e. two hidden layers of 128 units each
- **State Dimension**: 64 features
- **Action Dimension**: 10 doubt prediction actions
- **Training Algorithm**: GRPO (Group Relative Policy Optimization)
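
The layer sizes listed above can be written down as a small PyTorch module. This is a minimal sketch, not the project's actual network: the class name `QNetwork` and the ReLU activations are assumptions, since the guide specifies only the layer widths.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Sketch of a 64 -> 128 -> 128 -> 10 network (activations are an assumption)."""

    def __init__(self, state_dim: int = 64, hidden_dim: int = 128, action_dim: int = 10):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),   # 64 -> 128
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),  # 128 -> 128
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),  # 128 -> 10, one score per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state has shape (batch, state_dim); output has shape (batch, action_dim).
        return self.layers(state)


if __name__ == "__main__":
    net = QNetwork()
    batch = torch.zeros(4, 64)   # four dummy 64-feature states
    scores = net(batch)
    print(scores.shape)          # one row of 10 action scores per state
```

Each input row is a 64-feature state, and each output row holds one score for each of the 10 doubt prediction actions.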

## Hugging Face Upload

After training, the model is uploaded to the repository given by `--repo_name`:
- **Repository**: e.g. `your-username/contextflow-rl`
- **Files**:
  - `checkpoint.pkl` - Model weights
  - `README.md` - Model documentation
  - `training_stats.json` - Training history
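
An upload of the three files listed above could look roughly like the following, using `HfApi.create_repo` and `HfApi.upload_file` from `huggingface_hub`. This is a sketch under assumptions: the helper names `collect_upload_files` and `upload_model` are hypothetical, and the real `train_rl.py` may upload differently.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = ("checkpoint.pkl", "README.md", "training_stats.json")


def collect_upload_files(directory: str = ".") -> list:
    """Return the expected files that actually exist in `directory`."""
    base = Path(directory)
    return [base / name for name in EXPECTED_FILES if (base / name).is_file()]


def upload_model(repo_id: str, token: str, directory: str = ".") -> None:
    """Hypothetical helper: create the model repo if needed and upload each file."""
    # Imported here so collect_upload_files works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    for path in collect_upload_files(directory):
        api.upload_file(
            path_or_fileobj=str(path),
            path_in_repo=path.name,  # keep the same file name in the repo
            repo_id=repo_id,
            repo_type="model",
        )


if __name__ == "__main__":
    # Only lists local files; calling upload_model needs a valid token and network access.
    print([p.name for p in collect_upload_files(".")])
```

Listing the files before uploading makes it easy to spot a missing `training_stats.json` or `README.md` without touching the network.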

## Using the Model

```python
import pickle

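# Load the checkpoint. pickle can execute arbitrary code while loading,
# so only open checkpoint files from a source you trust.
# The attribute access below implies the checkpoint is a pickled object, so the
# class that defines it must be importable (e.g. run this from research-app/backend).
with open("checkpoint.pkl", "rb") as f:
    checkpoint = pickle.load(f)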

print(f"Policy version: {checkpoint.policy_version}")
print(f"Training samples: {checkpoint.training_stats['total_samples']}")
```

## Citation

```bibtex
@software{contextflow_rl,
  title={ContextFlow RL Doubt Predictor},
  author={ContextFlow Team},
  year={2026},
  url={https://github.com/contextflow/research-app}
}
```