---
title: "AIO2025M03 Random Forest Demo"
emoji: "🌲"
colorFrom: "green"
colorTo: "blue"
sdk: "gradio"
sdk_version: "5.38.0"
app_file: "app.py"
pinned: false
license: "mit"
---

# AIO2025 Module 03 - Random Forest Demo

This interactive demo showcases Random Forest algorithms for both classification and regression tasks. The application provides a comprehensive interface for exploring ensemble learning with decision trees through dynamic visualizations and real-time parameter adjustment.

## 🌲 Features

### Core Functionality
- **Dual Problem Types**: Support for both classification and regression tasks
- **Multiple Datasets**: Built-in sample datasets, including the Titanic dataset
- **Custom Data**: Upload your own CSV/Excel files
- **Interactive Parameters**: Adjust forest parameters in real-time
- **Dynamic Input Generation**: Feature input fields are created automatically from the selected dataset

### Random Forest Parameters
- **Number of Trees**: Control ensemble size (limited to 20 for performance)
- **Max Depth**: Limit tree depth (default: 5, 0 = unlimited)
- **Min Samples Split**: Minimum samples to split a node (default: 2)
- **Min Samples Leaf**: Minimum samples at leaf nodes (default: 1)
- **Criterion**: Split quality measure (switched automatically to match the problem type)
- **Max Features**: Number of features considered at each split (`sqrt`, `log2`, `auto`)
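
As a rough illustration of how the parameters above could map onto scikit-learn, here is a minimal sketch. The helper `build_forest` and the constant `MAX_TREES` are hypothetical names, not taken from `src/random_forest_core.py`. The sketch assumes that a depth of 0 is translated to `None` (scikit-learn's value for unlimited depth) and that `auto` falls back to the behaviour older scikit-learn releases used, since recent releases no longer accept `"auto"`.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

MAX_TREES = 20  # demo limit on ensemble size (hypothetical constant name)


def build_forest(problem_type, n_trees=10, max_depth=5, min_samples_split=2,
                 min_samples_leaf=1, criterion=None, max_features="sqrt",
                 random_state=42):
    """Hypothetical helper: translate UI values into a scikit-learn forest."""
    is_classification = problem_type == "classification"

    # Assumed "auto-switch": pick a criterion that is valid for the problem type.
    if criterion is None:
        criterion = "gini" if is_classification else "squared_error"

    # Assumed handling of "auto": sqrt for classification, all features for regression.
    if max_features == "auto":
        max_features = "sqrt" if is_classification else 1.0

    estimator_cls = RandomForestClassifier if is_classification else RandomForestRegressor
    return estimator_cls(
        n_estimators=max(1, min(int(n_trees), MAX_TREES)),     # clamp to 1..20
        max_depth=None if max_depth == 0 else int(max_depth),  # 0 means unlimited
        min_samples_split=int(min_samples_split),
        min_samples_leaf=int(min_samples_leaf),
        criterion=criterion,
        max_features=max_features,
        random_state=random_state,
    )


clf = build_forest("classification", n_trees=50, max_depth=0)
reg = build_forest("regression", max_features="auto")
```

Clamping the tree count in one place keeps the 20-tree limit consistent no matter which UI control supplies the value.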

### Visualizations
- **Tree Confidence Chart**: Shows confidence scores and predictions for each individual tree
- **Individual Tree Visualization**: Detailed view of selected tree structure and decision paths
- **Feature Importance**: Displays which features matter most across all trees
- **Voting Process**: For classification, shows how the individual tree predictions combine into the final prediction
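
The quantities behind these charts can be read from a fitted scikit-learn forest. The sketch below is an assumption about how such values might be gathered, not the app's actual plotting code. It uses the Iris data from scikit-learn and collects each tree's predicted class, its confidence, the vote tally, and the ranked feature importances.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
X, y = data.data, data.target

forest = RandomForestClassifier(n_estimators=20, max_depth=5, random_state=0)
forest.fit(X, y)

sample = X[:1]  # one data point, kept 2-D as scikit-learn expects

tree_labels = []       # class predicted by each tree
tree_confidences = []  # probability each tree assigns to its own prediction
for tree in forest.estimators_:
    proba = tree.predict_proba(sample)[0]
    winner = int(np.argmax(proba))
    tree_labels.append(int(forest.classes_[winner]))
    tree_confidences.append(float(proba[winner]))

# Tally of hard votes per class, as a voting chart might display it.
votes = Counter(tree_labels)

# Features ranked by importance averaged over all trees.
ranked = sorted(
    zip(data.feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(votes.most_common())
print(ranked)
```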

## 🚀 Quick Start

1. **Select Data**: Choose from sample datasets or upload your own
2. **Configure Target**: Select the target column for prediction
3. **Set Parameters**: Adjust random forest hyperparameters
4. **Enter Features**: Provide values for the new data point
5. **Run Prediction**: Execute the random forest and view results

## 📊 Sample Datasets

### Classification Datasets
- **Titanic**: Passenger survival prediction (default dataset)
- **Iris**: Classic 3-class flower classification
- **Wine**: Wine type classification based on chemical properties
- **Breast Cancer**: Binary classification for cancer detection

### Regression Dataset
- **Diabetes**: Medical regression dataset

## 🛠️ Technical Details

### Dependencies
- `scikit-learn`: Random Forest implementation
- `pandas`: Data manipulation
- `numpy`: Numerical operations
- `plotly`: Interactive visualizations
- `gradio`: Web interface

### Architecture
- **Modular Design**: Separated core logic in `src/random_forest_core.py`
- **Dynamic UI**: Automatic parameter and input generation
- **Error Handling**: Comprehensive validation and error messages
- **Responsive Design**: Adapts to different screen sizes

## 💡 Key Concepts

### Random Forest Benefits
- **Reduced Overfitting**: Averaging many trees, each trained on a different random subset of rows and features, reduces variance
- **Better Generalization**: Aggregated predictions from diverse trees
- **Feature Importance**: Robust importance scores across trees
- **Ensemble Robustness**: More stable predictions than single trees

### Ensemble Learning
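- **Bootstrap Aggregating**: Each tree is trained on a bootstrap sample (rows drawn with replacement)
- **Feature Randomization**: A random subset of features is considered at each split
- **Majority Voting**: For classification, the class chosen by most trees wins (scikit-learn itself averages the per-tree class probabilities, which usually gives the same result)
- **Averaging**: For regression, the prediction is the mean of the tree predictions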
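
Bootstrap sampling and regression averaging can be checked directly. This is a minimal sketch using scikit-learn's built-in diabetes data, which is assumed here to be comparable to the demo's Diabetes dataset. It draws one bootstrap sample by hand, then confirms that the forest's prediction equals the mean of its trees' predictions.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
n = X.shape[0]

# Bootstrap: draw n row indices with replacement, so some rows repeat
# and others are left out of this particular sample.
rng = np.random.default_rng(0)
bootstrap_idx = rng.integers(0, n, size=n)
unique_fraction = np.unique(bootstrap_idx).size / n  # typically around 0.63

forest = RandomForestRegressor(n_estimators=20, max_depth=5, random_state=0)
forest.fit(X, y)

# Averaging: stack per-tree predictions and take the mean over trees.
sample = X[:3]
per_tree = np.stack([tree.predict(sample) for tree in forest.estimators_])
manual_mean = per_tree.mean(axis=0)

print(f"unique rows in bootstrap sample: {unique_fraction:.2f}")
print(np.allclose(manual_mean, forest.predict(sample)))
```

The rows left out of a bootstrap sample are what scikit-learn uses for out-of-bag scoring when `oob_score=True` is set.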

## 🔧 Customization

### Adding New Datasets
1. Place CSV files in the `data/` directory
2. Update `SAMPLE_DATA_CONFIG` in `app.py`
3. Ensure target column is properly configured
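
Before registering a new dataset, it can help to confirm that the file loads and that the target column exists. The sketch below is only an illustration: the layout of `new_entry`, the file name, and the helper `check_target` are hypothetical, and the keys that `SAMPLE_DATA_CONFIG` actually expects are defined in `app.py`.

```python
import io

import pandas as pd

# Hypothetical entry: check app.py for the keys SAMPLE_DATA_CONFIG really uses.
new_entry = {
    "My Dataset": {
        "path": "data/my_dataset.csv",  # placeholder file name
        "target": "label",              # placeholder target column
    }
}


def check_target(csv_source, target):
    """Load a CSV and confirm the target column exists and is not empty."""
    df = pd.read_csv(csv_source)
    if target not in df.columns:
        raise ValueError(
            f"Target column {target!r} not found; columns are {list(df.columns)}"
        )
    if df[target].dropna().empty:
        raise ValueError(f"Target column {target!r} has no values")
    return df


# In-memory stand-in for a file in data/, so the sketch runs without extra files.
demo_csv = io.StringIO("height,width,label\n1.0,2.0,a\n3.0,4.0,b\n")
df = check_target(demo_csv, new_entry["My Dataset"]["target"])
print(df.shape)
```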

### Modifying Parameters
- Edit parameter ranges in the UI components
- Adjust default values for different use cases
- Add new parameter types as needed

## 📈 Performance Tips

- **Number of Trees**: Limited to 20 for optimal performance in this demo
- **Max Features**: Use 'sqrt' for good balance of diversity and performance
- **Max Depth**: The default of 5 helps limit overfitting on small datasets
- **Min Samples**: Increase Min Samples Split or Min Samples Leaf for smoother trees that overfit less

## 🎯 Use Cases

### Classification
- Survival prediction (e.g., the Titanic dataset)
- Medical diagnosis (e.g., the Breast Cancer dataset)
- Customer segmentation
- Fraud detection
- Image classification

### Regression
- Price prediction
- Demand forecasting
- Quality assessment
- Risk modeling

## 📝 Notes

- **Auto-detection**: Problem type automatically detected from target column
- **Data Validation**: Comprehensive input validation and error handling
- **Memory Efficient**: Optimized for large datasets
- **Real-time Updates**: Instant parameter adjustment and visualization
- **Tree Selection**: Interactive dropdown to explore individual trees (up to 20)
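
One common way to auto-detect the problem type from a target column is sketched below. The function `detect_problem_type` and its `max_classes` threshold are hypothetical, and the rule the app applies may differ. The heuristic treats non-numeric or boolean targets as classification, as well as numeric targets with only a few distinct whole-number values.

```python
import pandas as pd


def detect_problem_type(target: pd.Series, max_classes: int = 10) -> str:
    """Hypothetical heuristic for choosing classification vs. regression."""
    values = target.dropna()

    # Text, categorical, and boolean targets are class labels.
    if pd.api.types.is_bool_dtype(values) or not pd.api.types.is_numeric_dtype(values):
        return "classification"

    # A handful of distinct whole numbers (0/1, 1..5, ...) also looks like classes.
    is_whole = bool((values % 1 == 0).all())
    if is_whole and values.nunique() <= max_classes:
        return "classification"

    return "regression"


print(detect_problem_type(pd.Series(["yes", "no", "yes"])))
print(detect_problem_type(pd.Series([0, 1, 1, 0])))
print(detect_problem_type(pd.Series([1.5, 2.7, 3.1, 4.8])))
```

A threshold like this is a trade-off: an integer-valued regression target with few distinct values would be misread as classification, so a manual override is worth having.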

## 🔗 Related Resources

- [Scikit-learn Random Forest Documentation](https://scikit-learn.org/stable/modules/ensemble.html#forest)
- [Ensemble Methods in Machine Learning](https://en.wikipedia.org/wiki/Ensemble_learning)
- [Random Forest Algorithm Explained](https://towardsdatascience.com/random-forest-algorithm-explained-4d3c996e8f3)

---

*This demo is part of AIO2025 Module 03 - Machine Learning Fundamentals*