---
title: "AIO2025M03 Random Forest Demo"
emoji: "🌲"
colorFrom: "green"
colorTo: "blue"
sdk: "gradio"
sdk_version: "5.38.0"
app_file: "app.py"
pinned: false
license: "mit"
---

# AIO2025 Module 03 - Random Forest Demo

This interactive demo showcases Random Forest algorithms for both classification and regression tasks. It provides an interface for exploring ensemble learning with decision trees through dynamic visualizations and real-time parameter adjustment.

## 🌲 Features

### Core Functionality

- **Dual Problem Types**: Supports both classification and regression tasks
- **Multiple Datasets**: Built-in sample datasets, including the Titanic dataset
- **Custom Data**: Upload your own CSV or Excel files
- **Interactive Parameters**: Adjust forest parameters in real time
- **Dynamic Input Generation**: Feature input fields are created automatically from the selected dataset

### Random Forest Parameters

- **Number of Trees**: Controls the ensemble size (capped at 20 for performance)
- **Max Depth**: Limits tree depth (default: 5; 0 = unlimited)
- **Min Samples Split**: Minimum number of samples required to split a node (default: 2)
- **Min Samples Leaf**: Minimum number of samples required at a leaf node (default: 1)
- **Criterion**: Split quality measure (switched automatically to match the problem type)
- **Max Features**: Number of features considered at each split (sqrt, log2, auto)

### Visualizations

- **Tree Confidence Chart**: Shows the prediction and confidence score of each individual tree
- **Individual Tree Visualization**: Detailed view of the selected tree's structure and decision paths
- **Feature Importance**: Shows which features matter most across all trees
- **Voting Process**: For classification, shows how the individual tree predictions combine into the final prediction

## 🚀 Quick Start

1. **Select Data**: Choose a sample dataset or upload your own
2. **Configure Target**: Select the target column to predict
3. **Set Parameters**: Adjust the random forest hyperparameters
4. **Enter Features**: Provide values for the new data point
5. **Run Prediction**: Run the random forest and view the results

## 📊 Sample Datasets

### Classification Datasets

- **Titanic**: Passenger survival prediction (default dataset)
- **Iris**: Classic 3-class flower classification
- **Wine**: Wine type classification based on chemical properties
- **Breast Cancer**: Binary classification for cancer detection

### Regression Dataset

- **Diabetes**: Medical regression dataset

## 🛠️ Technical Details

### Dependencies

- `scikit-learn`: Random Forest implementation
- `pandas`: Data manipulation
- `numpy`: Numerical operations
- `plotly`: Interactive visualizations
- `gradio`: Web interface

### Architecture

- **Modular Design**: Core logic is separated into `src/random_forest_core.py`
- **Dynamic UI**: Parameter controls and feature inputs are generated automatically
- **Error Handling**: Input validation with descriptive error messages
- **Responsive Design**: Adapts to different screen sizes

## 💡 Key Concepts

### Random Forest Benefits

- **Reduced Overfitting**: Combining many trees, each built on a different random subset of the data, overfits less than a single tree
- **Better Generalization**: Aggregated predictions from diverse trees generalize better to unseen data
- **Feature Importance**: Importance scores averaged across all trees are more robust than those of a single tree
- **Ensemble Robustness**: Predictions are more stable than those of a single tree

### Ensemble Learning

- **Bootstrap Aggregating (Bagging)**: Each tree is trained on a bootstrap sample, drawn with replacement from the training data
- **Feature Randomization**: Each split considers only a random subset of the features (set by Max Features)
- **Majority Voting**: Classification by majority vote of the trees (scikit-learn's `RandomForestClassifier` averages the trees' class probabilities, i.e. a soft vote)
- **Averaging**: Regression by the mean of the tree predictions

## 🔧 Customization

### Adding New Datasets

1. Place CSV files in the `data/` directory
2. Update `SAMPLE_DATA_CONFIG` in `app.py`
3. Ensure the target column is configured correctly

### Modifying Parameters

- Edit parameter ranges in the UI components
- Adjust default values for different use cases
- Add new parameter types as needed

## 📈 Performance Tips

- **Number of Trees**: Capped at 20 to keep this demo responsive
- **Max Features**: 'sqrt' gives a good balance between tree diversity and performance
- **Max Depth**: The default of 5 helps prevent overfitting on small datasets
- **Min Samples**: Increase Min Samples Split or Min Samples Leaf for more robust trees

## 🎯 Use Cases

### Classification

- Survival prediction (Titanic)
- Medical diagnosis (e.g., Breast Cancer)
- Customer segmentation
- Fraud detection
- Image classification

### Regression

- Price prediction
- Demand forecasting
- Quality assessment
- Risk modeling

## 📝 Notes

- **Auto-detection**: The problem type is detected automatically from the target column
- **Data Validation**: Comprehensive input validation and error handling
- **Memory Efficient**: Optimized for large datasets
- **Real-time Updates**: Visualizations update instantly as parameters change
- **Tree Selection**: Interactive dropdown to explore individual trees (up to 20)

## 🔗 Related Resources

- [Scikit-learn Random Forest Documentation](https://scikit-learn.org/stable/modules/ensemble.html#forest)
- [Ensemble Methods in Machine Learning](https://en.wikipedia.org/wiki/Ensemble_learning)
- [Random Forest Algorithm Explained](https://towardsdatascience.com/random-forest-algorithm-explained-4d3c996e8f3)

---

*This demo is part of AIO2025 Module 03 - Machine Learning Fundamentals*
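The Random Forest Parameters listed under Features correspond closely to the arguments of scikit-learn's `RandomForestClassifier` and `RandomForestRegressor`. The sketch below shows one way to turn the demo's UI values into a forest. It is not code from `src/random_forest_core.py`: the helper name `build_forest`, the translation of `0 = unlimited` into `max_depth=None`, the 20-tree check, and the handling of `auto` are all assumptions. Recent scikit-learn releases (1.3 and later) no longer accept `max_features="auto"`, so the sketch maps it to its former meaning.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def build_forest(problem_type, n_trees=10, max_depth=5, min_samples_split=2,
                 min_samples_leaf=1, max_features="sqrt", random_state=42):
    """Hypothetical helper: translate demo-style UI values into a scikit-learn forest."""
    # Assumed cap mirroring the demo's 20-tree limit.
    if not 1 <= n_trees <= 20:
        raise ValueError("n_trees must be between 1 and 20 in this demo")
    is_classification = problem_type == "classification"
    # Assumed UI convention: 0 means "no depth limit", which scikit-learn spells None.
    depth = None if max_depth == 0 else max_depth
    # "auto" was removed in scikit-learn 1.3; it used to mean sqrt(n_features)
    # for classifiers and all features (1.0) for regressors.
    if max_features == "auto":
        max_features = "sqrt" if is_classification else 1.0
    common = dict(
        n_estimators=n_trees,
        max_depth=depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        max_features=max_features,
        random_state=random_state,
    )
    # The split criterion is switched to match the problem type.
    if is_classification:
        return RandomForestClassifier(criterion="gini", **common)
    return RandomForestRegressor(criterion="squared_error", **common)


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    forest = build_forest("classification", n_trees=20, max_depth=0).fit(X, y)
    print(len(forest.estimators_), forest.max_depth)
```

Keeping the translation in one function means the UI can expose friendly values (such as `0` for unlimited depth) while scikit-learn only ever sees arguments it accepts.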
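The Ensemble Learning bullets under Key Concepts (bootstrap sampling, per-split feature randomization, and majority voting) can be made concrete with a small from-scratch sketch built on scikit-learn's `DecisionTreeClassifier`. This is a teaching illustration, not how the demo or scikit-learn's forest is implemented; the function names `fit_bagged_trees` and `majority_vote` are hypothetical, and the sketch assumes class labels are non-negative integers (as in Iris) so that `np.bincount` can tally votes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_trees=20, seed=0):
    """Train each tree on a bootstrap sample; each split sees sqrt(n_features) random features."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        # Bootstrap aggregating: draw n row indices with replacement.
        idx = rng.integers(0, n, size=n)
        # Feature randomization: max_features="sqrt" limits the candidates at every split.
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def majority_vote(trees, X):
    """Hard voting: each tree casts one vote per sample and the most common label wins.

    Assumes non-negative integer labels; ties go to the smallest label.
    """
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples)
    return np.array([np.bincount(column).argmax() for column in votes.T])


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    trees = fit_bagged_trees(X, y)
    accuracy = (majority_vote(trees, X) == y).mean()
    print(f"training accuracy: {accuracy:.3f}")
```

For regression, the same loop with `DecisionTreeRegressor` and `np.mean` over the tree predictions in place of the vote gives the averaging step.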
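The Tree Confidence Chart, Voting Process, and Feature Importance views can all be derived from a fitted scikit-learn forest through its `estimators_` attribute. The sketch below shows that derivation, assuming the demo uses scikit-learn's forests as the Dependencies list suggests. The helper names are hypothetical, and "confidence" is taken here to mean a tree's highest class probability, which may differ from how the chart defines it. The sketch also makes the soft-voting behavior explicit: the forest's class probabilities are the mean of the trees' probabilities, and a regression forest's prediction is the mean of the trees' outputs.

```python
import numpy as np
from sklearn.datasets import load_diabetes, load_iris
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def per_tree_classification(forest, x):
    """Hypothetical helper: (tree index, predicted class, confidence) for one sample.

    Confidence is assumed to be the tree's highest class probability.
    """
    x = np.asarray(x, dtype=float).reshape(1, -1)
    rows = []
    for i, tree in enumerate(forest.estimators_):
        proba = tree.predict_proba(x)[0]
        # Sub-tree probability columns are aligned with forest.classes_.
        rows.append((i, forest.classes_[proba.argmax()], float(proba.max())))
    return rows


def soft_vote(forest, X):
    """Average the trees' class probabilities, matching RandomForestClassifier.predict_proba."""
    return np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)


def average_regression(forest, X):
    """Average the trees' outputs, matching RandomForestRegressor.predict."""
    return np.mean([tree.predict(X) for tree in forest.estimators_], axis=0)


def mean_tree_importances(forest):
    """Average per-tree impurity importances (skipping single-node trees), then renormalize."""
    imps = [t.feature_importances_ for t in forest.estimators_ if t.tree_.node_count > 1]
    if not imps:
        return np.zeros(forest.n_features_in_)
    mean = np.mean(imps, axis=0)
    return mean / mean.sum()


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=20, max_depth=5, random_state=0).fit(X, y)
    for i, label, conf in per_tree_classification(clf, X[0])[:3]:
        print(f"tree {i}: class {label} (confidence {conf:.2f})")
    print(np.allclose(soft_vote(clf, X), clf.predict_proba(X)))
    print(np.allclose(mean_tree_importances(clf), clf.feature_importances_))

    Xr, yr = load_diabetes(return_X_y=True)
    reg = RandomForestRegressor(n_estimators=20, max_depth=5, random_state=0).fit(Xr, yr)
    print(np.allclose(average_regression(reg, Xr), reg.predict(Xr)))
```

Because the final class comes from averaged probabilities, a hard majority count of the per-tree labels can occasionally disagree with `predict` when the vote is close.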
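The Notes say the problem type is detected automatically from the target column, but the README does not describe the rule. The sketch below is one plausible heuristic, offered purely as an assumption: non-numeric or boolean targets are treated as class labels, fractional floats as continuous, and integer-like targets as classes only when they have few distinct values. The function name `detect_problem_type` and the `max_classes=20` threshold are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_float_dtype, is_numeric_dtype


def detect_problem_type(target: pd.Series, max_classes: int = 20) -> str:
    """Hypothetical heuristic: guess 'classification' or 'regression' from a target column."""
    values = target.dropna()
    # Strings, categories and booleans are treated as class labels.
    if not is_numeric_dtype(values) or is_bool_dtype(values):
        return "classification"
    # Floats with fractional parts are treated as a continuous target.
    if is_float_dtype(values) and not (values % 1 == 0).all():
        return "regression"
    # Integer-like targets with few distinct values are treated as class labels.
    return "classification" if values.nunique() <= max_classes else "regression"


if __name__ == "__main__":
    print(detect_problem_type(pd.Series([0, 1, 1, 0, 1])))
    print(detect_problem_type(pd.Series([151.0, 75.0, 141.5, 206.0])))
```

Under this rule a 0/1 survival label would be treated as classification, while a continuous target with many distinct values would be treated as regression; a real app would likely still let the user override the guess.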