---
tags:
- pytorch
- xG
- football
- soccer
- expected-goals
- mlp
- binary-classification
- player-kpis
library_name: pytorch
---

# MLP xG Prediction Model (with Player KPIs)

This is a Multi-Layer Perceptron (MLP) model trained to predict Expected Goals (xG) in football/soccer. This version includes **player-level KPIs** (`shots_per_90` and `goals_per_90`) as additional features to capture individual player shooting ability.

## Model Description

- **Architecture**: Multi-Layer Perceptron with 3 hidden layers
- **Hidden Dimensions**: 128 → 64 → 32
- **Input Features**: 25 features (including player KPIs)
- **Output**: Binary probability (goal vs no goal)
- **Framework**: PyTorch
- **Dropout Rate**: 0.3
- **Activation**: ReLU (hidden layers), Sigmoid (output)
- **Normalization**: Batch Normalization after each hidden layer

## Key Enhancement

This model extends the baseline MLP by incorporating **player-specific KPIs**:

- `shots_per_90`: Average shots per 90 minutes played by the player
- `goals_per_90`: Average goals per 90 minutes played by the player
- `goals_percentage`: Percentage of shots that result in goals

These features help the model capture individual player shooting quality and finishing ability beyond just the shot characteristics.
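
As a rough illustration of how the three KPIs described above might be derived from raw totals, the sketch below uses a hypothetical `PlayerTotals` record and `player_kpis` helper; neither is part of this repository. It assumes per-90 values are computed as count divided by minutes played, times 90, and that `goals_percentage` is stored as a fraction between 0 and 1. The scale actually used in training is not stated here, so match whatever the fitted scaler expects.

```python
from dataclasses import dataclass


@dataclass
class PlayerTotals:
    """Hypothetical aggregate totals for one player (illustrative names)."""
    player_id: int
    shots: int
    goals: int
    minutes: float


def player_kpis(totals: PlayerTotals) -> dict:
    """Compute the three KPI features from raw totals.

    Assumption: players with zero minutes or zero shots get 0.0
    instead of raising a division error.
    """
    nineties = totals.minutes / 90.0  # number of full 90-minute units played
    shots_per_90 = totals.shots / nineties if nineties > 0 else 0.0
    goals_per_90 = totals.goals / nineties if nineties > 0 else 0.0
    # Assumption: stored as a fraction in [0, 1], not as 0-100
    goals_percentage = totals.goals / totals.shots if totals.shots > 0 else 0.0
    return {
        "player_id": totals.player_id,
        "shots_per_90": shots_per_90,
        "goals_per_90": goals_per_90,
        "goals_percentage": goals_percentage,
    }


# Example: 60 shots and 12 goals over 1800 minutes (twenty 90-minute units)
kpis = player_kpis(PlayerTotals(player_id=1, shots=60, goals=12, minutes=1800))
```

The resulting dictionary can then be joined onto shot events by `player_id` so that every shot row carries its shooter's KPIs.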
## Performance Metrics

- **Accuracy**: 0.8839
- **Precision**: 0.6456
- **Recall**: 0.1085
- **F1 Score**: 0.1858
- **ROC AUC**: 0.7978
- **Log Loss**: 0.3023

## Features

The model uses the following 25 features:

### Shot Context Features:

- angle_to_gk
- angle_to_goal
- ball_closer_than_gk
- body_part_name_Left Foot
- body_part_name_Other
- body_part_name_Right Foot
- dist_to_gk
- distance_to_goal
- goal_dist_to_gk
- minute
- nearest_opponent_dist
- nearest_teammate_dist
- opponents_within_5m
- play_pattern_name_From Counter
- play_pattern_name_From Free Kick
- play_pattern_name_From Goal Kick
- play_pattern_name_From Keeper
- play_pattern_name_From Kick Off
- play_pattern_name_From Throw In
- play_pattern_name_Other
- play_pattern_name_Regular Play
- teammates_within_5m
- goals_percentage

### Player KPI Features:

- shots_per_90
- goals_per_90

## Usage

```python
import importlib.util

import joblib
import pandas as pd
import torch
from huggingface_hub import hf_hub_download

REPO_ID = "rokati/mlp_with_kpis_xg"

# Download the weights, architecture definition, scaler, and config
model_path = hf_hub_download(repo_id=REPO_ID, filename="best_mlp_model.pth")
architecture_path = hf_hub_download(repo_id=REPO_ID, filename="model_architecture.py")
scaler_path = hf_hub_download(repo_id=REPO_ID, filename="scaler.pkl")
config_path = hf_hub_download(repo_id=REPO_ID, filename="config.json")

# Import the MLP class from the downloaded architecture file
spec = importlib.util.spec_from_file_location("model_architecture", architecture_path)
model_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(model_module)

# Build the model and load the trained weights
model = model_module.MLP(input_dim=25, hidden_dims=[128, 64, 32], dropout_rate=0.3)
model.load_state_dict(torch.load(model_path, map_location=torch.device("cpu")))
model.eval()  # inference mode: disables dropout, uses running batch-norm statistics

# Load the fitted feature scaler
scaler = joblib.load(scaler_path)

# Make a prediction
# X_new must be a pandas DataFrame with all 25 features, including the player KPIs
X_scaled = scaler.transform(X_new)
X_tensor = torch.tensor(X_scaled, dtype=torch.float32)
with torch.no_grad():
    xg_prediction = model(X_tensor).numpy()
```

## Data Requirements

To use this model, you need:

1. **Shot event data** with geometric and contextual features
2. **Player KPI data** calculated from historical player performance (`shots_per_90`, `goals_per_90`)
3. Both datasets merged on `player_id` before making predictions

## Training

The model was trained on football shot event data merged with player KPIs:

- Binary Cross Entropy loss
- Adam optimizer (lr=0.001, weight_decay=1e-5)
- ReduceLROnPlateau scheduler
- Batch size: 256
- Epochs: 50

## License

MIT