credit-card-fraud-detector
Ensemble model (Random Forest + XGBoost) for detecting fraud in credit card
transactions, trained on the alenc123/credit-card-fraud dataset.
Winning model
xgboost, selected by F1 on the fraud class at the calibrated threshold.
| Metric | Value |
|---|---|
| F1 (fraud class, calibrated threshold) | 0.9180 |
| Precision | 0.9634 |
| Recall | 0.8767 |
| ROC-AUC | 0.9990 |
| PR-AUC | 0.9596 |
| Calibrated threshold | 0.9516 |
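As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two rows of the table. The helper name below is illustrative only.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the metrics table.
f1 = f1_from_precision_recall(0.9634, 0.8767)
print(f"{f1:.4f}")  # prints 0.9180
```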
Data
- Source: alenc123/credit-card-fraud, file credit_card_transactions.parquet.
- Train / Test: 1,037,340 / 259,335 rows (stratified 80/20 split).
- Fraud rate: ~0.579% (the positive class is a small minority).
- Final features (post-FE): 30, including amt_log1p, distance_km, hour, dayofweek, month, age, frequency encoding of merchant/city/job/state, and one-hot encoding of category/gender.
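A few of the engineered features can be sketched for a single record using only the standard library. The raw field names (amt, trans_date_trans_time, lat, long, merch_lat, merch_long) and the reading of distance_km as the haversine distance between cardholder and merchant are assumptions, not confirmed by this card; the author's actual logic lives in scripts/03_feature_engineering.py. The age feature is omitted because its source column is not stated here.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    radius_km = 6371.0088  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def engineer(record):
    """Derive a subset of the card's features from one raw record (field names assumed)."""
    ts = datetime.fromisoformat(record["trans_date_trans_time"])
    return {
        "amt_log1p": math.log1p(record["amt"]),  # log(1 + amount) compresses the long tail
        "distance_km": haversine_km(
            record["lat"], record["long"], record["merch_lat"], record["merch_long"]
        ),
        "hour": ts.hour,
        "dayofweek": ts.weekday(),  # Monday = 0, same convention as pandas
        "month": ts.month,
    }


# Hypothetical record, for illustration only.
example = {
    "trans_date_trans_time": "2019-01-01 13:05:00",
    "amt": 99.0,
    "lat": 40.0,
    "long": -75.0,
    "merch_lat": 40.5,
    "merch_long": -75.5,
}
features = engineer(example)
print(features)
```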
Hyperparameters
Random Forest (Bagging)
{
"n_estimators": 300,
"min_samples_leaf": 1,
"max_features": 0.5,
"max_depth": null,
"criterion": "entropy"
}
XGBoost (Boosting)
{
"subsample": 1.0,
"reg_lambda": 0,
"n_estimators": 600,
"min_child_weight": 1,
"max_depth": 6,
"learning_rate": 0.1,
"colsample_bytree": 0.8
}
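The two JSON blocks map directly onto constructor keyword arguments. The following is a minimal sketch, assuming scikit-learn and xgboost are installed; the class_weight and scale_pos_weight values come from the imbalance settings described in this card, and anything else (random seed, parallelism, evaluation metric) is left at library defaults because the card does not state it. The function name build_models is illustrative only.

```python
# Hyperparameters copied from the card (JSON null becomes Python None).
RF_PARAMS = {
    "n_estimators": 300,
    "min_samples_leaf": 1,
    "max_features": 0.5,
    "max_depth": None,
    "criterion": "entropy",
}

XGB_PARAMS = {
    "subsample": 1.0,
    "reg_lambda": 0,
    "n_estimators": 600,
    "min_child_weight": 1,
    "max_depth": 6,
    "learning_rate": 0.1,
    "colsample_bytree": 0.8,
}


def build_models(scale_pos_weight=172):
    """Instantiate both candidate models; imports are local so the dicts load without the libraries."""
    from sklearn.ensemble import RandomForestClassifier
    from xgboost import XGBClassifier

    rf = RandomForestClassifier(class_weight="balanced", **RF_PARAMS)
    xgb = XGBClassifier(scale_pos_weight=scale_pos_weight, **XGB_PARAMS)
    return rf, xgb
```

Calling build_models() returns unfitted estimators; fitting them requires the preprocessed training matrix, which is not reproduced here.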
Handling class imbalance
- Random Forest: class_weight='balanced'.
- XGBoost: scale_pos_weight = n_neg / n_pos ≈ 172.
- Threshold calibration via the precision-recall curve (maximum F1 on train).
- No SMOTE (see notes/02_design_modeling.md).
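Both numeric choices in this list can be illustrated with a dependency-free sketch. The first helper derives the negative-to-positive ratio from the fraud rate; the second scans candidate thresholds and keeps the one with the highest F1, which is the same idea as taking the argmax of F1 along a precision-recall curve. This is a plain-Python stand-in, not the author's code (which presumably relies on scikit-learn), and the toy labels and scores are made up.

```python
def scale_pos_weight_from_rate(pos_rate):
    """n_neg / n_pos expressed through the positive-class rate."""
    return (1.0 - pos_rate) / pos_rate


def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) maximising F1 when predicting positive iff score >= threshold."""
    pairs = sorted(zip(scores, y_true), reverse=True)
    total_pos = sum(y_true)
    tp = fp = 0
    best_thr, best_f1 = 1.0, 0.0
    i, n = 0, len(pairs)
    while i < n:
        thr = pairs[i][0]
        # Absorb every sample tied at this score before evaluating the threshold.
        while i < n and pairs[i][0] == thr:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp == 0:
            continue  # precision and recall are both zero here
        precision = tp / (tp + fp)
        recall = tp / total_pos
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_thr, best_f1 = thr, f1
    return best_thr, best_f1


# Fraud rate of ~0.579% from the card gives a ratio close to 172.
print(round(scale_pos_weight_from_rate(0.00579)))  # prints 172

# Toy labels and scores, for illustration only.
y_toy = [0, 0, 0, 1, 0, 1, 1]
s_toy = [0.05, 0.2, 0.3, 0.6, 0.7, 0.8, 0.95]
thr, f1 = best_f1_threshold(y_toy, s_toy)
print(thr, round(f1, 4))  # prints 0.6 0.8571
```

Because the card picks the threshold on the training data, the resulting cutoff is best confirmed on held-out data before it is relied on.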
How to use
import joblib
import pandas as pd
from huggingface_hub import hf_hub_download
model_path = hf_hub_download("gusdelact/credit-card-fraud-bagging-boosting", "model.joblib")
pre_path = hf_hub_download("gusdelact/credit-card-fraud-bagging-boosting", "preprocessor.joblib")
model = joblib.load(model_path)
preprocessor = joblib.load(pre_path)
# X_new must contain the raw columns of the original dataset; apply the same
# feature engineering (see scripts/03_feature_engineering.py or app_inference/)
# to obtain X_new_engineered before calling the preprocessor.
X_t = preprocessor.transform(X_new_engineered)
proba = model.predict_proba(X_t)[:, 1]
prediction = (proba >= 0.9516).astype(int)
Limitations
- The original dataset is synthetic (Sparkov-style); the metrics may be optimistic compared with production.
- Frequency encoding maps new categories to 0; an unseen merchant will weaken the signal.
- No temporal split: for scenarios with concept drift, re-evaluation is recommended.
- The probabilities are NOT calibrated in the strict sense (CalibratedClassifierCV was not applied).
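The frequency-encoding limitation can be seen in a small sketch. It assumes the encoder stores relative frequencies learned on the training data (the card does not say whether counts or proportions are used) and that unseen values fall back to 0, as stated above. Function names and merchant values are hypothetical.

```python
from collections import Counter


def fit_frequency_encoder(values):
    """Map each category to its relative frequency in the training data."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def transform_frequency(values, mapping):
    """Encode values with the fitted mapping; categories not seen in training become 0.0."""
    return [mapping.get(value, 0.0) for value in values]


# Hypothetical merchant names, for illustration only.
train_merchants = ["shop_a", "shop_a", "shop_b", "shop_c"]
mapping = fit_frequency_encoder(train_merchants)
encoded = transform_frequency(["shop_a", "shop_new"], mapping)
print(encoded)  # prints [0.5, 0.0]
```

An unseen merchant therefore looks identical to an extremely rare one, which is why the encoded signal degrades as new merchants appear.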
Citation
@model{ credit-card-fraud-bagging-boosting_2026,
author = {gusdelact},
title = {credit-card-fraud-detector},
year = {2026},
publisher = {Hugging Face}
}