I have the following custom scorer code that I'm using in a grid search:
```python
import numpy as np
from sklearn.metrics import average_precision_score, make_scorer

def custom_auprc_scorer(y_true, y_pred):
    pos_label = 'positive'
    # Map the string labels to 0/1 before computing the metric
    y_true_mapped = np.where(y_true == pos_label, 1, 0)
    y_pred_numeric = np.where(y_pred == pos_label, 1, 0)
    return average_precision_score(y_true_mapped, y_pred_numeric)

# Create the final scorer with make_scorer
custom_auprc_scorer = make_scorer(custom_auprc_scorer, greater_is_better=True)
scorer = {'AUPRC': custom_auprc_scorer, 'ROC_AUC': 'roc_auc'}
```
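My specific doubt is that, as far as I understand, `make_scorer` with the defaults above passes the hard class labels from `predict` into `average_precision_score`, while that metric is supposed to be computed from continuous scores or probabilities. A probability-based variant I've been considering, based on the `make_scorer` docs, looks like this (`response_method` requires scikit-learn >= 1.4; older releases use `needs_proba=True` instead):

```python
from sklearn.metrics import average_precision_score, make_scorer

# Pass predicted probabilities (not hard labels) to average_precision_score;
# pos_label tells both the scorer and the metric which class is "positive".
auprc_proba_scorer = make_scorer(
    average_precision_score,
    response_method='predict_proba',  # on scikit-learn < 1.4: needs_proba=True
    pos_label='positive',
)

scorer = {'AUPRC': auprc_proba_scorer, 'ROC_AUC': 'roc_auc'}
```

I also noticed that scikit-learn ships a built-in `'average_precision'` scoring string, but I'm unsure how it picks the positive class when the labels are strings like `'positive'`/`'negative'`, which is why I wrote a custom scorer in the first place.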
Below is the `evaluatealgorithm` function, modified to also retrieve the AUPRC and ROC AUC scores from the grid search results:
```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # imblearn's Pipeline, so SMOTE can be a step
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

def evaluatealgorithm(x_train, y_train, kfold, scorer):
    gbm = GradientBoostingClassifier(loss='log_loss')
    smote = SMOTE(sampling_strategy='minority')
    pipeline = Pipeline(steps=[('smote', smote), ('gbm', gbm)])
    parameters = {
        'smote__k_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'gbm__learning_rate': [0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.5, 1],
        'gbm__subsample': [0.4, 0.5, 0.6, 0.8, 1],
        'gbm__n_estimators': [50, 100, 200, 300, 400, 600],
        'gbm__max_depth': [3, 4, 5, 6, 7, 8, 9, 10],
        'gbm__min_samples_split': [2, 3, 4, 5],
        'gbm__min_samples_leaf': [1, 2, 3, 4, 5, 6, 7, 8,
                                  9, 10, 11, 12, 13, 14, 15],
    }
    grid_gbm = GridSearchCV(estimator=pipeline, param_grid=parameters,
                            cv=kfold, verbose=1, n_jobs=-1,
                            refit='AUPRC', scoring=scorer)
    grid_gbm.fit(x_train, y_train)

    # Mean cross-validated scores of the parameter set that won on AUPRC
    auprc_score = grid_gbm.cv_results_['mean_test_AUPRC'][grid_gbm.best_index_]
    roc_score = grid_gbm.cv_results_['mean_test_ROC_AUC'][grid_gbm.best_index_]

    # Best model from the grid search
    model_gbm = grid_gbm.best_estimator_
    smote_params = model_gbm.named_steps['smote'].get_params()
    gbm_params = model_gbm.named_steps['gbm'].get_params()
    return smote_params, gbm_params, model_gbm, roc_score, auprc_score
```
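For context, this is roughly how I call the function; `x_train`, `y_train`, and the fold settings below are placeholders from my setup, not part of the question:

```python
from sklearn.model_selection import StratifiedKFold

# x_train / y_train come from my own train/test split,
# with string labels 'positive' / 'negative'.
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
smote_params, gbm_params, model_gbm, roc_score, auprc_score = evaluatealgorithm(
    x_train, y_train, kfold, scorer
)
print(f'AUPRC: {auprc_score:.3f}, ROC AUC: {roc_score:.3f}')
```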
I'm using this custom scorer in the grid search, but I'm unsure whether the AUPRC calculation is actually being performed correctly. I'm relatively new to machine learning, so I'd appreciate guidance on whether my code is right and, if not, how to make it correctly compute AUPRC as the metric for selecting the best hyperparameters. I've searched in several places for functions that calculate AUPRC accurately, but I haven't been able to confirm that my grid search is doing what I expect. Thank you.