Fitting AdaBoost Models From Imbalanced Data with Applications in College Basketball

Authors

Romaniuk, Raymond

Abstract

Data imbalance is an important consideration when working with real-world data. Over/undersampling approaches allow us to gather more insight from the limited data we have on the minority class; however, many such methods have been proposed. The goal of our study is to identify the best over/undersampling approach to use with Adaptive Boosting (AdaBoost). Based on a simulation study, we found that combining AdaBoost with various sampling techniques improves weighted accuracy across classes as the data imbalance grows progressively larger. The three Synthetic Minority Oversampling Technique (SMOTE) variants and Jittering with Over/Undersampling (JOUS) performed the best, with the JOUS approach being the most accurate at every level of data imbalance in the simulation study. We then applied the most effective over/undersampling methods to predict upsets (games where the lower-seeded team wins) in the March Madness College Basketball Tournament.
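
To make the idea of jittered oversampling and a class-weighted accuracy concrete, the following is a minimal sketch and not the author's implementation. It assumes that the jitter is small uniform noise added to duplicated minority rows and that "weighted accuracy" can be approximated by the unweighted mean of per-class recall (often called balanced accuracy); the function names and the `noise_scale` parameter are hypothetical and chosen only for illustration.

```python
import random


def jitter_oversample(X, y, minority_label, noise_scale=0.1, seed=0):
    """Duplicate minority rows (with small uniform jitter) until the
    minority class is as large as the rest of the data.

    Assumption: uniform noise in [-noise_scale, noise_scale] per feature.
    """
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    if not minority:
        raise ValueError("no rows with the minority label")
    n_majority = len(y) - len(minority)

    X_out, y_out = [list(row) for row in X], list(y)
    # Add jittered copies of randomly chosen minority rows.
    for _ in range(n_majority - len(minority)):
        base = rng.choice(minority)
        X_out.append([v + rng.uniform(-noise_scale, noise_scale) for v in base])
        y_out.append(minority_label)
    return X_out, y_out


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
X = [[0.0, 0.0]] * 8 + [[1.0, 1.0]] * 2
y = [0] * 8 + [1] * 2
X_bal, y_bal = jitter_oversample(X, y, minority_label=1)

# A classifier that always predicts the majority class scores 0.5 here,
# even though its plain accuracy on imbalanced data would look high.
always_majority = balanced_accuracy([0, 0, 0, 0, 1], [0, 0, 0, 0, 0])
```

The rebalanced data could then be passed to any AdaBoost implementation. The point of the second function is that plain accuracy rewards ignoring the minority class, whereas a per-class average does not.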
