Volume 17, Number 2
Prominent Risk Factors in Diabetes
Authors
Oleg Fleitman, University of Colorado Boulder, USA
Abstract
A December 2023 Fortune article [1] reported that nearly 50% of the U.S. population has diabetes or prediabetes, and that many of those affected are unaware of it. This motivated a data mining project using the CDC's 2015 BRFSS dataset [2], with 253,000 entries and 17 features, to identify key diabetes risk factors. The data were preprocessed with SMOTE to address class imbalance, and four models were then applied: Logistic Regression, Random Forest, Gradient Boosting, and XGBoost. Logistic Regression had the lowest F1 score (0.66), while the other three models achieved an F1 score of 0.83. Age, BMI, and General Health were the top three risk factors identified. It is recommended that diabetes awareness campaigns target individuals who are over 45, have a BMI above 25, or self-rate their health poorly. Future work should involve a broader set of features and consultation with medical experts.
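To make the two technical ingredients named in the abstract concrete, the following is a minimal sketch, in plain Python, of SMOTE-style oversampling and of the binary F1 score. It is not the code used in the study, which presumably relied on library implementations; the function names `smote_oversample` and `f1_score`, the parameter defaults, and the toy data are all assumptions made for illustration.

```python
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (SMOTE-style sketch).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two numeric feature vectors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage with hypothetical, made-up feature vectors.
minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_oversample(minority_class, n_new=4, k=2)
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

Because each synthetic sample is interpolated between two existing minority samples, it stays inside the region those samples already occupy, which is why SMOTE is often preferred to simply duplicating minority rows.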
Keywords
Diabetes, risk factors, prevention, reversing diabetes, Type 1, Type 2, prediabetes