Academy & Industry Research Collaboration Center (AIRCC)

Volume 10, Number 11, September 2020

Machine Learning for Multiple Stage Heart Disease Prediction


Khalid Amen, Mohamed Zohdy and Mohammed Mahmoud, Oakland University, USA


According to the Centers for Disease Control and Prevention (CDC), heart disease is the leading cause of death for men, women, and people of most racial and ethnic groups in the United States. More than one person dies of it every minute, nearly half a million die of it each year, and it costs billions of dollars annually. Previous machine learning approaches have been used to predict whether patients have heart disease. The purpose of this work is to go further and predict five stages of heart disease: no disease, stage 1, stage 2, stage 3, and advanced (severe) heart disease. We investigate several supervised models and determine which of them achieves the best accuracy. Specifically, we describe and evaluate five machine learning algorithms: support vector machine (SVM), logistic regression (LR), random forest (RF), gradient tree boosting (GTB), and extra random forest (ERF). Each is configured with hyperparameters that maximize classifier performance, and we compare them to identify which best predicts the stage of a person's heart disease. When all five classifiers are evaluated on accuracy, precision, recall, and F measure, LR performs best with an accuracy of 82%, followed by SVM with an accuracy of 80%. This prediction can support every step of patient care, reduce the margin of error, and contribute to precision medicine. Overall, this paper aims to improve heart disease prediction accuracy, precision, recall, and F measure on the UCI heart disease dataset, using multiple machine learning approaches to understand the data and predict the likelihood of heart disease in a medical database.
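The classifiers above are compared on accuracy, precision, recall, and F measure over five classes. As a rough illustration of how those four numbers can be computed for a multi-stage problem, the sketch below is a minimal, dependency-free Python version. It is not the authors' evaluation code: the integer encoding of the stages (0 = no disease through 4 = advanced disease), the use of macro averaging (an unweighted mean over the five stages), and the rule that a zero-denominator ratio counts as 0 are all assumptions, and the helper name `evaluate` is hypothetical.

```python
# Hypothetical stage encoding (assumption, not from the paper):
# 0 = no disease, 1-3 = stages 1-3, 4 = advanced/severe heart disease.
STAGES = (0, 1, 2, 3, 4)


def evaluate(y_true, y_pred, labels=STAGES):
    """Return accuracy plus macro-averaged precision, recall and F1.

    Macro averaging (an unweighted mean over stages) is an assumption;
    the paper does not say which averaging it uses.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # One-vs-rest counts for stage c.
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Assumed convention: a ratio with a zero denominator is scored as 0.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    y_true = [0, 0, 1, 2, 3, 4, 0, 1, 2, 4]
    y_pred = [0, 0, 1, 2, 3, 4, 1, 1, 3, 4]
    print(evaluate(y_true, y_pred))
```

On the toy labels, 8 of 10 predictions are correct, so accuracy is 0.8, and the macro F1 comes out near 0.787 because stages 2 and 3 are each confused once. With scikit-learn installed, `accuracy_score` and `precision_recall_fscore_support(y_true, y_pred, average="macro", zero_division=0)` from `sklearn.metrics` should give the same figures. Passing `average="weighted"` instead would weight each stage by its frequency, which can matter when some stages have few patients.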


Keywords: machine learning, ML, CNN, DNN, RNN, Jupyter, Python, Cleveland dataset, gradient tree boosting, GTB, random forest, RF, support vector machine, SVM, extra random forest, ERF, logistic regression, LR.