Volume 13, Number 3

Heart Disease Prediction using Machine Learning and Deep Learning

  Authors

Dinesh Kalla and Arvind Chandrasekaran, Colorado Technical University, USA

  Abstract

Heart disease is currently the most commonly reported disease in the United States among both sexes, and according to official statistics about fifty percent of the American population suffers from some form of cardiovascular disease. This paper performs chi-square tests and linear regression analysis to predict heart disease from symptoms such as chest pain and dizziness. The results can help healthcare providers offer better assistance to patients by predicting heart disease at an early stage. A chi-square test is conducted to identify whether there is a relationship between chest pain and heart disease cases in the United States by analyzing a heart disease dataset from IEEE DataPort. The test results and analysis show that males in the United States are the most likely to develop heart disease, with symptoms such as chest pain, dizziness, shortness of breath, fatigue, and nausea. The analysis also identifies a weak correlation of 0.5 with age, which shows that people of all ages, including teens, can develop heart disease and that its prevalence increases with age. In addition, the tests indicate that 90 percent of the participants experiencing severe chest pain suffer from heart disease, with the majority of identified cases being male, and only 10 percent of these participants are identified as healthy. The evaluated p-values are well below the significance threshold of 0.05, which indicates that factors such as sex, exercise angina, cholesterol, old peak, ST_Slope, obesity, and blood sugar play a significant role in the onset of cardiovascular disease. We also tested the dataset with a prediction model built on logistic regression and observed an accuracy of 85.12 percent.
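
As an illustration of the kind of chi-square test of independence described above, the following minimal Python sketch tests whether severe chest pain and heart disease are associated in a 2x2 contingency table. The counts are made up for illustration and are not taken from the dataset, and the helper name chi_square_2x2 is hypothetical; the paper's own analysis may differ in its tooling and details.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    Hypothetical helper for illustration; no continuity correction is applied.
    Returns the test statistic and its p-value (1 degree of freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up counts (assumption, not the paper's data).
# Rows: severe chest pain, no severe chest pain.
# Columns: heart disease, healthy.
observed = [[90, 10],
            [40, 60]]

stat, p_value = chi_square_2x2(observed)
print(f"chi-square = {stat:.3f}, p-value = {p_value:.3g}")

# Reject independence at the conventional 0.05 significance level.
significant = p_value < 0.05
print("association is significant" if significant else "no significant association")
```

A p-value below 0.05 leads to rejecting the hypothesis that the two variables are independent. In practice a library routine such as scipy.stats.chi2_contingency would typically be used instead; note that it applies Yates' continuity correction to 2x2 tables by default, so its statistic can differ slightly from this sketch.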

  Keywords

Chi-Square Test; R; Data Mining; Big Data; Linear Regression Analysis; Heart Disease; Risk Factor; Machine Learning; Cardiovascular Disease; Python; Logistic Regression; sklearn; Pandas; NumPy; NLTK.