Muhammad Garba, Muhammad Abdurrahman Usman and Anas Muhammad Gulumbe, Kebbi State University of Science & Technology, Nigeria
This study predicts breast cancer survival using the Naïve Bayes technique and compares several machine learning models on a large dataset of 310,000 patient records, divided into two classes: survival and non-survival. The objective was to assess the effectiveness of the Naïve Bayes classifier for data mining and to obtain survival classification results consistent with the existing literature. Naïve Bayes achieved an average accuracy of 91.08%, indicating reliable performance but with some variability across folds. Logistic Regression achieved an accuracy of 94.84%, performing well on instances of class 1 but poorly on class 0. The Decision Tree model, with an accuracy of 93.42%, showed similar trends. Random Forest outperformed the Decision Tree, reaching 95.68% accuracy. However, all models struggled to classify class 0 instances accurately. The Naïve Bayes algorithm was also compared with K-Nearest Neighbors (KNN) and Support Vector Machines (SVM). Future research will refine the prediction models with new methods and address the difficulty of correctly identifying class 0 instances.
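The Naïve Bayes accuracy above is reported as an average over cross-validation folds, with some variability between folds. The following is a minimal sketch of what that procedure looks like, not the authors' implementation. It assumes a Gaussian Naïve Bayes variant on numeric features and plain k-fold splitting. The paper does not state the Naïve Bayes variant, the number of folds, or the features used, and categorical patient attributes would more naturally call for a categorical variant. The function names (`fit_gaussian_nb`, `predict_gaussian_nb`, `kfold_accuracy`) and the toy data are hypothetical.

```python
import math
import random
import statistics
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate log class priors and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    n = len(X)
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Small epsilon keeps the variance positive for constant features.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / n), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (up to a shared constant)."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        # "Naive" independence assumption: per-feature log-likelihoods simply add.
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def kfold_accuracy(X, y, k=5, seed=0):
    """Shuffle indices, split them into k folds, and return per-fold accuracies."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [j for j in idx if j not in test_set]
        model = fit_gaussian_nb([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict_gaussian_nb(model, X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return scores


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: two numeric features, two classes coded 0 and 1.
    # Class 1 is made the majority purely for illustration.
    X, y = [], []
    for _ in range(200):
        label = 1 if rng.random() < 0.8 else 0
        center = 2.0 if label == 1 else 0.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    scores = kfold_accuracy(X, y, k=5)
    print("per-fold accuracy:", [round(s, 3) for s in scores])
    print("mean:", round(statistics.mean(scores), 3),
          "std dev:", round(statistics.stdev(scores), 3))
```

Reporting the standard deviation alongside the mean is one simple way to quantify the "variability across folds" mentioned above. On imbalanced data, per-class recall would also expose the kind of minority-class weakness the abstract describes for class 0, which overall accuracy can hide.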
Machine Learning, Data Mining, Naïve Bayes, Cancer, Random Survival Forest