Comparing Classifiers in the Presence of Errors in True Label Assignment in Medical Datasets

Authors

Vishwa Vallabh Angampally and Eugene Pinsky, Metropolitan College, Boston University, USA

Abstract

We often rely on human experts to assign true labels in medical datasets, and these labels may not be 100% accurate. We investigate the impact of labeling errors on machine-learning classifiers applied to medical datasets. By introducing symmetric errors from 0% to 40% into the true labels, simulating expert mislabeling, inter-observer variability, and automated annotation errors, we evaluate the impact of such errors on binary classification for several well-known medical datasets using traditional machine-learning models and metrics. Although all models degrade as the error rate increases, simpler, well-regularized methods such as Logistic Regression and SVM decline more gracefully. Our results underscore the need for improved data curation and error-aware training strategies in medical AI, ultimately guiding the selection of robust algorithms that remain reliable under imperfect real-world conditions.
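
The symmetric label-noise setup described above can be sketched in a few lines: flip a uniformly random fraction of the binary training labels, retrain, and measure test performance on clean labels. The sketch below is illustrative only, not the authors' code; the dataset (scikit-learn's breast cancer data), the Logistic Regression model, and the helper flip_labels_symmetric are assumptions chosen to match the abstract's description.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def flip_labels_symmetric(y, rate, rng):
    # Return a copy of binary labels y with a `rate` fraction flipped,
    # chosen uniformly at random (symmetric: 0 -> 1 and 1 -> 0 alike).
    y_noisy = y.copy()
    n_flip = int(round(rate * len(y)))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]
    return y_noisy

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for rate in [0.0, 0.1, 0.2, 0.3, 0.4]:  # 0% to 40%, as in the study
    y_tr_noisy = flip_labels_symmetric(y_tr, rate, rng)
    clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr_noisy)
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(f"noise={rate:.0%}  test accuracy={acc:.3f}")

Note that the noise is injected only into the training labels while the test set stays clean, so the printed accuracies isolate how each classifier's learned decision boundary degrades with mislabeled training data.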


Keywords

Labeling Errors, Medical Datasets, Classifier Robustness, Data Imbalance, Machine Learning, Performance Evaluation, Diagnostic Prediction.