Volume 9, Number 9, July 2019
Attribute Reduction and Decision Tree Pruning to Simplify Liver Fibrosis Prediction Algorithms
A Cohort Study
Authors
Mahasen Mabrouk1, Abubakr Awad2, Hend Shousha1, Wafaa Alakel1,3, Ahmed Salama1 and Tahany Awad1, 1Cairo University, Egypt, 2University of Aberdeen, UK and 3National Hepatology and Tropical Medicine Research Institute, Cairo, Egypt
Abstract
Background: Assessment of liver fibrosis is essential for therapeutic decisions and prognostic
evaluation in chronic hepatitis. Liver biopsy is considered the definitive investigation for
staging liver fibrosis, but it has several limitations. FIB-4 and APRI also have limited
accuracy. The National Committee for Control of Viral Hepatitis (NCCVH) in Egypt has supplied
a valuable pool of electronic patient data that data mining techniques can analyze to reveal
hidden patterns and trends, leading to the development of predictive algorithms.
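For reference, the two indices revalidated in this study are computed from routine laboratory values using their standard published formulas: FIB-4 = (age [years] × AST [U/L]) / (platelet count [10^9/L] × √ALT [U/L]), and APRI = (AST / AST upper limit of normal) × 100 / platelet count [10^9/L]. The minimal Python sketch below only encodes these two formulas. The function names, the example patient values and the assumed AST upper limit of normal (40 U/L) are hypothetical, and the diagnostic cut-offs applied in this study are not shown.

```python
import math


def fib4(age_years: float, ast_u_per_l: float, alt_u_per_l: float,
         platelets_1e9_per_l: float) -> float:
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_u_per_l) / (platelets_1e9_per_l * math.sqrt(alt_u_per_l))


def apri(ast_u_per_l: float, ast_uln_u_per_l: float, platelets_1e9_per_l: float) -> float:
    """APRI = (AST / AST upper limit of normal) x 100 / platelet count."""
    return (ast_u_per_l / ast_uln_u_per_l) * 100.0 / platelets_1e9_per_l


if __name__ == "__main__":
    # Hypothetical patient: age 50 y, AST 60 U/L, ALT 49 U/L, platelets 150 x 10^9/L,
    # with an assumed AST upper limit of normal of 40 U/L.
    print(round(fib4(50, 60, 49, 150), 2))  # 2.86
    print(round(apri(60, 40, 150), 2))      # 1.0
```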
Aim: To collaborate with physicians to develop a novel, reliable, easy-to-understand noninvasive
model that predicts the stage of liver fibrosis from routine workup, without imposing the extra
cost of additional examinations, especially in resource-limited settings such as Egypt.
Methods: This multicenter retrospective study included baseline demographic, laboratory,
and histopathological data of 69106 patients with chronic hepatitis C. We began with data
collection, preprocessing, cleansing, and formatting to enable the discovery of useful knowledge
from electronic health records (EHRs). Data mining was used to build a decision tree
(Reduced Error Pruning tree, REP tree) with 10-fold internal cross-validation. Histopathology
results served as the reference for assessing classification accuracy across fibrosis stages.
Machine learning feature selection (CfsSubsetEval with BestFirst search) reduced the initial
15 input features to the 6 most relevant ones for developing the prediction model.
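The "REP" in REP tree stands for reduced-error pruning: after a tree is grown, each subtree is collapsed into a single leaf whenever doing so does not increase the error on data held out for pruning. The plain-Python sketch below illustrates only that pruning rule on a tiny hand-built tree. It is not Weka's REPTree implementation, which also grows the tree itself, splits off the pruning data automatically, and handles details such as missing values. The tree, thresholds, class labels and helper names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Binary decision-tree node; leaves have feature=None."""
    label: str                        # majority class of the training rows that reached this node
    feature: Optional[str] = None     # attribute tested at an internal node
    threshold: float = 0.0            # rows with feature <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.feature is None


def predict(node: Node, row: dict) -> str:
    # Walk from the root down to a leaf and return its class label.
    while not node.is_leaf():
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.label


def errors(node: Node, rows: List[dict], labels: List[str]) -> int:
    # Number of misclassified rows in the held-out pruning set.
    return sum(predict(node, r) != y for r, y in zip(rows, labels))


def count_leaves(node: Node) -> int:
    return 1 if node.is_leaf() else count_leaves(node.left) + count_leaves(node.right)


def reduced_error_prune(node: Node, rows: List[dict], labels: List[str]) -> Node:
    """Prune bottom-up: replace a subtree by a leaf (its majority training class)
    when that does not increase error on the pruning rows reaching it.
    Children are replaced in place on `node`."""
    if node.is_leaf():
        return node
    goes_left = [r[node.feature] <= node.threshold for r in rows]
    left = [(r, y) for r, y, g in zip(rows, labels, goes_left) if g]
    right = [(r, y) for r, y, g in zip(rows, labels, goes_left) if not g]
    node.left = reduced_error_prune(node.left, [r for r, _ in left], [y for _, y in left])
    node.right = reduced_error_prune(node.right, [r for r, _ in right], [y for _, y in right])
    leaf = Node(label=node.label)
    # Ties (including "no pruning rows reached this node") favour the smaller tree.
    if errors(leaf, rows, labels) <= errors(node, rows, labels):
        return leaf
    return node


if __name__ == "__main__":
    # Hypothetical tree and pruning rows; thresholds and labels are illustrative only.
    tree = Node("F2", "platelet", 150,
                left=Node("F3-4", "AST", 60, left=Node("F2"), right=Node("F3-4")),
                right=Node("F0-1"))
    prune_rows = [{"platelet": 120, "AST": 80}, {"platelet": 130, "AST": 40},
                  {"platelet": 200, "AST": 30}]
    prune_labels = ["F3-4", "F3-4", "F0-1"]
    print(count_leaves(tree))  # 3
    pruned = reduced_error_prune(tree, prune_rows, prune_labels)
    print(count_leaves(pruned), errors(pruned, prune_rows, prune_labels))  # 2 0
```

In this toy case the AST split is collapsed because a single "F3-4" leaf classifies the held-out rows at least as well as the subtree, which mirrors how pruning can simplify a model with little or no loss of accuracy.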
Results: In this study, 32419 patients had F(0-1), 25073 had F(2), and 11615 had F(3-4).
Revalidation of FIB-4 and APRI in our cohort showed low accuracy and high discordance with
biopsy results, with overall AUCs of 0.68 and 0.58, respectively. Out of the 15 attributes,
machine learning feature selection identified age, AFP, AST, glucose, albumin, and platelet
count as the most relevant. The REP tree achieved an overall classification accuracy of up to
70% and a ROC area of 0.74, which were barely affected by attribute reduction and pruning.
However, attribute reduction and tree pruning yielded a simpler model that is easier for
physicians to understand and faster to execute.
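CfsSubsetEval is based on correlation-based feature selection (CFS), which scores a subset of k attributes with the merit k·r_cf / √(k + k(k−1)·r_ff), where r_cf is the mean attribute–class correlation and r_ff the mean attribute–attribute correlation. A subset therefore scores well when its attributes predict the class but are not redundant with each other, which is how 15 attributes can shrink to 6. The sketch below is a simplified illustration under stated assumptions: it uses absolute Pearson correlation with fibrosis stage coded as an ordinal number (0, 1, 2), rather than the measures Weka uses internally, and a greedy forward search as a stand-in for BestFirst, which can also backtrack. The feature names and toy data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset: List[str], features: Dict[str, Sequence[float]],
              target: Sequence[float]) -> float:
    """CFS merit: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean |correlation| between each selected feature and the class (relevance).
    r_cf = mean(abs(pearson(features[f], target)) for f in subset)
    # Mean |correlation| between pairs of selected features (redundancy).
    r_ff = mean(abs(pearson(features[a], features[b]))
                for a, b in combinations(subset, 2)) if k > 1 else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_forward_cfs(features: Dict[str, Sequence[float]],
                       target: Sequence[float]) -> Tuple[List[str], float]:
    """Repeatedly add the feature that most increases the merit; stop when none does."""
    selected: List[str] = []
    best = 0.0
    remaining = set(features)
    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        cand = max(sorted(remaining),
                   key=lambda f: cfs_merit(selected + [f], features, target))
        merit = cfs_merit(selected + [cand], features, target)
        if merit <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected, best


if __name__ == "__main__":
    # Hypothetical toy data: stage coded 0 = F(0-1), 1 = F(2), 2 = F(3-4).
    stage = [0, 0, 1, 1, 2, 2]
    features = {
        "informative": [0, 0, 1, 1, 2, 2],     # tracks the stage exactly
        "redundant_copy": [0, 0, 1, 1, 2, 2],  # duplicates "informative"
        "noise": [1, 0, 0, 1, 1, 0],           # uncorrelated with the stage
    }
    print(greedy_forward_cfs(features, stage))  # (['informative'], 1.0)
```

The redundant copy is rejected because adding it raises r_ff as much as r_cf, so the merit does not improve; the uncorrelated column lowers the merit and is rejected as well.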
Conclusion: In this study, we had the opportunity to examine a large cohort of 69106 patients
with chronic hepatitis C with available liver biopsy results and to revise and validate the
accuracy of FIB-4 and APRI. This study represents a collaboration between computer scientists
and hepatologists to provide clinicians with an accurate, novel, reliable, and noninvasive model
to predict the stage of liver fibrosis.
Keywords
Liver Fibrosis, Data Mining, Weka, Decision Tree, Attribute Reduction, Tree Pruning