Date of Award

9-1-2024

Thesis Type

phd

Document Type

Thesis (Restricted Access)

Divisions

eng

Department

Department of Electrical Engineering

Institution

Universiti Malaya

Abstract

Heart disease remains the leading cause of death worldwide, and its early detection is critical to reducing mortality. However, the class imbalance and high dimensionality of clinical data significantly impede the effectiveness of Machine Learning (ML) models in this domain. This thesis presents two methods that address these challenges at the algorithm and data levels to improve heart disease detection. The first method, an Improved Weighted Random Forest (IWRF), tackles class imbalance at the algorithm level. It uses supervised infinite feature selection (Inf-FSs) to identify significant features and Bayesian optimization to tune hyperparameters. Validated on the Statlog and heart disease clinical records datasets, this method improves prediction accuracy and F-measure, outperforming existing models with accuracy gains of 2.4% and 4.6% on the two datasets, respectively. The second method, by contrast, addresses imbalance at the data level through a novel framework named Conditional Autoencoder with Stack Predictor for Heart Disease (CAVE-SPFHD). This framework combines a conditional variational autoencoder (CVAE), which balances the dataset, with a stack predictor (SPFHD) built on tree-based ensemble learning algorithms. A support vector machine combines the base models' predictions, which significantly improves detection accuracy. Tested on four datasets, CAVE-SPFHD outperforms state-of-the-art methods in F1-score and, through the SHapley Additive exPlanations (SHAP) algorithm, provides critical interpretive insights in addition to improved predictive performance. Together, these two methods form a comprehensive approach to ML-based heart disease detection that addresses the dual challenges of class imbalance and high dimensionality. 
By tackling these issues at both the algorithm and data levels, this thesis contributes robust, accurate, and interpretable ML solutions for early heart disease detection, which is crucial for proactive healthcare interventions.
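The stacking idea described for SPFHD (tree-based ensemble base learners whose predictions are combined by a support vector machine) can be illustrated with scikit-learn's `StackingClassifier`. This is a minimal sketch under stated assumptions, not the thesis's implementation: the synthetic 80/20 imbalanced data, the choice of random forest, extra trees, and gradient boosting as base learners, and all hyperparameters are hypothetical, and the CVAE balancing, Inf-FSs feature selection, Bayesian optimization, and SHAP steps are omitted. The `class_weight="balanced"` setting is scikit-learn's standard class reweighting, shown only as a generic algorithm-level response to imbalance; it is not the IWRF method.

```python
# Illustrative stacking sketch: tree-based base learners + SVM meta-learner.
# NOT the thesis's CAVE-SPFHD code; data, models, and hyperparameters are
# hypothetical choices made for demonstration only.
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def build_stack(random_state: int = 0) -> StackingClassifier:
    """Tree-based base learners whose out-of-fold predictions feed an SVM."""
    base_learners = [
        # class_weight="balanced" weights samples inversely to class frequency
        # (generic scikit-learn reweighting, not the thesis's IWRF).
        ("rf", RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                      random_state=random_state)),
        ("et", ExtraTreesClassifier(n_estimators=200, class_weight="balanced",
                                    random_state=random_state)),
        ("gb", GradientBoostingClassifier(random_state=random_state)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        # The SVM meta-learner combines the base models' predictions.
        final_estimator=SVC(kernel="rbf", class_weight="balanced"),
        cv=5,  # meta-features come from 5-fold cross-validated predictions
    )


def run_demo(random_state: int = 0) -> float:
    """Fit the stack on synthetic imbalanced data and return the test F1-score."""
    # Hypothetical stand-in for a clinical dataset: roughly 80% negative, 20% positive.
    X, y = make_classification(n_samples=600, n_features=13, n_informative=6,
                               weights=[0.8, 0.2], random_state=random_state)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)
    model = build_stack(random_state).fit(X_train, y_train)
    return float(f1_score(y_test, model.predict(X_test)))


if __name__ == "__main__":
    print(f"F1-score on held-out split: {run_demo():.3f}")
```

Setting `cv=5` means the SVM is trained on out-of-fold predictions from the base models rather than on their in-sample predictions, which limits the optimistic bias that would otherwise leak into the meta-learner.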

Note

Thesis (PhD) - Faculty of Engineering, Universiti Malaya, 2024.
