This paper develops and compares machine learning models for the early detection of diabetes and presents a survival analysis of patients with diabetes and heart failure using the random survival forest (RSF) algorithm. Three models are selected for diabetes prediction: Gradient Boosting, Logistic Regression, and Random Forest. Two datasets are used. The first contains demographic, clinical, and laboratory variables for patients diagnosed with diabetes and for healthy individuals. The second is a comprehensive dataset of demographic, clinical, and laboratory data for patients diagnosed with both diabetes and heart failure. The study first addresses whether a person has diabetes. It then turns to the more significant question of whether a diabetic person is at risk of heart failure attributable to diabetes. Experimental results show that all three diabetes prediction models achieve high predictive performance, with Gradient Boosting attaining the highest accuracy and AUC-ROC values. The RSF model is trained and evaluated on its ability to predict patient survival outcomes; the results show that it provides high prediction accuracy and outperforms other survival analysis techniques. These findings have significant implications for improving the early diagnosis and treatment of diabetes, which can ultimately improve patient outcomes and reduce healthcare expenses. They can also help healthcare providers develop personalized treatment plans and interventions to improve survival outcomes for patients with both diabetes and heart failure.
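As an illustration of the two metrics used to rank the diabetes classifiers, the following minimal sketch computes accuracy and AUC-ROC from predicted probabilities using only the Python standard library. It is not the authors' code: the labels, the scores, the 0.5 decision threshold, and the names `model_a` and `model_b` are all made-up assumptions for demonstration, and AUC-ROC is computed through its pairwise-ranking interpretation rather than through any particular library.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label."""
    correct = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


def auc_roc(y_true, y_prob):
    """Probability that a random positive case is scored above a random
    negative case, counting ties as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = diabetic, 0 = healthy) and made-up predicted
# probabilities from two hypothetical models; not data from the paper.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = {
    "model_a": [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.6, 0.2, 0.9, 0.3],
}

# Map each model name to its (accuracy, AUC-ROC) pair.
results = {name: (accuracy(labels, s), auc_roc(labels, s))
           for name, s in scores.items()}

for name, (acc, auc) in results.items():
    print(f"{name}: accuracy={acc:.3f}, AUC-ROC={auc:.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC-ROC depends only on how the scores rank positive cases relative to negative ones, which is one reason the two metrics are commonly reported together.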
Published in: 2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE)