Article Info

A Comparative Analysis of Machine Learning Algorithms for Diabetes Prediction

Waseem Abdulmahdi Alansari, Masnizah Mohd
dx.doi.org/10.17576/apjitm-2024-1302-07

Abstract

Diabetes mellitus is a chronic metabolic disorder with significant global health implications. Accurate prediction and detection of diabetes using artificial intelligence are crucial for preventing complications and improving patient outcomes. This study compares the performance of three machine learning algorithms, namely Naive Bayes (NB), Support Vector Machines (SVM), and Random Forest (RF), in predicting diabetes on two datasets, the Pima Indians Diabetes Dataset (PIDD) and the Diabetes 2019 Dataset (DD2019), in order to identify the most accurate and effective algorithm for diabetes prediction. Nine attributes are used for the prediction of diabetes: Age, Blood pressure, Skin thickness, Glucose, Diabetes pedigree function, Pregnancy, BMI, Insulin level, and Outcome, with Outcome serving as the class label. The methodology involves data collection, pre-processing, and training the algorithms using k-fold cross-validation. The results indicate that pre-processing steps and dataset characteristics significantly affect algorithm performance. The RF model consistently achieves the highest accuracy. On PIDD, RF attained the maximum accuracy of 77%. On DD2019, RF and SVM achieved the highest accuracies, 96.65% and 93.93%, respectively. The study contributes insights into the importance of pre-processing and feature selection in improving algorithm performance. The findings have implications for developing accurate predictive models and improving diabetes detection.
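As an illustration of the k-fold cross-validation step mentioned in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation. The function names (`k_fold_indices`, `cross_val_accuracy`, `fit_threshold`, `predict_threshold`) are hypothetical, the data are synthetic rather than drawn from PIDD or DD2019, and a simple one-feature threshold rule stands in for NB, SVM, and RF so that the example needs no external libraries.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Return the accuracy on each held-out fold under k-fold cross-validation."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, which is held out for testing.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(model, X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical stand-in classifier (NOT NB, SVM, or RF): threshold the first
# feature at the midpoint between the two class means seen in training.
def fit_threshold(X_train, y_train):
    pos = [x[0] for x, label in zip(X_train, y_train) if label == 1]
    neg = [x[0] for x, label in zip(X_train, y_train) if label == 0]
    return (mean(pos) + mean(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x[0] > threshold else 0


# Synthetic one-feature data (assumed values, loosely "glucose-like").
rng = random.Random(42)
X = [[rng.gauss(100, 10)] for _ in range(50)] + [[rng.gauss(150, 10)] for _ in range(50)]
y = [0] * 50 + [1] * 50

scores = cross_val_accuracy(X, y, fit_threshold, predict_threshold, k=5)
mean_accuracy = mean(scores)
print("Per-fold accuracy:", [round(s, 2) for s in scores])
print("Mean accuracy:", round(mean_accuracy, 3))
```

Passing `fit` and `predict` as callables keeps the cross-validation loop independent of the classifier, so the same loop could score several algorithms on identical folds, which is what a fair comparison requires.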

Keywords

Machine Learning, Diabetes Prediction, Naïve Bayes, Support Vector Machines

Area

Knowledge Technology