Abstract:
This study compares the predictive performance of ten machine learning algorithms for diabetes detection on two datasets, one from the United States and one from Pakistan. The models are evaluated on Accuracy, Precision, Recall, F1 Score, AUC, and Specificity. Ensemble-based models such as Random Forest, XGBoost, and AdaBoost performed exceptionally well on the Pakistani dataset, achieving near-perfect AUC values and accuracies exceeding 99%. Performance was substantially lower on the U.S. dataset, where the highest accuracy, 75.25%, was achieved by the Neural Network. This disparity underscores the importance of regional data characteristics and suggests that predictive healthcare models must be tailored to specific population contexts. Overall, the findings indicate that region-specific training data and customized model selection are critical for improving prediction accuracy in clinical applications.
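For readers unfamiliar with the six evaluation metrics named above, the following is a minimal sketch of how they are conventionally computed for a binary classifier. It is not the authors' code. The function names (`confusion_counts`, `classification_metrics`, `pairwise_auc`) and the toy labels are hypothetical and chosen only for illustration, and the AUC is computed with the simple pairwise-ranking definition rather than any particular library routine.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall, F1 Score, and Specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "specificity": safe_div(tn, tn + fp),
    }


def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. Returns NaN when either class
    is absent, because AUC is undefined in that case.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration (not study data).
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.3, 0.7]
    print(classification_metrics(y_true, y_pred))
    print(pairwise_auc(y_true, scores))
```

Reporting Specificity alongside Recall matters in a screening setting like this one, because accuracy alone can look strong on an imbalanced dataset even when one class is poorly detected.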
Page(s):
185
DOI:
Not available
Published:
Journal: 4th International Conference of Sciences “Revamped Scientific Outlook of 21st Century, 2025”, November 12, 2025, Volume: 1, Issue: 1, Year: 2025
Keywords:
machine learning, BRFSS, predictive modeling, USA diabetes dataset, Pakistani diabetes dataset