A Big Data-Driven Information System for Disease Prediction in Public Health: A Comparative Study of Machine Learning Approaches

Abdelhay HADJ KOUIDER

Published: Feb 13, 2026

Keywords: Big Data; Machine Learning; Health Information Systems; Disease Prediction; Classification; Decision Support.

Abdelhay HADJ KOUIDER, Benameur ZIANI, Younes GUELLOUMA

Abstract

With the rapid growth of health data from electronic records, medical devices, and monitoring systems, there are substantial opportunities for data-driven decision making. However, handling data at this volume remains a challenge for standard analysis techniques. In this paper, we present a comparative study of machine learning models within a Big Data framework for disease prediction in public health. We evaluated six classification techniques: Naive Bayes, Support Vector Machine (SVM), Random Forest, Gradient Boosting, XGBoost, and Multilayer Perceptron (MLP). To obtain reliable results, the experiments were run on two well-known medical datasets (UCI Heart Disease and Pima Indians Diabetes) using 10-fold stratified cross-validation. The results show that Naive Bayes performs best on the heart disease dataset (83.78% accuracy), while Gradient Boosting performs best on the diabetes dataset (77.72% accuracy). These findings offer practical guidance on choosing a suitable model, while taking into account the choices and trade-offs made during the process.
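As an illustration of the evaluation protocol named in the abstract, the following is a minimal sketch of stratified k-fold cross-validation in plain Python. It is not the authors' implementation. The function names (`stratified_k_fold`, `cross_validated_accuracy`, `majority_baseline`) and the `fit_predict` callback are hypothetical, and the toy majority-class baseline only stands in for a real classifier such as Naive Bayes or Gradient Boosting.

```python
from collections import defaultdict
import random


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class proportions in every fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


def cross_validated_accuracy(fit_predict, X, y, k=10, seed=0):
    """Mean accuracy over k stratified folds.

    fit_predict(X_train, y_train, X_test) is a hypothetical callback that trains
    a model on the training split and returns predicted labels for X_test.
    Assumes every fold is non-empty (at least k samples overall).
    """
    scores = []
    for train_idx, test_idx in stratified_k_fold(y, k, seed):
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        X_test = [X[i] for i in test_idx]
        preds = fit_predict(X_train, y_train, X_test)
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def majority_baseline(X_train, y_train, X_test):
    """Toy stand-in for a classifier: always predict the most common training label."""
    most_common = max(set(y_train), key=y_train.count)
    return [most_common] * len(X_test)
```

Because each fold preserves the class ratio, accuracy figures for different models evaluated on the same folds can be compared directly, which matters for imbalanced medical datasets. In practice, a library routine would typically replace the hand-written splitter.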

Issue

Vol. 11 No. 2s (2026)

Section

Articles

Journal of Information Systems Engineering and Management
