Machine Learning-Based Framework for Drinking Water Quality Classification


Fatima Bouakkaz, Wided Ali

Abstract

Access to safe drinking water remains one of the most critical global public health issues. Contamination from chemical and microbial sources poses a serious threat to millions of people worldwide. Machine Learning (ML) has emerged as a promising approach to assess and predict water quality efficiently using complex physicochemical data.


This study aims to investigate and compare the performance of various Machine Learning algorithms for classifying water potability and to identify the most accurate and reliable models for practical water quality assessment.


Several ML algorithms—Logistic Regression, Support Vector Machines, Decision Trees, Random Forests, Gradient Boosting, and Artificial Neural Networks—were applied to a benchmark water quality dataset. The study focused on preprocessing steps, handling missing data, and addressing class imbalance to improve model reliability.
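
The abstract does not say which imputation or balancing techniques were used, so the following is only a minimal sketch of two common choices: median imputation for missing values and inverse-frequency class weights for imbalance. The helper names (`impute_median`, `balanced_class_weights`) and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter
from statistics import median


def impute_median(rows):
    """Replace None entries with the median of the observed values in that column.

    Assumes every column has at least one observed value.
    """
    columns = list(zip(*rows))
    medians = [median(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy data: two physicochemical features per sample,
# with None marking a missing measurement; label 1 = potable (assumed encoding).
features = [[7.0, 200.0], [None, 180.0], [6.5, None], [8.1, 220.0]]
labels = [0, 0, 0, 1]

filled = impute_median(features)
weights = balanced_class_weights(labels)
```

Here the minority class receives the larger weight, so a classifier that accepts per-class weights would penalize its misclassification more heavily. Resampling the training set is another possible way to address imbalance.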


Experimental results showed that ensemble-based algorithms such as Random Forest and Gradient Boosting achieved the highest accuracy and F1-scores. Traditional and shallow models performed moderately, while deep learning models showed limited improvement due to the small size of the tabular dataset.
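
To illustrate why F1-score is worth reporting alongside accuracy when classes are imbalanced, the sketch below computes both metrics from scratch. The labels and predictions are invented for illustration and are not results from the study; the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Invented example: 2 positive samples out of 8.
y_true = [1, 0, 0, 0, 0, 0, 0, 1]

# A model that always predicts the majority class.
always_negative = [0] * 8
acc_majority = accuracy(y_true, always_negative)  # 6 of 8 correct
f1_majority = f1_score(y_true, always_negative)   # no true positives

# A model that finds both positives with one false alarm.
informed = [1, 0, 0, 1, 0, 0, 0, 1]
acc_informed = accuracy(y_true, informed)
f1_informed = f1_score(y_true, informed)
```

The majority-class predictor reaches 0.75 accuracy but an F1-score of 0.0, while the second predictor scores 0.875 accuracy and 0.8 F1, which is why accuracy alone can be misleading on imbalanced data.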


Machine Learning provides a powerful tool for automated water potability classification. The findings emphasize the importance of preprocessing, data balancing, and model selection for reliable predictions. This work contributes to improving the interpretability and performance of ML systems for real-world water management applications.
