Automated Data Quality Validation Frameworks for ETL Pipelines

Subash Yadav

Abstract

In the modern data-driven environment, ensuring the quality of data handled by ETL (Extract, Transform, Load) pipelines is essential to informed decision-making across industries. This paper introduces a framework for automating data quality validation in ETL pipelines using anomaly detection and machine learning models. Specifically, it combines Isolation Forest (real-time anomaly detection) with Random Forest (supervised validation) to detect quality issues such as missing data, outliers, and schema violations. The framework was applied to World Bank data and achieved notable improvements in data accuracy, processing speed, and error detection compared with conventional manual validation techniques. The Isolation Forest model attained a precision of 0.92, a recall of 0.88, and an AUC-ROC of 0.94, while the Random Forest model attained a recall of 0.89 and an accuracy of 92%. The framework reduces manual intervention and processing time by more than 60 percent and improves data accuracy by 7 percent. These results highlight the framework's potential in real-time data settings, where high-quality data is essential to operational and strategic decision-making in fields such as finance, healthcare, and e-commerce.
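To make the two-stage idea concrete, the following is a minimal sketch of how Isolation Forest (unsupervised anomaly flagging) and Random Forest (supervised validation) could be combined on a batch of records. The synthetic data, column count, contamination rate, and quarantine rule are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of a two-stage validation pass, assuming scikit-learn.
# All data and thresholds below are made up for illustration.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic "ETL batch": mostly well-behaved numeric records plus a few
# injected outliers shifted far from the bulk of the distribution.
clean = rng.normal(loc=100.0, scale=5.0, size=(200, 3))
outliers = rng.normal(loc=100.0, scale=5.0, size=(5, 3)) + 60.0
batch = np.vstack([clean, outliers])

# Stage 1: Isolation Forest flags anomalous rows in the incoming batch
# without labels; predict() returns 1 for normal, -1 for anomaly.
iso = IsolationForest(contamination=0.03, random_state=0).fit(batch)
iso_flags = iso.predict(batch)

# Stage 2: Random Forest validates records against labeled examples.
# Here the labels come from the known construction of the synthetic batch;
# in practice they would come from historically validated records.
labels = np.array([0] * len(clean) + [1] * len(outliers))  # 1 = bad record
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(batch, labels)
rf_flags = rf.predict(batch)

# A record is quarantined for review if either model flags it.
quarantine = (iso_flags == -1) | (rf_flags == 1)
print(f"flagged {quarantine.sum()} of {len(batch)} records")
```

In a production pipeline the two models would typically be trained on historical data and applied to each new batch, with quarantined records routed to a review queue rather than loaded downstream.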
