Self-Optimizing Data Pipelines Using Machine Learning for Cloud Workloads
Velangani Divya Vardhan Kumar Bandi

Abstract

Cloud data pipelines enable enterprises to readily ingest, process, clean, and store large amounts of structured and unstructured data in cloud environments to drive analytics, business intelligence, and data-science workloads. However, designing and implementing such pipelines is non-trivial. Pipelines should be optimized for cost, latency, or a combination of the two, but these objectives are often at odds with each other. A data pipeline architecture that enables easy prototyping of data ingestion and transformation processes within any cloud platform is presented. Machine Learning (ML) is employed to inform scheduling and resource allocation decisions in order to reduce operational cost while ensuring acceptable latencies. The objectives of optimizing Ingest, Transformation, and Enhanced ETL cloud data pipelines in real time for cost and latency are accomplished. Four cloud providers (Google, Amazon, Microsoft, and IBM) are supported, with data volumes ranging from a few megabytes to several gigabytes. Latencies from minutes to hours can be supported at modest cost. ML models inform autoscaling groups, transformation resources, and scheduling. Cross-cloud portability through modular, code-based connection management further streamlines development while improving code quality.
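The cost-versus-latency trade-off the abstract describes can be illustrated with a minimal sketch: given an ML model's latency prediction for a batch, pick the cheapest resource allocation that still meets a latency target. The function names, scaling model, and prices below are hypothetical stand-ins, not the paper's actual models.

```python
# Hypothetical sketch of ML-informed autoscaling for a data pipeline.
# All models and prices here are illustrative assumptions.

def predict_latency_minutes(data_gb: float, workers: int) -> float:
    """Stand-in for a trained ML latency model: assumes a fixed
    startup overhead plus processing time that scales linearly with
    data volume and inversely with worker count."""
    return 2.0 + (data_gb * 1.5) / workers

def hourly_cost(workers: int, price_per_worker: float = 0.40) -> float:
    """Simple cost model: flat per-worker hourly price (assumed)."""
    return workers * price_per_worker

def choose_workers(data_gb: float, latency_target_min: float,
                   max_workers: int = 64) -> int:
    """Return the smallest (and therefore cheapest) worker count
    predicted to finish within the latency target; fall back to
    max_workers if the target cannot be met."""
    for workers in range(1, max_workers + 1):
        if predict_latency_minutes(data_gb, workers) <= latency_target_min:
            return workers
    return max_workers

# Example: a 10 GB batch with a 10-minute latency target.
print(choose_workers(10.0, 10.0))  # → 2
```

Relaxing the latency target (say, to hours) lets the same logic fall to a single worker, which is one way a pipeline can trade speed for cost without changing its code.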