Cloud Computing for Big Data Analytics: Scalable Solutions for Data-Intensive Applications

Gullapalli Sathar, Abhijit Aditya, Archana Mani, Aravinda Kumar Appachikumar, Aryan Francis Verghese

Abstract

The explosion of data in the digital era poses major challenges for handling, processing, and analyzing enormous and complex datasets. Cloud computing has emerged as a transformative solution, providing the scalable and elastic infrastructure needed to handle big data workloads. This study takes an empirical approach to evaluating the performance, cost efficiency, and scalability of three dominant cloud service models, Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Function-as-a-Service (FaaS), on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Standard big data analytics workloads comprising real-time stream processing and machine learning tasks were implemented with Apache Spark, Hadoop, and Kafka on harmonized cloud environments. Key performance metrics (execution time, CPU utilization, memory usage, cost per task, and throughput) were recorded and analyzed statistically using ANOVA and Tukey’s post hoc tests. The results show that FaaS configurations consistently outperform IaaS in execution time, memory efficiency, and cost, while IaaS delivers better CPU utilization for sustained workloads. AWS and GCP delivered more balanced performance than Azure. The study concludes that serverless architecture is the best fit for modular, burst-oriented analytics, whereas hybrid models may be more appropriate for complex pipelines. These results offer cloud architects practical guidance for building scalable and cost-effective big data solutions.
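
The abstract names ANOVA as the test used to compare service models on each metric. As a rough illustration of what that comparison computes, the sketch below derives the one-way ANOVA F statistic for execution time across the three models. The function name `one_way_anova_f` and all numbers are hypothetical and chosen for easy arithmetic; they are not the study's code or measurements. In practice a statistics library (for example SciPy's `f_oneway` and `tukey_hsd`) would also supply the p-value and the pairwise post hoc comparisons.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Synthetic execution times in seconds (illustrative only, NOT the study's data).
execution_time_s = {
    "IaaS": [10.0, 12.0, 11.0],
    "PaaS": [8.0, 9.0, 10.0],
    "FaaS": [5.0, 6.0, 7.0],
}

f_stat, df_b, df_w = one_way_anova_f(list(execution_time_s.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value indicates that differences between the service-model means are large relative to the run-to-run variation within each model. A post hoc test such as Tukey's then identifies which specific pairs differ.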
