Human-Guided Intelligent Operations for Multi-Cloud Kubernetes at Enterprise Scale

Satya Sai Ram Alla

doi:10.52783/jisem.v11i2s.14607

Published: Mar 25, 2026

Keywords:

Kubernetes, AIOps, Observability, Anomaly Detection, Incident Correlation, Topology Modeling, Remediation Automation, Hybrid Cloud

Abstract

Operating thousands of Kubernetes clusters across public cloud, private cloud, and edge environments strains traditional monitoring, which relies on static thresholds and manual triage. This article presents an operations approach that treats reliability as a data problem: it collects data from heterogeneous sources, fuses the resulting signals into a single view, models service dependencies as a continuously changing graph, and detects and diagnoses problems using anomaly detection and incident correlation. The platform closes the loop with a serverless playbook engine that executes remediation only when confidence is high and guardrails are satisfied, while keeping humans in the control plane through clear explanations and previewable actions. In practice, such systems can shorten mean time to detect from tens of minutes to minutes, reduce mean time to restore through targeted automation, and materially lower the operating cost of large microservice fleets without compromising safety or governance.
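
The gating rule described in the abstract (remediate automatically only when confidence is high and guardrails are satisfied, otherwise hand a previewable action to a human) can be illustrated with a minimal Python sketch. Everything named here is a hypothetical illustration rather than the article's implementation: the `Proposal` fields, the `decide` and `preview` functions, the example guardrails, and the 0.9 confidence threshold are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Decision(Enum):
    AUTO_EXECUTE = "auto_execute"      # safe to run without waiting for a human
    NEEDS_APPROVAL = "needs_approval"  # show the preview and wait for an operator


@dataclass
class Proposal:
    """A remediation proposed for a detected incident (hypothetical shape)."""
    action: str        # e.g. "restart-deployment"
    target: str        # e.g. "cluster-a/payments"
    confidence: float  # confidence in the diagnosis, from 0.0 to 1.0
    explanation: str   # human-readable reasoning shown to operators


# A guardrail inspects a proposal and returns True when it is safe to proceed.
Guardrail = Callable[[Proposal], bool]


def decide(
    proposal: Proposal,
    guardrails: Dict[str, Guardrail],
    threshold: float = 0.9,  # assumed cutoff, not a value from the article
) -> Tuple[Decision, List[str]]:
    """Return the decision plus the names of any guardrails that failed."""
    failed = [name for name, check in guardrails.items() if not check(proposal)]
    if proposal.confidence >= threshold and not failed:
        return Decision.AUTO_EXECUTE, failed
    return Decision.NEEDS_APPROVAL, failed


def preview(proposal: Proposal) -> str:
    """Render the action and its explanation for review before execution."""
    return (
        f"{proposal.action} on {proposal.target} "
        f"(confidence {proposal.confidence:.2f}): {proposal.explanation}"
    )


# Example guardrails (assumed policies, for illustration only).
guardrails: Dict[str, Guardrail] = {
    "action_allowlisted": lambda p: p.action in {"restart-deployment", "scale-out"},
    "not_protected_namespace": lambda p: not p.target.endswith("/kube-system"),
}

if __name__ == "__main__":
    proposal = Proposal(
        action="restart-deployment",
        target="cluster-a/payments",
        confidence=0.95,
        explanation="pods are crash-looping after a config change",
    )
    decision, failed = decide(proposal, guardrails)
    print(decision.value, failed)
    print(preview(proposal))
```

In this sketch a proposal that misses either condition is never dropped; it is routed to an operator together with the list of failed guardrails, so the human sees why automation declined to act.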

Issue

Vol. 11 No. 2s (2026)

Section

Articles

Journal of Information Systems Engineering and Management
