AI-Driven Dynamic Load Balancing: A Predictive Framework for Congestion Avoidance in High-Speed Data Center Networks
Abstract
Traditional congestion management mechanisms, including Explicit Congestion Notification (ECN) and Priority Flow Control (PFC), operate reactively: they address congestion only after queue thresholds are exceeded. This limitation becomes increasingly critical as modern data center workloads generate microbursts lasting under 100 milliseconds, which create transient bottlenecks and elevate tail latency. AI-Driven Dynamic Load Balancing (AI-DLB) is a predictive framework that combines real-time telemetry with machine learning inference to forecast congestion events and proactively redistribute traffic flows. The system employs a closed-loop control architecture that integrates supervised learning for short-term congestion prediction with reinforcement learning for continuous policy optimization. By analyzing queue depth, ECN marks, link utilization, and round-trip time (RTT) trends, AI-DLB enables sub-second load redistribution before congestion manifests. Simulation results on spine-leaf topologies demonstrate substantial reductions in tail latency and faster convergence compared to conventional mechanisms. The framework operates as a complementary enhancement to existing protocols and establishes a foundation for self-optimizing, intent-driven data center fabrics that bridge predictive analytics with autonomous control.
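The predict-then-redistribute loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the model, so a logistic scorer with hand-picked weights stands in for the trained supervised predictor, and the reinforcement learning stage is omitted. The names `LinkTelemetry`, `congestion_probability`, `rebalance`, the weight values, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class LinkTelemetry:
    """One telemetry sample per uplink (hypothetical schema)."""
    queue_depth: float    # fraction of buffer occupied, 0..1
    ecn_mark_rate: float  # fraction of packets ECN-marked, 0..1
    utilization: float    # link utilization, 0..1
    rtt_trend: float      # relative RTT growth over the window (0.2 = +20%)


# Assumed, hand-picked coefficients; a trained model would supply these.
WEIGHTS = (4.0, 3.0, 3.0, 2.0)
BIAS = -5.0


def congestion_probability(t: LinkTelemetry) -> float:
    """Logistic score standing in for the short-term congestion predictor."""
    z = (BIAS
         + WEIGHTS[0] * t.queue_depth
         + WEIGHTS[1] * t.ecn_mark_rate
         + WEIGHTS[2] * t.utilization
         + WEIGHTS[3] * t.rtt_trend)
    return 1.0 / (1.0 + math.exp(-z))


def rebalance(path_weights: dict, telemetry: dict, threshold: float = 0.5) -> dict:
    """Shift traffic share away from paths predicted to congest.

    Paths whose predicted probability p meets the threshold have their
    weight scaled by (1 - p); all weights are then renormalized to sum to 1.
    """
    scaled = {}
    for path, weight in path_weights.items():
        p = congestion_probability(telemetry[path])
        scaled[path] = weight * (1.0 - p) if p >= threshold else weight
    total = sum(scaled.values())
    if total == 0:
        # Every path saturated: keep the existing split rather than divide by zero.
        return dict(path_weights)
    return {path: weight / total for path, weight in scaled.items()}


if __name__ == "__main__":
    samples = {
        "spine1": LinkTelemetry(0.9, 0.5, 0.95, 0.3),  # trending toward congestion
        "spine2": LinkTelemetry(0.1, 0.0, 0.30, 0.0),  # lightly loaded
    }
    print(rebalance({"spine1": 0.5, "spine2": 0.5}, samples))
```

In this sketch an equal split across two uplinks moves most of the share to the lightly loaded path before its neighbor's queue overflows. A real deployment would replace the fixed coefficients with a model trained on fabric telemetry and would bound how quickly weights may change, so that flows are not reordered needlessly.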