Abstract:
Kubernetes has emerged as the standard platform for managing microservices at scale, offering
robust orchestration capabilities. However, ensuring optimal performance under dynamic and
often predictable workload fluctuations remains a significant challenge. Traditional
autoscaling mechanisms, such as the Horizontal Pod Autoscaler (HPA), rely on reactive policies
that adjust resources based on current metrics such as CPU utilization. While effective in many
cases, reactive scaling often lags behind sudden traffic surges, leading to temporary service
degradation or resource inefficiency. This thesis addresses these limitations by proposing a
predictive autoscaling framework for Kubernetes-based microservices that integrates machine
learning-based forecasting with intelligent load balancing.
The proposed solution leverages a Long Short-Term Memory (LSTM) neural network
trained on twelve months of real-world microservice load data. The model forecasts short-term
workload trends, enabling the system to proactively adjust pod counts before demand peaks
occur. In parallel, a custom load balancing mechanism was developed that distributes traffic
based on runtime pod metrics such as CPU usage and response time, ensuring that newly
scaled-out resources are utilized effectively.
An experimental Kubernetes cluster was set up to evaluate the predictive scaling approach
against the standard HPA under realistic load patterns, including sharp end-of-month traffic
surges observed in the banking sector of Azerbaijan. Results show that the predictive autoscaler
achieved a mean absolute percentage error (MAPE) under 10% during normal periods and
around 12–15% during peak salary-day surges. Compared to HPA, the predictive system
reduced 95th percentile response times by up to 37% during load spikes, maintained full
throughput without request drops, and triggered fewer, better-timed scaling actions. CPU
utilization stayed within safer bounds, avoiding the saturation seen under reactive scaling.
This work demonstrates that predictive autoscaling can significantly enhance the resilience
and efficiency of Kubernetes-managed microservices. By combining accurate load forecasting
with intelligent traffic distribution, the system improves user experience and infrastructure
utilization. While challenges such as prediction errors and model retraining remain, the results
highlight the practical benefits of integrating machine learning into cloud-native scaling
strategies. Future work can extend this approach by exploring hybrid models that combine
predictive insights with reinforcement learning, or by refining load balancing strategies to
further optimize service quality during unpredictable demand fluctuations.