Abstract:
Kubernetes has emerged as the standard platform for managing microservices at scale, offering
robust orchestration capabilities. However, ensuring optimal performance under dynamic and
often predictable workload fluctuations remains a significant challenge. Traditional
autoscaling mechanisms, such as the Horizontal Pod Autoscaler (HPA), rely on reactive policies
that adjust resources based on current metrics such as CPU utilization. While effective in many
cases, reactive scaling often lags behind sudden traffic surges, leading to temporary service
degradation or resource inefficiency. This thesis addresses these limitations by proposing a
predictive autoscaling framework for Kubernetes-based microservices that integrates machine
learning-based forecasting with intelligent load balancing.
The proposed solution leverages a Long Short-Term Memory (LSTM) neural network
trained on twelve months of real-world microservice load data. The model forecasts short-term
workload trends, enabling the system to proactively adjust pod counts before demand peaks
occur. In parallel, a custom load balancing mechanism was developed that distributes traffic
based on runtime pod metrics such as CPU usage and response time, ensuring that newly
scaled-out resources are utilized effectively.
An experimental Kubernetes cluster was set up to evaluate the predictive scaling approach
against the standard HPA under realistic load patterns, including sharp end-of-month traffic
surges observed in the banking sector of Azerbaijan. Results show that the predictive autoscaler
achieved a mean absolute percentage error (MAPE) under 10% during normal periods and
around 12–15% during peak salary-day surges. Compared to HPA, the predictive system
reduced 95th percentile response times by up to 37% during load spikes, maintained full
throughput without request drops, and triggered fewer, better-timed scaling actions. CPU
utilization stayed within safer bounds, avoiding the saturation seen under reactive scaling.
This work demonstrates that predictive autoscaling can significantly enhance the resilience
and efficiency of Kubernetes-managed microservices. By combining accurate load forecasting
with intelligent traffic distribution, the system improves user experience and infrastructure
utilization. While challenges such as prediction errors and model retraining remain, the results
highlight the practical benefits of integrating machine learning into cloud-native scaling
strategies. Future work can extend this approach by exploring hybrid models that combine
predictive insights with reinforcement learning, or by refining load balancing strategies to
further optimize service quality during unpredictable demand fluctuations.