The increasing energy consumption of cloud data centers poses significant challenges for reducing operational costs and minimizing carbon emissions. To address these issues, we introduce EcoPyCSim, a novel multi-agent deep reinforcement learning (MADRL)-based cloud scheduling simulator designed for energy-aware job scheduling and resource allocation. Unlike existing simulation platforms that predominantly support single-agent learning, EcoPyCSim integrates multi-agent reinforcement learning through the PettingZoo framework, offering a robust and unified environment for optimizing resource scheduling in cloud infrastructures. The platform incorporates a Partially Observable Stochastic Game (POSG) model, enabling it to simulate real-world scenarios in which agents must make decisions with incomplete information, thereby reflecting complex and realistic scheduling environments. Our contributions include a comprehensive evaluation of the MADDPG algorithm on the 2011 Google trace workload dataset, which demonstrates superior energy efficiency in cloud resource management. Additionally, EcoPyCSim is publicly available, providing researchers with an essential tool for developing and testing next-generation cloud scheduling algorithms focused on energy conservation. This work establishes a critical foundation for future research in energy-efficient cloud computing by offering a versatile, scalable, and standardized platform for multi-agent-based scheduling environments.
Cloud computing provides businesses with an efficient, scalable, and cost-effective method for delivering business or consumer IT services over the Internet [1]. The growing demand for user applications requires cloud service providers (CSPs) to implement efficient scheduling strategies that allocate specific computing resources to meet the diverse needs of application requests [2].
As stated in [3], data center electricity requirements are projected to increase from 286 TWh in 2016 to over 321 TWh in 2030. This rising energy consumption challenges CSPs to improve the operational efficiency of their data centers in order to reduce operating costs. Implementing energy-efficient scheduling strategies that match tasks with optimal resources is an effective way to reduce the energy consumption of data centers [4].
However, existing DRL-based scheduling research faces several challenges. Firstly, most current scheduling algorithms are developed within bespoke simulation environments for cloud computing or clusters, leading to significant discrepancies in how entities are defined across studies. For instance, the definition of a user request often varies with the experimental dataset: some studies use cloudlets, while others rely on workloads derived from real trace datasets such as those from Google and Alibaba, or employ custom simulation data. There are also variations in the definition of computing resources: some studies treat virtual machines (VMs) as the fundamental unit, with physical machines as the underlying layer, while others focus on container scheduling. Furthermore, studies differ in how they define scheduling metrics, including discrepancies in the methods used to calculate energy consumption, costs, and resource utilization.
These modeling discrepancies limit further research. Environment-specific designs are not only difficult to extend to other application scenarios but also limit the reproducibility and comparative analysis of algorithms. In addition, this approach increases the complexity for developers, who must simultaneously consider the optimization of DRL algorithms and the impact of environmental modeling on scheduling performance. Therefore, a unified DRL-based simulation platform not only helps standardize the evaluation process of scheduling algorithms, but also provides a more reliable foundation for optimizing resource scheduling in cloud computing environments.
To the best of our knowledge, no existing simulation environment supports resource scheduling based on MARL algorithms. To address this gap, we developed a cloud scheduling simulation framework named EcoPyCSim, which supports MARL algorithms and is built on the PettingZoo [5] standard interface. The simulator employs a discrete-event simulation process, with a focus on energy-aware job scheduling and resource allocation. This simulator is highly significant, as it provides a unified platform for the development, testing, and evaluation of MARL scheduling algorithms in cloud resource management, and it lays a solid foundation for research on MARL for energy-aware cloud resource scheduling. The source code for EcoPyCSim is publicly accessible on GitHub.
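To illustrate the discrete-event approach in general terms, the following is a minimal sketch (not the actual EcoPyCSim implementation) of an event loop driven by a priority queue, where job arrivals and task completions are processed in time order; the class and event names are hypothetical.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    time: float                       # simulation timestamp of the event
    kind: str = field(compare=False)  # e.g. "job_arrival" or "task_finish"
    payload: dict = field(compare=False, default_factory=dict)

class DiscreteEventLoop:
    """Minimal discrete-event engine: events are popped in time order."""
    def __init__(self):
        self.clock = 0.0
        self.queue = []

    def schedule(self, event: Event):
        heapq.heappush(self.queue, event)

    def run(self, handler, until: float):
        # Process events until the queue is empty or the time horizon is reached.
        while self.queue and self.queue[0].time <= until:
            event = heapq.heappop(self.queue)
            self.clock = event.time
            # The handler may return follow-up events (e.g. a task-completion
            # event scheduled when a task is dispatched to a server).
            for follow_up in handler(event) or []:
                self.schedule(follow_up)
```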
In summary, the main contributions of this research work are as follows:
• EcoPyCSim, a cloud job scheduling simulation system implemented in Python, simulates real-world cloud computing infrastructure and resource management processes. The system enables the comparison and evaluation of energy consumption and response time for scheduling algorithms that handle DAG tasks under various resource configurations. By incorporating fluctuating electricity prices into the power cost of data centers, EcoPyCSim offers a more accurate model for assessing energy consumption.
• The two-level task scheduling problem is modeled as a Partially Observable Stochastic Game (POSG), and the discrete-event-based scheduling simulation environment, CloudSchedulingEnv, is implemented based on the PettingZoo framework. This environment seamlessly integrates with the library of MARL algorithms to accelerate the development, testing, and validation of these algorithms.
• The capabilities of EcoPyCSim are demonstrated by using the MADDPG algorithm on the Google trace workload dataset. Simulation results validate the effectiveness of the framework and show its potential as a versatile tool for cloud scheduling.
Gym, created by OpenAI [6], was one of the earliest and most widely used libraries for standardized RL environments, providing a common interface for experimentation and model testing. Gymnasium [7], maintained by the Farama Foundation, is an improved version of Gym that aims to preserve Gym’s stable interface and user experience while adding enhancements and updates. Both have shown remarkable success in addressing reinforcement learning problems across a diverse set of applications. We surveyed and compared studies on simulators for RL/DRL algorithms in cloud computing that are implemented on the Gym and Gymnasium frameworks. The characteristics of these works are listed in Table 1.
In contrast to the aforementioned works, our research aims to develop an energy-aware DAG job scheduling simulator. This simulator facilitates the optimization and evaluation of various multi-agent reinforcement learning algorithms by providing an interactive cloud scheduling environment based on the PettingZoo framework.
Our proposed EcoPyCSim, which is built entirely in Python, implements core scheduling entities for cloud computing without requiring integration with external simulation frameworks. In addition, the simulator leverages PyTorch to establish a deep reinforcement learning environment that enables seamless integration of RL algorithms to improve scheduling decisions within a simulated cloud environment. Our framework is designed to support the May 2011 Google cluster data [8]. It is also extendable to support other datasets, and it is adaptable for synthetic dataset generation. EcoPyCSim offers a versatile environment for researchers to evaluate and validate cloud scheduling algorithms, with an emphasis on energy efficiency through multi-agent reinforcement learning.
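To illustrate how a PettingZoo-compatible scheduling environment is typically structured, below is a minimal sketch of a parallel environment with two cooperating agents (a server-farm agent and a server agent). The class name, observation/action space sizes, and transition logic are illustrative placeholders, not the actual CloudSchedulingEnv implementation.

```python
import numpy as np
from gymnasium import spaces
from pettingzoo import ParallelEnv

class ToySchedulingEnv(ParallelEnv):
    """Illustrative two-agent scheduling environment (placeholder dynamics)."""
    metadata = {"name": "toy_scheduling_v0"}

    def __init__(self, num_farms=2, num_servers=10):
        self.possible_agents = ["farm_agent", "server_agent"]
        self.num_farms = num_farms
        self.num_servers = num_servers

    def observation_space(self, agent):
        # Placeholder: a flat utilization vector per agent (partial observation).
        size = self.num_farms if agent == "farm_agent" else self.num_servers
        return spaces.Box(0.0, 1.0, shape=(size,), dtype=np.float32)

    def action_space(self, agent):
        # The farm agent picks a server farm; the server agent picks a server.
        n = self.num_farms if agent == "farm_agent" else self.num_servers
        return spaces.Discrete(n)

    def reset(self, seed=None, options=None):
        self.agents = list(self.possible_agents)
        obs = {a: self.observation_space(a).sample() for a in self.agents}
        return obs, {a: {} for a in self.agents}

    def step(self, actions):
        # Placeholder transition: random observations, zero reward.
        obs = {a: self.observation_space(a).sample() for a in self.agents}
        rewards = {a: 0.0 for a in self.agents}
        terminations = {a: False for a in self.agents}
        truncations = {a: False for a in self.agents}
        infos = {a: {} for a in self.agents}
        return obs, rewards, terminations, truncations, infos
```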
The framework of the EcoPyCSim simulator, the architecture of EcoPyCSim, and the framework of centralized training and decentralized execution are described in detail in the preprint: http://ssrn.com/abstract=5020937
5.1. Setting up the cloud scheduling environment
In the experiments, we created four job groups of varying sizes (50, 100, 300, and 500 jobs), corresponding to task counts of 600, 800, 3600, and 9000. Three combinations of server farms and servers were configured in the scheduling environment: (2, 5), (5, 30), and (10, 100). This setup was designed to test performance under different task loads and resource configurations. The MADDPG algorithm was used to train the MARL-based scheduler in the simulation environment. Our scheduling objective is to optimize job allocation so as to minimize energy consumption while ensuring that each job completes within its specified deadline. The training was performed using the hyperparameters listed in Table 2.
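For concreteness, the following is a minimal sketch of how such a training loop might interact with a parallel environment under a centralized-training, decentralized-execution scheme. It assumes the `ToySchedulingEnv` sketch given earlier; the random action sampling stands in for the actors, and the joint replay buffer is a hypothetical placeholder rather than the actual EcoPyCSim/MADDPG implementation.

```python
from collections import deque

buffer = deque(maxlen=100_000)            # simple joint replay buffer
env = ToySchedulingEnv()                  # placeholder env from the earlier sketch

for episode in range(1000):
    observations, _ = env.reset()
    for step in range(200):               # episode length cap (illustrative)
        # Decentralized execution: each agent acts on its own observation.
        actions = {agent: env.action_space(agent).sample()   # stand-in for actor(obs)
                   for agent in env.agents}
        next_obs, rewards, terminations, truncations, _ = env.step(actions)
        # Centralized training: the joint transition is stored so a shared
        # critic can later be trained on it (MADDPG-style CTDE).
        buffer.append((observations, actions, rewards, next_obs))
        observations = next_obs
        if any(terminations.values()) or any(truncations.values()):
            break
    # ... sample mini-batches from `buffer` and update actors/critics here ...
```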
Job data was generated from the Google cluster trace dataset. Based on the specified number of jobs, the program extracts an equivalent number of DAGs. Each DAG was transformed into a job, with each node of the DAG converted into a task and the edges representing the interdependencies between tasks. The arrival times of jobs followed a Poisson process; the mean inter-arrival time was the inverse of the arrival rate, with λ = 0.5. The execution time of each task was generated automatically from a normal distribution N(µ, σ²). The CPU and RAM resource requests of each task were taken from the corresponding properties of the job.
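As a sketch of this generation procedure, the snippet below draws Poisson-process arrival times and normally distributed task durations. The distribution parameters, DAG size, and field names are illustrative placeholders, not the exact values or schema used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def generate_jobs(num_jobs, arrival_rate=0.5, mu=10.0, sigma=2.0):
    """Illustrative job generator mirroring the described process."""
    # Poisson process: exponential inter-arrival times with mean 1 / arrival_rate.
    inter_arrivals = rng.exponential(scale=1.0 / arrival_rate, size=num_jobs)
    arrival_times = np.cumsum(inter_arrivals)

    jobs = []
    for job_id, arrival in enumerate(arrival_times):
        num_tasks = int(rng.integers(5, 15))                 # placeholder DAG size
        # Task execution times drawn from N(mu, sigma^2), clipped to stay positive.
        durations = np.clip(rng.normal(mu, sigma, size=num_tasks), 0.1, None)
        jobs.append({
            "job_id": job_id,
            "arrival_time": float(arrival),
            "task_durations": durations.tolist(),
            # Task dependencies (DAG edges) would be taken from the trace here.
        })
    return jobs

jobs = generate_jobs(num_jobs=50)
```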
The servers were partitioned into server farms based on the input values for the number of server farms and the total number of servers. Each server can accommodate up to 10 VMs. The request state of each VM is represented by a binary value: 0 indicates the initial state, while 1 indicates an active state. The power consumption of a server follows a model based on CPU utilization. When CPU utilization falls within the optimal range, power consumption is calculated as the product of the optimal utilization rate and the coefficient α. If CPU utilization exceeds this optimal range, the additional power consumption is obtained by squaring the excess and multiplying it by the coefficient β, thereby accounting for the increased energy demand. The key parameters related to jobs and computational resources are summarized in Table 3, detailing the specifics of jobs, VM configurations, and server capacities used in the experiments.
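Read literally, this description corresponds to the piecewise power model sketched below; the coefficient values and the optimal utilization rate are placeholders and may differ from those used in EcoPyCSim.

```python
def server_power(cpu_utilization, optimal_util=0.7, alpha=200.0, beta=500.0):
    """Piecewise power model as described above (illustrative parameters).

    Below or at the optimal utilization, power is alpha * optimal_util;
    above it, a quadratic penalty on the excess utilization is added.
    """
    base_power = alpha * optimal_util
    if cpu_utilization <= optimal_util:
        return base_power
    excess = cpu_utilization - optimal_util
    return base_power + beta * excess ** 2

# Example: a server at 90% utilization with the placeholder coefficients.
print(server_power(0.9))   # 200*0.7 + 500*(0.2)**2 = 140 + 20 = 160.0
```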
5.2. Training Experiments and Results
The experiments were conducted in groups, varying the number of jobs, server farms, and servers. Based on the parameter configurations outlined in the previous subsection, we trained the energy-aware scheduling policy using the MADDPG algorithm. Figure 6 shows the reward curves for the server agent and the server-farm agent during training, where 300 jobs are scheduled across 100 servers and 10 server farms. The agents initially explore various strategies, leading to fluctuations in performance. As training progresses, the agents converge toward more efficient scheduling policies, as evidenced by the increasing rewards. The results demonstrate that the MADDPG algorithm effectively learns to optimize cloud scheduling decisions in a multi-agent environment.
Furthermore, we evaluated the models after 1000 training episodes of the MADDPG algorithm and compared their scheduling metrics with those of a Random strategy in terms of wall time (which represents job completion time), total energy consumption, and average electricity price.
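As an illustration of how such metrics could be aggregated from a completed episode, the following sketch assumes a list of per-task records with hypothetical fields for finish time, energy consumed, and the electricity price in effect while the task ran; it is not the exact metric code used in EcoPyCSim, and the energy-weighted average price is one possible reading of "average electricity price".

```python
def episode_metrics(task_records):
    """Aggregate the three evaluation metrics from per-task records.

    Each record is assumed to have the keys 'finish_time', 'energy_kwh',
    and 'price_per_kwh' (hypothetical field names).
    """
    wall_time = max(r["finish_time"] for r in task_records)       # last task done
    total_energy = sum(r["energy_kwh"] for r in task_records)     # kWh consumed
    avg_price = (sum(r["energy_kwh"] * r["price_per_kwh"] for r in task_records)
                 / total_energy)                                  # energy-weighted price
    return wall_time, total_energy, avg_price
```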
Figure 7 shows the task completion times for three distinct configurations of servers and server farms across four different workloads. As shown in Figure 7, task completion time rises for both algorithms as the number of jobs increases. For the same job count, MADDPG completes tasks in slightly less time than the Random algorithm, requiring approximately 96% of Random's completion time, a trend that is consistent across all three server configurations.
Figure 8 shows the total energy consumption for task execution under the same comparison conditions. Energy consumption varies significantly among the three groups of computing resources with different configurations. As the number of servers and server farms increases, the energy required to complete the same number of jobs also rises. Within the same group of computing resources, as the number of jobs gradually increases, energy consumption under the Random scheduling strategy rises sharply, whereas the energy consumption curve for MADDPG remains relatively stable. Even with 500 jobs, 100 servers, and 10 server farms, the energy consumption remains lower than at 300 jobs, indicating that the MADDPG agent becomes more energy-efficient as the job count increases.
Figure 9 depicts the average electricity prices for the different job counts under the same comparison conditions. The figure shows that, for Random scheduling, the average electricity cost required to complete the same number of jobs increases as the number of servers and server farms grows. In contrast, the average electricity cost under MADDPG remains steady regardless of the number of jobs and servers. Overall, MADDPG scheduling results in a more stable and significantly lower average electricity cost than Random scheduling.
In summary, the experiments validate the effectiveness of CloudSchedulingEnv for seamless integration with MADDPG in EcoPyCSim. Results also demonstrate that MADDPG significantly outperforms the Random strategy in terms of both energy consumption and average electricity price.
The limitations of this simulator are as follows:
Firstly, it currently does not support additional MARL algorithms such as QMIX [9], MAPPO [10], and VDN [11]. Supporting these would give developers a broader toolkit, enable more comprehensive research on different algorithmic approaches to cloud task scheduling and resource management, and promote comparative studies and innovation within the framework.
Secondly, it only supports the 2011 Google cluster trace dataset. Adding support for other datasets, such as Alibaba traces or custom datasets, would allow the simulator to better reflect diverse real-world scenarios.
Thirdly, it only supports a VM-based resource configuration model. It does not support container-based resource architectures, which are needed to accurately simulate many real-world cloud computing environments.
Fourthly, as job volumes and resource numbers increase, simulation performance could become a bottleneck. Future improvements may focus on optimizing the performance of EcoPyCSim by utilizing parallel training methods to support larger-scale simulations effectively.
This study proposed EcoPyCSim, a novel job scheduling simulator designed for MARL-based energy-aware research. A standard learning environment, CloudSchedulingEnv, was implemented based on the PettingZoo framework, and an electricity cost model was formulated based on fluctuating electricity prices in cloud data centers. The simulator provides a unified platform for developing, optimizing, and evaluating MARL-based scheduling algorithms in cloud computing. To support further research and enhancement of this project, we have made the source code publicly available on GitHub. As outlined in the limitations above, EcoPyCSim can be further expanded and enhanced in several aspects, including support for additional MARL algorithms, further datasets, and container-based resource models.
[1] X. Zhang, N. Wuwong, H. Li, X. Zhang, Information security risk management framework for the cloud computing environments, in: 2010 10th IEEE International Conference on Computer and Information Technology, IEEE, 2010, pp. 1328–1334.
[2] A. Motlagh, A. Movaghar, A. M. Rahmani, Task scheduling mechanisms in cloud computing: A systematic review, International Journal of Communication Systems 33 (2020).
[3] M. Koot, F. Wijnhoven, Usage impact on data center electricity needs: A system dynamic forecasting model, Applied Energy 291 (2021) 116798.
[4] H. Hou, A. Ismail, EETS: An energy-efficient task scheduler in cloud computing based on improved DQN algorithm, Journal of King Saud University - Computer and Information Sciences (2024) 102177.
[5] J. Terry, B. Black, N. Grammel, M. Jayakumar, A. Hari, R. Sullivan, L. S. Santos, C. Dieffendahl, C. Horsch, R. Perez-Vicente, et al., PettingZoo: Gym for multi-agent reinforcement learning, Advances in Neural Information Processing Systems 34 (2021) 15032–15043.
[6] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, W. Zaremba, OpenAI Gym, arXiv preprint arXiv:1606.01540 (2016).
[7] M. Towers, J. K. Terry, A. Kwiatkowski, J. U. Balis, G. de Cola, T. Deleu, M. Goulão, A. Kallinteris, A. KG, M. Krimmel, R. Perez-Vicente, A. Pierré, S. Schulhoff, J. J. Tai, A. T. J. Shen, O. G. Younis, Gymnasium, 2023. doi:10.5281/zenodo.8127026.
[8] J. Wilkes, More Google cluster data, Google research blog, 2011. Posted at http://googleresearch.blogspot.com/2011/11/more-google-cluster-data.html.
[9] T. Rashid, M. Samvelyan, C. S. De Witt, G. Farquhar, J. Foerster, S. Whiteson, Monotonic value function factorisation for deep multi-agent reinforcement learning, Journal of Machine Learning Research 21 (2020) 1–51.
[10] C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, Y. Wu, The surprising effectiveness of PPO in cooperative multi-agent games, Advances in Neural Information Processing Systems 35 (2022) 24611–24624.
[11] P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, et al., Value-decomposition networks for cooperative multi-agent learning, arXiv preprint arXiv:1706.05296 (2017).