- 1 Introduction
- 2 Project Overview
- 3 System Architecture
- 4 Functional Definition
- 5 Component Descriptions
- 6 Appendix
- 7 Links
Now that Grid Computing has become widely accepted, users increasingly ask about the cost of the overall service. Because the electricity bill contributes significantly to the overall costs, Grid Engine users ask for a means to power off execution hosts during times when there is no workload to utilize them.
Note: This document is licensed under the Grid Engine project SISSL license.
The overall aim is to minimize power consumption without affecting the Grid Engine job scheduling service.
Minimizing power consumption will help to lower the electricity bill and thus make Grid Computing more cost-effective.
Overall Block Diagram
Imagine one of those nice block diagram pictures here.
The components are:
- Grid Engine job scheduler (= core system)
- Grid Engine service domain manager (= Hedeby)
Switching execution nodes on/off can work only if there are interfaces that let individual nodes be switched on/off without human intervention.
Switching execution nodes on/off shall be under the control of the service domain manager, while the job scheduler provides information about current and future execution node utilization.
Power saving must have no, or almost no, impact on the performance of the job scheduler.
Diagnostics must allow the examination of error conditions that may occur when switching nodes on/off.
For end-users of the Grid Engine job scheduler, power saving shall be as invisible as possible: all Grid Engine interfaces shall remain usable as they are today and must keep the same timing characteristics. The only caveat is the switching-on delay in the execution of workload, but this shall affect workload processing only when the workload is handed to Grid Engine "instantly". For workload that was passed to Grid Engine "early enough", any execution delay shall be ruled out.
Administrators and operators
Grid Engine administrators and operators must have a means
- to monitor which nodes are currently on/off
- to monitor when it is planned to switch a node on/off
- to override power saving for nodes or groups of nodes so as to rule out switching-on delays at peak times
Grid Engine core system
The task of the Grid Engine job scheduler is to provide information about current and future execution node utilization. In addition, it must also consider switched-off nodes in reservation scheduling.
- Switched-off nodes must be kept in the list of execution nodes, but execution hosts need a "switched-off" state so that the job scheduler (a) continues to consider switched-off nodes in its reservation scheduling and (b) allows advance reservations to be requested on switched-off nodes as well.
- A means is needed to extract schedule information from the job scheduler so that a component deciding on power on/off can act in sync with the job scheduler. This is needed for comprehensible and reasonable behaviour of the overall arrangement. Schedule information must comprise a snapshot of node allocations due to (a) advance reservations, (b) scheduled workload "-R y" and (c) unscheduled workload "-R n".
- Workload without a runtime limit must be properly reflected (= infinity) in the schedule snapshots.
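As an illustration only, a schedule snapshot of this kind could be modeled as shown here. This is a minimal Python sketch under stated assumptions, not the Grid Engine interface: the names `AllocationKind`, `Allocation` and `next_use` are hypothetical, and times are plain seconds relative to the snapshot. Workload without a runtime limit is represented by an end time of infinity.

```python
import math
from dataclasses import dataclass
from enum import Enum


class AllocationKind(Enum):
    """The three allocation sources named in the text (labels are illustrative)."""
    ADVANCE_RESERVATION = "advance reservation"
    WORKLOAD_R_Y = "-R y"   # scheduled workload
    WORKLOAD_R_N = "-R n"   # unscheduled workload


@dataclass(frozen=True)
class Allocation:
    node: str
    kind: AllocationKind
    start: float            # seconds relative to snapshot time
    end: float = math.inf   # no runtime limit is reflected as infinity


def next_use(snapshot, node, now=0.0):
    """Earliest time >= now at which `node` is allocated, or math.inf if never."""
    starts = [
        max(a.start, now)
        for a in snapshot
        if a.node == node and a.end > now
    ]
    return min(starts, default=math.inf)


snapshot = [
    Allocation("node1", AllocationKind.WORKLOAD_R_Y, start=100.0, end=200.0),
    Allocation("node2", AllocationKind.WORKLOAD_R_N, start=0.0),  # unlimited runtime
]
```

A power on/off decision component could call `next_use(snapshot, "node1")` to learn how long a node is expected to stay idle; a node that never appears in the snapshot yields infinity.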
Two enhancements are needed:
- a means to query a schedule that reveals when each execution node is expected to be utilized
- a power-off state for execution nodes, so that Grid Engine knows the reason for the unknown state
Both interfaces must be usable from within a Hedeby adaptor.
The task of Hedeby is to switch on/off execution nodes for power saving.
To make reasonable decisions, the power-saving decision-making component must take the boot-up time of each execution node into account. Similarly, a per-node switch-off time is needed.
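The role of the two per-node times can be sketched as follows. This is a hypothetical Python example, not Hedeby code: `NodeTiming`, `power_off_window` and the `min_off_seconds` threshold are assumptions made for illustration. The idea is that a node is only worth switching off if the idle gap before its next planned use is longer than the switch-off time plus the boot-up time, and that it has to be switched on early enough to be ready in time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeTiming:
    bootup_seconds: float       # time from power-on until the node can run jobs
    switch_off_seconds: float   # time from power-off request until the node is down


def power_off_window(now, next_use, timing, min_off_seconds=0.0):
    """Return (off_at, on_at) if switching off pays, otherwise None.

    `next_use` is the time at which the node is next needed according to
    the schedule; it may be float('inf') if no use is planned.
    """
    off_complete = now + timing.switch_off_seconds
    on_at = next_use - timing.bootup_seconds   # latest moment to power on again
    if on_at - off_complete <= min_off_seconds:
        return None                            # gap too short to be worthwhile
    return (now, on_at)


timing = NodeTiming(bootup_seconds=300.0, switch_off_seconds=60.0)
window = power_off_window(now=0.0, next_use=3600.0, timing=timing)
```

With the assumed values above, the node would be switched off immediately and switched on again 300 seconds before its next use. A gap of only 300 seconds would return `None`, because the node could not shut down and boot again in time.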
As soon as a node is moved into the Hedeby spare pool, the Grid Engine job scheduler must entirely prevent reservation scheduling for that node. This means that both
qrsub -l h=node_name
qsub -R y -l h=node_name script.sh
must be rejected. In turn, this means that moving nodes from/to other service domains must be enabled on a per-node basis by the administrator. Any node can thus be either a power-saving node or available for Hedeby node scheduling.
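The rule can be pictured with a small Python sketch. It is purely illustrative: `NodeRole`, `Node` and `accepts_reservation` are hypothetical names, not Grid Engine or Hedeby interfaces. Each node has exactly one role, chosen by the administrator. A power-saving node keeps accepting reservations even while switched off, whereas a node in the Hedeby spare pool does not.

```python
from dataclasses import dataclass
from enum import Enum


class NodeRole(Enum):
    POWER_SAVING = "power-saving"      # may be switched off, stays in the cluster
    HEDEBY_MOVABLE = "hedeby-movable"  # may be moved from/to other service domains


@dataclass
class Node:
    name: str
    role: NodeRole
    powered_on: bool = True
    in_spare_pool: bool = False

    def __post_init__(self):
        # A node is either power-saving or movable, never both.
        if self.role is NodeRole.POWER_SAVING and self.in_spare_pool:
            raise ValueError("power-saving nodes cannot be in the spare pool")


def accepts_reservation(node):
    """True if a reservation request naming this node may be accepted."""
    # Switched-off power-saving nodes still take part in reservation
    # scheduling; only spare-pool membership blocks reservations.
    return not node.in_spare_pool


sleeping = Node("node_a", NodeRole.POWER_SAVING, powered_on=False)
spare = Node("node_b", NodeRole.HEDEBY_MOVABLE, in_spare_pool=True)
```

Here `accepts_reservation(sleeping)` is true although the node is powered off, while `accepts_reservation(spare)` is false.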
Appendix #1: Resource reservation use case
Utilization diagram illustrating why reservation scheduling must also be done for switched-off nodes: power_off.pdf
The diagram shows two dual-CPU nodes:
- the blue job runs already
- the yellow job waits for a reservation
- the red job cannot be started, as this would evidently defer the yellow job's reservation
In that situation, anyone would expect node #2 to be switched off for some 9.75 hours. But if switching off removed a node entirely from the view of the Grid Engine scheduler, the yellow job would lose its resource reservation, so the red job would be assigned to node #1 and thus eat the yellow job's lunch!
- Remote Monitoring of Sun x64 Systems using IPMITOOL and IPMIEVD
- Making Scheduling "Cool": Temperature-Aware Workload Placement in Data Centers
- Cool Job Allocation: Measuring the Power Savings of Placing Jobs at Cooling-Efficient Locations in the Data Center
- cluster-powersave algorithm employed by the Rocks-solid project