PowerSaving

From GridWiki

Introduction

Now that Grid Computing has become widely accepted, users increasingly ask about the cost of the overall service. Since the electricity bill contributes significantly to that cost, Grid Engine users are asking for a means to power off execution hosts during times when there is no workload to run on them.

Note: This document is under the Grid Engine project SISSL license.

Project Overview

Project Aim

The overall aim is to minimize power consumption without affecting the Grid Engine job scheduling service.

Project Benefit

Minimizing power consumption will help to lower the electricity bill and thus make Grid Computing more cost-effective.

Project Duration

To be determined.

Project Dependencies

System Architecture

Enhancement Functions

Both the Grid Engine job scheduler core system and the service domain manager will need to be enhanced.

Overall Block Diagram

A block diagram is still to be added here.

The components are

  • Grid Engine job scheduler (= core system)
  • Grid Engine service domain manager (= Hedeby)

Switching execution nodes on and off can work only if there are interfaces that let individual nodes be switched on and off without human intervention.

Functional Definition

Switching execution nodes on and off shall be under the control of the service domain manager, while the job scheduler provides information about current and future execution node utilization.

Performance

Power saving must have no, or almost no, impact on the performance of the job scheduler.

Diagnostics

Diagnostics must allow examination of the error conditions that may occur when nodes are switched on or off.

User Experience

End users

For end users of the Grid Engine job scheduler, power saving shall be as invisible as possible: all Grid Engine interfaces shall be usable as they are today and must keep the same timing characteristics. The only caveat is the switching-on delay in the execution of workload, but this shall affect only workload that is given to Grid Engine "instantly". For workload that was passed to Grid Engine "early enough", any execution delay shall be ruled out.

Administrators and operators

Grid Engine administrators and operators must have a means

  • to monitor which nodes are currently on/off
  • to monitor when it is planned to switch a node on/off
  • to override power saving for nodes or groups of nodes, so as to rule out switching-on delays at peak times

Quality Assurance

Documentation

Installation

Component Descriptions

Grid Engine core system

Overview

The task of the Grid Engine job scheduler is to provide information about current and future execution node utilization. In addition, it must continue to consider switched-off nodes in reservation scheduling.

Functionality

  • Switched-off nodes must stay in the list of execution nodes, and execution hosts need a "switched-off" state so that the job scheduler (a) continues to consider switched-off nodes in its reservation scheduling and (b) allows advance reservations to be requested on switched-off nodes as well.
  • A means is needed to extract schedule information from the job scheduler so that a power on/off decision-making component can act in sync with the job scheduler. This is required for comprehensible and reasonable behaviour of the overall arrangement. Schedule information must comprise a snapshot of node allocations due to (a) advance reservations, (b) workload with reservation scheduling ("-R y") and (c) workload without reservation scheduling ("-R n").
  • Workload without a runtime limit must be properly reflected (= infinity) in the schedule snapshots.
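
As an illustration of what such a schedule snapshot could carry, here is a minimal Python sketch. It is not an existing Grid Engine interface: the names `Allocation`, `AllocationKind` and `busy_until` are hypothetical, and times are assumed to be seconds relative to the moment the snapshot was taken. It only shows the three allocation sources listed above and how a missing runtime limit maps to infinity.

```python
import math
from dataclasses import dataclass
from enum import Enum


class AllocationKind(Enum):
    """The three allocation sources a snapshot must cover (hypothetical names)."""
    ADVANCE_RESERVATION = "advance reservation"
    RESERVED_JOB = "job submitted with -R y"
    UNRESERVED_JOB = "job submitted with -R n"


@dataclass(frozen=True)
class Allocation:
    node: str
    kind: AllocationKind
    start: float             # seconds after the snapshot was taken
    end: float = math.inf    # no runtime limit is represented as infinity


def busy_until(snapshot, node):
    """Latest end time of any allocation on the node, 0.0 if the node is idle."""
    return max((a.end for a in snapshot if a.node == node), default=0.0)


# Example snapshot with made-up values.
snapshot = [
    Allocation("node1", AllocationKind.UNRESERVED_JOB, 0.0),  # no runtime limit
    Allocation("node2", AllocationKind.ADVANCE_RESERVATION, 3600.0, 7200.0),
]
```

With such a snapshot, a node hosting a job without a runtime limit is reported as busy forever, so a decision-making component would never plan to switch it off.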

Interfaces

Two enhancements are needed:

  • a means to query a schedule that reveals when each execution node is expected to be utilized
  • a power-off state for execution nodes, so that Grid Engine knows why a node is otherwise in unknown state
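
The two enhancements could look roughly like the following Python sketch. Everything here is an assumption made for illustration: `HostState`, `eligible_for_reservation` and `next_use` are hypothetical names, and the schedule is assumed to be a plain mapping from node name to a list of `(start, end)` intervals.

```python
from enum import Enum


class HostState(Enum):
    AVAILABLE = "available"
    UNKNOWN = "unknown"          # unreachable for an unexplained reason
    POWERED_OFF = "powered off"  # unreachable on purpose (proposed new state)


def eligible_for_reservation(state):
    """A powered-off node still takes part in reservation scheduling."""
    return state in (HostState.AVAILABLE, HostState.POWERED_OFF)


def next_use(schedule, node, now):
    """Earliest time at or after `now` at which the node is utilized.

    Returns None if nothing is scheduled on the node from `now` on.
    """
    starts = [start for start, end in schedule.get(node, []) if end > now]
    return max(now, min(starts)) if starts else None
```

Distinguishing `POWERED_OFF` from `UNKNOWN` is the point of the second enhancement: only the former is a deliberate condition under which the scheduler can keep planning for the node.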

Other Requirements

Both interfaces must be usable from within a Hedeby adaptor.

Hedeby

Overview

The task of Hedeby is to switch execution nodes on and off for power saving.

Functionality

To make reasonable decisions, the power-saving decision-making component must take the boot-up time of each execution node into account. Similarly, a per-node switch-off time is needed.
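
One way these two per-node times could enter the decision is sketched below in Python. This is a simplified assumption, not a specified algorithm: the function name `plan_power_cycle` and its parameters are hypothetical, all times are in seconds, and a node is only switched off if the idle gap is longer than switch-off plus boot-up time.

```python
import math


def plan_power_cycle(idle_start, next_use, shutdown_time, boot_time):
    """Return (off_at, on_at), or None if the idle gap is too short.

    idle_start:    time at which the node becomes idle
    next_use:      time at which the node is needed again (math.inf if never)
    shutdown_time: per-node switch-off time
    boot_time:     per-node boot-up time
    """
    off_at = idle_start
    on_at = next_use - boot_time  # start booting early enough to be ready
    if on_at - off_at <= shutdown_time:
        return None               # node could not even finish shutting down
    return off_at, on_at
```

For example, with a one-hour idle gap, 60 seconds switch-off time and 300 seconds boot-up time, the node would be switched off immediately and switched on again after 3300 seconds, so that it is ready when the scheduled workload is due.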

As soon as a node is moved into the Hedeby spare pool, the Grid Engine job scheduler must entirely prevent reservation scheduling for that node. That means both a

  qrsub -l h=node_name 

and a

  qsub -R y -l h=node_name script.sh

must be rejected. This in turn means that moving nodes to and from other service domains must be enabled on a per-node basis by the administrator. Any node can thus be either a power-saving node or available for Hedeby node scheduling.
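
The per-node choice and the rejection rule can be summarized in a short Python sketch. The names `NodePolicy`, `may_power_off`, `may_move` and `accepts_reservation` are hypothetical and only illustrate the rule described above; they do not correspond to an existing Grid Engine or Hedeby setting.

```python
from enum import Enum


class NodePolicy(Enum):
    """Per-node choice made by the administrator (hypothetical names)."""
    POWER_SAVING = "power saving"  # may be switched off, never moved
    MOVABLE = "movable"            # may be moved between service domains


def may_power_off(policy):
    return policy is NodePolicy.POWER_SAVING


def may_move(policy):
    return policy is NodePolicy.MOVABLE


def accepts_reservation(node, spare_pool):
    """Reservation requests for a node in the spare pool must be rejected."""
    return node not in spare_pool
```

Under this sketch, a request such as `qsub -R y -l h=node_name script.sh` would be rejected exactly when `node_name` is currently in the spare pool.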

Interfaces

Other Requirements

Appendix

Appendix #1: Resource reservation use case

Utilization diagram making the case for why reservation scheduling must also be done for switched-off nodes: power_off.pdf

The diagram shows two dual-CPU nodes:

  • the blue job is already running
  • the yellow job waits for a reservation
  • the red job cannot be started, as this would defer the yellow job's reservation

In that situation anyone would expect node #2 to be switched off for some 9.75 hours. But if switching off removes a node entirely from the view of the Grid Engine scheduler, the yellow job would lose its resource reservation, so the red job would be assigned to node #1 and thus eat the yellow job's lunch!

Links
