PresubmissionProcedures

From GridWiki
Revision as of 18:51, 24 September 2011 by Dag (talk | contribs) (Reverted edits by Ularedmond (talk) to last revision by AndreasHaas)

Introduction

The need for presubmission procedures has long been known.

Project Overview

Project Aim

The aim of the project is to provide a plug-in interface that allows jobs to be verified and modified in accordance with site-specific needs. See also issue #2621.

Project Benefit

The benefit is that Sun Grid Engine administrators gain more flexible means to implement site-specific policies.

Project Duration

To be determined.

Project Dependencies

None.

System Architecture

Enhancement Functions

  • there should be one client-side plug-in that is called by qsub/qrsh/qsh/qlogin/qmon before a job is sent to qmaster
  • in addition there should be a qmaster-side plug-in that is called by sge_qmaster before a new job and modifications to existing jobs are accepted
  • use of the client-side plug-in for qalter is problematic because qalter knows only a subset of the job attributes
  • script-based plug-ins seem sufficient at first, but shared library plug-in interfaces must remain possible as a future enhancement, e.g. for better performance
  • the term "script-based plug-in" shall be understood as a plug-in interface suited to any kind of executable, i.e. shell scripts and binaries
  • plug-in input data shall cover all relevant job information
  • plug-in exit status controls whether (a) the job can be accepted as is, (b) the job is to be rejected, (c) the corrected job shall be accepted, or (d) the job is to be rejected now, even though its acceptance at a later point in time is possible
  • in case (c) the plug-in output data shall contain all relevant job information of the corrected job
  • the data format for plug-in input/output should reuse the DRMAA job template concept
  • for enabling/disabling script plug-ins, a stat(2)-based check on two well-known path names somewhere in $SGE_ROOT could suffice

Overall Block Diagram

The block diagram consists of

  • Sun Grid Engine
  • client-side presubmission plug-in
  • qmaster-side presubmission plug-in

Functional Definition

Performance

User Experience

Quality Assurance

Documentation

Installation

Component Descriptions

Shared library plug-in component

Overview

Functionality

Interfaces

To make them interchangeable, the client-side and the qmaster-side plug-in functions shall have the same interface:

  int presubmission_procedure(drmaa_job_template_t *jt, void *data, bool client_side, dstring *diag)

A presubmission procedure can return one of the following return codes:

  Symbolic name         Return code  Meaning
  PRESUB_PROC_ACCEPT    0            accept job template as is
  PRESUB_PROC_CORRECT   1            job template was corrected, but is fine now
  PRESUB_PROC_REJECT    2            job template was rejected
  PRESUB_PROC_TRYLATER  3            job template was rejected, but might be accepted later
  PRESUB_PROC_ERROR     (other)      any other error condition in presubmission function
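
The return codes above can be modelled as a small enumeration. The following is only a sketch in Python, not part of the proposed C interface: the class name `PresubResult` and the helper `classify_exit_status` are hypothetical, and the value `-1` for `PRESUB_PROC_ERROR` is a placeholder, since the table assigns no single number to that case.

```python
from enum import IntEnum


class PresubResult(IntEnum):
    """Return codes of a presubmission procedure (names taken from the table)."""
    PRESUB_PROC_ACCEPT = 0
    PRESUB_PROC_CORRECT = 1
    PRESUB_PROC_REJECT = 2
    PRESUB_PROC_TRYLATER = 3
    # Placeholder value (assumption): the table gives no number for errors.
    PRESUB_PROC_ERROR = -1


def classify_exit_status(status):
    """Map a raw return code to a PresubResult; anything outside 0..3 is an error."""
    if status in (0, 1, 2, 3):
        return PresubResult(status)
    return PresubResult.PRESUB_PROC_ERROR
```

Treating every value outside 0..3 as an error keeps a caller on the safe side when a plug-in misbehaves.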

For evaluation and correction of a job template, functional counterparts of the DRMAA C library functions documented in drmaa_jobtemplate(3) could be used. For a first implementation, at least the following job template attributes (see drmaa_attributes(3)) are needed in the DRMAA job template 'jt':


  DRMAA job template attribute  Submission options
  drmaa_job_name                -N name
  drmaa_native_specification    -pe pe_name pe_range
                                -masterq wc_queue_list
                                -hard -q wc_queue_list
                                -soft -q wc_queue_list
                                -hard -l rsrc_list
                                -soft -l rsrc_list
                                -ckpt ckpt_name
                                -P project
                                -A account
  drmaa_input_path              -i input_path
  drmaa_output_path             -o output_path
  drmaa_error_path              -e error_path
  drmaa_join_files              -j yesno
  drmaa_wd                      -wd job_working_directory

Other Requirements

Script-based plug-in component

Overview

The first implementation will support a client-side and a server-side script plug-in:

  • $SGE_ROOT/$SGE_CELL/common/sge_psp_client
  • $SGE_ROOT/$SGE_CELL/common/sge_psp_server

If these files exist and are executable, they will be launched as plug-ins.
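
The enable/disable rule can be illustrated with a stat(2)-based check. This is a minimal sketch in Python under stated assumptions: the function names `plugin_paths` and `plugin_enabled` are hypothetical, and "executable" is interpreted as "a regular file with at least one execute permission bit set", which the text does not spell out.

```python
import os
import stat


def plugin_paths(sge_root, sge_cell):
    """Build the two well-known plug-in paths named in this section."""
    common = os.path.join(sge_root, sge_cell, "common")
    return {
        "client": os.path.join(common, "sge_psp_client"),
        "server": os.path.join(common, "sge_psp_server"),
    }


def plugin_enabled(path):
    """Return True if path is a regular file with any execute bit set."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # A missing or inaccessible file means the plug-in is disabled.
        return False
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return stat.S_ISREG(mode) and bool(mode & exec_bits)
```

With such a check an administrator disables a plug-in simply by removing the file or clearing its execute bits; no configuration change is involved.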

The client-side plug-in will be launched by qsub/qrsh/qsh/qlogin/qresub/qmon each time a new job is submitted.

The server-side plug-in will be launched by sge_qmaster before a new job is accepted and before an existing job is altered via qalter or qmon.
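
How a client or sge_qmaster might launch such a plug-in can be sketched as follows. This is an illustration in Python, not the actual implementation; the function name `launch_plugin` is hypothetical, and it assumes the job attributes are passed on the plug-in's standard input while stdout and stderr are captured.

```python
import subprocess


def launch_plugin(argv, job_text):
    """Run a plug-in executable and return (exit_status, stdout, stderr).

    argv     -- command line of the plug-in, e.g. [path_to_sge_psp_client]
    job_text -- text representation of the job attributes (assumed on stdin)
    """
    proc = subprocess.run(
        argv,
        input=job_text,
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit status is a verdict, not a failure
    )
    return proc.returncode, proc.stdout, proc.stderr
```

The caller then interprets the exit status: on a correction it reads the new attributes from the captured stdout, and on a rejection it reports the captured stderr as the reason.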

Functionality

When a script plug-in is launched, it receives as input a text representation of all supported DRMAA attributes, such as

  drmaa_job_name=myjob
  drmaa_native_specification=-pe make 17 -soft -q wish.q -hard -q all.q -l h_rt=::60 -soft -l h_rt=1:: -masterq wish.q -P myproject -ckpt myckpt -A full_account
  drmaa_input_path=:/dev/null
  drmaa_output_path=:/path/to/output
  drmaa_error_path=:/path/to/error
  drmaa_join_files=no
  drmaa_wd=/jobs/working/directory
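
A plug-in has to parse this input before it can inspect the job. The sketch below (Python, with the hypothetical helper name `parse_job_attributes`) assumes one `name=value` pair per line and splits only at the first `=`, because values such as `-l h_rt=::60` contain further `=` characters.

```python
def parse_job_attributes(text):
    """Parse 'name=value' lines into a dict, splitting at the first '=' only."""
    attrs = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # tolerate blank lines
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a name=value line: %r" % line)
        attrs[name] = value
    return attrs
```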

To change the -pe slot request from 17 to 32, the script plug-in must print the following:

  drmaa_job_name=myjob
  drmaa_native_specification=-pe make 32 -soft -q wish.q -hard -q all.q -l h_rt=::60 -soft -l h_rt=1:: -masterq wish.q -P myproject -ckpt myckpt -A full_account
  drmaa_input_path=:/dev/null
  drmaa_output_path=:/path/to/output
  drmaa_error_path=:/path/to/error
  drmaa_join_files=no
  drmaa_wd=/jobs/working/directory

or just this:

  drmaa_native_specification=-pe make 32 -soft -q wish.q -hard -q all.q -l h_rt=::60 -soft -l h_rt=1:: -masterq wish.q -P myproject -ckpt myckpt -A full_account

The shorter form is sufficient because only 'drmaa_native_specification' changes.
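
On the receiving side, both output forms lead to the same corrected job if the output is merged over the original attributes. A minimal sketch in Python, assuming that attributes missing from the plug-in output keep their original values; the function name `apply_correction` is hypothetical.

```python
def apply_correction(original, plugin_output):
    """Overlay the attributes printed by the plug-in onto the original ones.

    original      -- dict of attribute name -> value as sent to the plug-in
    plugin_output -- text printed by the plug-in, full or partial
    """
    corrected = dict(original)  # leave the caller's dict untouched
    for raw_line in plugin_output.splitlines():
        name, sep, value = raw_line.strip().partition("=")
        if sep:
            corrected[name] = value
    return corrected
```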

To make the change effective, the presubmission plug-in must exit with '1' to indicate its intent to correct the job. The exit codes are:

  • 0 no change
  • 1 correction via stdout
  • 2 job rejected; the reason is to be provided via stderr
  • 3 job rejected for now (it might be accepted later); the reason is to be provided via stderr
  • 4-255 error in presubmission plug-in
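
The core logic of a plug-in that performs the slot correction from the example might look like the sketch below. It is written in Python as an illustration only: the function name `presubmit` and the fixed target of 32 slots are assumptions, and the job attributes are assumed to arrive as one text block. The function returns the text to print together with the exit code.

```python
def presubmit(input_text, slots=32):
    """Return (output_text, exit_code) for the given job attributes.

    Exit code 1 signals a correction and 0 signals that the job is
    accepted unchanged. Only the changed attribute is printed.
    """
    wanted = str(slots)
    for raw_line in input_text.splitlines():
        name, sep, value = raw_line.strip().partition("=")
        if not sep or name != "drmaa_native_specification":
            continue
        tokens = value.split()
        if "-pe" in tokens:
            i = tokens.index("-pe")
            # Layout is '-pe pe_name pe_range': the range is two tokens later.
            if i + 2 < len(tokens) and tokens[i + 2] != wanted:
                tokens[i + 2] = wanted
                return name + "=" + " ".join(tokens) + "\n", 1
    return "", 0
```

A real plug-in would read the attributes from its input, write the returned text to its output, and pass the returned code to its exit call.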

Interfaces

Other Requirements

Alternatives

Wrapper Scripts

You can write wrapper scripts that perform a test, either of the job (MPI? Condor? ...) or of the grid/cluster state (for example, how many processors are available?), to accomplish this task. This puts more responsibility on the admin rather than on the Grid Engine developers. For a crude example wrapper script, see Utilities.

Appendix

Appendix #1 Name

Appendix #2 Name