- 1 Introduction
- 2 Project Overview
- 3 System Architecture
- 4 Functional Definition
- 5 Component Descriptions
- 6 Alternatives
- 7 Appendix
The need for presubmission procedures has long been known.
The aim of the project is to provide a plug-in interface that allows jobs to be verified or modified in accordance with site-specific needs. See also issue #2621.
The benefit is that Sun Grid Engine administrators gain more flexible means of implementing site-specific policies.
- there should be one client-side plug-in that is called by qsub/qrsh/qsh/qlogin/qmon before a job is sent to qmaster
- in addition there should be a qmaster-side plug-in that is called by sge_qmaster before a new job and modifications to existing jobs are accepted
- use of the client-side plug-in for qalter is problematic because qalter knows only a subset of the job attributes
- script-based plug-ins seem sufficient at first, but shared library plug-in interfaces must remain possible as a future enhancement, e.g. for better performance
- the term "script-based plug-in" shall be understood as a plug-in interface that is suited for any kind of executable, i.e. shell scripts and binaries
- plug-in input data to cover all relevant job information
- plug-in exit status controls whether (a) the job can be accepted as is, (b) the job is to be rejected, (c) the corrected job shall be accepted, or (d) the job is to be rejected now, even though its acceptance at a later point in time is possible
- in case (c) plug-in output data to contain all relevant job information of the corrected job
- data format for plug-in input/output should reuse DRMAA job template concept
- script plug-ins could be enabled or disabled through a stat(2)-based check on two well-known path names somewhere under $SGE_ROOT
Overall Block Diagram
The block diagram consists of
- Sun Grid Engine
- client-side presubmission plug-in
- qmaster-side presubmission plug-in
To make them interchangeable, the client-side and the qmaster-side plug-in functions shall have the same interface:
int presubmission_procedure(drmaa_job_template_t *jt, void *data, bool client_side, dstring *diag)
A presubmission procedure can return one of four defined return codes; any other value signals an error:
|PRESUB_PROC_ACCEPT||0||accept job template as is|
|PRESUB_PROC_CORRECT||1||job template was corrected, but is fine now|
|PRESUB_PROC_REJECT||2||job template was rejected|
|PRESUB_PROC_TRYLATER||3||job template was rejected, but might be accepted later|
|PRESUB_PROC_ERROR||any other||error condition in presubmission function|
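The return code table above can be mirrored in a few lines of code. The following is a minimal sketch in Python rather than C, assuming only the values listed in the table; the names `PresubResult` and `classify` are hypothetical, and the value 4 for `ERROR` is an arbitrary stand-in for "any other".

```python
from enum import IntEnum


class PresubResult(IntEnum):
    """Return codes from the table above (class name is hypothetical)."""
    ACCEPT = 0    # accept job template as is
    CORRECT = 1   # job template was corrected, but is fine now
    REJECT = 2    # job template was rejected
    TRYLATER = 3  # rejected, but might be accepted later
    ERROR = 4     # stand-in for "any other" value (assumption)


def classify(code: int) -> PresubResult:
    """Map a raw return code onto the table; anything outside 0-3 is an error."""
    try:
        return PresubResult(code)
    except ValueError:
        return PresubResult.ERROR
```

Treating every unknown value as an error keeps the caller's decision logic limited to the five rows of the table.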
For evaluation and correction of a job template, functional counterparts of the DRMAA C library functions documented in drmaa_jobtemplate(3) could be used. For a first implementation, at least the following job template attributes (see drmaa_attributes(3)) are needed in the DRMAA job template 'jt':
|DRMAA job template attribute||submission options|
|drmaa_native_specification||-pe pe_name pe_range -masterq wc_queue_list -hard -q wc_queue_list -soft -q wc_queue_list -hard -l rsrc_list -soft -l rsrc_list -ckpt ckpt_name -P project -A account|
Script-based plug-in component
The first implementation will support a client-side and a server-side script plug-in.
If these files exist and are executable, they will be launched as plug-ins.
The client-side plug-in will be launched by qsub/qrsh/qsh/qlogin/qresub/qmon each time a new job gets submitted.
The server-side plug-in will be launched by sge_qmaster before a new job is accepted and before an existing job is altered via qalter or qmon.
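The "exists and is executable" test that enables a script plug-in could look roughly like the sketch below. It is only an illustration: the function name `plugin_enabled` is hypothetical, and the path passed in stands for one of the well-known plug-in locations, which this document does not name.

```python
import os
import stat


def plugin_enabled(path: str) -> bool:
    """Return True if path is a regular file the current user may execute.

    The path argument stands for one of the well-known plug-in
    locations; the actual path names are not specified here.
    """
    try:
        st = os.stat(path)
    except OSError:
        return False  # a missing file means the plug-in is disabled
    return stat.S_ISREG(st.st_mode) and os.access(path, os.X_OK)
```

With such a check, an administrator would enable a plug-in by installing an executable file at the agreed location and disable it by removing the file or its execute permission.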
When a script plug-in is launched, it receives as input a text representation of all supported DRMAA attributes, such as
drmaa_job_name=myjob
drmaa_native_specification=-pe make 17 -soft -q wish.q -hard -q all.q -l h_rt=::60 -soft -l h_rt=1:: -masterq wish.q -P myproject -ckpt myckpt -A full_account
drmaa_input_path=:/dev/null
drmaa_output_path=:/path/to/output
drmaa_error_path=:/path/to/error
drmaa_join_files=no
drmaa_wd=/jobs/working/directory
To change the -pe slot request from 17 to 32, the script plug-in must echo output like this
drmaa_job_name=myjob
drmaa_native_specification=-pe make 32 -soft -q wish.q -hard -q all.q -l h_rt=::60 -soft -l h_rt=1:: -masterq wish.q -P myproject -ckpt myckpt -A full_account
drmaa_input_path=:/dev/null
drmaa_output_path=:/path/to/output
drmaa_error_path=:/path/to/error
drmaa_join_files=no
drmaa_wd=/jobs/working/directory
or like this
drmaa_native_specification=-pe make 32 -soft -q wish.q -hard -q all.q -l h_rt=::60 -soft -l h_rt=1:: -masterq wish.q -P myproject -ckpt myckpt -A full_account
because only 'drmaa_native_specification' changes.
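The correction described above can be sketched as three small helpers. This is an illustration only: it assumes the text representation carries one `name=value` attribute per line, and the function names `parse_template`, `set_pe_slots`, and `format_template` are hypothetical.

```python
import re


def parse_template(text: str) -> dict:
    """Parse 'name=value' lines into a dict (one attribute per line assumed).

    Only the first '=' separates name and value, because values such as
    '-l h_rt=::60' contain '=' themselves.
    """
    attrs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition("=")
        attrs[name.strip()] = value
    return attrs


def set_pe_slots(native_spec: str, slots: int) -> str:
    """Replace the range in '-pe <pe_name> <pe_range>' with a fixed slot count."""
    return re.sub(
        r"(-pe\s+\S+\s+)\S+",
        lambda m: m.group(1) + str(slots),
        native_spec,
        count=1,
    )


def format_template(attrs: dict) -> str:
    """Serialize attributes back into 'name=value' lines."""
    return "".join(f"{name}={value}\n" for name, value in attrs.items())
```

A plug-in built on these helpers would parse its input, call `set_pe_slots(attrs["drmaa_native_specification"], 32)`, and echo either the whole template or only the changed attribute.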
To make the change effective, the presubmission plug-in must exit with '1' to indicate its intent to correct the job. The exit status values are:
- 0 no change
- 1 correction, with the corrected attributes echoed via stdout
- 2 job rejected, with the reason provided via stderr
- 3 job rejected for now but possibly acceptable later, with the reason provided via stderr
- 4-255 error in presubmission plugin
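From the caller's side, the exit status protocol above might be handled as in the following sketch. It is an assumption-laden illustration, not the actual qsub or sge_qmaster code: the function name `run_plugin`, the returned status strings, and the one-attribute-per-line text format are all hypothetical.

```python
import subprocess


def run_plugin(argv, template):
    """Launch a script plug-in and return (status, template, reason).

    argv is the plug-in command line; template is a dict of DRMAA
    attributes. Names and return shape are hypothetical.
    """
    text = "".join(f"{k}={v}\n" for k, v in template.items())
    proc = subprocess.run(argv, input=text, capture_output=True, text=True)
    reason = proc.stderr.strip()
    if proc.returncode == 0:
        return "accept", dict(template), ""
    if proc.returncode == 1:
        # The plug-in may echo all attributes or only the changed ones,
        # so merge its output into a copy of the original template.
        corrected = dict(template)
        for line in proc.stdout.splitlines():
            name, sep, value = line.partition("=")
            if sep:
                corrected[name] = value
        return "correct", corrected, ""
    if proc.returncode == 2:
        return "reject", None, reason
    if proc.returncode == 3:
        return "trylater", None, reason
    return "error", None, reason  # 4-255, or terminated by a signal
```

Merging the echoed attributes into the original template is what allows a plug-in to print only `drmaa_native_specification` when nothing else changes.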
You can write wrapper scripts that accomplish this task by testing either the job (MPI? Condor? ...) or the grid/cluster state (for example, how many processors are available). This places the responsibility on the administrator rather than on the Grid Engine developers. For a crude example wrapper script see Utilities.