Specifications

Jobs with -a

Before you start with IZ #921/CR #5063310, make sure you have read and understood the specification document about the 6.0 resource reservation and backfilling enhancement. That document covers

  • the relevant terminology
  • all interface changes due to the RR+BF 6.0 enhancement
  • a scheduling algorithm code walkthrough

In particular, try to understand the difference between assignment scheduling and reservation scheduling.

Required changes

  • make blocked -a <time> jobs available in core scheduling algorithm

It is necessary to ensure that jobs blocked by -a <time> are not sorted out into the "SPLIT_WAITING_DUE_TO_TIME" job list before actual scheduling starts. Currently, -a <time> jobs are always sorted out by the function split_jobs() in libs/sched/sge_job_schedd.c

        if (target == NULL && result_list[SPLIT_WAITING_DUE_TO_TIME] &&
             (lGetUlong(job, JB_execution_time) > sge_get_gmt()) &&
             (ja_task_status == JIDLE)) {
#ifdef JOB_SPLIT_DEBUG
            DPRINTF(("Task "sge_u32" is waiting due to time.\n", ja_task_id));
#endif
            target = &(target_tasks[SPLIT_WAITING_DUE_TO_TIME]);

that is called by scheduler(). This is done because, until now, -a <time> jobs did not need to be considered by reservation scheduling, so it was sufficient to deal with those jobs in assignment scheduling once they could be started. After the #921 fix, those -a <time> jobs that request a reservation ("-R y") should be available in the "SPLIT_PENDING" job list if reservation scheduling is enabled (i.e. sched_conf(5) max_reservation > 0; a value of 0 means no reservation is done).
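
The intended split decision can be sketched as follows. This is only an illustration under stated assumptions: job_sketch_t, split_target_t and split_target_for() are hypothetical stand-ins and not part of libs/sched; the real code works on CULL job elements and the split lists shown above. The sketch also assumes that reservation scheduling counts as enabled when max_reservation is greater than zero.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified stand-ins for the real CULL job element
 * and the split lists; not the actual scheduler data structures. */
typedef enum { TARGET_PENDING, TARGET_WAITING_DUE_TO_TIME } split_target_t;

typedef struct {
    time_t execution_time; /* start time requested with -a, 0 if none */
    bool   reserve;        /* true if the job was submitted with -R y */
} job_sketch_t;

/* Decide into which list an idle pending job is split. */
split_target_t split_target_for(const job_sketch_t *job, time_t now,
                                int max_reservation)
{
    bool blocked_by_time     = job->execution_time > now;
    bool reservation_enabled = max_reservation > 0; /* assumption */

    /* Only blocked jobs that cannot get a reservation are sorted out. */
    if (blocked_by_time && !(job->reserve && reservation_enabled)) {
        return TARGET_WAITING_DUE_TO_TIME;
    }
    return TARGET_PENDING;
}
```

With this rule, a job submitted with both -a <time> and -R y stays in the pending list while reservations are enabled, and every other blocked job is still sorted out as before.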

  • prevent jobs blocked by -a <time> from being considered by assignment scheduling

Because some of the -a <time> jobs are now kept in the "SPLIT_PENDING" job list, those jobs that still cannot be assigned must be prevented from being considered by assignment scheduling.
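
A minimal sketch of such a guard is given below. The names sched_mode_t and scheduling_mode_for() are hypothetical and only illustrate the decision; where exactly the check belongs in the dispatching code is not stated here and is left open.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical outcome codes, used for illustration only. */
typedef enum { TRY_ASSIGNMENT, RESERVATION_ONLY, SKIP_JOB } sched_mode_t;

/* Decide how a pending job may be treated in the current scheduling run. */
sched_mode_t scheduling_mode_for(time_t execution_time, bool reserve,
                                 time_t now, int reservations_left)
{
    if (execution_time <= now) {
        /* Not blocked by -a <time> (any more): normal assignment scheduling. */
        return TRY_ASSIGNMENT;
    }
    if (reserve && reservations_left > 0) {
        /* Still blocked: never assign now, but a reservation may be made. */
        return RESERVATION_ONLY;
    }
    /* Still blocked and no reservation possible: nothing to do this run. */
    return SKIP_JOB;
}
```

The point of the sketch is that a blocked job must never reach the assignment path, whether or not a reservation can be made for it.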

  • enhance reservation scheduling code to deal with -a <time> jobs

The current reservation scheduling code always tries to find the earliest possible reservation for any job, based on the reservation schedule that is kept for each consumable resource associated with any queue instances, execution hosts, the global host and (PE_Type) parallel environments. Each of these reservation schedules is stored in a resource utilization entry with its sub-list of resource diagram entries.

An overview of the overall algorithm is given in section "4. Implementation" of the 6.0 resource reservation and backfilling enhancement specification. The algorithm may seem complicated, but all in all it only does obvious things, and the required change for -a <time> jobs is much easier since the reservation time is already known. That means implementing reservation scheduling for those jobs merely requires enhancing the existing code so that it can handle a case that is easier than the cases already handled.
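
To illustrate why the -a <time> case is the easier one, here is a toy model of a single resource diagram. It is a sketch under assumptions and not the real data structure: diagram_step_t, fits_at() and earliest_start() are hypothetical names, the diagram is a plain array sorted by time, and utilization before the first entry is taken to be zero. fits_at() checks one known start time; earliest_start() searches for the earliest fit and, as one possible reading of the requirement, accepts the -a <time> value as a lower bound.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one resource diagram: a step function,
 * sorted by time. The last entry is valid until infinity. */
typedef struct {
    long   time;   /* utilization changes at this time */
    double amount; /* amount in use from this time until the next entry */
} diagram_step_t;

/* Does a request fit for the whole interval [start, start + duration)? */
bool fits_at(const diagram_step_t *d, size_t n, double capacity,
             double request, long start, long duration)
{
    long end = start + duration;

    if (request > capacity) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        bool begins_before_end = d[i].time < end;
        bool lasts_past_start  = (i + 1 == n) || (d[i + 1].time > start);

        if (begins_before_end && lasts_past_start &&
            d[i].amount + request > capacity) {
            return false;
        }
    }
    return true;
}

/* Earliest start >= min_start at which the request fits, or -1 if none.
 * Only min_start and later step times need to be tried, because the
 * utilization can only drop at a step. */
long earliest_start(const diagram_step_t *d, size_t n, double capacity,
                    double request, long duration, long min_start)
{
    if (fits_at(d, n, capacity, request, min_start, duration)) {
        return min_start;
    }
    for (size_t i = 0; i < n; i++) {
        if (d[i].time > min_start &&
            fits_at(d, n, capacity, request, d[i].time, duration)) {
            return d[i].time;
        }
    }
    return -1;
}
```

For a job without -a <time>, min_start would simply be the current time; for a job with -a <time>, it would be the requested execution time, so times before it are never considered.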


In case of questions please do not hesitate to contact me.