DispatchingDiagnosisOnDemand

From GridWiki

Introduction

At the end of 2007, community members started to complain about memory growth in schedd. A memory leak was suspected. It took a long time and a lot of work before it finally turned out that 'schedd_job_info' has severe problems. The engineers who helped those community members concluded that 'schedd_job_info' is not just buggy, but the root of all evil.

Project Overview

Project Aim

The project aim is to enhance qalter -w e|w|n|v in such a way that schedd_job_info in sched_conf(5) and qconf -tsm are no longer needed. This is required before #2548 can be worked on.

The secondary aim is to make qalter -w e|w|n|v more usable in general, as covered by #2598 and #2607.

Project Benefit

The benefit is that 'schedd_job_info' can be removed. This will bring a smaller daemon memory footprint. Strictly speaking, dispatching by the scheduler itself will not become faster, but 'schedd_job_info', long known as a performance brake, will no longer be needed.

Project Duration

Hard to tell. The target release for this RFE is a post-6.2 version.

Project Dependencies

No dependencies.

System Architecture

Enhancement Functions

Overall Block Diagram

Functional Definition

Performance

User Experience

  • Must maintain the ability to determine scheduler runtimes, à la the first and last lines generated by qconf -tsm, as well as when the scheduler last completed.
  • Must be similar in performance to qconf -tsm when determining the scheduling results of all jobs in a grid (taking seconds, not minutes).

Quality Assurance

Documentation

Installation

Component Descriptions

Changes with -w e|w|n|v submit option

Overview

Functionality

Interfaces

In submit(1) a new option argument 'p' would be added. The current 'v' description

         `v'  verify - does not submit the job but prints an
              extensive validation report.

would be split up into

         `p'  poke - does not submit the job but prints a
              validation report based on a cluster as is with
              all resource utilizations in place.
         `v'  verify - does not submit the job but prints a
              validation report based on an empty cluster.
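
The difference between the two modes can be illustrated with a toy model. This is only a sketch and not Grid Engine code: the Queue class, its slot counters and the can_run helper are hypothetical names invented for illustration, and real validation considers far more than slot counts. It assumes that 'v' checks a request against total capacity (an empty cluster), while 'p' checks it against what is free right now.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Queue:
    # Hypothetical, simplified view of a queue: only slot counts.
    name: str
    slots_total: int
    slots_used: int


def can_run(requested_slots: int, queues: List[Queue], mode: str) -> bool:
    """Toy validation: 'v' ignores current utilization, 'p' honors it."""
    if mode not in ("p", "v"):
        raise ValueError(f"unsupported mode: {mode!r}")
    for q in queues:
        if mode == "v":
            # verify: pretend the cluster is empty
            available = q.slots_total
        else:
            # poke: take the cluster as is, with utilization in place
            available = q.slots_total - q.slots_used
        if available >= requested_slots:
            return True
    return False


queues = [Queue("all.q", slots_total=8, slots_used=6)]
verify_ok = can_run(4, queues, "v")
poke_ok = can_run(4, queues, "p")
print(verify_ok, poke_ok)  # prints: True False
```

In this model a 4-slot request passes 'v' because the queue could hold it when empty, but fails 'p' because only 2 slots are free at the moment.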

Other Requirements

Component #2 Name

Overview

Functionality

Interfaces

Other Requirements

Appendix

Appendix #1 Name

Appendix #2 Name