GridEngine qping


This is a placeholder page created by Chris dag to document qping's dense output and the ways qping can be used to profile, troubleshoot, and diagnose Grid Engine performance and scheduling internals.

This page was prompted by an interesting thread about SGE scheduler profiling on the SGE users mailing list. The thread can be found here: http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=7823

In the discussion, Andy recommends the following temporary configuration change:

qmaster_params MONITOR_TIME=0:0:10

Then run the command:

qping -i 10 -f <qmaster_host> <port> qmaster 1

The output is full of interesting data that is pretty hard for a non-developer to understand:

...

02/08/2006 07:40:35:
SIRM version:             0.1
SIRM message id:          1
start time:               02/07/2006 14:37:53 (1139344673)
run time [s]:             61362
messages in read buffer:  0
messages in write buffer: 0
nr. of connected clients: 2000
status:                   0
info:                     TET: R (9.95) | EDT: R (0.01) | SIGT: R (61250.28) | MT(1): R (0.01) | MT(2): R (0.12) | OK
Monitor:
02/08/2006 07:40:26 | TET: runs: 0.80r/s (pending: 12.00 executed: 0.60/s) out: 0.00m/s APT: 0.3370s/m idle: 73.04% wait: 3.16% time: 10.00s
02/08/2006 07:40:32 | EDT: runs: 7.52r/s (clients: 1.00 mod: 0.00/s ack: 0.00/s blocked: 0.00 busy: 1.00 | events: 9.12/s added: 9.12/s skipt: 0.00/s) out: 0.00m/s APT: 0.0002s/m idle: 99.86% wait: 0.00% time: 9.97s
02/07/2006 14:39:45 | SIGT: no monitoring data available
02/08/2006 07:40:33 | MT(1): runs: 4.91r/s (execd (l:4.48,j:4.91,c:4.48,p:4.48,a:0.00)/s GDI (a:0.00,g:0.00,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.00/s) out: 0.11m/s APT: 0.0032s/m idle: 98.44% wait: 0.21% time: 9.37s
02/08/2006 07:40:33 | MT(2): runs: 4.10r/s (execd (l:3.57,j:4.10,c:3.57,p:3.57,a:0.00)/s GDI (a:0.00,g:0.00,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.00/s) out: 0.11m/s APT: 0.0051s/m idle: 97.92% wait: 1.11% time: 9.52s


02/08/2006 07:40:45:
SIRM version:             0.1
SIRM message id:          2
start time:               02/07/2006 14:37:53 (1139344673)
run time [s]:             61372
messages in read buffer:  0
messages in write buffer: 0
nr. of connected clients: 2000
status:                   0
info:                     TET: R (5.53) | EDT: R (0.06) | SIGT: R (61260.30) | MT(1): R (0.06) | MT(2): R (0.07) | OK
Monitor:
02/08/2006 07:40:36 | TET: runs: 0.20r/s (pending: 9.00 executed: 0.20/s) out: 0.00m/s APT: 0.0000s/m idle: 100.00% wait: 0.00% time: 10.00s
02/08/2006 07:40:42 | EDT: runs: 12.20r/s (clients: 1.00 mod: 0.00/s ack: 0.00/s blocked: 0.00 busy: 1.00 | events: 15.00/s added: 15.00/s skipt: 0.00/s) out: 0.00m/s APT: 0.0002s/m idle: 99.75% wait: 0.00% time: 10.00s
02/07/2006 14:39:45 | SIGT: no monitoring data available
02/08/2006 07:40:43 | MT(1): runs: 5.57r/s (execd (l:5.48,j:5.57,c:5.48,p:5.48,a:0.00)/s GDI (a:0.00,g:0.00,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.00/s) out: 0.10m/s APT: 0.0314s/m idle: 82.51% wait: 16.34% time: 10.23s
02/08/2006 07:40:43 | MT(2): runs: 5.22r/s (execd (l:5.03,j:5.12,c:5.03,p:5.03,a:0.00)/s GDI (a:0.10,g:0.00,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s event-acks: 0.00/s) out: 0.49m/s APT: 0.0340s/m idle: 82.25% wait: 14.60% time: 10.15s


...
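
As a first step toward reading the Monitor lines above, here is a minimal Python sketch that pulls the trailing counters (out, APT, idle, wait, time) out of each line. It is based only on the sample output on this page; the function name parse_monitor_line and the regular expressions are illustrative assumptions, not part of qping, and the sketch makes no claim about what each counter means.

```python
import re

# One line of the "Monitor:" section: "<date> <time> | <THREAD>: <body>".
# The layout is inferred from the sample output on this page only.
MONITOR_LINE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| "
    r"(?P<thread>[A-Z]+(?:\(\d+\))?): (?P<body>.*)$"
)

# Counters that close every thread line in the sample output.
TAIL = re.compile(
    r"out: (?P<out>[\d.]+)m/s APT: (?P<apt>[\d.]+)s/m "
    r"idle: (?P<idle>[\d.]+)% wait: (?P<wait>[\d.]+)% time: (?P<time>[\d.]+)s$"
)


def parse_monitor_line(line):
    """Return a dict for one monitor line, or None if the line does not match."""
    m = MONITOR_LINE.match(line.strip())
    if m is None:
        return None
    record = {"stamp": m.group("stamp"), "thread": m.group("thread")}
    tail = TAIL.search(m.group("body"))
    if tail is None:
        # For example "SIGT: no monitoring data available".
        record["data"] = False
        return record
    record["data"] = True
    # Convert the numeric counters to floats, keyed by their regex group names.
    record.update({key: float(value) for key, value in tail.groupdict().items()})
    return record
```

Feeding every line of saved qping output through parse_monitor_line and keeping the records where data is true gives one row per thread per interval, which makes it easy to spot, for example, the MT(1) and MT(2) samples above where idle drops to about 82% and wait rises to about 15%.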


I'd like to start this page as a potential resource for documenting:

  1. Various ways of using 'qping' for diagnostic or profiling work
  2. Details on how to understand the output of 'qping'
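
As a small start on item 2, the header fields in the sample output can be cross-checked with a few lines of Python. The sketch below assumes the timestamps are in month/day/year order (which is consistent with the sample, where the run time is about 17 hours) and that each entry of the info line has the form "NAME: LETTER (number)". The helper names elapsed_seconds and parse_info are hypothetical, and the sketch does not interpret what the letter or the number means.

```python
import re
from datetime import datetime

# Timestamp layout used in the sample output (assumed month/day/year).
STAMP_FORMAT = "%m/%d/%Y %H:%M:%S"

# One entry of the "info:" line, e.g. "MT(1): R (0.01)".
INFO_ENTRY = re.compile(
    r"(?P<thread>[A-Z]+(?:\(\d+\))?): (?P<state>[A-Z]) \((?P<value>[\d.]+)\)"
)


def elapsed_seconds(sample_stamp, start_stamp):
    """Whole seconds between the sample timestamp and the 'start time' field."""
    sample = datetime.strptime(sample_stamp, STAMP_FORMAT)
    start = datetime.strptime(start_stamp, STAMP_FORMAT)
    return int((sample - start).total_seconds())


def parse_info(info):
    """Map each thread name on the info line to its (letter, number) pair."""
    return {
        m.group("thread"): (m.group("state"), float(m.group("value")))
        for m in INFO_ENTRY.finditer(info)
    }


# First sample block on this page: the result equals the "run time [s]" field.
print(elapsed_seconds("02/08/2006 07:40:35", "02/07/2006 14:37:53"))  # 61362
```

In both sample blocks the difference between the block's timestamp and the start time matches the reported run time (61362 and 61372 seconds), so run time appears to be simply the qmaster uptime at the moment of the sample.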

See also