Using Ganglia As Load Sensor

From GridWiki

Many Grid Engine sites already run the Ganglia cluster monitoring system on their cluster or grid. The script at the bottom of this page is a load sensor that converts the XML output of Ganglia's gmond daemon into the format Grid Engine expects from a load sensor.
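The format in question is a simple line protocol: sge_execd writes an empty line to the sensor's stdin whenever it wants a report and "quit" when the sensor should exit, and the sensor answers each request with "begin", one "host:name:value" line per value, and "end". The sketch below shows only that protocol; the host name node01 and the metric demo_metric are hypothetical placeholders, whereas the real script takes its values from gmond.

```sh
# Minimal sketch of the Grid Engine load sensor protocol.
# HOST and demo_metric are hypothetical placeholders.
HOST=node01

load_sensor_loop() {
  # sge_execd writes one line per request: an empty line asks for a
  # report, "quit" asks the sensor to exit (EOF also ends the loop)
  while read -r input; do
    [ "$input" = "quit" ] && break
    echo "begin"
    # one "host:name:value" line per reported value
    echo "$HOST:demo_metric:42"
    echo "end"
  done
}
```

Running printf '\nquit\n' | load_sensor_loop prints a single begin/value/end report and then returns.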

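Most of the load values reported by gmond duplicate, under different names, values that Grid Engine already collects (gmond's load_one, for example, corresponds to Grid Engine's load_short). The main advantage is Ganglia's extensibility: any metric published with its gmetric program appears in gmond's XML and can therefore be passed on to Grid Engine by this load sensor.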
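To see what the conversion does, the function below applies sed expressions modeled on the ones in the script at the bottom of the page to a saved gmond XML dump instead of a live connection. It is a sketch under stated assumptions: the function name gmond_xml_to_load, the host node01 and its values, and the custom metric scratch_free (standing in for something published with gmetric) are hypothetical. The anchored host match (so node01 does not also match node010) and the final filter that discards leftover tags (newer gmond releases may nest EXTRA_DATA elements inside each METRIC) are additions of this sketch.

```sh
# Sketch only: convert a gmond XML dump (on stdin) into load sensor lines
# for the host named in $1.
#  1. keep only the <HOST NAME="host..."> ... </HOST> block
#  2. turn <METRIC NAME="x" VAL="v" TYPE=...> into host:x:v
#  3. drop the HOST lines and anything else that is not host:name:value
# '!d' is single-quoted so interactive bash does not history-expand it.
gmond_xml_to_load() {
  host=$1
  sed \
    -e "/<HOST NAME=\"${host}[.\"]/,/HOST>/"'!d' \
    -e "s/^.*<METRIC NAME=\"/${host}:/" \
    -e '/HOST/d' \
    -e 's/" VAL="/:/' \
    -e 's/" TYPE.*>//' \
    -e "/^${host}:/"'!d'
}
```

A metric added with gmetric comes out exactly like a built-in one, so it only needs a matching complex to become requestable.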

  • Save the script at the bottom of the page to a directory that your execution hosts can access, and make it executable.
  • Configure it as the load sensor (to use it on all hosts, use global in place of node01 below):
qconf -mconf node01
 load_sensor    /path/to/
  • Add an item to the complexes for each value you want Grid Engine to use. The complex definition below, for example, reports gmond's os_release metric.
qconf -mc
 #name               shortcut      type        relop requestable consumable default  urgency
 os_release          os_release    CSTRING     ==    YES         NO         NONE     0
  • Test that the value is reported:
qhost -F os_release
 node01                  lx24-amd64      2  2.02    2.0M  254.7M    2.0M  144.0K
    Host Resource(s):      hl:os_release=2.6.9-11.ELsmp
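If qhost shows nothing, it can help to run the script by hand and check the shape of its output. The awk-based function below is a rough sketch of such a check; the name check_load_report is hypothetical, and it only verifies the report structure Grid Engine expects ("begin", "host:name:value" lines, "end"), not the values themselves.

```sh
# Sketch: check that stdin holds one well-formed load report.
# Returns 0 if it does; otherwise prints the problem and returns 1.
check_load_report() {
  awk '
    { line[NR] = $0 }
    END {
      # the report must be framed by "begin" and "end"
      if (NR < 2 || line[1] != "begin" || line[NR] != "end") {
        print "report must start with begin and finish with end"
        exit 1
      }
      # every line in between must look like host:name:value
      for (i = 2; i < NR; i++)
        if (line[i] !~ /^[^:]+:[^:]+:.+$/) {
          print "line " i " is not host:name:value: " line[i]
          exit 1
        }
    }
  '
}
```

For example, with SGE_ROOT set as it is for sge_execd: printf '\nquit\n' | /path/to/ganglia_load_sensor.sh | check_load_report, where ganglia_load_sensor.sh is a placeholder for whatever name you saved the script under.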

Because the added complex is requestable, you can include it in a job submission. For example: qsub -l os_release=2.6.9-11.ELsmp

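The load sensor script:

#!/bin/sh

# (c) 2007 Tim Cera
# GPLv2
#
# Example of a load sensor script that uses gmond information from the
# Ganglia project.
#
# Be careful: load sensor scripts are started with root permissions.
# In an admin_user system euid=0 and uid=admin_user.

# Change to the port configured in /etc/gmond.conf.  Ganglia default is 8649.
ganglia_port=8649

# gmond writes its XML dump to any client that connects to this port.
telnet=`which telnet`

# Host name as Grid Engine knows it; ARCH locates the gethostname utility.
ARCH=`$SGE_ROOT/util/arch`
HOST=`$SGE_ROOT/utilbin/$ARCH/gethostname -name`

end=false
while [ $end = false ]; do

  # ----------------------------------------
  # wait for an input: sge_execd sends an empty line to request a
  # report and "quit" to stop the load sensor
  read input
  result=$?
  if [ $result != 0 ]; then
    end=true
    break
  fi

  if [ "$input" = "quit" ]; then
    end=true
    break
  fi

  # ----------------------------------------
  # send mark for begin of load report
  echo "begin"

  # Keep only this host's <HOST> block (anchored so node01 does not also
  # match node010) and turn each line like
  #   <METRIC NAME="load_one" VAL="0.10" TYPE="float" ...>
  # into "node01:load_one:0.10".  The last expression drops anything else
  # left in the block (closing tags, EXTRA_DATA elements).
  # Add whatever changes you need to coordinate with Grid Engine just
  # before that last expression, for example
  #             -e 's/load_one/load_short/g' \
  # OR you can add the complex 'load_one'.
  xml2load=`$telnet localhost $ganglia_port 2> /dev/null | \
            sed \
                -e "/<HOST NAME=\"${HOST}[.\"]/,/HOST>/!d" \
                -e "s/^.*<METRIC NAME=\"/${HOST}:/g" \
                -e '/HOST/ d' \
                -e 's/" VAL="/:/g' \
                -e 's/" TYPE.*>//g' \
                -e "/^${HOST}:/!d"`

  # ----------------------------------------
  # send load information, one "host:name:value" line per metric
  # (printed as a whole so values containing spaces stay intact)
  if [ -n "$xml2load" ]; then
    printf '%s\n' "$xml2load"
  fi

  # ----------------------------------------
  # send mark for end of load report
  echo "end"
done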
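The comment in the script suggests renaming metrics with extra sed expressions such as s/load_one/load_short/g. An unanchored pattern like that would also rewrite any metric whose name merely contains load_one, so the sketch below matches the whole name field instead. The function name rename_for_sge is hypothetical, and mapping gmond's 1, 5 and 15 minute load averages onto Grid Engine's load_short, load_medium and load_long is only one possible choice; the alternative, as the script notes, is to add complexes named after the gmond metrics.

```sh
# Sketch: rename gmond load averages to Grid Engine's names in
# "host:name:value" lines read from stdin.  Matching ":name:" exactly
# leaves metrics such as "load_one_avg" untouched.
rename_for_sge() {
  sed \
    -e 's/^\([^:]*\):load_one:/\1:load_short:/' \
    -e 's/^\([^:]*\):load_five:/\1:load_medium:/' \
    -e 's/^\([^:]*\):load_fifteen:/\1:load_long:/'
}
```

Used as a filter, for example by piping the output of the script's sed command through rename_for_sge, it changes only the name field and passes every other line through unchanged.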