Using Ganglia as a Load Sensor
From GridWiki
Many Grid Engine users already run the Ganglia cluster monitoring system on their cluster or grid. The script at the bottom of this page is a load sensor that converts the XML output of Ganglia's gmond daemon into the format Grid Engine expects from a load sensor.
Most of the load values reported by gmond duplicate, under different names, values that Grid Engine already collects. The main advantage is Ganglia's extensibility: any custom metric you publish with its gmetric program is reported by gmond and therefore also passed on to Grid Engine by this load sensor.
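For example, here is a minimal sketch, assuming gmetric is installed on the same host as gmond. It publishes the free space of /tmp as a custom metric. Because the load sensor forwards every metric gmond reports for the host, the value then reaches Grid Engine as well, once a complex with the same name exists. The metric name scratch_free and the scratch_free_mb helper are hypothetical. The gmetric options used are --name, --value, --type and --units.

```shell
#!/bin/sh
# Sketch: publish a custom metric through Ganglia so that the gmond load
# sensor forwards it to Grid Engine. The metric name scratch_free, the /tmp
# path and the scratch_free_mb helper are hypothetical examples.

# Print the free space, in megabytes, of the file system that holds $1.
# "df -Pk" uses the POSIX output format: one header line, then one line
# per file system with the available 1024-byte blocks in the fourth column.
scratch_free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

free_mb=$(scratch_free_mb /tmp)

if command -v gmetric >/dev/null 2>&1; then
    # gmetric sends the value to gmond, which then includes it in its XML
    # dump for this host alongside the built-in metrics.
    gmetric --name scratch_free --value "$free_mb" --type uint32 --units MB
else
    echo "gmetric not found; scratch_free would be $free_mb MB" >&2
fi
```

Run it periodically, for example from cron. Then add a scratch_free complex the same way as os_release, but give it a numeric type such as INT.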
- Save the script at the bottom of the page to a directory accessible to your slave nodes, and make it executable.
- Assign it as the load sensor script (to use it on all nodes, use global in place of node01 below).
qconf -mconf node01
...
load_sensor                  /path/to/gmond_load_sensor.sh
...
- Add an item to the complexes. The complex definition below is an example that reports os_release.
qconf -mc
...
#name       shortcut    type     relop requestable consumable default urgency
#-----------------------------------------------------------------------------
os_release  os_release  CSTRING  ==    YES         NO         NONE    0
...
- Test that the new value is reported:
qhost -F os_release
...
node01 lx24-amd64 2 2.02 2.0M 254.7M 2.0M 144.0K
Host Resource(s): hl:os_release=2.6.9-11.ELsmp
...
Because the new complex is requestable, you can request it when submitting a job, for example:
qsub -l os_release=2.6.9-11.ELsmp myjob.sh
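If qhost shows no value for the new complex, you can drive the load sensor by hand to narrow down the problem. The sketch below is a hypothetical helper, check_load_sensor, that imitates execd. It sends an empty line, which requests one report, and then quit, which is what the script's read loop expects. It then checks that the report starts with begin, ends with end, and contains only host:name:value lines in between.

```shell
#!/bin/sh
# check_load_sensor: hypothetical helper (not part of Grid Engine) that
# drives a load sensor the way the script's read loop expects: an empty
# line asks for one report, "quit" ends the sensor.
check_load_sensor() {
    report=$(printf '\nquit\n' | "$1")
    first=$(printf '%s\n' "$report" | sed -n '1p')
    last=$(printf '%s\n' "$report" | sed -n '$p')
    if [ "$first" != "begin" ] || [ "$last" != "end" ]; then
        echo "FAIL: report is not framed by begin/end" >&2
        return 1
    fi
    # Every line between begin and end should look like host:name:value.
    bad=$(printf '%s\n' "$report" | sed '1d;$d' | grep -c -v '^[^:]*:[^:]*:.')
    if [ "$bad" -ne 0 ]; then
        echo "FAIL: $bad malformed line(s)" >&2
        return 1
    fi
    # Print the reported values without the begin/end markers.
    printf '%s\n' "$report" | sed '1d;$d'
}
```

For example, with SGE_ROOT set (such as after sourcing your cell's common/settings.sh), running check_load_sensor /path/to/gmond_load_sensor.sh prints the reported lines or a FAIL message.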
gmond_load_sensor.sh
#!/bin/sh
#
# (c) 2007 Tim Cera
# GPLv2
# example for a load sensor script that uses gmond information from the ganglia
# project.
#
# Be careful: Load sensor scripts are started with root permissions.
# In an admin_user system euid=0 and uid=admin_user
#
# Change to the port configured in /etc/gmond.conf. Ganglia default is 8649.
ganglia_port=8649
telnet=`which telnet`
PATH=/bin:/usr/bin
ARCH=`$SGE_ROOT/util/arch`
HOST=`$SGE_ROOT/utilbin/$ARCH/gethostname -name`
end=false
while [ $end = false ]; do
# ----------------------------------------
# wait for an input
#
read input
result=$?
if [ $result != 0 ]; then
end=true
break
fi
if [ "$input" = "quit" ]; then
end=true
break
fi
# ----------------------------------------
# send mark for begin of load report
echo "begin"
xml2load=`$telnet localhost $ganglia_port 2> /dev/null | \
sed \
-e "/HOST.*${HOST}/,/HOST>/ ! d" \
-e "s/^.*<METRIC NAME=\"/${HOST}:/g" \
-e '/HOST/ d' \
-e 's/" VAL="/:/g' \
-e 's/" TYPE.*>//g' \
`
# To match Grid Engine's naming, add further sed expressions to the
# command above, for example:
#   -e 's/load_one/load_short/g' \
# or define a complex named 'load_one' instead.
# ----------------------------------------
# send load information, one host:name:value per line
# (printf keeps values that contain spaces on a single line)
[ -n "$xml2load" ] && printf '%s\n' "$xml2load"
# ----------------------------------------
# send mark for end of load report
echo "end"
done
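The sed pipeline does the actual conversion. You can check it without a running gmond by feeding the same expressions a hand-written XML fragment. This is a minimal sketch that assumes Ganglia 3.0-style output, where every METRIC element is a single self-closing line. The fragment, the host names and the gmond_xml_to_load helper are hypothetical. Newer Ganglia releases may write METRIC elements over several lines (with EXTRA_DATA children), and these expressions would pass such lines through unconverted.

```shell
#!/bin/sh
# Offline check of the XML-to-load conversion used in gmond_load_sensor.sh.
# The sample XML is hand-written (one self-closing METRIC element per line);
# gmond_xml_to_load is a hypothetical helper name.

# Convert gmond XML on stdin into host:metric:value lines for host $1,
# using the same sed expressions as the load sensor script.
gmond_xml_to_load() {
    sed \
        -e "/HOST.*${1}/,/HOST>/ ! d" \
        -e "s/^.*<METRIC NAME=\"/${1}:/g" \
        -e '/HOST/ d' \
        -e 's/" VAL="/:/g' \
        -e 's/" TYPE.*>//g'
}

# A trimmed-down gmond dump with two hosts.
sample_xml() {
    cat <<'EOF'
<GANGLIA_XML VERSION="3.0.7" SOURCE="gmond">
<CLUSTER NAME="mycluster" LOCALTIME="1180000000" OWNER="unspecified" LATLONG="unspecified" URL="unspecified">
<HOST NAME="node01" IP="10.0.0.1" REPORTED="1180000000" TN="5" TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1179990000">
<METRIC NAME="load_one" VAL="2.02" TYPE="float" UNITS=" " TN="10" TMAX="70" DMAX="0" SLOPE="both" SOURCE="gmond"/>
<METRIC NAME="os_release" VAL="2.6.9-11.ELsmp" TYPE="string" UNITS="" TN="10" TMAX="1200" DMAX="0" SLOPE="zero" SOURCE="gmond"/>
</HOST>
<HOST NAME="node02" IP="10.0.0.2" REPORTED="1180000000" TN="5" TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1179990000">
<METRIC NAME="load_one" VAL="0.50" TYPE="float" UNITS=" " TN="10" TMAX="70" DMAX="0" SLOPE="both" SOURCE="gmond"/>
</HOST>
</CLUSTER>
</GANGLIA_XML>
EOF
}

# prints:
# node01:load_one:2.02
# node01:os_release:2.6.9-11.ELsmp
sample_xml | gmond_xml_to_load node01
```

The range address HOST.*node01 is a substring match. On a cluster that also has a host named node010, that host's metrics would therefore be reported under node01's name as well.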