Integrating PEST and Grid Engine
From GridWiki
Parameter ESTimation (PEST) is an open source program that will find, if possible, the optimum set of parameters for any simulation system. It works from templates of the model's input files, varies the parameters marked in those templates, and compares the output from the simulation against observations. It has a wide range of capabilities and features, most beyond my understanding. The optimization requires many model runs, and because many of those runs are independent of each other, it lends itself quite nicely to running on computational clusters. PEST comes with a parallel mode that uses files on a shared file system to pass messages between parallel PEST (ppest) and the slaves (pslave).
I must admit that I have yet to use PEST for a complete project. In ramping up to use it for several projects, I worked up an integration of parallel PEST and Grid Engine.
More information about PEST is available at http://www.sspa.com/pest. Windows and UNIX/Linux versions are available for download.
I wrote this some time ago, before I discovered array jobs. I tried to rewrite the script to use array jobs, but it was getting really messy, so I kept my original brute-force method of one qsub per slave. If someone else has a go at an array-job version, please post it to the wiki. The script also creates a new Run Management File (*.rmf), so of all the files the PEST manuals say you have to create, the rmf is the one you don't have to worry about.
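To check the Run Management File before handing it to PEST, the layout the script writes can be reproduced by a small function. This is a sketch: the function name write_rmf is hypothetical, and the header values "0 2 1" are simply copied from the script rather than explained here, so see the PEST manual for what each field controls.

```shell
#!/bin/bash
# Sketch (hypothetical helper): print the Run Management File layout that
# rppest builds, on stdout, so it can be inspected before being redirected
# to <case>.rmf.
# Arguments: number of slaves, run-time estimate in seconds.
# The header values "0 2 1" are copied unchanged from the rppest script.
write_rmf() {
    local num_slaves=$1 run_time=$2 count times=""
    echo "prf"
    echo "${num_slaves} 0 2 1"
    # One "name directory" line per slave.
    for (( count = 1; count <= num_slaves; count++ )); do
        echo "slave${count} ./slave${count}"
    done
    # One run-time estimate per slave, space-separated on the final line.
    for (( count = 1; count <= num_slaves; count++ )); do
        times="${times:+$times }${run_time}"
    done
    echo "$times"
}
```

For example, write_rmf 3 600 > pest_case.rmf writes the header, three slave lines, and a final line of "600 600 600", each line terminated by a newline.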
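You should set max_num_slaves and EMAIL to something that makes sense for your system. As written, the max_num_slaves line queries one site's machine architecture (lx24-amd64) and queue names ('*_core').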
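The max_num_slaves line in the script does two things in one pipeline: it totals column 4 of the "qstat -g c" cluster-queue summary, then keeps 90% of that total less one slot. Which quantity column 4 holds depends on your Grid Engine version, so check the header on your own system. A sketch, with hypothetical function names, that separates the arithmetic from the cap and also rejects an empty maximum (which is what the script's comparison receives if qstat fails):

```shell
#!/bin/bash
# Hypothetical helpers that split up the max_num_slaves logic in rppest.

# Read "qstat -g c" style text on stdin, skip the two header lines, sum
# column 4 over the remaining rows, and print 90% of that sum minus one,
# truncated to an integer (the same awk program rppest uses).
max_slaves_from_summary() {
    awk 'NR > 2 {sum = sum + $4} END {print int(sum - 1 - 0.1*sum)}'
}

# Print the requested slave count, capped at the maximum. Return 1 if either
# value is not a non-negative integer, e.g. an empty maximum because qstat
# failed, or a negative one because no slots were available.
clamp_slaves() {
    local requested=$1 max=$2 value
    for value in "$requested" "$max"; do
        case "$value" in
            '' | *[!0-9]*)
                echo "clamp_slaves: '${value}' is not a slot count" >&2
                return 1
                ;;
        esac
    done
    if [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}
```

With your site's -l and -q filters added to qstat, usage would look like max=$(qstat -g c | max_slaves_from_summary) followed by num_slaves=$(clamp_slaves "$1" "$max") || exit 1. A column-4 total of 40 gives int(40 - 1 - 4) = 35.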
rppest script:
#!/bin/bash
# GridEngine script to run parallel PEST
# (bash rather than sh: the script uses bash-only "for (( ... ))" loops)
# $1 is number of slaves to run
# $2 is name of PEST case (.pst)
# $3 is run_time estimate
# $4 is the model executable; $5 onward are its arguments
# Adjust the following to match your needs
WHOAMI=`whoami`
EMAIL=${WHOAMI}@sjrwmd.com
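# Site-specific: total of column 4 of the cluster-queue summary for the
# lx24-amd64 '*_core' queues, less 10% and less one slot.
max_num_slaves=`qstat -g c -l arch=lx24-amd64 -q '*_core' | awk 'NR > 2 {sum = sum + $4} END {print int(sum - 1 - 0.1*sum)}'`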
# Shouldn't have to change anything beyond this line. Probably.
cmd_name=`basename $0`
if [ $# -le 3 ]; then
echo "$cmd_name requires at least 4 arguments
Example:
$cmd_name 3 pest_case.pst run_time model [model_arguments]
3 = number of slaves to use
pest_case.pst = name of PEST .pst file (no spaces)
run_time = overestimate of model run time in seconds
model = model executable
model_arguments = optional arguments to model executable"
exit 1
fi
num_slaves=$1
test_case=$2
run_time=$3
shift 3
# The following lets the pest_case.pst file be entered with or without
# the pst extension: strip .pst or .PST from test_case.
test_case=`basename "${test_case}" .pst`
test_case=`basename "${test_case}" .PST`
# Check to make sure the .pst or .PST file exists (fail only if neither does)
if [ ! -r "${test_case}.pst" ] && [ ! -r "${test_case}.PST" ]; then
echo "${test_case}.pst or ${test_case}.PST not found."
exit 1
fi
# Check to make sure not asking for too many slaves
if [ $num_slaves -gt $max_num_slaves ]; then
num_slaves=$max_num_slaves
echo "Number of slaves set to maximum allowable = ${max_num_slaves}."
fi
# Use the full pathname of the model executable, because the slaves run it
# from inside their own slave directories.
program=`which $1`
# Add in the remainder of the arguments.
shift
program="${program} $*"
# runfile holds the model command line (executable or script plus its
# arguments); each pslave reads it on standard input. ">" overwrites any
# old runfile.
echo "$program" > runfile
# This removes the old Run Management File.
if [ -f "${test_case}.rmf" ]; then
rm "${test_case}.rmf"
fi
# This creates the header for the new Run Management File (rmf).
cat << EOT > "${test_case}.rmf"
prf
$num_slaves 0 2 1
EOT
# This creates the line for each slave in the new Run Management File (rmf),
# and a fresh, empty working directory for each slave.
for (( count=1; count <= num_slaves; count++ ))
do
echo "slave${count} ./slave${count}" >> "${test_case}.rmf"
if [ -d "./slave${count}" ]; then
rm -r "./slave${count}"
fi
mkdir "slave${count}"
done
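# Last line in the Run Management File (rmf) is an estimate of run times,
# one per slave.
for (( count=1; count <= num_slaves; count++ ))
do
echo -n "${run_time} " >> "${test_case}.rmf"
done
# Terminate that last line in the rmf file, not on the terminal.
echo "" >> "${test_case}.rmf"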
# Run the slaves first.
present_dir=`pwd`
slave_qid=''
pslave_exe=`which pslave`
for (( count=1; count <= num_slaves; count++ ))
do
cd "slave${count}"
# Can't use < redirection here: the shell would feed the file to qsub itself,
# not to the job, so use qsub's '-i' option to give pslave its standard input.
# Collect the job-id in the testr variable
testr=`qsub -cwd -j y -V -o pslave.out -i ../runfile -b y ${pslave_exe} | awk '{print $3}'`
# Collect all of the job-ids in slave_qid
slave_qid="${slave_qid} ${testr}"
cd "$present_dir"
done
# Run parallel PEST with -sync y, so qsub blocks until ppest finishes, then
# delete all of the slaves from the queue. The pair runs in the background.
# Still to do: decide whether to restart the optimization or not.
if [ -f 'ppest.out' ]; then
rm -f ppest.out
fi
(qsub -cwd -j y -M ${EMAIL} -V -m eas -o ppest.out -sync y -b y `which ppest` $test_case; qdel ${slave_qid}) &
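The slave loop relies on the job ID being the third word of qsub's usual "Your job ... has been submitted" message. That also holds for the array-job message, which matters if anyone takes up the array-job rewrite, but there the third word carries a task range as well. A sketch of a parser (the name job_id_from_qsub is hypothetical) that handles both forms and refuses anything that does not look like a job ID, so a failed submission is not silently added to the qdel list:

```shell
#!/bin/bash
# Hypothetical helper: read one qsub confirmation line on stdin and print
# just the numeric job ID. The third word is the ID for plain jobs
# ("Your job 4711 (...) has been submitted") and, for array jobs, the ID
# plus a task range ("Your job-array 4711.1-8:1 (...) ..."), so anything
# from the first "." on is dropped. Returns 1 if no ID can be found.
job_id_from_qsub() {
    local line id
    IFS= read -r line
    id=$(printf '%s\n' "$line" | awk '{print $3}')
    id=${id%%.*}
    case "$id" in
        '' | *[!0-9]*)
            echo "job_id_from_qsub: unexpected qsub output: ${line}" >&2
            return 1
            ;;
    esac
    echo "$id"
}
```

Inside the slave loop this would replace the awk call, e.g. testr=`qsub ... ${pslave_exe} | job_id_from_qsub` || exit 1. If your qsub supports -terse, which prints only the job ID, the parsing step can be dropped altogether.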
