Integrating BOINC and Grid Engine


BOINC

BOINC (Berkeley Open Infrastructure for Network Computing) is a framework for loosely coupled, volunteer grid computing: idle time on your machines can be put to work on any of many established projects. The BOINC client is available for many platforms, though some projects only run on specific platforms.

Integration with Grid Engine

1. Create a top-level directory with a sub-directory for each queue instance (i.e. one per node); a loop sketch for larger clusters follows the commands.

setenv BOINC_INSTALL_DIR /mnt/boinc
mkdir ${BOINC_INSTALL_DIR}
mkdir ${BOINC_INSTALL_DIR}/boinc@node01
mkdir ${BOINC_INSTALL_DIR}/boinc@node02
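
With more than a couple of nodes it may be easier to create the sub-directories in a small sh loop; this is just a sketch, with placeholder node names:

#!/bin/sh
# Sketch: create one sub-directory per node (node names are placeholders).
BOINC_INSTALL_DIR=/mnt/boinc
for node in node01 node02; do
    mkdir -p ${BOINC_INSTALL_DIR}/boinc@${node}
done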

2. Download the BOINC client.

cd ${BOINC_INSTALL_DIR}
# Get the latest for your platform
wget http://boincdl.ssl.berkeley.edu/dl/boinc_5.10.21_i686-pc-linux-gnu.sh  
# Run whatever you downloaded; it creates a sub-directory named BOINC
sh boinc_5.10.21_i686-pc-linux-gnu.sh

3. Test the client and edit the 'binstall.sh' file.

cd ${BOINC_INSTALL_DIR}/BOINC
# make sure that 'boinc' works - I can't get any of the 64-bit versions 
# working because of mismatched libraries
./run_client
# quit out of the client
# remove the leading 'cd BOINC && ' from 'binstall.sh' (by hand, or with the sed one-liner below)
vi binstall.sh
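
If you prefer not to open an editor, a sed one-liner should do the same thing, assuming the line in 'binstall.sh' really does begin with 'cd BOINC && ':

# GNU sed; edits binstall.sh in place
sed -i 's/^cd BOINC && //' binstall.sh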

4. Copy contents of BOINC into each node sub-directory.

cp -r ${BOINC_INSTALL_DIR}/BOINC/* ${BOINC_INSTALL_DIR}/boinc@node01
cp -r ${BOINC_INSTALL_DIR}/BOINC/* ${BOINC_INSTALL_DIR}/boinc@node02

5. Change into each node sub-directory and run 'binstall.sh' (a combined loop for steps 4 and 5 is sketched below).

cd ${BOINC_INSTALL_DIR}/boinc@node01; sh binstall.sh
cd ${BOINC_INSTALL_DIR}/boinc@node02; sh binstall.sh
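
For larger clusters, steps 4 and 5 can be combined into one sh loop; again, just a sketch with placeholder node names:

#!/bin/sh
# Sketch: copy the BOINC tree and run the installer for each node.
BOINC_INSTALL_DIR=/mnt/boinc
for node in node01 node02; do
    node_path=${BOINC_INSTALL_DIR}/boinc@${node}
    cp -r ${BOINC_INSTALL_DIR}/BOINC/* ${node_path}
    ( cd ${node_path} && sh binstall.sh )
done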

6. Create a BOINC account. I strongly suggest using one of the BOINC account managers, such as GridRepublic or BAM!, which allow centralized control of the hosts you add to BOINC and the projects they work on. Alternatively, you could create an account at the web site of each project you want to support.

7. It would be nice to automate this step, but I couldn't figure out a way to do it (a possible command-line alternative is sketched after the two per-node examples). Log in to each node, go to that node's directory, and run the client and manager to log in to the account you created.

ssh -X node01
cd ${BOINC_INSTALL_DIR}/boinc@node01
./run_client
./run_manager
# In the BOINC Manager, login to the account created above and change any preferences.
# Verify that the node is working on the BOINC projects that you signed up for.
# IMPORTANT! Use the BOINC Manager to force the client to quit.
# Quit out of the manager
# logout
exit

ssh -X node02
cd ${BOINC_INSTALL_DIR}/boinc@node02
./run_client
./run_manager
# In the BOINC Manager, login to the account created above and change any preferences.
# Verify that the node is working on the BOINC projects that you signed up for.
# IMPORTANT! Use the BOINC Manager to force the client to quit.
# Quit out of the manager
# logout
exit
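
If you want to try scripting this step instead, the command-line control program may help: 'boinc_cmd' (the same tool used in the suspend/resume script further down) has a --project_attach option that attaches a running client to a project, given the project URL and your account key. I haven't verified this approach, so treat it as a sketch; the URL and key below are placeholders.

#!/bin/sh
# Untested sketch: attach an already-running client to a project from the
# command line. PROJECT_URL and ACCOUNT_KEY are placeholders - the key comes
# from the project (or account manager) web site.
BOINC_INSTALL_DIR=/mnt/boinc
PROJECT_URL=http://example-project.example.org/
ACCOUNT_KEY=your_account_key_here

cd ${BOINC_INSTALL_DIR}/boinc@`hostname -s`
./boinc_cmd --project_attach ${PROJECT_URL} ${ACCOUNT_KEY}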

8. Create a BOINC cluster queue (1 slot per node; the BOINC client automatically uses all processors by default) and add the nodes above to the queue. The corresponding 'slots' setting is shown after the excerpt.

qconf -aq boinc
# Replace 'NONE' on the 'hostlist' line with the nodes (or host groups) you want to run BOINC
 ...
 hostlist    node01 node02
 ...
# Change rerun to true.
 ...
 rerun       TRUE
 ...
# Might not be necessary, but I also removed the INTERACTIVE flag.
 ...
 qtype       BATCH
 ...
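
The '1 slot per node' part is the queue's slots attribute; in the same qconf session it should look something like this (1 should already be the default for a newly added queue):

 ...
 slots       1
 ...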

9. Subordinate the BOINC cluster queue to the real work queues that you have, so that suspension is triggered when one or more jobs are running in any real work queue.

qconf -mq dual_core
# Add the boinc queue to the subordinate_list of each real work queue ('dual_core' is one of ours).
 ...
 subordinate_list      boinc=1
 ...

10. Run the jobs. I wrote a little script to handle this, but you should only have to do this once, so it's up to you how to do it.

#!/bin/sh

BOINC_INSTALL_DIR=/mnt/boinc

for node in \
   node01 \
   node02 \
   ;
do
    node_path=${BOINC_INSTALL_DIR}/boinc@${node}
    cd ${node_path}
    qsub -q boinc@${node} -cwd -N ${node} /sjr/beodata/local/bin/vanilla_job.sh ${node_path}/run_client
done

10A. Alternate way to run the jobs. We regularly add or remove nodes from our cluster, and add or remove nodes from the BOINC queue. To make life easier, I submit the BOINC tasks as an array job using the following scripts. That way, Grid Engine takes care of launching one instance of BOINC per available node, and you don't need to maintain a list of nodes in the launching script. A usage example follows the two scripts.

The start_boinc.sh script launches the array job.

#!/bin/bash

BOINC_DIR=/usr/global/boinc
# Number of BOINC instances to launch: the slot count 'qstat -g c' reports for boinc.q
BOINC_SLOTS=`qstat -g c -q boinc.q | tail -1 | sed -e "s/ [ ]*/ /g" | cut --delim=\  -f 5`

qsub -q boinc.q -t 1-${BOINC_SLOTS} -o boinc -j y -V -cwd -N boinc ${BOINC_DIR}/run_boinc.sh

And the run_boinc.sh script launches an individual BOINC task.

#!/bin/bash

BOINC_DIR=/usr/global/boinc
SHORT_HOST=`hostname -s`
NODE_PATH=${BOINC_DIR}/boinc@${SHORT_HOST}

echo HOST=${SHORT_HOST}
echo NODE_PATH=${NODE_PATH}

cd ${NODE_PATH}
./run_client
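
Assuming both scripts live in /usr/global/boinc, (re)launching BOINC across whatever nodes are currently in boinc.q is then a single command:

cd /usr/global/boinc
sh start_boinc.sh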


11. Make sure that Grid Engine jobs do not use the boinc cluster queue. How to do this is specific to your cluster setup and use.
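
One option (just a sketch - adjust the names to your site) is to put an access list on the queue so that only the account that submits the BOINC jobs may use it:

# 'boincuser' and 'boinc_users' are placeholder names
qconf -au boincuser boinc_users
qconf -mq boinc
 ...
 user_lists   boinc_users
 ...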

12. That should be it. Suspension of the subordinated boinc queue suspends the 'run_client' job. At this point everything 'works': each node is used by BOINC until it is required by a Grid Engine job.

Suspend/Resume?

Subordinate queue suspension might mean that BOINC work is lost, since the 'run_client' job is stopped and restarted. The short script below suspends and resumes (or restarts, as needed) the BOINC 'run_client'. Frankly, this step may not be necessary - I haven't verified whether 'run_client' does the right thing when it is stopped and restarted.

#!/bin/sh

# Suspends and resumes/restarts the BOINC run_client.
# Usage: boinc_controller.sh <host> <suspend|resume>

host=${1}
command=${2}

BOINC_INSTALL_DIR=/mnt/boinc

node_path=${BOINC_INSTALL_DIR}/boinc@${host}

cd ${node_path}

# Collect the master URLs of all projects the local client is attached to.
prj_urls=`${node_path}/boinc_cmd --get_project_status | grep 'master URL' | awk '{print $3}'`

if [ "${command}" = "resume" ]; then
   # If the client is already running, this just errors out.
   ${node_path}/run_client
fi

for url in ${prj_urls};
do
   ${node_path}/boinc_cmd --project ${url} ${command}
done

Then modify the boinc queue to use the script above (saved here as /mnt/boinc/boinc_controller.sh) as its suspend and resume methods; a quick by-hand test is shown after the excerpt.

qconf -mq boinc
 ...
 suspend_method        /mnt/boinc/boinc_controller.sh $host suspend
 resume_method         /mnt/boinc/boinc_controller.sh $host resume
 ...
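
Before relying on the queue hooks, the script can be exercised by hand; log in to one of the nodes and run it with that node's name (a placeholder here):

sh /mnt/boinc/boinc_controller.sh node01 suspend
sh /mnt/boinc/boinc_controller.sh node01 resume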