Integrating BOINC and Grid Engine

From GridWiki

BOINC

BOINC (Berkeley Open Infrastructure for Network Computing) is a framework for uncoupled grid computing. It lets otherwise idle CPU time be donated to many established projects. The BOINC client is available for many platforms, though some projects only work on specific platforms.

Integration with Grid Engine

1. Create a top-level directory with a sub-directory for each queue instance (one per node).

setenv BOINC_INSTALL_DIR /mnt/boinc
mkdir ${BOINC_INSTALL_DIR}
mkdir ${BOINC_INSTALL_DIR}/boinc@node01
mkdir ${BOINC_INSTALL_DIR}/boinc@node02

2. Download the BOINC client.

cd ${BOINC_INSTALL_DIR}
# Get the latest for your platform
wget http://boincdl.ssl.berkeley.edu/dl/boinc_5.10.21_i686-pc-linux-gnu.sh
# Run whatever you downloaded; it unpacks into a sub-directory named BOINC
sh boinc_5.10.21_i686-pc-linux-gnu.sh

3. Edit the 'binstall.sh' file.

cd ${BOINC_INSTALL_DIR}/BOINC
# make sure that 'boinc' works - I can't get any of the 64-bit versions 
# working because of mismatched libraries
./run_client
# quit out of the client
# remove the leading 'cd BOINC && ' from 'binstall.sh'
vi binstall.sh
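
The manual edit in 'vi' can probably be replaced with a single 'sed' substitution. The following is a minimal sketch, assuming the prefix appears at the very start of a line exactly as 'cd BOINC && '. The function name 'strip_cd_prefix' is made up for illustration, and a '.bak' copy is kept in case the file looks different in your BOINC version.

```shell
#!/bin/sh
# Hypothetical helper: remove a leading 'cd BOINC && ' from each line of a script.
# Assumes the prefix sits at the very start of the line.
strip_cd_prefix() {
    # -i.bak edits the file in place and keeps the original as <file>.bak
    sed -i.bak 's/^cd BOINC && //' "$1"
}

# Usage (run inside ${BOINC_INSTALL_DIR}/BOINC):
#   strip_cd_prefix binstall.sh
```

Compare 'binstall.sh' with 'binstall.sh.bak' afterwards to confirm that only the prefix was removed.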

4. Copy contents of BOINC into each node sub-directory.

cp -r ${BOINC_INSTALL_DIR}/BOINC/* ${BOINC_INSTALL_DIR}/boinc@node01
cp -r ${BOINC_INSTALL_DIR}/BOINC/* ${BOINC_INSTALL_DIR}/boinc@node02

5. Change directory and then run 'binstall.sh' in each node sub-directory.

cd ${BOINC_INSTALL_DIR}/boinc@node01; sh binstall.sh
cd ${BOINC_INSTALL_DIR}/boinc@node02; sh binstall.sh
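
With more than a couple of nodes, steps 1, 4 and 5 get repetitive. The loop below is a rough sketch of folding them together, assuming a Bourne-style shell rather than the csh 'setenv' used in step 1. The function name 'setup_node_dirs' and its argument order are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: create boinc@<node> under the install dir, copy the
# unpacked BOINC tree into it, and run binstall.sh there.
# Usage: setup_node_dirs <install_dir> <node> [<node> ...]
setup_node_dirs() {
    install_dir=$1
    shift
    for node in "$@"; do
        node_path="${install_dir}/boinc@${node}"
        mkdir -p "${node_path}" || return 1
        # Trailing '/.' copies the directory contents, including dot files
        cp -r "${install_dir}/BOINC/." "${node_path}" || return 1
        # Subshell so the caller's working directory is unchanged
        ( cd "${node_path}" && sh binstall.sh ) || return 1
    done
}

# Example:
#   setup_node_dirs /mnt/boinc node01 node02
```

The function stops at the first node that fails, so a partly configured node is easy to spot.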

6. Create a BOINC account. I strongly suggest one of the BOINC account managers such as GridRepublic or BAM! which allow for centralized control of the hosts you add to BOINC and the projects that they work on. Alternatively, you could create an account at each project's web site that you want to support.

7. Log in to each node, go to that node's directory, and run the client and manager to log in to the account you created. It would be nice to automate this step, but I couldn't figure out a way to do it.

ssh -X node01
cd ${BOINC_INSTALL_DIR}/boinc@node01
./run_client
./run_manager
# In the BOINC Manager, log in to the account created above and change any preferences.
# Verify that the node is working on the BOINC projects that you signed up for.
# IMPORTANT! Use the BOINC Manager to force the client to quit.
# Quit out of the manager
# logout
exit

ssh -X node02
cd ${BOINC_INSTALL_DIR}/boinc@node02
./run_client
./run_manager
# In the BOINC Manager, log in to the account created above and change any preferences.
# Verify that the node is working on the BOINC projects that you signed up for.
# IMPORTANT! Use the BOINC Manager to force the client to quit.
# Quit out of the manager
# logout
exit

8. Create a BOINC cluster queue (1 slot per node, since BOINC uses all processors by default) and add the nodes above to the queue.

qconf -aq boinc
# Replace 'NONE' on the 'hostlist' line with the nodes (or host groups) you want to run BOINC
 ...
 hostlist    node01 node02
 ...
# Change rerun to true.
 ...
 rerun       TRUE
 ...
# Might not be necessary, but I also removed the INTERACTIVE flag.
 ...
 qtype       BATCH
 ...
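
After saving the queue, it may be worth confirming that the three edits actually took. The sketch below reads queue configuration text on stdin and checks 'hostlist', 'rerun' and 'qtype'. It assumes the 'attribute value' layout shown in the snippets above; the function name 'check_boinc_queue' is hypothetical, and real output with wrapped lines or per-host overrides may need a looser match.

```shell
#!/bin/sh
# Hypothetical helper: read queue configuration text on stdin (in the
# 'attribute   value' layout shown above) and check three settings.
# Prints one line per problem and returns non-zero if any check fails.
check_boinc_queue() {
    awk '
        $1 == "hostlist" && $2 != "NONE"           { hosts_ok = 1 }
        $1 == "rerun"    && $2 == "TRUE"           { rerun_ok = 1 }
        $1 == "qtype"    && $2 == "BATCH" && NF == 2 { qtype_ok = 1 }
        END {
            if (!hosts_ok) print "hostlist is missing or still NONE"
            if (!rerun_ok) print "rerun is not TRUE"
            if (!qtype_ok) print "qtype is not BATCH only"
            exit !(hosts_ok && rerun_ok && qtype_ok)
        }
    '
}

# Usage on a Grid Engine host:
#   qconf -sq boinc | check_boinc_queue && echo "boinc queue looks right"
```

Since removing the INTERACTIVE flag might not be necessary, drop the 'qtype' check if you left that flag in place.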

9. Subordinate the BOINC cluster queue to each of your real work queues, so that the BOINC queue is suspended as soon as there are 1 or more jobs in a real work queue.

qconf -mq dual_core
# Suspend the boinc queue once this queue has 1 or more jobs
 ...
 subordinate_list      boinc=1
 ...

10. Submit the BOINC jobs. I wrote a little script to handle this, but you should only have to do it once, so how you do it is up to you.

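#!/bin/sh

# Submit one long-running BOINC client job to each node's queue instance

BOINC_INSTALL_DIR=/mnt/boinc

for node in \
   node01 \
   node02 \
   ;
do
 node_path=${BOINC_INSTALL_DIR}/boinc@${node}
 # Skip the node if its directory is missing
 cd "${node_path}" || continue
 # The vanilla_job.sh path is site-specific; adjust it for your cluster
 qsub -q "boinc@${node}" -cwd -N "${node}" /sjr/beodata/local/bin/vanilla_job.sh "${node_path}/run_client"
done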

11. Make sure that Grid Engine jobs do not use the boinc cluster queue. How to do this is specific to your cluster setup and use.

12. That should be it. Suspending the subordinated boinc queue suspends the 'run_client' job. At this point everything 'works': each node is used by BOINC until it is required by a Grid Engine job.

Suspend/Resume?

The subordinate queue suspension might mean that BOINC work is lost, since the 'run_client' job is stopped and restarted. The short script below suspends and resumes (or restarts as needed) the BOINC 'run_client'. Frankly, this step may not be necessary: I haven't verified whether 'run_client' does the right thing when stopped and restarted.

#!/bin/sh

# suspends and resumes/restarts BOINC run_client
# usage: boinc_controller.sh <host> <suspend|resume>

host=${1}
command=${2}

BOINC_INSTALL_DIR=/mnt/boinc

node_path=${BOINC_INSTALL_DIR}/boinc@${host}

cd "${node_path}" || exit 1

# collect the master URL of every project attached to this client
prj_urls=`"${node_path}/boinc_cmd" --get_project_status | grep 'master URL' | awk '{print $3}'`

if [ "${command}" = 'resume' ]; then
   # if client already running this just errors out.
   "${node_path}/run_client"
fi

# apply the suspend/resume command to each project
for url in ${prj_urls};
do
   "${node_path}/boinc_cmd" --project "${url}" "${command}"
done

Save the script above as /mnt/boinc/boinc_controller.sh, make it executable, and then modify the queue to use it as the suspend and resume method.

qconf -mq boinc
 ...
 suspend_method        /mnt/boinc/boinc_controller.sh $host suspend
 resume_method         /mnt/boinc/boinc_controller.sh $host resume
 ...
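
The controller script depends on two small pieces: pulling project URLs out of the 'boinc_cmd --get_project_status' output, and being called with either 'suspend' or 'resume'. The sketch below isolates both so they can be tried on captured output without touching a running client. It assumes, as the grep/awk pipeline in the script does, that each project is reported on a line of the form 'master URL: <url>'. The function names 'extract_project_urls' and 'is_valid_action' are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the third field of every 'master URL' line on stdin
extract_project_urls() {
    awk '/master URL/ { print $3 }'
}

# Hypothetical helper: accept only the two actions the queue configuration passes in
is_valid_action() {
    case "$1" in
        suspend|resume) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (from a node directory with a running client):
#   ./boinc_cmd --get_project_status | extract_project_urls
#   is_valid_action "$2" || { echo "usage: $0 <host> <suspend|resume>" >&2; exit 1; }
```

Rejecting anything other than 'suspend' or 'resume' keeps a typo in the queue configuration from being passed straight to 'boinc_cmd --project'.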