Slaton Lipscomb

Slaton's Tips:
Using IMAGIC/MPI On A PBS Linux Cluster

Last updated October 14, 2004
Comments/corrections welcome.

IMAGIC is a product of Image Science Software GmbH in Berlin, Germany.

Three of the most CPU-intensive IMAGIC programs are available for parallel processing: MULTI-REFERENCE-ALIGNMENT, TRUE-THREED-RECONSTRUCTION, and ANGULAR-RECONSTITUTION. These programs use the message passing interface (MPI) software for job distribution over a cluster. The MPI software which IMAGIC uses is the Argonne National Laboratory implementation MPICH.

For the purposes of these instructions, I will create and submit an IMAGIC job on a Linux machine with hostname node01.cluster.edu, which is a member of a Linux cluster. The examples use IMAGIC version 030123, as seen in the path /usr/local/imagic-030123. Wherever these values appear, substitute the hostname of your own machine and the IMAGIC version you are actually running. The IMAGIC version number is always indicated by an empty file named version_xxxxxx in the source directory.

It is assumed that IMAGIC and MPICH are already installed and configured to use the PBS job scheduling system on the cluster. The MPICH package should be configured to use ssh instead of rsh for launching processes. A batchmode ssh key must also be available; instructions for creating an OpenSSH batchmode key are included in this guide.

NOTE:   The MULTI-REFERENCE-ALIGNMENT program is also available for multi-processor machines, in a version that forks child processes (within a "C" interface). This version can handle up to 16 CPUs on one shared-memory machine (the number of processes is specified in the imagic.drv file). I'm unclear as to whether this version is built on OpenMP directives or is implemented entirely by Image Science.

1   If you have not already done so, generate an ssh key with a blank passphrase so that MPICH can use ssh instead of rsh for launching parallel processes, then append the public key to your authorized_keys file. Red Hat Linux comes with OpenSSH, so we need to do this the OpenSSH way.

$ ssh-keygen -q -C BatchModeKey -t rsa -f ~/.ssh/batchmode -N ""
$ cat ~/.ssh/batchmode.pub >> ~/.ssh/authorized_keys

2   You also need to edit your user ssh config file, ~/.ssh/config, and specify that this batchmode key is to be used for connections to the local machine (only). This is done with the Host keyword. Add a section like the following to the TOP of the config file. It must be above the Host * wildcard section, if it exists.

Host node01.cluster.edu node01
  IdentityFile ~/.ssh/batchmode

IMAGIC determines the local machine's hostname according to the output of uname -n, so make sure the same name is used here. In this case node01 is just a short convenience alias.

3   To test the ssh configuration, the following command should give a listing of the user's home directory, without asking for a password or passphrase.

$ ssh node01 ls

If a password is requested, or an error results, the configuration is incorrect. Go through the steps above again.
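When the test fails, two causes worth ruling out first are file permissions that OpenSSH considers too open and a Host line that does not name the machine exactly as uname -n reports it. The following is a minimal sketch of both checks, not part of IMAGIC or MPICH. The function names fix_ssh_perms and host_in_ssh_config are hypothetical, the file names are the ones used in steps 1 and 2, and the Host check compares names literally (it does not evaluate wildcard patterns).

```shell
# Tighten permissions so that OpenSSH will accept the batchmode key files.
# Usage: fix_ssh_perms [directory]      (defaults to ~/.ssh)
fix_ssh_perms() {
    dir="${1:-$HOME/.ssh}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    chmod 700 "$dir"
    # Private key, authorized_keys and config: owner read/write only.
    for f in "$dir/batchmode" "$dir/authorized_keys" "$dir/config"; do
        if [ -f "$f" ]; then chmod 600 "$f"; fi
    done
    # The public key may safely be world-readable.
    if [ -f "$dir/batchmode.pub" ]; then chmod 644 "$dir/batchmode.pub"; fi
}

# Succeed if <hostname> is named literally on a "Host" line of the config file.
# Usage: host_in_ssh_config <hostname> [config-file]
host_in_ssh_config() {
    name="$1"
    cfg="${2:-$HOME/.ssh/config}"
    [ -r "$cfg" ] || return 1
    awk -v h="$name" '
        tolower($1) == "host" { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }
    ' "$cfg"
}
```

After loading these functions into your shell, run fix_ssh_perms with no arguments, then host_in_ssh_config "$(uname -n)". A non-zero exit status from the second command suggests that the Host line in step 2 needs the name printed by uname -n.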

Once a batchmode key has been created and configured, you do not need to repeat this step for future IMAGIC jobs.
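Before submitting later jobs, it can be useful to confirm that the key still works without risking an interactive prompt. The sketch below wraps the test from step 3 in a function. The name check_batch_ssh is hypothetical; BatchMode and ConnectTimeout are standard OpenSSH client options, and the 10-second timeout is an arbitrary choice.

```shell
# Return 0 if a passwordless ssh login to the given host works.
# BatchMode=yes makes ssh fail at once instead of asking for a password.
# Usage: check_batch_ssh <host>
check_batch_ssh() {
    host="$1"
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" true; then
        echo "ssh to $host: OK"
    else
        echo "ssh to $host: FAILED (password required or connection error)" >&2
        return 1
    fi
}
```

For example, check_batch_ssh node01 prints an OK line when the batchmode key is accepted, and returns a non-zero status otherwise.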

1   First, start IMAGIC and give the command:

IMAGIC-COMMAND: mode-accu

IMAGIC will reply with:

IMAGIC-COMMAND (ACC.) : 

2   Now you can use any IMAGIC command(s) you want.

IMPORTANT   When you select the MULTI-REFERENCE-ALIGNMENT, ANGULAR-RECONSTITUTION, or TRUE-THREED-RECONSTRUCTION programs, you will be asked whether you wish to use MPI parallelisation. Be sure to answer yes.

Use MPI parallelisation                  : yes

3   When you are finished, use the command MODE-STOP. IMAGIC will prompt you for a name, and then create a batch file in the current directory, using the specified name with the extension .b.

IMAGIC-COMMAND (ACC.) : mode-stop

Give file name for command file [bigjob] : mrajob

Ordinarily, to run your job from the shell you would now use the command:

$ ./mrajob.b

However, in order to use more than one node on our Linux cluster, we will need to first add some instructions for the PBS job scheduling system.

Our original batchfile mrajob.b will look something like the following. For this example, I am performing a multi-reference alignment.

#!/bin/csh -f
setenv IMAGIC_BATCH
echo "!  "
echo "! ---------------- IMAGIC ACCUMULATE FILE----------------"
echo "! "
echo "! IMAGIC-PROGRAM : align:mralign.e "
echo "! "
mpirun -np 10 /usr/local/imagic-030123/align/mralign.e_mpi <<EOF
YES
[...other IMAGIC commands...]

Before PBS can schedule and run our IMAGIC batch, several pieces of information need to be added to it. This includes a number of PBS directives, as well as support for the dynamic list of available cluster nodes that PBS maintains.

1   The PBS directives are added immediately following the first line (#!/bin/csh -f) of the batch.

Add PBS directives to define the resources to be used for the job. Here we are specifying the number of nodes to use (nodes), the number of processors per node (ppn), and the maximum runtime (walltime).

#PBS -l nodes=10:ppn=1,walltime=10:00:00

2   Add a PBS directive to name our queue submission.

#PBS -N mrajob

3   Add PBS directives to redirect stdout and stderr (that is, all of our job's output) to two named files.

#PBS -o mrajob.out
#PBS -e mrajob.err

4   Add a PBS directive to indicate that all environment variables set in the shell from which you submit the job should be exported to (and available to) the job.

#PBS -V

5   Add the following line, between the last echo statement and the mpirun statement.

cd $PBS_O_WORKDIR

6   Insert the following into the mpirun line, directly following the word mpirun.

-machinefile $PBS_NODEFILE

Here is our batchfile, with all of the PBS additions in place.

#!/bin/csh -f
#PBS -l nodes=10:ppn=1,walltime=10:00:00
#PBS -N mrajob
#PBS -o mrajob.out
#PBS -e mrajob.err
#PBS -V
setenv IMAGIC_BATCH
echo "!  "
echo "! ---------------- IMAGIC ACCUMULATE FILE----------------"
echo "! "
echo "! IMAGIC-PROGRAM : align:mralign.e " 
echo "! "
cd $PBS_O_WORKDIR
mpirun -machinefile $PBS_NODEFILE -np 10 \
    /usr/local/imagic-030123/align/mralign.e_mpi <<EOF
YES
[...other IMAGIC commands...]

Keep a copy of your original mrajob.b, and save the new version as mrajob_pbs.b.

7   Alternatively, you may use my bash shell script imagic2pbs to do steps 1 through 6 automatically. It reads in your MODE-ACCUMULATE IMAGIC batchfile and writes a PBS-enabled batchfile. Name the input file with -i and the output file with -o. For example:

$ imagic2pbs -i mrajob.b -o mrajob.pbs

Download imagic2pbs using the link below, uncompress it with gunzip, and place it in a directory that is in your PATH, such as /usr/local/bin. Make sure it has its execute bits set (e.g. chmod +x imagic2pbs).

    « download imagic2pbs.gz »
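To make the edits in steps 1 through 6 concrete, here is a minimal sketch of the same transformation as a shell function. It is not the imagic2pbs script, whose internals are not shown in this guide. The function name add_pbs_directives and its three arguments are hypothetical, ppn is fixed at 1 as in the example above, and the -np value on the mpirun line is left exactly as IMAGIC wrote it.

```shell
# Read an IMAGIC batch file on stdin, write a PBS-enabled version to stdout.
# Usage: add_pbs_directives <jobname> <nodes> <walltime> < in.b > out.pbs
add_pbs_directives() {
    job="$1"; nodes="$2"; wall="$3"
    awk -v job="$job" -v nodes="$nodes" -v wall="$wall" '
        # Steps 1-4: PBS directives go directly after the first (#!) line.
        NR == 1 {
            print
            print "#PBS -l nodes=" nodes ":ppn=1,walltime=" wall
            print "#PBS -N " job
            print "#PBS -o " job ".out"
            print "#PBS -e " job ".err"
            print "#PBS -V"
            next
        }
        # Steps 5-6: cd before the first mpirun, machinefile on every mpirun.
        $1 == "mpirun" {
            if (!seen) { print "cd $PBS_O_WORKDIR"; seen = 1 }
            sub(/mpirun/, "mpirun -machinefile $PBS_NODEFILE")
        }
        { print }
    '
}
```

With the function loaded in your shell, add_pbs_directives mrajob 10 10:00:00 < mrajob.b > mrajob.pbs should produce a file equivalent to the hand-edited batchfile shown earlier, except that the mpirun line is not split with a backslash.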

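1   Submit your PBS-enabled batch to PBS using the qsub command. The example uses mrajob.pbs, the file written by imagic2pbs; if you edited the batchfile by hand, submit mrajob_pbs.b instead.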

$ qsub ./mrajob.pbs
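Because the process count on the mpirun line (-np) and the resources requested from PBS (nodes and ppn) are set in two different places, they can drift apart when a batchfile is edited. The following sketch, with the hypothetical name check_np, compares the two before submission. It assumes the directive and mpirun layout used in this guide and treats a missing ppn as 1.

```shell
# Compare the -np value on the mpirun line with nodes x ppn from "#PBS -l".
# Usage: check_np <batchfile>
check_np() {
    file="$1"
    nodes=$(sed -n 's/^#PBS .*nodes=\([0-9][0-9]*\).*/\1/p' "$file" | head -n 1)
    ppn=$(sed -n 's/^#PBS .*ppn=\([0-9][0-9]*\).*/\1/p' "$file" | head -n 1)
    np=$(sed -n 's/^mpirun .*-np  *\([0-9][0-9]*\).*/\1/p' "$file" | head -n 1)
    ppn="${ppn:-1}"    # assumption: no ppn given means one processor per node
    if [ -z "$nodes" ] || [ -z "$np" ]; then
        echo "could not find nodes= or -np in $file" >&2
        return 2
    fi
    want=$((nodes * ppn))
    if [ "$np" -eq "$want" ]; then
        echo "OK: -np $np matches nodes=$nodes x ppn=$ppn"
    else
        echo "MISMATCH: -np $np but nodes=$nodes x ppn=$ppn = $want" >&2
        return 1
    fi
}
```

Running check_np mrajob.pbs just before qsub gives a quick warning if the job would start more or fewer processes than the processors it has reserved.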

1   The qstat command is used to request the status of jobs and queues. Use the -u flag with a username to display only that user's jobs.

$ qstat -u slaton

node01.cluster.edu:
                                                Req'd  Req'd Elap
Job ID         Username Queue    Jobname    NDS Memory Time  Time
-------------- -------- -------- ---------- --- ------ ----- -----
32191.node01.c slaton   dque     mrajob      20    --  32:00 20:10

For more verbose job information, use qstat -f.

2   pbstop is a very useful command for checking on available cluster resources and monitoring your job's progress. It is analogous to the UNIX top command.

$ pbstop

Usage Totals: 66/120 Procs, 33/60 Nodes, 3/3 Jobs Running
Node States:    23 job-exclusive         27 offline

CPU 0
        1 2 3 4 5 6 7 8 9 0   1 2 3 4 5 6 7 8 9 0   1 2 3 4 5 6 7 8 9 0
        ---------------------------------------------------------------
node001 % % % % % % % % % %   % % % % % % % % % %   S S S S S S S S S S
node031 % % % % % % A % A A   A A A A A A A . . .   . . . . . . . H H H
        ---------------------------------------------------------------


      Job#  Username  Queue    Jobname    Nodes   S  Elapsed/Requested
  S = 32191 slaton    dque     mrajob        10   R    10:17/32:00
  A = 32193 andres    dque     job           10   R    04:59/32:00
  H = 32194 hongwei   dque     job            3   R    04:54/32:00

  [?] unknown  [@] busy  [*] down  [.] idle  [%] offline  [!] other
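For unattended monitoring, the verbose qstat -f output mentioned in step 1 can be polled from a script. The sketch below assumes that qstat -f prints a line of the form "job_state = R", and that a finished job either reports state C or is no longer known to qstat; the exact behaviour may differ between PBS versions. The function names job_state and wait_for_job are hypothetical.

```shell
# Print the state letter of a PBS job (e.g. Q, R, C), or nothing if unknown.
# Usage: job_state <jobid>
job_state() {
    qstat -f "$1" 2>/dev/null | awk -F' = ' '$1 ~ /job_state/ { print $2; exit }'
}

# Poll until the job reports state C or is no longer listed by qstat.
# Usage: wait_for_job <jobid> [seconds-between-polls]
wait_for_job() {
    jobid="$1"; interval="${2:-60}"
    while :; do
        state=$(job_state "$jobid")
        case "$state" in
            ''|C) break ;;
        esac
        sleep "$interval"
    done
    echo "job $jobid finished"
}
```

For example, wait_for_job 32191 120 checks job 32191 every two minutes and returns once it has left the queue, at which point mrajob.out and mrajob.err can be inspected.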