How to Submit a Job

To submit a job, create a script (called SCRIPT_NAME.sh here) containing the commands you want to execute. Once you have the script, run the following command from majorana:

qsub SCRIPT_NAME.sh

The script should look similar to this minimal example (see below for a more complete script):

#!/bin/bash
#$ -N %NAME_OF_JOB%
#$ -cwd

In this case the job will be scheduled for execution on any available node, in any queue of that node.
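As a rough sketch of the whole cycle, the following writes a minimal job script to disk and submits it. The job name hello_job, the file name hello_job.sh, and the echo line are placeholders chosen for illustration. The guard around qsub is only there so that the sketch does nothing harmful on a machine without Grid Engine.

```shell
#!/bin/bash
# Write a minimal job script; "hello_job" is a hypothetical job name.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#$ -N hello_job
#$ -cwd

# The actual work of the job goes here.
echo "Running on $(hostname)"
EOF

# Submit only where Grid Engine is installed (i.e. on the submit node).
if command -v qsub >/dev/null 2>&1; then
    qsub hello_job.sh
else
    echo "qsub not found: run this on the submit node"
fi
```

Because of the -cwd line, the job runs in the directory from which it was submitted, so its output files should end up there as well.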

See Usage Policies and Queue Types before submitting jobs.

YOU CAN ONLY ADD JOBS COMPLYING WITH THE REQUIREMENTS OF THE SCHEDULER!

Selecting a queue

If you want, you can impose some restrictions on the kind of node or queue to choose from.

To select a node with Opteron 2216 CPUs (RACK 1), you must add to your script the line #$ -l opteron2216.
To select a node with Xeon E5410 CPUs (RACK 2), you must add to your script the line #$ -l xeon5410.
To select a node with Opteron 6128 CPUs (RACK 3), you must add to your script the line #$ -l opteron6128.
To select a node with Opteron 6272 CPUs (RACK 4), you must add to your script the line #$ -l opteron6272.
To select a node with Xeon 2680 CPUs (RACK 5), you must add to your script the line #$ -l xeon2680.

To select only among the short queues, you must add to your script the line:
#$ -l short
To select only among the long queues, you must add to your script the line:
#$ -l long
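The restrictions above can in principle be combined in one script header. The helper below is a small sketch that prints such a header and rejects names that are not in the lists above. The function name make_header and the job name my_job are made up for this example, and it assumes that a CPU restriction and a queue restriction may be requested together on this cluster.

```shell
#!/bin/bash
# make_header JOB_NAME CPU_RESOURCE QUEUE_TYPE
# Prints a job script header; make_header is a hypothetical helper,
# not a Grid Engine command.
make_header() {
    local name=$1 cpu=$2 queue=$3
    case $cpu in
        opteron2216|xeon5410|opteron6128|opteron6272|xeon2680) ;;
        *) echo "unknown CPU resource: $cpu" >&2; return 1 ;;
    esac
    case $queue in
        short|long) ;;
        *) echo "unknown queue type: $queue" >&2; return 1 ;;
    esac
    printf '#!/bin/bash\n#$ -N %s\n#$ -cwd\n#$ -l %s\n#$ -l %s\n' \
        "$name" "$cpu" "$queue"
}

# Example: a short job on a RACK 5 node.
make_header my_job xeon2680 short
```

Redirect the output into a file and append your own commands to obtain a complete job script.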

Enabling email reporting

From the manual:

The qsub -m command requests email to be sent to the user who submitted a job or to the email addresses specified by the -M flag if certain events occur. See the qsub(1) man page for a description of the flags. An argument to the -m option specifies the events. The following arguments are available:

  • b – Send email at the beginning of the job.
  • e – Send email at the end of the job.
  • a – Send email when the job is rescheduled or aborted (for example, by using the qdel command).
  • s – Send email when the job is suspended.
  • n – Do not send email. n is the default.

Use a string made up of one or more of the letter arguments to specify several of these options with a single -m option. For example, -m be sends email at the beginning and at the end of a job.

Thus, you can simply change your submit script like this:

#!/bin/bash
#$ -N NAME_OF_JOB
#$ -m as
#$ -M user1@ulb.ac.be,user2@gmail.com
#$ -cwd

This sends an email to the two specified addresses if the job is rescheduled, aborted, or suspended. If you omit -M, the email is sent to the user who submitted the job.

A more advanced job script

This script moves the job-dependent data to the node’s /tmp directory, executes the command without any terminal output, and then moves all data back to the job directory in the user’s home directory. Job reporting is done via email to the submitting user.

#!/bin/bash
#$ -N %NAME_OF_JOB%
#$ -m a,s
#$ -cwd

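USERNAME=$(whoami)
TMPDIR="/tmp/$USERNAME/%NAME_OF_JOB%"
JOBDIR="/home/$USERNAME/%DIR_OF_JOB%"
COMMAND="/home/$USERNAME/%YOUR_COMMAND%"

# Stage the job's files in the node-local /tmp directory
mkdir -p "$TMPDIR"
mv * "$TMPDIR"
cd "$TMPDIR" || exit 1
# Run the command, discarding all terminal output
$COMMAND &> /dev/null
RET=$?
# Move everything back to the job directory and remove the empty directories
mv * "$JOBDIR"
cd "$JOBDIR"
rmdir -p "$TMPDIR" &> /dev/null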

exit $RET

The main advantages of this wrapper script are that it:

  1. Redirects the terminal output to /dev/null, so it is not written into your home directory
  2. Moves the job’s data to the node’s /tmp directory and back after the job finishes

That way, the job does not write to your home directory all the time. If it did, it would effectively be a denial-of-service attack on majorana, as the home directories are hosted there.

In order to use the script, change the following:

  1. Replace %NAME_OF_JOB% with a name unique among all your submissions.
  2. Replace %DIR_OF_JOB% with the job’s directory in your home directory. Make sure you have one directory per job.
  3. Replace %YOUR_COMMAND% with the full command line you want to execute (the script prefixes it with your home directory).
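If you submit many jobs, the replacements can be scripted. The following is only a sketch: it uses a cut-down template with the same placeholders, and the values sim_run_01, jobs/sim_run_01 and bin/simulate are invented for the example. It assumes the values contain no "|" or "&" characters, which sed would treat specially.

```shell
#!/bin/bash
# A cut-down template using the same placeholders as the script above.
cat > job_template.sh <<'EOF'
#!/bin/bash
#$ -N %NAME_OF_JOB%
#$ -cwd
TMPDIR="/tmp/$USERNAME/%NAME_OF_JOB%"
JOBDIR="/home/$USERNAME/%DIR_OF_JOB%"
COMMAND="/home/$USERNAME/%YOUR_COMMAND%"
EOF

# Hypothetical values for one job.
name=sim_run_01
dir=jobs/sim_run_01
cmd=bin/simulate

# Fill in every placeholder and write the result to a per-job script.
sed -e "s|%NAME_OF_JOB%|$name|g" \
    -e "s|%DIR_OF_JOB%|$dir|g" \
    -e "s|%YOUR_COMMAND%|$cmd|g" \
    job_template.sh > "$name.sh"
```

The resulting sim_run_01.sh can then be submitted with qsub from inside the job’s directory.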

Remember that files left in the /tmp directories are deleted after 10 days without being accessed, so you must move your data back to your home directory to keep it (and, in the longer term, back it up somewhere else).

Remember that if you fill the /tmp directory, you will compromise other users’ jobs. Therefore, please always include something like “rm -rf $TMPDIR” as the last line of your scripts, to make sure that no data is left on the node after your job terminates. You can use the command ‘clean_temp_dirs.sh’ on the submit node to remove any of your data left in the /tmp directories of all nodes. This is particularly useful if your job crashes and does not reach the line “rm -rf $TMPDIR”. If you fill up /tmp somewhere, all your jobs will be killed without mercy, and all your data on all nodes erased immediately.
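One way to make the cleanup more robust is a bash EXIT trap, which runs even when the command fails partway through. The sketch below is an illustration, not part of the cluster’s tooling: run_in_scratch is a hypothetical helper, and mktemp -d stands in for the per-job directory under /tmp. A job that is killed outright (SIGKILL) never runs its trap, so clean_temp_dirs.sh remains useful.

```shell
#!/bin/bash
# run_in_scratch COMMAND [ARGS...]
# Runs a command inside a scratch directory that is removed however the
# command ends. The parentheses make the function body a subshell, so the
# EXIT trap fires when the function returns, not when the whole script ends.
run_in_scratch() (
    scratch=$(mktemp -d) || exit 1
    trap 'rm -rf "$scratch"' EXIT
    cd "$scratch" || exit 1
    "$@"
)

# Stand-in for a job that leaves a file behind and then fails.
run_in_scratch sh -c 'touch leftover; exit 3' \
    || echo "job failed with status $?"
```

The exit status of the command is passed through unchanged, so it can still be returned from the job script with exit.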

Grid Engine States

d(eletion) Indicates that a qdel has been used to initiate job deletion.
E(rror) Appears for pending jobs that couldn’t be started due to job properties. The reason for the job error is shown by the qstat -j job_list option.
h(old) Only for pending jobs. Indicates that a job currently is not eligible for execution due to a hold state (assigned via qhold, qalter, qsub -h).
r(unning) Indicates that a job is executing.
R(estarted) Indicates that the job was restarted. Can be caused by job migration or by other reasons (see qsub -r).
s(uspended) Is caused by suspending the job via the qmod command.
S(uspended) Indicates that the queue containing the job is suspended and therefore the job is also suspended.
t(ransferring) Indicates that a job is about to be executed.
T(hreshold) Shows that at least one suspend threshold of the corresponding queue was exceeded (queue_conf).
w(aiting) Only for pending jobs. Indicates that the job is waiting for the completion of job dependencies (assigned via the -hold_jid option of qsub or qalter).
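A state shown by qstat may consist of several of these letters at once. As a reading aid, the sketch below spells out a state string letter by letter using the list above. describe_state is a hypothetical helper, not a Grid Engine command, and any letter that is not in the list is reported as such.

```shell
#!/bin/bash
# describe_state STATE
# Prints one line per letter of a state string, using the list above.
describe_state() {
    local state=$1 i letter meaning
    for (( i = 0; i < ${#state}; i++ )); do
        letter=${state:i:1}
        case $letter in
            d) meaning="deletion" ;;
            E) meaning="error" ;;
            h) meaning="hold" ;;
            r) meaning="running" ;;
            R) meaning="restarted" ;;
            s) meaning="suspended (job)" ;;
            S) meaning="suspended (queue)" ;;
            t) meaning="transferring" ;;
            T) meaning="threshold" ;;
            w) meaning="waiting" ;;
            *) meaning="not in the list above" ;;
        esac
        printf '%s: %s\n' "$letter" "$meaning"
    done
}

# Example: a restarted job that is currently running.
describe_state Rr
```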

See the Grid Engine documentation for more info.
