What is a PBS script?
A PBS script is a shell script that tells PBS how to execute your job. At
its simplest, it consists of a set of PBS directives (comment lines that
begin with #PBS), followed by the shell commands you wish to run, followed
by an "exit" command.
The very least you should know
The Cluster Resources web page has full documentation for Torque, including
several examples of how to use PBS scripts to submit jobs. The most
important are reproduced here.
The basics of a PBS script
Just like any other shell script, a PBS script must begin with a line that
names the shell to use, for example:
#!/bin/bash
Any shell supported by the system (sh, bash, csh, ksh) will do. Then, at
the very least, you must give your job a name:
#PBS -N /jobname/
Usefully, the job name that you use can later be referenced by the
environment variable $PBS_JOBNAME. Once the shell and job name are defined,
enter the shell commands you wish to execute, followed by "exit":
/shell command 1/
/shell command 2/
...
exit 0
That's it! You're ready to run jobs on Solomon!
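As a rough illustration of the steps above, the sketch below writes a minimal PBS script to a file and dry-runs it with bash before it is ever submitted. The file name hello.pbs, the job name hello_pbs, and the temporary directory are placeholders chosen for this example, not names used on Solomon.

```shell
#!/bin/bash
# Write a minimal PBS script into a temporary directory (demo location only).
demo_dir=$(mktemp -d)
cat > "$demo_dir/hello.pbs" <<'EOF'
#!/bin/bash
#PBS -N hello_pbs
# PBS sets PBS_JOBNAME inside a job; fall back to the same name when the
# script is run by hand outside the queue.
echo "Job ${PBS_JOBNAME:-hello_pbs} running on $(uname -n)"
exit 0
EOF

# Dry run outside the queue: bash treats the #PBS lines as plain comments.
hello_output=$(bash "$demo_dir/hello.pbs")
echo "$hello_output"

# On the cluster the script would instead be submitted with:
#   qsub "$demo_dir/hello.pbs"
```

A dry run like this only checks the shell commands; the #PBS lines take effect only when the script goes through qsub.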
More advanced usage of the PBS script
In order to exercise control over your job, you may use the following PBS
flags in your PBS script. The first and most important flags are the
error and output targets, which can be designated by:
#PBS -e /errorfile/
#PBS -o /outputfile/
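Anything your job writes to standard error, including any error messages
from your script, is written to /errorfile/, and anything it writes to
standard output is written to /outputfile/. If your commands send their
results to files of their own, /outputfile/ should be empty, unless
your job was terminated for some reason (exceeded walltime, exceeded CPU
time, was terminated by the administrator, ...).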
The -m flag tells Torque to send you an e-mail when the job [b]egins,
[e]nds, or [a]borts; give any combination of the three letters:
#PBS -m [abe]
Using -l allows you to more carefully control the resources that
your job uses. The most important of these are nodes and
walltime. nodes gives control over what type, and how many,
nodes are being used. For instance:
#PBS -l nodes=2
specifies that two nodes are to be used. Using -l nodes=/nodename/
specifies that only a particular node may be used. Several options may be
strung together with a +, as in:
#PBS -l nodes=2+node4
The nodes keyword has additional functionality by which nodes
may be sorted and selected, but, as the nodes on Solomon are identical,
this is not useful here.
walltime sets the maximum elapsed (wall-clock) time that a job may take on
the compute nodes; the default is one hour. A walltime of two hours is
requested with:
#PBS -l walltime=2:00:00
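The walltime value is written as hours:minutes:seconds. If you think of run times in minutes, a small helper can build the string for you. The sketch below is illustrative only; the function name walltime_from_minutes is made up for this example and is not part of PBS or Torque.

```shell
#!/bin/bash
# Convert a whole number of minutes into an H:MM:SS walltime string.
# (Hypothetical helper for illustration; not a PBS command.)
walltime_from_minutes() {
    local minutes="$1"
    printf '%d:%02d:00\n' "$((minutes / 60))" "$((minutes % 60))"
}

walltime_from_minutes 120   # prints 2:00:00
walltime_from_minutes 90    # prints 1:30:00
```

Resource requests can also be given to qsub on the command line, so the result could be used as, for example, qsub -l walltime=$(walltime_from_minutes 90) followed by the name of your script.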
Last but not least, the -V flag exports the environment variables of the
shell from which you submit the job to the job itself, so that your script
can use important things like your path and the $SANDIA directory.
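Putting the flags from this section together, a script header might look like the sketch below. The file names myjob.err and myjob.out, the job name, and the resource values are placeholders for this example. Note that the brackets in [abe] above only mark the letters as optional; in a real script the letters are written without brackets.

```shell
#!/bin/bash
# Write an example PBS script that uses the flags described above.
demo_dir=$(mktemp -d)
cat > "$demo_dir/myjob.pbs" <<'EOF'
#!/bin/bash
#PBS -N myjob
#PBS -e myjob.err
#PBS -o myjob.out
#PBS -m abe
#PBS -l nodes=2
#PBS -l walltime=2:00:00
#PBS -V

# The shell commands of the job follow the PBS directives.
echo "Starting ${PBS_JOBNAME:-myjob}"
exit 0
EOF

# List the directives and count them as a quick sanity check.
grep '^#PBS' "$demo_dir/myjob.pbs"
pbs_directive_count=$(grep -c '^#PBS' "$demo_dir/myjob.pbs")

# Dry run of the shell part only; on the cluster, submit with qsub instead.
myjob_output=$(bash "$demo_dir/myjob.pbs")
echo "$myjob_output"
```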
Using resources effectively and wisely
...depending on what you want
Issues with our previous cluster, Flare, have raised an interesting question
regarding cluster task management: "Where is the best place to run my job?"
Fortunately, using PBS makes choosing where to run your job somewhat
simpler. In the old days, we simply ran jobs by logging in to whichever
compute node we happened to find available; we knew our files would be
there because all home directories were mounted over the Network File
System (NFS).
Solomon is set up in the same way, so it is feasible to do something like
this. A PBS script written to run a job in your home directory might look
something like this:
#!/bin/csh
#PBS -N myjob
#/some list of PBS commands/
premix-d
exit 0
This job would use the cklink and tplink files that already exist
in your home directory. This may not be the best way to run jobs, however;
among other things, it creates a high network load for processes (such as
Premix) which read from and write to disk frequently. For this reason, one
might take advantage of shell variables and write a PBS script that looks
like this:
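#!/bin/csh
#PBS -N myjob
...
# Work in a scratch directory on the compute node's local disk
set PBS_WORKDIR = /tmp/$PBS_JOBNAME
mkdir -p $PBS_WORKDIR
# Copy the input files from the home directory to the scratch directory
cp cklink $PBS_WORKDIR
cp tplink $PBS_WORKDIR
cp inp $PBS_WORKDIR
cd $PBS_WORKDIR
premix-d
# Copy the results back to the home directory
cp f.out ~/
cp recover ~/
exit 0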
This script would execute in the directory /tmp/myjob on the local disk of
the node where the job runs. It has the disadvantage that, if your job
terminates prematurely, any files will be abandoned in /tmp on the node you
were using. It is also possible to use PBS to do interesting things, such as:
#!/bin/csh
...
set PBS_WORKDIR = ~/$PBS_JOBNAME
...
which would place your job in an appropriately named directory in your home
file system. This should work well, because Solomon is designed for high
data throughput to the NFS-mounted drive.
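If you prefer the /tmp approach, the problem of abandoned files can be reduced by having the script remove its own scratch directory when it exits. The sketch below uses the shell's trap mechanism and is written in bash rather than csh; the file name scratch_job.pbs and the use of the process ID to make the directory name unique are assumptions made for this example. It is not a complete safeguard: a job that is killed outright (for instance with SIGKILL) never gets the chance to run the cleanup.

```shell
#!/bin/bash
# Write an example job script that cleans up its scratch directory on exit.
demo_dir=$(mktemp -d)
cat > "$demo_dir/scratch_job.pbs" <<'EOF'
#!/bin/bash
#PBS -N myjob
# Scratch directory on the node's local disk; $$ (the shell's process ID)
# keeps two jobs with the same name from colliding.
scratch="/tmp/${PBS_JOBNAME:-myjob}.$$"
mkdir -p "$scratch"
# Remove the scratch directory whenever the script exits, even on failure.
trap 'rm -rf "$scratch"' EXIT
cd "$scratch" || exit 1
echo "$scratch"
# ... copy input files here, run the program, copy results home ...
exit 3   # simulate a run that fails part-way through
EOF

# Dry run outside the queue, recording the exit status of the job script.
job_status=0
scratch_used=$(bash "$demo_dir/scratch_job.pbs") || job_status=$?
if [ -d "$scratch_used" ]; then
    echo "scratch directory left behind: $scratch_used"
else
    echo "scratch directory removed (job exit status $job_status)"
fi
```

Results must be copied back to your home directory before the script exits, since the cleanup deletes everything in the scratch directory.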
Closing Remarks
This should be enough to get you running on PBS. Happy computing!