The TTIC slurm cluster is a pool of machines, many with modern GPUs, to which users may submit compute jobs through the slurm scheduler.
Much of the cluster infrastructure relies on users monitoring their own jobs and usage while being careful about adhering to policy, rather than automated scripts that kill jobs or block accounts for infractions. We request that you respect this collegial arrangement, and be careful that your usage of the cluster adheres to the following set of guidelines:
Use the head node (i.e., slurm.ttic.edu) only to submit and monitor jobs. Do not run any computationally intensive processes (including installing software). You should use an interactive job for such things.
You can explicitly request a certain number of cores / GPUs for your jobs to use when you submit them (with the -c option, see the next section). The scheduler will set up environment variables for your job so that most common computing packages (OpenMP, MKL, CUDA) will restrict their usage to only the assigned resources. But it is possible for jobs to cross these limits inadvertently (and of course, deliberately) when using a non-standard package or library. It is your responsibility to make sure this does not happen. We strongly recommend that as your jobs are running, you monitor their processes on the slurm website's PS page, and ensure they aren't exceeding their assigned resource allocations.
For GPU jobs, it is important that your code respect the CUDA_VISIBLE_DEVICES environment variable (most CUDA programs do this automatically) and use only its assigned GPUs on a node. Also, a GPU job should not use more than 3 CPU cores per assigned GPU. So if your job asked for n GPUs, it should use its assigned n GPUs, and up to 3n CPU cores on the node.
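If you want to confirm what your job has actually been assigned, a minimal sanity check from inside the job (just a suggestion, not a requirement) is:
echo $CUDA_VISIBLE_DEVICES   # the GPU indices slurm assigned to this job
nproc                        # CPU cores available to this job
nvidia-smi                   # confirm your processes only appear on the assigned GPUs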
Your job shouldn't use any program or library that detaches from the job in the manner of nohup. Programs that do this include screen and tmux. You may use screen and tmux on the head node, but not on any compute nodes (these programs aren't available by default on the compute nodes). Any process that escapes the slurmstepd process will be killed.
We generally discourage the use of interactive jobs, but recognize that they are necessary in some workflows (for example, for compilation and initial testing of programs). However, we find that with interactive jobs, users often lose track of which machine they are on, and either (a) mistake the head node for a compute node and start running their jobs on the head node, which slows it down and makes it difficult or impossible for other users to submit their jobs; or (b) mistake a compute node for the head node, use it to submit jobs, and take up a slot on the compute node that then sits idle. If you do use interactive jobs, please keep track of which machine you are on!
Scratch space: if your jobs need to repeatedly read and write large files from disk, we ask that you use the fast temporary local storage (4T SSD) on the compute nodes, and not your NFS-shared home directories. Scratch space is available in /scratch on all compute nodes. We also request that you delete all temporary files when you are done with them, and organize them in a subdirectory with your user or group name.
However, if there is some dataset that you expect to use multiple times, you should leave it in the temporary directory rather than transferring it at the beginning of every job. Your job can check for the presence of the dataset and copy it from a central location only if it's absent. Optionally, if this is a large dataset that you expect to use over a period of time, you can ask the IT Director to place it on all (or a subset of) compute nodes.
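A minimal sketch of that check-and-copy pattern, assuming a hypothetical central copy at /share/data/mygroup/dataset (all paths and names here are placeholders):
SCRATCH_DIR=/scratch/$USER/dataset         # per-user subdirectory on local scratch
CENTRAL_DIR=/share/data/mygroup/dataset    # hypothetical central copy on NFS
if [ ! -d "$SCRATCH_DIR" ]; then
    mkdir -p /scratch/$USER
    cp -r "$CENTRAL_DIR" "$SCRATCH_DIR"    # copy only if the local copy is absent
fi
# ... run your job against $SCRATCH_DIR ...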
All jobs are submitted by logging in via ssh to the head node slurm.ttic.edu which, along with all compute nodes, mounts your NFS home directory. Jobs are run by submitting them to the slurm scheduler, which then executes them on one of the compute nodes.
In this section, we provide information to get you started with using the scheduler, and about details of the TTIC cluster setup. For complete documentation, see the man pages on the slurm website.
All jobs in slurm are submitted to a partition---which defines whether the submission is a GPU or CPU job, the set of nodes it can be run on, and the priority it will have in the queue. Different users will have access to different partitions (based on the group's contributions to the cluster) as noted below:
cpu, gpu: The public partitions for CPU and GPU jobs that are available to all users. These partitions only contain a subset of all machines in the cluster.
contrib-gpu: Partition available to members of contributing groups, containing all nodes in the cluster at a baseline priority level.
<group>-gpu: Partitions associated with a particular group, with access to all nodes at an enhanced priority level, based on the arrangement under which that group contributed resources to the cluster.
You can run the sinfo command to see which partitions you have access to. Please consult with your faculty adviser (or the IT Director) if you need access to other partitions.
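For example, one way to list your accessible partitions together with their time limits and nodes (the format string is just one suggestion):
sinfo -o "%P %l %D %N"   # partition, time limit, node count, node list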
All of the above partitions have a strict time limit of 4 hours per job, and jobs that do not finish in this time will be killed. However, the cluster also has -long partitions that contain a subset of nodes and allow for longer running jobs, with a limit of 4 days. To use this longer time limit, submit your jobs to the partitions cpu-long, contrib-gpu-long, or <group>-gpu-long as appropriate.
Currently, each CPU core selected comes with roughly 4GB of memory. If you need more than that, either ask for more cores or use the -C highmem constraint, which will only run your job on nodes with at least 192G of memory.
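As a rough illustration of the arithmetic (the script name is a placeholder), a job that needs about 32G of memory can either request 8 cores, or request fewer cores on a high-memory node:
sbatch -p cpu -c8 myjob.sh              # ~8 x 4G = ~32G of memory
sbatch -p cpu -c4 -C highmem myjob.sh   # fewer cores, restricted to nodes with >= 192G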
The primary way to submit jobs is through the sbatch command. In this regime, you write the commands you want executed into a script file (typically, a bash script). It is important that the first line of this file is a shebang line to the script interpreter: in most cases, you will want to use #!/bin/bash.
The sbatch command also takes the following options (amongst others):
-p name: Partition name, e.g., -p contrib-gpu.
-cN: Number N of CPU cores (for cpu partition jobs) or GPUs (for gpu partition jobs) that the job will use, e.g., -c1.
-C feature: Optional parameter specifying that you want your job to run only on nodes with specific features. In our current setup, this is used for GPU jobs to request a specific kind of GPU, or a minimum amount of GPU memory. See the complete listing of nodes and features for details.
There are many other options you can pass to sbatch to customize your jobs (for example, to submit array jobs, to use MPI, etc.). See the sbatch man page.
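For concreteness, here is a minimal sketch of a job script; the file name, job name, and commands are placeholders rather than anything the cluster requires.
#!/bin/bash
#SBATCH --job-name=example        # optional; the name shown in squeue
#SBATCH --output=example-%j.out   # %j expands to the job id
# the commands you want executed go here, for example:
hostname
echo "CUDA devices: $CUDA_VISIBLE_DEVICES"
Saved as, say, example.sh, it could be submitted with:
sbatch -p contrib-gpu -c1 example.sh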
You can submit array jobs using the python script below. You must provide an input file that contains the commands you wish to run (one command per line, one line per job) and a partition on which to run the jobs. The script will then package these up into batch-commands-$.txt and sbatch-script-$.txt files, splitting your input file into batches of 5000 if necessary. You then submit a batch by running sbatch sbatch-script-$.txt. You can also optionally supply a job name and a constraint with -J and -C respectively.
#!/usr/bin/env python
import argparse

parser = argparse.ArgumentParser(description='TTIC SLURM sbatch script creator')
parser.add_argument('INPUT_FILE', help='Input file with the list of commands to run, one per line')
parser.add_argument('PARTITION', help='Name of partition to use')
parser.add_argument('-C', '--constraint', help='Constraint to use')
parser.add_argument('-J', '--job-name', help='Name of the job')
args = parser.parse_args()


def gen_sbatch_end(constraint, job_name):
    # Build the optional --constraint / --job-name additions to the #SBATCH line.
    sbatch_end = ''
    if constraint:
        sbatch_end += ' --constraint=' + constraint
    if job_name:
        sbatch_end += ' --job-name=' + job_name
    return sbatch_end


def write_batch(index, commands):
    # Write one batch of commands plus the sbatch script that runs it as an array job.
    with open('batch-commands-' + str(index) + '.txt', 'w') as file_out:
        for command in commands:
            file_out.write(command.strip() + '\n')
    with open('sbatch-script-' + str(index) + '.txt', 'w') as file_out:
        file_out.write('#!/bin/bash\n')
        sbatch_end = gen_sbatch_end(args.constraint, args.job_name)
        file_out.write('#SBATCH --partition=' + args.PARTITION +
                       ' --cpus-per-task=1 --array=1-' + str(len(commands)) + sbatch_end + '\n')
        # Each array task runs the line of the batch file matching its SLURM_ARRAY_TASK_ID.
        file_out.write('bash -c "`sed "${SLURM_ARRAY_TASK_ID}q;d" ' +
                       'batch-commands-' + str(index) + '.txt`"')


with open(args.INPUT_FILE, 'r') as file_in:
    lines = file_in.readlines()

count = 0
commands = []
while count < len(lines):
    if count % 5000 == 0 and count > 0:
        write_batch(count // 5000, commands)  # integer division so file names stay integers
        commands = []
    commands.append(lines[count])
    count += 1
write_batch('last', commands)
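For example, assuming you have saved this script as make_array.py (the name is arbitrary) and your commands in commands.txt, one per line:
python make_array.py commands.txt contrib-gpu -J myarray -C 11g
sbatch sbatch-script-last.txt    # submit each generated sbatch-script-*.txt; 'last' always exists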
While we recommend that you use batch jobs for the majority of tasks submitted to the cluster, it may occasionally be necessary to run programs interactively, for example to set up your experiments for the first time. You can use the srun command to request an interactive shell on a compute node.
Call srun with the same options as sbatch above to specify the partition, number of cores, etc., followed by the option --pty bash. For example, to request a shell with a single GPU with at least 11GB of memory on the gpu partition, run
srun -p gpu -c1 -C 11g --pty bash
Note that interactive jobs are subject to the same time limits and priority as batch jobs, which means that you might have to wait for your job to be scheduled, and that your shell will be automatically killed after the time limit expires.
Let's say you have split up your job into a series of three script files called: optimize_1.sh, optimize_2.sh, optimize_3.sh --- each of which runs under the cluster's time limit, and picks up from where the last left off. You can request that they be executed as separate jobs in sequence on the cluster.
Pick a unique "job name" for the sequence (let's say "series_A"). Then just submit the three batch jobs in series using sbatch, with the additional command parameters -J series_A -d singleton. For example:
sbatch -p gpu -c1 -J series_A -d singleton optimize_1.sh
sbatch -p gpu -c1 -J series_A -d singleton optimize_2.sh
sbatch -p gpu -c1 -J series_A -d singleton optimize_3.sh
All three jobs will be immediately added to the queue, and if there are slots free, optimize_1.sh will start running. But optimize_2.sh will NOT start until the first job is done, and similarly, optimize_3.sh will only be started after the other two jobs have ended. Note that there is no guarantee that they will start on the same machine.
The singleton dependency essentially requires that previously submitted jobs with the same name (by the same user) have finished. There is a caveat, however: the subsequent job will be started even if the previous job failed or was killed (for example, because it overshot the time limit). So your scripts should be robust to the possibility that the previous job may have failed.
Note that you can have multiple such sequences running in parallel by giving them different names.
Once your jobs have been scheduled, you can keep an eye on them using command line tools on the login host as well as the cluster website http://slurm.ttic.edu/. At the very least, you should monitor your jobs to ensure that their processor usage is not exceeding what you requested when submitting them.
The cluster website provides a listing of scheduled and waiting jobs in the cluster queue, shows statistics of load on the cluster, and provides details (from the output of ps and nvidia-smi) of the processes corresponding to jobs running on the cluster.
You can also use the slurm command line tool squeue to get a list of jobs in the queue (remember to call it with the -a option to see all jobs, including those in other groups' partitions that you may not have access to). To get a listing like the output on the website, which organizes job sequences into single entries, you can run xqueue.py.
Finally, use the scancel command to cancel any of your running or submitted jobs. See the scancel man page for details on how to call this command. In particular, if you are using job sequences, you can use the -n series_name option to cancel all jobs in a sequence.
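A few typical invocations (series_A refers back to the example sequence name above, and the job id is just an example):
squeue -a -u $USER     # list all of your jobs, across all partitions
scancel 123456         # cancel a single job by its job id
scancel -n series_A    # cancel every job in the series_A sequence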
Nodes marked as public are available to all users; other nodes are only available to groups who have contributed resources to the cluster.
Nodes gpu[30-42] are exclusive-access nodes that are not part of any other partitions.
Node-name | Public | Cores | RAM | GPU(s) | GPU Type | Feature labels |
---|---|---|---|---|---|---|
cpu0 | Y | 8 | 24G | - | - | |
cpu1 | Y | 8 | 24G | - | - | |
cpu2 | Y | 8 | 24G | - | - | |
cpu3 | Y | 8 | 24G | - | - | |
cpu4 | Y | 8 | 24G | - | - | |
cpu6 | Y | 8 | 24G | - | - | |
cpu7 | Y | 8 | 24G | - | - | |
cpu8 | Y | 8 | 48G | - | - | avx |
cpu9 | Y | 8 | 48G | - | - | avx |
cpu10 | Y | 8 | 48G | - | - | avx |
cpu11 | Y | 8 | 48G | - | - | avx |
cpu12 | Y | 12 | 48G | - | - | |
cpu13 | Y | 12 | 48G | - | - | |
cpu14 | Y | 12 | 48G | - | - | |
cpu15 | Y | 12 | 48G | - | - | |
cpu16 | Y | 12 | 64G | - | - | avx |
cpu17 | Y | 12 | 128G | - | - | avx |
cpu18 | Y | 20 | 128G | - | - | avx |
cpu19 | Y | 12 | 128G | - | - | avx |
cpu20 | Y | 64 | 256G | - | - | avx,highmem |
cpu21 | Y | 256 | 1024G | - | - | avx,highmem |
gpu0 | N | 24 | 192G | 4 | 2080 Ti | 11g,2080ti,highmem,avx |
gpu1 | N | 20 | 256G | 4 | 2080 Ti | 11g,2080ti,highmem,avx |
gpu2 | N | 20 | 192G | 4 | A4000 | 11g,12g,16g,a4000,highmem,avx |
gpu3 | N | 24 | 192G | 4 | 2080 Ti | 11g,2080ti,highmem,avx |
gpu4 | N | 20 | 192G | 4 | A4000 | 11g,12g,16g,a4000,highmem,avx |
gpu5 | N | 16 | 256G | 4 | A4000 | 11g,12g,16g,a4000,highmem,avx |
gpu6 | Y | 16 | 256G | 4 | Titan V | 11g,12g,titanv,highmem,avx |
gpu7 | N | 20 | 192G | 4 | Titan RTX | 11g,12g,24g,titanrtx,highmem,avx |
gpu10 | N | 48 | 1024G | 8 | A6000 Ada | 11g,12g,16g,24g,48g,a6000ada,highmem,avx |
gpu11 | N | 20 | 192G | 8 | 2080 Ti | 11g,2080ti,highmem,avx |
gpu12 | N | 48 | 1024G | 8 | A6000 | 11g,12g,16g,24g,48g,a6000,highmem,avx |
gpu13 | Y | 48 | 1024G | 8 | A4000 | 11g,12g,16g,a4000,highmem,avx |
gpu14 | N | 20 | 384G | 8 | A6000 | 11g,12g,16g,24g,48g,a6000,highmem,avx |
gpu15 | Y | 20 | 384G | 8 | A6000 | 11g,12g,16g,24g,48g,a6000,highmem,avx |
gpu16 | N | 48 | 1024G | 8 | A6000 Ada | 11g,12g,16g,24g,48g,a6000ada,highmem,avx |
gpu17 | N | 20 | 192G | 8 | 2080 Ti | 11g,2080ti,highmem,avx |
gpu18 | N | 20 | 192G | 8 | 2080 Ti | 11g,2080ti,highmem,avx |
gpu19 | N | 20 | 192G | 8 | 2080 Ti | 11g,2080ti,highmem,avx |
gpu20 | N | 20 | 192G | 8 | A6000 | 11g,12g,24g,48g,a6000,highmem,avx |
gpu21 | N | 20 | 256G | 10 | A4000 | 11g,12g,16g,a4000,highmem,avx |
gpu30 | N | 6 | 64G | 2 | Titan X Pascal | |
gpu31 | N | 4 | 64G | 2 | RTX 8000 | |
gpu32 | N | 4 | 64G | 2 | A5000 | |
gpu33 | N | 8 | 128G | 2 | RTX 6000 | |
gpu34 | N | 8 | 64G | 2 | RTX 6000 | |
gpu35 | N | 4 | 64G | 2 | 2080 Ti | |
gpu36 | N | 8 | 64G | 2 | A5500 | |
gpu37 | N | 4 | 64G | 2 | 1080 Ti | |
gpu38 | N | 4 | 64G | 2 | A5000 | |
gpu39 | N | 8 | 128G | 2 | 2080 Ti | |
gpu40 | N | 20 | 1536G | 8 | A6000 | |
gpu41 | N | 20 | 384G | 8 | A6000 | |
gpu42 | N | 12 | 32G | 1 | 2080 Ti | |
Because users are limited to 20G home directories, Miniconda is preferred over a full Anaconda installation.
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/mc3
rm Miniconda3-latest-Linux-x86_64.sh
eval "$($HOME/mc3/bin/conda 'shell.bash' 'hook')"
You will want to run that last command every time you start working within miniconda. Installing this way (skipping the bashrc auto-initialization) keeps your logins quick by not loading and scanning files unnecessarily.
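If you find yourself typing the eval line often, one convenience (purely optional, and not part of the recommended setup) is to wrap it in a small shell function in your ~/.bashrc; conda's hook then only runs when you explicitly call it, so logins stay fast.
# add to ~/.bashrc; activates miniconda only when you run "mc3" by hand
mc3() {
    eval "$($HOME/mc3/bin/conda 'shell.bash' 'hook')"
}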
First you will need to install jupyter notebook. There are a couple of options for this; the examples below use the virtualenv option.
virtualenv ~/myenv # create the virtualenv
. ~/myenv/bin/activate # activate the env
pip install --upgrade pip # it's always a good idea to update pip
pip install jupyter # install jupyter
You can run the jupyter notebook as either an interactive or batch job.
We recommend setting your jupyter environment variables so that jupyter's files are not located in NFS directories and instead use node-local /scratch space.
mkdir -p /scratch/$USER/jupyter
export JUPYTER_CONFIG_DIR=/scratch/$USER/jupyter
export JUPYTER_PATH=/scratch/$USER/jupyter
export JUPYTER_DATA_DIR=/scratch/$USER/jupyter
export JUPYTER_RUNTIME_DIR=/scratch/$USER/jupyter
export IPYTHONDIR=/scratch/$USER/ipython
srun --pty bash                         # start an interactive job
. ~/myenv/bin/activate                  # activate the virtual env
# jupyter tries to use XDG_RUNTIME_DIR to store some files; by default it is
# set to '' on the compute nodes, which causes errors when running jupyter notebook
unset XDG_RUNTIME_DIR
export NODEIP=$(hostname -i)            # get the ip address of the node you are using
export NODEPORT=$(( $RANDOM + 1024 ))   # get a random port above 1024
echo $NODEIP:$NODEPORT                  # echo these values to use later
jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser   # start the jupyter notebook
Make a new ssh connection with a tunnel to access your notebook (substitute the actual IP and port values echoed above, not the variable names):
ssh -N -L 8888:$NODEIP:$NODEPORT user@slurm.ttic.edu
This will make an ssh tunnel on your local machine that forwards traffic sent to localhost:8888 to $NODEIP:$NODEPORT on the cluster. The command will appear to hang, since we are using the -N option, which tells ssh not to run any commands (including a shell) on the remote machine.
Open your local browser and visit: http://localhost:8888
The process for a batch job is very similar.
jupyter-notebook.sbatch
#!/bin/bash
unset XDG_RUNTIME_DIR
NODEIP=$(hostname -i)
NODEPORT=$(( $RANDOM + 1024))
echo "ssh command: ssh -N -L 8888:$NODEIP:$NODEPORT `whoami`@slurm.ttic.edu"
. ~/myenv/bin/activate
jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser
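To run it, submit the script like any other batch job (the partition and core count below are just an example); by default, slurm writes the job's output to slurm-<jobid>.out.
sbatch -p gpu -c1 jupyter-notebook.sbatch
cat slurm-<jobid>.out   # look here for the ssh command printed by the script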
Check the output of your job to find the ssh command to use when accessing your notebook.
Make a new ssh connection to tunnel your traffic. The format will be something like:
ssh -N -L 8888:###.###.###.###:#### user@slurm.ttic.edu
As before, this command will appear to hang, since the -N option tells ssh not to run any commands (including a shell) on the remote machine.
Open your local browser and visit: http://localhost:8888
These are the commands to install the current stable version of pytorch. In this example we are using /scratch, though in practice you may want to install it in a network location. The total install is around 11G, which means that installing in your home directory is not recommended.
# getting an interactive job on a gpu node
srun -p contrib-gpu --pty bash
# creating a place to work
export MYDIR=/scratch/$USER/pytorch
mkdir -p $MYDIR && cd $MYDIR
# installing miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $MYDIR/mc3
rm Miniconda3-latest-Linux-x86_64.sh
# activating the miniconda base environment (you will need to run this before using pytorch in future sessions).
eval "$($MYDIR/mc3/bin/conda 'shell.bash' 'hook')"
# install pytorch, with cuda 11.8
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# test (should return True)
python -c "import torch; print(torch.cuda.is_available())"
Tensorflow does not respect the common environment variables that restrict the number of threads in use. If you add the following code to your tensorflow setup, it will respect the number of threads requested with the -c option.
import os
import tensorflow as tf

# OMP_NUM_THREADS is set by the scheduler to match the number of cores requested with -c
NUM_THREADS = int(os.environ['OMP_NUM_THREADS'])

# TensorFlow 1.x session API; in TensorFlow 2.x, use tf.config.threading instead
sess = tf.Session(config=tf.ConfigProto(
    intra_op_parallelism_threads=NUM_THREADS,
    inter_op_parallelism_threads=NUM_THREADS))
If you are a faculty member at TTIC, we would like to invite you to contribute machines or hardware to the cluster. A rather high-level outline of what it would mean to you is below.
The basic principle we intend to follow is that the setup should provide people, on average, with access to more resources than they have on their own, and to manage these pooled resources to maximize efficiency and throughput.
If you contribute hardware, you will gain access to the entire cluster (see description of partitions above), and be given a choice between two options for high-priority access:
Option 1: You (which means you and any users you designate as being in your "group") will be guaranteed access to your machines within a specified time window from when you request it (the window is 4 hours, i.e., the time limit for any other user's job that may be running on your machine). Once you get this access, you can keep it as long as you need it. Effectively, you decide when you want to let others use your machines, with a limited waiting period when you want them back.
Option 2: You give up the ability to guarantee on-demand access to your specific machines, in exchange for a higher priority for your jobs on the entire cluster. You can still reserve your machines for up to four weeks per year (possibly in installments of at least a week each time).
As noted above, the priority of one's jobs is affected by a weighted combination of waiting time, user priority (higher for members of a group that has contributed more equipment), and fair share (lower for users who have recently made heavy use of the cluster).