TTIC Slurm Cluster: Usage & Guidelines

Current Status: [Docs] | [Job Process Info] | [Job Queue] | [Cluster Usage & Load]

The TTIC slurm cluster is a pool of machines, many with modern CUDA-capable GPUs, to which users may submit compute jobs through the slurm scheduler.

General Guidelines

Much of the cluster infrastructure relies on users monitoring their own jobs and usage and being careful about adhering to policy, rather than automated scripts that kill jobs or block accounts for infractions. We request that you respect this collegial arrangement, and be careful that your usage of the cluster adheres to the following set of guidelines:

  1. Use the head node (i.e., slurm.ttic.edu) only to submit and monitor jobs. Do not run any computationally intensive processes (including compilation of large packages) on it.

  2. You can explicitly request a certain number of cores / GPUs for your jobs to use when you submit them (with the -c option, see the next section). The scheduler will set up environment variables for your job so that most common computing packages (MATLAB, OpenMP, MKL, CUDA) will restrict their usage to only the assigned resources. But it is possible for jobs to cross these limits inadvertently (and of course, deliberately) when using a non-standard package or library. It is your responsibility to make sure this does not happen. We strongly recommend that as your jobs are running, you monitor their processes on the slurm website's PS page, and ensure they aren't exceeding their assigned resource allocations.

    For GPU jobs, it is important that your code respects the CUDA_VISIBLE_DEVICES environment variable (most CUDA programs do this automatically) and uses only its assigned GPU(s) on a node. Also, a GPU job should not use more than 2 CPU cores per assigned GPU. So if your job asked for n GPUs, it should use its assigned n GPUs, and up to 2n CPU cores on the node.

  3. Your job shouldn't use any program or library that detaches processes from the job (for example, by daemonizing or invoking nohup). Programs that do this include screen and tmux. You may use screen and tmux on the head node, but not on any compute nodes (these programs aren't available by default on the compute nodes). It is crucial that you don't have any processes left running on a compute node after your job has finished.

  4. We generally discourage the use of interactive jobs, but recognize that they are necessary in some workflows (for example, for compilation and initial testing of programs). However, we find that with interactive jobs, users often confuse which machine they are on, and either (a) mistake the head node for a compute node and start running their jobs on the head node, which slows it down and makes it difficult or impossible for other users to submit their jobs; or (b) mistake compute nodes for the head node and use them to submit jobs, taking up a slot on the compute node that then sits idle. If you do use interactive jobs, please keep track of which machine you are on!

  5. Scratch space: if your jobs need to repeatedly read and write large files, we ask that you use temporary local storage on the compute nodes, and not your NFS-shared home directories. Scratch space is available in /scratch on all compute nodes. We also request that you delete your temporary files when you are done with them, and that you organize them in a subdirectory named after your user or group.

    However, if there is a dataset that you expect to use multiple times, you should leave it in the temporary directory rather than transferring it at the beginning of every job. Your job can check for the presence of the dataset and copy it from a central location only if it is absent (see the sketch after this list). Optionally, if this is a large dataset that you expect to use over a period of time, you can ask Adam to place it on all (or a subset of) compute nodes.
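
As an illustration of the scratch-space guideline, here is a minimal sketch of a job script that stages a dataset into /scratch only when it is not already present on the node. The dataset name and paths are placeholders, and the final echo simply shows the GPU(s) the scheduler assigned via CUDA_VISIBLE_DEVICES (guideline 2).

#!/bin/bash
# Sketch only: stage data into node-local scratch space (paths are placeholders).
SCRATCH_DIR=/scratch/$USER
DATASET=$SCRATCH_DIR/my_dataset
mkdir -p "$SCRATCH_DIR"

# Copy from the NFS home directory only if the dataset is not already on this node.
if [ ! -d "$DATASET" ]; then
    cp -r "$HOME/datasets/my_dataset" "$SCRATCH_DIR/"
fi

# The scheduler sets CUDA_VISIBLE_DEVICES to the GPU(s) assigned to this job.
echo "Assigned GPUs: $CUDA_VISIBLE_DEVICES"

# ... run your computation against $DATASET here ...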

Submitting jobs

All jobs are submitted by logging in via ssh to the head node slurm.ttic.edu which, along with all compute nodes, mounts your NFS home directory. Jobs are run by submitting them to the slurm scheduler, which then executes them on one of the compute nodes.

In this section, we provide information to get you started with using the scheduler, and about details of the TTIC cluster setup. For complete documentation, see the man pages on the slurm website.

Understanding Partitions

All jobs in slurm are submitted to a partition---which defines whether the submission is a GPU or CPU job, the set of nodes it can run on, and the priority it will have in the queue. Different users have access to different partitions, based on their group's contributions to the cluster.

Please consult with your faculty adviser (or the IT director) to determine which partitions you have access to.

All of the above partitions have a strict time limit of 4 hours per job, and jobs that do not finish in this time will be killed. However, the cluster also has -long partitions, containing a subset of nodes, that allow for longer-running jobs with a limit of 4 days. To use this longer time limit, submit your jobs to the partitions contrib-cpu-long, contrib-gpu-long, <group>-cpu-long, or <group>-gpu-long as appropriate.
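
For example, a job expected to run for more than 4 hours (but under 4 days) might be submitted as follows (the script name is a placeholder):

sbatch -p contrib-gpu-long -c1 long_train.sh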

Memory Usage

Currently, each CPU core you request comes with roughly 5GB of memory. If you need more than that, either ask for more cores or use the -C highmem constraint, which will ensure your job only runs on nodes with 256G of memory.
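
For instance, a job needing around 20GB of memory could either request 4 cores or target the high-memory nodes; both commands below are sketches with a placeholder script name and partition:

sbatch -p contrib-cpu -c4 job.sh              # roughly 4 x 5GB of memory
sbatch -p contrib-cpu -C highmem -c1 job.sh   # run only on a 256G node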

Batch Jobs

The primary way to submit jobs is through the command sbatch. In this regime, you write the commands you want executed into a script file (typically, a bash script). It is important that the first line of this file is a shebang line to the script interpreter: in most cases, you will want to use #!/bin/bash.

The sbatch command also takes the following options (amongst others): -p to select a partition, -c to request a number of CPU cores, -C to constrain the job to nodes with a given feature, -J to set a job name, and -d to specify dependencies between jobs.

There are many other options you can pass to sbatch to customize your jobs (for example, to submit array jobs, to use MPI, etc.). See the sbatch man page.
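
As a starting point, here is a minimal sketch of such a script using options described in this document; the partition, job name, and command are placeholders you should replace with your own:

#!/bin/bash
#SBATCH -p contrib-cpu      # partition to submit to (use one you have access to)
#SBATCH -c 2                # number of CPU cores to request
#SBATCH -J my_experiment    # job name (placeholder)

# The commands to run follow the #SBATCH directives.
python my_experiment.py

Submit it with sbatch followed by the script name; by default, standard output and error are written to a slurm-<jobid>.out file in the directory you submitted from.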

Array Jobs

You can submit array jobs using the python script below. You must provide an input file that contains the commands you wish to run (one command per line, one line per job) and a partition on which to run the jobs. The script will then write out batch-commands-$.txt and sbatch-script-$.txt files, splitting your input file into batches of 5000 commands if necessary. You then submit the jobs by running sbatch sbatch-script-$.txt. You can also optionally supply a job name and constraint with -J and -C respectively.

#!/usr/bin/env python

import argparse

parser = argparse.ArgumentParser(description='TTIC SLURM sbatch script creator')
parser.add_argument('INPUT_FILE', help='Input file with list of commands to run')
parser.add_argument('PARTITION', help='Name of partition to use')
parser.add_argument('-C', '--constraint', help='Constraint to use')
parser.add_argument('-J', '--job-name', help='Name of the job')

args = parser.parse_args()

def gen_sbatch_end(constraint, job_name):
  if constraint and job_name:
    sbatch_end = ' --constraint=' + args.constraint + ' --job-name=' + args.job_name
  elif constraint:
    sbatch_end = ' --constraint=' + args.constraint
  elif job_name:
    sbatch_end = ' --job-name=' + args.job_name
  else:
    sbatch_end = ''
  return sbatch_end

file_in = open(args.INPUT_FILE, 'r')
lines = file_in.readlines()

count = 0
commands = []
while count < len(lines):
  if count % 5000 == 0 and count > 0:
    index = count // 5000  # integer division so batch indices are not floats under Python 3
    file_out = open('batch-commands-' + str(index) + '.txt', 'w')
    for i in commands:
      file_out.write(i.strip() + '\n')
    file_out.close()
    file_out = open('sbatch-script-' + str(index) + '.txt', 'w')
    file_out.write('#!/bin/bash\n')
    sbatch_end = gen_sbatch_end(args.constraint, args.job_name)
    file_out.write('#SBATCH --partition=' + args.PARTITION + ' --cpus-per-task=1 --array=1-' + str(len(commands)) + sbatch_end + '\n')
    file_out.write('bash -c "`sed "${SLURM_ARRAY_TASK_ID}q;d" '+'batch-commands-'+str(index)+'.txt'+'`"')
    file_out.close()
    commands = []
  commands.append(lines[count])
  count += 1

file_out = open('batch-commands-last.txt', 'w')
for i in commands:
  file_out.write(i.strip() + '\n')
file_out.close()
file_out = open('sbatch-script-last.txt', 'w')
file_out.write('#!/bin/bash\n')
sbatch_end = gen_sbatch_end(args.constraint, args.job_name)
file_out.write('#SBATCH --partition=' + args.PARTITION + ' --cpus-per-task=1 --array=1-' + str(len(commands)) + sbatch_end + '\n')
file_out.write('bash -c "`sed "${SLURM_ARRAY_TASK_ID}q;d" '+'batch-commands-last.txt'+'`"')
file_out.close()
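
Assuming you save the script above as make_arrays.py (the name is arbitrary), a typical invocation might look like the following; the input file, partition, and job name are placeholders:

python make_arrays.py commands.txt contrib-cpu -J my_array
sbatch sbatch-script-last.txt    # also submit sbatch-script-1.txt, etc., if your input was split into batches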

Interactive jobs

While we recommend that you use batch jobs for the majority of tasks submitted to the cluster, it may occasionally be necessary to run programs interactively, for example to set up your experiments for the first time. You can use the srun command to request an interactive shell on a compute node.

Call srun with the same options as sbatch above to specify the partition, number of cores, etc., followed by the option --pty bash. For example, to request a shell with a single GPU with at least 11GB of memory on the gpu partition, run

srun -p gpu -c1 -C 11g --pty bash

Note that interactive jobs are subject to the same time limits and priority as batch jobs, which means that you might have to wait for your job to be scheduled, and that your shell will be automatically killed after the time limit expires.

Job Sequences for Dealing with Time limits

Let's say you have split up your job into a series of three script files called: optimize_1.sh, optimize_2.sh, optimize_3.sh --- each of which runs under the cluster's time limit, and picks up from where the last left off. You can request that they be executed as separate jobs in sequence on the cluster.

Pick a unique "job name" for the sequence (let's say "series_A"). Then, just submit the three batch jobs in series using sbatch, with the additional command parameters -J series_A -d singleton. For example:

sbatch -p gpu -c1 -J series_A -d singleton optimize_1.sh
sbatch -p gpu -c1 -J series_A -d singleton optimize_2.sh
sbatch -p gpu -c1 -J series_A -d singleton optimize_3.sh

All three jobs will be immediately added to the queue, and if there are slots free, optimize_1.sh will start running. But optimize_2.sh will NOT start until the first job is done, and similarly, optimize_3.sh will only be started after the other two jobs have ended. Note that there is no guarantee that they will start on the same machine.

The singleton dependency essentially requires that previously submitted jobs with the same name (by the same user) have finished. There is a caveat, however---the subsequent job will be started even if the previous job failed or was killed (for example, because it overshot the time limit). So your scripts should be robust to the possibility that the previous job may have failed.

Note that you can have multiple such sequences running in parallel by giving them different names.

Monitoring your usage

Once your jobs have been scheduled, you can keep an eye on them using both command line tools on the login host, as well as on the cluster website http://slurm.ttic.edu/. At the very least, you should monitor your jobs to ensure that their processor usage is not exceeding what you requested when submitting these jobs.

The cluster website provides a listing of scheduled and waiting jobs in the cluster queue, shows statistics of the load on the cluster, and provides details (from the output of ps and nvidia-smi) of the processes corresponding to jobs running on the cluster.

You can also use the slurm command line tool squeue to get a list of jobs in the queue (remember to call it with the -a option to see all jobs, including those in other groups' partitions that you may not have access to). To get a listing like the output on the website, which organizes job sequences into single entries, you can run xqueue.py.

Finally, use the scancel command to cancel any of your running or submitted jobs. See the scancel man page for details on how to call this command. In particular, if you are using job sequences, you can use the -n series_name option to cancel all jobs in a sequence.
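
For example (the sequence name is a placeholder):

squeue -a              # list all jobs in the queue
squeue -u $USER        # list only your own jobs
scancel <jobid>        # cancel a specific job by its job id
scancel -n series_A    # cancel all jobs in the sequence named series_A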

List of Node Names & Features

Node-name | GPU Public | CPU Public | CPU Cores | RAM | GPU(s) | Feature labels
cpu2 | N/A | N | 8 | 48G | None | avx,cpuonly
cpu3 | N/A | N | 8 | 48G | None | avx,cpuonly
cpu4 | N/A | N | 8 | 48G | None | avx,cpuonly
cpu5 | N/A | N | 8 | 48G | None | avx,cpuonly
cpu6 | N/A | N | 8 | 24G | None | cpuonly
cpu7 | N/A | N | 8 | 24G | None | cpuonly
cpu8 | N/A | N | 8 | 24G | None | cpuonly
cpu9 | N/A | N | 8 | 24G | None | cpuonly
cpu10 | N/A | N | 12 | 64G | None | avx,cpuonly
cpu11 | N/A | N | 12 | 128G | None | avx,cpuonly
cpu12 | N/A | N | 12 | 128G | None | avx,cpuonly
cpu13 | N/A | N | 12 | 128G | None | avx,cpuonly
cpu14 | N/A | Y | 12 | 48G | None | cpuonly
cpu15 | N/A | Y | 12 | 48G | None | cpuonly
cpu16 | N/A | Y | 12 | 48G | None | cpuonly
cpu17 | N/A | Y | 12 | 48G | None | cpuonly
cpu18 | N/A | N | 20 | 128G | None | avx,cpuonly
cpu19 | N/A | N | 8 | 24G | None | cpuonly
cpu20 | N/A | N | 8 | 24G | None | cpuonly
cpu21 | N/A | N | 8 | 24G | None | cpuonly
cpu22 | N/A | N | 8 | 24G | None | cpuonly
cpu23 | N/A | N | 64 | 256G | None | avx,cpuonly,highmem
gpu-g1 | N | N | 20 | 256G | 10x 1080 Ti | 11g,1080ti,highmem,avx
gpu-[c,g]2 | Y | Y | 16 | 256G | 4x 1080 Ti | 11g,1080ti,highmem,avx
gpu-[c,g]3 | Y | Y | 28 | 256G | 4x 2080 Ti | 11g,2080ti,highmem,avx
gpu-[c,g]10 | Y | N | 6 | 64G | 2x Titan X | 11g,12g,txpascal,avx
gpu-g11 | N | N | 20 | 192G | 4x 2080 Ti | 11g,2080ti,highmem,avx
gpu-[c,g]12 | Y | Y | 16 | 256G | 4x 1080 Ti | 11g,1080ti,highmem,avx
gpu-[c,g]13 | Y | Y | 16 | 256G | 4x 2080 Ti | 11g,2080ti,highmem,avx
gpu-[c,g]14 | N | Y | 16 | 256G | 4x 1080 Ti | 11g,1080ti,highmem,avx
gpu-[c,g]15 | N | Y | 16 | 256G | 4x 2080 Ti | 11g,2080ti,highmem,avx
gpu-[c,g]16 | N | Y | 16 | 256G | 4x A6000 | 11g,12g,24g,48g,a6000,highmem,avx
gpu-[c,g]17 | Y | Y | 16 | 256G | 4x Titan V | 11g,12g,titanv,highmem,avx
gpu-g19 | N/A | N/A | 8 | 128G | 2x RTX 6000 |
gpu-g20 | N/A | N/A | 4 | 64G | 2x 1080 Ti |
gpu-g21 | N/A | N/A | 4 | 64G | 2x RTX 8000 |
gpu-g22 | N/A | N/A | 4 | 64G | 2x 1080 Ti |
gpu-g23 | N/A | N/A | 4 | 64G | 2x RTX 6000 |
gpu-g24 | N/A | N/A | 8 | 64G | 2x 1080 Ti |
gpu-g25 | N/A | N/A | 4 | 64G | 2x 1080 Ti |
gpu-g26 | N/A | N/A | 6 | 64G | 2x 1080 Ti |
gpu-g27 | N/A | N/A | 8 | 128G | 4x 2080 Ti |
gpu-g28 | N/A | N/A | 20 | 192G | 8x A6000 |
gpu-[c,g]29 | N | N | 20 | 192G | 4x Titan RTX | 11g,12g,24g,titanrtx,highmem,avx
gpu-g30 | N | N | 20 | 192G | 8x 2080 Ti | 11g,2080ti,highmem,avx
gpu-[c,g]31 | N | N | 24 | 192G | 4x 2080 Ti | 11g,2080ti,highmem,avx
gpu-[c,g]32 | N | N | 24 | 192G | 4x 2080 Ti | 11g,2080ti,highmem,avx
gpu-g33 | N | N | 20 | 192G | 8x 2080 Ti | 11g,2080ti,highmem,avx
gpu-g34 | N/A | N/A | 8 | 64G | 2x 2080 Ti |
gpu-[c,g]35 | N | Y | 16 | 256G | 4x 2080 Ti | 11g,2080ti,highmem,avx
gpu-g36 | N | N | 20 | 192G | 8x 2080 Ti | 11g,2080ti,highmem,avx
gpu-[c,g]37 | N | Y | 20 | 256G | 4x 2080 Ti | 11g,2080ti,highmem,avx
gpu-g38 | N | N/A | 20 | 192G | 8x A6000 | 11g,12g,24g,48g,a6000,highmem,avx
gpu-g39 | N | N | 8 | 128G | 2x 2080 Ti |
gpu-g40 | N | N | 12 | 32G | 1x 1080 Ti |

Software Tips

Jupyter Notebook

First you will need to install jupyter notebook. Here are a couple of options. The examples below will use the virtualenv option.

  1. Install Anaconda (Miniconda is preferred)
  2. Create a python virtualenv
virtualenv ~/myenv # create the virtualenv
. ~/myenv/bin/activate # activate the env
pip install --upgrade pip # it's always a good idea to update pip
pip install jupyter # install jupyter

You can run the jupyter notebook as either an interactive or batch job.

Jupyter File Locations

We recommend setting your jupyter environment variables so that they are not located on NFS directories and instead use node local /scratch space.

mkdir -p /scratch/$USER/jupyter
export JUPYTER_CONFIG_DIR=/scratch/$USER/jupyter
export JUPYTER_PATH=/scratch/$USER/jupyter
export JUPYTER_DATA_DIR=/scratch/$USER/jupyter
export JUPYTER_RUNTIME_DIR=/scratch/$USER/jupyter
export IPYTHONDIR=/scratch/$USER/ipython

Interactive

srun --pty bash                                              # run an interactive job
. ~/myenv/bin/activate                                       # activate the virtual env
unset XDG_RUNTIME_DIR                                        # jupyter uses this variable to store some files; by default it is set to '', which causes errors when running jupyter notebook
export NODEIP=$(hostname -i)                                 # get the IP address of the node you are using
export NODEPORT=$(( $RANDOM + 1024 ))                        # get a random port above 1024
echo $NODEIP:$NODEPORT                                       # echo the values to use later
jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser  # start the jupyter notebook

Make a new ssh connection with a tunnel to access your notebook

ssh -N -L 8888:$NODEIP:$NODEPORT user@slurm.ttic.edu   # substitute the actual values you echoed above, not the variable names

This will create an ssh tunnel on your local machine that forwards traffic sent to localhost:8888 to $NODEIP:$NODEPORT. This command will appear to hang, since the -N option tells ssh not to run any commands (including a shell) on the remote machine.

Open your local browser and visit: http://localhost:8888

Batch

The process for a batch job is very similar.

jupyter-notebook.sbatch

#!/bin/bash
unset XDG_RUNTIME_DIR
NODEIP=$(hostname -i)
NODEPORT=$(( $RANDOM + 1024))
echo "ssh command: ssh -N -L 8888:$NODEIP:$NODEPORT `whoami`@slurm.ttic.edu"
. ~/myenv/bin/activate
jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser

Check the output of your job to find the ssh command to use when accessing your notebook.

Make a new ssh connection to tunnel your traffic. The format will be something like:

ssh -N -L 8888:###.###.###.###:#### user@slurm.ttic.edu

This command will appear to hang, since the -N option tells ssh not to run any commands (including a shell) on the remote machine.

Open your local browser and visit: http://localhost:8888

Tensorflow

TensorFlow does not respect the common environment variables that restrict the number of threads in use. If you add the following code to your TensorFlow setup, it will respect the number of threads requested with the -c option.

import os
import tensorflow as tf

# OMP_NUM_THREADS is set by the scheduler based on the number of cores requested with -c.
NUM_THREADS = int(os.environ['OMP_NUM_THREADS'])
sess = tf.Session(config=tf.ConfigProto(
    intra_op_parallelism_threads=NUM_THREADS,
    inter_op_parallelism_threads=NUM_THREADS))

Torch

When installing torch from source, you will need to modify the options it passes to the Linux cp command (the sed line below removes the -p flag):

git clone https://github.com/torch/distro.git ~/torch --recursive
sed -i 's/-pPR/-PR/' ~/torch/exe/luajit-rocks/luarocks/src/luarocks/fs/unix/tools.lua
cd ~/torch; ./install.sh

[For Faculty] Contributing to the Cluster

If you are a faculty member at TTIC, we would like to invite you to contribute machines or hardware to the cluster. A high-level outline of what this would mean for you is below.

The basic principle we intend to follow is that the setup should provide people, on average, with access to more resources than they have on their own, and to manage these pooled resources to maximize efficiency and throughput.

If you contribute hardware, you will gain access to the entire cluster (see description of partitions above), and be given a choice between two options for high-priority access:

  1. You (which means you and any users you designate as being in your "group") will be guaranteed access to your machines within a specified time window from when you request it (the window is 4 hours, i.e., the time-limit for any other users' job that may be running on your machine). Once you get this access, you can keep it as long as you need it. Effectively, you decide when you want to let others use your machines, with a limited waiting period when you want them back.

  2. You give up the ability to guarantee on-demand access to your specific machines, in exchange for a higher priority for your jobs on the entire cluster. You still can reserve your machines for up to four weeks per year (possibly in installments of at least a week each time).

Some important notes: