Running Jobs

    Introduction

    IDUN uses the Slurm Workload Manager to manage the provided resources and to schedule jobs on these resources.

    We have several videos with practical examples on our IDUN HPC YouTube channel.

    IDUN has 3 partitions (queues):

    • CPUQ - for jobs that need only CPUs.
    • GPUQ - for jobs that will use GPUs.
    • short - for testing and development, with a 20-minute time limit.

    Current job time limits:

    • CPUQ - maximum 30 days
    • GPUQ - maximum 14 days

    If you need a longer time limit, send a request to the help desk with the job ID, asking for more hours for that specific job.

    Which group account to use? CPU quota limits.

    Most users have access to more than one group account. You can find your group accounts with this command (replace USER_NAME with your username):

    $ sacctmgr show assoc format=Account%15,User,QOS | grep -e QOS -e USER_NAME
            Account       User                  QOS 
             my-dep    USER_NAME             normal 
       share-my-dep    USER_NAME               high 

    Group accounts that start with share-* are high-priority accounts created for IDUN shareholders. We recommend using a share-* account by default if you have one.

    Accounts whose names do not start with share- have normal job priority. They exist for two purposes:

    • group accounts created for non-shareholders.
    • group accounts created for shareholders to fall back on after they have used up their CPU quota (emergency cases).

    Every group account has a CPU-hours quota limit. Use this command to check quota usage for the current month:

    $ idun-slurm-quota
    Account               Quota(hours)  CPU(hours)  A100(hours) H100(hours) P100(hours) V100(hours) Cost(kr)    
    . . .                 . . .         . . .       . . .       . . .       . . .       . . .       . . .

    Quota usage for all group accounts is reset on the first day of each month.

    Simple CPU job

    CPUQ is the default partition for pure CPU jobs. If you need GPUs, please use the GPUQ partition.

    #!/bin/sh
    #SBATCH --partition=CPUQ
    #SBATCH --account=<account>
    #SBATCH --time=0-00:15:00     # 0 days and 15 minutes limit
    #SBATCH --nodes=1             # 1 compute node
    #SBATCH --cpus-per-task=2     # 2 CPU cores
    #SBATCH --mem=5G              # 5 gigabytes memory
    #SBATCH --output=hello.txt    # Log file
    echo "Hello IDUN"

    Note that you need to replace <account> with your allocation account.

    Save the script, e.g. as job.slurm, and submit the job using the sbatch command:

    $ sbatch job.slurm
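
    sbatch prints the ID of the submitted job. You can then check the job's status in the queue, for example:

    $ squeue -u <username>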

    GPU job

    To get access to GPU resources, you need to request them in your job script. You may choose between:

    • p100 - NVIDIA GPU with 16GB memory
    • v100 - NVIDIA GPU with 16GB or 32GB memory
    • a100 - NVIDIA GPU with 40GB or 80GB memory, available as PCIe or NVLink (sxm4) cards
    • h100 - NVIDIA GPU with 80GB memory

    NVIDIA specifications:

                                     p100    v100    a100    h100
    Single-Precision FP32 TFLOPS     9.3     14      19.5    67
    Double-Precision FP64 TFLOPS     4.7     7       9.7     34

    Our benchmark results:

                                p100              v100              a100              h100
    Tensorflow ResNet-50        239 images/sec    379 images/sec    891 images/sec    -
    HPCG CUDA                   95 GFLOPS         140 GFLOPS        241 GFLOPS        534 GFLOPS
    Hashcat MD5                 27129 MH/s        50663 MH/s        60503 MH/s        120.2 GH/s
    Blender 4.0.1 Classroom     00:58.27 sec      00:30.41 sec      00:20.28 sec      00:12.99 sec

    GPUs are available in the GPUQ partition. Use this line in the job script:

    #SBATCH --partition=GPUQ

    IDUN uses these GPU names:

    • p100
    • v100
    • a100

    IDUN uses these constraint names (features):

    • p100
    • v100
    • a100
    • gpu16g
    • gpu32g
    • gpu40g
    • gpu80g
    • sxm4
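
    To check which features and GPU types each node actually provides, you can list them with sinfo, for example (the field widths are just a suggestion):

    $ sinfo -o "%20N %8c %10m %30f %20G"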

    Examples

    If you need one GPU:

    #SBATCH --gres=gpu:1

    If you need two p100 GPUs:

    #SBATCH --gres=gpu:p100:2

    If you depend on a specific type of GPU (p100, v100, a100), for example two Volta (v100) GPUs with 32GB memory only:

    #SBATCH --gres=gpu:v100:2
    #SBATCH --constraint=gpu32g
    

    It is also possible to combine constraints with logical AND ("&") and logical OR ("|"):

    Three GPUs but only v100 or a100:

    #SBATCH --gres=gpu:3
    #SBATCH --constraint="v100|a100"
    

    Four GPUs, but only a100 with sxm4:

    #SBATCH --gres=gpu:4
    #SBATCH --constraint="a100&sxm4"
    

    Five a100 GPUs, but only with 40 or 80 GB memory:

    #SBATCH --gres=gpu:a100:5
    #SBATCH --constraint="gpu40g|gpu80g"

    Example:

    #!/bin/sh
    #SBATCH --partition=GPUQ
    #SBATCH --account=<account>
    #SBATCH --time=00:30:00
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=2
    #SBATCH --mem=10G              # 10 gigabytes of system memory (not GPU memory)
    #SBATCH --gres=gpu:2  
    #SBATCH --job-name="LBM_CUDA"
    #SBATCH --output=lbm_cuda.out
    module purge
    module load fosscuda/2018b
    mpirun hostname
    srun ./my_cudacode             # placeholder for your compiled CUDA program
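
    To verify which GPUs were allocated to your job, you can for example add these lines to the job script (CUDA_VISIBLE_DEVICES is typically set by Slurm for GPU jobs; nvidia-smi must be available on the node):

    echo "Allocated GPUs: $CUDA_VISIBLE_DEVICES"   # GPU indices assigned to the job
    nvidia-smi                                     # list the GPUs visible to the job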

    MPI job

    job.slurm (example with compiled C or Fortran code).

    #!/bin/sh
    #SBATCH --partition=CPUQ
    #SBATCH --account=<account>
    #SBATCH --time=00:15:00
    #SBATCH --nodes=2              # 2 compute nodes
    #SBATCH --ntasks-per-node=1    # 1 MPI process per node
    #SBATCH --mem=12000            # 12GB - in megabytes
    #SBATCH --job-name="hello_test"
    #SBATCH --output=test-srun.out
    #SBATCH --mail-user=<email>
    #SBATCH --mail-type=ALL
    
    WORKDIR=${SLURM_SUBMIT_DIR}
    cd ${WORKDIR}
    echo "we are running from this directory: $SLURM_SUBMIT_DIR"
    echo " the name of the job is: $SLURM_JOB_NAME"
    echo "Th job ID is $SLURM_JOB_ID"
    echo "The job was run on these nodes: $SLURM_JOB_NODELIST"
    echo "Number of nodes: $SLURM_JOB_NUM_NODES"
    echo "We are using $SLURM_CPUS_ON_NODE cores"
    echo "We are using $SLURM_CPUS_ON_NODE cores per node"
    echo "Total of $SLURM_NTASKS cores"
    
    module purge
    module load intel/2020b
    module list
    srun ./myprogram
    

    Switch topology is enabled on IDUN. You can list the topology with this command:

    $ scontrol show topo

    If you need your job to run only on nodes connected to a single switch, add this line to the job script:

    #SBATCH --switches=1

    If your job is an MPI job and you want to run it with several tasks across several nodes, adjust the options accordingly.

    If you need to run your job on two nodes with ten tasks (cores) on each node:

    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=10

    Use the same module in the job script as the one used when compiling the code.
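
    For example, a small MPI program could be compiled on the login node with the same toolchain module as in the job script above (mpicc is the generic MPI compiler wrapper; hello_mpi.c is just a placeholder source file):

    $ module purge
    $ module load intel/2020b
    $ mpicc hello_mpi.c -o myprogram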

    Array jobs

    To run many sub-jobs from the same job script, use --array:

    #SBATCH --array=1-10

    This starts 10 sub-jobs; each sub-job gets its own array ID in the environment variable $SLURM_ARRAY_TASK_ID.

    Example:

    #!/bin/sh
    #SBATCH --partition=CPUQ
    #SBATCH --account=<account>
    #SBATCH --time=00:15:00
    #SBATCH --nodes=1
    #SBATCH -c 28
    #SBATCH --mem=12000
    #SBATCH --array=1-10
    #SBATCH --job-name="hello_test"
    #SBATCH --output=test-srun.out
    #SBATCH --mail-user=<email>
    #SBATCH --mail-type=ALL
     
    WORKDIR=${SLURM_SUBMIT_DIR}
    cd ${WORKDIR}
    
    module purge
    module load intel/2020b
    module list
    ./myprogram $SLURM_ARRAY_TASK_ID

    This example starts 10 sub-jobs, and each program run receives its individual task ID (from 1 to 10) as an input argument.

    Other array job settings:

    Specification   Resulting SLURM_ARRAY_TASK_IDs
    1,4,42          # 1, 4, 42
    1-5             # 1, 2, 3, 4, 5
    0-10:2          # 0, 2, 4, 6, 8, 10 (step 2)
    32,56,100-200   # 32, 56, 100, 101, 102, ..., 200
    1-200%10        # 1, 2, ..., 200, but maximum 10 running at the same time
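
    Note that the array examples above send all sub-job output to a single --output file. To get one log file per sub-job, you can use the %A (job ID) and %a (array task ID) filename patterns, for example:

    #SBATCH --output=hello_test_%A_%a.out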

    One more job array example:

    ### Job array script:
    
    $ cat myjob.slurm
    #!/bin/bash
    #SBATCH --job-name=array_calc
    #SBATCH --partition=CPUQ
    #SBATCH --account=<account>
    #SBATCH --time=5:00:00
    #SBATCH --mem=4G
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --array=1-800
    module purge
    module load intel/2022a
    DATASET=dataset.$SLURM_ARRAY_TASK_ID
    srun /bin/bash -lc "echo $DATASET"
    
    ### Start job array
    
    $ sbatch myjob.slurm
    
    ### One of output files example:
    
    $ cat slurm-22635052_8.out
    dataset.8
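
    A common pattern is to let the array task ID select one input per sub-job, for example one line from a list of input files. A minimal sketch (filelist.txt is a hypothetical file with one input path per line):

    INPUT=$(sed -n "${SLURM_ARRAY_TASK_ID}p" filelist.txt)
    ./myprogram "$INPUT"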

    Interactive Job

    A user can also request node(s) for an interactive job:

    $ salloc --account=<account> --cpus-per-task=1 --partition=CPUQ --time=00:30:00
    salloc: Pending job allocation 416189
    salloc: job 416189 queued and waiting for resources
    salloc: job 416189 has been allocated resources
    salloc: Granted job allocation 416189
    salloc: Waiting for resource configuration
    salloc: Nodes idun-99-77 are ready for job
    
    $ ssh idun-99-77

    The above example requests one CPU core in the CPUQ partition for 30 minutes. After the allocation has been granted, the user logs into the allocated node with the ssh command.
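
    When you are done, log out of the compute node and exit the salloc shell to release the allocation (or cancel it with scancel), for example:

    $ exit                 # leave the compute node (the ssh session)
    $ exit                 # exit the salloc shell and release the allocation
    $ scancel 416189       # alternatively, cancel the allocation by job ID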

    An alternative is to use the srun command:

    srun --account=<account> --cpus-per-task=1 --partition=CPUQ --time=1-00:30:00 --pty bash

    WARNING: Do not use the options below together with "srun" and "--pty bash":
    --ntasks-per-node=
    --ntasks=
    --nodes=

    Otherwise your job will be terminated after about 60 seconds with this message:

    srun: Job step aborted: Waiting up to 62 seconds for job step to finish.

    Use --cpus-per-task= instead.
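
    The same approach works for an interactive GPU session, for example (a sketch; adjust the account, resources and time limit to your needs):

    srun --account=<account> --partition=GPUQ --gres=gpu:1 --cpus-per-task=4 --time=02:00:00 --pty bash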

    Running Graphical User Interface (GUI) Applications

    In order to use GUI applications, it is necessary to use the -X flag when logging into the cluster:

    XForwarding
    $ ssh -X <username>@idun-login1.hpc.ntnu.no
    <username>@idun-login1.hpc.ntnu.no's password:
    
    [<username>@idun-login1 ~]$

    This enables forwarding of a GUI application's windows to your desktop machine. After login, an interactive job can be started for running the GUI application:

    [<username>@idun-login1 ~]$ salloc --account=<account> --nodes=1 --partition=CPUQ --time=00:30:00
    salloc: Pending job allocation 416194
    salloc: job 416194 queued and waiting for resources
    salloc: job 416194 has been allocated resources
    salloc: Granted job allocation 416194
    salloc: Waiting for resource configuration
    salloc: Nodes compute-1-0-27 are ready for job
    
    [<username>@idun-login1 ~]$ ssh -X compute-1-0-27
    
    [<username>@compute-1-0-27 ~]$ xclock

    The first command reserves a single node in the CPUQ partition for 30 minutes. After the allocation has been granted, it is possible to log into this node using the -X flag and start the GUI application (xclock in the above example).

    Python job script example (Slurm)

    Example with Python/3.8.6-GCCcore-10.2.0. Use "module spider python" to find available versions.

    #!/bin/bash
    #SBATCH --job-name="my-job"   # Sensible name for the job
    #SBATCH --account=<account>   # Account for consumed resources
    #SBATCH --nodes=1             # Allocate 1 node for the job
    #SBATCH -c 28                 # Number of cores (can vary)
    #SBATCH --time=00-00:10:00    # Upper time limit for the job (DD-HH:MM:SS)
    #SBATCH --partition=CPUQ
    
    module load Python/3.8.6-GCCcore-10.2.0
    
    python mypython.py 

    For parallel Python see: Parallel Python – High Performance Computing Group (ntnu.no)

    Matlab job script example (Slurm):

    #!/bin/bash
    #SBATCH --job-name="my-job"   # Sensible name for the job
    #SBATCH --account=<account>   # Account for consumed resources
    #SBATCH --nodes=1             # Allocate 1 node for the job
    #SBATCH -c 28                 # Number of cores (can vary)
    #SBATCH --time=00-00:10:00    # Upper time limit (DD-HH:MM:SS)
    #SBATCH --partition=CPUQ
    
    module load MATLAB/2021b
    
    matlab -nodisplay -nodesktop -nosplash -nojvm -r "test"

    (NOTE: If you are using the Parallel Computing Toolbox, remove -nojvm, i.e. run: matlab -nodisplay -nodesktop -nosplash -r "test")

    To check Matlab versions, type: module spider matlab

    For parallel Matlab (using MPI) see: Distributed Matlab (using MPI) – High Performance Computing Group (ntnu.no)

    COMSOL job script example (Slurm):

    #!/bin/bash
    ########################################################
    #
    #  Running COMSOL cluster job
    #
    #  The default setup is to run COMSOL in hybrid mode
    #  with 2 MPI processes per node ('mpiprocs'), one per 
    #  socket, and 8 threads ('ompthreads') per MPI process
    #
    ########################################################
    #
    #SBATCH --partition=CPUQ    # partition the batch job will be put in
    #SBATCH --account=<account>  # Account for consumed resources
    #SBATCH --time=00:30:00      # the walltime length of the job (30 minutes)
    #SBATCH --nodes=2            # number of nodes requested
    #SBATCH --ntasks-per-node=2  # number of processes per node
    #SBATCH --job-name="comsol_example"  # name of the job
    #SBATCH --output=comsol.out  # name of output file
    #SBATCH -J COMSOL_example   # Name for the job 
     
    module load COMSOL/5.3a
    
    
    # ${SLURM_SUBMIT_DIR} - directory from which the job was submitted
    # Name of the COMSOL model file (without .mph); "mymodel" is a placeholder, adjust to your own model
    case=${case:-mymodel}

    # Create (if necessary) the working directory
    w=${SLURM_SUBMIT_DIR}/comsol/$case
    if [ ! -d $w ]; then mkdir -p $w; fi
     
    # Copy inputfile and move to working directory
    cp $case.mph $w
    cd $w
     
    comsol batch -mpibootstrap ssh -inputfile $case.mph -outputfile out.mph -tmpdir $w
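
    With the case variable defined as above, the script is submitted with sbatch as usual. If you prefer, the model name can instead be passed on the command line via --export (comsol_job.slurm is a placeholder for the script file name):

    $ sbatch --export=ALL,case=mymodel comsol_job.slurm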

    Commands to control Slurm jobs

    # Get all jobs
    squeue
     
    # Get all jobs for a user, <only pending | only running>, in <partition>
    squeue -u username <-t PENDING|-t RUNNING> <-p partition>
     
    # Show detailed info on <jobid>
    scontrol show jobid -dd <jobid>
     
    # Cancel a specific <jobid>
    scancel <jobid>
     
    # cancel all <pending> jobs for <username>
    scancel <-t PENDING> -u <username>
    
    # show available resources
    sinfo -o "%10P %5D %34N  %5c  %7m  %47f  %23G"
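
    # For completed jobs, show accounting information (example format fields; adjust as needed)
    sacct -j <jobid> --format=JobID,JobName,Partition,Elapsed,State,MaxRSS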