How do I log in to Vilje?
See How to log in
How can I reset my user account password?
Can I ssh to Vilje from off campus?
It is not possible to ssh directly into Vilje from off-campus except in special circumstances. We recommend that you use VPN. You can also log in using ssh via login.ansatt.ntnu.no or login.stud.ntnu.no.
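For example, the login-server route can be taken as a two-step hop (replace username with your own NTNU username):

```shell
# First ssh to an NTNU login server, then on to Vilje from there
$ ssh username@login.ansatt.ntnu.no
$ ssh vilje.hpc.ntnu.no
```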
How do I check my disk quotas?
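Assuming the /work filesystem is Lustre-based (an assumption here, not stated above), usage and limits can typically be checked with the lfs utility; home-directory quotas are often reported with the standard quota tool:

```shell
# Show your usage and limits on /work, if it is a Lustre filesystem (assumption)
$ lfs quota -u $USER /work
# Show home-directory quotas in human-readable units
$ quota -s
```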
What is the difference between cpus, cores and hyperthreading?
Each node has two CPUs
Each CPU has eight physical cores
Each CPU has sixteen logical cores (hyperthreading)
See the About Vilje page for a graphical layout of a compute node.
The shell command 'cost' does not give the same output as before.
The former cost command is now named cost_ntnu. The new cost command is the standard version for all NOTUR systems. Output from the two versions is shown below.
If you prefer the former version, make an alias; an example is shown at the end of the code block below:
$ cost -p ntnu935
Report for account ntnu935 on vilje.hpc.ntnu.no
Last updated on Fri Mar 14 13:49:48 2014
==================================================
Account                                 Core hours
==================================================
ntnu935  avail                            40936.99
ntnu935  usage                                0.00
ntnu935  reserved                             0.00
==================================================

$ cost_ntnu -p ntnu935
Report for account ntnu935 on vilje
Allocation period 2013.2
(start Tue Oct 01 00:00:01 2013)
(end Mon Mar 31 23:59:59 2014)
Last updated on Fri Mar 14 13:50:11 2014
==================================================
Account                                 Core hours
==================================================
ntnu935  avail                            40936.99
ntnu935  usage                             9063.01
ntnu935  reserved                             0.00
ntnu935  quota (pri)                      50000.00
ntnu935  quota (unpri)                          NA
--------------------------------------------------
ntnu935  toumas                            9055.75
ntnu935  gianda                               7.55
==================================================

$ alias cost='/sw/ntnu/software/utilities/bin/cost_ntnu'
The fluent license server is at the IVT faculty. The server is lisens01.ivt.ntnu.no. To get information about licenses, do:
$ module load fluent
$ lmstat -a -c firstname.lastname@example.org
I cannot login - "Connection closed by 188.8.131.52"
This is the typical response if your user account is disabled. Your account may be disabled because you did not change your password in time. Quite a few users experience this when they receive their password credentials from UNINETT Sigma2 for the first time. A password sent to your mobile phone is only valid for seven days.
I get an error message that contains ^M
If you receive an error message with the ^M character, for example:
"/bin/bash^M: bad interpreter: No such file or directory"
then you have copied a text file from Windows to Vilje. This text file therefore contains Windows line-end characters, which are different from UNIX line-end characters. You have to convert the text file to UNIX format by using the dos2unix command:
$ dos2unix filename
I get a locale error message on login: /usr/bin/manpath: can't set the locale; make sure $LC_* and $LANG are correct
You will need to change the locale setting in your Mac Terminal preferences. In your Mac Terminal select "Terminal" -> "Preferences" -> "Advanced" and uncheck "Set locale environment variables on startup". You need to stop/kill all your terminal sessions for the change to take effect. On startup the locale environment variables will be unset, and a Linux session following an ssh login will not inherit the locale setting from the Mac Terminal.
What does the error "Tcl command execution failed..." when loading a module file mean?
Some modules have dependencies that require other modules to be loaded first. E.g. trying to load the 'boost/1.53.0' module will result in:
$ module load boost/1.53.0
boost/1.53.0(30):ERROR:151: Module 'boost/1.53.0' depends on one of the module(s) 'intelcomp/13.0.1 '
boost/1.53.0(30):ERROR:102: Tcl command execution failed: prereq intelcomp/13.0.1
This means you must load 'intelcomp/13.0.1' before loading 'boost/1.53.0'.
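Loading the prerequisite first resolves the error:

```shell
# Load the compiler module that boost depends on, then boost itself
$ module load intelcomp/13.0.1
$ module load boost/1.53.0
```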
Where are the mpicc and mpif90 compiler wrappers located?
You need to load the mpt module file before building MPI applications:
$ type mpicc
-bash: type: mpicc: not found
$ module load mpt
$ type mpicc
mpicc is /sw/sgi/mpt/mpt-2.06/bin/mpicc
My code fails to link with the message "...relocation truncated to fit..."
This happens if your code needs more than 2GB of static data. Compile the code with the -mcmodel=medium -shared-intel options.
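For example, with the Intel C compiler (myprog.c is a placeholder source file name):

```shell
# Build with the medium memory model so static data may exceed 2GB
$ icc -mcmodel=medium -shared-intel -o myprog myprog.c
```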
How do I get support for C++11?
You will need to do a module load of gcc/6.2.0 after you have loaded the Intel compiler. If you only do 'module load intelcomp/17.0.0', 'icpc -v' will report compatibility with GCC 4.3.0, which has not implemented the C++11 standard. After a 'module load gcc/6.2.0', 'icpc -v' will report compatibility with GCC 6.2.0, which has implemented the C++11 standard.
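A minimal session, using the module versions mentioned above (myprog.cpp is a placeholder source file):

```shell
# Load the Intel compiler, then a newer GCC for the C++11 runtime/headers
$ module load intelcomp/17.0.0
$ module load gcc/6.2.0
# icpc now reports GCC 6.2.0 compatibility, so C++11 code compiles
$ icpc -std=c++11 -o myprog myprog.cpp
```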
I want to use a new compiler, but the module depends on an older one: ERROR:151: Module 'openjpeg/2.1.0' depends on one of the module(s) 'intelcomp/15.0.1 '
Modules can have prerequisites on older compilers. You may still use the module with a newer compiler: switch to the older compiler, load the module, and then switch back with 'module switch <old module> <new module>':
$ module list
Currently Loaded Modulefiles:
  1) intelcomp/16.0.1
$ module load openjpeg/2.1.0
openjpeg/2.1.0(32):ERROR:151: Module 'openjpeg/2.1.0' depends on one of the module(s) 'intelcomp/15.0.1 '
openjpeg/2.1.0(32):ERROR:102: Tcl command execution failed: prereq intelcomp/15.0.1
$ module switch intelcomp/16.0.1 intelcomp/15.0.1
$ module load openjpeg/2.1.0
$ module switch intelcomp/15.0.1 intelcomp/16.0.1
$ module list
Currently Loaded Modulefiles:
  1) intelcomp/16.0.1   2) openjpeg/2.1.0
Is there a scratch disk on each compute node? Is there a swap disk on each node?
No. The nodes are diskless. Please use the global filesystem on /work (it is the fastest and provides the highest I/O rate). Note that you should try to have few but large scratch files, and avoid task-local files such as one per process. Please do not exceed the per-node RAM: there is no swap space, and running out of memory will crash the node.
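A job script fragment along these lines keeps scratch I/O on /work (the /work/$USER path layout is an assumption; adjust it to your own directory there):

```shell
# Create a per-job scratch directory on the global /work filesystem
SCRATCH=/work/$USER/$PBS_JOBID
mkdir -p "$SCRATCH"
cd "$SCRATCH"
# ... run the application here, writing few but large scratch files ...
```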
Is hyperthreading enabled, and how can this be utilized?
Yes, hyperthreading is enabled by default on the nodes. This means that PBS sees two virtual cores for each physical core, i.e. on each compute node there are 32 virtual cores present. To utilize this, e.g. for an MPI application running on two nodes, specify 32 MPI processes (mpiprocs keyword) for each node:
#PBS -l select=2:ncpus=32:mpiprocs=32
Will MPI programs benefit from hyperthreading?
Only testing an application with its dataset will show whether a program will benefit from hyperthreading or not.
What does the error "qsub: Job has no walltime requested" mean?
Users must specify the wall clock limit for a job. See PBS Consumable Resources.
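A wall clock limit is requested with the walltime resource in the job script, for example (the 24-hour value is only an illustration):

```shell
#PBS -l walltime=24:00:00
```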
What is the estimated start time for my queued job?
The -Tiw options to the qstat command show estimated start times for queued jobs:
$ qstat -Tiw
I specified mem=32gb. Now my job is queuing forever. Why?
Some of the available memory on each node is set aside for the OS. You need to reduce your memory request, e.g. to mem=29gb.
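For example, combined with a select statement in the job script (the node and core counts are illustrative):

```shell
#PBS -l select=2:ncpus=32:mpiprocs=32:mem=29gb
```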
My job script is not accepted. The batch system returns: 'qsub: illegal -N value'
There is something wrong with the job name, the parameter given to #PBS -N. It can be at most 15 characters long and must consist of printable, non-whitespace characters, with the first character alphabetic.
#PBS -N Atmost15chrlong
How do I get information about running and completed jobs?
To list all running jobs, including array jobs:
$ qstat -r -u $USER -wt
To list all jobs that are idle with estimated start times:
$ qstat -i -u $USER -wT
To list all jobs, including those that have finished:
$ qstat -x -u $USER
Be careful with the -x option. It can cause PBS to become unresponsive for all users if you forget to use -u $USER or you have many jobs in the history. The history is kept for about one month.
To get a full listing of a job:
$ qstat [-x] -f <JOBID>
To get detailed job scheduling information:
$ tracejob <JOBID>
How can I monitor the output from running jobs?
The output to stdout and stderr is directed to files in a special subdirectory, $HOME/$PBS_JOBID.x8z/, in your home directory, where $PBS_JOBID is listed in the leftmost column of the output from the qstat command. You can look at these files, e.g. stdout, by issuing the command:
$ tail -f $HOME/$PBS_JOBID.x8z/$PBS_JOBID.OU
replacing $PBS_JOBID with the actual job id number. The stdout and stderr files are copied back to the job's working directory at job completion.
How can I check the CPU usage pattern of my job?
You can log into the nodes your job is running on. To list the compute nodes, run:
$ qstat -t -n -1 | grep [username]
The last column lists the active compute nodes. You can log into one of the compute nodes with
$ ssh [nodename]
and run the top command to see the CPU usage pattern.
My OMP_NUM_THREADS or MKL_NUM_THREADS setting does not appear to be recognized.
If your application is built with the single dynamic MKL library, mkl_rt,
you need to specify at runtime if you want to use the threaded or
sequential mode of MKL. By default, specified in the Intel compiler
modulefile, sequential mode is used. If you want to use Intel threading,
specify this in the jobscript using the MKL_THREADING_LAYER variable, e.g. (bash syntax):
module load intelcomp/13.0.1
export MKL_THREADING_LAYER=INTEL
export OMP_NUM_THREADS=16
export MKL_NUM_THREADS=16
Note: always set environment variables after 'module load' commands so that your own variable settings are not reset.