File systems and data handling

 

Storage

 

File System

The global file system on Vilje is Lustre. It hosts all home areas (/home, 113 TB) and a larger work area (/work, 395 TB). Files that have not been used in the last 21 days are periodically removed from /work.
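
To see which files might be affected by the 21-day cleanup, a small helper along the following lines can be used. This is only a sketch: the function name list_stale_files is hypothetical, and it assumes that "not used" refers to the last access time, which this page does not confirm. It also assumes GNU find. On Lustre, lfs find is generally lighter on the metadata servers than plain find, so prefer it for large directory trees.

```shell
#!/bin/sh
# list_stale_files DIR [DAYS]
# Hypothetical helper: print regular files under DIR whose last access
# time is at least DAYS days old (default 21, the limit quoted above).
# Assumption: the cleanup is based on access time, not modification time.
list_stale_files() {
    stale_dir=$1
    stale_days=${2:-21}
    # find drops fractional days, so "-atime +N" only matches files
    # accessed at least N+1 days ago; hence the "- 1" below.
    find "$stale_dir" -type f -atime +"$((stale_days - 1))" -print
}
```

For example, list_stale_files /work/$USER prints the candidates, and list_stale_files /work/$USER 14 gives a one-week early warning.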

 

Disk Quota and Backup

Disk usage is limited by disk quotas. The default quotas are:

Filesystem   Disk blocks   Files
/home        100 GB        400000
/work        1000 GB       900000

Check your file system quota using the “dusage” command, specifying the username and filesystem. The listed values are reported in kbytes.

For the home directory:

service1: # dusage
=====================================================================
User         Account   Type     Usage     Soft Limit   Hard Limit 
$user      $HOME     Disk    261.1 GB   476.8 GB     476.8 GB 
$user      $HOME     Files   738113     10000000     10000000 
$user      $WORK     Disk    29.7 GB    953.6 GB     953.6 GB 
$user      $WORK     Files   0          900000       900000   
---------------------------------------------------------------------

The home areas are backed up daily with incremental backups, that is, all files modified since the last full backup are copied. A full backup is carried out monthly. There is no backup of files in the /work area.


 

Lustre best practice

  • Avoid heavy metadata operations (e.g. traversing directories with thousands of files).
  • Always set a stripecount > 1 on files larger than 1 GB.
  • Always set a small stripecount for small files (stripecount=1).
  • Keep small and large files in separate directories with an appropriate stripecount: 1 for smaller files and > 1 for larger files.
  • Applications that do parallel writes to files will benefit from striping.
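
The size rule in the list above can be expressed as a small shell function. This is a sketch under stated assumptions: the name suggest_stripe_count is hypothetical, and the default of 8 for large files is simply the value used in the example on this page, not a tuned recommendation.

```shell
#!/bin/sh
# suggest_stripe_count SIZE_BYTES [LARGE_COUNT]
# Hypothetical helper: print 1 for files up to 1 GB and LARGE_COUNT
# (default 8, an illustrative value) for anything larger.
suggest_stripe_count() {
    size_bytes=$1
    large_count=${2:-8}
    one_gb=$((1024 * 1024 * 1024))
    if [ "$size_bytes" -gt "$one_gb" ]; then
        echo "$large_count"
    else
        echo 1
    fi
}
```

The result can be passed to lfs setstripe -c when creating a new file. The file size itself can be read with, for instance, stat -c %s on systems with GNU coreutils.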
 

Set striping

How to set stripecount
# set stripecounts on directory
$ lfs setstripe -c 8 [directory]
# all subsequent files or directories created inside this directory will inherit this setting
#
# give an existing file a new stripe setting: create a new file with the setting, then copy/delete/move
$ lfs setstripe -c 8 [new_nonexist_file]
$ cp oldfile [new_nonexist_file]
$ rm oldfile
$ mv [new_nonexist_file] oldfile
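
The copy/delete/move sequence above can be wrapped in a function that replaces the original only after the copy has been verified. This is a minimal sketch: the name restripe and the temporary file suffix are hypothetical, and it assumes that nothing else writes to the file while it is being copied.

```shell
#!/bin/sh
# restripe FILE COUNT
# Hypothetical helper: rewrite FILE with stripe count COUNT using the
# create/copy/move sequence shown above.
restripe() {
    rs_file=$1
    rs_count=$2
    rs_tmp="${rs_file}.restripe.$$"   # temporary name, an arbitrary choice

    # Refuse to overwrite an existing file with the temporary name.
    if [ -e "$rs_tmp" ]; then
        echo "restripe: $rs_tmp already exists" >&2
        return 1
    fi

    # Create an empty file with the requested layout.
    lfs setstripe -c "$rs_count" "$rs_tmp" || return 1

    # Copy the data into it and compare before touching the original.
    if cp -p "$rs_file" "$rs_tmp" && cmp -s "$rs_file" "$rs_tmp"; then
        mv "$rs_tmp" "$rs_file"
    else
        rm -f "$rs_tmp"
        return 1
    fi
}
```

Usage would be restripe bigfile 8. Because mv replaces the original in one step, the separate rm is not needed here.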
 

Get striping information

get stripeinfo
# lfs getstripe testfile
testfile
lmm_stripe_count:   4
lmm_stripe_size:    1048576
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  28
obdidx       objid       objid       group
28       194384117      0xb9610f5                0
18       196694846      0xbb9533e                0
35       195258885      0xba36a05                0
20       195521965      0xba76dad                0
 
# lfs getstripe --count testfile
4
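
When both the stripe count and the stripe size are needed in a script, the verbose output can be filtered with awk. The sketch below assumes the output format shown above (field names lmm_stripe_count and lmm_stripe_size), which may differ between Lustre versions. The function name stripe_summary is hypothetical.

```shell
#!/bin/sh
# stripe_summary
# Hypothetical helper: read `lfs getstripe FILE` output on stdin and
# print the stripe count and the stripe size converted to MiB.
stripe_summary() {
    awk '$1 == "lmm_stripe_count:" { count = $2 }
         $1 == "lmm_stripe_size:"  { size = $2 }
         END { printf "count=%d size_mib=%d\n", count, size / 1048576 }'
}
```

With the testfile from the example, lfs getstripe testfile | stripe_summary would print count=4 size_mib=1.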

 

Transferring Data

 

Basic Tools

See this section

 

Binary Data (Endianness)

See this section


 

Compressing/uncompressing files and directories

The available UNIX-commands for compression and uncompression are:

  • gzip/gunzip – for a description see ‘man gzip’
  • bzip2/bunzip2 – for a description see ‘man bzip2’


To compress a file, follow the example:

$ ls -l datafile
-rw-r--r-- 1 mad xno 1536000 2010-07-28 14:17 datafile
$ gzip datafile
$ ls -l datafile.gz
-rw-r--r-- 1 mad xno 307447 2010-07-28 14:17 datafile.gz

To uncompress a gzipped file, follow the example:

$ ls -l datafile.gz
-rw-r--r-- 1 mad xno 307447 2010-07-28 14:17 datafile.gz
$ gunzip datafile.gz
$ ls -l datafile
-rw-r--r-- 1 mad xno 1536000 2010-07-28 14:17 datafile
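
Note that gzip replaces the original file with the compressed one, as the listings above show. If the original should be kept, one option is to write the compressed data to standard output instead. The following is a sketch and the function name compress_keep is hypothetical.

```shell
#!/bin/sh
# compress_keep FILE
# Hypothetical helper: create FILE.gz next to FILE without removing FILE,
# then check the integrity of the compressed copy.
compress_keep() {
    gzip -c "$1" > "$1.gz" && gzip -t "$1.gz"
}
```

After compress_keep datafile, both datafile and datafile.gz exist. The function returns a non-zero status if compression or the integrity check fails.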

To compress a directory, the compression tool needs to be used in combination with an archiving tool, like tar:

$ ls -ld project
drwx------ 15 mad xno 32768 2009-12-09 18:25 project
$ tar czf project.tar.gz project/
$ ls -l project.tar.gz
-rw------- 1 mad xno 304827 2010-08-09 11:07 project.tar.gz

To unpack an archived directory:

$ tar xzf project.tar.gz
$ ls -ld project
drwx------ 15 mad xno 32768 2009-12-09 18:25 project
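
Before deleting a directory that has been archived, it is worth checking that the archive can be read back. The sketch below combines both steps. The function name archive_dir is hypothetical, and it only checks that the archive is readable, not that every file matches the original.

```shell
#!/bin/sh
# archive_dir DIR
# Hypothetical helper: create DIR.tar.gz next to DIR, then list the
# archive (output discarded) to confirm that it can be read.
archive_dir() {
    arch_src=${1%/}                       # strip a trailing slash
    arch_parent=$(dirname "$arch_src")
    arch_name=$(basename "$arch_src")
    # -C makes the stored paths relative to the parent directory.
    tar czf "$arch_src.tar.gz" -C "$arch_parent" "$arch_name" &&
        tar tzf "$arch_src.tar.gz" > /dev/null
}
```

For example, archive_dir project creates project.tar.gz and returns a non-zero status if either step fails. The contents can be inspected without unpacking using tar tzf project.tar.gz.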
 

Scientific Data Libraries

 

NetCDF

See the NetCDF page. 

 

HDF5

See the HDF5 page.