
Introduction.

Distributed Matlab is MPI programming without knowledge of MPI: it is easy to use and resembles the Matlab Parallel Computing Toolbox.

With Distributed Matlab you can run on many computers and cores in parallel, using the same code.

All source code is open and free, but under NTNU copyright. (Note! Matlab itself is not free; you need a license.)

Note! It is also possible to add MPI calls into your program (see Matlab MPI).

You cannot use Matlab parfor, but you can still run one Matlab instance on each core of the compute node(s). See the job scripts below.

Matlab MPI and Distributed Matlab are installed on the Fram, Vilje, Idun/Lille and Maur HPC clusters.

You need an account on Fram or Vilje :

    Fram: Send an application to Sigma2, see here.

    Vilje (local users): Send an application to NTNU, see here.

User guides:

    Fram, see here.

    Vilje, see here.

Support: john.floan@ntnu.no

Available variables.

num_ranks:    Number of ranks (more precisely, MPI processes) selected in the job script (see Job scripts).

my_rank:        The MPI rank ID. Rank numbers run from 0 to num_ranks-1. One rank is one Matlab instance.

Master_rank:  Rank number 0.

A rank can map to one or more CPU cores, and/or one or more compute nodes (see Job scripts).

Distributed functions.

Overview

  • parallel_start and parallel_end: Define the parallel region of your program. Note! parallel_end must be at the end of the program.
  • parsteps: Divides the for-loop iterations into chunks (see example below). (Note! A parallel for-loop must be iteration-independent.)
  • reduction and allreduction: Reduce variables and arrays from all ranks, e.g. by summation. You can reduce one array or several variables.
                Reduction operators: '+', '*', 'MAX', 'MIN', 'AND' and 'OR' (+: summation, *: multiplication, MAX: maximum value, MIN: minimum value, AND: logical and, OR: logical or).
  • reduce_array: Reduce a multi-dimensional array.
  • spread and collect:
                Spread several variables or one array from master to all ranks, as: [a,b]=spread(a,b);  array=spread(array);
                Collect several variables or one array from all ranks to master, as: [a,b]=collect(a,b); or [arrays{1:num_ranks}]=collect(array).
                The array must be one-dimensional.
  • distrib and join:
                Distribute an array (1 to 3 dimensional) to all ranks, chunked into num_ranks columns.
                Join all chunked arrays (1 to 3 dimensional) into one array.
  • mdisplay: Same as display, but only the master rank displays the text.
  • get_my_rank and get_num_ranks: Return my_rank and num_ranks.

See examples below.

Parallel_start and end

Example: Hello world.

!!! Note that parallel_end shall always be in the end of the program.
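The code block for this example is missing from the page export. A minimal sketch, assuming parallel_start defines the variables my_rank and num_ranks as described above:

```matlab
% Minimal Hello world sketch (assumes the Distributed Matlab API described above)
parallel_start;   % opens the parallel region; defines my_rank and num_ranks
display(['Hello world from rank number ' num2str(my_rank) ...
         ' of total ' num2str(num_ranks)]);
parallel_end;     % must always be at the end of the program
```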

Output (if 2 computers (2 ranks)):

Hello world from rank number 0 of total 2

Hello world from rank number 1 of total 2

Reduction and parsteps

Example: Parallel for-loop on 8 compute nodes.

Sequential code 

Parallel code 
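The code macros for this example are missing. A sketch of the parallel version, summing ii = 1..64 over the ranks (the parsteps call signature shown here is an assumption; check the installed documentation):

```matlab
% Sketch: parallel sum of ii = 1..64 (parsteps signature is an assumption)
parallel_start;
s = 0;
[istart, iend] = parsteps(1, 64);   % this rank's chunk of the iterations
for ii = istart:iend
    s = s + ii;
end
s = reduction('+', s);              % master rank now holds the total sum
mdisplay(['Sum = ' num2str(s)]);
parallel_end;
```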

What parsteps does is chunk the iterations (ii) over e.g. 8 compute nodes, and all for-loops start simultaneously:

             rank 0              rank 1                rank 2       ...........       rank 7

             for ii = 1:8        for ii = 9:16         for ii = 17:24                for ii = 57:64

NOTE! In a parallel for-loop, each iteration must be independent of the iteration before.

NOTE2! The number of iterations cannot be less than the number of ranks.

 

More advanced use of parallel for loop (double for-loop)

(Note! The inner loop is not parallel: all ranks count jj from 1 to n.)
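A sketch of the double-loop case, under the same assumed parsteps signature as above:

```matlab
% Sketch: only the outer loop is parallel; jj runs 1..n on every rank
parallel_start;
n = 64;
s = 0;
[istart, iend] = parsteps(1, n);    % parsteps signature is an assumption
for ii = istart:iend
    for jj = 1:n                    % inner loop is not parallel
        s = s + ii * jj;
    end
end
s = reduction('+', s);
parallel_end;
```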

 

Example: Reduction of an array with 'MAX', to find the maximum value of the local arrays from all ranks.
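A sketch of the 'MAX' case (the reduction call follows the syntax documented above; the surrounding code is illustrative):

```matlab
% Sketch: element-wise 'MAX' reduction, then the overall maximum
parallel_start;
data = rand(1, 10);                 % each rank has its own local array
data = reduction('MAX', data);      % master rank gets the element-wise maximum
if my_rank == 0
    display(['Maximum value: ' num2str(max(data))]);
end
parallel_end;
```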

 

Example: Reduction of an array with '+'  and '*': Summation/multiplication of all arrays. 

Example: Reduction of an array with '+': Summation of all arrays (element by element) to the master rank (0).

Input

    rank 0: data = [1  2  3];

    rank 1: data = [2  3  4];

data = reduction('+',data);

After reduction: 

    rank 0: data =  3  5  7  (Master rank)

    rank 1: data =  0  0  0 (Other ranks)

 

Example: Reduction of an array with '*': Multiplication of all arrays (element by element) to the master rank.

Input

    rank 0: data = [2  3  4];

    rank 1: data = [3  4  5];

data = reduction('*',data);

After reduction: 

    rank 0: data =  6  12  20 (Master rank)

    rank 1: data =  0   0    0  (Other ranks)

 

Example: Update an array for each rank (2 ranks in this case)

Input

    rank 0: data = [ 1 2 3 0 0 0 ]

    rank 1: data = [ 0 0 0 4 5 6 ]

After reduction

    rank 0 : [1 2 3 4 5 6] (Master rank)

    rank 1: [0 0 0 0 0 0] (Other ranks)

 

Example: Reduction of multi-dim array.

Allreduction

Same as reduction, but all ranks get the same reduced value(s).

 

Reduce multi-dim array

Reduce a multi-dimensional array (max 3 dimensions) between all ranks; all ranks get the same values.

Operators: '+'  (default),  '*' , 'MAX', 'MIN', 'AND' and 'OR'

Syntax:

     arrayout = reduce_array ( arrayin);  %Operator is default '+'

     arrayout = reduce_array ( operator , arrayin );

 

Example code 1 (2 ranks): 2 dim array

Input array (2x2)

Rank 0     Rank 1   

1    2         0   0   

0    0         3   4    

Output array from all ranks (after reduce_array)

1   2                 

3   4                     
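The code macro for this example is missing. A sketch of example code 1 (the reduce_array syntax is as documented above; the rank test is illustrative):

```matlab
% Sketch of example code 1: '+' reduction of a 2x2 array over 2 ranks
parallel_start;
if my_rank == 0
    A = [1 2; 0 0];
else
    A = [0 0; 3 4];
end
A = reduce_array('+', A);   % all ranks now hold [1 2; 3 4]
parallel_end;
```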

 

Example code 2 (2 ranks): 2 dim array

Input to array: 

Rank 0      Rank 1 

1   2          2   2    

2   2          3   4 

Output from all ranks (after reduce_array)

2   4                  

6   8              

Spread and collect.

Example code for spread.

Array:

All ranks receive the same array, created by the master rank. All receive: (0,0,0) (Note, my_rank for master is 0)

Variables:

All ranks receive variables a and b, created by the master rank: a=0, b=1. (Note, my_rank for master is 0)
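The code macro for this example is missing. A sketch of both spread forms, matching the values above (the surrounding code is illustrative):

```matlab
% Sketch of spread: the master rank's values are sent to all ranks
parallel_start;
a = my_rank;                          % master: 0
b = my_rank + 1;                      % master: 1
arr = [my_rank my_rank my_rank];      % master: (0,0,0)
[a, b] = spread(a, b);                % all ranks now have a = 0, b = 1
arr = spread(arr);                    % all ranks now have (0,0,0)
parallel_end;
```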

 

Example code for collect:

Array:

Input to collect (2 ranks): Rank 0: (0,0,0), Rank 1: (1,1,1).

Output of collect (master): arrays is a cell array with 2 cells (1 cell each rank):  arrays{1}=(0,0,0), arrays{2}=(1,1,1)

Variables:

Input to collect for 2 ranks: Rank 0: a = 0 , b = 10, Rank 1: a = 1 , b = 11.

Output of collect (master): a = (0 , 1),  b = (10 , 11)
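The code macro for this example is missing. A sketch of both collect forms, matching the values above (the surrounding code is illustrative):

```matlab
% Sketch of collect (2 ranks)
parallel_start;
a = my_rank;                          % rank 0: 0,  rank 1: 1
b = 10 + my_rank;                     % rank 0: 10, rank 1: 11
[a, b] = collect(a, b);               % master: a = (0 1), b = (10 11)

arr = [my_rank my_rank my_rank];
[arrays{1:num_ranks}] = collect(arr); % master: arrays{1}=(0,0,0), arrays{2}=(1,1,1)
parallel_end;
```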

Distribute and join (distrib and join)

Distrib divides an array into num_ranks chunks. Join gathers the local arrays back to the master rank.

Example code: distribute

Input from master (rank 0) (A: 2x2 elements):

    A = 1  2
        3  4

Output (for e.g. 2 ranks) (A: 2x1 elements):

     Rank 0        Rank 1

     A = 1         A = 2
         3             4

Example code: join

Input join:

     Rank 0        Rank 1

     A = 1         A = 2
         3             4

Output join (master):

     A = 1  2
         3  4
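The code macros for these examples are missing. A sketch combining distrib and join, matching the values above (the rank test is illustrative):

```matlab
% Sketch of distrib and join (2 ranks)
parallel_start;
if my_rank == 0
    A = [1 2; 3 4];
end
A = distrib(A);   % each rank receives a 2x1 column chunk
% ... work on the local chunk here ...
A = join(A);      % the master rank gets the full 2x2 array back
parallel_end;
```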

Job scripts.

For more information about job scripts and job execution, see here.

 

1.  To run one Matlab job on each compute node.
     Example: One Matlab job on each of 4 nodes, that is 4 ranks. (Note! You still have 32 cores (Fram) or 16 cores (Vilje) available on each node; use operators and functions that support multicore execution.)

FRAM
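The job script macro is missing. Fram uses Slurm; a sketch (the account name, Matlab module version and launch line are assumptions — adapt them to the Fram user guide):

```bash
#!/bin/bash
#SBATCH --account=nnXXXXk        # your project account (assumption)
#SBATCH --job-name=dmatlab
#SBATCH --time=00:30:00
#SBATCH --nodes=4                # 4 nodes ...
#SBATCH --ntasks-per-node=1      # ... 1 rank per node = 4 ranks

module load MATLAB/2017a         # module name/version is an assumption
mpirun matlab -nodisplay -nodesktop -r "my_program"
```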

 

VILJE:
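The job script macro is missing. Vilje uses PBS; a sketch (the select line, module name and launch line are assumptions — adapt them to the Vilje user guide):

```bash
#!/bin/bash
#PBS -N dmatlab
#PBS -l walltime=00:30:00
#PBS -l select=4:ncpus=32:mpiprocs=1   # 4 nodes, 1 rank per node (assumption)

module load matlab                      # module name is an assumption
cd $PBS_O_WORKDIR
mpiexec matlab -nodisplay -nodesktop -r "my_program"
```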

 

Typical use of this configuration is e.g. Matlab linear algebra functions and operators; Matlab supports multicore execution for several functions and operators.

 

2.  To run Matlab jobs on each core on the compute node.

FRAM:

 Example: 4 nodes and 32 MPI processes per node, that is 128 ranks (or CPU cores).
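The job script macro is missing. A Slurm sketch for this configuration (account name, module version and launch line are assumptions):

```bash
#!/bin/bash
#SBATCH --account=nnXXXXk        # your project account (assumption)
#SBATCH --job-name=dmatlab
#SBATCH --time=00:30:00
#SBATCH --nodes=4                # 4 nodes ...
#SBATCH --ntasks-per-node=32     # ... 32 ranks per node = 128 ranks

module load MATLAB/2017a         # module name/version is an assumption
mpirun matlab -nodisplay -nodesktop -r "my_program"
```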

 

VILJE:

Example: 2 nodes and 16 MPI processes per node, that is 32 ranks (or CPU cores).
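The job script macro is missing. A PBS sketch for this configuration (the select line, module name and launch line are assumptions):

```bash
#!/bin/bash
#PBS -N dmatlab
#PBS -l walltime=00:30:00
#PBS -l select=2:ncpus=32:mpiprocs=16   # 2 nodes x 16 ranks = 32 ranks (assumption)

module load matlab                       # module name is an assumption
cd $PBS_O_WORKDIR
mpiexec matlab -nodisplay -nodesktop -r "my_program"
```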

For the parallel for-loop example above (64 iterations), the iterations of ii are divided as:

Rank 0        Rank 1    ...  Rank 31
for ii=1:2    for ii=3:4     for ii=63:64

Typical use of this configuration is, similar to parfor, for-loops with functions and operators that do not support multicore execution, e.g. scalar calculations (a = b + c).

Object Oriented Matlab.

Example code: AverageOO

AverageOO Class:
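The class definition macro is missing from the page export. The original AverageOO class is not reproduced here; the following is a hypothetical sketch of what such a class might look like (class layout, method names and the reduction call are all assumptions):

```matlab
% Hypothetical AverageOO-style class: accumulates values locally,
% then combines partial sums and counts across all ranks.
classdef AverageOO < handle
    properties
        total = 0;   % local running sum
        n = 0;       % local sample count
    end
    methods
        function add(obj, x)
            obj.total = obj.total + x;
            obj.n = obj.n + 1;
        end
        function a = average(obj)
            % Reduce several variables at once (assumed call form)
            [s, c] = allreduction('+', obj.total, obj.n);
            a = s / c;   % global average, same on all ranks
        end
    end
end
```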

Save to file.

Normally you let the master rank save to file as follows (or else use a different file name for each rank):
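The code macro is missing. A minimal sketch (the file and variable names are illustrative):

```matlab
% Sketch: only the master rank writes the result file
parallel_start;
% ... computation producing the variable 'data' ...
if my_rank == 0
    save('result.mat', 'data');   % file name is illustrative
end
parallel_end;   % must always be at the end of the program
```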

!!! Note that parallel_end shall always be in the end of the program.

 

Job script Idun
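The Idun job script macro is missing. Idun uses Slurm; a sketch (partition name, module name and launch line are assumptions — adapt them to the Idun documentation):

```bash
#!/bin/bash
#SBATCH --job-name=dmatlab
#SBATCH --time=00:30:00
#SBATCH --nodes=2                # 2 nodes ...
#SBATCH --ntasks-per-node=1      # ... 1 rank per node
#SBATCH --partition=WORKQ        # partition name is an assumption

module load MATLAB               # module name is an assumption
mpirun matlab -nodisplay -nodesktop -r "my_program"
```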

 

 
