The Concept
We are going to look at the MPI communication related to a simplified diffusion problem, namely the Laplace equation in 2D. We will use the five point stencil as our discretisation scheme and a Dirichlet boundary condition. The resulting system is solved by simple Jacobi iteration: start by initialising the boundary and guess a solution for the inner nodes. Then while the system has not converged, update all the inner node to be the average of the 4 neighbouring nodes.
The work can be divided into squares of size NxN that are divided among the processes. The variables lying at the boundary have to be communicated with the neighbouring nodes resulting in a communication cost of order N (four sides of size N). The computation cost per node is N^2. Given that communication is expensive we want the total communication cost to be negligible compared to the total computational cost which will require N to be big.
The Code
The domain is discreticed into a 6 (width) by 8 (height) grid which is to be divided into 2 by 2 squares handled by different processes; 12 processes in total. Each process also has to store the closest neighbouring data making the matrix storage requirement (width+2)*(height+2)=16. The processes are set up with a communicator that uses a two dimensional Cartesian topology. The key point here is to allow the library implementers to do hardware specific optimizations for such a topology as well as hiding the task of keeping track of neighbours from the programmer. MPI_Cart_coords and MPI_Cart_shift can easily be used to find the coordinates and the rank of neighbouring processes. The program exchange data across process boundaries in four steps by moving data up, down, left and right. The output of the program is information about the ranks of the processes, their coordinates and neighbours as well as the state of the global matrix (which is collected by the master) before the exchange, after the vertical exchange and after the horizontal exchange.
$ cd basic_mpi/05_diffusion/ $ cat src/cart.c /****************************************************************************** * * * Basic MPI Example - Diffusion * * * * Uses a cartesian grid to exchange boundary data between processes. * * * ****************************************************************************** * * * The original code was written by Gustav at University of Indiana in 2003. * * * * The current version has been tested/updated by the HPC department at * * the Norwegian University of Science and Technology in 2011. * * * ******************************************************************************/ #include#include #include #include #define FALSE 0 #define TRUE 1 #define MASTER_RANK 0 #define UPDOWN 0 #define SIDEWAYS 1 #define RIGHT 1 #define UP 1 #define PROCESS_DIMENSIONS 2 #define PROCESS_ROWS 4 #define ROWS 4 #define DISPLAY_ROWS 16 /* must be PROCESS_ROWS * ROWS */ #define PROCESS_COLUMNS 3 #define COLUMNS 4 #define DISPLAY_COLUMNS 12 /* must be PROCESS_COLUMNS * COLUMNS */ int main ( int argc, char **argv ) { int pool_size, my_rank, destination, source; MPI_Status status; char char_buffer[BUFSIZ]; int i_am_the_master = FALSE; int divisions[PROCESS_DIMENSIONS] = {PROCESS_ROWS, PROCESS_COLUMNS}; int periods[PROCESS_DIMENSIONS] = {0, 0}; int reorder = 1; MPI_Comm cartesian_communicator; int my_cartesian_rank, my_coordinates[PROCESS_DIMENSIONS]; int left_neighbour, right_neighbour, bottom_neighbour, top_neighbour; MPI_Init(&argc, &argv); MPI_Comm_size(MPI_COMM_WORLD, &pool_size); MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); if (my_rank == MASTER_RANK) i_am_the_master = TRUE; MPI_Cart_create ( MPI_COMM_WORLD, PROCESS_DIMENSIONS, divisions, periods, reorder, &cartesian_communicator ); if (cartesian_communicator != MPI_COMM_NULL) { int matrix [ROWS][COLUMNS]; int i, j; MPI_Datatype column_type; MPI_Comm_rank ( cartesian_communicator, &my_cartesian_rank ); MPI_Cart_coords ( cartesian_communicator, my_cartesian_rank, PROCESS_DIMENSIONS, my_coordinates ); MPI_Cart_shift ( cartesian_communicator, SIDEWAYS, RIGHT, &left_neighbour, &right_neighbour ); MPI_Cart_shift ( cartesian_communicator, UPDOWN, UP, &bottom_neighbour, &top_neighbour ); if (! i_am_the_master ) { sprintf(char_buffer, "process %2d, cartesian %2d, \ coords (%2d,%2d), left %2d, right %2d, top %2d, bottom %2d", my_rank, my_cartesian_rank, my_coordinates[0], my_coordinates[1], left_neighbour, right_neighbour, top_neighbour, bottom_neighbour); MPI_Send(char_buffer, strlen(char_buffer) + 1, MPI_CHAR, MASTER_RANK, 3003, MPI_COMM_WORLD); } else { int number_of_c_procs, count; number_of_c_procs = divisions[0] * divisions[1]; for (count = 0; count < number_of_c_procs - 1; count++) { MPI_Recv(char_buffer, BUFSIZ, MPI_CHAR, MPI_ANY_SOURCE, 3003, MPI_COMM_WORLD, &status); printf ("%s\n", char_buffer); } printf( "process %2d, cartesian %2d, \ coords (%2d,%2d), left %2d, right %2d, top %2d, bottom %2d\n", my_rank, my_cartesian_rank, my_coordinates[0], my_coordinates[1], left_neighbour, right_neighbour, top_neighbour, bottom_neighbour); } for ( i = 0; i < ROWS; i++ ) { for ( j = 0; j < COLUMNS; j++ ) { matrix [i][j] = my_cartesian_rank; } } if (my_cartesian_rank != MASTER_RANK ) MPI_Send ( matrix, COLUMNS * ROWS, MPI_INT, MASTER_RANK, 3003, cartesian_communicator ); else collect_matrices ( cartesian_communicator, my_cartesian_rank, matrix, 3003 ); MPI_Sendrecv ( &matrix[ROWS - 2][0], COLUMNS, MPI_INT, top_neighbour, 4004, &matrix[0][0], COLUMNS, MPI_INT, bottom_neighbour, 4004, cartesian_communicator, &status ); MPI_Sendrecv ( &matrix[1][0], COLUMNS, MPI_INT, bottom_neighbour, 5005, &matrix[ROWS - 1][0], COLUMNS, MPI_INT, top_neighbour, 5005, cartesian_communicator, &status ); if (my_cartesian_rank != MASTER_RANK ) MPI_Send ( matrix, COLUMNS * ROWS, MPI_INT, MASTER_RANK, 6006, cartesian_communicator ); else collect_matrices ( cartesian_communicator, my_cartesian_rank, matrix, 6006 ); MPI_Type_vector (ROWS, 1, COLUMNS, MPI_INT, &column_type); MPI_Type_commit (&column_type); MPI_Sendrecv ( &matrix[0][1], 1, column_type, left_neighbour, 7007, &matrix[0][COLUMNS - 1], 1, column_type, right_neighbour, 7007, cartesian_communicator, &status ); MPI_Sendrecv ( &matrix[0][COLUMNS - 2], 1, column_type, right_neighbour, 8008, &matrix[0][0], 1, column_type, left_neighbour, 8008, cartesian_communicator, &status ); if (my_cartesian_rank != MASTER_RANK ) MPI_Send ( matrix, COLUMNS * ROWS, MPI_INT, MASTER_RANK, 9009, cartesian_communicator ); else collect_matrices ( cartesian_communicator, my_cartesian_rank, matrix, 9009 ); } MPI_Finalize (); exit(0); } int print_array (int array [DISPLAY_ROWS] [DISPLAY_COLUMNS], int vertical_break, int horizontal_break) { int k, l; printf ("\n"); for (k = DISPLAY_ROWS - 1; k >= 0; k -- ) { for (l = 0; l < DISPLAY_COLUMNS; l ++ ) { if (l % horizontal_break == 0) printf (" "); printf ( "%2d ", array [k][l] ); } printf ( "\n" ); if (k % vertical_break == 0) printf ( "\n" ); } } int collect_matrices (MPI_Comm cartesian_communicator, int my_cartesian_rank, int matrix[ROWS][COLUMNS], int tag) { int coordinates[PROCESS_DIMENSIONS]; int client_matrix[ROWS][COLUMNS]; int display[DISPLAY_ROWS][DISPLAY_COLUMNS]; int i, j, k, l, source; MPI_Status status; for ( i = PROCESS_ROWS - 1; i >= 0; i -- ) { for ( j = 0; j < PROCESS_COLUMNS; j ++ ) { coordinates[0] = i; coordinates[1] = j; MPI_Cart_rank ( cartesian_communicator, coordinates, &source ); if (source != my_cartesian_rank) { MPI_Recv ( client_matrix, BUFSIZ, MPI_INT, source, tag, cartesian_communicator, &status ); for ( k = ROWS - 1; k >= 0; k -- ) { for ( l = 0; l < COLUMNS; l++ ) { display [i * ROWS + k] [j * COLUMNS + l] = client_matrix[k][l]; } } } else { for ( k = ROWS - 1; k >= 0; k -- ) { for ( l = 0; l < COLUMNS; l ++ ) { display [i * ROWS + k] [j * COLUMNS + l] = matrix[k][l]; } } } } } print_array (display, ROWS, COLUMNS); }
Previous instructions
MPI_Init() and MPI_Finalize(); Are used to initialize and end the MPI program.
MPI_Comm_rank(); Is used to get the rank as usual, only this time using both MPI_COMM_WORLD and the cartesian_communicator.
MPI_Comm_size(); Is used to get the total number of processes.
MPI_Send() and MPI_Recv(); Are used to pass data from the slaves to the master
New instructions
MPI_Comm cartesian_communicator; Declares the Cartesian communicator.
MPI_Cart_create( MPI_COMM_WORLD, PROCESS_DIMENSIONS, divisions, periods, reorder, &cartesian_communicator ); Creates a Cartesian topology from MPI_COMM_WORLD. where PROCESS_DIMENSIONS (2) is the dimension, divisions is an array specifying the size of each dimension. periods is a Boolean specifying for each dimension if it is periodic. reorder is a Boolean value that sets if it is OK to reorder the processes (e.g. to improve performance based their physical location.) cartesian_communicator is the handle to the new communicator.
MPI_Cart_coords( cartesian_communicator, my_cartesian_rank, PROCESS_DIMENSIONS, my_coordinates ); Places the Cartesian coordinates of my_cartesian_rank within the cartesian_communicator in my_coordinates, which is has the same size as the number of dimensions specified by PROCESS_DIMENSIONS.
MPI_Cart_shift( cartesian_communicator, SIDEWAYS, RIGHT, &left_neighbour, &right_neighbour ); Is used to find the left_neighbour and right_neighbour within the cartesian_communicator. A slightly different call using UPDOWN and UP finds the bottom_neighbour and top_neighbour.
MPI_Sendrecv( &matrix[ROWS - 2][0], COLUMNS, MPI_INT, top_neighbour, 4004, &matrix[0][0], COLUMNS, MPI_INT, bottom_neighbour, 4004, cartesian_communicator, &status ); Is used to pass the boundary of each process to the process above; each process sends to the top_neighbour and receives from the bottom_neighbour. The data being sent is the first row the top_neighbour does not have which is the second row from the top of the sending process (&matrix[ROWS - 2][0]). The data is placed in the first row of the receiving process (&matrix[0][0]). COLUMNS number of MPI_INT is both sent and received using the tag 4004. Similar calls are then used to transfer data from top to bottom, right to left and, finally, left to right.
MPI_Datatype column_type; Declares the column_type.
MPI_Type_vector(ROWS, 1, COLUMNS, MPI_INT, &column_type ); Is used to create a datatype for the columns to be used with MPI_Sendrecv(). The arrays in C are stored in row-major order, meaning that the stride between the each element in a row is 1. When sending a row with MPI_Sendrecv() it is therefore sufficient to use MPI_INT as type, but when sending columns we need to take into account the COLUMNS stride between each element in a column. This is done by creating the column_type, that contains ROWS elements with COLUMNS stride. The old datatype was MPI_INT and there is only 1 element in each block.
MPI_Type_commit(&column_type); Is used to commit the column_type. This is necessary in order to use it in a communication.
Compile & Run
If you have not already done so, obtain all the example code here.
Switch to the Intel compiler (optional, only necessary once in each terminal session)
$ module load intel
Compile the program using
$ make
Submit the job to the queue
$ make submit
The output from the program execution is placed in the output folder
$ cat output/ process 4, cartesian 4, coords ( 1, 1), left 3, right 5, top 7, bottom 1 process 1, cartesian 1, coords ( 0, 1), left 0, right 2, top 4, bottom -2 process 5, cartesian 5, coords ( 1, 2), left 4, right -2, top 8, bottom 2 process 3, cartesian 3, coords ( 1, 0), left -2, right 4, top 6, bottom 0 process 10, cartesian 10, coords ( 3, 1), left 9, right 11, top -2, bottom 7 process 6, cartesian 6, coords ( 2, 0), left -2, right 7, top 9, bottom 3 process 11, cartesian 11, coords ( 3, 2), left 10, right -2, top -2, bottom 8 process 7, cartesian 7, coords ( 2, 1), left 6, right 8, top 10, bottom 4 process 8, cartesian 8, coords ( 2, 2), left 7, right -2, top 11, bottom 5 process 9, cartesian 9, coords ( 3, 0), left -2, right 10, top -2, bottom 6 process 2, cartesian 2, coords ( 0, 2), left 1, right -2, top 5, bottom -2 process 0, cartesian 0, coords ( 0, 0), left -2, right 1, top 3, bottom -2 9 9 9 9 10 10 10 10 11 11 11 11 9 9 9 9 10 10 10 10 11 11 11 11 9 9 9 9 10 10 10 10 11 11 11 11 9 9 9 9 10 10 10 10 11 11 11 11 6 6 6 6 7 7 7 7 8 8 8 8 6 6 6 6 7 7 7 7 8 8 8 8 6 6 6 6 7 7 7 7 8 8 8 8 6 6 6 6 7 7 7 7 8 8 8 8 3 3 3 3 4 4 4 4 5 5 5 5 3 3 3 3 4 4 4 4 5 5 5 5 3 3 3 3 4 4 4 4 5 5 5 5 3 3 3 3 4 4 4 4 5 5 5 5 0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2 9 9 9 9 10 10 10 10 11 11 11 11 9 9 9 9 10 10 10 10 11 11 11 11 9 9 9 9 10 10 10 10 11 11 11 11 6 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 11 11 11 11 6 6 6 6 7 7 7 7 8 8 8 8 6 6 6 6 7 7 7 7 8 8 8 8 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8 3 3 3 3 4 4 4 4 5 5 5 5 3 3 3 3 4 4 4 4 5 5 5 5 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2 9 9 9 10 9 10 10 11 10 11 11 11 9 9 9 10 9 10 10 11 10 11 11 11 9 9 9 10 9 10 10 11 10 11 11 11 6 6 6 7 6 7 7 8 7 8 8 8 9 9 9 10 9 10 10 11 10 11 11 11 6 6 6 7 6 7 7 8 7 8 8 8 6 6 6 7 6 7 7 8 7 8 8 8 3 3 3 4 3 4 4 5 4 5 5 5 6 6 6 7 6 7 7 8 7 8 8 8 3 3 3 4 3 4 4 5 4 5 5 5 3 3 3 4 3 4 4 5 4 5 5 5 0 0 0 1 0 1 1 2 1 2 2 2 3 3 3 4 3 4 4 5 4 5 5 5 0 0 0 1 0 1 1 2 1 2 2 2 0 0 0 1 0 1 1 2 1 2 2 2 0 0 0 1 0 1 1 2 1 2 2 2