The Random Pie – High Performance Computing

The Concept

Pi will be calculated using a Monte Carlo approach where the ratio between the are of a circle and square is utilized. The circle has radius 1 and therefore an area of pi, while the square has sides of 2, giving an area of 4. The circle and square lie concentric, so if we chose points inside the square at random from a uniform distribution, we expect them to be inside the circle pi/4 of the time.

The Code

The program takes the tolerance epsilon as input on the command line. One process will have the single purpose of feeding the other processes random numbers. This ensures that the processes are not using the same random numbers. An alternative method could be to seed each process based on their rank. It is convenient to create a communicator that only includes the worker processes, so that it is easy to exchange information using MPI_Allreduce(). The communicator workers are made via creating a group that excludes the server process. The main loop of the workers compute whether the two-tuple of random numbers are inside or outside the circle and update the statistics accordingly. When they run out of random numbers they collect the data and checks if the error is below the tolerance. If not they request more random numbers from the server. When done, the process with rank 0 prints information about the points used to find the estimate of pi.

$ cd mpi_basic/07_random_pie
$ cat src/random_pie.c
/******************************************************************************
 *                                                                            *
 *  Basic MPI Example - Random Pie                                            *
 *                                                                            *
 *  Approximating pi using a Monte Carlo approach.                            *
 *                                                                            *
 ******************************************************************************
 *                                                                            *
 *  The original code was written by Gustav at University of Indiana in 2003. *
 *                                                                            *
 *  The current version has been tested/updated by the HPC department at      *
 *  the Norwegian University of Science and Technology in 2011.               *
 *                                                                            *
 ******************************************************************************/
#include 
#include  /* function random is defined here */
#include  /* INT_MAX is defined here */
#include 
#include 
 
#define CHUNKSIZE 1000
#define REQUEST      1
#define REPLY        2
#define TOTAL     5000000
 
main(argc, argv)
     int argc;
     char *argv[];
{
  int iter;
  int in, out, i, iters, max, ix, iy, ranks[1], done, temp;
  double x, y, Pi, error, epsilon;
  int numprocs, myid, server, totalin, totalout, workerid;
  int rands[CHUNKSIZE], request;
  MPI_Comm world, workers;
  MPI_Group world_group, worker_group;
  MPI_Status stat;
 
  MPI_Init(&argc, &argv);
  world = MPI_COMM_WORLD;
  MPI_Comm_size(world, &numprocs);
  MPI_Comm_rank(world, &myid);
  server = numprocs - 1;
  if (myid == 0)
    sscanf( argv[1], "%lf", &epsilon);
  MPI_Bcast(&epsilon, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
  MPI_Comm_group( world, &world_group);
  ranks[0] = server;
  MPI_Group_excl(world_group, 1, ranks, &worker_group);
  MPI_Comm_create(world, worker_group, &workers);
  MPI_Group_free(&worker_group);
 
  if(myid == server) {
    do {
      MPI_Recv(&request, 1, MPI_INT, MPI_ANY_SOURCE, REQUEST, world, &stat);
      if (request) {
        for (i = 0; i < CHUNKSIZE; i++)
          rands[i] = random();
        MPI_Send(rands, CHUNKSIZE, MPI_INT, stat.MPI_SOURCE, REPLY, world);
      }
    }
    while (request> 0);
  }
  else {
    request = 1;
    done = in = out = 0;
    max = INT_MAX;
    MPI_Send(&request, 1, MPI_INT, server, REQUEST, world);
    MPI_Comm_rank(workers, &workerid);
    iter = 0;
    while(!done) {
      iter++;
      request = 1;
      MPI_Recv(rands, CHUNKSIZE, MPI_INT, server, REPLY, world, &stat);
      for (i=0; i < CHUNKSIZE; ) {
        x = (((double) rands[i++])/max) * 2 - 1;
        y = (((double) rands[i++])/max) * 2 - 1;
        if (x*x + y*y < 1.0)
          in++;
        else
          out++;
      }
      MPI_Allreduce(&in, &totalin, 1, MPI_INT, MPI_SUM, workers);
      MPI_Allreduce(&out, &totalout, 1, MPI_INT, MPI_SUM, workers);
      Pi = (4.0*totalin)/(totalin + totalout);
      error = fabs( Pi-3.141592653589793238462643);
      done = ((error < epsilon) || ((totalin+totalout) > TOTAL));
      request = (done) ? 0 : 1;
      if (myid == 0) {
        printf( "\rpi = %23.20lf", Pi);
        MPI_Send(&request, 1, MPI_INT, server, REQUEST, world);
      }
      else {
        if (request)
          MPI_Send(&request, 1, MPI_INT, server, REQUEST, world);
      }
    }
    MPI_Comm_free(&workers);
  }
  if (myid == 0)
    printf( "\npoints: %d\nin: %d, out: %d\n",
            totalin+totalout, totalin, totalout);
  MPI_Finalize();
  exit (0);
}

Previous instructions

MPI_Init() and MPI_Finalize(); Are used to initialize and end the MPI program.

MPI_Comm world, workers; Declares the communicators world and workers.

MPI_Comm_size(); Used to find the total number of processes.

MPI_Comm_rank(); Used to find the rank within the communicators world and workers.

MPI_Status stat; Stat placeholder for MPI_Recv().

MPI_BCast(); Used to broadcast epsilon.

MPI_Send(); Used to send requests for random numbers as well as sending random numbers.

MPI_Recv(); Used to receive requests for random number or receiving random numbers.

MPI_Allreduce(); Used to sum the total number of hits inside and outside the circle on all processors.

New instructions

MPI_Group world_group, worker_group; Declares the groups world_group and worker_group.

MPI_Comm_group( world, &world_group ); Sets world_group to a group of the communicator world.

MPI_Group_excl( world_group, 1, ranks, &worker_group ); The worker_group is created by excluding 1 process specified in ranks (the server) from the world_group.

MPI_Comm_create( world, worker_group, &workers ); Creates the communicator workers from the world communicator and the worker_group.

MPI_Group_free( &worker_group ); Marks the worker_group for deallocation.

Compile & Run

If you have not already done so, obtain all the example code here.

Switch to the Intel compiler (optional, only necessary once in each terminal session)

$ module load intel

Compile the program using

$ make

Submit the job to the queue

$ make submit

The output from the program execution is placed in the output folder

$ cat output/*
pi =  3.14243678160919559517
points: 5002500
in: 3930010, out: 1072490