Interacting Particles – High Performance Computing

The Concept

Charged particles act on each other with electromagnetic forces that is described by Coulomb's law. For a given particle the forces can be summed to find the net force acting on the particle. By considering Newton's second law, the acceleration and thus the speed and position of the particle can be found. Euler's method will be used for the numerical integration in time.

The Code

The program splits 10k particles between the participating processes. Each process initialize its own particles - random location, unit mass, no acceleration or velocity and alternating -1, 1 as the charge. The information about the particles is exchanged so that all the processes has all the information. Each process then calculates the new information about its own particles, the information is shared, the new information is calculated and so the Euler iterations continue on. A total of 20 iterations is performed before statistics about the calculations are displayed.

$ cd mpi_basic/06_interacting_particles/
$ cat src/particles.c
/******************************************************************************
 *                                                                            *
 *  Basic MPI Example - Interacting Particles                                 *
 *                                                                            *
 *  Simulate the interaction between 10k electrically charged particles.      *
 *                                                                            *
 ******************************************************************************
 *                                                                            *
 *  The original code was written by Gustav at University of Indiana in 2003. *
 *                                                                            *
 *  The current version has been tested/updated by the HPC department at      *
 *  the Norwegian University of Science and Technology in 2011.               *
 *                                                                            *
 ******************************************************************************/
#include 
#include 
#include 
#include 
 
#define FALSE            0
#define TRUE             1
#define MASTER_RANK      0
 
#define MAX_PARTICLES 10000
#define MAX_PROCS       128
#define EPSILON           1.0E-10
#define DT               0.01
#define N_OF_ITERATIONS 20
 
 
int main ( int argc, char **argv )
{
  int pool_size, my_rank;
  int i_am_the_master = FALSE;
  extern double drand48();
  extern void srand48();
 
  typedef struct {
    double x, y, z, vx, vy, vz, ax, ay, az, mass, charge;
  } Particle;
 
  Particle  particles[MAX_PARTICLES];  /* Particles on all nodes */
  int       counts[MAX_PROCS];         /* Number of ptcls on each proc */
  int       displacements[MAX_PROCS];  /* Offsets into particles */
  int       offsets[MAX_PROCS];        /* Offsets used by the master */
  int       particle_number, i, j, my_offset, true_i;
  int       total_particles;           /* Total number of particles */
  int       count;                     /* Count time steps */
 
  MPI_Datatype particle_type;
 
  double    dt = DT;                   /* Integration time step */
  double    comm_time, start_comm_time, end_comm_time, start_time, end_time;
 
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &pool_size);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
 
  if (my_rank == MASTER_RANK) i_am_the_master = TRUE;
 
  particle_number = MAX_PARTICLES / pool_size;
 
  if (i_am_the_master)
    printf ("%d particles per processor\n", particle_number);
 
  MPI_Type_contiguous ( 11, MPI_DOUBLE, &particle_type );
  MPI_Type_commit ( &particle_type );
 
  MPI_Allgather ( &particle_number, 1, MPI_INT, counts, 1, MPI_INT,
                  MPI_COMM_WORLD );
 
  displacements[0] = 0;
  for (i = 1; i < pool_size; i++)
    displacements[i] = displacements[i-1] + counts[i-1];
  total_particles = displacements[pool_size - 1]
    + counts[pool_size - 1];
        
  if (i_am_the_master)
    printf ("total number of particles = %d\n", total_particles);
 
  my_offset = displacements[my_rank];
 
  MPI_Gather ( &my_offset, 1, MPI_INT, offsets, 1, MPI_INT, MASTER_RANK,
               MPI_COMM_WORLD );
 
  if (i_am_the_master) {
    printf ("offsets: ");
    for (i = 0; i < pool_size; i++)
      printf ("%d ", offsets[i]);
    printf("\n");
  }
 
  srand48((long) (my_rank + 28));
 
  /* Here each process initializes its own particles. */
 
  for (i = 0; i < particle_number; i++) {
    particles[my_offset + i].x = drand48();
    particles[my_offset + i].y = drand48();
    particles[my_offset + i].z = drand48();
    particles[my_offset + i].vx = 0.0;
    particles[my_offset + i].vy = 0.0;
    particles[my_offset + i].vz = 0.0;
    particles[my_offset + i].ax = 0.0;
    particles[my_offset + i].ay = 0.0;
    particles[my_offset + i].az = 0.0;
    particles[my_offset + i].mass = 1.0;
    particles[my_offset + i].charge = 1.0 - 2.0 * (i % 2);
  }
 
  start_time = MPI_Wtime();
  comm_time = 0.0;
 
  for (count = 0; count < N_OF_ITERATIONS; count++) {
 
    if (i_am_the_master) printf("Iteration %d.\n", count + 1);
 
    /* Here processes exchange their particles with each other. */
 
    start_comm_time = MPI_Wtime();
 
    MPI_Allgatherv ( particles + my_offset, particle_number,
                     particle_type,
                     particles, counts, displacements, particle_type,
                     MPI_COMM_WORLD );
 
    end_comm_time = MPI_Wtime();
    comm_time += end_comm_time - start_comm_time;
 
    for (i = 0; i < particle_number; i++) {
 
      true_i = i + my_offset;
 
      /* initialize accelerations to zero */
 
      particles[true_i].ax = 0.0;
      particles[true_i].ay = 0.0;
      particles[true_i].az = 0.0;
 
      for (j = 0; j < total_particles; j++) {
 
        /* Do not evaluate interaction with yourself. */
 
        if (j != true_i) {
 
          /* Evaluate forces that j-particles exert on the i-particle. */
 
          double dx, dy, dz, r2, r, qj_by_r3;
 
          /* Here we absorb the minus sign by changing the order
             of i and j. */
 
          dx = particles[true_i].x - particles[j].x;
          dy = particles[true_i].y - particles[j].y;
          dz = particles[true_i].z - particles[j].z;
 
          r2 = dx * dx + dy * dy + dz * dz; r = sqrt(r2);
 
          /* Quench the force if the particles are too close. */
 
          if (r < EPSILON) qj_by_r3 = 0.0;
          else qj_by_r3 = particles[j].charge / (r2 * r);
 
          /* accumulate the contribution from particle j */
 
          particles[true_i].ax += qj_by_r3 * dx;
          particles[true_i].ay += qj_by_r3 * dy;
          particles[true_i].az += qj_by_r3 * dz;
        }
      }
    }
 
    /*
     * We advance particle positions and velocities only *after*
     * we have evaluated all accelerations using the *old* positions.
     */
 
    for (i = 0; i < particle_number; i++) {
 
      double qidt_by_m, dt_by_2, vx0, vy0, vz0;
 
      true_i = i + my_offset;
 
      /* Save old velocities */
 
      vx0 = particles[true_i].vx;
      vy0 = particles[true_i].vy;
      vz0 = particles[true_i].vz;
 
      /* Now advance the velocity of particle i */
 
      qidt_by_m = particles[true_i].charge * dt / particles[true_i].mass;
      particles[true_i].vx += particles[true_i].ax * qidt_by_m;
      particles[true_i].vy += particles[true_i].ay * qidt_by_m;
      particles[true_i].vz += particles[true_i].az * qidt_by_m;
 
      /* Use average velocity in the interval to advance the particles'
         positions */
 
      dt_by_2 = 0.5 * dt;
      particles[true_i].x += (vx0 + particles[true_i].vx) * dt_by_2;
      particles[true_i].y += (vy0 + particles[true_i].vy) * dt_by_2;
      particles[true_i].z += (vz0 + particles[true_i].vz) * dt_by_2;
    }
  }
 
  MPI_Barrier(MPI_COMM_WORLD);
 
  end_time = MPI_Wtime();
 
  if (i_am_the_master) {
    printf ("Communication time %8.5f seconds\n", comm_time);
    printf ("Computation time   %8.5f seconds\n",
            end_time - start_time - comm_time);
    printf ("\tEvaluated %d interactions\n",
            N_OF_ITERATIONS * total_particles * (total_particles - 1));
  }
 
  MPI_Finalize ();
 
  exit(0);
}

Previous Instructions

MPI_Init() and MPI_Finalize(); Initialize and end the MPI program.

MPI_Comm_size(); Used to find the number of processes.

MPI_Comm_rank(); Used to find the rank of a given process.

MPI_Type_commit(); Used to commit the new type particle_type before it is used.

MPI_Wtime(); Used for timing.

MPI_Datatype particle_type; Declares the particle_type.

New Instructions

MPI_Type_contiguous( 11, MPI_DOUBLE, &particle_type ); Defines particle_type to be an array with 11 elements of type MPI_DOUDBLE.

MPI_Allgather( &particle_number, 1, MPI_INT, counts, 1, MPI_INT, MPI_COMM_WORLD ); The counts array is updated for every process to contain the value of particle_number for the process with rank j at position j. In other words: after the instruction has been executed all the processes know how many particle each process has. 1 MPI_INT is sent from particle_number and 1 MPI_INT is received into counts. The exchange is happening within the communicator MPI_COMM_WORLD.

MPI_Allgatherv( particles + my_offset, particle_number, particle_type, particles, counts, displacements, particle_type, MPI_COMM_WORLD ); This is an extension of MPI_Allgatherv() where the displacement in the receive buffer must be specified on a per process basis. Which in this case has been computed and placed in displacements. The rest of the arguments are as in MPI_Allgatherv(). The end result is that all the processes has the latest information about all the particles.

MPI_Barrier( MPI_COMM_WORLD ); Blocks until all the processes in the MPI_COMM_WORLD communicator have called in. This is to ensure that the all communication has finished before finalizing the statistics, displaying them and ending the program.

Compile & Run

If you have not already done so, obtain all the example code here.

Switch to the Intel compiler (optional, only necessary once in each terminal session)

$ module load intel

Compile the program using

$ make

Submit the job to the queue

$ make submit

The output from the program execution is placed in the output folder

$ cat output/*
625 particles per processor
total number of particles = 10000
offsets: 0 625 1250 1875 2500 3125 3750 4375 5000 5625 6250 6875 7500 8125 8750 9375
Iteration 1.
Iteration 2.
Iteration 3.
Iteration 4.
Iteration 5.
Iteration 6.
Iteration 7.
Iteration 8.
Iteration 9.
Iteration 10.
Iteration 11.
Iteration 12.
Iteration 13.
Iteration 14.
Iteration 15.
Iteration 16.
Iteration 17.
Iteration 18.
Iteration 19.
Iteration 20.
Communication time  0.24709 seconds
Computation time    9.11842 seconds
        Evaluated 1999800000 interactions