Performance tests of OpenFOAM with CUDA

Introduction

This document presents the single-precision (SP) performance of OpenFOAM running on GPU cards with CUDA libraries. The OpenFOAM version is 2.1.x, and the benchmark case is the flow around a square rod computed with a large-eddy simulation (LES) turbulence model. This flow problem is widely used by researchers to test LES models, and several articles on it can be found in the literature. In particular, more information about the flow parameters of this benchmark case can be found in Arslan et al. [1]. In this benchmark, OpenFOAM's pisoFoam solver was built with the ofgpu v.0-2 linear solver library from Symscape and the CUDA 5.5 libraries.

The system

The benchmark was run on a system with the following specifications:

  • Intel Xeon E5-2609 @ 2.40 GHz (4 cores), 64 GB memory, CentOS 6.4
  • NVIDIA Tesla K20Xm graphics card (2688 CUDA cores, 6 GB memory)

Test case

The case parameters are listed in Table 1. They were chosen to produce a highly unsteady, turbulent flow so that the performance of the GPU card could be assessed better. Figure 1 shows the instantaneous velocity field.

Figure 1. Instantaneous velocity field for the flow around a square rod (Re = 21,400)

Parameter                       Value
Reynolds number                 21,400
SGS model                       Smagorinsky
Solver                          pisoFoam
Precision                       SP (single precision)
Mesh size                       1.04 million cells
Time step size                  0.00025 s
Number of time steps            130
Solver for pressure equation    PCG

Table 1. Case parameters
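Table 1 lists PCG (preconditioned conjugate gradient) as the solver for the pressure equation. The sketch below illustrates the PCG iteration in plain Python for a small dense symmetric positive-definite system. It is only an illustration under stated assumptions: the diagonal (Jacobi) preconditioner, the dense matrix storage, and the function names are choices made for a self-contained example, and it does not reproduce OpenFOAM's sparse-matrix implementation or the GPU solver in ofgpu.

```python
# Illustrative PCG sketch (hypothetical helper names; Jacobi preconditioner assumed).
# Solves A x = b for a symmetric positive-definite matrix A stored as a list of rows.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Return (x, iterations) once the residual 2-norm drops below tol."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                       # residual r = b - A x, with x = 0
    if dot(r, r) ** 0.5 < tol:
        return x, 0                   # trivial right-hand side
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    z = [inv_diag[i] * r[i] for i in range(n)]   # preconditioned residual
    p = list(z)                       # first search direction
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)       # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, it
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz            # keeps directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

For example, for the 5×5 one-dimensional Laplacian (2 on the diagonal, -1 on the off-diagonals) and b = (1, 1, 1, 1, 1), pcg returns x ≈ (2.5, 4, 4.5, 4, 2.5) within a few iterations.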

Results

Each simulation was run for 130 time steps with a time step size of 2.5 × 10⁻⁴ s. The SP performance is measured as the total wall-clock time of each run, excluding the first 10 time steps, so that 120 time steps are timed. Figure 2 shows the performance of the GPU versus the CPU cores. Relative to a single CPU core, the GPU gives an overall speed-up of about 4.1. Even when all 4 CPU cores are used, the simulation is still slower than the configuration with 1 core and the GPU. A super-linear speed-up (greater than 4 on 4 cores) is also observed for the 4-core simulation relative to the serial simulation.
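The timing procedure described above can be sketched as follows. This is a minimal Python illustration, not the script used for this benchmark: the helper names (timed_wall_clock, speedup, parallel_efficiency) and the per-step timings in the example are hypothetical and are not measured data. Speed-up is the reference wall-clock time divided by the measured time, and parallel efficiency divides that speed-up by the number of cores, so an efficiency above 1.0 corresponds to super-linear speed-up.

```python
# Hypothetical helpers illustrating the timing procedure; not the benchmark's own script.

def timed_wall_clock(step_times, discard=10):
    """Total wall-clock time, excluding the first `discard` start-up steps."""
    if len(step_times) <= discard:
        raise ValueError("need more time steps than are discarded")
    return sum(step_times[discard:])


def speedup(t_reference, t_measured):
    """Speed-up of a run relative to a reference run (e.g. 1 CPU core)."""
    return t_reference / t_measured


def parallel_efficiency(t_serial, t_parallel, n_cores):
    """Speed-up per core; a value above 1.0 means super-linear speed-up."""
    return speedup(t_serial, t_parallel) / n_cores


if __name__ == "__main__":
    # Hypothetical per-step wall-clock times in seconds (NOT measured data);
    # the first 10 steps are slower to mimic start-up overhead.
    serial = [3.0] * 10 + [1.0] * 120
    four_cores = [0.8] * 10 + [0.24] * 120
    gpu = [0.6] * 10 + [0.2] * 120

    t1 = timed_wall_clock(serial)
    t4 = timed_wall_clock(four_cores)
    print(f"4 cores: speed-up {speedup(t1, t4):.2f}, "
          f"efficiency {parallel_efficiency(t1, t4, 4):.2f}")
    print(f"1 core + GPU: speed-up {speedup(t1, timed_wall_clock(gpu)):.2f}")
```

Discarding the first steps keeps one-off start-up costs (such as reading the mesh and initial fields) from distorting the per-step comparison between configurations.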

Figure 2. Performance chart for OpenFOAM (Tesla K20Xm GPU card vs. Intel Xeon E5-2609)

Conclusions

This work shows that, in single precision, OpenFOAM achieves a promising speed-up on an NVIDIA Tesla K20Xm graphics card compared with a single-core CPU run. This brings a significant advantage for simulating highly unsteady flows because of the faster convergence when solving the pressure equation. However, the memory of the GPU card is limited and can become a bottleneck for larger cases, so simulations on multiple GPU cards will be necessary. The library used in this work does not support MPI communication between multiple cards for OpenFOAM, although other approaches are expected to become available in the future.

References

[1] Arslan, T., Khoury, G. K. E., Pettersen, B., and Andersson, H. I., 2012, "Simulations of Flow around a Three-Dimensional Square Cylinder Using LES and DNS," Proceedings of the Seventh International Colloquium on Bluff Bodies Aerodynamics and Applications, Shanghai, China, pp. 909-918.