Friday, 3 June 2016

WRF on SONIC

These are my notes on WRF timing when running on multiple nodes on sonic at UCD.

The domain used for the speedup tests was a 300 x 250 grid (namelist.wps):
[domain plot: see Jupyter notebook PlotWPSDomain.ipynb]
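For reference, a minimal sketch of the grid-size entries in namelist.wps. Only e_we and e_sn (the 300 x 250 grid) come from the notes above; the spacing and projection are illustrative assumptions:

    &geogrid
     e_we     = 300,        ! west-east dimension: the 300 in 300 x 250
     e_sn     = 250,        ! south-north dimension
     dx       = 3000,       ! grid spacing in metres -- illustrative, not from the notes
     dy       = 3000,
     map_proj = 'lambert',  ! illustrative projection choice
    /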

SONIC has a mixture of 24-core and 40-core (hyper-threaded) nodes. Only the 40-core nodes are guaranteed to have InfiniBand. There are nodes with more cores (e.g. the highmem node), but mixing these in messes up the MPI messaging. So, to ensure the best and most consistent performance, use the infiniband queue:

qsub -q infiniband ...

#PBS -l nodes=04:ppn=40
...
module purge
module load WRF

module list
...

time mpirun -np 64 --map-by ppr:1:core wrf.exe
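Putting those pieces together, here is a sketch of how a complete submission script might look; the walltime request and the cd $PBS_O_WORKDIR line are my assumptions, not from the original notes:

    #!/bin/bash
    #PBS -q infiniband
    #PBS -l nodes=04:ppn=40
    # Illustrative walltime request (not from the original notes)
    #PBS -l walltime=12:00:00

    # Run from the directory the job was submitted from
    cd $PBS_O_WORKDIR

    module purge
    module load WRF
    module list

    # One MPI rank per physical core (see the note below)
    time mpirun -np 64 --map-by ppr:1:core wrf.exe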

Mapping by core is required to ensure that hyper-threading is not used.
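A quick way to confirm that ranks really do land one per physical core, assuming Open MPI (which is where the ppr:1:core --map-by syntax comes from), is its --report-bindings option; this check is my suggestion, not from the original notes:

    # Print each rank's core binding to stderr; every core should appear at most once.
    # hostname is a cheap stand-in for wrf.exe here.
    mpirun -np 64 --map-by ppr:1:core --report-bindings hostname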

I ran WRF using different numbers of nodes and cores. The IO time becomes significant for larger numbers of cores, as all the other ranks have to wait during IO. To get a fairer picture of the speedup with more cores, I've done the following:

  • changed the history_interval value in namelist.input to be greater than the forecast range, so that wrfout files are not written (apart from the analysis); see the namelist sketch after this list
  • stripped out only the compute timings from the rsl.out.0000 log file, using the command below, which skips the timings at, and one minute after, each half hour, as the timings are larger at those times:

    NC=04x64
    grep 'main:.*[2-9]:00' RUNDIR${NC}/rsl.out.0000 | awk '{print $9}' > ComputeTimings.${NC}
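For the first point above, a minimal sketch of the &time_control change in namelist.input; the values are illustrative assumptions, the point is only that history_interval (in minutes) exceeds the forecast length:

    &time_control
     run_days         = 1,      ! illustrative forecast length
     history_interval = 99999,  ! minutes; longer than the forecast, so no wrfout history files are written
    /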
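With the per-step timings in ComputeTimings.${NC}, a simple awk one-liner (my addition, not from the original notes) summarises the mean compute time per step:

    # Average the per-step elapsed times extracted above
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.3f s/step over %d steps\n", sum/n, n }' ComputeTimings.${NC}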
Here are the timings I got: