The domain used for speedup tests was a 300 x 250 grid (namelist.wps):
(jupyter notebook: PlotWPSDomain.ipynb)
SONIC has a mixture of 24-core and 40-core (hyper-threaded) nodes. Only the 40-core nodes are guaranteed to have InfiniBand. There are nodes with more cores (e.g. the highmem node), but mixing these in disrupts the MPI messaging. So, for the best and most consistent performance, use the infiniband queue:
qsub -q infiniband ...
#PBS -l nodes=04:ppn=40
...
module purge
module load WRF
module list
...
time mpirun -np 64 --map-by ppr:1:core wrf.exe
The `--map-by ppr:1:core` option pins one MPI rank per physical core, to ensure that hyper-threading is not used.
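Put together, the fragments above form a submission script along these lines (a sketch only: the job name, walltime, and use of $PBS_O_WORKDIR are illustrative assumptions, and -np 64 deliberately uses fewer ranks than the 160 allocated cores, as in the speedup tests):

```shell
#!/bin/bash
# Sketch of a PBS submission script assembled from the fragments above.
# Job name, walltime, and working directory are illustrative assumptions.
#PBS -N wrf_speedup
#PBS -q infiniband
#PBS -l nodes=04:ppn=40
#PBS -l walltime=12:00:00

cd $PBS_O_WORKDIR

module purge
module load WRF
module list

# One rank per physical core; -np may be smaller than nodes*ppn
# when measuring speedup at different core counts.
time mpirun -np 64 --map-by ppr:1:core wrf.exe
```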
I ran WRF using different numbers of nodes and cores. The IO time becomes significant at larger core counts, because all ranks wait while output is written. To get a fairer picture of the speedup with more cores, I've done the following:
- set the history interval in namelist.input to a value greater than the forecast length, so wrfout files are not written (apart from the analysis)
- stripped out only the compute timings from the rsl.out.0000 log file, using the following command. The pattern skips the timesteps at, and one minute after, each half hour, since those timings are inflated by IO.
NC=04x64
grep 'main:.*[2-9]:00' RUNDIR${NC}/rsl.out.0000 | awk '{print $9}' > ComputeTimings.${NC}
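As a sanity check, the filter can be exercised on a synthetic rsl.out.0000 excerpt (the timestamps and elapsed times below are invented, but follow the real "Timing for main: ..." line layout, in which the elapsed seconds are field 9):

```shell
# Synthetic rsl.out.0000 excerpt (invented values, real line layout)
cat > rsl.out.0000 <<'EOF'
Timing for main: time 2020-01-01_00:29:00 on domain 1: 2.31 elapsed seconds
Timing for main: time 2020-01-01_00:30:00 on domain 1: 9.87 elapsed seconds
Timing for main: time 2020-01-01_00:31:00 on domain 1: 8.12 elapsed seconds
Timing for main: time 2020-01-01_00:32:00 on domain 1: 2.29 elapsed seconds
EOF

# Timesteps at and one minute after each half hour have minutes ending
# in 0 or 1, so they fail the [2-9]:00 match and are dropped;
# field 9 is the elapsed compute time in seconds.
grep 'main:.*[2-9]:00' rsl.out.0000 | awk '{print $9}'
```

Only the :29 and :32 steps survive the filter, so the IO-inflated timings at :30 and :31 are excluded.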
(jupyter notebook: WRFspeedupSONICcomputetimes.ipynb)
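For the history change, the relevant setting is history_interval (in minutes) in the &time_control section of namelist.input; pushing it past the forecast length suppresses the wrfout writes. A minimal sketch, with illustrative values for a 48-hour forecast:

```
&time_control
 run_hours        = 48,
 history_interval = 99999,  ! minutes; greater than the forecast length
/
```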