The domain used for speedup tests was a 300 x 250 grid (namelist.wps):
(jupyter notebook: PlotWPSDomain.ipynb)
SONIC has a mixture of 24-core and 40-core (hyper-threaded) nodes. Only the 40-core nodes are guaranteed to have InfiniBand. There are nodes with more cores (e.g. the highmem node), but mixing these in disrupts the MPI messaging. So, for the best and most consistent performance, use the infiniband queue:
qsub -q infiniband ...
#PBS -l nodes=04:ppn=40
...
module purge
module load WRF
module list
...
time mpirun -np 64 --map-by ppr:1:core wrf.exe
The `--map-by ppr:1:core` option pins one MPI rank per physical core, to ensure that hyper-threading is not used.
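Put together, the fragments above form a submission script along these lines (a sketch only: the job name, walltime, and use of $PBS_O_WORKDIR are illustrative assumptions, and -np 64 deliberately uses fewer ranks than the 160 allocated cores, as in the speedup tests):

```shell
#!/bin/bash
# Sketch of a PBS submission script assembled from the fragments above.
# Job name, walltime, and working directory are illustrative assumptions.
#PBS -N wrf_speedup
#PBS -q infiniband
#PBS -l nodes=04:ppn=40
#PBS -l walltime=12:00:00

cd $PBS_O_WORKDIR

module purge
module load WRF
module list

# One rank per physical core; -np may be smaller than nodes*ppn
# when measuring speedup at different core counts.
time mpirun -np 64 --map-by ppr:1:core wrf.exe
```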
I ran WRF using different numbers of nodes and cores. The IO time becomes significant at larger core counts, because all ranks wait while output is written. To get a fairer picture of the speedup with more cores, I've done the following:
- set the history interval in namelist.input to a value greater than the forecast length, so wrfout files are not written (apart from the analysis)
- stripped out only the compute timings from the rsl.out.0000 log file, using the following command. The pattern skips the timesteps at, and one minute after, each half hour, since those timings are inflated by IO.
NC=04x64
grep 'main:.*[2-9]:00' RUNDIR${NC}/rsl.out.0000 | awk '{print $9}' > ComputeTimings.${NC}
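As a sanity check, the filter can be exercised on a synthetic rsl.out.0000 excerpt (the timestamps and elapsed times below are invented, but follow the real "Timing for main: ..." line layout, in which the elapsed seconds are field 9):

```shell
# Synthetic rsl.out.0000 excerpt (invented values, real line layout)
cat > rsl.out.0000 <<'EOF'
Timing for main: time 2020-01-01_00:29:00 on domain 1: 2.31 elapsed seconds
Timing for main: time 2020-01-01_00:30:00 on domain 1: 9.87 elapsed seconds
Timing for main: time 2020-01-01_00:31:00 on domain 1: 8.12 elapsed seconds
Timing for main: time 2020-01-01_00:32:00 on domain 1: 2.29 elapsed seconds
EOF

# Timesteps at and one minute after each half hour have minutes ending
# in 0 or 1, so they fail the [2-9]:00 match and are dropped;
# field 9 is the elapsed compute time in seconds.
grep 'main:.*[2-9]:00' rsl.out.0000 | awk '{print $9}'
```

Only the :29 and :32 steps survive the filter, so the IO-inflated timings at :30 and :31 are excluded.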
(jupyter notebook: WRFspeedupSONICcomputetimes.ipynb)
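For the history change, the relevant setting is history_interval (in minutes) in the &time_control section of namelist.input; pushing it past the forecast length suppresses the wrfout writes. A minimal sketch, with illustrative values for a 48-hour forecast:

```
&time_control
 run_hours        = 48,
 history_interval = 99999,  ! minutes; greater than the forecast length
/
```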