WRF-GC on Cannon/Odyssey Harvard Cluster

Feb 27, 2020

Summary

This is a working draft guide for setting up WRF/WRF-Chem/WRF-GC on the Harvard Research Computing Cannon (formerly “Odyssey”) cluster.

The guide is being actively updated and polished.

Libraries

module purge
module load git/2.17.0-fasrc01
module load intel/17.0.4-fasrc01
module load openmpi/2.1.0-fasrc02
module load netcdf/4.5.0-fasrc02
module load netcdf-fortran/4.4.4-fasrc06
module load jasper/1.900.1-fasrc02

Note that WRF expects netCDF and netCDF-Fortran to be installed in the same directory, and it does not handle lib64 directories well. Work around this by creating a merged directory tree in a folder of your choice:

mkdir bin
mkdir include
mkdir lib
ln -sf $NETCDF_HOME/lib64/* lib/
ln -sf $NETCDF_FORTRAN_HOME/lib/* lib/
ln -sf $NETCDF_HOME/bin/* bin/
ln -sf $NETCDF_FORTRAN_HOME/bin/* bin/
ln -sf $NETCDF_HOME/include/* include/
ln -sf $NETCDF_FORTRAN_HOME/include/* include/
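As a quick optional check, confirm that the netCDF C and Fortran pieces now appear side by side in the merged tree (file names are the usual netCDF ones and may differ slightly between versions):

ls lib/libnetcdf*                          # should list both libnetcdf* (C) and libnetcdff* (Fortran)
ls include/netcdf.inc include/netcdf.mod   # Fortran include/module files should be present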

Set up an environment file containing the module loads above plus the following directives:

export NETCDF=$(pwd)
export JASPERLIB=$JASPER_HOME/lib64
export JASPERINC=$JASPER_HOME/include

export CC=icc
export OMPI_CC=$CC

export CXX=icpc
export OMPI_CXX=$CXX

export FC=ifort
export F77=$FC
export F90=$FC
export OMPI_FC=$FC
export COMPILER=$FC
export ESMF_COMPILER=intel

# MPI Communication
export ESMF_COMM=openmpi
export MPI_ROOT=$MPI_HOME

# WRF options
export WRF_EM_CORE=1
export WRF_NMM_CORE=0
export WRF_CHEM=1
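
For convenience, keep those module loads and exports together in a single file and source it in every new shell before configuring or compiling. A minimal sketch (the file name wrfgc_env.sh is just an example; NETCDF=$(pwd) assumes you source the file from inside the merged netCDF folder created earlier):

cd /path/to/merged-netcdf-folder   # the folder holding the bin/, include/, lib/ symlinks
source ./wrfgc_env.sh              # module loads + exports shown above
echo $NETCDF                       # should print the merged folder's path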

Downloading WRF

wget https://github.com/wrf-model/WRF/archive/V3.9.1.1.zip
unzip V3.9.1.1.zip
mv WRF-3.9.1.1 WRFV3

If you are installing WRF-GC, remove the stock chemistry module first:

cd WRFV3/
rm -rf chem

Then git clone our private repository into the chem directory.
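For reference, the clone step looks roughly like this (the URL is a placeholder for the private WRF-GC repository; substitute the one you have access to):

git clone <private-wrf-gc-repository-url> chem   # placeholder URL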

Configure and build WRF

  • Launch an interactive session, because the compile process is highly memory-intensive:
srun -p test -n 6 --pty --mem 30960 -t 0-06:00 /bin/bash
  • Go to the WRFV3 folder and run:
./configure -hyb

Choose option 15 (INTEL (ifort/icc) (dmpar)) for the compiler and 1 for basic nesting.

  • Install the WRF-GC registry:
cd chem
make install_registry
  • And compile:
./compile em_real

You may want to do this in a screen session so the build is not interrupted when you exit the shell. Compilation usually takes about 2-4 hours.
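For example, a sketch using screen and a log file (the session and log names are arbitrary):

screen -S wrfgc-build                      # detach with Ctrl-A D; reattach with: screen -r wrfgc-build
./compile em_real 2>&1 | tee compile.log   # keep a log to inspect if the build fails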

Configure and build WPS

Use the same principles as for WRF, or refer to my WRF-GC on AWS Guide.

Remember that WPS must be built after WRF, as it relies on I/O libraries built with WRF. Also, the WRF source must sit in the parent directory of WPS (i.e. at ../WRFV3 relative to the WPS directory) with the folder name WRFV3.

  • Configure: ./configure, choose option 17. Linux x86_64, Intel compiler (serial).
  • Run this: export MPI_LIB="". See explanation below.
  • Compile: ./compile. You should now have all the executables:
$ ls *.exe
geogrid.exe  metgrid.exe  ungrib.exe

Why do I have to set MPI_LIB?

Otherwise, the WPS compile may fail with this error in the compile logs:

/n/helmod/apps/centos7/Comp/intel/17.0.4-fasrc01/mvapich2/2.3b-fasrc02/lib64: file not recognized: Is a directory
make[1]: [geogrid.exe] Error 1 (ignored)

This is caused by a link command like this:

ifort  -o geogrid.exe ... -lnetcdff -lnetcdf \
        /n/helmod/apps/centos7/Comp/intel/17.0.4-fasrc01/mvapich2/2.3b-fasrc02/lib64

You may wonder why this lib64 path appears out of nowhere. It is because Odyssey's modules set MPI_LIB to the MPI installation's lib64 directory, but WPS also uses this variable internally to store the MPI link flags (i.e. -L/n/helmod/apps/centos7/Comp/intel/17.0.4-fasrc01/mvapich2/2.3b-fasrc02/lib64). The two uses clash and break the link step. Since we are compiling WPS in (serial) mode, we do not need MPI_LIB at all, so just unset it for now.
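You can check what the module environment has set before compiling, then clear it (a trivial check):

echo $MPI_LIB        # shows the lib64 path injected by the module system
export MPI_LIB=""    # clear it; the serial WPS build does not need it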

Sample configuration files

WPS

A copy of the WPS geographical input files has been prepared at /n/holyscratch01/jacob_lab/hplin/wps-geog-3.9 if you can access jacob_lab.

namelist.wps

&share
 wrf_core = 'ARW',
 max_dom = 1,
 start_date = '2019-12-01_00:00:00',
 end_date   = '2020-01-01_00:00:00',
 interval_seconds = 21600,
 io_form_geogrid = 2,
/

&geogrid
 parent_id         =   1,
 parent_grid_ratio =   1,
 i_parent_start    =   1,
 j_parent_start    =   1,
 e_we              =  245,
 e_sn              =  181,
 geog_data_res = 'gtopo_2m+usgs_2m+nesdis_greenfrac+2m','default',
 dx = 27000,
 dy = 27000,
 map_proj = 'mercator',
 ref_lat   =  38,
 ref_lon   =  105,
 truelat1  =  38.0,
 stand_lon = 105,
 geog_data_path = '/n/holyscratch01/jacob_lab/hplin/wps-geog-3.9'
/

&ungrib
 out_format = 'WPS',
 prefix = 'FILE',
/

&metgrid
 fg_name = 'FILE'
 io_form_metgrid = 2,
/

You have to supply your own WRF input meteorological datasets and use the appropriate Vtable.
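For example, if you drive WPS with GFS GRIB files, the ungrib setup looks roughly like this (the paths and the choice of Vtable are assumptions; pick the Vtable matching your dataset), run from the WPS directory:

ln -sf ungrib/Variable_Tables/Vtable.GFS Vtable   # Vtable must match your met data source
./link_grib.csh /path/to/your/gribfiles/*         # creates the GRIBFILE.AAA, GRIBFILE.AAB, ... links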

A quick reminder: your simulation's horizontal resolution is set both here (namelist.wps) and in the WRF run directory (WRFV3/run/namelist.input). Also, map_proj can only be mercator or lat-lon, as these are the only grid projections supported by HEMCO in WRF-GC at the moment.

You’re welcome to consult the WRF-GC homepage and specifically Running WRF-GC.

WRF

Tell WRF-GC to use the HEMCO emissions data available on Odyssey at /n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData, which is shared by all users.

Go to the WRF run directory (WRFV3/run) and edit the following files. Remember that these paths need to be updated every time you recompile the code, because fresh copies of the files are installed in their place.

input.geos

Run directory           : ./
Root data directory     : /n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData
Global offsets I0, J0   : 0 0

All other options in input.geos are ignored as they are relevant to GEOS-Chem Offline only.

HEMCO_Config.rc

ROOT:                        /n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData/HEMCO
METDIR:                      /does/not/apply/for/WRFGC

HEMCO_Config.rc is fully operational in WRF-GC, and you can use it to select the emission inventories for your simulation.
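For example, inventories and emission extensions are switched on or off in the switches section near the top of HEMCO_Config.rc. The lines below only illustrate the switch format; the actual extension numbers, inventory names, and species lists should follow what your HEMCO_Config.rc already contains:

# Example: toggle base emission inventories under extension 0
0       Base                   : on    *
    --> CEDS                   :       true
    --> EMEP                   :       false
# Example: enable a biogenic emissions extension
108     MEGAN                  : on    ISOP/ACET/PRPE/C2H4/ALD2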

namelist.input

The main namelist should be configured according to the WRF-GC User’s Guide: Running WRF-GC.

Running WPS and WRF(-GC)

WPS

  • Launch an interactive session with sufficient memory. The test partition is enough; WPS is not very compute-intensive:
srun -p test -n 6 --pty --mem 30960 -t 0-06:00 /bin/bash
  • Directly run geogrid.exe, ungrib.exe, and metgrid.exe, in that order (see the sketch below).
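
A sketch of the sequence, assuming the Vtable and GRIB links were prepared as described earlier and everything is run from the WPS directory:

./geogrid.exe    # builds the domain (geo_em.d01.nc) from the static geographical data
./ungrib.exe     # decodes the GRIB met files into intermediate FILE:* files
./metgrid.exe    # interpolates the met data onto your domain (met_em.* output)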

WRFV3

To run real.exe (which prepares input data for WRF) and wrf.exe, use the following batch script (remember to change the name of the executable accordingly).

This script is written for 1 node with 32 cores on huce_intel; adjust the memory and core counts accordingly if you want to use a different partition:

#!/bin/bash
#SBATCH -n 32
#SBATCH -N 1
#SBATCH -t 1-00:00
#SBATCH -p huce_intel
#SBATCH --mem=128000

module purge
module load git/2.17.0-fasrc01
module load intel/17.0.4-fasrc01
module load openmpi/2.1.0-fasrc02
module load netcdf/4.5.0-fasrc02
module load netcdf-fortran/4.4.4-fasrc06
module load jasper/1.900.1-fasrc02

# srun -n $SLURM_NTASKS --mpi=pmix wrf.exe

mpirun -np $SLURM_NTASKS ./wrf.exe

Save it as e.g. wrf.sh and then sbatch wrf.sh to submit the batch job.
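
Once submitted, you can monitor progress like this (a sketch; WRF writes per-rank rsl.out.* / rsl.error.* logs in the run directory):

squeue -u $USER            # check that the job is queued or running
tail -f rsl.error.0000     # follow MPI rank 0's log for errors and timing output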

Considerations:

  • For roughly 40000 grid boxes (e_we * e_sn), e.g. China at 27x27 km resolution, a 48-hour simulation takes about 4 hours on 36 cores. To keep your run from being cut off, request a little more wall time than this estimate, as run times fluctuate.

  • Edit -n 32 to set the number of cores and -N 1 to set the number of nodes.

  • Test on the test partition first (with a wall time of 0-01:00 or shorter; see the sketch after this list), then submit to huce_intel or the main partitions so you don't waste core-hours. Major errors almost always appear in the first hour of the run; if the first hour runs through, the job will likely run to completion.
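
For the initial trial, you can override the partition and wall time on the sbatch command line without editing the script (a sketch; command-line options take precedence over #SBATCH directives, and you may also need to lower --mem and -n to fit the test partition's limits):

sbatch -p test -t 0-01:00 wrf.sh   # 1-hour trial run on the test partition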