Running WRF-GC on the AWS Cloud with ParallelCluster

Nov 26, 2019

Summary

This guide is being actively updated as we prepare to publicly release WRF-GC. Stay tuned for updates.

This is a tutorial on how to set up the WRF-GC model on the Amazon Web Services cloud.

There have already been successful efforts and guides on setting up WRF and GEOS-Chem on the AWS cloud, and GEOS-Chem input data is readily available on AWS S3. This makes it very easy to set up the WRF-GC coupled model to run ultra-high-resolution simulations on the cloud.

In this tutorial we will learn how to use AWS ParallelCluster to create your own cluster on the AWS cloud and run WRF-GC on it. WRF-GC supports MPI-based parallelization so it can take advantage of multiple compute nodes.

Steps

In this guide I will document steps for:

  • Setting up AWS ParallelCluster to create your own HPC cluster
  • Configuring the software environment for building WRF-GC
  • Running a test WRF-GC simulation across nodes

I will not go into much detail on setting up AWS infrastructure. I’d like to point you to an excellent tutorial written by Jiawei Zhuang.

Setting up AWS ParallelCluster

Work in progress - refer to Jiawei’s AWS HPC Guide for setting up.
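If you have not used ParallelCluster before, the basic workflow with the pcluster command-line tool looks roughly like the sketch below (the cluster name and key file are placeholders; the cluster configuration itself, including instance types, shared storage and EFA networking, is covered in Jiawei's guide):

pip install --user aws-parallelcluster     # install the ParallelCluster CLI
pcluster configure                         # interactively writes ~/.parallelcluster/config
pcluster create wrfgc-cluster              # spin up the cluster (placeholder name)
pcluster ssh wrfgc-cluster -i ~/mykey.pem  # log in to the master/login node
pcluster delete wrfgc-cluster              # tear everything down when you are done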

Creating the software environment for WRF-GC: Compilers

First, get the spack package manager for HPC:

cd /shared  # install to shared disk
git clone https://github.com/spack/spack.git
echo 'export PATH=/shared/spack/bin:$PATH' >> ~/.bashrc  # to discover spack executable
source ~/.bashrc
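Optionally, verify that spack is on your PATH and let it detect the system compilers before installing anything:

spack --version
spack compiler find    # registers the system compilers (e.g. the stock GCC on CentOS 7)
spack compilers        # lists the compilers spack now knows about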

Load intelmpi in your modules:

module load intelmpi
source /opt/intel/compilers_and_libraries/linux/mpi/intel64/bin/mpivars.sh -ofi_internal=0

You might want to put this in your ~/.bashrc.

You now have two options for your compiler:

  1. Use the Intel C/Fortran compilers, if you have a license. They generally afford better performance. If you choose to go this route, skip to the Intel section.
  2. Use the free and open-source GNU C/Fortran compilers. In this case skip to the GNU section.

Intel

You will need a valid Intel compiler license. (You may be eligible as a student).

Setting up Intel compilers with spack requires some additional configuration. You can follow Spack’s official guide or the quick and dirty version below:

  1. Edit the compiler specification file using spack config --scope=user/linux edit compilers. Your file should look like the following; the stub paths will be filled in later.
compilers:
- compiler:
    target:     x86_64
    operating_system:   centos7
    modules:    []
    spec:       [email protected]
    paths:
        cc:       stub
        cxx:      stub
        f77:      stub
        fc:       stub
  2. Now install the compilers using spack install intel@19.0.4 %gcc@4.8.5.
  3. Find the actual paths of the compiler executables using find $(spack location -i intel) -name icc -type f -ls.
  4. Put these paths into the compiler specification file (spack config --scope=user/linux edit compilers). It should look something like this, but do not copy the paths below verbatim:
compilers:
- compiler:
    target:     x86_64
    operating_system:   centos7
    modules:    []
    spec:       [email protected]
    paths:
        cc:       /shared/spack/opt/spack/.../linux/bin/intel64/icc
        cxx:      /shared/spack/opt/spack/.../linux/bin/intel64/icpc
        f77:      /shared/spack/opt/spack/.../linux/bin/intel64/ifort
        fc:       /shared/spack/opt/spack/.../linux/bin/intel64/ifort

Note that the compilers for cc, cxx, f77 and fc are icc, icpc, ifort and ifort, respectively.

  5. Load the compilers using source $(spack location -i intel)/bin/compilervars.sh -arch intel64.
  6. Tell intelmpi and everything else to use the Intel compilers. Add this to your ~/.bashrc:
source $(spack location -i intel)/bin/compilervars.sh -arch intel64
export I_MPI_CC=icc
export I_MPI_CXX=icpc
export I_MPI_FC=ifort
export I_MPI_F77=ifort
export I_MPI_F90=ifort
export CC=icc
export FC=ifort
export CXX=icpc
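As an optional sanity check, the MPI compiler wrappers should now point at the Intel compilers (the wrappers honor the I_MPI_* variables set above):

which icc ifort
mpicc -show    # the printed compile line should reference icc
mpif90 -show   # the printed compile line should reference ifort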

GNU Fortran

You should be ready to go out of the box, albeit with an older compiler (the system GCC). Add this to your ~/.bashrc:

export I_MPI_CC=gcc
export I_MPI_CXX=g++
export I_MPI_FC=gfortran
export I_MPI_F77=gfortran
export I_MPI_F90=gfortran
export CC=gcc
export FC=gfortran
export CXX=g++

Software: Required Libraries

The list of dependencies for WRF and WRF-GC is as follows:

  • MPI (we are using intelmpi built-in with AWS here)
  • hdf5
  • netCDF-C, netCDF-Fortran
  • JasPer JPEG library 1.900.1

Tell spack that intelmpi is already available by creating the file ~/.spack/packages.yaml:

packages:
  intel-mpi:
    paths:
      [email protected]: /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/
    buildable: False

(You may want to check if that is indeed the path for intel-mpi on your system. Usually, which mpirun will tell you the rough path.)
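For example, a quick way to locate the Intel MPI installation and to preview what spack plans to build before committing to the install (paths and versions will vary on your cluster):

dirname $(dirname $(which mpirun))    # Intel MPI root, minus the trailing /bin
spack spec netcdf-fortran %intel ^hdf5+fortran+hl ^intel-mpi    # dry-run the concretization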

Install these dependencies using spack - no need to compile your own, so easy! (If you chose the GNU compilers, replace %intel with %gcc in the commands below.)

spack -v install netcdf-fortran %intel ^hdf5+fortran+hl ^intel-mpi
spack -v install [email protected] %intel

Tell WRF how to locate these dependencies. I recommend adding these to your ~/.bashrc:

export PATH=$(spack location -i netcdf-c)/bin:$PATH
export PATH=$(spack location -i netcdf-fortran)/bin:$PATH
export HDF5=$(spack location -i hdf5)
export NETCDF=$(spack location -i netcdf-fortran)
export JASPERLIB=$(spack location -i [email protected])/lib
export JASPERINC=$(spack location -i [email protected])/include

export LD_LIBRARY_PATH=$HDF5/lib:$NETCDF/lib:$LD_LIBRARY_PATH

WRF expects netCDF-C to be installed in the same place as netCDF-Fortran, so you need to do some moving around. Just link the netCDF-C files into the netCDF-Fortran folder (ugly ugly…):

# Only need to run this once
NETCDF_C=$(spack location -i netcdf-c)
ln -sf $NETCDF_C/include/*  $NETCDF/include/
ln -sf $NETCDF_C/lib/*  $NETCDF/lib/
ln -sf $NETCDF_C/bin/*  $NETCDF/bin/
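You can quickly check that the merged netCDF tree looks right; nc-config and nf-config ship with netCDF-C and netCDF-Fortran respectively, so both should now resolve under $NETCDF:

$NETCDF/bin/nc-config --version
$NETCDF/bin/nf-config --version
ls $NETCDF/lib | grep -i netcdf    # should list both libnetcdf and libnetcdff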

Some extra ~/.bashrc entries, some courtesy of Jiawei and some that I added for convenience when running the model:

# this prevents segmentation fault when running the model
ulimit -s unlimited

# WRF-specific settings
export WRF_EM_CORE=1
export WRFIO_NCD_NO_LARGE_FILE_SUPPORT=0
export WRF_CHEM=1 # compile WRF-GC

# Some quick aliases to work with
alias vn="vi namelist.input"
alias vrc="vi ~/.bashrc"

alias tt="tail -f rsl.out.0000"
alias te="tail -n 50 rsl.* | less"

alias mco="rm rsl.*; rm wrfout_*"

Downloading WRF, GEOS-Chem and WRF-GC

First, a quick recap of the WRF-GC directory hierarchy. WRF-GC is driven by the WRF model in exactly the same way that WRF-Chem is driven by WRF.

The WRF model sits in the top-level directory, usually named WRFV3:

[centos@ip-172-31-93-89 WRFV3]$ ls
arch     configure             dyn_exp   hydro     phys                      README.hydro       README.SSIB         run    var
chem     configure.wrf         dyn_nmm   inc       README                    README.io_config   README_test_cases   share
clean    configure.wrf.backup  external  main      README.DA                 README.NMM         README.windturbine  test
compile  dyn_em                frame     Makefile  README.hybrid_vert_coord  README.rsl_output  Registry            tools

The chem directory contains all code pertaining to chemistry. Inside chem you will find the WRF-GC files, along with a copy of GEOS-Chem in the chem/gc subdirectory, usually laid out like this:

chem/
  gc/
  config/
  chem_driver.F
  wrfgc_convert_state_mod.F
  chemics_init.F
  ...

Let’s get started.

  1. Downloading WRF: Obtain a copy of the compatible WRF model version from the WRF GitHub repository (currently 3.9.1.1 for WRF-GC 1.0) and extract it:
mkdir WRFV3
wget https://github.com/wrf-model/WRF/archive/V3.9.1.1.tar.gz
tar -xvzf V3.9.1.1.tar.gz --directory WRFV3 --strip-components=1

The --strip-components=1 flag places the WRF source directly inside WRFV3, a folder name that is easy to type; without it, the tarball extracts into a WRF-3.9.1.1 subdirectory.

  2. Removing the existing WRF-Chem code. WRF has recently begun shipping WRF-Chem code alongside its main source. We do not need that code for WRF-GC operation, so you should remove it. Go inside the WRF directory and remove the chem subdirectory:
cd WRFV3
rm -rf chem
  3. Downloading WRF-GC: Obtain a copy of WRF-GC from the WRF-GC Release GitHub repository by cloning the git repository into a chem folder (replacing the one you just deleted) inside the WRF directory:
git clone https://github.com/jimmielin/wrf-gc-release.git chem

Note: WRF-GC was in a private beta when this guide was first written. It is now publicly available on GitHub: jimmielin/wrf-gc-release. Please also visit the WRF-GC website for the latest updates.

The latest version of WRF-GC, v2.0.1, includes GEOS-Chem 12.8.3. Refer to the latest documentation PDF and the WRF-GC 2.0 paper by Feng et al., 2021 for more information!

Only if using older versions of WRF-GC:

Downloading GEOS-Chem: Obtain a copy of GEOS-Chem from the GEOS-Chem GitHub repository. WRF-GC 1.0 only supports GEOS-Chem version 12.2.1, so you will need to download that specific version.

cd chem
wget https://github.com/geoschem/geos-chem/archive/12.2.1.tar.gz
tar -xvzf 12.2.1.tar.gz --directory gc --strip-components=1

You should already have a chem/gc folder before downloading GEOS-Chem; that is fine. The chem/gc/GCHP folder already exists as part of WRF-GC and contains the code that interfaces with the GEOS-Chem modules, so you will need to keep it.

Downloading HEMCO emissions and GEOS-Chem input data

You will need to download HEMCO emissions and a set of basic GEOS-Chem input data for use with the WRF-GC model. In your /shared directory, create an ExtData folder to store all GEOS-Chem inputs, following GEOS-Chem conventions.

cd /shared
mkdir ExtData
mkdir ExtData/HEMCO

This section is under construction - I am coordinating with the GEOS-Chem developers to work on a download script from S3. Stay tuned!
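In the meantime, if you want to pull data manually with the AWS CLI, a minimal sketch would look like the following, assuming the public GEOS-Chem data bucket (s3://gcgrid) described in the GEOS-Chem on the Cloud guide; the inventory name is a placeholder for whatever your simulation needs:

aws s3 ls s3://gcgrid/HEMCO/                                                    # browse the available emission inventories
aws s3 sync s3://gcgrid/HEMCO/<INVENTORY>/ /shared/ExtData/HEMCO/<INVENTORY>/   # copy one inventory locally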

I would like to refer you to the excellent GEOS-Chem on the Cloud guide if you have any questions.

Downloading WRF input data

Many different sources of meteorological initial and boundary conditions can be used to drive the WRF model. Refer to this list of free data sets for driving WRF.

You can find more information on input data from the official WRF-GC guide.

Building WRF(-GC)

Configuring

Building WRF-GC is just like building WRF-Chem. If you have used the ~/.bashrc above you are ready to go: cd into your WRFV3 directory and configure WRF:

./configure -hyb

Note that the -hyb option must be enabled for WRF 3.9.1.1 to include the hybrid sigma-eta vertical grid, which is required by GEOS-Chem.

If you chose the Intel compiler set, pick the icc/ifort (dmpar) option; if you chose GNU, pick the gcc/gfortran (dmpar) option.

Once configured successfully, proceed to install the WRF-GC registry file:

cd chem
make install_registry
cd ..

This step is mandatory; otherwise you may get Registry errors during the WRF-GC compile complaining that species cannot be found.

Building WRF

Issue the compile command:

./compile em_real

You may want to run this inside a screen session, because the compile process takes quite a while (see the sketch below). Once finished, WRF should report "Executables successfully built". If you run into errors, you may want to look at my other "common compile problems" blog post 🙂
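For example, one way to keep the build running (and logged) even if your SSH session drops; screen is a standard tool, and the session and log names are arbitrary:

screen -S wrfgc-build                      # start a named screen session
./compile em_real 2>&1 | tee compile.log   # build and keep a log
# detach with Ctrl-A then d; reattach later with:
screen -r wrfgc-build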

Building WPS

WRF requires meteorological fields (boundary and initial conditions) and the configuration of the simulation domain before proceeding, so you will also need to compile WPS (the WRF Preprocessing System). The procedure is exactly the same as on a regular Linux cluster, with a few compile issues worth noting below.

Download WPS and extract it at the same level as your WRFV3 folder:

mkdir WPS
wget https://github.com/wrf-model/WPS/archive/v3.9.1.tar.gz
tar -xvzf v3.9.1.tar.gz --directory WPS --strip-components=1
rm v3.9.1.tar.gz

Configure WPS using the ./configure command, choosing (serial) and the appropriate compiler option. Compile using ./compile.

If you are experiencing issues compiling WPS and cannot find ungrib.exe: Edit WPS/configure.wps and look for COMPRESSION_LIBS, COMPRESSION_INCLUDE. Change these to the output of:

spack location -i [email protected]

Append lib/ and include/ to that path, respectively. The JasPer library is required to compile ungrib.exe, so if the build cannot find -ljasper this is the issue.
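For example, a sketch of how to generate the values to paste into WPS/configure.wps (the spack install path will differ on your system; keep any additional libraries, such as -lpng -lz, that were already listed):

JASPER_ROOT=$(spack location -i jasper@1.900.1)
echo "COMPRESSION_LIBS    = -L${JASPER_ROOT}/lib -ljasper -lpng -lz"
echo "COMPRESSION_INCLUDE = -I${JASPER_ROOT}/include"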

Preparing WRF input data and configuring WRF-GC (GEOS-Chem)

To prepare input data and more information on WPS, please refer to the WRF-GC user’s guide.

Up to now we have done everything on the login node. This is not good practice if you share the cluster or need a lot of resources. WPS tasks such as generating the geographical grid, ungribbing the met fields, and gridding them onto the simulation domain should be run on a compute node. So how do we run things on the compute nodes?

We will instruct compute nodes to run tasks using SLURM's srun command. SLURM is a scheduler which, in short, manages the resources on your cluster. It is very well integrated with AWS ParallelCluster and will automatically add compute nodes when necessary, so you essentially have a pay-per-use "private supercomputing cluster". Neat!
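A few standard SLURM commands are handy for keeping an eye on the cluster (plain SLURM, nothing WRF-GC specific):

sinfo              # list partitions and the state of the compute nodes
squeue             # show running and pending jobs
scancel <jobid>    # cancel a job by its ID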

You will likely only need one node to run the geogrid.exe, ungrib.exe and metgrid.exe executables. You can do this in two ways:

  1. Creating a one-node interactive shell session. If you have worked on a supercomputing cluster before, you may have heard of an "interactive" session. This launches a shell on a compute node so you can run compute-intensive tasks there. To do this, use:
srun -N 1 --pty /bin/bash

Then you can run WPS tasks simply using ./geogrid.exe, ./ungrib.exe and ./metgrid.exe. Remember to exit from the shell after you are finished, so your compute node can shut down.

  2. Running the task with srun. The following command runs the task on a designated number (-N 1, -N 2, etc.) of compute nodes:
srun -N 1 --ntasks-per-node 1 ./geogrid.exe

Note that since we compiled WPS as (serial), we force a single task here; more cores will not make WPS faster. If you are running a huge domain, you may want to compile WPS with (dmpar) for an MPI-parallel build, which can then use multiple nodes and cores.

Running WRF-GC

To run WRF-GC, you will first need to run the real.exe pre-processor in WRFV3/run to generate the WRF input files from the met fields produced by WPS. This is the first task you will likely run on multiple nodes. If I am using c5n.18xlarge nodes, each with 36 physical cores (72 "virtual" hyper-threaded cores), and want to use, say, 12 nodes, I can run real.exe like so:

srun -N 12 --ntasks-per-node 36 ./real.exe

Since WRF is a resource-intensive task, you should always launch one task per physical core; virtual (hyper-threaded) cores will not do you any good here. See the further information from AWS and the GEOS-Chem wiki about scalability with "hyperthreading" cores.

To run WRF:

srun -N 12 --ntasks-per-node 36 ./wrf.exe
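For long simulations you may prefer submitting a batch job rather than keeping srun attached to your terminal. A minimal SLURM batch script sketch (the job name, run directory and node counts below are illustrative) could look like:

#!/bin/bash
#SBATCH --job-name=wrfgc
#SBATCH --nodes=12
#SBATCH --ntasks-per-node=36
#SBATCH --exclusive
cd /shared/WRFV3/run    # adjust to wherever your run directory lives
srun ./wrf.exe

Submit it with sbatch and check on it with squeue.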

You can track the model's progress by looking at the output from each core. The master process usually logs the most detail, so you can tail -f rsl.out.0000 to follow it.

What if I have errors?

Please stay tuned: this guide is being updated, and a separate guide on "compile woes with WRF(-GC) and GCHP" will follow.

Happy modeling!

Appendix: My ~/.bashrc

# User specific aliases and functions
export PATH=/shared/spack/bin:$PATH

module load intelmpi
export I_MPI_CC=icc
export I_MPI_CXX=icpc
export I_MPI_FC=ifort
export I_MPI_F77=ifort
export I_MPI_F90=ifort

export CC=icc
export FC=ifort
export CXX=icpc


source $(spack location -i intel)/bin/compilervars.sh -arch intel64

export PATH=$(spack location -i netcdf-c)/bin:$PATH
export PATH=$(spack location -i netcdf-fortran)/bin:$PATH
# Environment variables required by WRF
export HDF5=$(spack location -i hdf5)
export NETCDF=$(spack location -i netcdf-fortran)

# run-time linking
export LD_LIBRARY_PATH=$HDF5/lib:$NETCDF/lib:$LD_LIBRARY_PATH

# this prevents segmentation fault when running the model
ulimit -s unlimited

# WRF-specific settings
export WRF_EM_CORE=1
export WRFIO_NCD_NO_LARGE_FILE_SUPPORT=0
export WRF_CHEM=1

export ESMF_COMM=intelmpi
export ESMF_COMPILER=intel

export I_MPI_PMI_LIBRARY=/opt/slurm/lib/libpmi.so  # enable slurm

export I_MPI_FABRICS=shm:ofi  # use libfabric (default)
export FI_PROVIDER=efa  #  enable EFA  (default)
source /opt/intel/compilers_and_libraries/linux/mpi/intel64/bin/mpivars.sh -ofi_internal=0  # do not use intel-provided libfabric
# Some quick aliases to work with
alias vn="vi namelist.input"
alias vrc="vi ~/.bashrc"

alias tt="tail -f rsl.out.0000"
alias te="tail -n 50 rsl.* | less"

alias mco="rm rsl.*; rm wrfout_*"

alias itr="srun -N 1 --ntasks-per-node 36 --pty /bin/bash"

# NCL
export NCARG_ROOT="/shared/ncl"
export PATH="$NCARG_ROOT/bin:$PATH"