Getting Started with FESOM2
This chapter describes several ways of getting started with FESOM2. First we show a minimal set of commands that will lead to a working setup on systems where FESOM2 is actively used. We also have instructions for Docker/Singularity and Ubuntu.
TL;DR version for supported HPC systems
Supported systems are: generic Ubuntu, albedo at AWI, levante at DKRZ, JURECA at JSC, HLRN, Hazel Hen, and MareNostrum 4 at BSC. During configuration the system will be recognised and the appropriate environment variables and compiler options will be used.
git clone https://github.com/FESOM/fesom2.git
cd fesom2
git checkout refactoring
bash -l ./configure.sh
Create a file fesom.clock in the output directory with the following content (if you plan to run with COREII forcing):
0 1 1958
0 1 1958
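The clock-file step can be scripted as follows; the results path is an assumption here — use whatever output directory your namelist.config points to:

```shell
# Create the output directory and a cold-start clock file for 1958.
mkdir -p results
printf '0 1 1958\n0 1 1958\n' > results/fesom.clock
cat results/fesom.clock
```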
After that, one has to adjust the run script for the target system and submit it:
cd work
sbatch job_albedo
Detailed steps of compiling and running the code
The following section assumes you are located on one of the supported HPC systems. To install FESOM2 on your local machine we recommend the Docker-based installation; read about the necessary Ubuntu packages if you decide not to use Docker.
The first step is to check out the FESOM2 code from the repository. The code is developed in an open repository on GitHub.
Build model executable with CMake
Clone the GitHub repository with a git command:
git clone https://github.com/FESOM/fesom2.git
The repository contains the model code and two additional libraries: Metis (domain partitioner) and Parms (solver), which are necessary to run FESOM2. To build the FESOM2 executable one has to compile the Parms library and the code of the model (src folder). In order to build the executable that is used for model domain partitioning (distribution of the model mesh between CPUs) one has to compile the Metis library and some code located in the src directory (see Build partitioner executable). Building the model executable and the partitioner is usually done automatically with CMake. If you are going to build the code on a machine other than one of the supported platforms (ollie, DKRZ, HLRN, HAZELHEN, BSC, and generic Ubuntu), you might need to make some (usually small) modifications described in the Adding new platform for compilation section.
Change to the fesom2 folder and execute:
cd fesom2
git checkout refactoring
As a good practice, if you want to make modifications to the source code or any of the files, it is advisable to create a branch from refactoring:
git checkout -b <my branch> refactoring
After confirming that the right FESOM2 branch is being used, compile the model with:
bash -l ./configure.sh
In the best-case scenario, your platform will be recognized and the Parms library and model executable will be built and copied to the bin directory. If something goes wrong, have a look at the Troubleshooting section.
If you would like to select the platform manually (which is necessary in the case of Ubuntu, for example), type:
bash -l ./configure.sh ubuntu
Data and mesh files
The FESOM2 repository contains only very small example meshes and data (in the test directory, see the note below). However, if you want to run realistic simulations, you either have to have them on your system already, or download an archive with sample data. There is a chance that your system already has some of the necessary files; you can check this in the setups/paths.yml file. If not, the easiest way to start is to download the example set from the DKRZ cloud (12 GB) by executing:
curl https://swift.dkrz.de/v1/dkrz_035d8f6ff058403bb42f8302e6badfbc/FESOM2.0_tutorial/FESOM2_one_year_input.tar > FESOM2_one_year_input.tar
and untar:
tar -xvf FESOM2_one_year_input.tar
You will have a folder named FESOM2_one_year_input that contains all the data you need for an initial run of the model. The mesh directory contains two meshes: pi and core2. The pi mesh is a very small global FESOM2 mesh that can run relatively fast even on a laptop. The core2 mesh is our 1-degree-equivalent mesh and is used in many tuning and testing studies. The mesh folders already include several prepared partitionings (dist_ folders), so you don’t have to worry about partitioning during your first steps with FESOM. The input folder contains files with initial conditions (phc3.0) and atmospheric forcing (JRA55) for one year (1958).
Note
You can find more standard FESOM2 meshes at https://gitlab.awi.de/fesom. Download instructions are available in each mesh repository.
Note
The FESOM2 distribution contains a minimal set of data to run the model in the test directory, namely the pi and soufflet (channel) meshes, WOA13 initial conditions, and CORE2 forcing data for one day. These are mainly used for testing and require a somewhat more involved modification of the namelists. For more details see the instructions on Docker-based installation.
Preparing the run
You have to do several basic things in order to prepare the run.
First, be aware of the files you need to modify according to your run configuration. Normally, these are:
- namelist.config: inside the config folder. In this file you can set several configurations, such as the paths to your mesh, climatology, and results, as well as the run length, its units, and the start year of your run.
- namelist.forcing: inside the config folder. In this file you can set the paths to your forcing files.
- job_<name-of-hpc>: inside the work folder. In this file you can set other important configurations, such as the wall time, the number of tasks, and the tasks per node allocated to your run.
The exact changes necessary to those files are indicated later in this documentation. Before making them, create a directory to store your output. Usually it is created in the model root directory:
mkdir results
You might instead make it a link to some other directory located on a part of the system where you have plenty of storage.
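Such a link can be set up as sketched below — /tmp/bigdisk is a stand-in for your site's scratch or work filesystem:

```shell
# Keep model output on a large filesystem and link it into the model root.
mkdir -p /tmp/bigdisk/fesom_results   # stand-in for a large scratch directory
rm -rf results                        # remove any placeholder 'results' directory
ln -s /tmp/bigdisk/fesom_results results
ls -ld results                        # 'results' now points at the scratch dir
```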
In your results directory, create a file named fesom.clock (NOTE: if you change runid in namelist.config to something like runid=mygreatrun, the file will be named mygreatrun.clock).
Inside the file you have to put two identical lines:
0 1 1958
0 1 1958
This is the initial date of the model run, i.e. the time of the cold start of your model. In case you want to start your run with a specific forcing from a specific year, substitute the desired year for 1958. A more detailed explanation of the clock file is given in the The clock file section.
The next step is to make some changes in the model configuration. All runtime options can be set in the namelists that are located in the config directory:
cd ../config/
As mentioned before, in this directory you will normally have to change two files: namelist.config and namelist.forcing. Both of these files ask for paths to input data. Normally, these paths can be found in ./setups/paths.yml.
Changing namelist.config
In namelist.config, the options that you might want to change for your first FESOM2 run are:
- run_length: length of the model run in run_length_unit (see below).
- run_length_unit: units of run_length. Can be y (years), m (months), d (days), or s (model steps).
Note
You might need to adjust the job wall time to the length of your run. In some setups and/or on some machines, if you set run_length to 10 and run_length_unit to y, for example, the requested wall time needs to be long enough for a 10-year run at once.
- yearnew: set to the same year as in your fesom.clock;
- MeshPath: path to the mesh you would like to use (e.g. /yourdir/FESOM2_one_year_input/mesh/pi/; the slash at the end is important!);
- ClimateDataPath: path to the folder containing the file with the model's temperature and salinity initial conditions (e.g. /yourdir/FESOM2_one_year_input/input/phc3.0/). The name of the file with the initial conditions is defined in namelist.oce, but during first runs you probably don't want to change it;
- ResultPath: path to your results folder. The output of the model will be stored there.
A more detailed explanation of the options in namelist.config is given in the section General configuration (namelist.config).
Changing namelist.forcing
In namelist.forcing, the options you need to change for your first FESOM2 run depend on the forcing you decide to use for your experiment. Please note that the year you initialize your experiment with needs to be included in the forcing data files.
In section &nam_sbc, change the paths of all the files to the path of the forcing you have chosen. For example, if you want to force your experiment with JRA55 on levante, the path to each file will be:
'/pool/data/AWICM/FESOM2/FORCING/JRA55-do-v1.4.0/<name_of_variable>'
A more detailed explanation of the options in namelist.forcing is given in the section Forcing configuration (namelist.forcing).
Running the model
Change to the work directory. You should find several batch scripts that are used to submit model jobs to different HPC machines. The scripts also link the fesom.x executable to the work directory and copy the namelists with configurations from the config folder.
Note
The model executable, namelists, and job script will be located in the same directory (usually work).
If you are working on AWI’s albedo supercomputer, use job_albedo; otherwise use the job script for your specific platform, or try to modify one of the existing ones.
In the job file, the changes depend on the HPC system you are using. For levante, you should adapt, for example:
- #SBATCH --job-name: name of your experiment, e.g. myexperiment_001;
- #SBATCH --ntasks-per-node: number of cores per node. The total number of tasks has to be divisible by this number; if you set it to ntasks/4, for example, your experiment will run on 4 nodes;
- #SBATCH --ntasks: total number of cores. This number has to match your desired mesh partitioning; it is the xx number in your dist_xx mesh folder;
- #SBATCH --time: be generous with your wall time, in case you are running a longer simulation and the job is not resubmitted after each time portion;
- #SBATCH -A <account>: your project account.
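Put together, a header adapted along these lines might look as follows — all values, including the account, are placeholders for illustration, not recommendations:

```shell
#!/bin/bash
#SBATCH --job-name=myexperiment_001   # name of the experiment
#SBATCH --ntasks=288                  # must match an existing dist_288 partitioning
#SBATCH --ntasks-per-node=72          # 288 / 72 = 4 nodes
#SBATCH --time=08:00:00               # generous wall time for the full run length
#SBATCH -A ab0000                     # placeholder project account
```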
On levante the submission of your job is done by executing the following command:
sbatch job_levante
The job is then submitted. In order to check the status of your job, you can execute:
squeue -u yourusername
The output of the model run should appear in the results directory that you specified in namelist.config. After the run is finished, the fesom.clock file (or, if you changed your runid, runid.clock) will be updated with information about the end time of your run, which allows running the next time portion of the model experiment by simply resubmitting the job (e.g. sbatch job_levante).
Some files will also be stored in the work folder. These are:
- a file with information about errors during job preparation and submission, usually containing err.out in its name;
- a file with information about the job itself, such as duration, folders, etc., usually containing out.out in its name;
- a file with information about the simulation, usually called fesom2-0.out;
- a binary file fesom.x specific to that simulation;
- a copy of the namelists used to define the configuration of your run.
In case your simulation crashes, the job error file or fesom2-0.out usually contains valuable information to either fix the issue causing the crash or give the developers an idea of what can be done to help you.
Other things you need to know earlier on
The clock file
The clock file is located in your output directory (specified in the ResultPath option of namelist.config) and controls the model time. At the start of a new experiment that we want to initialize from climatology (a so-called cold start), the fesom.clock file would usually look like this:
0 1 1958
0 1 1958
In this example, 1958 is the first available year of the atmospheric JRA55 forcing. The two identical lines tell the model that this is the start of the experiment and that there is no restart file to be read. Also make sure that the yearnew option of namelist.config is set to the year in which you would like the cold start to begin (1958 in this case).
Let’s assume that we run the model with a timestep of 30 minutes (= 1800 seconds) for a full year (1958). After the run is successfully finished, the clock file will automatically be updated and look like this:
84600.0 365 1958
0.0 1 1959
where the first row is the second of the day of the last time step of the model, and the second row gives the time when the simulation is to be continued. The first row indicates that the model ran for 365 days (in 1958) and 84600 seconds, which is 1 day minus 1 FESOM timestep in seconds. In the next run, FESOM2 will look for restart files for the year 1958 and continue the simulation on the 1st of January 1959.
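The 84600 in the first row can be sanity-checked with shell arithmetic; the 1800 s timestep is the example value from the text:

```shell
# The last timestep of the day starts one timestep before midnight.
SECONDS_PER_DAY=86400
TIMESTEP=1800                          # 30-minute timestep from the example
echo $((SECONDS_PER_DAY - TIMESTEP))   # prints 84600
```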
Tricking FESOM2 into accepting existing restart files
The simple time management of FESOM2 makes it easy to trick FESOM2 into accepting existing restart files. Let’s assume that you have performed a full JRA55 cycle until the year 2019 and you want to perform a second cycle, restarting from the last year of the first cycle. This can be done by (copying and) renaming the last year’s restart files:
mv fesom.2019.ice.nc fesom.1957.ice.nc
mv fesom.2019.oce.nc fesom.1957.oce.nc
and by changing the clock file into:
84600.0 365 1957
0.0 1 1958
In case the second cycle starts again at the very first year of the forcing (e.g. 1958 for JRA55), namelist.config needs to be modified as follows; otherwise the model will always perform a cold start in 1958 instead of restarting from the 1957 restart files:
&clockinit
timenew=0.0
daynew=1
yearnew=1957
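The whole trick can be sketched as a script; the restart files here are empty placeholders created only to make the example self-contained:

```shell
# Work in a throwaway directory with placeholder restart files.
mkdir -p /tmp/restart_demo && cd /tmp/restart_demo
touch fesom.2019.ice.nc fesom.2019.oce.nc    # stand-ins for real restart output

# Relabel the last year of the first cycle as the year before the forcing starts.
mv fesom.2019.ice.nc fesom.1957.ice.nc
mv fesom.2019.oce.nc fesom.1957.oce.nc

# Point the clock at the end of "1957" so the next run continues into 1958.
printf '84600.0 365 1957\n0.0 1 1958\n' > fesom.clock
ls
```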
Build partitioner executable
The first meshes you use will probably come with several predefined partitionings (dist_XXXX folders). However, at some point you might need to create a partitioning yourself. To do so, you first have to compile the partitioner. Change to the mesh_part directory:
cd mesh_part
If you work on one of the supported systems, you should be able to execute:
bash -l ./configure.sh
or, in the case of Ubuntu or another custom-defined system:
bash -l ./configure.sh ubuntu
CMake should build the partitioner for you. If your system is not supported yet, have a look at how to add a custom system in Adding new platform for compilation. The executable fesom_ini.x should now be available in the bin directory. Now you can proceed with Running mesh partitioner.
Running mesh partitioner
You only have to do this step if your mesh does not yet have a partitioning for the desired number of cores. You can tell whether a partitioning exists by the presence of dist_XXXX folder(s) in your mesh folder, where XXXX is the number of CPUs. If the folder contains files with the partitioning, you can skip this step.
Partitioning splits your mesh into pieces that correspond to the number of cores you are going to request. FESOM2 currently scales well down to about 300 vertices per core; increasing the number of cores beyond that will probably have relatively little effect.
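You can check quickly which core counts a mesh already supports; the mesh path below is a stand-in created just for the example:

```shell
# A stand-in mesh folder with two prepared partitionings (288 and 576 cores).
mkdir -p /tmp/demo_mesh/dist_288 /tmp/demo_mesh/dist_576

# Each dist_<N> folder corresponds to a ready-made partitioning for N cores.
for d in /tmp/demo_mesh/dist_*; do basename "$d"; done
```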
In order to tell the partitioner how many cores to partition for, edit the &machine section in the namelist.config file (see also General configuration (namelist.config)). There are two options: n_levels and n_part. A FESOM mesh can be partitioned using several hierarchy levels; n_levels defines the number of levels, while n_part defines the number of partitions on each hierarchy level. The simplest case is to use one level with n_part equal to the number of cores, and we recommend using it at the beginning:
n_levels=1
n_part= 288
This will prepare your mesh to run on 288 computational cores.
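The ~300 vertices-per-core figure gives a quick upper bound on useful core counts; the vertex count below is illustrative, not a real mesh's size:

```shell
# Estimate the largest useful core count for a mesh of a given size.
MESH_VERTICES=126000            # illustrative 2D vertex count, not a real mesh
MIN_VERTS_PER_CORE=300          # scaling limit mentioned above
echo $((MESH_VERTICES / MIN_VERTS_PER_CORE))   # prints 420
```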
In order to run the partitioner, change to the work directory. You should find several batch scripts that are used to submit partitioner jobs to HPC machines (they have _ini_ in their names). The scripts also link the fesom_ini.x executable to the work directory and copy the namelists with configurations from the config folder (the partitioner actually needs only namelist.config, but the scripts copy everything).
Note
For the partitioner to run, the fesom_ini.x executable, the configuration namelists (in particular namelist.config), and the job script have to be located in the same directory (usually work).
If you are working on AWI’s ollie supercomputer, use job_ini_ollie; otherwise use the job script for your specific HPC platform, or try to modify one of the existing ones. For relatively small meshes (up to 1M nodes) and small partitions it is usually fine to run the partitioner on a login node (it is serial anyway), like this:
./fesom_ini.x
Note
Make sure that you have the same environment that was used during the compilation of fesom_ini.x. Usually the easiest way to achieve this is to first run (example for the ollie platform):
source ../env/ollie/shell
This file (shell) is used to set up the environment during the compilation of both fesom_ini.x and fesom.x.
If you are trying to partition a large mesh, then on ollie, for example, the partitioning job is submitted by executing the following command:
sbatch job_ini_ollie
Model spinup / Cold start at higher resolutions
A cold start of the model at high mesh resolutions with standard values for the timestep and viscosity will lead to instabilities that cause the model to crash. If no restart files are available and a spinup has to be performed, the following changes should be made for the first month-long simulation and then adjusted gradually over the next 6-8 months:
- The first thing to try, which usually helps, is to set in namelist.oce: w_split=.true.
- Try to reduce the timestep in namelist.config, for example to step_per_day=720, or even lower (e.g. a value of 1440 will lead to a 1-minute timestep).
Note
Make sure that for high-resolution runs (with mesh resolution finer than 25-10 km over considerable portions of the domain) you don’t use the combination of the default “Easy Backscatter” viscosity (visc_option=5) and easy_bs_return= 1.5. This holds not only for the spinup, but for the whole duration of the run. The “Easy Backscatter” option works very well on low-resolution meshes, but on high-resolution (eddy-resolving) meshes it does more harm than good. If you would like to use visc_option=5 for high-resolution runs, set easy_bs_return= 1.0.
In namelist.oce make sure that visc_option is set to 7 or 5 (see also the note above about option 5) and increase gamma1 to something like gamma1=0.8 or even higher. After running for about a month, try to reduce it. If you change the values of the run length and restart output frequency (which you probably want to do during the spinup, to run for short periods), don’t forget to change them back in namelist.config:
run_length= 1
run_length_unit='m'
...
restart_length=1
restart_length_unit='m'
Increase the timestep gradually. Very highly resolved meshes may require an initial timestep of one to two minutes or even less.
Adding new platform for compilation
In order to add a new platform for compilation, you simply have to specify the computational environment. In the simplest case this requires:
- editing the env.sh file;
- adding a folder with the name of the platform to the env folder and putting a shell file with the environment setup in it.
In the env.sh file you have to add one more elif statement to the if control structure where the platform (let’s call it mynewhost) is selected:
elif [[ $LOGINHOST = mynewhost ]]; then
STRATEGY="mynewhost"
As you can see in the env.sh file, some host systems are automatically identified using regular expressions, but the simplest way is to explicitly provide the name of the host system.
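A runnable, simplified sketch of that selection logic — the regex branch is illustrative; the real env.sh has more branches:

```shell
#!/bin/bash
# Simplified host-selection logic in the style of env.sh (branches illustrative).
LOGINHOST=mynewhost

if [[ $LOGINHOST =~ ^levante ]]; then       # example of a regex-matched system
    STRATEGY="levante"
elif [[ $LOGINHOST = mynewhost ]]; then     # explicit match for the new host
    STRATEGY="mynewhost"
else
    echo "can not determine environment for host: $LOGINHOST" >&2
    exit 1
fi
echo "$STRATEGY"                            # prints mynewhost
```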
The next step is to create an additional folder in the env folder:
mkdir ./env/mynewhost
and add a file named shell to it. This file will be sourced before the compilation, so you can set up the environment (bash syntax) in it. Please have a look at the shell file in other folders for examples. Now you should be able to run:
bash -l ./configure.sh mynewhost
to do the compilation.
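The two steps — creating the folder and the environment file — can be sketched together; the exported compiler wrappers are illustrative assumptions, not required values:

```shell
# Create the platform folder and a minimal environment file for 'mynewhost'.
mkdir -p ./env/mynewhost
cat > ./env/mynewhost/shell <<'EOF'
# Sourced before compilation; set up modules and compilers here (bash syntax).
export FC=mpifort CC=mpicc CXX=mpicxx   # illustrative compiler wrappers
EOF
ls ./env/mynewhost                       # prints shell
```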
If you are lucky, this will be everything you need. However, in more complicated cases one has to adjust the CMake files (the CMakeLists.txt located in the folders), so some knowledge of CMake is required.
Change compiler options
Compiler options for the FESOM2 code can be changed in the ./src/CMakeLists.txt file. Currently the definition of the compiler options for the Intel compiler looks like this:
if(${CMAKE_Fortran_COMPILER_ID} STREQUAL Intel )
target_compile_options(${PROJECT_NAME} PRIVATE -r8 -i4 -fp-model precise -no-prec-div -no-prec-sqrt -fast-transcendentals -xHost -ip -init=zero)
At present only the Intel and GNU compilers are supported, but the user can relatively easily add options by following the same pattern.
Troubleshooting
Error “can not determine environment for host”:
If you are on an Ubuntu system, add ubuntu as an input parameter for configure.sh:
./configure.sh ubuntu
Otherwise you have to add another system; have a look at the Adding new platform for compilation section.
Model blows up
There can be many reasons for this, but the first thing to try is to reduce the time step and/or increase the model viscosity for a short period of time. Have a look at Model spinup / Cold start at higher resolutions for instructions.
Docker based installation
The best way to run the model locally is to use a Docker container. You obviously have to have Docker installed on your system. The Docker image we are going to use has all necessary libraries installed, plus the mkrun python script (Docker file) that helps to create FESOM2 configurations. As a result of the steps below, you will run the pi mesh for one day using the data files that come with the model.
Get the image:
docker pull koldunovn/fesom2_test:refactoring2
Go to the folder that contains your fesom2 folder (NOT inside the fesom2 folder; one level up, the one where you ran git clone https://github.com/FESOM/fesom2.git). Run:
docker run -it -v "$(pwd)"/fesom2:/fesom/fesom2 koldunovn/fesom2_test:refactoring2 /bin/bash
This should get you inside the container. You can now edit the files in your fesom2 folder (on the host system), but compile and run the model inside the container.
When inside the container, to compile do:
cd fesom2
git checkout refactoring
bash -l configure.sh ubuntu
To prepare the run (this will do the test with pi mesh):
mkrun pi test_pi -m docker
To run the model:
cd work_pi/
./job_docker_new
As a next step you can modify the setup in work_pi to try different parameters. You can also follow the steps described in Detailed steps of compiling and running the code. To make your life a bit easier, place FESOM2_one_year_input in the fesom2 folder, so that the data are available inside the container. You can also generate a setup that uses JRA55 forcing and adjust it; this will save you some time on editing namelist.forcing, since the original setup in the work_pi folder uses the old CORE2 forcing:
mkrun pi_jra55 test_pi -m docker -f JRA55
Necessary Ubuntu packages
Here is the list of packages you need to install on Ubuntu to compile and run FESOM2. It should work (with adjustments for package managers and package names) for other Linux distributions.
apt-get -y install make gfortran gcc g++ libblas-dev libopenmpi-dev
apt-get -y install cmake vim git libnetcdf-dev libnetcdff-dev libpmi2-pmix