Wed Dec 15 13:44:28 CET 2021

For the coupling between FMS (Fortran90) and Aeolus2 (python) we use forpy.
-----
https://github.com/ylikx/forpy
-> SP: looks quite easy.

When creating a numpy array with ndarray_create_nocopy, no copy of the
Fortran array is made. This is more efficient than ndarray_create, but
there are some things to consider:
Changes to the Fortran array affect the numpy array and vice versa.
You have to make sure that the Fortran array is valid as long as the
numpy array is in use.
Since the Fortran array can now be modified not only directly but also
indirectly by the ndarray, it is necessary to add the asynchronous
attribute to the Fortran array declaration, since without it compiler
optimization related bugs can occur (depending on code, compiler and
compiler options). Alternatively you could also use the volatile attribute.
----
The code is expected in .../src/atmos_aeolus2/forpy/

  cd .../src/atmos_aeolus2
  git clone https://github.com/ylikx/forpy.git
-----------
Sigh. In tests/test_ndarray, initialisation of forpy with numpy fails.
The anaconda numpy loads mkl, which encounters some problem. But the
generated error message is very generic and not helpful; it conceals the
real problem :-(
It recommends installing some mkl helper package, but that is already
installed.

The test_ndarray program works with gfortran from compiler/gnu/7.3.0,
after changing the LDFLAGS so that libpython3.8 is linked dynamically(!).
The default Makefile configuration uses

  python3.8-config --ldflags --embed

which points to a static libpython3.8.a. In general, I would prefer the
static variant, but for some reason it does not work with numpy and mkl.

Now, that also works with intel/2020.4 :-)

=======================================
Date: Mon Mar 13 13:02:45 2023 +0100

First steps to use MPI_Comm_spawn for coupling FMS and python.

After many attempts I conclude that linking the python interpreter into
the Fortran-implemented FMS is too fragile.
It depends on the compiler version, compiler flags, and strange settings
required by whatever python / conda installation is used. Especially
conda's compilation of numpy/MKL/fftw3 requires some strange massaging of
compiler and linker settings. And the result still remains fragile at
runtime.

Workaround: keep FMS in Fortran, Aeolus2 and Dedalus in python, and each
in its own separate executable. The FMS-to-Aeolus2 interface uses
MPI_Comm_spawn() to create python subprocesses, which run MPI-parallel.
FMS as well as the python processes each have their private version of
MPI_COMM_WORLD, which comprises only their own siblings. Communication
between the two worlds is done via an intercommunicator, which is created
by the MPI_Comm_spawn operation.

Some things to consider:
- From within slurm jobs, I got it to work with intel/2020.4;
  version 2018.x did not work for me.
- Some environment variables need to be set with care, see scriptlet below.
- Process start must be done with mpiexec, not with srun.
- Arrays are passed from FMS to python in Fortran memory order, and need
  to be reshaped on the python side. And vice versa.
- TODO: control task placement. Ideally, we want the python tasks to be
  located on the same CPUs as the FMS atmos tasks, to minimize
  communication distances. It is not yet clear how to achieve that.

># Intel MPI version 19 ff need some more environment massage
>mpiexecver=`mpiexec --version | awk '/Intel.R. MPI Library for Linux/ {print $8;}'`
>mpiexecupd=`mpiexec --version | awk '/Intel.R. MPI Library for Linux/ {print $10;}'`
>#mpiexecbld=`mpiexec --version | awk '/Intel.R. MPI Library for Linux/ {print $12;}'`
>
>case "${mpiexecver}-${mpiexecupd}" in
>  2015-*|2016-*|2017-*|2018-*)
>    echo 'This MPI version does not support MPI_Comm_Spawn within slurm ??'
>    ;;
>  2019-*|2021-*|2022-*)
>    unset I_MPI_DAPL_UD
>    unset I_MPI_DAPL_UD_PROVIDER
>    export I_MPI_FABRICS=shm:ofi
>    export FI_MLX_ENABLE_SPAWN=yes
>    #export I_MPI_HYDRA_DEBUG=yes
>    ;;
>  *)
>    echo dont know how to handle this version of mpiexec
>    mpiexec --version
>    # do nothing special
>    ;;
>esac
>
>  time mpiexec.hydra -genvall -n 28 \
>    -outfile-pattern fms.out-$SLURM_JOBID-$SLURM_NNODES-64-'%r' \
>    -errfile-pattern fms.out-$SLURM_JOBID-$SLURM_NNODES-64-'%r' \
>    ./fms_MOM_LAD_AEOLUS2.x

---------
Mon Mar 20 12:11:12 CET 2023

Created a stripped-down python program, aeolus2printgrid.py, which sets up
the Aeolus2 grid and then prints out a grid description that can be used
with the FMS mosaic tools to generate exchange grids.

aeolus2printgrid.py without args generates aeolus2grid-smooth.txt

ln -s ../../../exp/CM2M_coarse_BLING/INPUT/land_hgrid.nc \
      ../../../exp/CM2M_coarse_BLING/INPUT/land_mosaic.nc \
      ../../../exp/CM2M_coarse_BLING/INPUT/ocean_hgrid.nc \
      ../../../exp/CM2M_coarse_BLING/INPUT/ocean_mosaic.nc \
      ../../../exp/CM2M_coarse_BLING/INPUT/ocean_vgrid.nc \
      ../../../exp/CM2M_coarse_BLING/INPUT/navy_topography.data.nc \
      ../../../exp/CM2M_coarse_BLING/INPUT/topog.nc \
      .

rm -f atmos_hgrid.nc \
      atmos_mosaic.nc \
      *mosaic*X* \
      grid_spec.nc \
      ocean_mask.nc \
      land_mask.nc

# the make_coupler_mosaic tool requires a dimension named ntiles
# but this is not contained in the GFDL-provided topog.nc files
# sigh. the name topog.nc is hardcoded in the ocean topography module
rm topog.nc
ncap2 -o topog.nc -s 'defdim("ntiles",1)' ../../../exp/CM2M_coarse_BLING/INPUT/topog.nc

../../../src/tools/make_hgrid/make_hgrid \
    --grid_type from_file --my_grid_file aeolus2grid-smooth.txt \
    --nlon 1536 --nlat 768 \
    --verbose \
    --grid_name aeolus2grid-smooth

../../../src/tools/make_solo_mosaic/make_solo_mosaic --dir . \
    --num_tiles 1 \
    --tile_file aeolus2grid-smooth.nc --mosaic_name aeolus2mosaic-smooth

../../../src/tools/make_coupler_mosaic/make_coupler_mosaic \
    --atmos_mosaic aeolus2mosaic-smooth.nc \
    --ocean_mosaic ocean_mosaic.nc \
    --ocean_topog topog.nc \
    --land_mosaic land_mosaic.nc \
    --mosaic_name grid_spec \
    --check

make_coupler_mosaic: one row is add to the south end to cover the globe
The maximum area is at tile=0, i=177, j=35, ratio=2.4896e-07, area1=7.81307e+08, area2=7.81307e+08
NOTE: axo_area_sum is 359665076910973.875000 and ocean fraction is 70.513650%
NOTE: axl_area_sum is 150399394996805.562500 and land fraction is 29.486350%
NOTE: tiling error is 0.000000%

---------------
Mon Jul 24 16:16:58 CEST 2023

Experiment with the MPMD facility of mpiexec resp. srun, i.e. the
possibility to start more than one executable in one MPI application.

mpiicc -I/home/petri/netcdf-4.7.4-intel15/include -sox -g -qno-opt-dynamic-align \
    -L/home/petri/netcdf-4.7.4-intel15/lib -lnetcdf_c++ -lnetcdff -lnetcdf \
    -L/home/petri/hdf5-1.10.6-intel15/lib -lhdf5_hl -lhdf5 -lz \
    -L/p/system/packages/udunits/2.2.19/lib -ludunits2 -lstdc++ \
    -traceback -no-ipo -lchkp -lchkpwrap -lmcheck \
    test_mpi_mpmd.c -o test_mpi_mpmd
cp -p test_mpi_mpmd test_mpi_mpmd-a
cp -p test_mpi_mpmd test_mpi_mpmd-b

mpiexec -usize INFINITE -np 24 ./test_mpi_mpmd-a : -np 32 ./test_mpi_mpmd-b

./test_mpi_mpmd-b running on library version Intel(R) MPI Library 2021.6 for Linux* OS
./test_mpi_mpmd-a: This MPI does not support UNIVERSE_SIZE.
./test_mpi_mpmd-a 11869: task 11 of 56 in a universe of size -1

Apparently both executables are started in the same MPI_COMM_WORLD, whose
size is the sum of both application parts, each part on a disjoint subset
of tasks.

./test_mpi_mpmd-a 12510: task 0 of 56 in a universe of size -1
./test_mpi_mpmd-a 12511: task 1 of 56 in a universe of size -1
./test_mpi_mpmd-a 12512: task 2 of 56 in a universe of size -1
[..]
./test_mpi_mpmd-a 12533: task 23 of 56 in a universe of size -1
./test_mpi_mpmd-b 12534: task 24 of 56 in a universe of size -1
[..]
./test_mpi_mpmd-b 12564: task 54 of 56 in a universe of size -1
./test_mpi_mpmd-b 12565: task 55 of 56 in a universe of size -1

Note also that this works without crash with Intel compiler
2021.6.0 20220226 and Intel MPI 2021.6 Build 2022022.
With Version 2019.4 I get malloc corruption errors soon after MPI_Init().

I have not yet tested this with slurm.

Open question: This would most probably require a wrapper around FMS which
splits MPI_COMM_WORLD into an FMS communicator and a python communicator,
and passes the FMS communicator on to the FMS / MPP library.

===================
USE IFPORT
CALL GETENV (ename,evalue)

ename
    (Input) Character*(*). Environment variable to search for.
evalue
    (Output) Character*(*). Value found for ename. Blank if ename is not
    found.

  use IFPORT
  character*40 libname
  CALL GETENV ("LIB",libname)
  TYPE *, "The LIB variable points to ",libname

------------
Intrinsic Subroutine: Gets the value of an environment variable.

CALL GET_ENVIRONMENT_VARIABLE (name [, value, length, status, trim_name, errmsg])

name
    (Input) Must be a scalar of type default character. It is the name of
    the environment variable.
value
    (Output; optional) Must be a scalar of type default character. If
    specified, it is assigned the value of the environment variable
    specified by name. If the environment variable does not exist, value
    is assigned all blanks.
length
    (Output; optional) Must be a scalar of type integer. If specified, its
    value is the length of the environment variable, if it exists;
    otherwise, length is set to 0.
status
    (Output; optional) Must be a scalar of type integer. If specified, it
    is assigned a value of 0 if the environment variable exists and either
    has no value or its value is successfully assigned to value.
    It is assigned a value of -1 if the value argument is present and has
    a length less than the significant length of the environment variable
    value. It is assigned a value of 1 if the environment variable does
    not exist. For other error conditions, it is assigned a
    processor-dependent value greater than 2.
trim_name
    (Input; optional) Must be a scalar of type logical. If the value is
    FALSE, then trailing blanks in name are considered significant.
    Otherwise, they are not considered part of the environment variable's
    name.
errmsg
    (Input; output; optional) Must be a scalar of type default character.
    If an error occurs, it is assigned a processor-dependent explanatory
    message; otherwise, it is unchanged.

  program print_env_var
    character name*20, val*40
    integer len, status
    write (*,*) 'enter the name of the environment variable'
    read (*,*) name
    call get_environment_variable (name, val, len, status, .true.)
    if (status .ge. 2) then
      write (*,*) 'get_environment_variable failed: status = ', status
      stop
    end if
    if (status .eq. 1) then
      write (*,*) 'env var does not exist'
      stop
    end if
    if (status .eq. -1) then
      write (*,*) 'env var length = ', len, ' truncated to 40'
      len = 40
    end if
    if (len .eq. 0) then
      write (*,*) 'env var exists but has no value'
      stop
    end if
    write (*,*) 'env var value = ', val (1:len)
  end
---------------------------------------
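Sketch for the memory-order item in the MPI_Comm_spawn notes (arrays arrive
on the python side in Fortran memory order and have to be reshaped, and vice
versa). This is only an illustration of the index mapping, not the actual
Aeolus2 code: the function names and the small 3x2 example are made up, and
it deliberately uses plain python lists so that it runs without numpy.

```python
# Sketch only: the names below are hypothetical, not taken from Aeolus2 or FMS.

def fortran_flat_to_rows(flat, ni, nj):
    """Rebuild a 2-D field a(ni, nj) from a flat buffer in Fortran order.

    In Fortran (column-major) order the first index varies fastest, so
    element a(i, j) (0-based here) sits at flat[i + j * ni].
    Returns a list of ni rows, so the result is indexed as rows[i][j].
    """
    if len(flat) != ni * nj:
        raise ValueError("buffer length does not match ni * nj")
    return [[flat[i + j * ni] for j in range(nj)] for i in range(ni)]


def rows_to_fortran_flat(rows):
    """Inverse mapping: flatten rows[i][j] into Fortran order for the way back."""
    ni, nj = len(rows), len(rows[0])
    return [rows[i][j] for j in range(nj) for i in range(ni)]


# A Fortran array a(3, 2) with a(i, j) = 10*i + j (1-based indices)
# is laid out in memory, and therefore received, as:
flat = [11, 21, 31, 12, 22, 32]

field = fortran_flat_to_rows(flat, ni=3, nj=2)
print(field)

# Sending the field back to the Fortran side needs the inverse mapping.
roundtrip = rows_to_fortran_flat(field)
```

With numpy the same mapping should be available as
numpy.reshape(buf, (ni, nj), order='F') on the receiving side and
a.ravel(order='F') before sending back, which avoids the explicit loops.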