
WPS/WRF/GSI ERROR

1. v_cfl integration error

Problem description

The following messages appear in rsl.out.0000:

d03 2022-11-06_06:04:40         1444  points exceeded v_cfl = 2 in domain d03 at time 2022-11-06_06:04:40 hours
d03 2022-11-06_06:04:40 Max   W:      6      1     33 W:   -3.07  w-cfl:    3.89  dETA:    0.01

Solution

Adjust the nested domains so that d02 and d03 do not coincide.
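
Before and after changing the domains, it can help to see which domain produces the CFL warnings and how often. The sketch below counts the "points exceeded v_cfl" lines per domain. The function name count_cfl_by_domain is made up for illustration, and the code assumes the log lines have the same layout as the excerpt above (domain tag in the first column).

```shell
# Count "points exceeded v_cfl" messages per domain in the given log files.
count_cfl_by_domain() {
  # The first field of each matching line is the domain tag (d01, d02, ...).
  grep -h "points exceeded v_cfl" "$@" \
    | awk '{ n[$1]++ } END { for (d in n) print d, n[d] }' \
    | sort
}
```

Typical use in the WRF run directory would be: count_cfl_by_domain rsl.out.* rsl.error.*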


2. GSI assimilation fails

Problem description

When built with ifort, gsi.x may print the following stack trace:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
gsi.x              0000000001961669  Unknown               Unknown  Unknown
libpthread-2.28.s  0000151EFCD88C20  Unknown               Unknown  Unknown
gsi.x              000000000131381F  frfhvo_                    63  smoothzrf.f90
gsi.x              0000000001086A52  bkgcov_                    67  bkgcov.f90
libiomp5.so        0000151EF5934A43  __kmp_invoke_micr     Unknown  Unknown
libiomp5.so        0000151EF58F7CDA  Unknown               Unknown  Unknown
libiomp5.so        0000151EF58F723B  Unknown               Unknown  Unknown
libiomp5.so        0000151EF5934EB1  Unknown               Unknown  Unknown
libpthread-2.28.s  0000151EFCD7E17A  Unknown               Unknown  Unknown
libc-2.28.so       0000151EF462CDC3  clone                 Unknown  Unknown

Solution

Increase the OpenMP stack size (the ifort default is 4M).

# Choose a value of the form (2^n)M that suits the size of the simulation domain
export OMP_STACKSIZE=16M
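
OMP_STACKSIZE only sizes the stacks of threads created by the OpenMP runtime, which fits the trace above (the failing frames sit under libiomp5.so, i.e. in an OpenMP worker thread). The initial thread's stack is governed by the shell limit instead, so a run script may want to set both. A minimal sketch, assuming a bash-like shell; whether the limit can be raised depends on the hard limit configured on the machine.

```shell
# Stack for the initial (master) thread; this may be capped by the hard limit.
ulimit -s unlimited 2>/dev/null || echo "could not raise the stack limit" >&2

# Stack for each OpenMP worker thread; 16M is the value used above.
export OMP_STACKSIZE=16M
```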

3. MPI problem

Problem description

The following error occurs when an executable initializes Intel MPI:

Abort(1094031) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(607)......:
MPID_Init(793).............:
MPIDI_NM_mpi_init_hook(667): OFI addrinfo() failed (ofi_init.h:667:MPIDI_NM_mpi_init_hook:No data available)

Solution

Set the OFI environment variable FI_PROVIDER to choose the libfabric provider that Intel MPI uses.

export FI_PROVIDER=sockets
# export FI_PROVIDER=tcp  # should also work
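
To see which providers libfabric can offer on a node, and which one Intel MPI ends up using, something like the following may help. It assumes the libfabric fi_info utility is on PATH (the listing is skipped otherwise) and uses Intel MPI's I_MPI_DEBUG variable to make the startup output more verbose; the value 5 is simply a commonly used level, not a requirement.

```shell
# List the providers libfabric can offer on this node, if fi_info is installed.
if command -v fi_info >/dev/null 2>&1; then
  fi_info -l || true
fi

# Select the provider and ask Intel MPI to print more detail at startup.
export FI_PROVIDER=sockets
export I_MPI_DEBUG=5
```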

4. real.exe ERROR

Problem description

With the Chinese national dataset CMA-GFS, real.exe writes the following error to rsl.error.0000:

Not enough soil temperature data for Noah LSM scheme

Solution

Change sf_surface_physics from 2 (Noah LSM, the value used by the default CONUS suite) to 1; real.exe then runs normally.
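
If namelist.input already contains an explicit sf_surface_physics line, the change can be scripted. A minimal sketch: set_sf_surface_physics is a hypothetical helper, and it assumes the option sits on a single line of the form "sf_surface_physics = 2, 2, 2,". When the namelist only sets physics_suite and has no such line, the line has to be added to &physics by hand instead.

```shell
# Hypothetical helper: set every per-domain value on the sf_surface_physics line.
# Usage: set_sf_surface_physics namelist.input 1
set_sf_surface_physics() {
  nl_file=$1
  value=$2
  tmp=$(mktemp)
  # Only the line that assigns sf_surface_physics is touched; on that line,
  # each integer (one per domain) is replaced by the new value.
  sed -E "/^[[:space:]]*sf_surface_physics[[:space:]]*=/s/[0-9]+/${value}/g" \
    "$nl_file" > "$tmp" && cat "$tmp" > "$nl_file"
  rm -f "$tmp"
}
```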


5. metgrid.exe ERROR: Error in ext_pkg_write_field

Problem description

metgrid.exe fails on the 2024 US GFS dataset, while 2023 data runs normally. metgrid.log contains the following error:

ERROR: Error in ext_pkg_write_field

Solution

The GFS files for that date are faulty; use data from a different day.

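6. ungrib.exe runs slowly on the CMA-GFS dataset

Problem description

For a 24-hour run:
The Chinese national dataset (CMA-GFS) has a resolution of 0.125° and takes 20 to 30 minutes.
The US dataset (GFS) has a resolution of 0.25° and takes 2 to 3 minutes.
Halving the grid spacing should make ungrib.exe about 4 times slower, but it is actually about 10 times slower.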
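
The factor of 4 comes from the grid-point count: halving the spacing in both latitude and longitude quadruples the number of points per field. A rough check of the numbers above, assuming both datasets cover the same area with the same fields and levels (which these notes do not state); ungrib_slowdown is just an illustrative name.

```shell
# Compare the expected slowdown (ratio of grid points) with the observed timings.
ungrib_slowdown() {
  awk 'BEGIN {
    r = 0.25 / 0.125          # ratio of grid spacings
    expected = r * r          # ratio of grid points per field
    lo = 20 / 3               # fastest CMA-GFS run vs slowest GFS run
    hi = 30 / 2               # slowest CMA-GFS run vs fastest GFS run
    printf "expected %.1fx, observed %.1fx to %.1fx\n", expected, lo, hi
  }'
}

ungrib_slowdown
```

Even the most favourable pairing of the timings is above the expected factor of 4.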

Solution

None so far; as a workaround, split the work across multiple machines and run them in parallel.

7. real.exe the domain size is too small for this many processors, or the decomposition aspect ratio is poor

  Domain # 1: dx = 10000.000 m
   For domain            1 , the domain size is too small for this many processors, or the decomposition aspect ratio is poor.
   Minimum decomposed computational patch size, either x-dir or y-dir, is 10 grid cells.
  e_we =    43, nproc_x =    4, with cell width in x-direction =   10
  e_sn =    47, nproc_y =    8, with cell width in y-direction =    5
  --- ERROR: Reduce the MPI rank count, or redistribute the tasks.
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE:  <stdin>  LINE:    2797
NOTE:       1 namelist settings are wrong. Please check and reset these options
-------------------------------------------
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source

Stack trace terminated abnormally.

Solution

Change the number of CPU cores (MPI ranks) passed to mpirun so that both e_we/nproc_x and e_sn/nproc_y are at least 10.
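
A candidate decomposition can be checked before launching real.exe. The sketch below reproduces the arithmetic visible in the log above (43/4 gives 10, 47/8 gives 5, using integer division); check_patch is a hypothetical helper, and WRF's own check may differ in detail. The minimum of 10 is taken from the log message.

```shell
# Hypothetical helper: succeed only if both patch widths are at least 10 cells.
# Usage: check_patch e_we e_sn nproc_x nproc_y
check_patch() {
  e_we=$1; e_sn=$2; nproc_x=$3; nproc_y=$4
  min=10
  px=$(( e_we / nproc_x ))   # cells per patch in the x direction
  py=$(( e_sn / nproc_y ))   # cells per patch in the y direction
  echo "x-dir patch = $px, y-dir patch = $py"
  [ "$px" -ge "$min" ] && [ "$py" -ge "$min" ]
}
```

For the failing case, check_patch 43 47 4 8 reports a y-direction patch of 5 and returns non-zero, while check_patch 43 47 4 4 (16 ranks) passes. The layout can also be pinned explicitly with nproc_x and nproc_y in the &domains section of namelist.input.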

References: