using ladjust_bury_coeff in MARBL requires specific properties from PE layout #74

Description

@mnlevy1981

Description of the issue:

A user was trying to run with ladjust_bury_coeff set in user_nl_marbl (which is not a very common configuration); he was also trying to get 100+ SYPD out of the gx3v7 grid (which is not a very common requirement), so he was running with 288 ocean tasks. gen_pop_decomp gave a layout that created 290 blocks, and he reported the model crashing in ecosys_driver.F90:513 at

    508     allocate(rmean_vals(size(marbl_instances(1)%glo_avg_rmean_interior_tendency)))
    509     lscalar = .false.
    510     call ecosys_running_mean_saved_state_get_var_vals('interior_tendency', lscalar, rmean_vals(:))
    511     do n = 1, size(rmean_vals)
    512        do iblock = 1, size(marbl_instances)
    513           marbl_instances(iblock)%glo_avg_rmean_interior_tendency(n)%rmean = rmean_vals(n)
    514        end do
    515     end do
    516     deallocate(rmean_vals)

It turns out the issue is that marbl_instances has size max_blocks_clinic (2, in his configuration) and we only want these loops running through nblocks_clinic (1 on most tasks), so ladjust_bury_coeff currently can't be true if any task has nblocks_clinic < max_blocks_clinic. Fixing that moved the error to ecosys_driver.F90:640:

    637     if ((size(glo_avg_fields_interior, dim=4) /= 0) .or. (size(glo_avg_fields_surface, dim=4) /= 0)) then
    638        allocate(glo_avg_area_masked(nx_block, ny_block, nblocks_clinic))
    639        where (land_mask(:,:,:))
    640           glo_avg_area_masked(:,:,:) = TAREA(:,:,:)
    641        else where
    642           glo_avg_area_masked(:,:,:) = c0
    643        end where

(I think the third dimension of both land_mask and TAREA is max_blocks_clinic, while the allocate() statement on line 638 shows that glo_avg_area_masked uses nblocks_clinic instead.)

As you can tell, I've started working on a fix for this... I think I changed the above block to explicitly use 1:nblocks_clinic for the third dimension of land_mask in 639 and TAREA in 640, but got yet another error elsewhere.
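For reference, here is a minimal standalone sketch of the two bounds changes described above. It is not the actual ecosys_driver.F90 patch, and it does not address the later error mentioned above. The array names mirror the snippets quoted in this issue, but the sizes, the fill values, and the nvisited counter are made up for illustration. It assumes, as guessed above, that land_mask and TAREA are dimensioned with max_blocks_clinic.

```fortran
program block_bounds_sketch
  implicit none

  ! Hypothetical sizes standing in for the POP values on a single task
  integer, parameter :: nx_block = 4, ny_block = 3
  integer, parameter :: max_blocks_clinic = 2   ! most blocks on any task
  integer, parameter :: nblocks_clinic = 1      ! blocks owned by this task
  double precision, parameter :: c0 = 0.0d0

  ! Assumption: these two are dimensioned by max_blocks_clinic
  logical :: land_mask(nx_block, ny_block, max_blocks_clinic)
  double precision :: TAREA(nx_block, ny_block, max_blocks_clinic)
  double precision, allocatable :: glo_avg_area_masked(:,:,:)
  integer :: iblock, nvisited

  ! Made-up data, only so the sketch has something to mask
  land_mask = .true.
  land_mask(1,:,:) = .false.
  TAREA = 2.0d0

  allocate(glo_avg_area_masked(nx_block, ny_block, nblocks_clinic))

  ! Slice the third dimension explicitly so every array in the
  ! where construct has shape (nx_block, ny_block, nblocks_clinic)
  where (land_mask(:,:,1:nblocks_clinic))
    glo_avg_area_masked(:,:,:) = TAREA(:,:,1:nblocks_clinic)
  elsewhere
    glo_avg_area_masked(:,:,:) = c0
  end where

  ! Loop over the blocks this task owns, not the size of an array
  ! that was allocated with max_blocks_clinic entries
  nvisited = 0
  do iblock = 1, nblocks_clinic
    nvisited = nvisited + 1
  end do

  print '(a,3i3)', 'shape of glo_avg_area_masked: ', shape(glo_avg_area_masked)
  print '(a,f6.1)', 'sum of masked area: ', sum(glo_avg_area_masked)
  print '(a,i3)', 'blocks visited: ', nvisited

  deallocate(glo_avg_area_masked)
end program block_bounds_sketch
```

Without the explicit 1:nblocks_clinic slices, the mask, the right-hand side, and the left-hand side of the where construct would have different extents in the third dimension whenever nblocks_clinic < max_blocks_clinic, which matches the mismatch suspected at line 640.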

The original user who reported the problem was happy to be given a 252 task layout that keeps max_blocks_clinic=1, so fixing this is not urgent. I'm putting all this detail in the issue ticket because I'm going to set it aside for a few weeks while I focus on more pressing issues, but it would probably be good to eventually come back and fix the bug.

I also think it would be useful to update the test suite to try to explicitly test cases where ladjust_bury_coeff = .true. and either some tasks have more blocks than others, or some tasks have no blocks. I expect both of those tests would fail currently.

Version:

  • CESM: 2_3_beta09; I believe the first user was running CESM 2.1.x
  • POP2: cesm_pop_2_1_20220322

Machine/Environment Description:

The error was reported on cheyenne, and that's also where I reproduced the issue in the latest codebase.

Any xml/namelist changes or SourceMods:
