Abstract
Designing and refactoring complex scientific code, such as the E3SM land model (ELM), for new computing architectures is challenging. This paper presents design strategies and technical approaches for developing a data-oriented, GPU-ready ELM using compiler directives (OpenACC/OpenMP). We first analyze the datatypes and processes in the original ELM code. We then present design considerations for developing an ultrahigh-resolution ELM (uELM) for massive GPU systems. The techniques include a global data-oriented simulation workflow, domain partitioning, code porting and data copy, memory reduction, parallel loop restructuring and flattening, and race condition detection. We implemented the first version of uELM using OpenACC, targeting the NVIDIA GPUs in the Summit supercomputer at Oak Ridge National Laboratory. During the implementation, we developed a software tool (named SPEL) to facilitate code generation, verification, and performance tuning using these techniques. The first uELM implementation for NVIDIA GPUs on Summit delivered promising results: (1) over 98% of the ELM code was automatically generated and tuned by scripts; (2) most ELM modules achieved better computational performance than the original ELM code for CPUs; and (3) the GPU-ready uELM is more scalable than the CPU code on fully loaded Summit nodes. Example profiling results from several modules are also presented to illustrate the performance improvements and race condition detection. The lessons learned and the toolkit developed in this study are also suitable for further uELM deployment using OpenMP on the first US exascale computer, Frontier, which is equipped with AMD CPUs and GPUs.