Abstract
In this work, we present JACC.shared, a new feature of Julia for ACCelerators (JACC), which is the performanceportable and metaprogramming model of the just-in-time and LLVM-based Julia language. This new feature allows JACC applications to leverage the high-performance computing (HPC) capabilities of high-bandwidth, on-chip GPU memory. Historically, exploiting high-bandwidth, shared-memory GPUs has not been a priority for high-level programming solutions. JACC.shared covers that gap for the first time, thereby providing a highlevel, portable, and easy-to-use solution for programmers to exploit this memory and supporting all current major accelerator architectures. Well-known HPC and AI workloads, such as multi/hyperspectral imaging and AI convolutions, have been used to evaluate JACC.shared on two exascale GPU architectures hosted by some of the most powerful US Department of Energy supercomputers: Perlmutter (NVIDIA A100) and Frontier (AMD MI250X). The performance evaluation reports speedup of up to 3.5× by adding only one line of code to the base codes, thus providing important accelerators in a simple, portable, and transparent way and elevating the programming productivity and performance-portability capabilities for Julia/JACC HPC, AI, and scientific applications.