Abstract
Computational scientists often face challenges when developing and optimizing code for high-performance computing (HPC), especially when trying to leverage GPUs. Given the heterogeneity of the nodes that comprise many modern HPC facilities, considerable demand exists for performance-portable solutions for the core computational kernels used in many scientific computing libraries. In this work, we demonstrate a fourth-order finite volume method–based implementation of the Euler equations, which are an integral part of computational fluid dynamics. Our performance-portable multiGPU implementation of the Euler equations uses ProtoX to generate kernels and IRIS for portability. ProtoX is a domain-specific language that uses a structured-grid partial differential equation library called Proto as its front end and the SPIRAL code generation system as its back end to generate optimized kernels for different architectures. Optimized kernels generated by ProtoX are orchestrated through the IRIS intelligent runtime system to provide portability. Two levels of optimization within the IRIS runtime, directed acyclic graph fusion and task fusion, are explored to use computing resources efficiently in a multiGPU environment. The performance improvement from these optimizations is shown relative to the base ProtoX-IRIS implementation on AMD GPUs (a Frontier node) and on NVIDIA GPUs (an NVIDIA DGX-1).