Publication

MatRIS: Addressing the Challenges for Portability and Heterogeneity Using Tasking for Matrix Decomposition (Cholesky)

by Mohammad Alaul Haque Monil, Narasinga Rao Miniskar, Pedro Valero Lara, Keita Teranishi, Jeffrey S Vetter
Publication Type
Conference Paper
Book Title
Asynchronous Many-Task Systems and Applications: Second International Workshop, WAMTA 2024, Knoxville, TN, USA, February 14–16, 2024
Publication Date
Page Numbers
59 to 70
Publisher Location
Cham, Switzerland
Conference Name
WAMTA 24: Workshop on Asynchronous Many-Task Systems and Applications 2024
Conference Location
Knoxville, Tennessee, United States of America
Conference Sponsor
TCL, LSU, ICL
Conference Date
February 14–16, 2024

The ubiquitous in-node heterogeneity of HPC and cloud computing platforms makes software portability and performance optimization extremely challenging. This paper describes MatRIS, a multilevel math library abstraction framework that employs tasking to alleviate these difficulties. MatRIS uses the IRIS task-based runtime at its bottom level and exposes several layers of abstraction that make algorithms architecture-agnostic. It decomposes each algorithm into tasks that encapsulate optimized kernels from both vendor and open-source math libraries. Once built, MatRIS can select different combinations of accelerators at runtime, which makes it portable across diverse heterogeneous architectures. By leveraging the IRIS runtime's features for managing heterogeneity, algorithms deployed with MatRIS do not need to specify orchestration or data transfer explicitly. This study describes how MatRIS makes the serial task abstraction of a tiled Cholesky factorization portable and scalable under multi-device and multi-vendor heterogeneity on a node with NVIDIA and AMD GPUs. First, we demonstrate that Cholesky in MatRIS scales across multiple GPUs and offers performance competitive with cuSolverMG. Then, we present the challenges and opportunities for heterogeneous execution.
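To make the phrase "serial task abstraction of a tiled Cholesky factorization" concrete, the following is a minimal sketch in Python with NumPy. It is not the MatRIS or IRIS API: the function names `tiled_cholesky_tasks` and `execute_tasks`, the tuple layout of a task, and the use of NumPy in place of vendor kernels are all assumptions made for illustration. The sketch first writes the standard right-looking tiled algorithm as a plain ordered list of POTRF, TRSM, SYRK, and GEMM tile tasks, then executes that list serially on the host.

```python
import numpy as np


def tiled_cholesky_tasks(num_tiles):
    """Return the tile tasks of a right-looking tiled Cholesky in serial order.

    Each task is a hypothetical (op, i, j, k) tuple: op acts on tile (i, j)
    during panel step k. This layout is illustrative, not the MatRIS format.
    """
    tasks = []
    for k in range(num_tiles):
        tasks.append(("POTRF", k, k, k))          # factor the diagonal tile
        for i in range(k + 1, num_tiles):
            tasks.append(("TRSM", i, k, k))       # solve for the panel tiles
        for i in range(k + 1, num_tiles):
            tasks.append(("SYRK", i, i, k))       # update a diagonal tile
            for j in range(k + 1, i):
                tasks.append(("GEMM", i, j, k))   # update an off-diagonal tile
    return tasks


def execute_tasks(matrix, tile, tasks):
    """Run the task list serially on a copy of `matrix`; return the lower factor."""
    a = np.array(matrix, dtype=float)

    def t(i, j):
        # View (not copy) of tile (i, j), so in-place updates modify `a`.
        return a[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]

    for op, i, j, k in tasks:
        if op == "POTRF":
            t(k, k)[...] = np.linalg.cholesky(t(k, k))
        elif op == "TRSM":
            # A_ik <- A_ik * L_kk^{-T}, computed as (L_kk^{-1} A_ik^T)^T
            t(i, k)[...] = np.linalg.solve(t(k, k), t(i, k).T).T
        elif op == "SYRK":
            diag = t(i, i)
            diag -= t(i, k) @ t(i, k).T
        elif op == "GEMM":
            off = t(i, j)
            off -= t(i, k) @ t(j, k).T
    return np.tril(a)


# Usage: factor a random symmetric positive definite matrix with 3 x 3 tiles.
rng = np.random.default_rng(0)
m = rng.standard_normal((6, 6))
spd = m @ m.T + 6.0 * np.eye(6)
tasks = tiled_cholesky_tasks(3)
lower = execute_tasks(spd, tile=2, tasks=tasks)
```

The task list is written in the order a serial program would issue it. Many of those tasks touch disjoint tiles (for example, the TRSM tasks within one panel step), so a task-based runtime that tracks the tiles each task reads and writes is free to run them concurrently and on different devices, while the algorithm description itself stays serial.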