autoGEMM: Pushing the Limits of Irregular Matrix Multiplication on Arm Architectures

Publication Type

Conference Paper

Book Title

2024 SC24: International Conference for High Performance Computing, Networking, Storage and Analysis SC

Publication Date

November, 2024

Page Numbers

292 to 306

Publisher Location

New Jersey, United States of America

Conference Name

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Conference Location

Atlanta, Georgia, United States of America

Conference Sponsor

ACM/IEEE

Conference Date

Nov 17, 2024 - Nov 22, 2024

Abstract

This paper presents an open-source library that pushes the limits of performance portability for irregular General Matrix Multiplication (GEMM) on widely used Arm architectures. Our library, autoGEMM, is designed to support a wide range of Arm processors, from edge devices to HPC-grade CPUs. autoGEMM generates optimized kernels for various hardware configurations by automatically combining fragments of autogenerated micro-kernels that employ hand-written optimizations to maximize computational efficiency. We optimize the kernel pipeline by tuning register reuse and the overlap of data loads and stores. In addition, we use a dynamic tiling scheme to generate balanced tile shapes. Finally, we position autoGEMM on top of the TVM framework, where our dynamic tiling scheme prunes the search space so that TVM can identify the optimal combination of code-optimization parameters. Evaluations on five different classes of Arm chips demonstrate the advantages of autoGEMM. For small matrices, autoGEMM achieves 98% of peak performance and up to 2.0x speedup over state-of-the-art libraries such as LIBXSMM and LibShalom. For irregular matrices (i.e., tall-and-skinny and long rectangular matrices), autoGEMM is 1.3-2.0x faster than widely used libraries such as OpenBLAS and Eigen. autoGEMM is publicly available at: .
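The abstract does not spell out how the dynamic tiling scheme balances tile shapes. The sketch below is a minimal illustration under one assumption: "balanced" is taken to mean splitting a loop extent into the fewest tiles that respect a maximum micro-kernel size, with tile sizes differing by at most one, instead of using full fixed-size tiles plus a small leftover tile. The helpers `balanced_tiles` and `fixed_tiles` are hypothetical and are not part of autoGEMM's API.

```python
import math


def balanced_tiles(n: int, max_tile: int) -> list[int]:
    """Hypothetical helper: split extent n into the fewest tiles of size
    <= max_tile, with tile sizes differing by at most 1."""
    if n <= 0 or max_tile <= 0:
        raise ValueError("n and max_tile must be positive")
    count = math.ceil(n / max_tile)  # fewest tiles that respect the size cap
    base, extra = divmod(n, count)   # spread the remainder over the first tiles
    return [base + 1] * extra + [base] * (count - extra)


def fixed_tiles(n: int, tile: int) -> list[int]:
    """Hypothetical baseline: full tiles plus one (possibly tiny) remainder tile."""
    full, rem = divmod(n, tile)
    return [tile] * full + ([rem] if rem else [])


print(fixed_tiles(17, 8))     # [8, 8, 1]
print(balanced_tiles(17, 8))  # [6, 6, 5]
```

With a fixed tile of 8, a 17-row dimension leaves a 1-row remainder tile that would underuse a register-blocked micro-kernel, while the balanced split keeps every tile close in size. A real scheme would likely also account for SIMD width and the number of available vector registers; that refinement is not modeled here.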