Abstract
The growing demand for more powerful high-performance computing (HPC) systems has led to a steady rise in energy consumption by supercomputers worldwide. This study compares our Application-Topology Mapper (ATMapper) to the popular Simple Linux Utility for Resource Management (SLURM) in order to explore methods that can further optimize job scheduling within HPC systems. ATMapper is an artificial-intelligence-based approach to job scheduling that is currently being enhanced with quantum annealing (QA) to generate optimal schedules faster. We are applying QA to speed up the ATMapper process and achieve higher computing efficiency, thereby reducing HPC energy consumption. Here, we examine how four job-scheduling approaches perform processor node assignment on an example network architecture of 4 interconnected nodes. Using a specialized script, we assessed the schedule of a computation flow with 11 interdependent tasks. Data movements among nodes were tracked to count the number of interactions (network hops) between nodes needed to complete the tasks. The total number of hops and the job completion time were then used to quantify the efficiency of the different mapping approaches. In addition to SLURM, we also compare ATMapper to the QA-enabled LBNL TIGER and D-Wave Distributed Computing processor assignment approaches. Preliminary results show that our topology-aware, latency-adaptive ATMapper is significantly more efficient than the other scheduling approaches due to its load-imbalancing network allocation. The scheduler achieved a computing efficiency of 53% by performing significantly fewer network hops than its alternatives. By reducing the number of hops, ATMapper completed all 11 tasks using only 3 of the 4 given nodes. This research indicates the potential of QA/AI for HPC job scheduling.
In future work, we will test a SLURM simulator program to draw further comparisons of the effectiveness of ATMapper's scheduling approach. The results of this comparison will serve as a baseline for later improving SLURM's performance with a QA-enhanced ATMapper approach.
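The hop-counting metric used above to compare the mapping approaches can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's actual evaluation script: the 4-node ring topology, the dependency edges, and the task-to-node mapping are all made-up placeholders, and the score is simply the sum of shortest-path hop counts over every producer-consumer edge in the task graph.

```python
from collections import deque

# Hypothetical 4-node interconnect (a simple ring), given as an
# adjacency list. The actual topology studied is not specified here.
topology = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

def hops(src, dst):
    """Shortest-path hop count between two nodes via BFS."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in topology[node]:
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")  # dst unreachable from src

# Illustrative dependency edges (producer task -> consumer task) and a
# candidate task-to-node mapping; both are placeholder examples.
deps = [(0, 1), (0, 2), (1, 3), (2, 3)]
mapping = {0: 0, 1: 0, 2: 1, 3: 0}

# Total hops: lower means less inter-node traffic for this mapping.
total_hops = sum(hops(mapping[a], mapping[b]) for a, b in deps)
```

Under this metric, a mapper that co-locates heavily communicating tasks on fewer nodes (as ATMapper does with 3 of the 4 nodes) drives `total_hops` down, at the cost of a deliberately imbalanced load.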