Publication Type
Conference Paper
Book Title
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid) 2008
Publication Date
Page Numbers
813 to 818
Publisher Location
Los Alamitos, California, United States of America
Conference Name
8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid) 2008
Conference Location
Lyon, France
Conference Date
-
Abstract
This paper summarizes our efforts over the last three to four years in providing symmetric active/active high availability for high-performance computing (HPC) system services. This work paves the way for high levels of reliability, availability and serviceability in extreme-scale HPC systems by focusing on the most critical system components, the head and service nodes, and by reinforcing them with appropriate high availability solutions. This paper presents our accomplishments in the form of concepts and corresponding prototypes, discusses existing limitations, outlines possible future work, and describes the relevance of this research to other planned efforts.