
Publication

MPI Support for Multi-core Architectures: Optimized Shared Memory Collectives

by Richard L Graham, Galen M Shipman
Publication Type
Conference Paper
Book Title
Recent Advances in Parallel Virtual Machine and Message Passing Interface
Publication Date
Page Numbers
130 to 140
Volume
5205
Publisher Location
Heidelberg, Germany
Conference Name
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Conference Location
Dublin, Ireland
Conference Date

With local core counts on the rise, taking advantage of shared memory to optimize collective operations can improve performance. We study several on-host, shared-memory-optimized algorithms for MPI_Bcast, MPI_Reduce, and MPI_Allreduce, using tree-based and reduce-scatter algorithms. For small-data operations, where synchronization costs are relatively large, fan-in/fan-out algorithms generally perform best. For large messages, data manipulation constitutes the largest cost, and reduce-scatter algorithms are best for reductions. These optimizations improve performance by up to a factor of three. Memory and cache sharing effects require deliberate process layout and careful radix selection for tree-based methods.
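The contrast between the two algorithm families can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation. It runs every "process" sequentially inside one Python process and uses plain lists in place of a shared-memory segment. It also assumes a heap-style tree layout, where the children of rank r are r*radix+1 through r*radix+radix, and an even split of the vector into per-rank segments. The function names are hypothetical. Both functions compute a sum-allreduce. The fan-in/fan-out version funnels full-length vectors up a tree of the given radix and copies the result back down. The reduce-scatter version gives each rank one segment of the arithmetic and then lets every rank copy the assembled result.

```python
"""Hypothetical sketch: two ways to compute a sum-allreduce among P on-host
processes that share a common memory region. Processes are simulated
sequentially; Python lists stand in for shared-memory buffers."""

from typing import List


def tree_children(rank: int, size: int, radix: int) -> List[int]:
    """Children of `rank` in a radix-`radix` tree rooted at rank 0.
    Assumed heap layout: children of r are r*radix+1 .. r*radix+radix."""
    first = rank * radix + 1
    return [c for c in range(first, first + radix) if c < size]


def fan_in_fan_out_allreduce(bufs: List[List[float]], radix: int) -> List[List[float]]:
    """Fan-in: each parent adds its children's partial sums into its own slot,
    so rank 0 ends up with the full sum. Fan-out: every rank copies it.
    Every combine adds whole vectors, so long vectors pay full cost per level."""
    size = len(bufs)
    partial = [list(b) for b in bufs]  # one shared-memory slot per rank
    # In the heap layout a child always has a larger rank than its parent,
    # so visiting ranks in decreasing order finishes children before parents.
    for rank in reversed(range(size)):
        for child in tree_children(rank, size, radix):
            partial[rank] = [a + b for a, b in zip(partial[rank], partial[child])]
    result = partial[0]
    return [list(result) for _ in range(size)]  # fan-out: each rank copies


def reduce_scatter_allreduce(bufs: List[List[float]]) -> List[List[float]]:
    """Reduce-scatter: rank r sums only its own contiguous segment across all
    ranks' inputs, writing into a shared result buffer. Allgather: every rank
    then copies the whole result. Arithmetic is spread across all P ranks."""
    size, n = len(bufs), len(bufs[0])
    result = [0.0] * n  # shared result buffer
    for rank in range(size):  # these iterations would run concurrently
        lo = rank * n // size
        hi = (rank + 1) * n // size
        for i in range(lo, hi):
            result[i] = sum(b[i] for b in bufs)
    return [list(result) for _ in range(size)]
```

For example, with four ranks where rank r holds `[float(r + i) for i in range(6)]`, both functions return `[6.0, 10.0, 14.0, 18.0, 22.0, 26.0]` on every rank, for any radix. In the tree version each combine adds full-length vectors, so a large message pays the whole arithmetic cost at every level. In the reduce-scatter version each rank adds only about n/P elements per input, which is why that approach suits large reductions. A larger radix makes the tree shallower but gives each parent more children to combine, one reason the radix has to be chosen with care.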