Publication

Parallel k-means Clustering of Geospatial Data Sets Using Manycore CPU Architectures

Publication Type
Conference Paper
Book Title
2018 IEEE International Conference on Data Mining Workshops (ICDMW)
Publication Date
Page Numbers
787 to 794
Issue
0
Publisher Location
New Jersey, United States of America
Conference Name
IEEE International Conference on Data Mining Workshops
Conference Location
Singapore, Singapore
Conference Sponsor
IEEE
Conference Date
-

The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery and mining of weather, climate, ecological, and other geoscientific data sets fused from disparate sources. Many of the standard tools used on individual workstations are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of parallelism available in state-of-the-art high-performance computing platforms can enable such analysis. Here, we describe pKluster, an open-source tool we have developed for accelerated k-means clustering of geospatial and geospatiotemporal data, and discuss algorithmic modifications and code optimizations we have made to enable it to effectively use parallel machines based on novel CPU architectures (such as the Intel Knights Landing Xeon Phi and Skylake Xeon processors) with many cores and hardware threads, and employing significant single instruction, multiple data (SIMD) parallelism. We outline some applications of the code in ecology and climate science contexts and present a detailed discussion of the performance of the code for one such application, LiDAR-derived vertical vegetation structure classification.