Publication

Coupling Prefix Caching and Collective Downloads for Remote Data Access...

Publication Type
Conference Paper
Book Title
The 20th ACM International Conference on Supercomputing
Publication Date
Page Number
229
Conference Name
20th ACM International Conference on Supercomputing
Conference Location
Cairns, Australia
Conference Date

Scientific datasets are typically archived at mass storage systems
or data centers close to supercomputers/instruments. End users
of these datasets, however, usually perform parts of their
workflows at their local computers. In such cases, client-side
caching can offer significant gains by reducing the cost of wide-area
data movement.
Scientific data caches, however, traditionally cache entire datasets,
which may not be necessary. In this paper, we propose a novel
combination of prefix caching and collective download. Prefix
caching allows the bootstrapping of dataset downloads by caching
only a prefix of the dataset, while collective download facilitates
efficient parallel patching of the missing suffix from an external
data source. To estimate the optimal prefix size, we further present
an analytical model that considers both the initial download overhead
and the downloading speed. We implemented our proposed
approach in the FreeLoader distributed cache prototype. Experimental
results (using multiple scientific data repositories and data
transfer tools, as well as a real-world scientific dataset access
trace) demonstrate that prefix caching and collective download
can be implemented efficiently, our model can select an appropriate
prefix size, and the cache hit rate can be improved significantly
without hurting the local access rate of cached datasets.
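
The abstract does not give the analytical model itself, so the following is only an illustrative sketch of how a prefix size could be derived from the two factors it names, the initial download overhead and the downloading speed. It is not the paper's model. The function name, parameter names, and the assumptions (sequential client reads at a fixed local rate, a one-time startup delay, in-order suffix delivery at a fixed aggregate rate across all collective downloaders) are all hypothetical.

```python
def min_prefix_size(dataset_size, local_rate, remote_rate, startup_overhead):
    """Smallest cached prefix that lets a sequential reader avoid stalling.

    Illustrative assumptions (not taken from the paper):
      - the client reads the dataset front to back at `local_rate`;
      - the suffix fetch starts when the request arrives, waits
        `startup_overhead` seconds once, then delivers bytes in order at
        `remote_rate` (the aggregate rate of all collective downloaders).
    Sizes share one unit (e.g. MB); rates are that unit per second.
    """
    if min(dataset_size, local_rate, remote_rate) <= 0 or startup_overhead < 0:
        raise ValueError("sizes and rates must be positive, overhead non-negative")

    # The first suffix byte arrives after the startup delay, so the prefix
    # must keep the client busy for at least that long.
    cover_startup = startup_overhead * local_rate

    # The last byte must arrive before the client reaches it. This bound
    # dominates when the remote source is slower than local access.
    full_read_time = dataset_size / local_rate
    cover_tail = dataset_size - remote_rate * (full_read_time - startup_overhead)

    # A prefix can be neither negative nor larger than the dataset.
    return min(dataset_size, max(0.0, cover_startup, cover_tail))
```

Under these assumptions, a 1000 MB dataset read locally at 100 MB/s, with a 25 MB/s remote source and a 2 s startup delay, needs an 800 MB prefix. If the remote source instead delivers 200 MB/s, only the startup delay has to be hidden and 200 MB suffices.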