Constructing Collaborative Desktop Storage Caches for Large Scientific Datasets...

Publication Type

Journal

Journal Name

ACM Transactions on Storage

Publication Date

August, 2006

Abstract

High-end computing is suffering a data deluge from experiments, simulations, and apparatus that
creates overwhelming application dataset sizes. This has led to the proliferation of high-end mass
storage systems, storage area clusters, and data centers. These storage facilities offer a large range
of choices in terms of capacity and access rate, as well as strong data availability and consistency
support. However, for most end-users, the "last mile" in their analysis pipeline often requires data
processing and visualization at local computers, typically local desktop workstations. End-user
workstations, despite having more processing power than ever before, are ill-equipped to cope
with such data demands due to insufficient secondary storage space and I/O rates. Meanwhile, a
large portion of desktop storage is unused.
We propose the FreeLoader framework, which aggregates unused desktop storage space and I/O
bandwidth into a shared cache/scratch space, for hosting large, immutable datasets and exploiting
data access locality. This article presents the FreeLoader architecture, component design, and
performance results based on our proof-of-concept prototype. Its architecture comprises contributing
benefactor nodes, steered by a management layer, providing services such as data integrity,
high performance, load balancing, and impact control. Our experiments show that FreeLoader is
an appealing low-cost solution to storing massive datasets by delivering higher data access rates
than traditional storage facilities, namely, local or remote shared file systems, storage systems,
and Internet data repositories. In particular, we present novel data striping techniques that allow
FreeLoader to efficiently aggregate a workstation's network communication bandwidth and local
I/O bandwidth. In addition, the performance impact on the native workload of donor machines
is small and can be effectively controlled. Further, we show that security features such as data
encryption and integrity checks can be easily added as filters for interested clients. Finally, we
demonstrate how legacy applications can use the FreeLoader API to store and retrieve datasets.
