Cluster filesystem

The Tester family currently has 360GB of RAID-5 storage on our main server, built from a 3Ware controller and 4× 120GB hard drives. This system is currently facing a number of problems:

The aim of this page is to explore some possibilities for constructing a distributed and expandable network storage system. This filesystem should be resilient to hardware failures, similar to RAID-4/5.
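To make the "similar to RAID-4/5" goal concrete, here is a minimal sketch of the XOR parity idea those RAID levels rely on, applied to blocks that are imagined to sit on separate storage nodes. The block contents and the helper name `xor_blocks` are made up for illustration; this is not how any of the filesystems discussed on this page implement redundancy.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, each imagined to live on a different storage node.
data = [b"node-one", b"node-two", b"node-3!!"]

# The parity block would be stored on a fourth node.
parity = xor_blocks(data)

# Simulate losing the second node and rebuilding its block from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Any single missing block can be rebuilt the same way, which is why one node failure is survivable but two at once are not.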


There are a number of issues that a storage system will need to cover:

  1. Be distributed.
  2. Be reliable!
  3. Be easily expanded.

The third issue could in fact be very important. This system will hopefully hold over a terabyte of data. Backing up all that data just to add a storage node or two would be just about impossible. CD-Rs or DVD-Rs would be impractical (and probably not very reliable), and tape backups of any decent capacity are very expensive. So basically, the data is going to sit there until we move to another system.
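A rough calculation shows why optical media are impractical at this scale. The capacities below are assumptions (nominal 700 MB per CD-R, 4.7 GB per single-layer DVD-R, and a decimal terabyte); real usable capacity is a little different, but the order of magnitude is what matters.

```python
# Assumed nominal capacities, in bytes.
TERABYTE = 10**12
CD_R_BYTES = 700 * 10**6
DVD_R_BYTES = 4700 * 10**6


def discs_needed(total_bytes, disc_bytes):
    """Number of discs required, rounding up to a whole disc."""
    return -(-total_bytes // disc_bytes)


print("CD-Rs: ", discs_needed(TERABYTE, CD_R_BYTES))    # 1429
print("DVD-Rs:", discs_needed(TERABYTE, DVD_R_BYTES))   # 213
```

Even with DVD-Rs, a single full backup would mean burning and cataloguing a couple of hundred discs.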


Lustre is a distributed filesystem made for supercomputing clusters. As I understand it, each storage node stores blocks, similar to a normal local filesystem. A master node contains the metadata necessary to map files to these blocks. The master is important, but hopefully most of the load will be distributed to the storage nodes. A “RAID layer” is mentioned on the Lustre site. I need to investigate this important feature more.
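The split between a metadata master and storage nodes can be pictured with a toy model. Everything here (the class names `StorageNode` and `MetadataMaster`, the round-robin placement, the tiny block size) is hypothetical and only mirrors the description above; it is not Lustre's actual API or on-disk layout.

```python
class StorageNode:
    """Holds raw blocks, keyed by block id. Knows nothing about files."""

    def __init__(self):
        self.blocks = {}


class MetadataMaster:
    """Maps each file path to the (node index, block id) pairs that hold it."""

    def __init__(self, node_count):
        self.node_count = node_count
        self.files = {}
        self.next_block_id = 0

    def allocate(self, path, block_count):
        # Assumed placement policy: spread blocks round-robin across nodes.
        locations = []
        for i in range(block_count):
            locations.append((i % self.node_count, self.next_block_id))
            self.next_block_id += 1
        self.files[path] = locations
        return locations

    def lookup(self, path):
        return self.files[path]


def write_file(master, nodes, path, data, block_size=4):
    """Client asks the master where to write, then talks to the nodes directly."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for (node_index, block_id), chunk in zip(master.allocate(path, len(chunks)), chunks):
        nodes[node_index].blocks[block_id] = chunk


def read_file(master, nodes, path):
    """Client asks the master for locations, then fetches blocks from the nodes."""
    return b"".join(nodes[n].blocks[b] for n, b in master.lookup(path))


nodes = [StorageNode() for _ in range(3)]
master = MetadataMaster(node_count=len(nodes))
write_file(master, nodes, "/photos/cat.jpg", b"hello world!")
print(read_file(master, nodes, "/photos/cat.jpg"))
```

In this sketch the master only answers "where are the blocks?" and the bulk data moves between the client and the storage nodes, which is how the load would end up distributed.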


AFS is a distributed filesystem developed at CMU that has since been used in many large environments. It was commercialized by Transarc, which IBM later bought; IBM then released the source code, which became OpenAFS.

OpenAFS allows a directory tree to be distributed and duplicated over several servers.

OpenAFS is also available for a wide variety of platforms, including Windows. So the Windows client machines could directly access the servers, instead of having to use Samba.


Coraid has developed ATA-over-Ethernet (AoE). It essentially replaces the hard drive interface with commodity Ethernet hardware. I think it's really only an advantage for large installations.

AoE would involve one of two setups:

  1. A single file server. It performs software RAID and exports the filesystem(s) to the rest of the network.
  2. A group of file servers. They operate similarly to the single server, but use a cluster filesystem (e.g. GFS, OCFS2) to allow multiple machines to mount the same filesystem simultaneously.

The first setup still involves a single point of failure (and bottleneck). The second is probably beyond our needs. And both still suffer from not being easily expanded. They would put us back in the same situation we are in now: lots of data in a RAID that needs to be moved to something bigger.