The Tester family currently has 360GB of RAID-5 storage on our main server, built from a 3Ware controller and 4× 120GB hard drives. This system is facing a number of problems:
- 360GB is starting to get a little cramped.
- The performance is not that great. I frequently send the system load way over 1.0 by transferring CD/DVD (ISO) images over the 100Mbit network. It has trouble sustaining about 8MB/s for any length of time, e.g. for burning a DVD image.
- The array is not (easily) expandable.
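As a quick sanity check on the figures above, here is a small Python sketch of the arithmetic. The function names are made up for illustration; the only assumptions are that RAID-5 gives up one drive's worth of space to parity and that a link's ceiling is its bit rate divided by eight.

```python
def raid5_usable_gb(drive_count: int, drive_size_gb: float) -> float:
    """Usable capacity of a RAID-5 array: one drive's worth of space holds parity."""
    if drive_count < 3:
        raise ValueError("RAID-5 needs at least three drives")
    return (drive_count - 1) * drive_size_gb


def link_limit_mb_per_s(link_mbit: float) -> float:
    """Theoretical ceiling of a network link in megabytes per second (8 bits per byte)."""
    return link_mbit / 8


usable = raid5_usable_gb(4, 120)       # the current array: 4 drives of 120GB
ceiling = link_limit_mb_per_s(100)     # the current 100Mbit network
print(f"usable: {usable:.0f} GB, link ceiling: {ceiling:.1f} MB/s")
print(f"8 MB/s is {8 / ceiling:.0%} of the link ceiling")
```

So the 8MB/s the server struggles to sustain is still well under the theoretical 12.5MB/s the network could carry, before any protocol overhead.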
The aim of this page is to explore some possibilities for constructing a distributed and expandable network storage system. This filesystem should be resilient to hardware failures, similar to RAID-4/5.
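To make "similar to RAID-4/5" concrete, here is a minimal Python sketch of the XOR parity idea those RAID levels rely on, applied to blocks held on separate nodes. The block contents and the `xor_blocks` helper are invented for illustration; this is not taken from any of the systems discussed on this page.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if striped across three storage nodes (made-up contents).
data = [b"node-one", b"node-two", b"node-3!!"]
parity = xor_blocks(data)              # would be stored on a fourth node

# Simulate losing the second node and rebuilding its block from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

Any single missing block can be rebuilt this way, which is why one node (or drive) can fail without losing data. Two simultaneous failures cannot be recovered from a single parity block.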
There are a number of issues that a storage system will need to cover:
- Be distributed.
- Be reliable!
- Be easily expanded.
The third issue could in fact be very important. This system will hopefully hold over a terabyte of data. Backing up all that data just to add a storage node or two will be just about impossible. Using CD-Rs or DVD-Rs would be impractical (and probably not very reliable). Tape backups of any decent capacity are very expensive. So basically, the data is going to sit there until we move to another system.
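A rough count shows why optical media are impractical at this scale. The sketch below assumes nominal capacities of 700MB per CD-R and 4.7GB per single-layer DVD-R, and a decimal terabyte; none of these figures come from the text above.

```python
import math

TERABYTE = 10**12              # bytes, decimal terabyte (assumption)
CD_R = 700 * 10**6             # assumed nominal CD-R capacity: 700 MB
DVD_R = 4_700_000_000          # assumed nominal single-layer DVD-R capacity: 4.7 GB


def discs_needed(data_bytes: int, disc_bytes: int) -> int:
    """Number of discs needed to hold data_bytes, rounding up to whole discs."""
    return math.ceil(data_bytes / disc_bytes)


print("CD-Rs: ", discs_needed(TERABYTE, CD_R))    # 1429
print("DVD-Rs:", discs_needed(TERABYTE, DVD_R))   # 213
```

Even with DVD-Rs, a single full backup means burning and verifying a couple of hundred discs.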
Lustre is a distributed filesystem made for supercomputing clusters. As I understand it, each storage node stores blocks, similar to a normal local filesystem. A master node contains the metadata necessary to map files to these blocks. The master is important, but hopefully most of the load will be distributed to the storage nodes. A “RAID layer” is mentioned on the Lustre site. I need to investigate this important feature more.
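The split between a metadata master and block-holding storage nodes can be pictured with a toy model. This is only a sketch of the idea as described above, not Lustre's actual design or API: the `StorageNode` and `MasterNode` classes, the round-robin placement, and the 4-byte block size are all hypothetical. For brevity the master fetches the blocks itself here, which a real system would presumably avoid so that the load lands on the storage nodes.

```python
class StorageNode:
    """Hypothetical storage node: holds numbered blocks and nothing else."""

    def __init__(self, name):
        self.name = name
        self.blocks = []

    def put(self, block):
        self.blocks.append(block)
        return len(self.blocks) - 1    # index of the block on this node

    def get(self, index):
        return self.blocks[index]


class MasterNode:
    """Hypothetical master: stores only the map from path to block locations."""

    def __init__(self, nodes, block_size=4):
        self.nodes = nodes
        self.block_size = block_size
        self.metadata = {}             # path -> [(node, block index), ...]

    def write(self, path, payload):
        locations = []
        for offset in range(0, len(payload), self.block_size):
            # Round-robin placement: consecutive blocks go to consecutive nodes.
            node = self.nodes[(offset // self.block_size) % len(self.nodes)]
            block = payload[offset:offset + self.block_size]
            locations.append((node, node.put(block)))
        self.metadata[path] = locations

    def read(self, path):
        return b"".join(node.get(index) for node, index in self.metadata[path])


nodes = [StorageNode("a"), StorageNode("b"), StorageNode("c")]
master = MasterNode(nodes)
master.write("/iso/disc1.iso", b"0123456789")
assert master.read("/iso/disc1.iso") == b"0123456789"
```

Note that losing the master's map makes every block unreachable, which is why the master is so important in this kind of layout.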
OpenAFS allows a directory tree to be distributed and duplicated over several servers.
OpenAFS is also available for a wide variety of platforms, including Windows. So the Windows client machines could directly access the servers, instead of having to use Samba.
Coraid has developed ATA-over-Ethernet (AoE). It essentially replaces the hard drive interface with commodity Ethernet hardware. I think it's really only an advantage for large installations.
AoE would involve one of two setups:
- A single file server. It performs software RAID and exports the filesystem(s) to the rest of the network.
- A group of file servers. They operate similarly to the single server, but use a cluster filesystem (e.g. GFS, OCFS2) to allow multiple machines to mount the same filesystem simultaneously.
The first setup still involves a single point of failure (and bottleneck). The second is probably beyond our needs. Neither is easily expanded: both would put us back in the same situation we are in now, with lots of data in a RAID that needs to be moved to something bigger.