RAID

(or, why you don't want to use the "RAID" provided by your motherboard)

There are three types of RAID:

  1. Software RAID
  2. Fake RAID
  3. Hardware RAID

I'll explain the differences and why they matter below. I won't discuss the benefits and drawbacks of RAID itself, as that information is widely available. Rather, I'll explain the benefits and drawbacks of the three main types of RAID implementation, as that information is not widely available, and misunderstanding and misinformation are very common.

Software RAID

Your knee-jerk reaction is probably that this would be the worst option. Throwing more hardware at a problem always makes it better, right? Well, sadly (or happily, in this case), no.

In Linux, you can create RAID devices from any regular block device (including whole drives, partitions, regular files, other RAID devices, etc.) with mdadm. You can mix and match RAID levels 0, 1, 4, 5, 6, 10 and linear (linear not really being RAID per se, but it's handled by the same framework). You can also arbitrarily nest RAID devices, so you can create a RAID 0 of RAID 6s of RAID 1s if that's what floats your boat. You can also physically rip the drives out of one machine, plug them into another, and your RAID array(s) will continue working as before, with no twiddling needed.
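
To make the nesting idea concrete, here is a small Python sketch that only builds mdadm command lines and prints them; it does not run anything. The device names (/dev/sdb1, /dev/md0, and so on) and the helper mdadm_create are hypothetical, chosen for illustration. The --create, --level and --raid-devices options are standard mdadm options, but check the man page for your version before running the output.

```python
from typing import List


def mdadm_create(md_device: str, level: str, members: List[str]) -> List[str]:
    """Build (but do not run) an mdadm command line that creates an array."""
    if len(members) < 2:
        raise ValueError("this sketch expects at least two member devices")
    return [
        "mdadm", "--create", md_device,
        f"--level={level}",
        f"--raid-devices={len(members)}",
        *members,
    ]


# Hypothetical device names; substitute your own partitions.
mirror_a = mdadm_create("/dev/md0", "1", ["/dev/sdb1", "/dev/sdc1"])
mirror_b = mdadm_create("/dev/md1", "1", ["/dev/sdd1", "/dev/sde1"])

# Nesting: the two mirrors are ordinary block devices, so they can
# themselves be the members of a RAID 0 stripe.
stripe = mdadm_create("/dev/md2", "0", ["/dev/md0", "/dev/md1"])

for argv in (mirror_a, mirror_b, stripe):
    print(" ".join(argv))
```

The point of the sketch is that the third command treats /dev/md0 and /dev/md1 exactly like any other member device, which is all that nesting amounts to.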

mdadm of course supports the features you'd expect, such as hot spares and hot-swappable drives (hardware permitting), but it also has several other useful features. Of particular note is that you can grow a RAID 5 array completely online (mdadm calls this reshaping). That is, you can take an n-drive array with the capacity of n-1 drives, add a drive, and (completely online) end up with an n+1-drive array with the capacity of n drives. Furthermore, you can add as many drives as you'd like to the same array, whether they hang off ports on the motherboard, ports on an expansion card, external enclosures, or the network.
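
The capacity arithmetic behind growing an array can be sketched in a few lines of Python. This assumes equal-sized drives and covers only RAID 0, 1, 5 and 6; the function name usable_capacity_tb and the 2 TB drive size are made up for the example.

```python
def usable_capacity_tb(level: str, drives: int, drive_size_tb: int) -> int:
    """Usable space of an array of equal-sized drives, in TB.

    Assumption: RAID 0 loses nothing, RAID 5 loses one drive's worth of
    space to parity, RAID 6 loses two, and RAID 1 stores one drive's worth
    no matter how many mirrors there are.
    """
    if drives < 2:
        raise ValueError("an array needs at least two drives")
    if level == "1":
        return drive_size_tb
    parity_drives = {"0": 0, "5": 1, "6": 2}
    if level not in parity_drives:
        raise ValueError(f"level {level!r} is not covered by this sketch")
    if drives <= parity_drives[level]:
        raise ValueError("not enough drives for this level")
    return (drives - parity_drives[level]) * drive_size_tb


before = usable_capacity_tb("5", drives=4, drive_size_tb=2)
after = usable_capacity_tb("5", drives=5, drive_size_tb=2)
print(before, after)  # 6 8
```

With mdadm, the corresponding reshape is typically an `mdadm --add` of the new device followed by an `mdadm --grow` with a larger `--raid-devices` count; consult the man page for the details on your version.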

Well, that all sounds great, but what about performance? The good news is that the performance of Software RAID is generally on par with Hardware RAID and almost always (significantly) better than Fake RAID. You might not have noticed, but over the last decade or two CPUs have become very fast, greatly outpacing hard drive speeds. Even with a full RAID 5 resync in progress across many fast drives, you're unlikely to see more than 25% CPU usage, and that's on a single core; these days you probably have at least 4. RAID levels that don't involve parity (0, 1, 10, linear) incur essentially no CPU load.
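
To see why parity is cheap, it helps to look at what RAID 5 parity actually is: a byte-wise XOR across the data blocks in a stripe. The Python sketch below is purely illustrative (the kernel's md driver uses heavily optimized routines, not anything like this), and the block contents are made up. It computes a parity block and then rebuilds a "lost" block from the survivors.

```python
from functools import reduce
from operator import xor
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    # zip(*blocks) yields one tuple per byte position across all blocks.
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Three data blocks in one stripe (made-up contents).
data = [b"disk", b"raid", b"test"]
parity = xor_blocks(data)

# Pretend the drive holding data[1] died: XOR the survivors with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

One XOR per byte per drive is trivial work for a modern CPU, which is consistent with the low CPU usage described above.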

So, why wouldn't I want to use a RAID array built with mdadm? Really the only reason you wouldn't is if you needed to (heaven forbid!) boot non-Linux OSes on the same set of drives.

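In summary, Software RAID with mdadm:

  1. Builds arrays from any block device (whole drives, partitions, regular files, other RAID devices)
  2. Supports RAID 0, 1, 4, 5, 6, 10 and linear, with arbitrary nesting
  3. Keeps working when the drives are moved to another machine
  4. Supports hot spares, hot-swappable drives (hardware permitting) and online reshaping
  5. Performs generally on par with Hardware RAID
  6. Can't be shared with non-Linux OSes on the same set of drives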

Fake RAID

You've probably never heard of Fake RAID before, at least not by that name, but it's extremely common: just about every motherboard these days features it. Most low-end add-in RAID cards also fall into this category. There's a slew of reasons why you wouldn't want to use Fake RAID and basically only one reason why you would.

Fake RAID is essentially software RAID provided by the BIOS on the motherboard. However, it has none of the benefits of Software RAID and none of the benefits of Hardware RAID; hence, Fake RAID. A very important fact to remember is that the implementation varies from motherboard to motherboard: some are better tested than others, some are missing features that should be there, and so on. Fake RAID from one vendor is almost guaranteed to be completely different from another vendor's.

Unlike Hardware RAID, Fake RAID does not present the array as a single logical disk to the OS, so the OS still needs to explicitly support Fake RAID. Unlike Software RAID, Fake RAID does not use a consistent on-disk format, so if your motherboard dies, your data is probably lost unless you can find another identical motherboard. Fake RAID rarely supports any RAID levels other than 0 or 1, and rarely supports hot spares or hot-swappable drives. It does not support nesting of RAID arrays and only supports RAIDing an entire disk (not a partition, file, or generic block device).

The one upside of Fake RAID is that it does allow you to boot multiple OSes from the same array of drives, provided that both OSes support the Fake RAID. That's only relevant, though, if the two OSes have mutually incompatible implementations of Software RAID (or don't have one at all); booting two different distros of Linux is trivially easy when using Software RAID instead of Fake RAID.

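So, with Fake RAID:

  1. The implementation and on-disk format vary from vendor to vendor
  2. The OS still needs to explicitly support it
  3. RAID levels other than 0 and 1, hot spares and hot-swappable drives are rarely supported
  4. Arrays are built from whole disks only, with no nesting
  5. Your data is probably lost if the motherboard dies and you can't find an identical one
  6. Multiple OSes can boot from the same array, provided each supports the Fake RAID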

Hardware RAID

Hardware RAID is a lot like the big brother of Fake RAID: nothing is worse (besides the price) and a few things are better, but it still lacks a number of features that Software RAID has. Expect to pay at least $500 for a Hardware RAID card; anything that costs less is probably just Fake RAID packaged as an add-in card.

One important differentiator for Hardware RAID is that it presents the RAID array as a single logical disk to the OS, and thus the OS does not need to explicitly support the RAID card. Hardware RAID has dedicated circuitry for computing RAID 5/6 parity, thus removing the (typically small) load from the host. Hardware RAID is more likely than Fake RAID to support RAID levels beyond 0 and 1, and it usually supports hot spares and hot-swappable drives. In general, you still need to construct your RAID array out of whole disks, and you can't nest RAID arrays. Importantly, the on-disk format still varies from card to card, so if your RAID card goes up in flames, so does your data. In very large (a dozen or more drives) RAID 5 and 6 arrays, you may get better performance from Hardware RAID than Software RAID, although that depends on the card.

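In summary, with Hardware RAID:

  1. The array appears as a single logical disk, so the OS needs no explicit support for the card
  2. Dedicated circuitry computes RAID 5/6 parity
  3. RAID levels beyond 0 and 1, hot spares and hot-swappable drives are usually supported
  4. Arrays are generally built from whole disks only, with no nesting
  5. The on-disk format varies from card to card, so a dead card can take your data with it
  6. Expect to pay at least $500
  7. Very large RAID 5 and 6 arrays may perform better than with Software RAID, depending on the card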

So, what RAID implementation should I use? Probably Software RAID, but ultimately the decision is up to you.