
RAID: Your Guide by David Risley
Last updated: May 11, 2001. © PC Mechanic




Two things blend together to make RAID more powerful than ever: an increasing number of die-hard, PC-loving speed-freaks and the ever-decreasing price of hard drives. We're (most of us, anyway) beyond the stage of thinking our hard drives are too small. We're beyond the stage of making do because a hard drive costs so much. But, for the PC enthusiast, we're not beyond the stage of saying, "Damn, that hard drive is too slow!"


This is where RAID comes in. Individually, most hard drives today are too slow. Regardless of how fast they are designed to be, compared with the speed of today's processors and other system components, the hard drive is an incredible bottleneck for a system. With RAID, we can blend the power of two or more hard drives together to accomplish great things.


What is it?


RAID stands for Redundant Array of Inexpensive Disks. This is actually a great name for it. And with prices decreasing like never before, the "Inexpensive" part of the name is now becoming a reality. Depending on the setup you choose for your RAID array, it can offer you increased performance by using the power of two hard drives as a single volume, or simply use the redundancy of a second drive for increased data security. Just as designers of mission-critical machines build redundant systems in case one fails, a RAID array can provide increased security in the event of the failure of one of the drives. I will get into the RAID types in a minute, but many RAID setups use mirroring technology, meaning that whenever you write something to your primary drive, the RAID setup simultaneously writes the same info to the secondary disk, so you always have a duplicate copy. In the event one drive fails, you have an exact, working copy of your entire system on the second disk.


The word "array" usually implies a series of elements, each of a similar size and nature. Well, RAID is no different. The optimum setup for a RAID array employs two identical hard drives. If one of your drives is a 7200 RPM drive, then its best to be sure the other one is also a 7200 RPM drive. The same goes for capacity. If you have one 20 gig drive and the other is a 10 gig drive, your 20 gig drive will only operate on the RAID array as a 10 gig drive. In the example preceding, that RAID array would operate at 5400 RPM if you had a 5,400 RPM drive paired up with the 7200 RPM drive. Summing up, your RAID array will always operate at the speed or capacity of the weakest or smallest drive. A chain is only as strong as its weakest link. So, obviously, if you're looking to set up a RAID setup, buy two identical drives.


As you might guess, you need a special controller to set up a RAID array. The controller handles the task of managing read/write requests to both drives, managing the mirroring, and so on. On some operating systems, namely NT Server or Windows 2000, you can use the OS itself as a software-based controller. But it is generally better to install a separate, hardware-based PCI controller. The PCI controller handles all the work onboard, saving the CPU cycles that a software controller would use. Controller cards also come with software to let you monitor the status of the array.


Redundancy is the key to most RAID arrays, but whichever setup you employ, you will definitely use one or more of the following techniques:


Striping


This is a RAID configuration that can offer huge performance gains. Data in a striped array is interleaved across all the drives in the array, and data is read and written on both drives at the same time. A good analogy would be this: Imagine having to write an essay on a sheet of paper. You can take a pen and write it. Now, imagine for a second that you were a mythological god or something and could write with both hands, nice and neat, at the SAME time. Imagine how fast you could write that paper now! The same idea applies to a RAID array using striping. By splitting the data up and using both drives to read/write, it can effectively double the speed.


The performance of a striped array is governed by the stripe width and stripe size. The width is equal to the number of drives in your array. To illustrate, assume you need to write a 1 meg Word file to your RAID array. If you have two drives, then the stripe width is two. To keep the math simple, assume you will be writing this data in 50K chunks. That is 20 write cycles to write the entire Word file, 10 write cycles per drive. So, the first drive writes the first 50K, then the third, then the fifth, and so on. At the same time, the other drive writes the second, then the fourth, etc. You can see that this setup would write the entire 1 meg file in about half the time of one drive. You can increase performance even more by adding another hard drive to the RAID array, thereby increasing the stripe width to 3.
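To make that round-robin pattern concrete, here is a rough Python sketch of the idea. This is not how a real controller works internally (controllers do this in hardware at the block level), and the chunk size and drive count are just the illustration values from above:

```python
# Illustrative sketch of striping: deal fixed-size chunks out across
# the drives in round-robin order.

STRIPE_SIZE = 50 * 1024   # 50K chunks, as in the example above
NUM_DRIVES = 2            # stripe width

def stripe(data, num_drives=NUM_DRIVES, stripe_size=STRIPE_SIZE):
    """Split data into chunks and assign them round-robin to drives."""
    drives = [[] for _ in range(num_drives)]
    chunks = [data[i:i + stripe_size] for i in range(0, len(data), stripe_size)]
    for n, chunk in enumerate(chunks):
        drives[n % num_drives].append(chunk)  # chunk 0 -> drive 0, chunk 1 -> drive 1, ...
    return drives

# A "1 meg" file in 50K chunks: 20 chunks total, 10 per drive.
file_data = bytes(1000 * 1024)
d = stripe(file_data)
print(len(d[0]), len(d[1]))  # -> 10 10
```

Each drive only has to handle half the chunks, which is where the (roughly) doubled throughput comes from.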


The stripe size is basically the size of those chunks of data being written across the array. The default for an IDE configuration is usually 64K. Contrary to common sense, increasing the stripe size can have a negative impact on performance. See, if the data chunks are huge, then much of the time the parallel nature of RAID will not even be employed, because the chunks may be larger than the files themselves. This would lead to no better performance than a non-RAID setup. On the flip side, a stripe size that is too small guarantees that your files will be broken up across the array (increasing performance) but also increases the number of small random accesses to the array, meaning your drives will likely be busier. As you can see, it's a give-and-take thing.
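One rough way to see the tradeoff: count how many drives a single file would actually touch for a given stripe size. This hypothetical helper just does that arithmetic; once the stripe size exceeds the file size, only one drive ends up doing any work:

```python
import math

def drives_touched(file_size, stripe_size, num_drives):
    """How many drives participate in reading/writing one file."""
    chunks = math.ceil(file_size / stripe_size)
    return min(chunks, num_drives)

# A 64K file on a two-drive array:
print(drives_touched(64 * 1024, 16 * 1024, 2))   # small stripes: both drives work -> 2
print(drives_touched(64 * 1024, 128 * 1024, 2))  # huge stripes: one drive works   -> 1
```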


Mirroring


With striping alone, you do not get any redundancy. The data is all split up amongst the drives in the array, so if you lose one of the drives, you're screwed. Mirroring is the other feature of RAID that comes to the rescue. The only problem is that with mirroring alone, you don't get striping. Mirroring is a simple concept: whatever you write to one drive, you simultaneously write to the other. Thus, you always have an exact duplicate of your data on the second drive. The cool part comes with the controller you decide to use. For example, most controllers will automatically sense a drive failure and instantly switch to the backup drive, meaning virtually no downtime. This is great for servers and other mission-critical machines. If the controller doesn't support this, it will most likely at least automatically copy the data from the backup drive to the replacement drive.


Mirroring does give a small performance benefit as well. Since both drives contain identical data, the controller can read one piece of data from one drive while simultaneously reading another from the copy. Write speeds, though, will slow down some, because the controller must write all data twice.
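Here is a minimal Python sketch of the concept (again, a real controller does this below the filesystem, not in application code): every write goes to both drives, while reads alternate between the two copies, which is where the small read benefit comes from:

```python
# Minimal mirroring sketch: every write hits both drives,
# reads can be served from either copy.

class Mirror:
    def __init__(self):
        self.drives = [{}, {}]  # two "drives", each a block -> data map
        self.next_read = 0

    def write(self, block, data):
        for drive in self.drives:  # the write penalty: everything is written twice
            drive[block] = data

    def read(self, block):
        # Alternate reads between the copies so both drives share the load.
        data = self.drives[self.next_read][block]
        self.next_read ^= 1
        return data

m = Mirror()
m.write(0, b"important data")
print(m.read(0))  # either copy returns the same bytes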


Parity


Parity is another type of redundancy built into some RAID arrays. Instead of simply making copies of everything, the RAID controller adds a parity bit to the binary info being written to the array. Basically, it's just an extra bit of data appended onto the actual data, set so that the total count of 1s comes out even (or odd, depending on the scheme). By checking this count, the controller can determine whether the information has been compromised in any way. If it has, it can replace the data automatically using the data on the other drives.


Most parity setups use XOR to do their magic. This is a type of Boolean logic, the eXclusive OR. Basically, it runs through a series of 0s and 1s and returns one answer if there is an even number of 1s and the other if the count is odd. By using this result, the controller can "fill in the blanks". It's like algebra. We know that 3 + 4 = 7. If you see an equation like 3 + __ = 7, you know the blank is supposed to be a 4. XOR logic is used in this way to rebuild corrupted data on the array, thus maintaining integrity.
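A worked example makes the "fill in the blanks" trick concrete. In the Python sketch below (the byte values are arbitrary), XORing two data blocks together produces a parity block; if one data block is later lost, XORing the survivor with the parity block gives the lost block back:

```python
# XOR parity in action: parity = A XOR B. Lose B, and A XOR parity
# gives B back -- the "3 + __ = 7" trick in binary.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

block_a = b"\x33\x0f\xaa"
block_b = b"\x55\xf0\x01"

parity = xor_bytes(block_a, block_b)  # stored alongside the data

# The drive holding block_b dies; reconstruct it from the survivors:
recovered = xor_bytes(block_a, parity)
print(recovered == block_b)  # -> True
```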


The more commonly used RAID levels are RAID 0, RAID 1, RAID 0+1, and RAID 5. Each "level" is simply a different configuration of the RAID standard, each providing certain benefits and performance parameters.


RAID 0


RAID 0, strictly speaking, isn't really RAID at all. Why? Because it lacks the "R": redundancy. RAID 0 is basically a RAID setup that employs the striping I talked about above. This setup requires at least two hard drives configured into a "striped set". RAID 0 is becoming increasingly popular amongst power users. As discussed before, this setup offers much higher read/write speeds than normal and will really help to speed up a computer. People who are into raw speed for gaming, multimedia, etc., will enjoy RAID 0. But, because it lacks the redundancy factor, it is not typically used in corporate, mission-critical environments. If one drive of a RAID 0 array dies, the whole array is screwed.


RAID 1


RAID 1 employs the mirroring capability discussed previously. It can, in some cases, provide a little performance benefit, but it is primarily used for redundancy, pure and simple. With RAID 1, you have the option of attaching a third drive to the controller. It acts as a spare: it is not part of the RAID array, but simply kicks in if one of the drives fails. The controller would perform an automatic restore to the spare drive, notify you of the failure, and continue operating as though nothing happened. RAID 1 is used more on corporate networks and web servers. Desktop users don't typically need it, although some who REALLY need that redundancy do use it on desktop machines.


RAID 0+1


RAID 0+1, as you might be able to tell from the name, gives you the best of both worlds. It can be costly, though, as it requires at least 4 hard drives to do it. Two of the drives are striped, as in a RAID 0 array, and the other two are mirrors of the first two. This is the only option for IDE users who want both the speed and the redundancy. Due to the cost of buying 4 hard drives plus a RAID controller, this is not the most popular option in town. It does, though, kick ass, and you will find desktop users and web server guys using this.


RAID 5


RAID 5 combines the high-performance capability of striping with the increased integrity of parity. The setup requires at least 3 drives. To see why it needs 3, see the discussion of parity above: by XORing the data on two of the drives, the controller can "fill in the blanks" for the third, just like solving an algebraic equation. This is what gives RAID 5 its security. Because both the data and the parity info are spread out across all the drives, this scheme is often called "distributed parity".
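To show what "distributed parity" looks like, here is a hedged sketch of a three-drive RAID 5 layout where the parity block rotates from stripe to stripe, so no single drive becomes a parity bottleneck. Real controllers differ in the exact rotation; this is just the common left-rotating pattern:

```python
def raid5_layout(num_stripes, num_drives=3):
    """Show which drive holds parity (P) vs. data (D) for each stripe."""
    rows = []
    for stripe in range(num_stripes):
        parity_drive = (num_drives - 1 - stripe) % num_drives  # rotate the parity block
        rows.append(["P" if d == parity_drive else "D" for d in range(num_drives)])
    return rows

for stripe, row in enumerate(raid5_layout(4)):
    print(f"stripe {stripe}: {row}")
# stripe 0: ['D', 'D', 'P']
# stripe 1: ['D', 'P', 'D']
# stripe 2: ['P', 'D', 'D']
# stripe 3: ['D', 'D', 'P']
```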


RAID 5 is typically not an option for desktop users. It offers the best of all worlds, but typically only SCSI RAID controllers have the ability to handle it. This means IDE cannot be used, which in turn means this option will cost a crapload. RAID 5 is typically found in enterprise servers and the like.


JBOD


I love the name of this one - JBOD, "Just a Bunch of Drives". No kidding. This is barely RAID at all. It basically uses the controller to span two drives together into a single drive volume. When one of the disks fills up, it starts using the other one, transparently to the user. This setup will utilize all the space of the drives, which means you won't lose any space with differently sized drives placed on the array. On the flip side, though, it doesn't offer any redundancy or performance benefits. You will find that many controllers offer this as an option, although there's not a huge point in using it, in my opinion.
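To tie the levels together, here is a back-of-the-envelope capacity sketch: given a list of drive sizes, it estimates usable space under each scheme, following the weakest-link rule from earlier (JBOD being the one layout that uses every gigabyte). Treat the formulas as rough rules of thumb, not vendor specs:

```python
def usable_capacity(level, sizes):
    """Rough usable capacity for a given RAID level (sizes in GB)."""
    n, smallest = len(sizes), min(sizes)
    if level == "RAID 0":
        return n * smallest          # striping: no redundancy overhead
    if level == "RAID 1":
        return smallest              # everything is a mirror of one drive
    if level == "RAID 0+1":
        return (n // 2) * smallest   # half the drives are mirrors
    if level == "RAID 5":
        return (n - 1) * smallest    # one drive's worth goes to parity
    if level == "JBOD":
        return sum(sizes)            # spanning uses every gigabyte
    raise ValueError(level)

drives = [20, 20, 20, 20]  # four 20 gig drives
for level in ("RAID 0", "RAID 1", "RAID 0+1", "RAID 5", "JBOD"):
    print(level, "->", usable_capacity(level, drives), "GB")
# RAID 0 -> 80, RAID 1 -> 20, RAID 0+1 -> 40, RAID 5 -> 60, JBOD -> 80
```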


IDE or SCSI?


Simply put, IDE. Okay, well, let me clarify my position.


Until recently, an IDE RAID array would have been grounds for fun and laughter. IDE was slow, and the whole setup just wasn't worth the hassle. But we now have ATA/66 and, better yet, ATA/100 drives. And they are dirt cheap. Today, an IDE RAID array is a great alternative to a SCSI array.


First, let me tell you, SCSI arrays are EXPENSIVE. A good SCSI RAID controller will set you back several hundred dollars. Add on top of that the need for two or more SCSI hard drives (depending on what RAID level you will be employing). Are you seeing the dollar signs yet? On the plus side, SCSI does offer a wider array of options and is faster. Also, in big server environments, IDE would be a bad option because the IDE design limits the number of drives to four. SCSI RAID can support up to 60.


IDE RAID is more affordable and quite fast. Many times, a good RAID 0 array using two decent-speed IDE drives can outperform a single high-end SCSI drive, while costing much less. For this reason, many users are finding IDE RAID 0 (or possibly RAID 0+1) a good way to go. Some of today's really powerful systems are now employing RAID arrays.


Setting up an IDE RAID array is not that difficult. There are some things you will need and some things you need to do first.


  1. Make sure you have a valid, working system disk before doing anything. Create it and test it. Make sure it also has the necessary drivers to access your CD-ROM drive. If you have a CD backup, you will need to get that CD-ROM working before you can proceed.
  2. In step 1, I said "If you have a CD backup". In step 2, I say, "Make a CD backup". Or some kind of backup. Unless you are setting up a simple RAID 1 array, the data on your first drive will be hosed. You are starting your system from scratch. So, before doing anything, make backups of everything you deem important. A more thorough solution would be to create an image file of your whole drive and back it up on a CD.
  3. Lather, rinse, repeat step 2. (Just emphasizing.)
  4. You'll need hard drives and a controller. You will need at least 2 hard drives, possibly more depending on the type of RAID you are doing. Get a RAID controller that matches your drives' specs.

Start playing:


  1. Grab some standard IDE ribbon cables and connect your hard drives to your RAID controller card. For two-drive configurations, use one cable per drive, and attach each drive to its own channel on the card. For four-drive configurations, use one cable per two drives, just like you would with a standard master/slave setup.
  2. Following on from the previous step, treat each channel on the RAID controller as if it were an IDE channel on your motherboard. Thus, if you only have two drives, each drive should be set to master. If you have four, each channel should have one master and one slave.
  3. Install the controller into an open PCI slot.
  4. Connect the hard drives to the case, as you would with any other drive installation. Be sure to connect the power leads, too.
  5. After confirming everything is in place and ready to go, boot the PC. You will likely be brought to a configuration screen for your RAID controller. Here you will need to configure the array type, such as striped, mirrored, etc.
  6. Insert your system disk into Drive A: and reboot.
  7. Use FDISK to partition the array.
  8. Likewise, format the array. All this is just like any normal installation.
  9. Restore your programs and backup data.
  10. Pat yourself on the back. Twice.

Things to Watch


There are two major points to keep in mind when installing RAID in your system.


First, your motherboard must have a good bus-mastering DMA sequencer on board. Bus-mastering is a technique that allows hardware to communicate with other hardware on the same bus without going through the processor. This reduces the load on the CPU. The DMA sequencer is what assigns your four or eight bus-mastered DMA channels to your PCI slots. Not only does your RAID controller have to be installed into one of these bus-mastered slots, but the DMA sequencer must be robust enough to handle it. This is kind of a trial-and-error thing, although some controller manufacturers post on their web sites a list of tested motherboards that work well.


Second, your RAID card is picky in that it wants to be in the very first bus-mastered slot on the motherboard. You will need your motherboard's manual to determine which slot is first. Some boards count their bus numbers from the top down, others from the bottom up. Some start from the AGP slot and count down. So, since the RAID controller needs to be first, make sure it is. If you have a PCI video card, make sure the RAID controller is above it (or in whichever slot is numbered first). If you're using AGP, check whether the first PCI slot below it shares an IRQ with the AGP video card. If it does, you'll need to move the RAID controller down a slot. The manual is your best reference. In a crunch, you can always use trial and error: remove all cards from the system except the RAID and video cards, move the RAID card around until it works, then re-install all the other cards.


Conclusion


I hope you found this article useful. RAID is definitely a viable option for the speed freaks out there. Some of us even have a few hard drives lying around with decent specs. Popping a cheap RAID controller into a system and putting those drives to use could really improve your system's performance.