Drive Swapping by Charles M. Kozierok


Drive Swapping

The article published below is from www.pcguide.com

In the "good old days" of RAID, fault tolerance was provided through redundancy, but there was a problem when it came to availability: what do you do if a drive fails in a system that runs 24 hours a day, 7 days a week? Or even in a system that runs 12 hours a day but has a drive go bad first thing in the morning? The redundancy would let the array continue to function, but in a degraded state. The hard disks were installed deep inside the server case, so the case had to be opened to reach the failed drive and replace it. Furthermore, the other drives in the array, which had continued to run despite the failure, would have to be powered off, interrupting all users of the system anyway. Surely there had to be a better way, and of course, there is.

An important feature that allows availability to remain high when hardware fails and must be replaced is drive swapping. Now strictly speaking, the term "drive swapping" simply refers to changing one drive for another, and of course that can be done on any system (unless nobody can find a screwdriver! :^) ). What is usually meant by this term, though, is hot swapping, which means changing a hard disk in a system without having to turn off the power and open up the system case. In a system that supports hot swap, you can easily remove a failed drive, replace it with a new one and have the system rebuild the replaced drive immediately. The users of the system don't even know that the change has occurred.

Unfortunately, "hot swap" is another one of those terms that is used in a non-standard way by many, frequently leading to confusion. In fact, there is a hierarchy of different swap "temperatures" that properly describe the state of the system at the time a drive is swapped:

  • Hot Swap: A true hot swap is defined as one where the drive can be replaced while the rest of the system remains completely uninterrupted. This means the system carries on functioning, the bus keeps transferring data, and the hardware change is completely transparent.
  • Warm Swap: In a so-called "warm swap", the power remains on to the hardware and the operating system continues to function, but all activity must be stopped on the bus to which the device is connected. This is worse than a hot swap, obviously, but clearly better than a cold one.
  • Cold Swap: The system must be powered off before making the swap.
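The three "temperatures" above boil down to two questions: does the system stay powered, and does the bus keep moving data? Here is a minimal sketch of that decision in Python. The names `SwapType` and `classify_swap` are made up for illustration and do not come from any RAID tool.

```python
from enum import Enum


class SwapType(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"


def classify_swap(system_powered: bool, bus_active: bool) -> SwapType:
    """Classify a drive swap by the state of the system during the swap.

    system_powered: power stays on and the operating system keeps running.
    bus_active: the bus the drive is attached to keeps transferring data.
    """
    if not system_powered:
        return SwapType.COLD   # the system had to be powered off
    if not bus_active:
        return SwapType.WARM   # powered on, but bus activity had to stop
    return SwapType.HOT        # nothing was interrupted


print(classify_swap(True, True).value)    # hot
print(classify_swap(True, False).value)   # warm
print(classify_swap(False, False).value)  # cold
```

A vendor's "hot swap" claim can be checked the same way: if bus activity has to stop during the change, it lands in the warm category.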

It is common for a system to be described as capable of hot swapping when it really is only doing warm swaps. True hot swapping requires support from all of the components in the system: the RAID controller, the bus (usually SCSI), the enclosure (which must have open bays for the drives so they can be accessed from the front of the case), and the interface. It requires special connectors on the drives that are designed to ensure that the ground connections between the drive and the bus are maintained at any time that the device has power. This means that when removing a device, the power connection has to be broken before the ground connection, and when re-inserting a device, the ground connection has to be made before the power connection is re-established. This is typically done by designing the connectors so that the ground connector pins are a bit longer than the other pins. This design is in fact used by SCSI SCA, the most common interface used by hot-swappable RAID arrays. See this discussion of SCA for more, as well as this discussion of drive enclosures.
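The longer-ground-pin trick described above can be illustrated with a small Python sketch. The pin lengths below are invented for the example (only their relative order matters) and are not taken from the SCA specification; the function names are likewise hypothetical.

```python
# Hypothetical pin lengths in millimetres; only the relative order matters.
PIN_LENGTH_MM = {"ground": 5.0, "power": 4.0, "signal": 4.0}


def mating_order(pin_lengths):
    """Order in which pins make contact on insertion: longest pin first."""
    return sorted(pin_lengths, key=lambda name: -pin_lengths[name])


def unmating_order(pin_lengths):
    """Order in which pins lose contact on removal: shortest pin first."""
    return sorted(pin_lengths, key=lambda name: pin_lengths[name])


def ground_always_present(pin_lengths):
    """True if ground is strictly longer than every other pin, so it is
    connected whenever any other pin is connected."""
    ground = pin_lengths["ground"]
    return all(
        ground > length
        for name, length in pin_lengths.items()
        if name != "ground"
    )


print(mating_order(PIN_LENGTH_MM))           # ['ground', 'power', 'signal']
print(unmating_order(PIN_LENGTH_MM))         # ['power', 'signal', 'ground']
print(ground_always_present(PIN_LENGTH_MM))  # True
```

With the ground pin longest, it is the first to connect on insertion and the last to disconnect on removal, which is exactly the sequencing the paragraph calls for.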

As mentioned above, the SCA method on SCSI is most commonly used for hot-swappable arrays. In the IDE/ATA world, the best you can usually do is warm swapping using drive trays, which "convert" regular IDE/ATA drives to a form similar in concept to how SCA works, though not quite the same. This is still pretty good, but not really hot swapping. The system usually needs to be halted before you remove the drives.

A system that cannot do hot swapping, or even warm swapping, will benefit from the use of hot spares. If your system can only cold swap, you will at some point have to take it down to change failed hardware. But if you have hot spares, you can restore the array to full functionality immediately, and thus delay shutting the system down to a more convenient time, like 3:00 am (heh, I meant more convenient for the users, not you, the lucky administrator. :^) ). In fact, hot sparing is a useful feature even if you have hot swap capability.
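To make the hot spare idea concrete, here is a toy Python model of an array that promotes a standby drive the moment a member fails. It is only a sketch under simple assumptions: the `Array` class and drive names are hypothetical, and the data rebuild a real controller would perform is reduced to a comment.

```python
from dataclasses import dataclass, field


@dataclass
class Array:
    size: int                                        # drives needed for full redundancy
    members: list                                    # drives currently in the array
    hot_spares: list = field(default_factory=list)   # idle standby drives
    failed: list = field(default_factory=list)       # drives awaiting physical replacement

    @property
    def degraded(self):
        """The array is degraded while it has fewer working members than it needs."""
        return len(self.members) < self.size

    def fail_drive(self, name):
        """Mark a member as failed and promote a hot spare if one is available.

        Returns the name of the spare that took over, or None if the array
        stays degraded until someone swaps the drive by hand.
        """
        self.members.remove(name)
        self.failed.append(name)
        if self.hot_spares:
            spare = self.hot_spares.pop(0)
            self.members.append(spare)  # a real controller would rebuild data onto it here
            return spare
        return None


array = Array(size=3, members=["d0", "d1", "d2"], hot_spares=["s0"])
print(array.fail_drive("d1"))  # s0
print(array.degraded)          # False
print(array.failed)            # ['d1']
```

The failed drive stays on the `failed` list until the next scheduled shutdown, which is the point of the paragraph above: the array is whole again right away, and the cold swap can wait.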