How to Fix: Test RAID for Bad Sectors (Sync Errors)

Infopackets Reader Steve H. writes:

" Dear Dennis,

I have a RAID 1 set up with 4 hard drives in a mirrored array. Recently my RAID software reported that one of the drives was dropped from the array and has since been marked as 'rebuilding', however, during the rebuild process I keep receiving RAID controller errors that data could not be synced / written to the drive. I'm not sure what to do from here. Is there any way for me to independently check the drive for errors? I understand that if the drive is bad I should replace it, but I'm not even sure which drive is which. "

My response:

This is a good question. Yes, it's possible to test the drives on their own, but you must power down the system and disassemble the array, then test each drive separately (preferably in another system) without writing any data to it. Once you are finished, the array must be reassembled with every drive back in its original position.

How to Know Which Drive is Bad in a RAID?

For your first question: the easiest way to find out which drive is having errors is to open your RAID management software and look for the drive that is currently rebuilding. If more than one drive is rebuilding at the same time, you will have to refer to the RAID error log to determine which drive is generating the error messages. Once you know which drive is problematic, power down the system and unplug that drive from the array so you can test it for surface errors.
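If your RAID software lets you export its error log as text, a few lines of Python can tally the errors per drive. This is only a rough sketch: the log format below (an "error" keyword plus a "drive=0-1" token) is made up for illustration, and the function name is my own. Real RAID logs look different and the parsing would need to be adapted.

```python
from collections import Counter


def count_errors_by_drive(log_lines):
    """Tally error lines per drive label.

    Assumes a hypothetical log format in which each error line contains
    the word 'error' and a token such as 'drive=0-1'.
    """
    counts = Counter()
    for line in log_lines:
        if "error" not in line.lower():
            continue  # skip informational lines
        for token in line.split():
            if token.startswith("drive="):
                counts[token[len("drive="):]] += 1
    return counts


# Made-up sample lines, for illustration only.
sample_log = [
    "12:00:01 info rebuild started drive=0-1",
    "12:04:17 error write failed drive=0-1",
    "12:04:19 error sync failed drive=0-1",
    "12:09:52 error read retry drive=1-2",
]

errors = count_errors_by_drive(sample_log)
worst_drive, worst_count = errors.most_common(1)[0]
print(worst_drive, worst_count)
```

The drive with the highest count is the first one to pull and test.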

How to Locate the Bad Drive via the RAID Controller

Physically locating the bad drive attached to your RAID controller can be tricky.

When I had a problematic drive, my RAID software reported that "drive 0-1" was bad (which meant controller port 0, drive 1). My RAID controller has 2 ports; each port links to 4 drives, for a total of 8 drives. To locate the drive, I shut down my computer, looked at my controller card, and saw that the controller ports were labeled 0-3 and 4-7 (in this case, I knew that 0-3 meant "port 0" and 4-7 meant "port 1"). I then followed the breakout cable from port 0 and located drive 1. I knew I had picked the correct drive for two reasons: first, I had written the drive number on each hard drive in permanent black marker before I built my array; second, the breakout cable had 'p1' written on it, so I knew that "p1" corresponded to the proper drive.
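The label arithmetic above can be written out as a small helper. This is a sketch based on my controller only: it assumes the label is "port-drive", that drives are numbered from 0 on each port, and that each port carries 4 drives. The function name is hypothetical and other controllers may number things differently.

```python
def locate_drive(label, drives_per_port=4):
    """Split a label like '0-1' into (port, drive) and work out the
    drive-number range printed next to that port on the controller card."""
    port_text, drive_text = label.split("-")
    port, drive = int(port_text), int(drive_text)
    if not 0 <= drive < drives_per_port:
        raise ValueError(
            f"drive {drive} is out of range for a {drives_per_port}-drive port"
        )
    first = port * drives_per_port          # first drive number on this port
    last = first + drives_per_port - 1      # last drive number on this port
    return port, drive, f"{first}-{last}"


print(locate_drive("0-1"))  # (0, 1, '0-3')
```

For "drive 0-1" this points at the connector labeled 0-3, and then at the breakout cable lead for drive 1.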

Testing Hard Drives Independently from the RAID

There is no way to test hard drives independently once they are part of an array. Therefore, the only way to achieve this is to unplug the suspect drive from the array, insert it into another computer, and then perform a disk surface test WITHOUT attempting to repair any errors, as a repair writes to the drive and will corrupt the array. For this task, I use Macrorit Disk Scanner, which shows a map of the hard drive (in green) and any corresponding bad sectors in red.

In theory you can unplug all drives from the RAID and test them in this manner - but only if:

  • you do not write any data to the drive once it has been unplugged from the array
     
  • you do not power on the RAID until all drives are placed back into their respective order
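To show what a read-only surface test amounts to, here is a minimal Python sketch. It is not how Macrorit Disk Scanner works internally; it simply opens the drive read-only, reads it from start to end in blocks, and notes the offset of any block that cannot be read. The function name and the device path in the comment are placeholders, reading a raw drive needs administrator rights, and operating system caching can hide some errors, so treat it as an illustration rather than a replacement for a proper scanning tool.

```python
import os


def scan_read_only(path, block_size=1024 * 1024):
    """Read a drive (or file) from start to end without writing to it.

    Returns the byte offsets of blocks that raised an I/O error.
    """
    bad_offsets = []
    # O_RDONLY: nothing can be written through this handle.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # total size in bytes
        offset = 0
        while offset < size:
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                chunk = os.read(fd, min(block_size, size - offset))
            except OSError:
                # Note the unreadable block and move on; do NOT try to repair it.
                bad_offsets.append(offset)
                offset += block_size
                continue
            if not chunk:
                break  # unexpected end of data
            offset += len(chunk)
    finally:
        os.close(fd)
    return bad_offsets


# Example call (placeholder device path, adjust for your system):
# bad = scan_read_only("/dev/sdX")
```

An empty list means every block was readable; any offsets returned are a good reason to replace the drive.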

Always Replace the Bad Drive in the Array with an Identical Unit

It is not possible to 'fix' a hard drive with bad sectors and then place it back into a RAID array. That's because a repair tool does not fix the sectors; it only records the bad ones in the file system on that individual drive (the list is not kept in the master boot record, or MBR). If you were to place this drive back into a RAID, that record would be ignored, because the file system belongs to the entire array and not to an independent drive, as far as I understand.

It's also worth noting that hard drives usually have reserved capacity for bad sectors, and these sectors are used up transparently if and when needed. If you test a drive with software and it lists bad sectors, it's most likely because your reserved capacity has already been used and the drive is beginning to fail.
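The toy model below illustrates why bad sectors showing up in a scan is a late warning sign. It is a deliberately simplified sketch, not how drive firmware actually works: the class name, the spare count, and the sector numbers are all invented for the example.

```python
class ToyDrive:
    """Toy model of a drive with a fixed pool of spare sectors."""

    def __init__(self, spare_sectors):
        self.spares_left = spare_sectors
        self.remapped = set()     # failures hidden by the drive itself
        self.visible_bad = set()  # failures a surface scan would report

    def sector_fails(self, sector):
        if self.spares_left > 0:
            # A spare takes over silently; software never sees the failure.
            self.spares_left -= 1
            self.remapped.add(sector)
        else:
            # No spares left, so the bad sector becomes visible.
            self.visible_bad.add(sector)


drive = ToyDrive(spare_sectors=2)
for sector in (10, 57, 903):
    drive.sector_fails(sector)

print(sorted(drive.remapped), sorted(drive.visible_bad))  # [10, 57] [903]
```

In this model the first two failures are absorbed by the reserve; only the third one, after the reserve is gone, is ever reported.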

That said, the proper way to deal with a bad drive in a RAID is to replace the drive altogether with an identical unit, assuming your entire array is made up of all the same hard drives.

What You Can Do with the Bad Hard Drive

If the hard drive is still under warranty, you can send it in for replacement; this should be your first choice, if possible.

If it is out of warranty, then you can use it independently outside of an array. Before you do, run a full (not quick) format on the drive, which tests each sector and marks the bad ones in the file system so they are avoided (this list is kept in the file system, not the MBR). At that point you can continue to use the drive (at your own risk), or discontinue using it altogether.

Hope that helps.

Got a Computer Question or Problem? Ask Dennis!

I need more computer questions. If you have a computer question -- or even a computer problem that needs fixing -- please email me with your question so that I can write more articles like this one. I can't promise I'll respond to all the messages I receive (depending on the volume), but I'll do my best.

About the author: Dennis Faas is the owner and operator of Infopackets.com. With over 30 years of computing experience, Dennis' areas of expertise span a broad range, including PC hardware, Microsoft Windows, Linux, network administration, and virtualization. Dennis holds a Bachelor's degree in Computer Science (1999) and has authored 6 books on the topics of MS Windows and PC Security. For technical support inquiries, Dennis can be reached via live chat on this site using the Zopim Chat service; alternatively, you can contact Dennis through the website contact form.


Comments

stekcapofni's picture

Please define "identical unit" when replacing one of the drives in an array.

Does identical mean that it must be the same exact model number?
(What if the model number has been discontinued?)

Does it mean identical in capacity, speed, interface, etc.?

Dennis Faas's picture

Identical unit means exactly that: the exact same hard drive model number, manufactured by the same manufacturer. For example, if you're using WD20EARS (Western Digital 2.0 TB Green), then you would replace it with the same drive - a WD20EARS by Western Digital. If your drive is discontinued, then you can purchase a used drive through eBay; that's what I did. If that is not an option, then you can back up your entire array and purchase all new drives to replace all the old ones. Usually I do this once drive capacities have doubled or quadrupled and are reasonably priced. Example: to go from 8 x 2 TB drives (8 TB usable when mirrored) to 4 x 8 TB drives (16 TB usable when mirrored).
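The capacity figures in that example work out as follows. This is a quick sketch that assumes every block of data is stored on exactly two drives (a two-way mirror); the function name and the copies parameter are my own, and an array that keeps more copies would give smaller numbers.

```python
def mirrored_capacity_tb(drive_count, drive_size_tb, copies=2):
    """Usable capacity of a mirrored array in which each block of data
    is stored on `copies` drives (assumed to be 2 here)."""
    if drive_count % copies != 0:
        raise ValueError("drive_count must be a multiple of copies")
    return drive_count * drive_size_tb / copies


print(mirrored_capacity_tb(8, 2))  # 8.0  (8 x 2 TB drives)
print(mirrored_capacity_tb(4, 8))  # 16.0 (4 x 8 TB drives)
```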