How to Fix: Attach, Rebuild RAID 5 after Failed Disk (Dell PERC S100, S300)

Infopackets Reader Harlen W. writes:

" Dear Dennis,

I own a Dell PowerEdge T110 server, which runs Windows Server 2008 R2 Foundation and is equipped with a Dell Perc 300 RAID card. We have 3 x 250 GB hard drives in RAID 5 format attached to the Perc 300 RAID card. According to the Dell OpenManage Server Administrator (which we use to manage the RAID via the browser), the 'Alert' log file is reporting 'Device failed: Physical Disk 0:1 Controller 0, Connector 0'. I would like to hire you to help me fix this problem, including testing the failed hard drive. If it passes, I want to put it back into the RAID. Can you please help? "

My response:

I asked Harlen if he would like me to connect to his Windows Server using my remote desktop service, and he agreed.

After reviewing Harlen's Alert log, I discovered that the "Device failed: Physical Disk 0:1 Controller 0, Connector 0" error message had appeared repeatedly for months, dating back as far as January of this year. That makes almost 10 consecutive months in which Physical Disk 0:1 on Controller 0 was marked as failed. Since Harlen is using only 3 disks in a RAID 5, his RAID was still functional but had no redundancy left. A RAID 5 array of N disks stores N-1 disks' worth of data plus one disk's worth of parity, so it can survive the loss of exactly one disk. With 3 disks, the array keeps running on the remaining 2 in a degraded state; if another disk fails, the data is lost. In other words, this is a very serious problem!
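
To make the arithmetic concrete, here is a minimal Python sketch of the N-1 rule applied to Harlen's 3 x 250 GB array. The function name and the returned fields are made up for illustration and are not part of any Dell tool; the sketch also assumes all disks are the same size.

```python
def raid5_summary(disk_count, disk_size_gb, failed_disks=0):
    """Summarise capacity and health of a RAID 5 array of equal-sized disks."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    if not 0 <= failed_disks <= disk_count:
        raise ValueError("failed_disks must be between 0 and disk_count")

    # One disk's worth of space is used for parity, so N-1 disks hold data.
    usable_gb = (disk_count - 1) * disk_size_gb

    if failed_disks == 0:
        state, tolerated = "healthy", 1    # can still lose one disk
    elif failed_disks == 1:
        state, tolerated = "degraded", 0   # running, but no redundancy left
    else:
        state, tolerated = "failed", 0     # two or more disks lost: data loss

    return {
        "usable_gb": usable_gb,
        "state": state,
        "further_failures_tolerated": tolerated,
    }


# Harlen's array: 3 x 250 GB with Physical Disk 0:1 marked as failed.
print(raid5_summary(3, 250, failed_disks=1))
```

For Harlen's server this reports 500 GB usable, a "degraded" state, and zero further failures tolerated, which is why ten months in this state was so risky.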

What Causes a Disk to "Fail" and be Removed from the RAID Array?

Based on my experience, if a disk is marked as "failed" (and subsequently removed from the RAID), it is usually for one or more of the following reasons:

  1. the hard drive has bad sectors
  2. the hard drive has a logic error on its circuit board (PCB)
  3. the hard drive has an internal mechanical failure
  4. the RAID controller had a "hiccup" (likely due to a hard drive timeout) and removed the drive from the array
  5. the RAID card or connectors are damaged, not plugged in properly, and/or need to be replaced

After discussing the options with Harlen, I suggested he back up his RAID to an external hard drive, then shut down the system, disconnect the bad disk from the RAID card, and re-attach it directly to the motherboard so that I could run a surface scan of the drive for bad sectors. Once that was done, he booted the system back up and I tested the "bad disk". After some time, the test results came back with zero bad sectors.
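
For readers wondering what a surface scan actually does, the Python sketch below shows the basic idea: read the disk from start to finish in fixed-size blocks and note any block the operating system refuses to read. This is not the tool I used on Harlen's server, and the function name is made up for illustration. It assumes a regular file, a disk image, or a Linux block device that you have permission to read; a real scanning utility also handles retries, timing, and SMART data, which this sketch does not.

```python
import os


def scan_for_unreadable_blocks(path, block_size=1024 * 1024):
    """Read `path` sequentially; return byte offsets of blocks that fail to read."""
    bad_offsets = []
    # O_BINARY only exists on Windows; fall back to 0 elsewhere.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        size = os.lseek(fd, 0, os.SEEK_END)   # total size in bytes
        offset = 0
        while offset < size:
            os.lseek(fd, offset, os.SEEK_SET)
            try:
                os.read(fd, min(block_size, size - offset))
            except OSError:
                # The OS reported an I/O error for this block.
                bad_offsets.append(offset)
            offset += block_size
    finally:
        os.close(fd)
    return bad_offsets
```

An empty list back from this function corresponds to the "zero bad sectors" result described above. The scan only reads, so it does not modify the disk.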

Determining exactly what is wrong with a failed drive is a process of elimination. Since there were no bad sectors, the next thing I did was format the drive (to remove the old RAID data, which would then mark the drive as foreign) and then add it back into the array manually.

The theory here is that if the drive is put back into the array (after testing clean for bad sectors) and it still fails, then we are most likely dealing with a mechanical failure or logic error. In that case, the disk needs to be replaced as soon as possible to avoid any data loss.
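
The process of elimination can be written down as a small decision function. This is only a sketch of the reasoning in this article: the function name and the wording of the results are made up for illustration, and the last branch (a controller hiccup or connection issue) is my inference from the list of causes above rather than something a tool reports.

```python
def triage_failed_disk(bad_sectors, fails_again_after_readd):
    """Suggest a next step based on the surface scan and the re-add attempt."""
    if bad_sectors > 0:
        # Cause 1: the surface scan found bad sectors.
        return "replace: bad sectors found"
    if fails_again_after_readd:
        # Causes 2 and 3: clean scan, but the drive drops out of the array again.
        return "replace: likely mechanical failure or logic error"
    # Causes 4 and 5: clean scan and the drive stays in the array.
    return "monitor: likely a controller hiccup or connection issue"


# Harlen's disk so far: zero bad sectors, re-add not yet known to fail.
print(triage_failed_disk(bad_sectors=0, fails_again_after_readd=False))
```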

Adding the hard drive back into the RAID is always "easier said than done" because each RAID interface is different from the next. Below I'll explain exactly what I did to add a disk previously marked as failed back into the RAID, because finding the answer on Google proved to be next to impossible.

How to Fix: Attach, Rebuild RAID 5 after Failed Disk (Dell PERC S100, S300)

If you are replacing a hard drive in a failed RAID or putting back the same disk (after it's been tested) and wish to rebuild your RAID on a Dell PERC S100 or S300, here is what you need to do:

  1. If you are re-attaching a previously-used failed RAID hard drive, format the hard disk first (while it is not attached to the RAID card) - so that you can manually add the drive back into the RAID. If you are using a brand new disk, proceed to the next step.
     
  2. Next, shut down the system, then power off. Open the chassis and attach the previously used RAID drive / new hard drive to the RAID card.
     
  3. Boot the system and ignore any error messages about the RAID.
     
  4. Log in to the system, then go to the "Dell Server Administrator" (for RAID) using the web interface. The web address should be: https://server:1311. If you receive an error message that the "connection is not secure", you can ignore it and add the page to your security exceptions list.
     
  5. Now it's time to put the disk back into the array. On the Dell PERC S100 and S300 cards, this is done by assigning the disk as a "global hot spare", which causes the array to start rebuilding. When that is finished, you will then need to "unassign" the disk as a "global hot spare" to have it become part of the RAID. This seems counterintuitive, but that's what I had to do to make the disk become part of the RAID; otherwise the disk remains a "global hot spare".

    To do so:

    (a) On the "Dell Server Administrator" page, expand the "System" link (on the left), then click "Storage" -> "PERC S300 (PCI Slot X)" -> "Connector X (RAID)" -> "Physical disks". From there, note the location of the disk that you are adding back into the array (example: Physical Disk 0:1) and look under the "Tasks" heading - you will see a pull down option that says "Tasks Available". Click that, then select "Set as Global Hot Spare".

    (b) At this point, the RAID will begin rebuilding. This can take anywhere from a few hours to a few days to complete.

    (c) When the RAID finishes rebuilding, go back to the Physical Disks section (as mentioned in step "a" above), then change the "Tasks Available" pull down to "Unassign Global Hot Spare". When the operation is "Executed", the options will change from "Tasks Available" to "No Tasks Available" (as with all the other RAID member disks), and your newly added disk will now become part of the array.
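
If you are curious what the controller is doing during the rebuild in step (b), the general RAID 5 principle is XOR parity: the missing disk's data is recomputed from the surviving disks. The Python sketch below shows this for a single stripe on a 3-disk array. The block contents and the helper name are made up for illustration, and how the PERC S100 / S300 actually lays out its stripes is not covered here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a 3-disk RAID 5: two data blocks plus one parity block.
disk0 = b"DATA-A__"
disk1 = b"DATA-B__"
parity = xor_blocks([disk0, disk1])

# Disk 1 drops out of the array. XOR-ing the survivors recomputes its block.
rebuilt = xor_blocks([disk0, parity])
assert rebuilt == disk1
```

The controller has to repeat this for every stripe on the disk while the server is still in use, which is one reason a rebuild can take hours or days.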

I hope that helps anyone else that has this issue.

Additional 1-on-1 Support: From Dennis

If all of this is over your head, or if you need help troubleshooting a failed RAID disk (and re-attaching it), I can help using my remote desktop support service. Simply contact me, briefly describing the issue and I will get back to you as soon as possible.

Got a Computer Question or Problem? Ask Dennis!

I need more computer questions. If you have a computer question -- or even a computer problem that needs fixing -- please email me with your question so that I can write more articles like this one. I can't promise I'll respond to all the messages I receive (depending on the volume), but I'll do my best.

About the author: Dennis Faas is the owner and operator of Infopackets.com. With over 30 years of computing experience, Dennis' areas of expertise span a broad range, including PC hardware, Microsoft Windows, Linux, network administration, and virtualization. Dennis holds a Bachelor's degree in Computer Science (1999) and has authored 6 books on the topics of MS Windows and PC Security. If you like the advice you received on this page, please up-vote / Like this page and share it with friends. For technical support inquiries, Dennis can be reached via live chat on this site using the Zopim Chat service (currently located at the bottom left of the screen); optionally, you can contact Dennis through the website contact form.

Comments

sytruck_8413:

Did you get him to add a couple of disks so he has a proper RAID 5 setup?