May 23, 2009

Hi! My unRAID server is in bad shape, and I don't know what's going on or what I should do next :-(

Here is what happened. This week a write error apparently occurred: a red ball appeared next to one of the two Seagate 750 GB drives in my unRAID server (screenshot omitted). I checked its SMART status and no problem related to sector writes was reported (as far as my non-expert eye can tell). However, I decided to remove it and upgrade this slot with a WD 1 TB, as I had read here and there that the Seagate might not be very reliable. After properly shutting down, swapping the disks, and starting the recovery process, all the disks went to work, apparently rebuilding the new disk.

My real problems started here. After more than 12 hours of disk activity (PCI-based SATA, all red lights pulsing), the disk LEDs all went green. Unfortunately the web management console was down (and I would like to know why this is systematically the case!), so I could no longer check the status of my unRAID server. I was able to ping it and log in via telnet, but I did not really know what was going on. I tried to shut down but did not have the powerdown package installed, so I switched it off by pressing the case power button. Here is a copy of the syslog: http://euuff.com/unraid/syslog-2009-05-22.txt

Next, the tower immediately rebooted by itself and started to work on all drives again. But once again, after more than 12 hours, all drive lights are green, the web management console is down, and I still don't know what's going on. Here is a copy of this syslog: http://euuff.com/unraid/syslog-2009-05-23.txt

I cannot get out of this loop. Reading the saved syslogs, I have the feeling the second Seagate 750 might be causing problems.

Sorry for being such a basic user. I am no computer scientist, and I had been enjoying my first unRAID server, built last September, without problems until now. Thanks in advance if someone can help.

MA
May 23, 2009

Your case is a perfect example of why we are more and more strongly recommending a complete parity check before any major change to the array, such as your drive rebuild. It would almost certainly have found the issue with Disk 10 that stopped the rebuild.

Fortunately, the problem with Disk 10 appears to be just a cabling issue, not a drive problem. (That may also be true of the drive you removed, since you saw no serious drive errors in the SMART report.) A cable problem could be a defective or loose SATA cable, power cable, or splitter, or a defective or poorly seated backplane.

Disk 10 was disabled early, and that caused all of the remaining errors. It killed the rebuild, because you can't rebuild a drive unless every other drive in the array reads perfectly. The new Disk 11 is showing serious Reiser file system damage, which is completely understandable since it was not fully rebuilt. The ReiserFS module then crashed, causing a message about emhttp being 'tainted', which probably explains the loss of the web page. You do NOT want to run reiserfsck on Disk 11, because it still needs to be fully rebuilt, from the beginning.

I think your safest option right now is to re-install the previous Disk 11, and then run the Trust My Array procedure to fully restore the array. It will start a parity check, which you should allow to finish. Expect to see parity errors, but it will be correcting them along the way. Once it finishes, you might want to run a second parity check, just to prove everything is fine. Then, if you wish, you can again replace Disk 11 and start a rebuild. I do recommend you replace the cables to both Disk 10 and Disk 11 with the best quality cables you can find.

Very minor point: Disk 10 (sdj - ata-ST3750640AS_3QD0KNPC) appears to still have its SATA150 jumper installed. See the Remove SATA150 Jumper section of Improving unRAID Performance.
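To illustrate why a rebuild cannot survive a second drive dropping out, here is a minimal sketch of single-parity reconstruction. It assumes the parity disk holds the bytewise XOR of all data disks, and it is a toy model rather than unRAID's actual code: the function names and the tiny byte strings standing in for disks are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte strings."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of ints per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_disks):
    """Parity is the XOR of every data disk (assumed single-parity scheme)."""
    return xor_blocks(list(data_disks))


def rebuild_disk(surviving_disks, parity):
    """Reconstruct ONE missing disk from parity plus ALL other data disks."""
    return xor_blocks(list(surviving_disks) + [parity])


if __name__ == "__main__":
    # Three toy "disks" of two bytes each (hypothetical data).
    disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = compute_parity(disks)

    # Disk at index 1 is replaced; every other disk is readable.
    rebuilt = rebuild_disk([disks[0], disks[2]], parity)
    print(rebuilt == disks[1])

    # Same rebuild, but the disk at index 2 also drops out mid-way.
    broken = rebuild_disk([disks[0]], parity)
    print(broken == disks[1])
```

In this model, the first rebuild recovers the replaced disk exactly, while the second, with another disk missing, produces data that no longer matches the original. That is the same reason the partially rebuilt Disk 11 ended up with a damaged file system.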
July 26, 2009 Author

Dear RobJ,

I have been very busy lately, so it took me some time to get back to my unRAID server. I followed your instructions and my server is now running fine. Thank you very much for your support! I think I'll need to upgrade to a faster PCI-e based SATA controller in the future in order to run quicker parity checks.

Thank you again,
MA