February 14, 2010

Hi guys,

I experienced multiple drive failures in very close succession. Essentially, disk 4 went down; I replaced the drive and started to rebuild its contents from parity. During the rebuild, disk 3 started to play up and a huge number of sync errors showed up (7 million+). There were so many errors that once the rebuild had finished, the array started with disk 4 rebuilt but with disk 3 now excluded from the array.

I attempted to run smartctl with no success, but the drive was still accessible as a share. Unfortunately I didn't take the opportunity to copy the data across at that stage, and I decided to stop the array so I could remove the drive and test it using SeaTools on a Windows box. The result is that it fails the SMART test (it won't perform it at all) and fails the generic short test as well.

I have now restarted the array and the drive is showing up, but the array thinks it's a "replaced drive" and wants to rebuild the drive from parity.

My questions are:

1) Given the extent of the errors during the first disk rebuild, does the parity disk have its information "rewritten" as part of the rebuild process? My feeling would be no, as it should only have been rebuilding the first failed disk (disk 4), which should have left disk 3's information on the parity disk intact. However, following the rebuild of disk 4, unmenu reported: "Parity is Valid. Last parity check < 1 day ago. Parity updated 76371522 times to address sync errors", hence the reason for my concern.

2) As the flaky drive has reappeared in the array as a new disk, I don't really want to rebuild onto this disk (given how flaky it is). Can I somehow try to copy the information off it to a separate location, then replace the disk and rebuild the new drive from parity? If anything is corrupted, at least then I might have a copy of the data.

3) I tried Midnight Commander to see if I could copy the information, but I can't see the drive. I am probably doing something wrong.

Current syslog attached. Unfortunately I didn't copy the syslog after the parity rebuild, which showed all the errors.

UPDATE: As the drive appeared when I restarted the array, I removed it from the array and was able to get a SMART report. See attached. The seek errors, the unknown attribute and the raw read error rate are what concern me most. Any comments or advice appreciated.

Attachments: syslog-2010-02-14.txt, smart.txt
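For question 2, one possible approach is to copy whatever is still readable, file by file, and keep a list of the files that fail. The following is only a minimal sketch in Python, not an unRAID feature: the function name `salvage_tree` and the example paths in the comment are assumptions for illustration, and it would have to run on a machine that can see the failing disk (or its share) and has Python installed.

```python
import os
import shutil


def salvage_tree(src_root, dst_root):
    """Copy every readable file under src_root into dst_root.

    A file that raises an I/O error is skipped and recorded, so one
    unreadable file does not abort the whole copy.
    Returns (copied, failed) as two lists of source paths.
    """
    copied, failed = [], []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Recreate the same directory layout under the destination.
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                shutil.copy2(src, dst)  # copies data and timestamps
                copied.append(src)
            except OSError as exc:
                failed.append(src)
                print(f"FAILED {src}: {exc}")
    return copied, failed


# Hypothetical usage (both paths are assumptions, adjust to your system):
# copied, failed = salvage_tree("/mnt/disk3", "/mnt/disk1/disk3_salvage")
```

Files listed in `failed` are the ones to compare against whatever the parity rebuild later produces.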
February 14, 2010

Quote:
UPDATE: As the drive appeared when I restarted the array, I removed it from the array and was able to get a SMART report. See attached. The seek errors, the unknown attribute and the raw read error rate are what concern me most. Any comments or advice appreciated.

The three attributes you mentioned are not my concern. However, you have over 1800 sectors pending re-allocation, and over 250 that have already been re-allocated. Most drives only have a couple of thousand spare sectors. You've basically reached that limit even though the drive has not "failed" the SMART report.

Your drive is failing to read many sectors. It is a prime candidate for replacement. The only reason the sectors are still pending is that they have not been written to since their read failures, so the disk has not yet had a chance to swap in the replacement sectors.

Joe L.
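The two counters discussed above can be pulled out of a saved `smartctl -A` attribute table with a few lines of Python. This is a rough sketch only: the helper names are made up for illustration, and the sample table is invented to resemble the usual layout, so its numbers are not the values from the attached smart.txt.

```python
# Invented sample in the usual `smartctl -A` column layout (NOT the real report).
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   090   006    Pre-fail  Always       -       123456
  5 Reallocated_Sector_Ct   0x0033   094   094   036    Pre-fail  Always       -       256
  7 Seek_Error_Rate         0x000f   080   060   030    Pre-fail  Always       -       98765432
197 Current_Pending_Sector  0x0012   060   060   000    Old_age   Always       -       1813
"""


def parse_smart_attributes(report_text):
    """Return {attribute_name: raw_value} for rows with a plain numeric raw value."""
    attrs = {}
    for line in report_text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            attrs[parts[1]] = int(parts[9])
    return attrs


def sector_warnings(attrs):
    """List the sector counters that are non-zero."""
    warnings = []
    for name in ("Reallocated_Sector_Ct", "Current_Pending_Sector"):
        count = attrs.get(name, 0)
        if count > 0:
            warnings.append(f"{name} = {count}")
    return warnings


attrs = parse_smart_attributes(SAMPLE_REPORT)
for warning in sector_warnings(attrs):
    print("WARNING:", warning)
```

Any non-zero raw value for either counter is worth watching; counts in the hundreds or thousands, as described above, point to a drive that should be replaced.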
February 14, 2010 (Author)

Thanks Joe, I figured as much that the drive was dead or dying.

Seeing as it threw up so many errors during the rebuild of the other disk, would it be "safe" to assume the parity drive is OK, and that if I replace the dying disk I can use parity to rebuild the data?

I was just concerned because the system reported "Parity updated 76371522 times to address sync errors". Does this mean the parity has been corrupted?

Before I start pushing buttons I would rather make sure I get it right.

Thanks for the prompt response as ever, Joe.

thx,
Paul
February 14, 2010

Quote:
I was just concerned because the system reported "Parity updated 76371522 times to address sync errors". Does this mean the parity has been corrupted?

Odds are good that the bad contents of the old data drive were read and used to update the parity drive. The parity might be useless.

The only way to know for sure is to un-assign the defective drive and re-start the array. The files you then see for the un-assigned drive are simulated from parity and the other data drives. Make copies of them to other disks and test/view them for correctness.

The parity sync errors might have been in a file, or might be on an unused part of the disk.

Just don't press the button labeled "restore", and don't calculate parity with that old drive in place. You should replace the bad drive and use the "Start" button to rebuild onto it.
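To illustrate why bad reads from disk 3 matter, here is a toy model in Python. It assumes simple single-disk XOR parity (each parity byte is the XOR of the matching bytes on every data disk); the four-byte "disks" and the helper `xor_blocks` are invented for the example and are not unRAID code.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy four-byte "disks" (made-up contents).
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x01\x02\x03\x04"
disk3 = b"\xaa\xbb\xcc\xdd"  # the disk that later turns flaky
disk4 = b"\x0f\x0e\x0d\x0c"  # the disk that failed first

parity = xor_blocks([disk1, disk2, disk3, disk4])

# Healthy case: disk 4 is rebuilt exactly from parity plus the other disks.
assert xor_blocks([parity, disk1, disk2, disk3]) == disk4

# Failing case: disk 3 returns a wrong byte while disk 4 is being rebuilt.
disk3_bad = b"\xaa\x00\xcc\xdd"
rebuilt4 = xor_blocks([parity, disk1, disk2, disk3_bad])
assert rebuilt4 != disk4  # disk 4 now holds a wrong byte too

# Simulating disk 3 afterwards reproduces the bad read, not the original data.
simulated3 = xor_blocks([parity, disk1, disk2, rebuilt4])
assert simulated3 == disk3_bad

# The same happens if parity itself is "corrected" from the bad read.
parity_updated = xor_blocks([disk1, disk2, disk3_bad, disk4])
assert xor_blocks([parity_updated, disk1, disk2, disk4]) == disk3_bad
```

Either way, only the positions where disk 3 returned bad data are affected, which is why the advice above is to copy the simulated files elsewhere and check them individually.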
February 15, 2010 (Author)

Joe,

I went out and got 2 replacement drives today, and am currently preclearing one to replace the failed disk 3. However, a funny thing is happening and I don't know why.

The array is showing disk 3 as "not installed". The BIOS is picking up the drive and the syslog shows it allocated as "SDD", but even when I stop the array and go to the devices page it doesn't give me the option to select the drive.

Any ideas?
February 15, 2010

Quote:
Joe, I went out and got 2 replacement drives today, and am currently preclearing one to replace the failed disk 3. However, a funny thing is happening and I don't know why. The array is showing disk 3 as "not installed". The BIOS is picking up the drive and the syslog shows it allocated as "SDD", but even when I stop the array and go to the devices page it doesn't give me the option to select the drive. Any ideas?

It is apparently presenting itself with different attributes than previously, or not at all. Only the syslog would tell the story, but since you obviously powered down to install the new drives, who knows.

If installing the new drives resulted in your old drive being assigned to a different device name by Linux, that might be the reason.
February 17, 2010 (Author)

OK, I finally got the drive to be recognised, and rebuilt from parity. Files have definitely been lost, which is OK as I had backups or they were movies I just need to re-rip.

However, after the rebuild the drive shows 835GB of used space, yet when you look at disk 3 it only shows approximately 40GB in actual files.

How do I regain the space on disk 3? Ideas appreciated.
February 17, 2010

There may be file-system damage. See here in the wiki on how to check and repair: http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems
February 18, 2010 (Author)

Thanks again Joe, I was already following that path, and have been able to recover a lot of the data by using reiserfsck.

However, the declared total size of the drive in unRAID is still showing 936GB (i.e. 1TB), but the drive that replaced it is 1.36TB (1.5TB). Any ideas on how to reclaim the missing space on it? The syslog shows that the drive has been recognised at its correct size, but unRAID doesn't reflect it.

Advice appreciated.

Paul
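For reference, the paired figures in the post above (1.36TB for a 1.5TB drive, roughly 936GB for a 1TB drive) look like the usual gap between decimal drive labels and binary-unit reporting. A quick arithmetic sketch, assuming nominal sizes of exactly 10^12 and 1.5 × 10^12 bytes (real drives differ slightly):

```python
def to_binary_units(size_bytes):
    """Return (GiB, TiB) for a size given in bytes."""
    return size_bytes / 2**30, size_bytes / 2**40


for label, size_bytes in (("1 TB", 10**12), ("1.5 TB", 15 * 10**11)):
    gib, tib = to_binary_units(size_bytes)
    print(f"{label}: {gib:.1f} GiB = {tib:.2f} TiB")
```

A nominal 1.5 TB works out to about 1.36 TiB, matching the figure quoted, and a nominal 1 TB to about 931 GiB, which is in the same neighbourhood as the 936GB reported, though not identical. So the array still appears to be treating disk 3 as if it were the old 1TB drive.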
February 18, 2010

Quote:
Thanks again Joe, I was already following that path, and have been able to recover a lot of the data by using reiserfsck. However, the declared total size of the drive in unRAID is still showing 936GB (i.e. 1TB), but the drive that replaced it is 1.36TB (1.5TB). Any ideas on how to reclaim the missing space on it? The syslog shows that the drive has been recognised at its correct size, but unRAID doesn't reflect it. Advice appreciated. Paul

Attach a copy of your syslog to the next post you make.