January 1, 2015
I've been using unraid for just over a year. It has been a great experience with the exception of my stupid choice to use Seagate drives... I've lost about 4 or so in just under a year... Time to switch to WD Red I think.

I woke up this morning to an email showing that my number 6 drive had failed. Stopped the array, rebooted the system hoping the drive would come back. No such luck. Found the offending drive in my rig and replaced it with another drive. Logged into unraid and verified that all the other drives were fine, had the blue dot on my drive six replacement, started the array and started the rebuild process.

I just checked on the rebuild completion status and noticed that disk 8 now has * for temperature and 171141458 errors as well. I am thoroughly afraid of losing data. Advice on how to proceed is greatly appreciated from this great community.
January 1, 2015
I would post a syslog. I have a feeling that there's going to be a lot of ATA errors in it relating to disk 8. (Probably cable related, since you just swapped out a drive.) I'm a HUGE fan of hotswap bays because of this.

Unfortunately you *may* have some corruption on disk 8 because of those errors. When unRaid detects a read error, it reads all of the other drives to calculate what the data should be and then writes it back to the drive that returned the error. Unfortunately, since one drive (6) was in the middle of a reconstruction, the data read from it may or may not have been valid, so the data written to 8 may or may not be correct. (This may be a bug in unRaid. In my opinion it shouldn't automatically correct read errors on a drive while the system is undergoing a rebuild.)

I had a similar problem about a year ago, and am still finding the odd movie stored on the offending drive that doesn't play correctly. (As an aside, I now also have MD5 checksums for everything stored on the drives, so that if this ever happens again I can easily discover the problem files.)
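To make the concern above concrete, here is a minimal sketch of single-parity reconstruction, assuming plain XOR parity across equal-sized blocks. The block contents and the `xor_blocks` helper are made up for illustration and are not taken from unRAID's code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, two bytes each, plus their parity block.
data = [b"\x11\x22", b"\x0f\xf0", b"\xaa\x55"]
parity = xor_blocks(data)

# Disk 1 is lost: XOR of the surviving disks and parity gives it back.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# If a surviving disk returns wrong data during the rebuild,
# the reconstructed block is silently wrong as well.
bad_read = b"\x00\x00"  # stand-in for a failed read of disk 0
corrupted = xor_blocks([bad_read, data[2], parity])
assert corrupted != data[1]
```

The second half is the scenario described in the post: a reconstruction is only as good as every other block that goes into it.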
January 1, 2015 (Author)
The entire syslog was 16 MB in size, so here is a portion. If there is a better way to post a syslog, please let me know.

Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761704
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761712
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761720
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761728
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761736
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761744
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761752
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761760
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761768
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761776
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761784
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761792
Jan 1 11:57:59 Tower kernel: sd 10:0:0:0: [sdl] Unhandled error code
Jan 1 11:57:59 Tower kernel: sd 10:0:0:0: [sdl]
Jan 1 11:57:59 Tower kernel: Result: hostbyte=0x04 driverbyte=0x00
Jan 1 11:57:59 Tower kernel: sd 10:0:0:0: [sdl] CDB:
Jan 1 11:57:59 Tower kernel: cdb[0]=0x88: 88 00 00 00 00 00 08 36 1c 48 00 00 04 00 00 00
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761800
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761808
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761816
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761824
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761832
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761840
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761848
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761856
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761864
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761872
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761880
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761888
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761896
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761904
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761912
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761920
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761928
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761936
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761944
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761952
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761960
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761968
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761976
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761984
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761992
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762000
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762008
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762016
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762024
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762032
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762040
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762048
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762056
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762064
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762072
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762080
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762088
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762096
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762104
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762112
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762120
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762128
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762136
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762144
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762152
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762160
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762168
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762176
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762184
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762192
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762200
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762208
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762216
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762224
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762232
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762240
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762248
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762256
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762264
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762272
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762280
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762288
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762296
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762304
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762312
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762320
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762328
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762336
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762344
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762352
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762360
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762368
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762376
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762384
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762392
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762400
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762408
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762416
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762424
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762432
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762440
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762448
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762456
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762464
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762472
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762480
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762488
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762496
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762504
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762512
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762520
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762528
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762536
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762544
Jan 1 11:57:59 Tower kernel: sd 10:0:0:0: [sdl] Unhandled error code
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762552
Jan 1 11:57:59 Tower kernel: sd 10:0:0:0: [sdl]
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762560
Jan 1 11:57:59 Tower kernel: Result: hostbyte=0x04 driverbyte=0x00
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762568
Jan 1 11:57:59 Tower kernel: sd 10:0:0:0: [sdl] CDB:
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762576
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762584
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762592
Jan 1 11:57:59 Tower kernel: cdb[0]=0x88: 88 00 00 00 00 00 08 36 20 48 00 00 02 38 00 00
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762600
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762608
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762616
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762624
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762632
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762640
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762648
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762656
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762664
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762672
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762680
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762688
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762696
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762704
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762712
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762720
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762728
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762736
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762744
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762752
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762760
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762768
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762776
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762784
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762792
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762800
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762808
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762816
Jan 1 11:57:59 Tower kernel: sd 10:0:0:0: [sdl] Unhandled error code
Jan 1 11:57:59 Tower kernel: sd 10:0:0:0: [sdl]
Jan 1 11:57:59 Tower kernel: Result: hostbyte=0x04 driverbyte=0x00
Jan 1 11:57:59 Tower kernel: sd 10:0:0:0: [sdl] CDB:
Jan 1 11:57:59 Tower kernel: cdb[0]=0x88: 88 00 00 00 00 00 08 36 22 80 00 00 01 c8 00 00
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762824
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762832
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762840
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762848
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762856
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762864
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762872
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762880
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762888
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762896
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762904
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762912
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762920
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762928
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762936
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762944
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762952
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762960
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762968
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762976
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762984
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137762992
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763000
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763008
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763016
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763024
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763032
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763040
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763048
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763056
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763064
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763072
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763080
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763088
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763096
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763104
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763112
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763120
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763128
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763136
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763144
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763152
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763160
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763168
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763176
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763184
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763192
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763200
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763208
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763216
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763224
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763232
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763240
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763248
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763256
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763264
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763272
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763280
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763288
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763296
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763304
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763312
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763320
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763328
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763336
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763344
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763352
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763360
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763368
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763376
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763384
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763392
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763400
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763408
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763416
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763424
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763432
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763440
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763448
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763456
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763464
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763472
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763480
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763488
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763496
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763504
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763512
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763520
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763528
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763536
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763544
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763552
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763560
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763568
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763576
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763584
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763592
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763600
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763608
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763616
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763624
Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137763632
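A 16 MB syslog is easier to reason about once it is summarized. The sketch below is a hypothetical helper, not an unRAID tool: it counts the `md: diskN read error` lines per disk and decodes the `cdb[0]=0x88` lines, assuming the standard SCSI READ(16) layout (opcode 0x88, 8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13). The function names and regular expressions are my own.

```python
import re

READ_ERR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")
CDB_LINE = re.compile(r"cdb\[0\]=0x88: ((?:[0-9a-f]{2} ){15}[0-9a-f]{2})")

def summarize_read_errors(lines):
    """Return {disk: (count, lowest_sector, highest_sector)} for md read errors."""
    summary = {}
    for line in lines:
        match = READ_ERR.search(line)
        if not match:
            continue
        disk, sector = match.group(1), int(match.group(2))
        count, low, high = summary.get(disk, (0, sector, sector))
        summary[disk] = (count + 1, min(low, sector), max(high, sector))
    return summary

def decode_read16(cdb_hex):
    """Decode a READ(16) CDB given as 16 space-separated hex bytes."""
    raw = bytes.fromhex(cdb_hex)
    if len(raw) != 16 or raw[0] != 0x88:
        raise ValueError("not a READ(16) command block")
    lba = int.from_bytes(raw[2:10], "big")      # starting logical block
    blocks = int.from_bytes(raw[10:14], "big")  # number of blocks requested
    return lba, blocks

def failed_reads(lines):
    """Return the (lba, blocks) pair for every READ(16) CDB line in the log."""
    return [decode_read16(m.group(1)) for m in map(CDB_LINE.search, lines) if m]

sample = [
    "Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761704",
    "Jan 1 11:57:59 Tower kernel: md: disk8 read error, sector=137761712",
    "Jan 1 11:57:59 Tower kernel: cdb[0]=0x88: 88 00 00 00 00 00 08 36 1c 48 00 00 04 00 00 00",
]
print(summarize_read_errors(sample))
print(failed_reads(sample))
```

Run against the excerpt above, the three CDBs decode to back-to-back reads, which is what a sequential rebuild would issue. To my knowledge `hostbyte=0x04` is `DID_BAD_TARGET` in the Linux SCSI layer, meaning the host could not reach the device at all, which would fit a loose cable better than a bad platter; treat that reading as an assumption to check against your kernel's headers.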
January 1, 2015 (Author)
So looking at the GUI, it is showing only 10 writes to drive 8. Would this mean it isn't correcting bits on drive 8? Is unraid trying to read bits from drive 8 to rebuild drive 6? Is drive 6 going to be corrupt?
January 1, 2015
http://lime-technology.com/wiki/index.php?title=Troubleshooting#Capturing_your_syslog

"So looking at the GUI, it is showing only 10 writes to drive 8. Would this mean it isn't correcting bits on drive 8? Is unraid trying to read bits from drive 8 to rebuild drive 6? Is drive 6 going to be corrupt?"

That's exactly how the system works. It reads from all the working drives (including parity) to recalculate the information for disk 6. Not sure at this point about corruption, however.
January 1, 2015

"I've been using unraid for just over a year. It has been a great experience with the exception of my stupid choice to use Seagate drives... I've lost about 4 or so in just under a year... Time to switch to WD Red I think."

HGST's are the most reliable. Look at the most recent Backblaze study (sticky I created in the hard drives subforum).

"I woke up this morning to an email showing that my number 6 drive had failed. Stopped the array, rebooted the system hoping the drive would come back. No such luck."

It should NEVER come back under this circumstance. Once a drive is red-balled, it will never become un-red-balled unless you take some action (or if there is a bug). If it did magically come back, it would be a VERY bad thing, as parity would be out-of-sync with your drives.

"Found the offending drive in my rig and replaced it with another drive. Logged into unraid and verified that all the other drives were fine, had the blue dot on my drive six replacement, started the array and started the rebuild process."

So the current state is that the old drive 6 is outside the array (in your hands), and the new drive 6 is being rebuilt?

"I just checked on the rebuild completion status and noticed that disk 8 now has * for temperature and 171141458 errors as well. I am thoroughly afraid of losing data. Advice on how to proceed is greatly appreciated from this great community."

It is very uncommon (I have never seen it) that a drive dies like a light bulb - blink and it's gone. It typically starts showing signs. More often than not a red-ball is a cabling issue. I suspect that the disk6 in your hands is fine. However, it did red-ball, and once that happens, any writes to "disk6" update a "simulated disk6". So the physical disk6 contents and the simulated disk6 contents are now out of sync. How far out of sync? It depends on how long ago the disk red-balled and how much data you copied to simulated disk6. If you don't recover the simulated disk6, any writes done to disk6 since the red-ball are lost.

So now we have disk8 spewing errors. This likely means that disk8's cable got knocked loose (as Squid said). I can almost guarantee that you do not have drive cages. Your story is the poster child for why every array needs them!

But here is what you have to do.

1 - Stop the rebuild
2 - Stop the array
3 - Backup the config folder from your flash drive (this has to be done with the server offline)
4 - Shut down the server (so it powers down)
5 - VERY CAREFULLY open the server and secure both ends of the disk8 cable without knocking anything else loose. This is not easy, but take your time and do your best. The backup we took at step 3 will enable you to retry should you not be successful in getting all the drives connected.
6 - Power up
7 - Unassign slot 6 (if assigned)
8 - Start the array
9 - Examine the simulated disk6. See if it looks good. If not, post back
10 - Stop array, assign slot 6 to your new disk6 (your original disk6 is still in your hands)
11 - Start the array and rebuild of disk6

Disk6 should rebuild using the contents of parity and the other disks in the array. It should look exactly like the simulated disk6 you looked at in step 9.
January 1, 2015 (Author)

"So the current state is that the old drive 6 is outside the array (in your hands), and the new drive 6 is being rebuilt?"

Yes, I physically removed the old disk six and it is sitting on a shelf. The new drive 6 was what was being rebuilt.

"How far out of sync? It depends on how long ago the disk red-balled and how much data you copied to simulated disk6. If you don't recover the simulated disk6, any writes done to disk6 since the red-ball are lost."

The disk red-balled this morning and I have not copied anything to the server today.

Per your directions, I have since stopped the array and the rebuild of drive 6. I will copy the flash drive onto my desktop as a backup, and do my best to fix any cabling issues on drive 8. If I make it to your step 10, will the rebuild of drive 6 start from the beginning again, using correct data from drive 8 if it is now connected properly, or will it try to resume where it left off?

Thank you both for your help.
January 1, 2015
Rebuild will occur from the beginning. Although the data cable is the likely culprit, it could also be the power cable that was nudged loose. Check both.
January 1, 2015 (Author)
Rebuild is at 5%, drive 8 is showing no errors. Fingers crossed! Time to look at a new case with cages. I will look at your advice on drives as well. Thanks!
January 1, 2015 (Author)
As far as I could tell. There were mostly ISOs and JPGs that would load.
January 1, 2015

"As far as I could tell. There were mostly ISOs and JPGs that would load."

I suspect all will be well. The disk6 in your hand (that you pulled from the server) could be used to do a file-by-file md5 comparison to ensure all files match, or you could just spot check some files. I would especially focus on the last several gigabytes of files copied to disk6. If any files are corrupted, they are likely among those. Good luck!
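A file-by-file md5 comparison along those lines could look like the sketch below. It assumes both copies are mounted somewhere readable; the function names and any mount points you pass in are hypothetical, and nothing here is unRAID-specific.

```python
import hashlib
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def compare_trees(old_root, new_root):
    """Yield (relative_path, reason) for files under old_root that are
    missing from new_root or whose MD5 differs there. Read-only on both sides."""
    old_root, new_root = Path(old_root), Path(new_root)
    for old_file in sorted(p for p in old_root.rglob("*") if p.is_file()):
        relative = old_file.relative_to(old_root)
        new_file = new_root / relative
        if not new_file.is_file():
            yield relative.as_posix(), "missing"
        elif md5_of(old_file) != md5_of(new_file):
            yield relative.as_posix(), "mismatch"
```

Calling `list(compare_trees(old_mount, rebuilt_mount))` returns an empty list when every file on the old disk has an identical counterpart on the rebuilt one. Files written after the red-ball exist only on the rebuilt disk and are not reported, since the walk starts from the old disk.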
January 1, 2015
If you find any that mismatch, DO NOT DELETE EITHER FILE and let me know.

In order to have both drives in the server you would need to mount the disk outside the array. If you have an extra SATA port, I can explain how to mount it to do your compare.

Rebuild still working ok?
January 1, 2015 (Author)
Rebuilding is still going, no errors visible. I had hoped I'd be able to mount the failed drive 6 on another PC using a USB adapter, but from my reading over the past half hour that doesn't seem possible. Mounting it outside the array, as you suggest, is what will have to be done.
January 1, 2015
You may be able to mount the disk in the unRAID server via the USB adapter. Start a thread and maybe someone can help. I once precleared a USB-mounted disk.

Mounting it in the server itself is only complicated because you have to open your server to connect it up. If you were able to put it in a drive cage, the commands to mount it are simple. (Sorry to rub it in.)
January 2, 2015 (Author)
Still rebuilding. I'm kind of curious now as to what to do with the failed disk 6 I have (after I verify files), as well as another failed disk I have. (Both were simply random red dots one day.) I will try a few runs of preclear on them and see if they're fine. Does that sound like a bad move?
January 2, 2015

"Still rebuilding. I'm kind of curious now as to what to do with the failed disk 6 I have (after I verify files), as well as another failed disk I have. (Both were simply random red dots one day.) I will try a few runs of preclear on them and see if they're fine. Does that sound like a bad move?"

First check the SMART reports, which takes no time. No use preclearing them if they already have lots of problems. But if the SMART report looks ok, preclearing is the right next step. Feel free to post the reports and someone can let you know if anything looks concerning.
January 2, 2015 (Author)
So it seems that I could mount an unRAID drive over USB on a Windows box using a driver or program; however, these do not seem to allow writing to the drive (which my md5 program will want to do when creating the hash file). It looks like mounting the drive outside the array would be a better option. Do you recommend SNAP?
January 2, 2015
md5sum will work, and it writes its output to the console, which you could redirect to a file on a writable disk. There is a free version I found for Windows that works just like the Linux version that comes with unRaid.

I have never used SNAP, but others have had success with it. I can't be much help as I am out of town this weekend with no access to my server.
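If no md5sum binary is handy, the same idea can be sketched in a few lines of Python. This is only an illustration: `write_manifest` is a made-up name, and the output imitates the usual `md5sum` line format (hash, two spaces, path). Nothing is written to the disk being hashed; the manifest goes to whatever text stream you pass in, the console by default.

```python
import hashlib
import sys
from pathlib import Path

def write_manifest(source_root, out=sys.stdout):
    """Write one 'md5  relative/path' line per file under source_root to `out`.
    The source tree is only read, so it can sit on a read-only mount."""
    source_root = Path(source_root)
    for path in sorted(p for p in source_root.rglob("*") if p.is_file()):
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        relative = path.relative_to(source_root).as_posix()
        out.write(f"{digest.hexdigest()}  {relative}\n")
```

To keep the manifest on a different, writable disk, open a file there and pass it as `out`, or leave `out` at its default and redirect the console output.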