December 9, 2015 Here is my situation. I have a red/disabled disk. Here is what I did to my system today: 1. I installed a new PCIe USB3 card for passthrough to my VM. I completed this task...passed the card through and tested it by plugging in a USB DVD drive and ripping a movie. I then converted the movie to mp4 and everything was fine for several hours. 2. Changed network settings...removed bonding...rebooted unraid, entered the BIOS and disabled the second NIC on my motherboard. After completing this item...unraid booted up and I immediately got a notification in the unraid webgui indicating that disk6 was disabled...it also showed errors on disk3, but that disk was still green, and it indicated that parity was invalid. I grabbed a diagnostic; it is attached. I then attempted to shut down unraid and it would not power down for over an hour. I manually power-cycled the Tower and when unraid came back up there were no disks assigned...which I have never had happen previously. Before I do anything, I thought I would solicit suggestions from the forum. tower-diagnostics-20151209-1250.zip
December 9, 2015 Author Yes, all drives show up in the BIOS. They are also all available for assignment in the unraid webgui. But I haven't assigned them yet, as I am waiting for feedback. I am tempted to assign the disks and trust parity...but I'm not sure if that is wise.
December 9, 2015 What version of UnRAID are you currently running? The "Trust Parity" function was NOT working correctly in early v6 versions -- it was fixed in v6.1.5, so you do NOT want to do that unless you're using either 6.1.5 or 6.1.6. You also want to be VERY careful that you don't do anything that will preclude doing a rebuild of disk #6, just in case it was really bad (and not just a loose cable/poor connection as a result of "fiddling" inside the system). Assuming you're running the current version (if not, you can redo the USB flash drive with the latest version before proceeding), you can safely do this: (a) Do a New Config with the "Parity is already valid" option checked. (b) Start the system -- and do a NON CORRECTING parity check to confirm whether or not all the disks are reading okay (if disk #6 has issues, you'll find out here while not actually changing parity as a result of any errors, so you'll still be able to do a rebuild of disk #6). If all is well, then the system is likely just fine. To really confirm that disk #6 isn't having issues, you need to write something to it -- but do steps (a) and (b) first. If disk #6 has issues while doing (b), you should replace it with another disk and do a rebuild => if parity was indeed valid, then the rebuild will be successful. If not, you'll still have the "bad" disk to attempt recovery from.
December 10, 2015 Author Gary, Thanks for taking the time to respond. I am on unraid 6.1.6...so I should be good in that regard. I will give it a try when I get home.
December 10, 2015 Author Gary, I did as you suggested and when the array came up it was the exact same thing...disk6 red/disabled with a couple of errors, and disk 3 was showing errors but stayed green. I decided to pull my flash drive and make a backup of its contents, and while making the zip of the contents...it hung up on two files. One was named fsck0000; the file type was listed as rec in Windows and there was no date or time. There was also a second file named fsck0001??? In any event, I decided to reformat the flash drive and put a fresh install of unraid 6.1.6 on it. I am now able to follow your suggested procedure. So far disk6 is still green, and I spot-checked some of its contents and it seems fine. A non writing parity sync is underway...12 hours to go. Thanks again for your assistance. Dan
December 10, 2015 "... A non writing parity sync is underway...12 hours to go." I presume you mean a non-correcting parity CHECK. You DID check the "parity is already valid" box -- right? Otherwise, if you're actually doing a parity sync then it's too late to do any recovery, as it's redoing the parity drive. Not a big deal as long as disk #6 is okay, but not what I had suggested.
December 10, 2015 Author Gary, I did check the "parity is already valid" box, and I guess I was tired while writing my last message...I did uncheck the box that said write parity corrections to disk and then clicked the parity check button...so I believe I did it correctly. It says parity check in progress. That being said, I am 62% complete and there are 25 sync errors. So I am not sure what to think. EDIT: I am now 100% complete and there were 26 sync errors. Any suggestions on a next step or on interpreting the 26 sync errors? Also, the SMART report came back clean on disk6, and I also went into maintenance mode and ran an xfs file system check. The status window showed:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 1
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

Dan
December 11, 2015 That's a small # of mismatches ... it's likely they're "real" parity errors; but they could also be bad bits on drive #6. Do you have checksums or backups that will let you confirm the files on that disk? If not, you have to decide what you want to do next -- you can either rebuild disk #6 [Do it on a new disk, so the old disk is still intact just in case it wasn't bad -- if that's the case, the rebuilt disk will have incorrect data on it because of the bits of incorrect parity info] ... OR you can just assume the errors are real parity errors and do a correcting check to fix them. This will, however, "correct" parity to match the bad data on your disk if the errors are actually on the disk. Unless you have some way to confirm the data (checksums or backups) there's really no way to know for sure which is the better approach.
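To illustrate why a sync error alone cannot say whether parity or the data disk is at fault, here is a minimal Python sketch. It assumes plain byte-wise XOR parity across the data disks, which is how single-parity schemes generally work; the three tiny "disks" and every name in it are hypothetical and have nothing to do with the actual array in this thread.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks (assumed single-parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(parity, surviving_blocks):
    """Reconstruct one missing block from parity plus the surviving blocks."""
    return xor_parity([parity, *surviving_blocks])


def mismatches(parity, blocks):
    """Count byte positions where stored parity differs from computed parity."""
    return sum(p != q for p, q in zip(parity, xor_parity(blocks)))


# Hypothetical three-disk array with one 4-byte block per disk.
disk_a = bytes([0x10, 0x20, 0x30, 0x40])
disk_b = bytes([0x01, 0x02, 0x03, 0x04])
disk_c = bytes([0xAA, 0xBB, 0xCC, 0xDD])
good_parity = xor_parity([disk_a, disk_b, disk_c])

# Case 1: parity is stale in one byte, the data is fine.
stale_parity = bytes([good_parity[0] ^ 0x01]) + good_parity[1:]
# Case 2: parity is fine, disk_c has a flipped bit in the same byte.
bad_disk_c = bytes([disk_c[0] ^ 0x01]) + disk_c[1:]

# Both cases report the same single mismatch during a check.
case1 = mismatches(stale_parity, [disk_a, disk_b, disk_c])
case2 = mismatches(good_parity, [disk_a, disk_b, bad_disk_c])

# Rebuilding disk_c from stale parity bakes the parity error into the data.
rebuilt_from_stale = rebuild(stale_parity, [disk_a, disk_b])
rebuilt_from_good = rebuild(good_parity, [disk_a, disk_b])
```

Both cases produce an identical mismatch count, so only an independent reference (checksums or a backup) can tell them apart. The last two lines show the rebuild risk described above: a rebuild from bad parity yields a disk that differs from the original.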
December 12, 2015 Author Gary, I don't have any checksums (I did some reading and I can see where it makes sense to have these...I am going to experiment with the unraid plugin). I have backups of only specific irreplaceable data...such as my photo shares. I don't have backups of my TV show recordings or movies. To be able to rebuild disk 6...I have some work to do....I need to mount a drive from another PC with the unassigned drives plugin...move the data to unraid...and then preclear this drive...then rebuild the data...so it may take a couple of days, as I have a very busy weekend. I am hoping the sync errors are from the hard reset...at the time I know some files were being synced with BTSync. Thanks for the advice!
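For readers wondering what keeping checksums amounts to in practice, the following is a rough Python sketch of a checksum manifest. It is not the unraid plugin mentioned above and does not reflect how that plugin works; the function names are made up for illustration, and the root path you pass in (for example a disk mount such as /mnt/disk6) is an assumption about your own system.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def verify_manifest(root, manifest):
    """Return (relative_path, reason) pairs for files that are missing or changed."""
    problems = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            problems.append((rel, "missing"))
        elif file_sha256(full) != expected:
            problems.append((rel, "changed"))
    return problems
```

The idea would be to run `build_manifest` while the disk is known to be good, store the result somewhere off the array, and call `verify_manifest` after an incident like this one. An empty list means every recorded file still matches.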
December 12, 2015 Given the small # of parity errors, it's VERY likely these are actual parity errors, and not data errors ... but without the ability to absolutely confirm it, that's just an educated guess. I'd go through a random selection of files on the troublesome disk, and if everything looks fine, just go ahead and do a correcting check and be done with it. A more comprehensive alternative that will let you confirm all was good with reasonable certainty is to (a) copy ALL of the data from disk #6 to another disk outside of the array (perhaps on another computer); (b) do a correcting parity check; (c) rebuild disk #6 (either to the same disk or a new one); and then (d) run a file compare utility (if you're using Windows you could use FolderMatch) to compare the files that you saved with those on the rebuilt disk #6. If everything matches, you're most likely just fine. But nothing is as good as having checksums and backups, so you can (a) know when a file's been corrupted/modified; and (b) replace it with a good copy from your backups.
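Step (d) above can also be done with a short script instead of a GUI tool. The sketch below uses Python's standard `filecmp` module to do a byte-for-byte comparison of two directory trees; it is only an illustration, the function names are hypothetical, and the two root paths (the saved copy and the rebuilt disk) are whatever applies on your own system.

```python
import filecmp
import os


def list_files(root):
    """Collect every file under root as a path relative to root."""
    found = set()
    for folder, _dirs, files in os.walk(root):
        for name in files:
            found.add(os.path.relpath(os.path.join(folder, name), root))
    return found


def compare_trees(saved_root, rebuilt_root):
    """Compare the saved copy against the rebuilt disk, file by file."""
    saved = list_files(saved_root)
    rebuilt = list_files(rebuilt_root)
    # shallow=False forces a content comparison rather than trusting
    # matching size and modification time.
    different = sorted(
        rel
        for rel in saved & rebuilt
        if not filecmp.cmp(
            os.path.join(saved_root, rel),
            os.path.join(rebuilt_root, rel),
            shallow=False,
        )
    )
    return {
        "only_in_saved": sorted(saved - rebuilt),
        "only_in_rebuilt": sorted(rebuilt - saved),
        "different": different,
    }
```

If all three lists come back empty, the rebuilt disk matches the saved copy. Any entry under `different` points at a file worth restoring from the saved copy.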
December 14, 2015 Author So I went through the process of replacing disk6 with a clean drive. The rebuild completed and I got the following warning notification at the end of the process: unRAID Data rebuild:: 13-12-2015 11:42 Notice [TOWER] - Data rebuild: finished (26 errors) Duration: unavailable (no parity-check entries logged) So what are these errors...is this telling me that the original 26 parity check errors may have been actual parity errors? At this point I have the original disk6 plus the rebuilt disk6. I am assuming that I should proceed with a parity check and allow it to write the corrections. Would you agree? I have also attached a copy of my diagnostics for reference. Thanks again, Dan tower-diagnostics-20151213-1905.zip