discreet-booby4798 · Posted March 23

I was looking to encrypt my drives, so I believe I had already migrated everything off of my Toshiba 1TB disk. That is to say, this failing disk that I am trying to rebuild shouldn't have anything on it. But I would like to know that I can rebuild failed disks correctly before the next emergency...

For some reason my Toshiba 1TB disk started failing. It was mounted directly on a PCIe card with two SATA ports. A second disk (my Seagate 2TB) on that card did not fail, so it couldn't be a cable issue, but it could be a card issue. I digress...

My two parity drives emulated the failing Toshiba disk, so I ordered a new Samsung 4TB disk, popped it in, and read through https://docs.unraid.net/unraid-os/manual/storage-management/. I am 99% certain I went through the https://docs.unraid.net/unraid-os/manual/storage-management/#normal-replacement Normal replacement process. As I recall there were 132 errors in the rebuild process, and I thought that was weird/bad. (See screenshot.)

I thought I would run a parity check; now I am getting 3418046386 errors, and my 4TB Samsung is showing as unmountable / no file system. What am I doing wrong? At some point I rebooted, so I hope that didn't reduce the chance of resolving this.

servernas2-diagnostics-20240323-0740.zip
servernas2-syslog-20240323-1136.zip
JorgeB · Posted March 23

During the disk3 rebuild, parity2 was already wrong, meaning there was an issue before that. Do you know what could have caused it?

Mar 17 16:28:00 ServerNas2 kernel: md: recovery thread: recon D3 ...
Mar 17 16:28:00 ServerNas2 kernel: md: recovery thread: Q corrected, sector=0
Mar 17 16:28:00 ServerNas2 kernel: md: recovery thread: Q corrected, sector=8
Mar 17 16:28:00 ServerNas2 kernel: md: recovery thread: Q corrected, sector=16
Mar 17 16:28:00 ServerNas2 kernel: md: recovery thread: Q corrected, sector=24
Mar 17 16:28:00 ServerNas2 kernel: md: recovery thread: Q corrected, sector=32
Mar 17 16:28:00 ServerNas2 kernel: md: recovery thread: Q corrected, sector=40
Mar 17 16:28:00 ServerNas2 kernel: md: recovery thread: Q corrected, sector=48
Mar 17 16:28:00 ServerNas2 kernel: md: recovery thread: Q corrected, sector=56
Mar 17 16:28:00 ServerNas2 kernel: md: recovery thread: Q corrected, sector=64
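A quick way to gauge the scale of these parity2 ("Q") corrections is to count them in the syslog. The sketch below reuses lines from the post as a sample file; on a live Unraid server you would point grep at the actual syslog (e.g. /var/log/syslog, which is an assumption about where your log lives).

```shell
# Write a small sample of the log lines from the post, then count
# the parity2 ("Q") corrections in it. On a real server, replace
# the sample file with your syslog path.
cat > /tmp/syslog_sample.txt <<'EOF'
Mar 17 16:28:00 ServerNas2 kernel: md: recovery thread: recon D3 ...
Mar 17 16:28:00 ServerNas2 kernel: md: recovery thread: Q corrected, sector=0
Mar 17 16:28:00 ServerNas2 kernel: md: recovery thread: Q corrected, sector=8
Mar 17 16:28:00 ServerNas2 kernel: md: recovery thread: Q corrected, sector=16
EOF
grep -c 'Q corrected' /tmp/syslog_sample.txt   # prints 3 for this sample
```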
discreet-booby4798 (Author) · Posted March 23

As far as what caused the original parity problem: possibly power loss. The power went out shortly before these problems, but the machine is ancient, so who knows; my newer machines are on UPS.

If I suspected I had data on the disk, how might we go about this?
- How can I troubleshoot the bad parity?
- How might I rebuild from parity again?
- What were the 132 errors?

My real plan is to disable the network, fire up my main app (Resilio Sync), reformat disk 3, rebuild the parity, then see if it wants to write data to my other offsite failovers.
discreet-booby4798 (Author) · Posted March 23

I guess I just want to "rebuild from parity", "forget disk", then "rebuild", or something like that. "Reformatting" will probably delete the disk's contents from parity, now that I think about it.
JorgeB · Posted March 23

2 hours ago, discreet-booby4798 said:
"possibly power loss"

I don't think that explains so many errors on parity2, and from the beginning disk4 also failed to emulate, suggesting parity1 was also not valid. You can try checking the filesystem on disk4; if that doesn't work, your best bet may be to see if you can recover some data from the old disk.
discreet-booby4798 (Author) · Posted March 24

I understand what you are saying about so many errors on the parity disks, but I don't understand why you are talking about disk 4; I only have 3 data disks.

I am now more concerned about the large number of errors, because I was trying to use these machines as a redundant backup. Do you have any resources on how I can track down the source of these errors?
trurl · Posted March 24

17 minutes ago, discreet-booby4798 said:
"talking about disk 4"

Probably a typo. The disk3 rebuild was compromised by bad parity. Check the filesystem on disk3 from the webUI, not the command line. Post the output.
JorgeB · Posted March 24

5 hours ago, trurl said:
"Probably a typo."

Yep, sorry.
discreet-booby4798 (Author) · Posted March 25

OK, I ran the filesystem check. (I stripped out the dots.)

Phase 1 - find and verify superblock
bad primary superblock - bad magic number !!!
attempting to find secondary superblock
Sorry, could not find valid secondary superblock
Exiting now
JorgeB · Posted March 25

That means there's no valid filesystem on that disk, further suggesting parity wasn't valid during the rebuild, so the rebuilt disk could be corrupt.
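The reason bad parity corrupts a rebuild can be shown with a toy XOR example (this is an illustration of the general single-parity idea, not Unraid's actual md driver code, and the byte values are made up): parity is computed from all data disks, and a rebuilt disk is recomputed from parity plus the surviving disks, so any flipped bit in parity lands directly in the rebuilt data.

```shell
# Toy single-parity illustration with one byte per "disk".
d1=$((0xA5)); d2=$((0x3C)); d3=$((0x5A))
p=$(( d1 ^ d2 ^ d3 ))            # good parity over all data disks
rebuilt=$(( p ^ d1 ^ d2 ))       # rebuild disk3 from parity: matches d3
bad_p=$(( p ^ 0x01 ))            # parity with a single flipped bit
bad_rebuilt=$(( bad_p ^ d1 ^ d2 ))  # the flipped bit reappears in the rebuild
printf 'good=0x%02X bad=0x%02X\n' "$rebuilt" "$bad_rebuilt"  # good=0x5A bad=0x5B
```

Unraid's parity2 ("Q") uses a different code than plain XOR, but the consequence is the same: rebuilding from invalid parity produces invalid data, which is why the rebuilt disk had no recognizable filesystem.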
discreet-booby4798 (Author) · Posted March 25

So pop in the original disk 3 and try to repair that disk?
JorgeB · Posted March 25

If it's still available, it's probably your best bet, but for now don't add the disk to the array; see if it mounts with the UD plugin.
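For reference, a command-line sketch of what the Unassigned Devices plugin does here: mount the old disk read-only outside the array so nothing writes to it. The device name /dev/sdh1 is the one that appears later in the thread, and /mnt/olddisk3 is a hypothetical mount point; both will differ on your system. Prefer the UD plugin's UI when it's installed.

```shell
# Mount the old disk read-only, outside the array, for inspection.
# The -b guard makes this a harmless dry run if the device is absent.
DEV=/dev/sdh1        # from the thread; adjust for your system
MNT=/mnt/olddisk3    # hypothetical mount point
if [ -b "$DEV" ]; then
  mkdir -p "$MNT"
  mount -o ro "$DEV" "$MNT" && ls "$MNT"
else
  echo "device $DEV not present; dry run only"
fi
```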
discreet-booby4798 (Author) · Posted March 25

This is what I found.

FS: xfs
Executing file system check: /sbin/xfs_repair -n '/dev/sdh1' 2>&1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_fdblocks 484728573, counted 488140140
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 1
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
File system corruption detected!
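The most important line in that output is the ALERT about the dirty log: with -n, xfs_repair ignores the journal, so some of the reported inconsistencies may disappear once the log is replayed by a mount or cleared by a real repair. A small sketch that just greps a saved copy of the check output (reusing the alert text from the post) for that warning:

```shell
# Save the alert line from the check output and look for the
# dirty-log warning that xfs_repair -n prints.
cat > /tmp/xfs_check.txt <<'EOF'
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used.
EOF
if grep -q 'valuable metadata changes in a log' /tmp/xfs_check.txt; then
  echo "dirty log: mount once to replay it, or repair without -n"
fi
```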
trurl · Posted March 25 (Solution)

Do it again without -n. If it asks for it, use -L. Post the results.
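As a sketch, trurl's sequence from the console looks like the following. The device /dev/sdh1 is taken from the thread and will differ per system; the array should be stopped (or in maintenance mode) first, and the webUI's check/repair button is the safer route. The -b guard keeps this a dry run when the device is absent.

```shell
# Full xfs_repair (no -n), falling back to -L only if xfs_repair
# refuses because of a dirty log that a mount cannot replay.
DEV=/dev/sdh1   # from the thread; adjust for your system
if [ -b "$DEV" ]; then
  xfs_repair "$DEV"
  # Only if it refuses over a dirty log and mounting doesn't clear it:
  # xfs_repair -L "$DEV"   # zeroes the log; recent metadata changes may be lost
else
  echo "no $DEV here; dry run"
fi
```

Note that -L discards any un-replayed journal entries, which is why mounting the filesystem once to replay the log is worth trying first.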
discreet-booby4798 (Author) · Posted March 25

OK. It ran xfs_repair -e, then it asked me to try to mount it, and it mounted. I unmounted, ran the check again, and got this...

FS: xfs
Executing file system check: /sbin/xfs_repair -n '/dev/sdh1' 2>&1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
No file system corruption detected!

Nothing is in there, as expected...

I recall that it had been complaining about disk temperature, and then the new disk has been complaining about disk temp as well... I think my last question is how do I put this disk back in as disk 3 and fix the parity?

So I am planning on picking up:
1. an extra case fan
2. a UPS
3. ECC RAM

Any plugins to "stress test" Unraid to find the source of these parity issues?
JorgeB · Posted March 25

39 minutes ago, discreet-booby4798 said:
"Nothing is in there, as expected..."

What do you mean, isn't this the old disk3?
discreet-booby4798 (Author) · Posted March 25

On 3/23/2024 at 7:59 AM, discreet-booby4798 said:
"I was looking to encrypt my drives, so I believe I had migrated everything off of my Toshiba 1tb disk. That's to say this failing disk that I am trying to rebuild, shouldn't have anything on it. But I would like to know I know how to rebuild failed disks correctly before the next emergency..."

Yes, it's the old disk... It was actually a Seagate 2TB disk... but either way, I do feel more comfortable that I will be able to recover things next time.

My next concern is the bad parity. Other than the steps above, and setting the parity check to run frequently, is there anything else I can do?

47 minutes ago, discreet-booby4798 said:
"So I am planning on picking up 1. an extra case fan 2. ups 3. ecc ram"
trurl · Posted March 25

14 minutes ago, discreet-booby4798 said:
"Yes, it's the old disk"

Have you examined its data?

15 minutes ago, discreet-booby4798 said:
"setting the parity check to run frequently"

No reason to do that. Most only do monthly or even less frequently.