Mizerka Posted June 23, 2020

So I had some heat issues today; some disks hit 60°C before I realised. Anyway, I sorted that, but then found some strange behaviour, primarily I/O write errors from SMB, so I loaded up the logs and found a lot of issues. Took the array down and up again, no go. Rebooted into maintenance mode and found disk14 reporting xfs_check issues, but after leaving it for a while and checking the logs again, the log is filled with the errors below. So... how bad is it? Looking at the docs I should run xfs_check -V /dev/sdX, which I tried against disk14 (the only disk that actually reported an issue in the webGUI using xfs check), but that's been running for the past 15 minutes trying to find secondary superblocks in the filesystem. So, help please.
Mizerka Posted June 23, 2020

xfs_repair in the webGUI, after a second run with -nv, said "just start the array up bro, it'll be good". As doubtful as I was, I tried it and so far... it seems okay. I'm scrubbing the btrfs cache as well, just in case.

edit: yeah okay, spoke too soon. Thoughts on the best action? I'll try to unmount again and repair, but I doubt it'll work.
trurl Posted June 23, 2020

39 minutes ago, Mizerka said: run xfs_check -V /dev/sdX

You can't repair the sdX device, and you shouldn't repair the sdX1 partition of a disk in the array, or you will invalidate parity. You must run it on the md# device.

10 minutes ago, Mizerka said: xfs_repair in the webGUI

If you do it from the webGUI then it will do it correctly.
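For anyone following along, a minimal sketch of the difference, assuming disk14 (so md14) and that the disk happens to show up as sdg; the actual device letters will differ on your system:

# against the raw partition: an actual repair here writes behind the md driver's
# back, so parity is no longer valid afterwards
xfs_repair /dev/sdg1

# against the array device: the md layer updates parity as it writes, so this is
# the safe target (add -n first for a read-only check)
xfs_repair /dev/md14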
Mizerka Posted June 23, 2020

5 minutes ago, trurl said: You can't repair the sdX device, and you shouldn't repair the sdX1 partition of a disk in the array, or you will invalidate parity. You must run it on the md# device. If you do it from the webGUI then it will do it correctly.

I see. The doc specifies either can be used; I'll try again with md# instead. To confirm: if I want to run it from the webGUI, I should just change -nv to -v? Trying to run it from the webGUI now only displays this, not allowing any actions. Using md# also returns "device is busy".
trurl Posted June 23, 2020

You have to stop the array, then start it in maintenance mode, to run the repair. https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

If you already repaired the sdX1 partition using the command line, then you have already invalidated parity. A correcting parity check would be needed in that case.
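On the flag question: a rough sketch of the usual sequence once the array is back up in maintenance mode, assuming disk14 (so /dev/md14, or /dev/mapper/md14 on an encrypted array):

# read-only check first; -v only adds verbosity, -n is what keeps it read-only
xfs_repair -nv /dev/md14

# if the check output looks sane, drop -n to run the actual repair
xfs_repair -v /dev/md14

In the webGUI the options field typically defaults to -n, so removing the -n (e.g. changing -nv to -v) is what turns the check into a repair.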
trurl Posted June 23, 2020

Also, see this section on that same wiki page: https://wiki.unraid.net/Check_Disk_Filesystems#Drive_names_and_symbols
Mizerka Posted June 23, 2020

8 minutes ago, trurl said: Also, see this section on that same wiki page: https://wiki.unraid.net/Check_Disk_Filesystems#Drive_names_and_symbols

Okay, yeah, that makes sense, so run it against md# instead. I've gone back into maintenance mode and I'm getting the errors from my edit above; md14 is saying the drive is busy and the webGUI refuses to run anything beyond -n/-nv. I've tried to run a repair, but it never got past saying the magic number check failed and trying to find a secondary superblock. The check outputs this, if it helps:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used.
Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
Metadata CRC error detected at 0x43c89d, xfs_bnobt block 0x3a381e28/0x1000
Metadata CRC error detected at 0x43c89d, xfs_bnobt block 0x74703c48/0x1000
btree block 1/1 is suspect, error -74
btree block 2/1 is suspect, error -74
bad magic # 0xdaa0086c in btbno block 1/1
bad magic # 0x2fdfba35 in btbno block 2/1
Metadata CRC error detected at 0x43c89d, xfs_cntbt block 0x3a381e30/0x1000
btree block 1/2 is suspect, error -74
bad magic # 0x419e48e9 in btcnt block 1/2
agf_freeblks 122094523, counted 0 in ag 1
agf_longest 122094523, counted 0 in ag 1
Metadata CRC error detected at 0x43c89d, xfs_cntbt block 0x74703c50/0x1000
btree block 2/2 is suspect, error -74
bad magic # 0xa8692ca5 in btcnt block 2/2
agf_freeblks 121856058, counted 0 in ag 2
agf_longest 121856058, counted 0 in ag 2
Metadata CRC error detected at 0x46ad5d, xfs_inobt block 0x3a381e38/0x1000
btree block 1/3 is suspect, error -74
Metadata CRC error detected at 0x46ad5d, xfs_inobt block 0x74703c58/0x1000
bad magic # 0x639e272e in inobt block 1/3
btree block 2/3 is suspect, error -74
bad magic # 0x796a2ce3 in inobt block 2/3
Metadata CRC error detected at 0x46ad5d, xfs_inobt block 0xaea85a78/0x1000
btree block 3/3 is suspect, error -74
bad magic # 0x15f1f03 in inobt block 3/3
sb_ifree 59, counted 44
sb_fdblocks 2926555418, counted 2681574888
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 4
        - agno = 3
        - agno = 14
        - agno = 22
        - agno = 8
        - agno = 9
        - agno = 5
        - agno = 6
        - agno = 10
        - agno = 12
        - agno = 15
        - agno = 16
        - agno = 13
        - agno = 17
        - agno = 18
        - agno = 2
        - agno = 21
        - agno = 7
        - agno = 19
        - agno = 20
        - agno = 23
        - agno = 11
No modify flag set, skipping phase 5
Inode allocation btrees are too corrupted, skipping phases 6 and 7
Maximum metadata LSN (904557511:-555599277) is ahead of log (1:6247).
Would format log to cycle 904557514.
No modify flag set, skipping filesystem flush and exiting.
trurl Posted June 23, 2020

What version of Unraid are you running?
Mizerka Posted June 23, 2020

1 minute ago, trurl said: What version of Unraid are you running?

Version 6.8.3 2020-03-05, stable, afaik.
trurl Posted June 23, 2020

34 minutes ago, Mizerka said: xfs_repair in the webGUI, after a second run with -nv, said "just start the array up bro, it'll be good"

What did it actually say? The -n (no modify) flag means check but don't repair anything.
Mizerka Posted June 23, 2020

1 minute ago, trurl said: What did it actually say? The -n (no modify) flag means check but don't repair anything.

Running the webGUI check with a -v flag gives this output:

Phase 1 - find and verify superblock...
        - block cache size set to 6097840 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 6247 tail block 6235
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed.
Mount the filesystem to replay the log, and unmount it before re-running xfs_repair.
If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
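For anyone hitting the same message later: the error is asking you to replay the journal by mounting the filesystem once before repairing. A rough sketch of that route, assuming disk14 on an encrypted array (so the device is /dev/mapper/md14; on an unencrypted array it would be /dev/md14) and that the disk will still mount, for example by simply starting the array normally:

# mount once so XFS can replay its log (starting the array normally does this for you)
mount /dev/mapper/md14 /mnt/disk14

# unmount again (or stop the array and go back into maintenance mode)
umount /mnt/disk14

# then re-run the repair without -n
xfs_repair -v /dev/mapper/md14

Only if the mount itself fails is -L the remaining option.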
Mizerka Posted June 23, 2020

Also attached diagnostics if you want to have a look, but I doubt there's anything interesting on the config side of this: nekounraid-diagnostics-20200623-2216.zip
Mizerka Posted June 23, 2020

47 minutes ago, trurl said: What did it actually say? The -n (no modify) flag means check but don't repair anything.

After looking around the forums a bit more I came across a similar post where a mod advised running against /dev/mapper/md# if the drives are encrypted (all of mine are, btw), then to -L it. That spits out the same output as the webGUI above.

Clearly it wants me to run with -L, but that sounds destructive? It's a 12TB drive, mostly filled, and I'd really hate to lose it. At this point would I be better off removing it, letting parity emulate it, and moving the data around before reformatting and adding it back to the array?
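For reference, the -L route would look something like the sketch below (again assuming disk14 on an encrypted array). It zeroes the journal rather than the data, so the usual risk is losing whatever metadata was only in the log at the time of the problem; files that end up orphaned are typically moved to a lost+found directory at the root of the disk rather than disappearing outright:

# last resort: destroy the log, then repair
xfs_repair -Lv /dev/mapper/md14

Even so, getting anything irreplaceable off the disk first, if it will still mount, is a reasonable precaution.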
Mizerka Posted June 23, 2020

Okay, so I think I'm good now. I ended up booting back into the full array with md14 mounted, moved all the data off it without issues, then went back into maintenance mode and could now run -v. Once that completed I started the array again and it's been fine for the last 20 minutes or so, crisis averted for now. If -v hadn't worked I'd probably have run -L and just reformatted the disk if that corrupted the filesystem.
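In case it helps anyone later, moving the data off before a risky repair can be as simple as the sketch below, assuming disk14 is mounted and disk10 is a hypothetical other array disk with enough free space (paths and disk numbers will differ on your system):

# copy everything off the suspect disk, preserving attributes; resumable if interrupted
rsync -avP /mnt/disk14/ /mnt/disk10/

# dry-run with checksums to verify the copy before deleting anything from disk14
rsync -avPn --checksum /mnt/disk14/ /mnt/disk10/

Copying disk-to-disk like this keeps the data inside the parity-protected array the whole time.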
trurl Posted June 23, 2020

Running it from the webGUI on encrypted drives should still do the correct thing; i.e.,

1 hour ago, Mizerka said: run against /dev/mapper/md#

-L is usually necessary, since the log can't be used if the disk is unmountable.
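One extra note for future readers: after a repair that needed -L, it's worth checking whether anything was orphaned, for example:

ls -la /mnt/disk14/lost+found

The directory only exists if xfs_repair actually had to disconnect files, so finding nothing there is a good sign.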