Arcaeus Posted May 4, 2022

4 minutes ago, trurl said:
"That will give us a chance to see what if anything needs correcting. Do the contents of the disk look reasonably correct?"

Yeah, everything looks reasonably correct. At a glance nothing seems to be missing or broken.

File system check completed. I don't see a lost+found folder on the drive, so I'm assuming there weren't any errors, although since this was just a check it wouldn't have moved anything anyway. Here is the output from the command:

root@MediaVault:~# xfs_repair -nv /dev/sdk1
Phase 1 - find and verify superblock...
        - block cache size set to 542368 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 20 tail block 20
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Wed May  4 15:02:47 2022

Phase           Start           End             Duration
Phase 1:        05/04 15:02:38  05/04 15:02:38
Phase 2:        05/04 15:02:38  05/04 15:02:38
Phase 3:        05/04 15:02:38  05/04 15:02:44  6 seconds
Phase 4:        05/04 15:02:44  05/04 15:02:44
Phase 5:        Skipped
Phase 6:        05/04 15:02:44  05/04 15:02:47  3 seconds
Phase 7:        05/04 15:02:47  05/04 15:02:47

Total run time: 9 seconds

6 minutes ago, trurl said:
"Probably is the plan. It is possible to get it to not rebuild parity, then fool it into thinking it still needs to rebuild disk5. But maybe that won't be necessary if the contents of the original disk look good enough. I am still a bit concerned about your hardware and its ability to reliably rebuild anything."

Do we think the controller firmware update should have fixed this? I still haven't seen any more errors, although the array has been stopped or in maintenance mode most of this time, so I'm not sure it has really had to do anything or communicate heavily with the drives. If it means anything, up until the disks started freaking out, all of the files seemed reliable despite the large number of CRC errors.
trurl Posted May 4, 2022

4 minutes ago, Arcaeus said:
"if this is just a check it's not moving anything."

Correct.

That looks a lot better. Go ahead and run without -n; probably nothing will end up in lost+found. Then we can go ahead and New Config that disk back into the array, and rebuilding parity will be a good test of the hardware.
Arcaeus Posted May 4, 2022

6 minutes ago, trurl said:
"Go ahead and run without -n; probably nothing will end up in lost+found. Then we can go ahead and New Config that disk back into the array, and rebuilding parity will be a good test of the hardware."

Ran the command again without the -n flag, and nothing is in lost+found:

root@MediaVault:~# xfs_repair -v /dev/sdk1
Phase 1 - find and verify superblock...
        - block cache size set to 542368 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 30 tail block 30
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Wed May  4 15:24:30 2022

Phase           Start           End             Duration
Phase 1:        05/04 15:24:20  05/04 15:24:20
Phase 2:        05/04 15:24:20  05/04 15:24:20
Phase 3:        05/04 15:24:20  05/04 15:24:27  7 seconds
Phase 4:        05/04 15:24:27  05/04 15:24:27
Phase 5:        05/04 15:24:27  05/04 15:24:27
Phase 6:        05/04 15:24:27  05/04 15:24:30  3 seconds
Phase 7:        05/04 15:24:30  05/04 15:24:30

Total run time: 10 seconds
done

What's next? Unmount and add it back into the array?
trurl Posted May 4, 2022

Just to make sure there is no misunderstanding, post a screenshot of Main - Array Devices.
Arcaeus Posted May 4, 2022

1 minute ago, trurl said:
"Just to make sure there is no misunderstanding, post a screenshot of Main - Array Devices."

Sure, no problem. Would rather be safe than sorry. Here is what the Main - Array Devices screen is showing right now. The old disk 5 is currently in UD and mounted.
trurl Posted May 4, 2022

1 minute ago, Arcaeus said:
"old disk 5 is currently in UD and mounted."

Unmount it, then assign it as disk5. When you go to start the array, there should be a checkbox for rebuilding parity. You must rebuild parity.
Arcaeus Posted May 4, 2022

Disk is unmounted and assigned to the disk 5 slot. Array is not started yet. Earlier you mentioned backing up the contents of disk 5. Is that something we need to do before we start the array?

11 minutes ago, trurl said:
"Unmount it, then assign it as disk5. When you go to start the array, there should be a checkbox for rebuilding parity. You must rebuild parity."

Is the checkbox for rebuilding parity the "Parity is already valid" one? That should be unchecked, since we want to rebuild parity. Just wanted to confirm. What about disk 4? Will we handle that afterwards?
Arcaeus Posted May 4, 2022

Alright, the array has been started with that box unchecked. Parity-Sync/Data-Rebuild is in progress.
trurl Posted May 4, 2022

A backup would have been taken while the disk was still mounted in UD; you can reconsider backups after the parity build completes.

During the rebuild, you should see a lot of writes to the disk being rebuilt (parity), a lot of reads from all other disks in the array, and zeros in the Errors column on Main.
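If you want to eyeball the same activity from a shell, the raw kernel counters work too. A minimal sketch using only standard tools; sdk is a placeholder for whichever device yours is:

# /proc/diskstats: field 6 is sectors read, field 10 is sectors written,
# so the write counter on the parity disk should be climbing fast
watch -n 5 'grep -w sdk /proc/diskstats'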
Arcaeus Posted May 4, 2022

Just now, trurl said:
"During the rebuild, you should see a lot of writes to the disk being rebuilt (parity), a lot of reads from all other disks in the array, and zeros in the Errors column on Main."

Understood. No CRC errors so far, so it seems like the firmware update did the trick. Will update this thread when the parity rebuild completes.
trurl Posted May 4, 2022

1 minute ago, Arcaeus said:
"No CRC errors"

Usually when we refer to CRC errors we are talking about the SMART attribute where the disk firmware records them; the disk firmware detects inconsistencies in the data it has received. The Errors column on Main that I was referring to includes all I/O errors, many of which would never be recorded as CRC errors because the disk never received any data to check.
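If you are curious, you can read that attribute straight off a drive from the console; it is usually attribute 199, UDMA_CRC_Error_Count. The device name below is just an example:

smartctl -A /dev/sdk | grep -i crc   # raw value is the lifetime CRC error count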
Arcaeus Posted May 6, 2022

On 5/4/2022 at 4:38 PM, trurl said:
"Usually when we refer to CRC errors we are talking about the SMART attribute where the disk firmware records them. The Errors column on Main that I was referring to includes all I/O errors, many of which would never be recorded as CRC errors because the disk never received any data to check."

Alright, the parity sync has completed. Zero I/O errors and zero CRC errors.

On the dashboard, the log is showing 92% full. What log is that referring to, and how do I check it or clean it out?
JorgeB Posted May 6, 2022

17 minutes ago, Arcaeus said:
"What log is that referring to and how do I check it or clean it out?"

Rebooting will clear that. You might want to save the diags before doing so, in case they are needed, and if the log keeps growing you should post new diags.
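If you want to see what is actually filling it before you reboot, a quick look from the shell works; just a sketch with standard tools, nothing Unraid-specific:

df -h /var/log                 # how full the log filesystem is (it lives in RAM)
du -sh /var/log/* | sort -h    # which files are taking the space
tail -n 50 /var/log/syslog     # what is currently being written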
Arcaeus Posted May 6, 2022

5 hours ago, JorgeB said:
"Rebooting will clear that. You might want to save the diags before doing so, in case they are needed, and if the log keeps growing you should post new diags."

Yep, looks like that got cleared out, and so far it's staying at 1%.

Alright, the last question I have is about the process to format and mount the two blank 16TB drives currently sitting in UD. The idea is to use them as a local backup of the data, just in case something happens to the data on the array (like what almost happened here). While I wouldn't have parity on them, the plan is to keep a complete backup of everything, kept in sync with rclone or something similar, as well as a cloud backup offsite.

Currently I can see the disks, but the Mount button is greyed out (I'm assuming because they are precleared but not formatted yet). Destructive mode is enabled and the UD Plus plugin is installed. Where do I go to format them? I'm assuming I should just format them as XFS since I don't plan to remove them from my server, but does it make any sense to format them as NTFS?

I saw this link that you had posted a few years back but it looks broken: https://forums.lime-technology.com/topic/44104-unassigned-devices-managing-disk-drives-outside-of-the-unraid-array/ . Is there an updated link, and what are your thoughts on this idea and a process to complete it?
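For the sync job, the kind of thing I have in mind is below. Just a sketch, and the share and mount names are made up until I actually set things up:

rclone sync /mnt/user/Media /mnt/disks/backup16a/Media --progress --dry-run
# drop --dry-run once the output looks right; repeat per share or script a loop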
JorgeB Posted May 6, 2022

You need to enable destructive mode in the UD settings to format disks. I would just use XFS; the disks can always be read on any Linux computer, or by using an Unraid trial key.
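For reference, the manual equivalent of what the UD format button does is roughly the following. A sketch only, and the GUI is the supported path since UD manages its own partition layout; the device name is a placeholder, and this is destructive, so triple-check it:

wipefs -a /dev/sdX                                      # remove existing signatures
parted -s /dev/sdX mklabel gpt mkpart primary 0% 100%   # one partition, whole disk
mkfs.xfs /dev/sdX1                                      # create the XFS filesystem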
Arcaeus Posted May 6, 2022

4 minutes ago, JorgeB said:
"You need to enable destructive mode in the UD settings to format disks. I would just use XFS."

Here is what I'm seeing, as I must be missing something:
JorgeB Posted May 6, 2022

Remove the existing partitions by clicking on the red X, then format.
Arcaeus Posted May 6, 2022

1 hour ago, JorgeB said:
"Remove the existing partitions by clicking on the red X, then format."

When I do that, it gives me a message saying that it will remove the preclear signature and the disk would have to be re-cleared. I guess that doesn't matter, since I'm not adding the disk to the array?
JorgeB Posted May 6, 2022

15 minutes ago, Arcaeus said:
"I guess that doesn't matter, since I'm not adding the disk to the array?"

Correct.
Arcaeus Posted May 9, 2022 (Solution)

Marking this issue as solved. Once the LSI 9207-8i firmware was updated from 20.00.00.00 to 20.00.07.00 (the latest at this time), the CRC and I/O errors stopped. The remaining posts were about resolving possible data corruption on my drives.

Now SABnzbd is showing "OSError: [Errno 5] Input/output error: '/data/usenet_incomplete/...", and I opened a thread in the Binhex-SABnzbd channel here: https://forums.unraid.net/topic/44118-support-binhex-sabnzbd/?do=findComment&comment=1124052
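For anyone finding this later: you can confirm the controller firmware level from the console with LSI's flash utility. This assumes sas2flash is available on your system (it is not part of stock Unraid, so you may need to grab it from Broadcom):

sas2flash -listall   # one line per SAS2 HBA with its firmware version
sas2flash -list      # full details for the controller: firmware, BIOS, board name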
Arcaeus Posted May 10, 2022

On 5/6/2022 at 12:46 PM, JorgeB said:
"Correct."

Hey Jorge, I'm trying to figure out some share errors; some shares aren't showing up now. When I ran 'ls -lah /mnt', it showed question marks for disk 7, despite the disk showing OK on Main:

root@MediaVault:~# ls -lah /mnt
/bin/ls: cannot access '/mnt/disk7': Input/output error
total 16K
drwxr-xr-x 19 root   root   380 May  6 10:54 ./
drwxr-xr-x 21 root   root   480 May 10 09:58 ../
drwxrwxrwx  1 nobody users   80 May  6 11:04 cache/
drwxrwxrwx  9 nobody users  138 May  6 11:04 disk1/
drwxrwxrwx  8 nobody users  133 May  7 16:10 disk10/
drwxrwxrwx  5 nobody users   75 May  7 16:10 disk11/
drwxrwxrwx  6 nobody users   67 May  6 11:04 disk2/
drwxrwxrwx  9 nobody users  148 May  6 11:04 disk3/
drwxrwxrwx  4 nobody users   41 May  6 11:04 disk4/
drwxrwxrwx 10 nobody users  166 May  6 11:04 disk5/
drwxrwxrwx  7 nobody users  106 May  6 11:04 disk6/
d?????????  ? ?      ?        ?            ? disk7/
drwxrwxrwx  6 nobody users   73 May  7 16:10 disk8/
drwxrwxrwx  6 nobody users   67 May  6 11:04 disk9/
drwxrwxrwt  5 nobody users  100 May  6 13:41 disks/
drwxrwxrwt  2 nobody users   40 May  6 10:52 remotes/
drwxrwxrwt  2 nobody users   40 May  6 10:52 rootshare/
drwxrwxrwx  1 nobody users  138 May  6 11:04 user/
drwxrwxrwx  1 nobody users  138 May  6 11:04 user0/

I ran the file system check on disk 7 and got this output:

entry ".." at block 0 offset 80 in directory inode 282079658 references non-existent inode 6460593193
entry ".." at block 0 offset 80 in directory inode 282079665 references non-existent inode 4315820253
entry "Season 1" in shortform directory 322311173 references non-existent inode 2234721490
would have junked entry "Season 1" in directory inode 322311173
entry "Season 2" in shortform directory 322311173 references non-existent inode 4579464692
would have junked entry "Season 2" in directory inode 322311173
entry "Season 3" in shortform directory 322311173 references non-existent inode 6453739819
would have junked entry "Season 3" in directory inode 322311173
entry "Season 5" in shortform directory 322311173 references non-existent inode 2234721518
would have junked entry "Season 5" in directory inode 322311173
entry "Season 6" in shortform directory 322311173 references non-existent inode 4760222352
would have junked entry "Season 6" in directory inode 322311173
would have corrected i8 count in directory 322311173 from 3 to 0
entry ".." at block 0 offset 80 in directory inode 322311206 references non-existent inode 6453779144
entry ".." at block 0 offset 80 in directory inode 322311221 references non-existent inode 6460505325
No modify flag set, skipping phase 5
Inode allocation btrees are too corrupted, skipping phases 6 and 7
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Tue May 10 10:06:15 2022

Phase           Start           End             Duration
Phase 1:        05/10 10:06:11  05/10 10:06:11
Phase 2:        05/10 10:06:11  05/10 10:06:11
Phase 3:        05/10 10:06:11  05/10 10:06:15  4 seconds
Phase 4:        05/10 10:06:15  05/10 10:06:15
Phase 5:        Skipped
Phase 6:        Skipped
Phase 7:        Skipped

Total run time: 4 seconds

Is there any reason not to run the repair (without the -n flag) now? After that completes, would I rebuild the drive like we did before, or how does that work? New diags attached if needed.

mediavault-diagnostics-20220510-1009.zip
JorgeB Posted May 10, 2022

No need to rebuild since the disk is enabled, but you do need to run xfs_repair without -n to fix the current corruption.
Arcaeus Posted May 10, 2022

Just now, JorgeB said:
"No need to rebuild since the disk is enabled, but you do need to run xfs_repair without -n to fix the current corruption."

Attempted to repair the file system, as I was getting local errors on the monitor attached to the bare-metal machine. When trying to run 'xfs_repair -v /dev/md7' I received this error:

root@MediaVault:~# xfs_repair -v /dev/md7
Phase 1 - find and verify superblock...
        - block cache size set to 542376 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 159593 tail block 159588
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

Restarted the array in normal (non-maintenance) mode, and now disk 7 is showing 'Unmountable: not mounted' on Main - Array (attached).
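Since the mount attempt failed, I'm guessing this means the -L route the error message describes; that's just my reading of the message, not advice from the thread yet:

xfs_repair -Lv /dev/md7   # zeroes the log before repairing, so last resort only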
Arcaeus Posted May 10, 2022

17 minutes ago, JorgeB said:
"Use -L"

OK, ran it with that flag, but my computer (not the Unraid server) crashed midway through and killed the SSH session. Is there a way to check on the progress? Maybe run the same command with the -n flag to check?
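In the meantime I checked whether the process survived the dropped session, and noted for next time how to detach a long repair from SSH entirely. A rough sketch, and the log path is just whatever I'd pick:

ps aux | grep '[x]fs_repair'   # the [x] keeps grep from matching itself

# next time: detach it so a dropped connection can't kill it
nohup xfs_repair -v /dev/md7 > /root/xfs_repair_disk7.log 2>&1 &
tail -f /root/xfs_repair_disk7.log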