MatrixMJK Posted March 18, 2021

Hello,

I've been using UnRAID for a few years now. I've had a few bumps along the way, and with the forums I have usually been able to help myself. I have read through a couple of other posts with similar/same problems, but they are a couple of years old and I want to make sure I proceed correctly.

A month or so ago I replaced my parity drive with a larger one: I had a 10TB and replaced it with a 12TB, since those are getting better price-wise. I have mostly 8TB drives and now the one 12TB; I precleared the 10TB and am using it as a data drive now. I also bought a second 12TB and precleared it as well to have as a spare.

A few days ago I was getting a lot of read errors and drive 6 dropped out. Drive 5 also had some read errors, but UnRAID did not kick it out. I know this can be a bad/loose power or data cable, so I stopped the array and shut down the system. I went through and reseated all the drive connections and powered the system back up, and drive 6 was still marked as not usable.

So I thought to be safe I would put the precleared 12TB spare drive in the system and replace drive 6 with it, then later I would preclear the original drive 6 and see if it really had problems. I shut down the system and installed the 12TB spare, powered up, and selected the new drive for slot 6. I started the array and the rebuild took place. It stopped a couple of times from the parity check pause plugin, but it resumed and finished.

I thought it was OK, but drive six showed as after the parity completed. It may have listed it that way during the rebuild, but I am not sure at this point. I still have the original drive 6 in the system, and it shows up right now as 'Dev 2' in Unassigned Devices, as shown in the screenshot.

I have attached a diagnostic and a screenshot. I'd like some advice on how to proceed, as I am not sure of the REAL state of the array.

Thank you for your time and help.

Matt

tower-diagnostics-20210317-2210.zip
JorgeB Posted March 18, 2021

Check filesystem on disk 6: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

Run it without -n, and if it asks for -L use it.
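The "if it asks for -L" step can be checked mechanically. The sketch below is only an illustration: the helper name `asks_for_L` is made up, and the matched wording is the log-replay message quoted later in this thread, which is assumed to be the same wording a repair run prints when it refuses to proceed. It reads tool output on stdin and reports whether that message is present.

```shell
#!/bin/sh
# asks_for_L (hypothetical helper): succeed when the text on stdin contains
# the XFS refusal about a dirty log that still needs to be replayed.
asks_for_L() {
    grep -q 'valuable metadata changes in a log which needs to be replayed'
}

# Hypothetical usage; DEV is a placeholder for the device being checked:
#   if xfs_repair "$DEV" 2>&1 | asks_for_L; then
#       echo "repair refused; rerun as: xfs_repair -L $DEV"
#   fi
```

Zeroing the log with -L discards metadata updates that were never replayed, which is why it is only used after the tool asks for it.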
MatrixMJK Posted March 18, 2021 (edited)

OK, I ran the disk check without the -n and then added the -L as instructed. It did not take long, but it generated thousands of entries like this:

entry ".." in directory inode 8879439719 points to non-existent inode 4394111961
bad hash table for directory inode 8879439719 (no data entry): rebuilding
rebuilding directory inode 8879439719

After doing the disk check, I restarted the array and the drive shows up correctly, but it only has 1.31TB of data where there used to be just over 7TB. So I'm not sure I trust the data on the device.

Would it be any benefit to put in a new drive (I have two more 12TB drives unused but NOT yet precleared) and rebuild onto it?

Just as an FYI, the drive now in UnRAID's slot 6 is on a new cable, not the cable from the original drive 6 that was reporting read errors and started this whole process. I may also want to change one or more of the SAS cables, but I don't want to change so much that it impedes troubleshooting.

Thanks again for the help with this.

Matt

Edited March 19, 2021 by MatrixMJK
JorgeB Posted March 19, 2021

9 hours ago, MatrixMJK said: Would it be any benefit to put in a new drive (I have two more 12TB drives un-used but NOT yet pre-cleared) and rebuild onto it?

That won't help, since parity is always updated in real time, so a second rebuild would just reproduce the same emulated contents. Your best bet is the old disk. It looks healthy, so you should be able to mount it with UD and copy the data to the new disk/array. Note that you need to change the XFS UUID first to be able to mount both at the same time; that can be done in the UD settings.
MatrixMJK Posted March 19, 2021

Thank you for that suggestion. I did try mounting it but got the UUID error. I then tried to change the UUID, but I am still getting a superblock error. Below is the log from before the UUID change, and after. It looks like it can't change the UUID until the log replays or the superblock is repaired.

Mar 19 08:57:53 Tower unassigned.devices: Adding disk '/dev/sdd1'...
Mar 19 08:57:53 Tower unassigned.devices: Mount drive command: /sbin/mount -t xfs -o rw,noatime,nodiratime '/dev/sdd1' '/mnt/disks/ST8000DM004-2CX188_ZCT0LQ5W'
Mar 19 08:57:53 Tower kernel: XFS (sdd1): Filesystem has duplicate UUID 5f37ccbd-b83f-40d0-be94-6aa9b2c0c81f - can't mount
Mar 19 08:57:53 Tower unassigned.devices: Mount of '/dev/sdd1' failed. Error message: mount: /mnt/disks/ST8000DM004-2CX188_ZCT0LQ5W: wrong fs type, bad option, bad superblock on /dev/sdd1, missing codepage or helper program, or other error.
Mar 19 08:59:57 Tower unassigned.devices: Changing disk '/dev/sdd' UUID. Result: ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_admin. If you are unable to mount the filesystem, then use the xfs_repair -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
Mar 19 09:00:05 Tower unassigned.devices: Adding disk '/dev/sdd1'...
Mar 19 09:00:05 Tower unassigned.devices: Mount drive command: /sbin/mount -t xfs -o rw,noatime,nodiratime '/dev/sdd1' '/mnt/disks/ST8000DM004-2CX188_ZCT0LQ5W'
Mar 19 09:00:05 Tower kernel: XFS (sdd1): Filesystem has duplicate UUID 5f37ccbd-b83f-40d0-be94-6aa9b2c0c81f - can't mount
Mar 19 09:00:05 Tower unassigned.devices: Mount of '/dev/sdd1' failed. Error message: mount: /mnt/disks/ST8000DM004-2CX188_ZCT0LQ5W: wrong fs type, bad option, bad superblock on /dev/sdd1, missing codepage or helper program, or other error.

So I ran from the terminal:

root@Tower:~# xfs_repair -L /dev/sdd1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_fdblocks 221012495, counted 222965907
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 5
        - agno = 4
        - agno = 2
        - agno = 3
        - agno = 1
        - agno = 7
        - agno = 6
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:1780937) is ahead of log (1:2).
Format log to cycle 4.
done
root@Tower:~#

But I am still getting this in the disk log after the xfs_repair and after trying to change the UUID in 'Settings - Unassigned Devices', and the drive does not mount:

Mar 19 09:10:44 Tower unassigned.devices: Error: shell_exec(/usr/sbin/xfs_admin -U generate /dev/sdd1) took longer than 1s!
Mar 19 09:10:44 Tower unassigned.devices: Changing disk '/dev/sdd' UUID. Result: command timed out
Mar 19 09:10:51 Tower unassigned.devices: Adding disk '/dev/sdd1'...
Mar 19 09:10:51 Tower unassigned.devices: Mount drive command: /sbin/mount -t xfs -o rw,noatime,nodiratime '/dev/sdd1' '/mnt/disks/ST8000DM004-2CX188_ZCT0LQ5W'
Mar 19 09:10:51 Tower kernel: XFS (sdd1): Filesystem has duplicate UUID 5f37ccbd-b83f-40d0-be94-6aa9b2c0c81f - can't mount
Mar 19 09:10:51 Tower unassigned.devices: Mount of '/dev/sdd1' failed. Error message: mount: /mnt/disks/ST8000DM004-2CX188_ZCT0LQ5W: wrong fs type, bad option, bad superblock on /dev/sdd1, missing codepage or helper program, or other error.

Thanks!

Matt
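The generic "wrong fs type, bad option, bad superblock" text from mount hides the real cause, which is the kernel line about the duplicate UUID. A rebuilt disk is a block-level copy of the emulated one, so it carries the same XFS UUID as the original drive. As a rough sketch, a log can be filtered for that kernel message like this; the helper name `dup_uuids` is made up for illustration, and reading from /var/log/syslog is an assumption about where the log lives.

```shell
#!/bin/sh
# dup_uuids (hypothetical helper): read syslog text on stdin and print each
# UUID that the XFS driver reported as a duplicate, once per UUID.
dup_uuids() {
    sed -n 's/.*Filesystem has duplicate UUID \([0-9a-fA-F-]\{36\}\).*/\1/p' | sort -u
}

# Hypothetical usage (log path is an assumption):
#   dup_uuids < /var/log/syslog
```

If the same UUID keeps appearing after a UUID change was attempted, the change did not take effect, which matches the "command timed out" result logged by the plugin above.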
JorgeB Posted March 19, 2021

It's still showing a duplicate UUID, try changing it manually:

xfs_admin -U generate /dev/sdX1

Replace X with the correct letter; if it's still the same, that's /dev/sdd1. Note the 1 at the end.
MatrixMJK Posted March 19, 2021

Thank you again for the command (I don't have a lot of experience with XFS tools).

root@Tower:~# xfs_admin -U generate /dev/sdd1
totally zeroed log
Clearing log and setting UUID
writing all SBs
new UUID = df031299-61a0-458e-98b9-9bf4e1cd2f1d
root@Tower:~#

So that took a minute or so but worked. Now, however, the disk log shows this when trying to mount:

Mar 19 09:16:22 Tower unassigned.devices: Error: shell_exec(/usr/sbin/xfs_admin -U generate /dev/sdd1) took longer than 1s!
Mar 19 09:16:22 Tower unassigned.devices: Changing disk '/dev/sdd' UUID. Result: command timed out
Mar 19 09:31:54 Tower unassigned.devices: Adding disk '/dev/sdd1'...
Mar 19 09:31:54 Tower unassigned.devices: Mount drive command: /sbin/mount -t xfs -o rw,noatime,nodiratime '/dev/sdd1' '/mnt/disks/ST8000DM004-2CX188_ZCT0LQ5W'
Mar 19 09:31:54 Tower kernel: XFS (sdd1): Mounting V5 Filesystem
Mar 19 09:31:54 Tower kernel: XFS (sdd1): Corruption warning: Metadata has LSN (1:1780937) ahead of current LSN (1:2). Please unmount and run xfs_repair (>= v4.3) to resolve.
Mar 19 09:31:54 Tower kernel: XFS (sdd1): log mount/recovery failed: error -22
Mar 19 09:31:54 Tower kernel: XFS (sdd1): log mount failed
Mar 19 09:31:54 Tower unassigned.devices: Mount of '/dev/sdd1' failed. Error message: mount: /mnt/disks/ST8000DM004-2CX188_ZCT0LQ5W: wrong fs type, bad option, bad superblock on /dev/sdd1, missing codepage or helper program, or other error.

I'm not sure what it is asking me to run with xfs_repair. I'll do some research too.

Thank you very much for your time and effort, it is appreciated!

Matt
JorgeB Posted March 19, 2021

9 minutes ago, MatrixMJK said: I'm not sure what it is asking me to run with xfs_repair.

It is:

xfs_repair -v /dev/sdd1
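Pulling the steps of this thread together, the order that was used on the old disk can be written down as a dry run. The sketch below only prints the commands and never touches a device. The function name `recovery_plan`, the placeholder device and mount point, and the read-only mount in the last step are assumptions for illustration; in the thread the mount was done through the UD plugin.

```shell
#!/bin/sh
# recovery_plan (hypothetical helper): print, but do not run, the command
# order discussed in this thread. $1 is the partition, $2 the mount point.
recovery_plan() {
    dev="$1"
    mnt="$2"
    # 1. Zero the dirty log. This discards unreplayed metadata, so it is
    #    only for a filesystem that cannot be mounted to replay the log.
    echo "xfs_repair -L $dev"
    # 2. Generate a fresh UUID so the old disk no longer clashes with the
    #    rebuilt disk that carries the same filesystem UUID.
    echo "xfs_admin -U generate $dev"
    # 3. Repair again to clear the 'Metadata has LSN ahead of current LSN' warning.
    echo "xfs_repair -v $dev"
    # 4. Assumption: mount read-only while copying data off the old disk.
    echo "mount -t xfs -o ro,noatime $dev $mnt"
}

# Placeholder arguments; substitute the real partition and mount point.
recovery_plan /dev/sdX1 /mnt/disks/old_disk6
```

Printing the plan first makes it easy to confirm the partition name (note the trailing 1) before anything is run for real.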
MatrixMJK Posted March 19, 2021 (edited)

Sweet, that worked and I can mount it! The "(>= v4.3)" threw me; I thought it was asking me to specify a version of the tool. I can move the data back now, I hope!

By the way, what is a safe way to move those files back into place? Should I use 'Krusader' on the UnRAID box to the specific drive, or just copy to the shares from a client machine? I think I have read not to copy directly to the drive share, e.g. 'Drive 6'.

Just so I don't run into this again, did I do something wrong or miss a step when replacing the drive that had issues? I have replaced a couple of drives in the past but never had a problem doing it.

You guys need a tip jar (or do you have one?)

Thanks again for the assistance!

Matt

Edited March 19, 2021 by MatrixMJK
JorgeB Posted March 19, 2021

1 hour ago, MatrixMJK said: By the way what is a safe way to move those files back into place?

Yes.

1 hour ago, MatrixMJK said: Should I use 'Krusader' on the UnRAID box to the specific drive or just to the shares from a client machine?

If you're using Windows 10 you can use Windows Explorer to move/copy from the UD device to the array, and the transfer will still be done locally; it won't use the network.

1 hour ago, MatrixMJK said: Just so I don't run into this again, did I do something wrong or miss a step when replacing the drive that had issues?

The diags posted don't show anything out of the ordinary, but you mentioned that, before the time they cover, another disk had read errors while that one was disabled. This could very easily have caused issues with the emulated disk: since you only have one parity, all the other disks need to be 100% for the emulated disk to be OK.
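The point about single parity can be shown with a one-byte worked example. With one parity drive, parity is the XOR of all data disks, so a missing disk is emulated by XOR-ing parity with every remaining disk. The byte values below are invented for illustration, and `xor_bytes` is a made-up helper name.

```shell
#!/bin/sh
# xor_bytes (hypothetical helper): XOR all arguments together and print
# the result as a two-digit hex byte.
xor_bytes() {
    acc=0
    for b in "$@"; do
        acc=$(( $acc ^ $b ))
    done
    printf '0x%02X\n' "$acc"
}

# Invented sample bytes at the same offset on three data disks.
d4=0x3C; d5=0xA5; d6=0x5A

xor_bytes $d4 $d5 $d6     # parity byte               -> 0xC3
xor_bytes 0xC3 $d4 $d5    # emulated disk 6, good d5  -> 0x5A
xor_bytes 0xC3 $d4 0xA4   # emulated disk 6, bad d5   -> 0x5B
```

A single flipped bit in what disk 5 returns (0xA4 instead of 0xA5) changes the emulated byte for disk 6 from 0x5A to 0x5B. A rebuild writes the emulated contents to the new disk, so an error of this kind would be carried over.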
MatrixMJK Posted March 19, 2021

So I just ran 'xfs_repair -nv' from the GUI on disk5 (the other one that showed read errors), and the output looks good to me:

Phase 1 - find and verify superblock...
        - block cache size set to 3018840 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1283169 tail block 1283169
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 7
        - agno = 2
        - agno = 4
        - agno = 6
        - agno = 5
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

XFS_REPAIR Summary    Fri Mar 19 15:07:04 2021

Phase           Start           End             Duration
Phase 1:        03/19 15:06:46  03/19 15:06:46
Phase 2:        03/19 15:06:46  03/19 15:06:46
Phase 3:        03/19 15:06:46  03/19 15:06:56  10 seconds
Phase 4:        03/19 15:06:56  03/19 15:06:56
Phase 5:        Skipped
Phase 6:        03/19 15:06:56  03/19 15:07:04  8 seconds
Phase 7:        03/19 15:07:04  03/19 15:07:04

Total run time: 18 seconds

Anything else I can/should check for silent corruption?
JorgeB Posted March 20, 2021

That's not the issue. The problem could have been a result of the read errors on disk5 while another disk was being emulated; the emulated disk could be corrupted.
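Since the original disk could be mounted again, one way to look for damage on the rebuilt disk is to compare it file by file against the original. The following is a minimal sketch, not something suggested in the thread: `compare_trees` is a made-up helper, the array path `/mnt/disk6` is an assumption, and file names containing newlines are not handled.

```shell
#!/bin/sh
# compare_trees (hypothetical helper): for every regular file under SRC,
# report whether it is missing from DST or differs byte for byte.
compare_trees() {
    src="$1"
    dst="$2"
    ( cd "$src" && find . -type f ) | while IFS= read -r f; do
        if [ ! -f "$dst/$f" ]; then
            echo "MISSING $f"
        elif ! cmp -s "$src/$f" "$dst/$f"; then
            echo "DIFFERS $f"
        fi
    done
}

# Hypothetical usage: original disk mounted by UD versus the rebuilt array disk.
#   compare_trees /mnt/disks/ST8000DM004-2CX188_ZCT0LQ5W /mnt/disk6
```

Files written to the array after the rebuild will legitimately differ or be absent on the old disk, so the report is a list of candidates to inspect rather than proof of corruption.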
MatrixMJK Posted March 20, 2021

This thread can be marked solved.