Gico posted June 25, 2023 (edited)

Sequence of events:
1. I had a working array with one parity drive and a spare disk already in the server.
2. Stopped the array, assigned the spare disk as the 2nd parity and started the array.
3. Disk 5 immediately turned invalid (red X).
4. I stopped the array, unassigned disk 5, started the array, stopped it, reassigned disk 5 and started it again, so the 2nd parity and disk 5 are being rebuilt.
5. Overnight, disk 11 got read errors but is still in the array (no red X).

The parity build seems to continue, but on Main I can see no writes to parity 2 or to disk 5, and also no reads from disk 11. I can access the filesystems of these two disks, but the file data is corrupted. I had issues with red Xs on multiple disks in the past, caused by insufficient cooling of the server, in particular of the onboard HBA. I'm guessing the data on these disks is OK, but that needs to be checked while they are unassigned. Any ideas on how to continue now, please?

Update:
- Disk 11 has actually dropped from the array, because it appears in the Unassigned Devices list.
- I can't stop the parity operation through the web interface: it just continues when I try. How can I stop it through the CLI?

juno-diagnostics-20230625-0704.zip
JorgeB posted June 25, 2023

Type reboot in the console; if it doesn't reboot after 5 minutes you will need to force it. Then replace disk 11's cables / swap slots and try again.
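For reference, the parity operation can also be cancelled from the console before rebooting. A minimal sketch, assuming Unraid's mdcmd helper is available on the console (the command name and verb are assumptions based on Unraid's md driver tooling; verify on your version):

```shell
# Sketch: cancel a running parity check/sync from the Unraid console.
# mdcmd is Unraid-specific; on a non-Unraid box this just reports it.
cancel_parity() {
    if command -v mdcmd >/dev/null 2>&1; then
        mdcmd nocheck          # asks the md driver to stop the parity operation
        echo "parity operation cancelled"
    else
        echo "mdcmd not found (not an Unraid console?)"
    fi
}
cancel_parity
# If the web UI is unresponsive, a clean reboot from the console is:
#   reboot
# and only if nothing happens after ~5 minutes, force it:
#   reboot -f
```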
Gico posted June 25, 2023 (Author)

I rebooted, started in maintenance mode and executed xfs_repair -n for disks 5 and 11 (logs attached). They seemed OK to me, so I did a new config, preserved all current assignments and started the array. Disk 11 is mounted; disk 5 is not (syslog follows), so now I think the only option is maintenance mode and xfs_repair with correction, right? The disk's fs type is set to "auto", so I can't execute the command from the GUI. Any ideas? What is the command I need to run?

Jun 25 13:59:01 Juno emhttpd: mounting /mnt/disk5
Jun 25 13:59:01 Juno emhttpd: shcmd (279): mkdir -p /mnt/disk5
Jun 25 13:59:01 Juno emhttpd: shcmd (280): mount -t xfs -o noatime,nouuid /dev/md5p1 /mnt/disk5
Jun 25 13:59:01 Juno kernel: XFS (md5p1): Mounting V5 Filesystem
Jun 25 13:59:02 Juno kernel: XFS (md5p1): Corruption warning: Metadata has LSN (4:1317887) ahead of current LSN (4:1316525). Please unmount and run xfs_repair (>= v4.3) to resolve.
Jun 25 13:59:02 Juno kernel: XFS (md5p1): log mount/recovery failed: error -22
Jun 25 13:59:02 Juno kernel: XFS (md5p1): log mount failed
Jun 25 13:59:02 Juno root: mount: /mnt/disk5: wrong fs type, bad option, bad superblock on /dev/md5p1, missing codepage or helper program, or other error.
Jun 25 13:59:02 Juno root: dmesg(1) may have more information after failed mount system call.
Jun 25 13:59:02 Juno emhttpd: shcmd (280): exit status: 32
Jun 25 13:59:02 Juno emhttpd: /mnt/disk5: no btrfs or device /dev/md5p1 is not single
Jun 25 13:59:02 Juno emhttpd: /usr/sbin/zpool import -d /dev/md5p1 2>&1
Jun 25 13:59:02 Juno emhttpd: no pools available to import
Jun 25 13:59:02 Juno emhttpd: disk5: no uuid
Jun 25 13:59:02 Juno emhttpd: shcmd (281): mount -t reiserfs -o noatime,user_xattr,acl /dev/md5p1 /mnt/disk5
Jun 25 13:59:02 Juno kernel: REISERFS warning (device md5p1): sh-2021 reiserfs_fill_super: can not find reiserfs on md5p1
Jun 25 13:59:02 Juno root: mount: /mnt/disk5: wrong fs type, bad option, bad superblock on /dev/md5p1, missing codepage or helper program, or other error.
Jun 25 13:59:02 Juno root: dmesg(1) may have more information after failed mount system call.
Jun 25 13:59:02 Juno emhttpd: shcmd (281): exit status: 32
Jun 25 13:59:02 Juno emhttpd: /mnt/disk5 mount error: Unsupported or no file system
Jun 25 13:59:02 Juno emhttpd: shcmd (282): rmdir /mnt/disk5

xfs_repair.Disk11.txt
xfs_repair.Disk5.txt
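For anyone following along, the no-modify check described above can be run from the console in Maintenance mode. A sketch, with the device name /dev/md5p1 taken from the syslog above (it differs per disk and per Unraid version, so check the log before running it):

```shell
# Read-only XFS check: -n means no-modify, report problems only.
# /dev/md5p1 is the array device for disk5 as seen in the syslog;
# on another system or disk the name will differ.
check_fs() {
    DEV=${1:-/dev/md5p1}
    if [ -b "$DEV" ]; then
        xfs_repair -n "$DEV"
    else
        echo "skipped: $DEV is not a block device here"
    fi
}
check_fs /dev/md5p1
```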
itimpi posted June 25, 2023

If you are sure the disk is XFS format, then with the array stopped you can click on it on the Main tab and explicitly set the format to xfs. When you then start in Maintenance mode you will be offered the option to run xfs_repair via the GUI.
Gico posted June 25, 2023 (Author)

Yes, I'm sure 🙂 Anyway, this saga has ended for now: I stopped the array, changed the fs type to xfs and started in maintenance mode. xfs_repair -n still seemed OK. xfs_repair generated this error:

ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

So I stopped the array and tried again (unsuccessfully) to mount. Next was maintenance mode again and xfs_repair -L, which WAS SUCCESSFUL. All drives are mounted, the content of disk 5 seems OK, and Parity-Sync is running.

Thanks a lot @JorgeB and @itimpi!
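The sequence that resolved disk 5 can be sketched as console commands. This assumes Maintenance mode and the /dev/md5p1 device from the earlier syslog; note that xfs_repair -L zeroes the journal and can lose the last in-flight metadata updates, which is why the error message asks you to attempt a mount first:

```shell
# Sketch of the repair sequence for disk5. Device and mount point are
# taken from the syslog in this thread; guarded so it no-ops elsewhere.
repair_disk5() {
    DEV=/dev/md5p1
    MNT=/mnt/disk5
    if [ ! -b "$DEV" ]; then
        echo "skipped: $DEV is not a block device here"
        return 0
    fi
    xfs_repair -n "$DEV"                    # 1. dry run: assess the damage
    mkdir -p "$MNT"
    if mount -t xfs "$DEV" "$MNT"; then     # 2. try to mount, replaying the log
        umount "$MNT"
        xfs_repair "$DEV"                   # 3. normal repair after log replay
    else
        xfs_repair -L "$DEV"                # 4. last resort: zero the log
    fi
}
repair_disk5
```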
Gico posted June 27, 2023 (Author)

Parity-Sync is going quite slowly, at about 55 MB/s, sometimes even slower. Disk 5 again got some read errors. It still has a green dot, but its read speed is about twice that of the other disks, so I guess it has issues and is delaying the Parity-Sync. Occasionally disk 11 also reads faster than the rest of the disks. Both disks are the same model (Seagate 16TB Exos X16) and were bought together on Amazon. They are probably out of warranty.

My questions:
- Can I pause the Parity-Sync, shut down the server to check/replace cables, and continue the Parity-Sync after boot?
- Can I somehow check the disks before starting the array again? Maybe mount them read-only and do some kind of read test?

SMART reports from both disks attached.
juno-smart-20230627-1006.zip
juno-smart-20230627-0927.zip
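On the second question, a disk can be exercised without writing to it: pull its SMART data and read the whole surface sequentially. A sketch with a placeholder device name (/dev/sdX is not a real device here; substitute the identifier shown on the Main tab):

```shell
# Non-destructive checks for a suspect disk. /dev/sdX is a placeholder;
# the guard makes this a no-op when the device does not exist.
read_test() {
    DEV=${1:-/dev/sdX}
    if [ ! -b "$DEV" ]; then
        echo "skipped: $DEV is not a block device here"
        return 0
    fi
    smartctl -a "$DEV"                                # health, attributes, error log
    dd if="$DEV" of=/dev/null bs=1M status=progress   # full-surface sequential read
}
read_test /dev/sdX
```

Read errors during the dd pass will show up both in its output and in the kernel log (dmesg), without modifying the disk.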
Gico posted June 27, 2023 (Author)

Attached.
juno-diagnostics-20230627-1201.zip
JorgeB posted June 27, 2023 (marked as Solution)

Parity cannot be correctly built with read errors on data disks. I would cancel the sync, replace the cables for disk 5 and try again.
Gico posted June 27, 2023 (Author, edited)

But if the disk still has a green dot, doesn't that mean the filesystem overcame this issue and the data was read successfully?
JorgeB posted June 27, 2023

Quoting Gico: "But if the disk still has a green dot"

A green dot only means that the disk is not disabled. For parity to be successfully built there can't be any read errors on any data disk; if you continue, parity won't be valid for the affected sectors.
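To see why, single parity is effectively the bitwise XOR across all data disks, so every data byte must be readable for the corresponding parity byte to be computed correctly; a byte that can't be read leaves that parity byte wrong. A toy illustration with three one-byte "disks":

```shell
# Toy illustration of XOR parity: parity is the XOR of the data disks' bytes.
d1=170; d2=204; d3=15            # bytes from three data disks
parity=$(( d1 ^ d2 ^ d3 ))
echo "parity byte: $parity"      # prints 105
# With parity intact, any single missing disk can be reconstructed:
echo "rebuilt d2:  $(( parity ^ d1 ^ d3 ))"   # prints 204, matching d2
# If d2's byte could not be read while building parity, the parity byte
# for this position would be wrong, and a later rebuild from it would
# silently produce wrong data for that sector.
```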
JorgeB posted June 27, 2023

If you prefer, you can let it finish and then run a correcting parity check once the issue is solved, but that will take longer.
Gico posted June 27, 2023 (Author, edited)

I cancelled, but why won't Unraid stop the parity sync if it's unreliable? Similarly, when I first added the 2nd parity drive and one disk was disabled, the parity sync continued without writing further to the new parity disk, so again, IMO Unraid should have stopped the Parity-Sync. Will update when I get home.

Edit: How do the SMART reports of disk 5 and disk 11 look? Are they OK, apart from the read issues, which might be caused by cables?
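On the SMART question, the attributes that usually separate cable trouble from a failing disk can be pulled with smartctl. A sketch with a placeholder device (the attribute names are the common SATA ones; SAS drives report error counters in a different format, so adjust accordingly):

```shell
# Triage the SMART attributes that distinguish link/cable errors from
# media errors. /dev/sdX is a placeholder for disk5 or disk11.
smart_triage() {
    DEV=${1:-/dev/sdX}
    if [ ! -b "$DEV" ]; then
        echo "skipped: $DEV is not a block device here"
        return 0
    fi
    # 199 UDMA_CRC_Error_Count: transfer errors, typically cabling/backplane.
    # 5 Reallocated_Sector_Ct, 197 Current_Pending_Sector,
    # 198 Offline_Uncorrectable: the disk surface itself.
    smartctl -A "$DEV" | grep -E 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable|UDMA_CRC'
}
smart_triage /dev/sdX
```

A rising CRC count with zero reallocated/pending sectors is consistent with the cable theory in this thread; pending or reallocated sectors point at the disk.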
JorgeB posted June 27, 2023

Quoting Gico: "I cancelled, but why won't Unraid stop the parity sync if it's unreliable?"

Because if it were a disk problem, it would still create parity; it wouldn't be 100% in sync, but almost, which is better than nothing.
Gico posted June 27, 2023 (Author)

SAS cable replaced, although no external damage could be seen. Started the array and Parity-Sync. All disks have the same read speed, similar to previous parity syncs.
Gico posted July 2, 2023 (Author)

Parity-Sync completed successfully. Thanks for the help.