wilsonhomelab Posted March 21, 2022 (edited)

A month ago, I transferred my array (2x Ironwolf 8TB, 8 months old) from desktop hardware to server hardware (dual Xeon E5-2680 v4, 64GB Samsung ECC memory, Supermicro X10DRL-i server motherboard). Unraid recognised all drives in the array (parity, disk 1) and ran smoothly for a couple of weeks. Ten days ago, I purchased an additional Ironwolf 8TB (disk 2) and added it to the array after a preclear. A parity check then completed without any problem at a 220 MB/s average speed.

Last week, disk 1 started showing errors after I transferred some media files from unassigned drives using Krusader (Docker container). I tried swapping the SATA cable and SATA port, but the disk errors only got worse (from 4 errors to 80 errors). I then started a parity check, which only ran at KB/s speeds (I paused it), so I knew something was wrong. I checked the diagnostics and found "ata6: hard resetting link" and "ata6: link is slow to respond, please be patient (ready=0)". unraid-xeon-diagnostics-20220312-1735.zip

Last night, the accumulated disk 1 error count had reached 800 (after a week), so to rule out problems with the motherboard SATA ports and the SATA cables, I installed a known-good LSI SAS2008 HBA with a known-good SFF-8087-to-4x-SATA cable. After booting Unraid, all disks were recognised, but disk 1 and disk 2 were both "unmountable" with an option to format. unraid-xeon-diagnostics-20220321-0240.zip

I then upgraded the motherboard BIOS to the latest version and ran 6 hours of Memtest86+ without error. This morning, I restarted Unraid and saw disk 1 "unmountable"; parity and disk 2 are working. Shares stored on disk 1 are not showing up (not emulated at all). Why is emulation not working with only one disk down? unraid-xeon-diagnostics-20220321-0909.zip

Could you please also advise what I should do now? Repair the XFS file system? Please help~
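For reference, the messages above came out of the syslog in the diagnostics. A quick way to pull them out and map the ata port back to a drive letter is something like the following; this is only a sketch, and it assumes the standard /var/log/syslog location on Unraid and the ata6 port seen in my log, so substitute your own values:

# Count how often each link-reset message appears in the live syslog (path assumed)
grep -E "hard resetting link|link is slow to respond" /var/log/syslog | sort | uniq -c

# Map an ATA port (e.g. ata6) to its block device via the sysfs device paths
ls -l /sys/block | grep ata6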
wilsonhomelab Posted March 21, 2022 (Author)

I removed disk 1 and restarted Unraid. As I expected, the parity drive does not emulate the missing drive. Please help! My priority is recovering the missing data from parity; I hope that is still possible. After that I will test the problem drive, as it is still under warranty.
itimpi Posted March 21, 2022

Actually your screenshot suggests that the drive IS being emulated - it is just flagged as unmountable. Handling of unmountable drives is covered in the online documentation, accessible via the 'Manual' link at the bottom of the GUI.
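If you want to see what a filesystem check would report before anything is written, a read-only pass can be run against the emulated disk from the console. A minimal sketch, assuming the array has been started in maintenance mode and that disk 1 maps to /dev/md1 as it does later in this thread (the device naming can differ between Unraid releases):

# Read-only XFS check of the emulated disk 1; -n reports problems without modifying anything
xfs_repair -n /dev/md1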
wilsonhomelab Posted March 21, 2022 (Author)

I clicked the "Check" button under the disk 1 settings and nothing happened, so I ran the command manually:

root@UNRAID-Xeon:~# xfs_repair /dev/md1
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed.
Mount the filesystem to replay the log, and unmount it before re-running xfs_repair.
If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

Could you please explain the "ERROR" section? Thanks
itimpi Posted March 21, 2022

That is a standard warning message and can normally be ignored, as mentioned in the online documentation that can be accessed via the Manual link at the bottom of the GUI.
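For completeness, the sequence the error message itself is asking for (mount to replay the journal, unmount, then re-run the repair) would look roughly like this. This is only a sketch with an assumed temporary mount point; starting the array normally attempts the same mount, and the GUI check handles the common case, so this is mainly useful if you want to try replaying the log by hand before resorting to -L:

# Try to mount the emulated disk so XFS can replay its journal (mount point assumed)
mkdir -p /mnt/test
mount /dev/md1 /mnt/test

# If the mount succeeds, unmount and re-run the repair without -L
umount /mnt/test
xfs_repair /dev/md1

# Only if the mount fails should destroying the log with -L be considered
# xfs_repair -L /dev/md1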
wilsonhomelab Posted March 21, 2022 (Author)

I went ahead with the -L option. Now I can see the disk 1 content! 😀

If I understand correctly, parity will start rebuilding the physical disk 1 once I restart the array with disk 1 plugged back in? I also want to make sure whether this issue is an indication of a failing drive. May I perform a preclear of disk 1 first, to see if it can withstand the heavy read/write process? If it passes, I will put it back into the array for the rebuild.

root@UNRAID-Xeon:~# xfs_repair /dev/md1 -L
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - 04:55:12: zeroing log - 119233 of 119233 blocks done
        - scan filesystem freespace and inode maps...
        - 04:55:13: scanning filesystem freespace - 32 of 32 allocation groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - 04:55:13: scanning agi unlinked lists - 32 of 32 allocation groups done
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 15
        - agno = 30
        - agno = 16
        - agno = 17
        - agno = 1
        - agno = 31
        - agno = 18
        - agno = 2
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 3
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 4
        - agno = 5
        - agno = 29
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - 04:55:24: process known inodes and inode discovery - 148992 of 148992 inodes done
        - process newly discovered inodes...
        - 04:55:24: process newly discovered inodes - 32 of 32 allocation groups done
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - 04:55:24: setting up duplicate extent list - 32 of 32 allocation groups done
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 2
        - agno = 7
        - agno = 9
        - agno = 11
        - agno = 12
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 13
        - agno = 28
        - agno = 5
        - agno = 1
        - agno = 8
        - agno = 19
        - agno = 17
        - agno = 6
        - agno = 18
        - agno = 20
        - agno = 4
        - agno = 15
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 10
        - agno = 27
        - agno = 16
        - agno = 29
        - agno = 30
        - agno = 14
        - agno = 31
        - 04:55:24: check for inodes claiming duplicate blocks - 148992 of 148992 inodes done
Phase 5 - rebuild AG headers and trees...
        - 04:55:25: rebuild AG headers and trees - 32 of 32 allocation groups done
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
        - 04:55:31: verify and correct link counts - 32 of 32 allocation groups done
Maximum metadata LSN (6:909635) is ahead of log (1:2).
Format log to cycle 9.
done
itimpi Posted March 21, 2022

Putting the disk back will cause it to be rebuilt to match the emulated drive.

39 minutes ago, wilsonhomelab said:
May I perform a preclear of disk 1 first, to see if it can withstand the heavy read/write process? If it passes, I will put it back into the array for the rebuild.

That is fine. You could also try running an extended SMART test on the drive before adding it back.
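A hedged sketch of running the extended SMART test from the console, assuming the drive currently shows up as /dev/sdb (check the actual device letter on the Main page first); the same test can also be started from the disk's attributes page in the GUI:

# Start a long (extended) self-test; it runs in the background on the drive itself
smartctl -t long /dev/sdb

# After the estimated runtime has passed, review the self-test result and the SMART attributes
smartctl -a /dev/sdb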
wilsonhomelab Posted March 21, 2022 (Author)

While disk 1 is running its preclear, disk 2 has come up with 5 errors 🤨. Should I repeat the same procedure - start the array in maintenance mode and check and repair the XFS file system again? I think I still have not found the cause of these disk errors. unraid-xeon-diagnostics-20220322-0936.zip
wilsonhomelab Posted March 25, 2022 (Author, marked as Solution)

I think I finally found the culprit. The I/O errors came from, I believe, insufficient SATA power. I had been using a Molex-to-2x-15-pin-SATA cable to power a SilverStone 5-bay hot-swap cage because of a clearance problem, and I never ran into trouble until I populated all 5 bays of a single cage (I have two of the cages). I found a report of a similar issue, which prompted me to power the cage with two dedicated SATA power cables straight from the PSU.

The end result is that disk 1 precleared at double the data rate, 200+ MB/s instead of 90 MB/s, and the rebuild finished in 10 hours at an average of 202 MB/s.

I wish the system log could be more specific about this kind of I/O error ("hard resetting link"), or that the drive itself could report an under-power condition. I hope this will be helpful for someone else.
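If anyone wants to confirm that a fix like this has actually taken hold, one approach is to watch for the reset messages recurring while the drives are under sustained load. A small sketch, with the device name assumed as /dev/sdb:

# Follow the kernel log for new link resets while a preclear or rebuild is running
dmesg -w | grep -E "hard resetting link|link is slow to respond"

# Check the negotiated SATA link speed; a link that has renegotiated down to a lower
# speed after errors is another sign of a marginal connection
smartctl -a /dev/sdb | grep -i "SATA Version"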