mikkle Posted December 21, 2023

Hello! I have an Unraid array of twelve 14 TB drives on an Adaptec ASR-71605 controller in JBOD mode. The array uses xfs with 2-drive parity, plus a cache pool of two SSDs. I've had this setup for about three years, and the array is about half full.

About two months ago, Unraid raised a warning that Disk 1 had a SMART error (UDMA CRC error count > 30) and disabled the drive. Things were crazy at work and I wasn't able to look into the problem, so I left the array running with Disk 1 disabled (its contents emulated via parity). A few days ago, however, Unraid started complaining about a second drive due to read errors and disabled that drive as well. I ran a SMART self-test on it (I believe it was Disk 7), and it completed successfully with no errors.

At this point I tried to fix the array -- probably incorrectly. I stopped the array, unassigned the two disabled drives, and restarted the array. Once the array was back up, I stopped it one more time and re-added Disk 7 (which seemed to be healthy). I expected the array to be rebuilt from parity without Disk 1; instead, three drives are now showing the status "Unmountable: Unsupported or no file system."

When I SSH into the system I can see all my shares, and a cursory look shows that most of the files are still there and accessible. I can't say for sure whether all the files are there, or whether some are missing because of the unmountable disks. What is the correct way to get the array back to a healthy state, given where things stand?

Note that I updated Unraid from 6.12.3 to 6.12.6 a week ago. This may be significant, because I just noticed today that the release notes mention an "Adaptec 7 Series HBA not compatible" known issue.
Looking at the attached diagnostics, I see a ton of aacraid warnings such as "Host adapter abort request" and "Outstanding commands." I wonder whether this is the source of the problems above, and whether I should roll back to Unraid 6.12.4 before doing anything else. Thanks in advance for your help!

diagnostics.zip
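For anyone following along: the CRC counter mentioned above can be read directly from the drive with smartctl. A minimal sketch of pulling that one attribute out of `smartctl -A` output; the device name and the sample line below are illustrative assumptions, not taken from this system:

```shell
# On a live system you would run something like:
#   smartctl -A /dev/sdb | awk '$2 == "UDMA_CRC_Error_Count" {print $NF}'
# The parsing is demonstrated here on a sample attribute line, which is
# typical of smartctl -A output but NOT taken from this machine.
sample='199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 34'
crc=$(printf '%s\n' "$sample" | awk '$2 == "UDMA_CRC_Error_Count" {print $NF}')
echo "UDMA CRC errors: $crc"
```

A rising UDMA CRC count usually points at the cable or backplane connection rather than the drive itself, so reseating or replacing the cable is worth trying before condemning the disk.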
JorgeB Posted December 21, 2023

First thing to do is downgrade back to 6.12.4; there's a known kernel issue with those controllers (see the release notes). Then post new diags after array start.
mikkle (Author) Posted December 21, 2023

Thanks, Jorge! I downgraded to 6.12.4 and ran the diagnostics again; I've attached them to this post. The syslog looks much cleaner, but the same drives are still showing as unmountable after the downgrade. Thanks again.

diagnostics.zip
JorgeB Posted December 21, 2023

Check the filesystem on the affected disks; run it without -n.
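On Unraid this is done per disk with the array started in maintenance mode, either from the WebGUI (Disk settings, Check Filesystem Status) or the console. A console sketch, assuming the unmountable slots are Disks 1, 5 and 7; the commands are printed rather than executed here because the md device numbers are assumptions (match them to the slot numbers in the WebGUI, and note that depending on the Unraid release the device may be /dev/mdX or /dev/mdXp1):

```shell
# Print the check/repair command pair for each assumed unmountable slot,
# rather than running anything destructive. Drop the echo to run for
# real: -n first (read-only check, reports problems only), then the same
# command without -n to actually repair, as suggested above.
for slot in 1 5 7; do
  echo "xfs_repair -n /dev/md${slot}"   # read-only check
  echo "xfs_repair /dev/md${slot}"      # actual repair (no -n)
done
```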
mikkle (Author) Posted December 21, 2023

Thanks! Here is the output of the Check Filesystem command, run from the WebGUI without the -n parameter.

Disk 1:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair. Note that destroying
the log may cause corruption -- please attempt a mount of the filesystem
before doing this.

Disk 5:

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)
ERROR: The log head and/or tail cannot be discovered. Attempt to mount the
filesystem to replay the log or use the -L option to destroy the log and
attempt a repair.

I don't understand the specifics, but neither of these sounds too promising.
JorgeB Posted December 21, 2023 (Solution)

Use -L, it's usually OK.
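For anyone finding this thread later, a sketch of that final step. The commands are printed rather than executed here, since the device name /dev/md1 is an assumption; note that -L zeroes the journal, so any metadata changes still sitting in the log are discarded:

```shell
# The sequence the xfs_repair error message itself recommends: try a
# normal mount first (a successful mount replays the log with no loss),
# and only if that fails, zero the log with -L and repair. Shown as
# printed strings because the device and mountpoint are examples.
cmds="mount /dev/md1 /mnt/test    # try to let the log replay first
xfs_repair -L /dev/md1            # last resort: destroy the log, then repair"
printf '%s\n' "$cmds"
```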
mikkle (Author) Posted December 24, 2023

Thanks Jorge, I appreciate your help!