As a long-time Unraid user (over a decade now, and loving it!), I rarely have issues (glossing right over those Ryzen teething troubles). It is with that perspective that I want to report major issues with 6.9.2.
I'd been hanging on to 6.8.3, avoiding the 6.9.x series because the bug reports seemed scary. I read up on 6.9.2 and finally decided that, with two dot-dot patches out, it was time to try it. My main concern was that my two 8 TB Seagate IronWolf drives might hit this issue:
I had a series of unfortunate events that made it extremely difficult to figure out what transpired, and in what order, so I'll just lay it all out. I'd been running 6.9.2 for almost a week and felt I was in the clear; I hadn't noticed any drives going offline.
Two nights ago (4/27), my power strip somehow turned off; either its circuit protection kicked in, or possibly a dog stepped on the power button. Either way, I didn't discover it before my UPS was depleted and the server shut itself down.
Yesterday, after getting the server started up again, I was surprised to see red X's next to my two IronWolf drives, indicating they were disabled. I troubleshot this for a while and found nothing in the logs. It's possible that a Mover run I kicked off manually yesterday (which would have been writing to these two drives) caused them to go offline on spin-up (per the issue linked above), and that the subsequent power failure cost me the logs of that event. [NOTE: I've since discovered that the automatic powerdown when the UPS gave out was a forced one, which triggered diagnostics collection, so those logs weren't lost after all. Diagnostics attached!!!]
I was concerned that the Mover task had written the latest data only to the emulated array, so a rebuild seemed the right path forward to make sure I didn't lose any data. I had to jump through hoops to get Unraid to attempt to rebuild data to these two drives: apparently you have to unassign them, start and then stop the array, then reassign them, before Unraid will offer the option to rebuild. Just a critique from a long-time user: this was not obvious, and it seems like there should be a button to force a drive back into the array without all these obstacles. Anyway, now to the real troubles. Luckily, I only have two IronWolf drives, and with my dual parity (thanks LimeTech!!!), this was a recoverable situation.
The rebuild only made it to about 46 GB before stopping. Unraid appeared to think the rebuild was still progressing, but it was obviously stalled. I quickly scanned through the log and found no errors, just lots of warning call traces from the (tainted) swapper task. At this point I discovered that even though the GUI was responsive (nice work, GUI gang!), the underlying system was pretty much hung. I couldn't pause or cancel the data rebuild, and I couldn't powerdown or reboot, not through the GUI and not through the command line; issuing a command in the terminal would hang the terminal. Through the console I issued a powerdown, and after a while it said it was forcing the shutdown, but it hung on collecting diagnostics. I finally resorted to the 10-second power-button press to force the server off (so diagnostics from that hang are missing).
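In hindsight, there was one thing I might have been able to capture before the hard power-off: a stalled rebuild like this usually leaves tasks parked in uninterruptible sleep (the "D" state), and since listing them only reads /proc, it might have worked even while everything else hung. A minimal sketch of the idea (untested during the actual hang, and assuming a standard Linux /proc layout):

```python
# Minimal sketch: list tasks stuck in uninterruptible sleep ("D" state),
# the classic signature of I/O that will never complete.
# Assumes a standard Linux /proc layout; run as root to see everything.
import os

for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        continue  # process exited while we were scanning
    # The comm field is wrapped in parentheses and may contain spaces,
    # so split around the last ')' rather than naively on whitespace.
    comm = data[data.index("(") + 1 : data.rindex(")")]
    state = data[data.rindex(")") + 2 :].split()[0]
    if state == "D":
        print(pid, comm)
```

Anything blocked on array I/O, presumably including the rebuild thread itself, should show up in that list.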
I decided that the issue could be those two IronWolf drives, and since I had two brand-new Exos drives of the same capacity, I swapped those in and started the data rebuild with them instead. I tried this twice, and the rebuild never made it past about 1% (an ominous 66.6 GB was the most it rebuilt).
At this point, I really didn't know whether I had an actual hardware failure (the power strip was still on my mind) or a software issue, but with a dual-drive failure and a fully unprotected 87 TB array, I felt more pressure to resolve the issue quickly than to gather more diagnostics (sorry, not sorry). So I rolled back to 6.8.3 (so glad I made that flash backup; I really wish there were a restore function) and started the data rebuild again last night.
This morning, the rebuild is still running great after 11 hours. It's at 63% complete and should wrap up in about 6.5 hours based on the rate so far. So something changed between 6.8.3 and 6.9.2 that causes this specific scenario to fail.
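(That 6.5-hour figure is just a straight-line extrapolation from the progress so far, for example:)

```python
# Straight-line ETA from the numbers above: 63% done after 11 hours.
elapsed_hours = 11.0
done = 0.63
remaining_hours = elapsed_hours * (1.0 - done) / done
print(f"~{remaining_hours:.1f} hours to go")  # ~6.5
```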
I know a dual-drive rebuild is a pretty rare event, and I don't know whether it has received adequate testing on 6.9.x. The Seagate IronWolf issue is bad enough, but that's a known problem with multiple forum topics and possible workarounds. The complete inability to rebuild data to two drives simultaneously seems like a new and very big issue, and it persisted even after the IronWolf drives were removed.
I will tentatively offer that I may have done a single-drive rebuild on 6.9.2, upgrading a drive from 3 TB to an 8 TB IronWolf. Honestly, I can't recall now whether I did this before upgrading to 6.9.2 or after, but I'm pretty sure it was after. So on my system, I believe a single-drive rebuild worked, and only the dual-drive rebuild was failing.
I know we always get in trouble for not including diagnostics, so I am including a few files:
The 20210427-2133 diagnostics are from the forced powerdown two nights ago, on 6.9.2, when the UPS ran out of juice and before I discovered that the two IronWolf drives were disabled. Note: the drives might already show as disabled in these diags; I have no idea what to look for in there.
The 20210420-1613 diagnostics are from 6.8.3, the day before I upgraded to 6.9.2. I think I hit the diagnostics button by accident, but figured it can't hurt to include them.
And finally, the 20210429-0923 diagnostics are from right now, after downgrading to 6.8.3 and with the rebuild still in progress.
Paul
tower-diagnostics-20210427-2133.zip
tower-diagnostics-20210429-0923.zip
tower-diagnostics-20210420-1613.zip