rookie701010 Posted December 4, 2022

Hi there, this seems to be a GUI-related trap (bug?). I added a 4TB drive to my array; the parity disk is 18TB. The drive was zeroed, I formatted it with XFS, and everything was okay. No user data on it.

Then I decided to upgrade my SSD cache pool (went fine) and to replace the 4TB disk with an 8TB one (plus an additional fan for better airflow). The system restarted, I swapped the disk in the array, and the rebuild started: no pre-clearing, no formatting beforehand. The information on unraid/main showed 8TB free on this disk, everything fine.

The process crashed reliably, twice. The whole system went unresponsive: no screen output, not reachable over the network. The way out was to erase the disk (with the array not started) and then start the array; the disk then gets formatted, and afterwards the rebuild starts.

This seems like a handling issue: a rebuild onto a replacement disk of a different size should only start after formatting. I'm running Unraid version 6.11.3.

With best regards, rookie701010
itimpi Posted December 4, 2022

1 hour ago, rookie701010 said: Rebuild on a replaced disk with different size should only start after formatting.

A rebuild overwrites all sectors, so formatting the disk first would be pointless.
rookie701010 Posted December 4, 2022 (edited)

16 minutes ago, itimpi said: Rebuild overwrites all sectors so formatting a disk would be pointless.

Then it shouldn't crash, and handling it as a 4TB disk would be acceptable (not optimal, but okay). There was no data on the original, yet the rebuild crashed... which poses some interesting questions. Rebuilding an erased and formatted disk is just rebuilding parity, with *exactly the same visuals* as restoring a disk. Or am I missing something? The I/O stats say that the replaced disk is being written to, which would imply restoring the data and the 4TB file system. This looks inconsistent.

Edited December 4, 2022 by rookie701010
itimpi Posted December 5, 2022

4 hours ago, rookie701010 said: Then it shouldn't crash and handle it as a 4TB disk (not optimal, but okay).

The rebuild should start by restoring the original 4TB file system to the new disk. If that completes successfully, Unraid will mount the drive and expand the file system to fill the whole disk.
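For context on what that expansion step involves under the hood, here is a sketch of the equivalent manual commands, assuming the rebuilt data disk is mounted at /mnt/disk1 (a hypothetical mount point; not something to run while Unraid is managing the array itself):

```shell
# Show the current XFS geometry (block and AG counts) of the mounted disk
xfs_info /mnt/disk1

# Grow the file system to fill the now-larger partition.
# xfs_growfs works on a *mounted* XFS file system; with no size
# argument it expands to the full size of the underlying device.
xfs_growfs /mnt/disk1
```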
rookie701010 Posted December 5, 2022

Well, something is going horribly wrong with that, now for the third time. I'm currently rebuilding parity after doing the drive removal the documented way. It will take some time, but three crashes in a row is a bit unsettling.
JorgeB Posted December 5, 2022

4 hours ago, rookie701010 said: Well, something is going horribly wrong with that

If you didn't reboot yet, post the diagnostics.
rookie701010 Posted December 5, 2022 (edited)

By "horribly wrong" I mean completely unresponsive: no network, no console. So a hard reset is the only way to re-awaken the box. Maybe I can set up some forensics, like a dmesg -W in an SSH terminal on another server, and hope that something shows up. For now, parity is rebuilding and the new disk is being pre-cleared. I would need to duplicate this on another setup.

Edited December 5, 2022 by rookie701010
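The remote-forensics idea sketched here can be done with a single SSH session from another machine, assuming root SSH access to the Unraid box (hypothetically named tower here):

```shell
# Follow kernel messages live from another machine; tee keeps a copy
# on the observing host, so the last lines survive the crash.
ssh root@tower 'dmesg --follow' | tee unraid-dmesg.log
```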
JorgeB Posted December 5, 2022

You can enable the syslog server.
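In the Unraid GUI this lives under Settings > Syslog Server; the same effect can also be had with a plain rsyslog forwarding rule on the crashing box, pointing at another machine that stays up. A minimal sketch, assuming 192.168.1.50 is the (hypothetical) address of the receiving server:

```
# /etc/rsyslog.d/remote.conf on the box being debugged
# "@" forwards via UDP, "@@" via TCP; 514 is the standard syslog port
*.* @192.168.1.50:514
```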
rookie701010 Posted December 6, 2022

Okay, rsyslog is enabled and appears to be working. The parity rebuild also ended in a hard crash. Now Unraid is in "zombie" mode, with VMs running and a stale configuration, while the parity check progresses. But now we have a log 👯♀️ The array shows as not started but still provides its services... anyway, let's see what the parity check does.
rookie701010 Posted December 6, 2022

Hmm, the parity check went through pretty fast. Now everything is normal. Next up: add the pre-cleared drive 😈
rookie701010 Posted December 6, 2022

... aaand it worked. No idea what caused the hiccup, and unfortunately, no diagnostics of the crash. Maybe I can reproduce it on a different box; that needs to be set up first, though.
rookie701010 Posted December 20, 2022 (Solution)

Update: the crashes kept coming, and as it almost always is in these cases, it turned out to be hardware related. The culprit is (pretty surely) the RAM; I just changed it to Kingston HyperX Fury/Renegade 3600, also 128GB. Since the change required some disassembly, I also upgraded the CPU to a Ryzen 9 3950X. What's not to like? Since this box runs VMs and containers, more cores are a good thing.

Why am I so sure about the RAM: I had similar issues with this kit in completely different hardware, after 18 months, so there appears to be a degradation issue. I changed everything (!) else in the box: same behaviour. Changed the RAM: stable... although the MSI boards seem to have ageing effects, too.

I will close this issue now; it is at least linked to the hardware problem. There was no useful info in the rsyslog, by the way: the last entry was some hourly cron job, then a completely unresponsive box.
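As a closing footnote for anyone with similar symptoms: a bad DIMM can often (though not always) be confirmed from a running Linux box with memtester, or more thoroughly with MemTest86 from a boot stick. A minimal sketch, assuming the memtester package is installed:

```shell
# Lock and test 2 GiB of RAM for 3 passes; memtester exits non-zero
# if any of its pattern tests fail. Test as much free RAM as possible.
memtester 2048M 3
```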