River_Tahm Posted April 1

I had a disk go into an error state a few days ago, but I went through the spaceinvaderone video on XFS repair, ran SMART tests and all that jazz, and I can't actually find anything wrong with the disk. No errors or anything, so I re-added it to the array and started a parity rebuild of the drive data. Standard procedure, per my understanding - and I got through all that without any issues or questions.

However, I keep coming back to my server to check on the status of that parity rebuild, only to find the array is stopped and the parity rebuild never completed. It has happened 3 times now. No idea what's going on.

I do see a lot of weird errors in the syslog about a USB device not responding, and I do see some warnings in the GUI notifications that UPS communication is dropping in and out. My server is connected to its UPS via a USB cable, so I wonder if that could be related - however, other things plugged into that UPS appear fine, so I don't think the UPS is failing to supply power. And if that's not the issue, I'm not sure what else to do with the UPS to troubleshoot it.

I am also on a new USB boot drive. The previous one was the original drive I built the first iteration of my Unraid server with in like 2017, and it finally gave up the ghost. But I got a USB drive recommended for Unraid by spaceinvaderone - I don't think that's what's causing the USB errors.

I added a Coral m.2 TPU and 2x80mm exhaust fans recently. The Coral seems to be working fine, and I mention the exhaust fans because I saw in another thread that Squid mentioned CPU overheating can cause this kind of random shutdown. My CPU hasn't overheated before, and it has more fans now, so I'm pretty sure that's not it. The new fans went in because the drives can run a little hot on hot days, but recent ambient temperature has been low.

All told, I have a fair number of recent changes that make it a little harder to troubleshoot, and I have yet to find the actual crash or shutdown or whatever is happening in the logs. Still looking - I have a feeling the answer is in here somewhere - I'm just not super familiar with the log format and haven't tracked it down yet.

Diagnostics attached. Let me know if y'all have any ideas - thank you!

Attachment: greenplanet-diagnostics-20240401-0755.zip
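For anyone else hunting for a shutdown in a syslog they aren't familiar with, a small keyword filter can narrow things down. The following is only a minimal sketch: the function name `find_suspect_lines` and the keyword list are assumptions made for illustration (loosely based on the UPS and USB symptoms described above), and the sample lines are made up rather than copied from a real Unraid log.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical keyword list; adjust it to whatever your own syslog contains.
KEYWORDS = ("ups", "usb", "shutdown", "unclean", "power")


def find_suspect_lines(
    lines: Iterable[str], keywords: Iterable[str] = KEYWORDS
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing a keyword.

    Matching is case-insensitive and on whole words only, so "UPS" matches
    but "backups" does not.
    """
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    hits = []
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Made-up sample lines for demonstration, not real Unraid output.
    sample = [
        "example: UPS communication lost",
        "example: parity sync running",
        "example: usb device not responding",
        "example: backups finished",
    ]
    for number, text in find_suspect_lines(sample):
        print(f"{number}: {text}")
```

To run it against a real log, open the file with `open(path, errors="replace")` and pass the file object as `lines`; `path` here stands for wherever your own syslog copy lives.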
JorgeB Posted April 1

Enable the syslog server and post that after a crash.
River_Tahm (Author) Posted April 1

Syslog server is enabled and it is set to copy to flash on shutdown. I don't see the "syslog-previous.txt" file referenced in the tooltips and I'm not sure if that's indicative of a problem of some kind, but there is a syslog file in the diagnostics I posted.
JorgeB Posted April 1

1 hour ago, River_Tahm said: "syslog-previous.txt"

It should be there once you reboot.
River_Tahm (Author) Posted April 1

Gotcha! I'm hesitant to interrupt the parity rebuild for a manual reboot - it takes 24h+ to complete and I'm 1/3 of the way through here (and hoping it doesn't crash again before it finishes this rebuild).

I also want to clarify that the server started itself back up automatically after the crash - it just wouldn't start the array, on account of the error-state drive needing a parity rebuild (that counts as a "configuration change" and disables autostart). So, if I'm understanding correctly, the server has technically been restarted since the crash, and I'm not sure that rebooting now will give us the time window in syslog-previous.txt that we're looking for. I'd like to confirm that before considering interrupting a parity rebuild.

I do see references to an unclean shutdown in the syslog contained in my diagnostics, which makes me think maybe there is somehow a power issue forcing the server to shut down?
River_Tahm (Author) Posted April 2

Alright, well, it crashed again, so I don't have to worry about the parity rebuild progress (sadly). I did a manual reboot, and here's the syslog-previous I got from it.

Attachment: syslog-previous
JorgeB Posted April 2

There's no parity check logged in that syslog. Look in the flash drive's /logs folder for a complete copy of the persistent log.
River_Tahm (Author) Posted April 2

There isn't anything in /boot/logs other than the diagnostics I've generated trying to troubleshoot this:

# ls -lah /boot/logs
total 656K
drwx------  2 root root  16K Apr  1 17:50 ./
drwx------ 10 root root  16K Dec 31  1969 ../
-rw-------  1 root root 224K Apr  1 07:56 greenplanet-diagnostics-20240401-0755.zip
-rw-------  1 root root 176K Apr  1 16:55 greenplanet-diagnostics-20240401-1655.zip
-rw-------  1 root root 217K Apr  1 17:01 greenplanet-diagnostics-20240401-1701.zip

I've now triple-checked that syslog server is indeed enabled so I have now tried disabling syslog rotation in the hopes that helps?
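The same "is there anything here besides diagnostics?" check can be scripted. This is a rough sketch only: the function name `non_diagnostics_files` is hypothetical, and the rule for recognising a diagnostics archive (a `.zip` whose name contains `-diagnostics-`) is an assumption drawn from the three file names in the listing above.

```python
from pathlib import Path
from typing import List


def non_diagnostics_files(folder: str) -> List[str]:
    """Return sorted names of regular files in `folder` that are not diagnostics zips.

    Assumption: a diagnostics archive is any .zip whose name contains
    "-diagnostics-", as in the listing above.
    """
    names = []
    for entry in Path(folder).iterdir():
        is_diagnostics = "-diagnostics-" in entry.name and entry.suffix == ".zip"
        if entry.is_file() and not is_diagnostics:
            names.append(entry.name)
    return sorted(names)
```

Calling `non_diagnostics_files("/boot/logs")` on the server would return an empty list for the listing above, which matches the observation that no persistent syslog copy had been written there.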
itimpi Posted April 2

16 minutes ago, River_Tahm said: I've now triple-checked that syslog server is indeed enabled so I have now tried disabling syslog rotation in the hopes that helps?

It might be worth posting a screenshot of the syslog server settings just in case there is something you missed?
River_Tahm (Author) Posted April 3

Sure! Here's what I have for syslog server settings.

Also, I've successfully completed the parity rebuild without further crashes! It's hard to say for sure that I 100% fixed it, given the crashes were random and I might just be on a lucky streak, but the change that seems to have helped was removing the UPS communication cable.

The cable has an RJ45-style connector on the UPS end (USB on the Unraid end), and I noticed the stay clip on the RJ45 was busted. This probably meant the cable had taken a hit at some point, and I thought perhaps the damaged cable was disrupting Unraid's communication with its UPS. Because my Unraid server is configured to shut itself down when the UPS battery only has a few minutes of runtime left, it's possible the disruptions in UPS communication were causing the server to shut down.
JorgeB Posted April 3

Mirror syslog to flash is not enabled, and the remote syslog server is not set, so that also won't work. It needs to be set to the local server's IP, or you can just enable the mirror to the flash drive.
River_Tahm (Author) Posted April 3

Weird! I only see references in the syslog server documentation to it being for a remote machine, nothing about setting it to the local server IP.

Quote:
Remote Syslog Server: This is used when you have another machine on your network that is acting as a syslog server. This can be another Unraid server. You can also use virtually any other computer. You find the necessary software by googling for the syslog server of that computer's operating system. After you have set up the computer/server, you fill in the computer/server name or the IP address. (I prefer to use the IP address as there is never any confusion about what it is.) Then click on the 'Apply' button and your syslog will be mirrored to the other computer. The other computer has to be left on continuously until the problem occurs. The events captured will only start with the point at which the syslog daemon is started during the boot process thus missing the very start of the boot process.

But sure, I can add the Unraid server's LAN IP there instead of leaving it blank. Thanks!
River_Tahm (Author) Posted April 9

Thanks for the help! I haven't had another crash since I removed the UPS communication cable entirely, so as far as I can tell, that was the issue. My server is configured to shut itself down proactively if the UPS is running low on battery, so I can see how the comms cable going bad could've caused shutdown issues.

This is probably the last update to the thread, just for posterity's sake in case somebody finds it while searching for a similar issue and wants to know how I fixed it. Thanks again to all who helped!