KDP Posted August 5, 2023

As of last night I am having random shutdowns, and I want to check whether I am troubleshooting efficiently. I originally built my server over a decade ago. Two years ago I changed the case, power supply and fans. I am currently running UNRAID 6.11.5.

Last night I was downloading a large series of files and adding them to my array when my server first shut down. When I turned it back on and logged in, I received a notice that the unassigned device I use for all of my extracting had returned to normal temperature. I assumed this was the reason the server shut down and continued on with my evening. My downloading and extracting completed without any further issues and I stopped worrying.

About an hour later my server shut down again. I started it back up and did not see any notices that would have indicated what had happened, so I turned on the syslog server to monitor for further issues. A couple of hours later the server was still running and I went to bed. I woke up this morning and the server was powered down again. I looked in the syslog server and there is nothing but time entries since my last action before bed (I changed a share from public to private).

I then looked into my IPMI while the server was down. The only entries I have are months old (probably from my last reboot prior to the current problem) and they mention FAN 1:

30009 2023/05/01 20:23:03 FAN 1 Fan Lower Non-Critical - Going Low - Asserted
30010 2023/05/01 20:23:03 FAN 1 Fan Lower Critical - Going Low - Asserted
30011 2023/05/01 20:23:03 FAN 1 Fan Lower Non-Recoverable - Going Low - Asserted
30012 2023/05/01 20:24:36 FAN 1 Fan Lower Non-Recoverable - Going Low - Deasserted
30013 2023/05/01 20:24:36 FAN 1 Fan Lower Critical - Going Low - Deasserted
30014 2023/05/01 20:24:36 FAN 1 Fan Lower Non-Critical - Going Low - Deasserted

This error does not appear for the two power-ups after the shutdowns.
All other temperature and voltage readings are green (good).

So I decided to open up the server, blow the dust out and make sure everything was seated, and everything seemed well seated. I started up the server again and watched the fans. One CPU fan looks abnormal at the low RPM it runs at during boot: it would spin smoothly, hitch for a fraction of a second, then spin smoothly again. Once the server had booted and the RPM increased, it ran fine. For peace of mind I will be ordering a replacement.

I checked the IPMI again and see new entries, which I assume refer to the fan in question:

30015 1970/01/14 19:47:42 FAN 1 Fan Lower Non-Critical - Going Low - Asserted
30016 1970/01/14 19:47:42 FAN 1 Fan Lower Critical - Going Low - Asserted
30017 1970/01/14 19:47:43 FAN 1 Fan Lower Non-Recoverable - Going Low - Asserted
30018 1970/01/14 19:50:04 FAN 1 Fan Lower Non-Recoverable - Going Low - Deasserted
30019 1970/01/14 19:50:55 FAN 1 Fan Lower Critical - Going Low - Deasserted
30020 1970/01/14 19:50:55 FAN 1 Fan Lower Non-Critical - Going Low - Deasserted
30021 1970/01/14 19:52:36 FAN 1 Fan Lower Non-Critical - Going Low - Asserted
30022 1970/01/14 19:53:30 FAN 1 Fan Lower Non-Critical - Going Low - Deasserted

I am not sure whether the 1970 timestamps are simply because the clock had not yet synced with an NTP server. For peace of mind, again, I will be replacing the CMOS battery. That said, the time page in the IPMI now shows the correct time.

I have included my diagnostics file, although it was taken after the reboot and will likely give no useful information. The server is again running a parity check, as it does after an unclean shutdown, and everything is in the green both in the UNRAID dashboard and in the IPMI.

Should I be looking at anything else or performing any other diagnostics?

elvis-diagnostics-20230805-1120.zip
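For anyone wanting to go through event-log entries like the ones quoted above without reading them line by line, here is a minimal Python sketch. It assumes the lines look exactly like the entries pasted in this post (ID, date, time, sensor name, the word "Fan", event text, then Asserted/Deasserted). The function names `parse_sel`, `pair_events` and `clock_unset` are made up for illustration and are not part of any IPMI tool. It pairs each assertion with its matching deassertion and flags entries whose timestamp predates a cutoff year, which suggests the BMC clock was not set when the event was logged.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the entries quoted in the post:
#   <id> <YYYY/MM/DD> <HH:MM:SS> <sensor> Fan <event> - <Asserted|Deasserted>
SEL_LINE = re.compile(
    r"^(?P<id>\d+)\s+(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<sensor>.+?)\s+Fan\s+(?P<event>.+?) - (?P<state>Asserted|Deasserted)$"
)

SAMPLE = """\
30015 1970/01/14 19:47:42 FAN 1 Fan Lower Non-Critical - Going Low - Asserted
30016 1970/01/14 19:47:42 FAN 1 Fan Lower Critical - Going Low - Asserted
30017 1970/01/14 19:47:43 FAN 1 Fan Lower Non-Recoverable - Going Low - Asserted
30018 1970/01/14 19:50:04 FAN 1 Fan Lower Non-Recoverable - Going Low - Deasserted
30019 1970/01/14 19:50:55 FAN 1 Fan Lower Critical - Going Low - Deasserted
30020 1970/01/14 19:50:55 FAN 1 Fan Lower Non-Critical - Going Low - Deasserted
30021 1970/01/14 19:52:36 FAN 1 Fan Lower Non-Critical - Going Low - Asserted
30022 1970/01/14 19:53:30 FAN 1 Fan Lower Non-Critical - Going Low - Deasserted
"""


def parse_sel(lines):
    """Turn matching log lines into dicts; lines that do not match are skipped."""
    records = []
    for line in lines:
        m = SEL_LINE.match(line.strip())
        if not m:
            continue
        records.append({
            "id": int(m["id"]),
            "time": datetime.strptime(m["ts"], "%Y/%m/%d %H:%M:%S"),
            "sensor": m["sensor"],
            "event": m["event"],
            "asserted": m["state"] == "Asserted",
        })
    return records


def pair_events(records):
    """Match each assertion with the next deassertion of the same sensor/event.

    Returns (pairs, unresolved): pairs are (sensor, event, seconds) tuples in
    the order the deassertions appear; unresolved are assertions never cleared.
    """
    open_events = {}
    pairs = []
    for rec in records:
        key = (rec["sensor"], rec["event"])
        if rec["asserted"]:
            open_events[key] = rec
        elif key in open_events:
            start = open_events.pop(key)
            seconds = (rec["time"] - start["time"]).total_seconds()
            pairs.append((rec["sensor"], rec["event"], seconds))
    return pairs, list(open_events.values())


def clock_unset(record, cutoff_year=2000):
    """True if the timestamp is older than the cutoff (an arbitrary assumption)."""
    return record["time"].year < cutoff_year


if __name__ == "__main__":
    recs = parse_sel(SAMPLE.splitlines())
    pairs, unresolved = pair_events(recs)
    for sensor, event, seconds in pairs:
        print(f"{sensor}: {event} lasted {seconds:.0f}s")
    print("still asserted:", len(unresolved))
    print("entries with unset clock:", sum(clock_unset(r) for r in recs))
```

An assertion that never gets a matching deassertion would show up in the unresolved list, which is the case worth looking at first when chasing an unexplained shutdown.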
itimpi Posted August 5, 2023

57 minutes ago, KDP said: Should I be looking at anything else or performing any other diagnostics?

You should provide the log created by the syslog server so we can see if there is any clue as to what led up to the shutdowns.
KDP Posted August 5, 2023

I woke up at 10 am and restarted the server from an off state. As previously mentioned, there is nothing but time entries from when I went to bed to when I started it up again.

log_1166679578.log
itimpi Posted August 5, 2023

Not seen those ‘syslog time’ entries before - I wonder what causes them.
KDP Posted August 5, 2023

I am not too experienced with syslog. I just have the UNRAID server reporting to a free program I downloaded called win-syslog. Maybe it is some kind of heartbeat?
KDP Posted August 8, 2023

The initial parity check ran, found a few thousand sync errors and fixed them. The second parity check had no errors and I haven't had a shutdown since.