donr Posted October 21, 2019
Hi! I am having an issue with my array. A few months back, I had overheating issues, as mentioned in this thread:
Everything worked for a while, but then I noticed that although my server was up and running, my array was not. This happened every 24 to 48 hours after I restarted the array. I managed to install a plugin that captures my syslog so I could see what is happening. Now I need your help to decode what is going on. I installed the latest Unraid version thinking it would help, but it did not. The server has been stopped since October 9, and I have not had time to address the problem until now. I am attaching my latest syslog.
syslog.zip
donr Posted October 23, 2019 (Author)
Really, no one!!
testdasi Posted October 23, 2019
On 10/21/2019 at 2:23 PM, donr said: ... A few months back, I had overheating issues as mentioned in this thread: ... Everything was working for a while but then, I noticed that although my server was up and running, my array was not. This would happen every 24 to 48 hours apart after I restarted the array. I managed to install a plugin that would capture my syslog to see what is happening. Now, I need your help to decode what is going on. I did install the latest Unraid version thinking this would help but, it did not. The server has been stopped since October 9, and I have not had time to address the problem until now. I am attaching my latest syslog. syslog.zip
You are not describing your issue very clearly. What do you mean by "server was up and running but array was not"? Do you mean that you log in to the server and find your array unexpectedly in a stopped state? If so, that usually means your server rebooted itself when you weren't watching.
The only thing in your log that stands out is this section:
Oct 8 15:52:35 Tower apcupsd[5692]: UPS Self Test switch to battery.
Oct 8 15:52:44 Tower apcupsd[5692]: UPS Self Test completed: Battery OK
Oct 8 19:10:39 Tower kernel: mdcmd (57): spindown 1
Oct 8 19:10:40 Tower kernel: mdcmd (58): spindown 2
Oct 8 19:10:40 Tower kernel: mdcmd (59): spindown 3
Oct 8 19:10:41 Tower kernel: mdcmd (60): spindown 4
Oct 8 19:10:41 Tower kernel: mdcmd (61): spindown 5
Oct 8 19:21:01 Tower kernel: microcode: microcode updated early to revision 0xb4, date = 2019-04-01
Oct 8 19:21:01 Tower kernel: Linux version 4.19.56-Unraid (root@Develop67) (gcc version 8.3.0 (GCC)) #1 SMP Tue Jun 25 10:19:34 PDT 2019
Oct 8 19:21:01 Tower kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
So on Oct 8:
At 15:52, there was a UPS self-test.
At 19:10, spindown commands were issued for your disks.
At 19:21, your server rebooted.
That suggests hardware failure to me. Considering you had an overheating issue in the past, any number of devices could have been damaged, leading to instability. There is not much you can do other than troubleshoot each device. I would suggest starting with the PSU / UPS.
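The log reading in the reply above (a kernel boot banner appearing shortly after ordinary entries) can be automated in a rough way. The following is a minimal Python sketch, not an Unraid tool: the function names `parse_time` and `find_reboots` are made up for illustration, it assumes every line starts with a `Mon D HH:MM:SS` stamp, and it assumes the year, since syslog stamps omit it. It only reports the gap before each boot banner; it cannot tell a clean reboot from a crash.

```python
from datetime import datetime

# Boot banner text as it appears in the excerpt quoted in the thread.
BOOT_MARKER = "kernel: Linux version"


def parse_time(line, year=2019):
    """Parse the leading 'Oct 8 19:21:01' stamp; the year is an assumption."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def find_reboots(lines, year=2019):
    """Return (last_entry_before_boot, boot_time, gap) for each boot banner."""
    reboots = []
    for i, line in enumerate(lines):
        if BOOT_MARKER not in line:
            continue
        boot_time = parse_time(line, year)
        # Heuristic: skip back over entries stamped in the same second as the
        # banner (for example the early microcode message).
        j = i - 1
        while j >= 0 and parse_time(lines[j], year) == boot_time:
            j -= 1
        if j >= 0:
            gap = boot_time - parse_time(lines[j], year)
            reboots.append((lines[j], boot_time, gap))
    return reboots


sample = [
    "Oct 8 19:10:41 Tower kernel: mdcmd (60): spindown 4",
    "Oct 8 19:10:41 Tower kernel: mdcmd (61): spindown 5",
    "Oct 8 19:21:01 Tower kernel: microcode: microcode updated early to revision 0xb4, date = 2019-04-01",
    "Oct 8 19:21:01 Tower kernel: Linux version 4.19.56-Unraid (root@Develop67) (gcc version 8.3.0 (GCC)) #1 SMP Tue Jun 25 10:19:34 PDT 2019",
]

for last_line, boot_time, gap in find_reboots(sample):
    print(f"Boot at {boot_time:%b %d %H:%M:%S}, {gap} after: {last_line}")
```

To run it against a captured log, read the file into a list of non-empty lines and pass that list to `find_reboots` in place of `sample`.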
donr Posted October 24, 2019 (Author)
At the time, I did troubleshoot the PSU/UPS, and I changed the MB, CPU and memory. The only things left unchanged were the disks and PSU. And yes, I did mean that when I opened my server's GUI in a browser, the array was offline. I restarted it yesterday and, judging from the log this morning, it was up for about an hour, maybe less. I included yesterday's syslog. I can't even restart the server now because it can't find the flash drive. Maybe the log will show something, at least I hope.
Syslog
Frank1940 Posted October 24, 2019
See if you can get a Diagnostics file: Tools >>> Diagnostics. Attach it to your next post. It contains much more diagnostic information than the syslog.
Frank1940 Posted October 24, 2019
@donr, do you have any pets or young children who might have pressed the reset button on the computer case? (If there is any possibility of this happening, I would suggest unplugging the leads from the reset switch at the MB!)
donr Posted October 24, 2019 (Author)
I can't even boot the server since it does not detect my flash drive. I posted a diagnostic report that I took when I first noticed I was having this problem, on Sept 28/19. At the time, I had not upgraded to the latest version and was still on 6.6.3. Hope this helps.
PS. I had to zip the report as I could not attach it otherwise because it contains more than 25 files!! No pets, the server is in a large room by itself, and I am the only one who has access.
tower-diagnostics-20190928-0844.zip
Edited October 24, 2019 by donr
Frank1940 Posted October 24, 2019
Oct 8 19:10:41 Tower kernel: mdcmd (61): spindown 5
Oct 8 19:21:01 Tower kernel: microcode: microcode updated early to revision 0xb4, date = 2019-04-01
Oct 8 19:21:01 Tower kernel: Linux version 4.19.56-Unraid (root@Develop67) (gcc version 8.3.0 (GCC)) #1 SMP Tue Jun 25 10:19:34 PDT 2019
The spindown entry is a normal one found in virtually every Unraid syslog, and the entries preceding it are also normal, and... BAM, about ten minutes later, a reboot!!! It has to be a hardware issue. Plus, now there is the issue of the missing flash drive.
I would start with the missing flash drive. I am assuming that your server is currently powered down. (If not, push and hold the power button to force a powerdown.) I would pull the flash drive and run chkdsk on it. If that passes, try to make a backup copy of its contents. (Everyone should have a copy of the contents of the flash drive.) If both of these work, put the flash drive back in and try to boot the server. Report back.
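For the backup step suggested above, copying the files by hand in a file manager is enough. For anyone who wants to confirm that every file was read back intact, here is a minimal Python sketch under stated assumptions: `backup_and_verify` and `sha256_of` are illustrative names, not part of Unraid, and the drive letter and backup folder in the usage note are hypothetical. It copies the drive to a new folder and compares SHA-256 hashes of each original and its copy.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, destination):
    """Copy source into a new destination folder; return files whose copy differs."""
    source, destination = Path(source), Path(destination)
    # copytree raises if the destination already exists or a file cannot be read.
    shutil.copytree(source, destination)
    mismatched = []
    for original in source.rglob("*"):
        if original.is_file():
            copy = destination / original.relative_to(source)
            if not copy.is_file() or sha256_of(original) != sha256_of(copy):
                mismatched.append(str(original.relative_to(source)))
    return mismatched
```

A call might look like `backup_and_verify("E:/", "C:/unraid-flash-backup")`, with both paths adjusted to the actual machine. An empty list means every file matched; an exception during the copy is itself a sign that the drive has read errors.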
Frank1940 Posted October 24, 2019
You are not overclocking?
donr Posted October 24, 2019 (Author)
51 minutes ago, Frank1940 said: You are not overclocking?
No, I am not. I will do as instructed and also run memtest again for 12 hours to see if anything pops up, that is, if I can boot from the flash drive. Will report back when done.
donr Posted October 27, 2019 (Author)
Here is an update. I ran memtest. The next day, I opened the Unraid GUI. The server had rebooted. The flash drive was corrupt, so I ran the repair utility and restarted the server. So far it has been up and running without a glitch, so I will tag this as solved. Thanks to Frank1940 for the help.