(Solved)Array gets booted out


donr


Hi! I am having an issue with my array. A few months back, I had overheating issues as mentioned in this thread:


Everything was working for a while, but then I noticed that although my server was up and running, my array was not. This would happen 24 to 48 hours after each time I restarted the array.
I managed to install a plugin that captures my syslog so I can see what is happening. Now I need your help to decode what is going on. I installed the latest Unraid version thinking this would help, but it did not. The server has been stopped since October 9, and I have not had time to address the problem until now. I am attaching my latest syslog.

syslog.zip

Link to comment
On 10/21/2019 at 2:23 PM, donr said:

...


You are not describing your issue very clearly. What do you mean by the server was up and running but the array was not?

Do you mean that you log in to the server and find your array suddenly in a stopped state?

If that's the case, it usually means your server rebooted itself when you weren't watching.

 

The only thing in your log that stands out is this section:

Oct  8 15:52:35 Tower apcupsd[5692]: UPS Self Test switch to battery.
Oct  8 15:52:44 Tower apcupsd[5692]: UPS Self Test completed: Battery OK
Oct  8 19:10:39 Tower kernel: mdcmd (57): spindown 1
Oct  8 19:10:40 Tower kernel: mdcmd (58): spindown 2
Oct  8 19:10:40 Tower kernel: mdcmd (59): spindown 3
Oct  8 19:10:41 Tower kernel: mdcmd (60): spindown 4
Oct  8 19:10:41 Tower kernel: mdcmd (61): spindown 5
Oct  8 19:21:01 Tower kernel: microcode: microcode updated early to revision 0xb4, date = 2019-04-01
Oct  8 19:21:01 Tower kernel: Linux version 4.19.56-Unraid (root@Develop67) (gcc version 8.3.0 (GCC)) #1 SMP Tue Jun 25 10:19:34 PDT 2019
Oct  8 19:21:01 Tower kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot

So on Oct 08,

  • At 15:52: there was a UPS test
  • At 19:10: spindown commands were issued for your disks
  • At 19:21: your server rebooted

That suggests a hardware failure to me. Considering you had an overheating issue in the past, any number of devices could have been damaged, leading to instability. There is not much you can do other than troubleshoot each device. I would suggest starting with the PSU / UPS.
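For anyone who wants to look for the same pattern in their own syslog, here is a minimal Python sketch. It assumes, based only on the excerpt above, that a new boot begins with the `microcode updated early` line or the `Linux version` banner; the function name `find_reboots` and the marker list are my own and not part of Unraid. It reports the last ordinary entry logged before each boot so you can see whether a clean shutdown was recorded.

```python
# Assumption: these kernel lines mark the start of a boot, as in the excerpt above.
BOOT_MARKERS = (
    "kernel: microcode: microcode updated early",
    "kernel: Linux version ",
)

def find_reboots(lines):
    """Return (last_entry_before_boot, first_boot_line) pairs, one per boot found."""
    reboots = []
    last_normal = None       # most recent line that is not a boot marker
    in_boot_header = False   # True while consecutive boot-marker lines are being read
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if any(marker in line for marker in BOOT_MARKERS):
            if not in_boot_header:
                reboots.append((last_normal, line))
            in_boot_header = True
        else:
            last_normal = line
            in_boot_header = False
    return reboots

# Lines copied from the syslog excerpt in this thread.
sample = """\
Oct  8 19:10:41 Tower kernel: mdcmd (60): spindown 4
Oct  8 19:10:41 Tower kernel: mdcmd (61): spindown 5
Oct  8 19:21:01 Tower kernel: microcode: microcode updated early to revision 0xb4, date = 2019-04-01
Oct  8 19:21:01 Tower kernel: Linux version 4.19.56-Unraid (root@Develop67) (gcc version 8.3.0 (GCC)) #1 SMP Tue Jun 25 10:19:34 PDT 2019
Oct  8 19:21:01 Tower kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
""".splitlines()

reboots = find_reboots(sample)
for before, boot in reboots:
    print("last entry before boot:", before)
    print("first boot entry:      ", boot)
```

If the last entry before a boot is routine activity such as a spindown, with no shutdown messages in between, the reboot was probably not a clean one.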

Link to comment

At the time, I did troubleshoot the PSU/UPS, and changed the MB, CPU and memory. The only things left unchanged were the disks and PSU. And yes, I did mean that when I opened my server's GUI in a browser, the array was offline. I restarted it yesterday and, from the log this morning, it was up for about an hour, maybe less. I included yesterday's syslog. I can't even restart the server because it can't find the flash drive. Maybe the log will show something, at least I hope.

Syslog

Link to comment

I can't even boot the server since it does not detect my flash drive. I posted a diagnostic report that I took when I first noticed I was having this problem on Sept 28/19. At the time, I had not upgraded to the latest version and was still on 6.6.3. Hope this helps.

PS. I had to zip the report as I could not attach it otherwise because it contains more than 25 files!! No pets, and the server is in a large room by itself and I am the only one who has access.

tower-diagnostics-20190928-0844.zip

Edited by donr
Link to comment
Oct  8 19:10:41 Tower kernel: mdcmd (61): spindown 5
Oct  8 19:21:01 Tower kernel: microcode: microcode updated early to revision 0xb4, date = 2019-04-01
Oct  8 19:21:01 Tower kernel: Linux version 4.19.56-Unraid (root@Develop67) (gcc version 8.3.0 (GCC)) #1 SMP Tue Jun 25 10:19:34 PDT 2019

The last entry before the reboot is a normal one found in virtually every Unraid syslog, and the preceding entries are also normal. Then...  BAM, roughly ten minutes later (19:10:41 to 19:21:01), a reboot!!!
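To check the size of that gap from the timestamps themselves, a small Python sketch follows. The helper `syslog_time` is hypothetical, and because classic syslog timestamps carry no year, the year 2019 is an assumption taken from the dates in this thread. Month names are parsed with `%b`, which expects an English locale.

```python
from datetime import datetime

def syslog_time(line, year=2019):
    """Parse the leading 'Mon D HH:MM:SS' of a syslog line (year is assumed)."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")

# The last entry before the reboot and the first entry after it, from the log above.
before = "Oct  8 19:10:41 Tower kernel: mdcmd (61): spindown 5"
after = "Oct  8 19:21:01 Tower kernel: microcode: microcode updated early to revision 0xb4, date = 2019-04-01"

gap = syslog_time(after) - syslog_time(before)
print(gap)  # 0:10:20
```

The gap only shows when logging stopped and resumed, so the actual crash or power loss could have happened anywhere inside that window.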

 

It has to be a hardware issue. Plus, now there is the issue of the missing flash drive.

 

I would start with the missing flash drive. I am assuming that your server is currently powered down. (If not, push and hold the power button to force a powerdown.) I would pull the flash drive, plug it into a Windows PC, and run chkdsk on it. If that passes, try to make a backup copy of its contents. (Everyone should have a copy of the contents of the flash drive.) If both of these work, put the flash drive back in and try to boot the server. Report back.
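For the backup step, simply copying the files to another machine is enough. If you prefer something scripted, here is a minimal Python sketch that zips the contents of a mounted flash drive. The function name `backup_flash`, the archive name, and the example paths are all assumptions for illustration; substitute whatever drive letter or mount point your PC assigns to the flash drive.

```python
import shutil
from pathlib import Path

def backup_flash(source, dest_dir, name="unraid-flash-backup"):
    """Zip everything under `source` into dest_dir/<name>.zip and return its path."""
    source = Path(source)
    if not source.is_dir():
        raise FileNotFoundError(f"flash drive not found at {source}")
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # make_archive appends the ".zip" extension itself and returns the full file name.
    archive = shutil.make_archive(str(dest_dir / name), "zip", root_dir=source)
    return Path(archive)

# Hypothetical usage, assuming the flash drive shows up as E:\ on a Windows PC:
# backup_flash("E:/", "C:/Backups")
```

Keep the destination folder off the flash drive itself, so the backup survives if the drive fails.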

Link to comment

Here is an update. I ran memtest. The next day, I opened the unraid gui. The server had rebooted. The flash drive was corrupt so I ran the repair utility and restarted the server. So far it has been up and running without a glitch so I will tag this as solved. Thanks to Frank1940 for the help.

Link to comment
  • donr changed the title to (Solved)Array gets booted out
