[SOLVED] unRAID 6.9.2 Crashed/Locked-up Overnight & AGAIN just now


[SOLVED] The cause was a RAM failure. I removed the faulty RAM and so far the problem has not recurred.

 

Last night I went to bed with the Plex client on my Nvidia Shield Pro playing music from my media server called AnimNAS (specs in signature). At some point the music stopped but as I was drifting in and out of sleep I just left it until this morning. Upon investigation, the server appeared to have crashed hard. The WebGUI was not accessible and my Plex and other Docker containers were offline. I also couldn't ping the IP and the system was shown as 'offline' in the pfSense firewall (running on a separate PC, not via VM).

 

I went to the system and unfortunately the attached keyboard/mouse wouldn't 'wake' the system. The attached monitor also didn't see any signal. This means I was unable to grab any diagnostics before proceeding with a shutdown and restart attempt. I tried a momentary press of the power switch to try and do a clean shutdown but there was no response from the system.

 

With no other options I then did the 'long press' of the power switch to shut the entire system down. I waited a minute before trying to restart it. It seemed to start normally and the unRAID boot process looked relatively normal from what I could see. All my drives were found and unRAID appeared to boot successfully, albeit with the expected 'unclean shutdown detected' warning.

 

I have my system set to NOT autostart the array upon reboot. During the restart I noticed some messages that I wasn't used to seeing during a reboot. It did appear to start somewhat normally, as I was able to access the system from the WebGUI. Without starting the array, I went to Tools -> Diagnostics and downloaded the diagnostics, which are attached.

 

At the final stage of the boot process, I saw some messages on the monitor attached to the system. The area in the red rectangle shows some of the new messages I hadn't seen before, with an error on the 2nd line.

 

AnimNAS-Error1.thumb.jpg.beef4137716e1ac804b15c465d9162b0.jpg

 

Before starting the array and the parity check (due to the unclean shutdown), I decided to try one more restart of the system. This restart had the same error as shown above, but of course was a proper restart so it cleared the requirement for a parity check. Regardless, I will start a manual parity check but I first want to troubleshoot the errors.

 

I also noticed that my system log is now filling up with messages about my Nvidia card; here's another pic showing the messages that are spamming the syslog:

 

NvidiaError-AnimNAS.thumb.jpg.02a4b603c236d2d4f22b371ee2682c7b.jpg

 

Any recommended steps/actions I should take before starting the array and the parity check? Ideally I'd like to fix any errors and the cause of the syslog spam. Any assistance is appreciated! Thanks in advance...

 

Dale

 

Edited by AgentXXL
Add solution... removed diags after solving
  • AgentXXL changed the title to unRAID 6.9.2 Crashed/Locked-up Overnight & AGAIN just now

UPDATE: It just happened again a few moments ago. The first sign was Plex being unable to find the server. I checked my firewall and the server was again listed as offline. I tried a momentary press of the power button and waited 5 minutes, but there was no shutdown, so I was again forced to do a hard reset.
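For anyone chasing a similar lockup, it can help to record when the server stops answering so the time can be matched against the logs later. Below is a minimal Python sketch, run from another machine on the network. The function names, the port, and the polling approach are my own assumptions for illustration, not anything unRAID provides.

```python
import socket
import time
from typing import Callable, Optional


def tcp_reachable(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Port 80 is an assumption (the WebGUI's default HTTP port); adjust as needed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failure(probe: Callable[[], bool], checks: int,
                  interval: float = 0.0) -> Optional[int]:
    """Call probe() up to `checks` times, sleeping `interval` seconds between calls.

    Returns the zero-based index of the first failed check, or None if all passed.
    """
    for i in range(checks):
        if not probe():
            return i
        time.sleep(interval)
    return None
```

A call such as `first_failure(lambda: tcp_reachable("192.168.1.10"), checks=720, interval=5)` would poll for about an hour; the IP address here is only a placeholder. Printing `time.ctime()` when it returns gives a rough timestamp for the lockup.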

 

UPDATE 2: It crashed/locked up again within minutes of the last hard reset. I've now started the system in safe mode with no plugins loaded. If I can't retrieve the logs from the Flash drive before it crashes again, I'll try exploring the Flash drive on another computer.

 

UPDATE 3: After booting into safe mode I've encountered yet another lockup. It doesn't seem to be related to a plugin. I've disabled both the Docker and VM services. I was unable to retrieve the logs from the Flash drive before it locked up again. I'm going to start with Memtest86 and see if that shows any issues.

 

I did set my syslog to mirror to the Flash device so I'll check that shortly and see if I can find the logs. If found, I'll attach them to this message.
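If the mirrored syslog does survive on the Flash drive, the lines just before each reboot are usually the interesting ones. Here is a rough Python sketch that pulls them out. It assumes each boot starts with the kernel's "Linux version" banner and that the mirrored file sits at something like `/boot/logs/syslog`; both are assumptions on my part, so check your own flash drive for the actual path.

```python
from collections import deque
from typing import Iterable, List

# Assumption: the kernel banner containing this text marks the start of each boot.
BOOT_MARKER = "Linux version"


def lines_before_each_boot(lines: Iterable[str], context: int = 20) -> List[List[str]]:
    """For every boot banner that has earlier lines before it, return the
    last `context` lines that were logged ahead of that banner."""
    recent = deque(maxlen=context)
    tails = []
    for line in lines:
        line = line.rstrip("\n")
        if BOOT_MARKER in line and recent:
            tails.append(list(recent))  # what was logged right before this boot
            recent.clear()
        recent.append(line)
    return tails


def print_crash_tails(path: str, context: int = 20) -> None:
    """Print the tail of the log preceding each boot found in the file at `path`."""
    with open(path, errors="replace") as fh:
        for i, tail in enumerate(lines_before_each_boot(fh, context), 1):
            print(f"--- last lines before boot #{i} ---")
            print("\n".join(tail))
```

Calling `print_crash_tails("/boot/logs/syslog")` (path assumed, as noted) would show what was written just before each hard reset. With a hard lockup like this one, the tail may well show nothing unusual, which in itself points toward hardware.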

 

😢

Edited by AgentXXL
Another update x 3
9 hours ago, AgentXXL said:

messages about my Nvidia card

 

That's a feature of the closed source Nvidia driver being called every second by the GPU Stats plugin. It's discussed in the support threads for both. The easiest way to get rid of it is to uninstall the latter.
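To confirm which message is actually flooding the log, counting repeats is usually enough. This is a small Python sketch, assuming the traditional syslog layout of month, day, time, and hostname followed by the message; the function name is made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_repeated_messages(lines: Iterable[str], limit: int = 5) -> List[Tuple[str, int]]:
    """Count syslog messages after stripping the timestamp and hostname.

    Assumes lines look like 'Mon DD HH:MM:SS host message...'; lines that
    do not have at least those five fields are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(None, 4)  # month, day, time, host, rest of the line
        if len(parts) == 5:
            counts[parts[4].strip()] += 1
    return counts.most_common(limit)
```

Feeding it an open syslog file, for example `top_repeated_messages(open("/var/log/syslog"))`, lists the most frequent messages with their counts, so a once-per-second message stands out right away.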

  • AgentXXL changed the title to [SOLVED] unRAID 6.9.2 Crashed/Locked-up Overnight & AGAIN just now

As an electronics tech who has worked in essentially an IT role for most of my career, I often tell friends/family/clients to do a RAM test if you start seeing crashes/lockups that started 'out of the blue'. I followed that advice myself after my panic posting here and in a FB group and sure enough, the issue appears to be a RAM failure. Memtest86 (v5.01 on the unRAID flash device) found errors almost immediately.

 

Through a process of elimination I've tracked it down to 1 of the 4 x 16GB sticks in my system. As my system doesn't support operation with 3 DIMMs, I've pulled one of the others so I'm only running 2 DIMMs. I can limp along with the 32GB of RAM just fine until I get the bad stick replaced (still under warranty). Regardless, I still have the error noted in my first message just as unRAID finishes its bootup. I'll hopefully find a cause for that as well.

 

At least 'panic mode' is over.... a Sunday night with no Plex would have left me with a few messages from friends and family saying they can't seem to watch their shows! 😁

25 minutes ago, John_M said:

 

That's a feature of the closed source Nvidia driver being called every second by the GPU Stats plugin. It's discussed in the support threads for both. The easiest way to get rid of it is to uninstall the latter.

 

It's interesting to note that after finding the memory fault mentioned above, these messages are no longer showing up in the syslog. Hopefully that will remain the case and/or the author of the GPU Stats plugin is eventually able to correct it.

 

Thanks!
