
Indicative of hardware failure?



Hello all, 

 

As mentioned in a previous post, I recently upgraded my system to newer, better hardware. Ever since, I have been plagued by crashes. I believe I have sorted out many of the crashes related to software configuration, since the system now crashes only about once a day instead of every 3-4 hours, but this next error has me stumped. 

 

I am seeing numerous segfaults in the logs starting at about 3:00 am, and then resuming at 9:00 am. 

 

Things I have tried:

- Ran a memtest; so far I've only had passing results

- Updated BIOS to the newest release (required for mobo to be compatible with my CPU)

 

See syslogs attached. Due to the regularity of the crashes, I've set up a remote syslog server so I don't burn out my USB stick. The attached file is an export from that remote system. Apologies in advance for the different format. 
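
In case it helps anyone checking whether the segfaults really cluster around 3:00 am and 9:00 am, here is a rough Python sketch that tallies them by hour and process name. It makes two assumptions that are not confirmed by the export: that each row still contains the usual kernel text `name[pid]: segfault at ...`, and that each row carries an `HH:MM:SS` time somewhere. The function name `segfaults_by_hour` is made up for illustration.

```python
import re
from collections import Counter

# Kernel segfault messages normally read "<name>[<pid>]: segfault at <addr> ip ...".
SEGFAULT_RE = re.compile(r"(?P<proc>[\w.\-]+)\[\d+\]: segfault at ")

# Assumption: every exported row has an HH:MM:SS time in it, in any date format.
TIME_RE = re.compile(r"(?<!\d)(?P<hour>[01]\d|2[0-3]):[0-5]\d:[0-5]\d")


def segfaults_by_hour(lines):
    """Count segfault lines per (hour, process name).

    The hour is None for rows where no HH:MM:SS time could be found.
    """
    counts = Counter()
    for line in lines:
        seg = SEGFAULT_RE.search(line)
        if not seg:
            continue  # not a segfault message
        ts = TIME_RE.search(line)
        hour = int(ts.group("hour")) if ts else None
        counts[(hour, seg.group("proc"))] += 1
    return counts


# Usage (file name taken from the attachment in this post):
# with open("unraid-syslog-export.csv", errors="replace") as fh:
#     for (hour, proc), n in sorted(segfaults_by_hour(fh).items(), key=str):
#         print(hour, proc, n)
```

If most of the counts land in the same one or two hours, a scheduled job running at those times would be worth a look alongside the hardware.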

 

Thanks in advance for the second pair of eyes on this issue. 

unraid-syslog-export.csv


Smartctl segfaulting is usually a hardware issue, and memtest is only definitive if it finds errors. Try running the server with just one stick of RAM; if the problem is the same, try a different stick. That will basically rule out a RAM problem.


Update on findings to date:

 

Using only RAM stick 1:
After some time the system simply froze. I could no longer access the machine either via the web gui or via the keyboard and mouse plugged into the host machine. There is also no indication of an issue in the log immediately prior to the system becoming unresponsive. However, around 10-15 min before the freeze, there was a stack trace with an interesting message:
 

kernel tried to execute NX-protected page - exploit attempt?

 

Using only RAM stick 2: 

The system hasn't crashed yet, but it is behaving very strangely. Some examples include:

  1. Jobs in docker containers are frozen.
  2. I cannot kill these docker containers either via web gui or shell
    1. Via gui: popup window after pressing 'stop'
      "Execution error
      Server error"
    2. Via shell:
      $ docker kill <container_name>
      Error response from daemon: Cannot kill container: <container_name>: tried to kill container, but did not receive an exit event 
  3. Can't start or stop the docker daemon via the settings screen
    1. Settings gui will show it's stopped, but when I run `docker ps` all the containers are still running
  4. System doesn't respond to reboot requests, either via web gui or shell
  5. Can't take the array offline, the button doesn't respond in any way
  6. There are tons of CPUs that are stuck in a waiting state  

(Attached screenshot: image.png)
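
One possible explanation for containers that cannot be killed together with CPUs stuck waiting is processes in uninterruptible sleep (state `D`), which do not act on signals, including SIGKILL, until the kernel call they are blocked in returns. A minimal Python sketch to list such processes on Linux by reading `/proc/<pid>/status` is below. The function names are made up for illustration, and this only shows which processes are stuck, not why.

```python
import os


def parse_status(text):
    """Return (name, state_letter) from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    # The State line looks like "D (disk sleep)"; keep only the letter.
    return fields.get("Name", "?"), fields.get("State", "?")[:1]


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, name) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, state = parse_status(fh.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == "D":
            found.append((int(entry), name))
    return sorted(found)


# Usage on the affected host:
# for pid, name in uninterruptible_processes():
#     print(pid, name)
```

If the frozen container processes show up in this list, that would point at stuck I/O or a kernel-level fault rather than at Docker itself.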

 

Additionally, there are quite a few stack traces here as well such as:
 

Unable to access opcode bytes at 0xffffffffffffffd6.

 

Maybe the issues documented under RAM stick 2 aren't actually related to a bad RAM stick, but they are what I have experienced while testing the second stick. 

 

My plan is to continue testing RAM stick 2 to see if it eventually does crash, but I thought these issues were weird enough to note during the debugging process. 

unraid-syslog-ram-stick-1.csv unraid-syslog-ram-stick-2.csv

Edited by Nexal
Adding additional details

Today was the last day I could return the parts, so, worried that this was indeed a hardware failure, I took that option. A new mobo and RAM will arrive by the end of the week, and I'll see whether they resolve any of these issues. 

