
USB Drive Fails After 24h


Heffe
Solved by Heffe


Hey all,

I recently moved my server into a Dell Precision 5820 running Unraid 6.11.5. The server works very well, except that it fails after about 24 hours of uptime. Sometimes the system becomes unresponsive; other times I can still access the GUI, but no shares are listed.

If I remove the USB flash drive and plug it into a different port, the server boots fine and then fails again about 24 hours later. If I simply try a reboot, the system hangs and does not boot.

I've gone through the BIOS and disabled all C-states, but I have no idea what is causing this issue or where to start. Any help would be appreciated.

Thanks!

jarvis-diagnostics-20231126-2140.zip


I took one RAM stick out and ran the server all night. It still threw errors. This morning I swapped that stick for another one, and it threw errors again. I suppose it could be the RAM, but I doubt both sticks would be bad.

I do have an NVMe PCIe card that my cache drive is on. I'm going to try removing it to see if that makes a difference.

I've attached the latest log.

syslog-192.168.0.10.log


The errors become more frequent when the Docker containers are running. I have no way of testing whether the CPU or the motherboard is the problem. The server was a cold spare from a working environment, so I doubt it's a hardware issue.

I'll keep updating the software and hope that the updates fix it.

Thanks for your help @JorgeB.


Just as an update: I'll keep this thread going in case someone else runs into a similar issue.

To troubleshoot possible motherboard and/or CPU problems, I ran the Dell diagnostic tool built into the motherboard's BIOS. The tool ran all night (~12 hours) and passed every test, including the RAM tests.

After the tests finished, I booted the server in safe mode (no GUI) and let it run. It's been up for about 5 hours and has logged one segfault.

Here's the error:

Dec  7 09:01:45 Jarvis kernel: smartctl_type[17085]: segfault at 0 ip 0000000000000000 sp 00007ffcc6c52d68 error 14 in php[400000+3b000] likely on CPU 5
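For anyone reading these logs: the `error` field is the x86 page-fault error code, and the kernel prints it in hexadecimal, so `error 14` means 0x14. The Python below is an illustrative sketch, not an Unraid or kernel tool; `decode_line` and `decode_error` are hypothetical helper names, and it assumes an x86-64 kernel and the exact line format quoted above. It extracts the process name, PID, fault address and instruction pointer, and names the set bits using the flag names from the kernel's `x86_pf_error_code` enum (`PROT`, `WRITE`, `USER`, `RSVD`, `INSTR`, `PK`).

```python
import re

# x86 page-fault error code bits (names follow the kernel's X86_PF_* flags).
# Assumption: x86-64 kernel; only the low, well-known bits are decoded here.
PF_FLAGS = [
    (0x01, "PROT"),   # set: protection violation; clear: page was not present
    (0x02, "WRITE"),  # set: write access; clear: read access
    (0x04, "USER"),   # set: fault raised while running in user mode
    (0x08, "RSVD"),   # set: reserved bit found set in a page-table entry
    (0x10, "INSTR"),  # set: fault happened on an instruction fetch
    (0x20, "PK"),     # set: protection-key violation
]

# Matches "comm[pid]: segfault at ADDR ip IP sp SP error ERR" (all hex fields).
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_error(err_hex: str) -> list[str]:
    """Return the names of the page-fault flags set in a hex error code."""
    code = int(err_hex, 16)  # the kernel prints the error code in hex
    flags = [name for bit, name in PF_FLAGS if code & bit]
    if not code & 0x01:
        # With PROT clear, the fault was on a page that was not present.
        flags.insert(0, "NOT_PRESENT")
    return flags


def decode_line(line: str) -> dict:
    """Split one kernel segfault line into its fields (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    return {
        "process": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": int(m["ip"], 16),
        "flags": decode_error(m["err"]),
    }


if __name__ == "__main__":
    sample = ("Dec  7 09:01:45 Jarvis kernel: smartctl_type[17085]: segfault at 0 "
              "ip 0000000000000000 sp 00007ffcc6c52d68 error 14 in php[400000+3b000] "
              "likely on CPU 5")
    print(decode_line(sample))
```

For the smartctl_type line this yields `NOT_PRESENT`, `USER` and `INSTR`: a user-mode instruction fetch from the unmapped address 0, which usually means the process jumped through a NULL pointer. On its own, that does not tell a software bug apart from corrupted memory.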

I'm really frustrated with this issue and at a loss for a solution. I'm 99% sure it's not a hardware problem, yet here we are.

jarvis-syslog-20231207-1446.zip


Well, here are the changes I've tried since then:

1. I changed the boot mode from Legacy to UEFI.
2. I downgraded the BIOS to v2.0.2, since a post on this forum suggested that this version was working for them.
3. I found some corrupted Docker files and repaired them.

The server ran for about 10 hours and then crashed again. I'm going to run Memtest86 for an extended period of time and see if that finds anything. Lol, I'm so frustrated.

syslog-192.168.0.10_111223.log


So I gave it a rest and returned to the problem after a few days.

I had a couple of "loop2" BTRFS errors that, from what I've read on the forum, are related to a corrupt Docker image. I've since deleted my Docker image and recreated it.

Second, after recreating the Docker image, I pinned each Docker container to a specific CPU so I could tell whether one particular container was the culprit. I think I found it. If I restart my nextcloud container while my mariadb container is running, I get:

kernel: php[8525]: segfault at ffffffffffffffff ip 000055871823b0b2 sp 00007ffd6737dd40 error 7 in php82[558718200000+2b4000] likely on CPU 5 (core 1, socket 0)

I get this error reliably when starting my nextcloud container.
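Since each container was pinned to its own CPU, one way to check that the segfaults really cluster on the nextcloud container's core is to count the kernel's segfault reports by process, faulting binary and CPU across a whole syslog. The Python below is a minimal sketch assuming the log format quoted in this thread; `tally_segfaults` is a hypothetical helper, not an Unraid tool. The kernel only reports the CPU as "likely", and matching a CPU number back to a container still has to be done by hand against your pinning settings.

```python
import re
import sys
from collections import Counter

# Matches kernel segfault reports such as:
#   php[8525]: segfault at ... error 7 in php82[...] likely on CPU 5 (core 1, socket 0)
# The "likely on CPU N" suffix is absent on some kernels, so it is optional here.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[\d+\]: segfault at [0-9a-f]+ .*? in (?P<obj>[^\[\s]+)\["
    r"(?:.*?likely on CPU (?P<cpu>\d+))?"
)


def tally_segfaults(lines):
    """Count segfault reports per (process, binary, cpu); cpu is None if absent."""
    counts = Counter()
    for line in lines:
        m = SEGFAULT_RE.search(line)
        if m:
            cpu = int(m["cpu"]) if m["cpu"] is not None else None
            counts[(m["comm"], m["obj"], cpu)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python tally_segfaults.py syslog-192.168.0.10_171223.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for (comm, obj, cpu), n in tally_segfaults(fh).most_common():
            where = f"CPU {cpu}" if cpu is not None else "unknown CPU"
            print(f"{n:4d}  {comm} (in {obj}) on {where}")
```

If nearly all reports land on the core assigned to nextcloud, that supports the container theory; faults spread evenly across every core would point back toward something system-wide.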

Is there anything I can do to fix this? Is my nextcloud appdata corrupt? Thanks!

syslog-192.168.0.10_171223.log

