Heffe

Members
  • Posts

    32

Everything posted by Heffe

  1. My server has been running without error for almost a month now. The issue was a bad PCIe to NVMe adapter. (Link to the adapter) I might have just gotten unlucky with my adapter, or maybe they don't work with Unraid. Either way, using a proper Dell adapter solved my issue. Thanks for the kind support. =)
  2. Update - I removed my cache drive from the system and it has been stable for about 36h now. I'm still trying to narrow down the issue precisely. I use an NVMe to PCIe adapter for my NVMe drive since my motherboard doesn't have an M.2 slot. So either it's the M.2 adapter, or my NVMe drive. My money is on the adapter. I will keep those interested updated. :-)
  3. So does it reset the flash drive or just reset the settings on the drive?
  4. Boo. Can you give any insight into my question?
  5. @bonienl I have a question about the reset plugin. Does the plugin essentially "reset" the USB drive as well? Would manually resetting the USB drive be more thorough than the plugin? Thanks.
  6. Any update on this issue? Did you find a fix?
  7. I'm having similar symptoms and I have not found any solutions yet. Does it still error when you put it in safe mode?
  8. Where are you seeing the call traces and segfaults? Since I've updated to 6.12.6 I have not seen any errors or segfaults. I updated around 12:30 on December 18th.
  9. Since 6.12.6 was released, I've decided to update. I also noticed that there were some reports that macvlans were causing issues so I've changed my docker network over to ipvlans. After the update, I've had no errors in the syslog. None. Zip. Zilch. That being said, this morning when I woke up, the server was unreachable via the GUI and ssh, with no trace of an error in the log. I'm so confused as to what is causing this. I see there are some other posts describing a similar issue. @JorgeB - Any ideas? jarvis-diagnostics-20231219-1014.zip syslog-192.168.0.10_191223.log
  10. I'm having a similar issue with my system. You're not alone. I looked through your logs, and it looks like there is a misbehaving docker container. In the "syslog-previous" it looks like something is "flip-flopping" which might be causing some issues. Have you tried booting in "safe-mode"?
  11. Quick update: I updated my nextcloud version to the latest version and it no longer gives the "php error" when the container is started. I sure hope that this is the cure!
  12. So I gave it a rest and returned to the problem after a few days. I had a couple of "loop2" BTRFS errors that, from what I've read on the forum, are related to a corrupt docker image. I've since deleted my docker image and recreated it. Secondly, after recreating my docker image, I pinned each docker to a specific CPU so I could find if there was an offending docker container. I think I found one. If I restart my nextcloud container while my mariadb container is running, I get: kernel: php[8525]: segfault at ffffffffffffffff ip 000055871823b0b2 sp 00007ffd6737dd40 error 7 in php82[558718200000+2b4000] likely on CPU 5 (core 1, socket 0) I get this error reliably when starting my nextcloud container. Is there anything I can do to fix this? Is my nextcloud appdata corrupt? Thanks! syslog-192.168.0.10_171223.log
  13. I ran memtest86 for the last 24h and it didn't find any issues. There are some BTRFS errors in my logs so there might be an issue with my cache drive which is causing the whole system to stall out. I'm going to try and fix that this evening. Thank you for your reply.
  14. Well, here are the changes that I've tried since: 1. I changed the boot mode to UEFI instead of Legacy. 2. I downgraded the BIOS to v2.0.2 since there was a post on here suggesting that this version was working for them. 3. I found some corrupted docker files and repaired them. The server ran for about 10 hours and then crashed again. I'm going to run Memtest86 for an extended period of time and see if that finds anything. Lol, I'm so frustrated. syslog-192.168.0.10_111223.log
  15. Would it be possible that this issue is being caused by a BIOS update? I've looked across the forum and it seems others are using the same server but on another BIOS version.
  16. Hey @Axon I've been having some massive issues with my Dell Precision 5820. It will hard-freeze after about 24h and require a hard reset to start it again. Did you have any other issues? How is your server running now? What version of UnRAID are you using? Thanks!
  17. Would running a SMART disk scan provide any useful info?
  18. Just as an update, I'll keep this thread going in case someone else has similar issues. In order to troubleshoot the mobo and/or CPU issues, I ran the Dell diagnostic tool built into the BIOS on the motherboard. The tool ran all night (~12 hours) and it passed all tests (including RAM tests). After the tests were done, I booted the server in safe mode (no GUI) and let it run. It's been on for about 5 hours and it has had one "segfault". Here's the error: Dec 7 09:01:45 Jarvis kernel: smartctl_type[17085]: segfault at 0 ip 0000000000000000 sp 00007ffcc6c52d68 error 14 in php[400000+3b000] likely on CPU 5 I'm really frustrated at this issue and I'm lost as to a solution. I'm 99% sure there's no hardware issue but yet here we are. jarvis-syslog-20231207-1446.zip
  19. The errors increase in frequency if I have the docker containers running. I have no way of troubleshooting if the CPU is the issue or the motherboard. The server was a cold spare from a working environment so I'm doubtful that it's a hardware issue. I'll continue to update the software and hope that the updates fix it. Thanks for your help @JorgeB.
  20. I took one RAM stick out and ran the server all night. It still errored. This morning I swapped the RAM stick with another and it errored again. I suppose it could be the RAM but I doubt both sticks would be bad. I do have a NVME PCIe card that my cache drive is on. I'm going to try and remove that and see if it makes a difference. I've attached the latest log. syslog-192.168.0.10.log
  21. I'll give that a try. Thank you.
  22. Here's the non-zipped log file. syslog-192.168.0.10 (2).log
  23. So this weekend I updated to the latest unRAID version, updated all plugins and stopped all services on the server. I then let the server run until faults started appearing. Here's the log file. It shows that CPU 0 and CPU 4 are causing some errors? Does that mean I have a bad CPU? Can it be fixed? jarvis-syslog-20231203-1208.zip
  24. Good idea. I'll give that a try. Thank you for your suggestions.
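
For anyone chasing similar segfaults, here is a rough Python sketch for tallying the kernel segfault lines in a syslog by process and by CPU. Several posts above ask whether one CPU or one container is to blame, and a count makes that easier to eyeball. The regex is an assumption built only from the two log lines quoted in the posts above, so other kernel versions may format these lines differently. The names `parse_segfaults` and `tally` are made up for this example. A tally like this only shows where faults cluster; it cannot tell you whether the CPU, RAM, or an adapter is actually at fault.

```python
import re
from collections import Counter

# Assumed line format, taken from the two segfault lines quoted in the posts.
# The " in <object>[...]" and " likely on CPU N" parts are treated as optional.
SEGFAULT_RE = re.compile(
    r"kernel: (?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<obj>[^\[\s]+)\[[^\]]*\])?"
    r"(?: likely on CPU (?P<cpu>\d+))?"
)


def parse_segfaults(lines):
    """Return one dict per syslog line that looks like a kernel segfault report."""
    events = []
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            events.append(match.groupdict())
    return events


def tally(events, key):
    """Count events by one field (e.g. 'proc' or 'cpu'), skipping missing values."""
    return Counter(e[key] for e in events if e[key] is not None)


# The two lines quoted in the posts above, used here as sample input.
SAMPLE = [
    "kernel: php[8525]: segfault at ffffffffffffffff ip 000055871823b0b2 "
    "sp 00007ffd6737dd40 error 7 in php82[558718200000+2b4000] "
    "likely on CPU 5 (core 1, socket 0)",
    "Dec 7 09:01:45 Jarvis kernel: smartctl_type[17085]: segfault at 0 "
    "ip 0000000000000000 sp 00007ffcc6c52d68 error 14 in php[400000+3b000] "
    "likely on CPU 5",
]

if __name__ == "__main__":
    events = parse_segfaults(SAMPLE)
    print("by process:", dict(tally(events, "proc")))
    print("by CPU:", dict(tally(events, "cpu")))
```

To run it against a real log, replace `SAMPLE` with the lines of your own syslog file, for example `open(path, errors="replace")` where `path` points at whichever log you saved.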