July 2, 2022
Hello, I currently have an unRAID server that's been crashing at seemingly random intervals. Sometimes it'll be idle for 8 hours, other times it'll crash while starting the array. I wouldn't even notice if the motherboard didn't make an audible click when it reboots. I've attached the diagnostics, although much of this is foreign to me. I've also attached the last few days of syslog entries; I have the syslog stored in a share so I don't lose it after each crash.
Attachments: olympus-diagnostics-20220702-1810.zip, syslog-192.168.86.2.log
July 3, 2022 Community Expert
Jul 1 22:04:41 Olympus kernel: nvme nvme0: Removing after probe failure status: -19
The NVMe device is dropping out, but that won't make the server crash. Unfortunately, I don't see anything else relevant. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
July 3, 2022 Author
6 hours ago, JorgeB said:
Jul 1 22:04:41 Olympus kernel: nvme nvme0: Removing after probe failure status: -19
The NVMe device is dropping out, but that won't make the server crash. Unfortunately, I don't see anything else relevant. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

Thanks for helping. Indeed, I recently switched to a WD NVMe cache drive from a Samsung 2.5" SATA SSD and started having my cache mnt remounted as root. Then I noticed "nvme controller is down, reset" in the logs, so I switched back to the 2.5" SATA SSD. While the NVMe drive was in, I was getting `mcpe` hardware errors that the CPU's ECC was correcting, and they were all Level 1 cache errors. It's been up for 12 hours or so since the last crash, without the NVMe drive even installed in the motherboard. No mcpe errors and no crashes so far, but we'll see. Right now it's running with just the array disks, no cache at all.

Update: I should note it often seems to crash when unRAID accesses something on the cache, like the mover running or Plex querying metadata that was on a cached share. Not sure if that matters.

Edited July 3, 2022 by FEENX: Added information.
July 5, 2022 Author
So I ran it for 24 hours as just the NAS, then for another 24 hours with my normal Docker containers running. No crashes or errors of any kind in the logs. Within 5 minutes of enabling the VM Engine and my Windows 11 virtual machine, it crashed with no error logged, just a hard restart. I'm going to try running the VM Engine without any VMs active and see if I can narrow down the issue some more.
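For anyone following along with a syslog saved to a share, a small script can make it easier to spot the kinds of entries discussed in this thread. The following is only a rough sketch, not an Unraid tool: the function name `scan_syslog` and the pattern names are hypothetical, and the regular expressions are assumptions modeled on the one log line quoted above plus the "controller is down" and hardware-error messages described in the replies. Adjust them to whatever your own log actually contains.

```python
import re
from collections import Counter

# Illustrative patterns only (assumptions). The first is modeled on the log
# line quoted in the thread; the others approximate messages described there.
PATTERNS = {
    "nvme_probe_failure": re.compile(r"nvme\s+nvme\d+: Removing after probe failure"),
    "nvme_controller_down": re.compile(r"nvme.*controller is down", re.IGNORECASE),
    "machine_check": re.compile(r"mce:|machine check|hardware error", re.IGNORECASE),
}


def scan_syslog(lines):
    """Return (counts, matches) for lines that match any pattern above.

    `lines` can be any iterable of strings, such as an open file.
    Each line is counted once, under the first pattern it matches.
    """
    counts = Counter()
    matches = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                matches.append((name, line.rstrip("\n")))
                break
    return counts, matches


def scan_file(path):
    """Convenience wrapper: scan a syslog file saved on disk."""
    with open(path, errors="replace") as f:
        return scan_syslog(f)
```

Calling `scan_file("syslog-192.168.86.2.log")` on the attached log would return a count per pattern and the matching lines. A hard reset that logs nothing, as in the last post, will of course leave no matching entry, so an empty result does not rule out a hardware fault.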