Server crashing after cache upgrade



This all started at the end of November, about a month ago, when I went from Crucial SSDs for my cache drives (kept getting 197 errors on them) to Samsung 980 NVMe drives. I installed two PCIe adapters to hold the two 2TB NVMe drives. Since then the server has been crashing consistently, anywhere from 2-4 days apart, and I can't find anything in the logs pointing to the cause at the time of a crash. This morning, 12/31/23, it crashed at roughly 10:11 am, and the only thing I saw in the syslog was from well before that, at 9:02 am.

With everything I've messed with over the past month, I'm sure there are things misconfigured, as I've got more than one thing wrong with the server/disks at the moment. After rebooting from this morning's crash, the Docker containers are ALL gone. The appdata and folders remain on the cache, and I do have them backed up, but it wouldn't let me restore. The Docker image seems to be corrupted now, but I can't access or delete it, either from the Docker settings on the server or in the directory itself. I can navigate to /mnt/user/system/docker/, but it won't let me into the folder with the Docker image in it; it says it's because of file permissions.

 

Also, one of my 12TB parity drives started giving me errors about a week ago, so I replaced them and upgraded to 14TB drives. I still have the two Crucial drives in the system, but they aren't mounted. Oh, and I changed the cache file system from btrfs to ZFS when I upgraded to the NVMe drives. Would someone mind poking around my diagnostics to see if you can point me in the right direction? I'd surely appreciate it!

I'm also currently running memtest, but with 256GB of RAM it might be tomorrow before I see the results.

 

 

trailheadmedia-diagnostics-20231231-1212.zip

Edited by rocky_mtn

Unfortunately there's nothing relevant logged, which usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
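
To check how long the syslog went quiet before a crash like the 10:11 am one described above, something along these lines might help. It is only a rough Python sketch. It assumes a saved plain-text copy of the syslog whose lines start with traditional "Dec 31 09:02:17"-style timestamps. The function name, the sample lines, and the hostname in them are made up for illustration and are not taken from the posted diagnostics.

```python
from datetime import datetime

def last_entry_before(lines, crash, year):
    """Return (timestamp, line) for the latest log line at or before `crash`.

    Assumes each line starts with a 15-character "Mon DD HH:MM:SS" stamp,
    which carries no year, so the year is passed in separately.
    """
    best = None
    for line in lines:
        stamp = line[:15]
        try:
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not begin with a timestamp
        if ts <= crash and (best is None or ts >= best[0]):
            best = (ts, line.rstrip("\n"))
    return best

# Hypothetical sample lines, NOT real log output from this server.
sample = [
    "Dec 31 08:40:03 host example: hypothetical entry A",
    "Dec 31 09:02:17 host example: hypothetical entry B",
    "Dec 31 10:15:42 host example: hypothetical entry after reboot",
]

crash = datetime(2023, 12, 31, 10, 11)  # approximate crash time from the post
ts, line = last_entry_before(sample, crash, 2023)
gap = crash - ts

print(f"last entry before crash: {line}")
print(f"silent gap before crash: {gap}")
```

For a real file, pass the list from `open(path).readlines()` instead of `sample`. A long silent gap before the crash fits the "nothing relevant logged" picture, but it doesn't say which component is at fault.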
