February 2, 2024

A couple of days ago my Plex server started becoming unresponsive. A restart would bring it back for roughly 25 hours before it went down again. I enabled debug logging, but saw nothing out of the ordinary, so I decided to delete the docker and recreate it. The docker compose failed with "Failed to create btrfs snapshot: input/output error", leading me to believe this is a corrupt docker image.

I deleted/recreated my docker image and re-downloaded all the dockers. After about the 10th image starting up, I started getting loop2 WRITE errors again.

I'm in the process of moving things off my NVMe cache pool, but I'm at a loss for where the image corruption could be coming from. A short SMART self-test of each NVMe drive shows 0 errors. The attributes are not telling me much either.

I do have a pre-failing disk 5, and the replacement is sitting here on my desk. But I cannot see how disk 5 would be related to my docker image corrupting so quickly after recreating it.

Am I missing anything? Or could my NVMe drive(s) be failing?

unraid-diagnostics-20240202-1308.zip

EDIT: I am running the extended tests right now. I also see the ZFS pool is showing an error. Is there any way to dig into that deeper to know which drive is bad? Or are both having issues?

Edited February 2, 2024 by UncleStu: typo, addition of zpool status screenshot
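On the question of which drive in the ZFS pool is reporting errors: `zpool status -v` prints per-device READ, WRITE and CKSUM counters. As a rough sketch, the Python below picks out the rows with a non-zero counter. The pool and device names in the sample are hypothetical (the actual screenshot is not reproduced in this thread), and the parser assumes the usual single table layout with a `NAME STATE READ WRITE CKSUM` header.

```python
# Hypothetical `zpool status` output; names and counts are invented for illustration.
SAMPLE_STATUS = """\
  pool: examplepool
 state: ONLINE
config:

    NAME              STATE     READ WRITE CKSUM
    examplepool       ONLINE       0     0     0
      mirror-0        ONLINE       0     0     0
        nvme-example-1  ONLINE       0     0     4
        nvme-example-2  ONLINE       0     0     0

errors: No known data errors
"""


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for rows with any non-zero counter."""
    rows = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        # The table starts right after this header row.
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        # A blank or short line ends the table.
        if in_table and len(fields) < 5:
            in_table = False
        if not in_table:
            continue
        name, _state, read, write, cksum = fields[:5]
        # Counters are compared as strings because zpool may abbreviate (e.g. "1.2K").
        if (read, write, cksum) != ("0", "0", "0"):
            rows.append((name, read, write, cksum))
    return rows


if __name__ == "__main__":
    for row in devices_with_errors(SAMPLE_STATUS):
        print(row)
```

Errors on only one device would point at that device or its slot; errors spread across all members of a pool would be more in line with something upstream of the drives, though that is a rule of thumb rather than a diagnosis.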
February 3, 2024 Community Expert
Ryzen with overclocked RAM like you have is known to corrupt data, so I would recommend correcting that and then recreating the pool.
February 3, 2024 Author
Overclocked RAM? I wasn't aware I had overclocked anything, as I know servers don't care for it. Mind sharing where in the diags that is? Or how you came to see that I have overclocked RAM?
February 4, 2024 Community Expert
Meminfo.txt in the diags. For your config, RAM should be set @ 2666MT/s max; it's running @ 3200MT/s.
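For anyone wanting to run the same check on their own diagnostics, here is a minimal Python sketch. It assumes meminfo.txt contains dmidecode-style "Memory Device" blocks with a "Configured Memory Speed" field (older dmidecode versions call it "Configured Clock Speed"). The sample text and DIMM names are made up for illustration; the 2666 MT/s limit is the figure quoted in the post above and depends on the CPU and DIMM configuration.

```python
import re

# Hypothetical excerpt in dmidecode "Memory Device" format (not from the real diagnostics).
SAMPLE_MEMINFO = """\
Memory Device
    Locator: DIMM_A1
    Speed: 3200 MT/s
    Configured Memory Speed: 3200 MT/s

Memory Device
    Locator: DIMM_A2
    Speed: 3200 MT/s
    Configured Memory Speed: 2666 MT/s
"""

SPEED_RE = re.compile(r"Configured (?:Memory|Clock) Speed:\s*(\d+)\s*(?:MT/s|MHz)")


def configured_speeds(text):
    """Return a list of (locator, configured speed) pairs, one per populated DIMM."""
    results = []
    locator = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Locator:"):
            locator = line.split(":", 1)[1].strip()
        match = SPEED_RE.match(line)
        if match:
            results.append((locator, int(match.group(1))))
    return results


def over_limit(text, limit_mts):
    """Return the DIMMs whose configured speed exceeds the given limit."""
    return [(loc, speed) for loc, speed in configured_speeds(text) if speed > limit_mts]


if __name__ == "__main__":
    # 2666 is the maximum quoted in this thread for this particular config.
    print(over_limit(SAMPLE_MEMINFO, 2666))
```

Empty slots report "Unknown" instead of a number, so they simply never match the pattern.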
February 4, 2024 Author
I see the speed set to 3200MT/s in the meminfo.txt file, but I couldn't find where you saw that it should be 2666.
February 5, 2024 Author
I see the speed set to 3200MT/s in the meminfo.txt file, but I couldn't find where you saw that it should be 2666.

EDIT: I changed my RAM speed to auto and verified in the BIOS that it was 2666. I then erased/re-created both my cache pools. After mover finished moving things back to my NVMe cache pool, it still corrupted partway through starting dockers.

How can I tell why/what is corrupting this?

unraid-diagnostics-20240204-1904.zip
February 5, 2024 Community Expert
13 hours ago, UncleStu said: but I couldn't find where you saw that it should be 2666.
Click the link I've posted above.
February 5, 2024 Author
I did, but didn't fully follow it. Either way, my docker image is still corrupting after formatting the cache pools. The extended SMART tests showed no errors either.
February 5, 2024 Community Expert
56 minutes ago, UncleStu said: The extended smart tests showed no errors either.
Very unlikely this is a device problem.
February 5, 2024 Author
1 minute ago, JorgeB said: Very unlikely this is a device problem.
Assuming 'device' could be as broad as hardware, or does it mean the NVMe devices specifically? How can I go about troubleshooting this more?

Nothing has changed in the system for a couple of years, except that I now have the latest BIOS and still have the same issue. Oh, and I did swap out my pre-failing disk 5. The data rebuild is 90% complete. But I can't see how that would be an issue.

I have 2 cache pools, NVMe and SSD. When the docker image is on the NVMe pool, along with my appdata, it corrupts. I have been using /mnt/user/system/docker/ as the path for the image. And of course when I move it, I stop the services.

Last night, when I recreated the image again, I used /mnt/s-cache/system/docker instead, putting the image on the SSD cache. The appdata is still at /mnt/user/appdata/, with the appdata share using the cache as preferred and the array as backup.

I have not had any errors since starting the dockers last night, but my Plex server UI did time out and required a restart of the docker. This is how it started when I first began to notice corruption.

These are the last errors.
Quote
Feb 4 19:05:47 unRAID kernel: BTRFS warning (device loop3): csum failed root 407 ino 7327 off 16207872 csum 0x5780f703 expected csum 0x43ec18ca mirror 1
Feb 4 19:05:47 unRAID kernel: BTRFS error (device loop3): bdev /dev/loop3 errs: wr 0, rd 0, flush 0, corrupt 344, gen 0
Feb 4 19:05:47 unRAID kernel: BTRFS warning (device loop3): csum failed root 407 ino 7327 off 16207872 csum 0x5780f703 expected csum 0x43ec18ca mirror 1
Feb 4 19:05:47 unRAID kernel: BTRFS error (device loop3): bdev /dev/loop3 errs: wr 0, rd 0, flush 0, corrupt 345, gen 0
Feb 4 19:05:48 unRAID kernel: BTRFS warning (device loop3): csum failed root 407 ino 7327 off 16207872 csum 0x5780f703 expected csum 0x43ec18ca mirror 1
Feb 4 19:05:48 unRAID kernel: BTRFS error (device loop3): bdev /dev/loop3 errs: wr 0, rd 0, flush 0, corrupt 346, gen 0
Feb 4 19:05:48 unRAID kernel: BTRFS warning (device loop3): csum failed root 407 ino 7327 off 16207872 csum 0x5780f703 expected csum 0x43ec18ca mirror 1
Feb 4 19:05:48 unRAID kernel: BTRFS error (device loop3): bdev /dev/loop3 errs: wr 0, rd 0, flush 0, corrupt 347, gen 0
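To make those log lines easier to read at a glance, here is a small Python sketch that pulls out the per-device btrfs error counters and the distinct blocks that failed checksum verification. The sample is three of the lines quoted above; the regular expressions are written against that exact wording and may need adjusting for other kernel versions.

```python
import re
from collections import defaultdict

# Three lines taken from the log excerpt quoted above.
SAMPLE_LOG = """\
Feb 4 19:05:47 unRAID kernel: BTRFS warning (device loop3): csum failed root 407 ino 7327 off 16207872 csum 0x5780f703 expected csum 0x43ec18ca mirror 1
Feb 4 19:05:47 unRAID kernel: BTRFS error (device loop3): bdev /dev/loop3 errs: wr 0, rd 0, flush 0, corrupt 344, gen 0
Feb 4 19:05:48 unRAID kernel: BTRFS error (device loop3): bdev /dev/loop3 errs: wr 0, rd 0, flush 0, corrupt 347, gen 0
"""

COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")

ERRS_RE = re.compile(
    r"BTRFS error \(device (?P<dev>[^)\s]+)\): bdev \S+ errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<dev>[^)\s]+)\): csum failed "
    r"root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+)"
)


def summarize(log_text):
    """Return (latest counters per device, set of failing (root, ino, off) per device)."""
    latest = {}
    bad_blocks = defaultdict(set)
    for line in log_text.splitlines():
        match = ERRS_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last counters win.
            latest[match["dev"]] = {key: int(match[key]) for key in COUNTERS}
            continue
        match = CSUM_RE.search(line)
        if match:
            block = (int(match["root"]), int(match["ino"]), int(match["off"]))
            bad_blocks[match["dev"]].add(block)
    return latest, dict(bad_blocks)


if __name__ == "__main__":
    counters, blocks = summarize(SAMPLE_LOG)
    print(counters)
    print(blocks)
```

In this excerpt the wr, rd and flush counters stay at 0 and only corrupt climbs, and every warning names the same root, inode and offset. In other words, the kernel logged no failed reads or writes on loop3, only repeated checksum mismatches on one block that keeps being re-read.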
February 5, 2024 Community Expert Solution
6 minutes ago, UncleStu said: Assuming 'device' could be as broad as hardware, and as in the nvme devices?
Device here means NVMe device(s). I would start by running memtest.
February 6, 2024 Author
8 hours ago, JorgeB said: I would start by running memtest.
Memtest found that 2 of the 4 sticks had errors. I pulled those for now and started an RMA request with G.Skill. Memtest passed on the two sticks I left in.

I moved my system share and appdata into the same NVMe pool. Everything started up with no issues. Historically the docker image would throw errors when on the NVMe pool, with no errors on the SSD pool or the array.

I also added the 'nvme_core.default...' to my default boot, from this post.

Time will tell at this point. Thank you @JorgeB for your assistance.