Is my server about to crash?


DizRD
Solved by JorgeB


I went to update the Plex app, and it seemed to be stuck there forever. Eventually I reloaded the Unraid Docker page and it gave me an error, something to the effect of plex.ico being read-only. Then everything went bonkers: I started seeing errors in the system log about my cache drive not being accessible. I exported a diagnostics file at that time (attached):

 

I went ahead and restarted the server, thinking maybe the Docker container update had put something in a stuck state. When it came back up, 4 of my pool drives said they were unmountable. I searched and found a thread about booting into maintenance mode and running xfs_repair on the drives; it fixed some things and then I rebooted.
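
For reference, the procedure from that thread boils down to something like the following, run from the console/SSH with the array started in maintenance mode. The device name here is just an example: array disks appear as /dev/mdX in maintenance mode, and for a pool device you would point it at that device's partition instead.

xfs_repair -nv /dev/md1    # -n is a dry run: report problems without writing anything
xfs_repair -v /dev/md1     # actual repair
xfs_repair -vL /dev/md1    # only if it refuses to run because of a dirty log; -L zeroes the log and can lose the most recent metadata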

 

Everything seems fine now, but I'm worried.

I ran a SMART test on the cache drive, and it said it had errors, but I've never had good luck with SMART reports in Unraid (attached):
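
In case it helps anyone reading along, the SMART data can also be pulled on the command line, which is sometimes easier to read than the GUI report. smartctl is available from the Unraid console, and /dev/sdX is a placeholder for the cache device:

smartctl -H /dev/sdX         # overall health self-assessment (PASSED/FAILED)
smartctl -a /dev/sdX         # full attribute table and self-test log
smartctl -t short /dev/sdX   # start a short self-test; check the result later with -a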

 

Anyone want to chime in with health insights or other suggestions?

 

Some of my system shares are set to prefer the cache. What happens if the cache really fails? Will the system be inoperable until I replace the cache drive or disable the cache?

 

 

 

 

Attachments: deathstar-diagnostics-20220913-2024.zip, deathstar-smart-20220913-2221.zip


Cache disk looks OK; the problem appears to start with this:

 

Sep 13 07:10:40 deathstar kernel: DMAR: ERROR: DMA PTE for vPFN 0x70 already set (to 2e9b16001 not 20c994d002)

 

This is the same error some users started seeing with the first v6.10 releases, and we later found it can cause data corruption. It's the first time I've seen it with v6.9.x, but I wasn't looking for it before. In any case, you should update to v6.10.3 ASAP; that issue can no longer happen there.
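
Since that bug can silently corrupt data, it's also worth verifying the cache pool once you're on v6.10.3. A scrub re-reads everything and checks it against the btrfs checksums; on a redundant pool it can repair bad copies, on a single device it can only report them. A rough sketch, assuming the pool is mounted at the usual /mnt/cache:

btrfs scrub start -B /mnt/cache   # -B runs in the foreground and prints a summary when finished
btrfs scrub status /mnt/cache     # progress/result if you start it without -B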


Thanks. I'd heard of Dynamix, but I was a bit worried that manually changing permissions to get it working on v6.10.3 might mess with any default permission configurations needed for future upgrade paths.

 

Some of my system shares are set to prefer the cache. What happens if the cache really fails? Will the system be inoperable until I replace the cache drive or disable the cache?

 


I will try the update again and report back. I guess if it causes permission issues again, I will try to figure out how to fix them with the Dynamix Permissions plugin.
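
In case the manual route is needed, the usual reset is just ownership plus open modes on the affected share. A rough sketch, assuming Unraid's traditional nobody:users defaults, with "sharename" as a placeholder:

chown -R nobody:users /mnt/user/sharename            # standard Unraid ownership
find /mnt/user/sharename -type d -exec chmod 777 {} +   # directories
find /mnt/user/sharename -type f -exec chmod 666 {} +   # files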

 

For my peace of mind, what does this error mean in the log:

Sep 17 03:48:27 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 17 03:48:27 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2986, gen 0
Sep 17 03:48:27 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 17 03:48:27 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2987, gen 0
Sep 17 03:48:29 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 17 03:48:29 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2988, gen 0
Sep 17 03:48:29 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 17 03:48:29 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2989, gen 0


Weird, because my syslog is still showing recent BTRFS errors:


Sep 19 22:40:42 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1

Sep 19 22:40:42 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5732, gen 0

Sep 19 22:40:42 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1

Sep 19 22:40:42 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5733, gen 0

Sep 19 22:40:58 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864475 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1

Sep 19 22:40:58 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5734, gen 0

Sep 19 22:40:58 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864475 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1

Sep 19 22:40:58 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5735, gen 0

Sep 19 22:41:38 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864461 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1

Sep 19 22:41:38 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5736, gen 0

Sep 19 22:41:38 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864461 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1

Sep 19 22:41:38 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5737, gen 0
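
The same inode numbers keep coming back, so it may just be a handful of bad files being re-read. Assuming the pool is mounted at /mnt/cache, something like this should map them to paths and show the error counters the kernel is logging:

btrfs inspect-internal inode-resolve 1838869 /mnt/cache   # translate the "ino" from the log into a file path
btrfs dev stats /mnt/cache                                # the wr/rd/flush/corrupt/gen counters (they persist across reboots)
btrfs dev stats -z /mnt/cache                             # reset the counters after restoring or deleting the affected files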

 

 

