DizRD Posted September 14, 2022

I went to update the Plex app, and it seemed to be stuck there forever. Eventually I reloaded the Unraid Docker page, and it gave me an error, something to the effect of plex.ico being read-only. Then everything went bonkers. I started seeing errors in the system log about my cache drive not being accessible. I exported a diagnostic log at that time (attached).

I went ahead and restarted the server, thinking maybe the Docker container update had put something in a stuck state. When it came back up, 4 of my pool drives said they were unmountable. I searched and found a thread about booting into maintenance mode and running xfs_repair on the drives. It fixed some things, and then I rebooted. Everything seems fine now, but I'm worried. I ran a SMART test on the cache drive, and it reported errors, but I've never had good luck with SMART reports in Unraid (attached).

Anyone want to chime in with health insights or other suggestions? Some of my system shares are set to prefer cache. What happens if the cache really fails? Will the system be inoperable until I replace the cache drive or disable the cache?

deathstar-diagnostics-20220913-2024.zip
deathstar-smart-20220913-2221.zip
DizRD Posted September 14, 2022 (Author)

Here's the diagnostic after the reboot, now that things seem operational. By the way, I haven't moved any cables or anything; everything hardware-wise had been solid until now.

deathstar-diagnostics-20220913-2334.zip
JorgeB Posted September 14, 2022

The cache disk looks OK; the problem appears to start with this:

Sep 13 07:10:40 deathstar kernel: DMAR: ERROR: DMA PTE for vPFN 0x70 already set (to 2e9b16001 not 20c994d002)

This is the same error some users started seeing with the first v6.10 releases, which we then found can cause data corruption. It's the first time I've seen it with v6.9.x, but I wasn't looking for it before. In any case, you should update to v6.10.3 ASAP; that issue can no longer happen there.
DizRD Posted September 14, 2022 (Author)

Interesting! I will try the update. I had updated to 6.10.3 before, but ran into surprise permission issues on shares, and when I rolled back to my previous Unraid version the permission issues went away. I guess I'll have to see whether they are fixed in the newest version.
kizer Posted September 14, 2022

You can install Dynamix File Manager and quickly fix permission issues, depending on what they are.
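For context, a minimal sketch of what "fixing permissions" on a share usually amounts to on Unraid (the built-in New Permissions tool resets files to nobody:users with read/write for user and group; this is not necessarily what Dynamix File Manager does under the hood). The scratch directory and the /mnt/user/appdata path are illustrative assumptions; on a real server the chown step needs root:

```shell
# Sketch of an Unraid-style permissions reset, demonstrated on a scratch
# directory instead of a real share such as /mnt/user/appdata (assumption).
umask 022                      # pin the umask so the demo is deterministic
target=$(mktemp -d)
touch "$target/example.cfg"

chmod -R u+rw,g+rw,o+r "$target"
# On a live server you would also run (as root):
#   chown -R nobody:users "$target"

stat -c '%a' "$target/example.cfg"   # file mode after the reset
```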
DizRD Posted September 15, 2022 (Author)

Thanks. I'd heard of Dynamix but was worried that manually changing permissions to get things working in 6.10.3 might mess with default permission configurations needed for future upgrade paths.

Some of my system shares are set to prefer cache. What happens if the cache really fails? Will the system be inoperable until I replace the cache drive or disable the cache?
trurl Posted September 15, 2022

2 minutes ago, DizRD said:
    Some of my system shares are set to prefer cache. What happens if the cache really fails?

You need to have backups. There are plugins to back up appdata and VMs.
DizRD Posted September 16, 2022 (Author)

Absolutely, backups have been made, but I'm more curious whether the system is fault-tolerant enough to keep operating if the cache drive dies, or whether it halts, since I don't know how long it would take me to get a replacement cache drive.
JorgeB Posted September 16, 2022

The server works without a cache, though you lose any services running there, e.g., any Docker containers or VMs using it.
trurl Posted September 16, 2022

If Docker or the VM Manager is enabled when the array starts and can't find its .img file because the cache is missing, a new one will be created on the array, but it won't have any contents.
DizRD Posted September 18, 2022 (Author)

I will try the update again and report back. If it causes permission issues again, I'll try to figure out how to fix them with the Dynamix File Manager plugin.

For my peace of mind, what does this error mean in the log?

Sep 17 03:48:27 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 17 03:48:27 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2986, gen 0
Sep 17 03:48:27 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 17 03:48:27 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2987, gen 0
Sep 17 03:48:29 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 17 03:48:29 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2988, gen 0
Sep 17 03:48:29 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 17 03:48:29 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2989, gen 0
JorgeB Posted September 18, 2022

15 minutes ago, DizRD said:
    What does this error mean in the log?

It means btrfs is detecting data corruption, likely the result of the DMAR error above.

On 9/14/2022 at 9:21 AM, JorgeB said:
    we then found it can cause data corruption
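A small sketch of how those warnings can be turned into a list of affected files: each "csum failed … ino N" line names the inode of a corrupt file, so pulling out the distinct inode numbers tells you what to go look for. This assumes GNU grep/awk/sort, as on the Unraid console:

```shell
# Extract the distinct inode numbers from btrfs "csum failed" warnings.
# On a live system, feed it the real log: corrupt_inodes < /var/log/syslog
corrupt_inodes() {
    grep -o 'ino [0-9]*' | awk '{print $2}' | sort -un
}

# Sample input matching the lines in this thread:
corrupt_inodes <<'EOF'
Sep 17 03:48:27 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 17 03:48:27 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2986, gen 0
EOF
# prints: 1838869
```

Each inode can then be mapped to a path on the mounted pool with `btrfs inspect-internal inode-resolve <inode> /mnt/cache` (mount point assumed).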
DizRD Posted September 18, 2022 (Author)

Updated to 6.10.3, and I'm still seeing the btrfs errors. What should I check now?

deathstar-diagnostics-20220918-0427.zip
JorgeB Posted September 18, 2022 (Solution)

Once there's corruption, updating won't fix anything, but it should prevent more. Run a scrub; any corrupt files found will be listed in the syslog. Delete them, or replace them from backups.
DizRD Posted September 18, 2022 (Author)

Thanks! Running the scrub now. Now that I've updated to 6.10.3, should I open a separate thread for the permission issue it creates?
JorgeB Posted September 18, 2022

Yes, do open one if you still have issues with that, but see here and here for some info. Usually v6.10 is not the problem; how the containers are configured is.
DizRD Posted September 19, 2022 (Author; edited September 19, 2022 by DizRD)

Hmmm. I ran the scrub, tracked down the 3 files it mentioned, and removed them. I've rerun the scrub since then and found no errors, but I'm still seeing BTRFS errors on the cache drive. Do I need to restart?

deathstar-diagnostics-20220919-0619.zip
JorgeB Posted September 19, 2022

I'm not seeing any errors after the scrub.
DizRD Posted September 20, 2022 (Author)

Weird, because my syslog is still showing recent BTRFS errors:

Sep 19 22:40:42 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 19 22:40:42 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5732, gen 0
Sep 19 22:40:42 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 19 22:40:42 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5733, gen 0
Sep 19 22:40:58 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864475 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 19 22:40:58 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5734, gen 0
Sep 19 22:40:58 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864475 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 19 22:40:58 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5735, gen 0
Sep 19 22:41:38 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864461 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 19 22:41:38 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5736, gen 0
Sep 19 22:41:38 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864461 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
Sep 19 22:41:38 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5737, gen 0
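One detail worth knowing when reading lines like these: the "corrupt NNNN" figure is btrfs's cumulative per-device error counter (the same number `btrfs device stats` reports, cleared only by `btrfs device stats -z <dev>`), so it continues from its old value rather than starting at zero. What matters is whether it is still rising. A small sketch, assuming GNU grep/awk, of pulling the latest counter value from syslog-style input:

```shell
# Print the most recent cumulative "corrupt" counter seen in btrfs error
# lines. On a live system: latest_corrupt_count < /var/log/syslog
latest_corrupt_count() {
    grep -o 'corrupt [0-9]*' | awk 'END {print $2}'
}

# Sample input matching the lines in this thread:
latest_corrupt_count <<'EOF'
Sep 19 22:40:42 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5732, gen 0
Sep 19 22:41:38 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5737, gen 0
EOF
# prints: 5737
```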
JorgeB Posted September 20, 2022

Run another scrub and post new diagnostics when it's done.
DizRD Posted September 23, 2022 (Author)

Marked as solved. I haven't seen the btrfs errors since the scrub and a reboot. I will open a separate thread for the permission problems.