hawihoney Posted November 2, 2020

Yesterday, all of a sudden, I received errors on my cache pool (2x NVMe M.2 disks). The Unraid main page didn't report these errors, even when disk 2 of that pool went offline. Today I restarted the server; Unraid came up but couldn't start the Docker service. In the syslog I see lots of BTRFS errors, yet Unraid still doesn't show any problems. It seems the cache pool no longer works, but Unraid behaves as if nothing had happened. What are the steps to get the cache pool - and the Dockers and VMs - back into operation? A rebalance? Diagnostics attached. Many thanks in advance.

tower-diagnostics-20201102-0757.zip
hawihoney Posted November 2, 2020 Author

Update: I got the Docker/VM services to start. I had to delete the docker.img file, which was corrupt. All Dockers were rebuilt and are currently running. BUT: BTRFS still shows errors on my cache pool. What do I need to do to fix them? Many thanks in advance.
JorgeB Posted November 2, 2020

One of the cache devices has been dropping:

Nov 2 07:45:29 Tower kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 62451, rd 3119, flush 1898, corrupt 0, gen 0

More info here.
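Those counters in the kernel log persist across reboots and can be read directly with `btrfs device stats`. A minimal sketch, assuming the pool is mounted at /mnt/cache (the usual Unraid cache mount point; adjust if yours differs):

```shell
# Show the persistent per-device error counters for the pool
# (/mnt/cache is an assumption: the stock Unraid cache mount point):
btrfs device stats /mnt/cache

# Print only the non-zero counters, so problems stand out at a glance
# (the last field of each line is the counter value):
btrfs device stats /mnt/cache | awk '$NF != 0'
```

Non-zero write/read/flush counters typically mean the device dropped off the bus at some point (cabling, power, or firmware issues), even if it is back online now.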
hawihoney Posted November 2, 2020 Author

I run the stats regularly; that's how I saw the errors in the first place. But Unraid itself never noticed them until now. When I run the stats at the moment, this is the result:

[/dev/nvme1n1p1].write_io_errs 0
[/dev/nvme1n1p1].read_io_errs 0
[/dev/nvme1n1p1].flush_io_errs 0
[/dev/nvme1n1p1].corruption_errs 0
[/dev/nvme1n1p1].generation_errs 0
[/dev/nvme0n1p1].write_io_errs 0
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 0
[/dev/nvme0n1p1].corruption_errs 0
[/dev/nvme0n1p1].generation_errs 0

Looking at the syslog at the same time shows:

Nov 2 11:00:04 Tower kernel: BTRFS warning (device nvme0n1p1): csum failed root 5 ino 15050349 off 1114939392 csum 0x382b6324 expected csum 0x54474642 mirror 2
Nov 2 11:00:04 Tower kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 15050349 off 1114939392 (dev /dev/nvme1n1p1 sector 534070312)
Nov 2 11:12:38 Tower kernel: BTRFS error (device nvme0n1p1): parent transid verify failed on 1481548267520 wanted 16496481 found 16461691
Nov 2 11:12:38 Tower kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 1481548267520 (dev /dev/nvme1n1p1 sector 337334848)
Nov 2 11:12:38 Tower kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 1481548271616 (dev /dev/nvme1n1p1 sector 337334856)
Nov 2 11:12:38 Tower kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 1481548275712 (dev /dev/nvme1n1p1 sector 337334864)
Nov 2 11:12:38 Tower kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 1481548279808 (dev /dev/nvme1n1p1 sector 337334872)

The link in your answer mentions scrub. Is scrub just another name for balance? Many thanks in advance.
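A quick way to keep an eye on new BTRFS problems between stats runs is to filter the system log for error and warning lines. A sketch, assuming the standard Unraid/Linux syslog location:

```shell
# Show the most recent BTRFS errors and warnings from the system log
# (/var/log/syslog is the standard location on Unraid; adjust if needed).
# "info" lines like "read error corrected" are deliberately excluded:
grep -E 'BTRFS (error|warning)' /var/log/syslog | tail -n 20
```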
JorgeB Posted November 2, 2020

1 hour ago, hawihoney said: "Is scrub another name for balance?"

No.
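For context: scrub re-reads every data and metadata block and verifies it against its checksum, repairing bad copies from the healthy mirror on a raid1 pool, while balance rewrites chunks to new locations (used for reshaping or reclaiming space). A hedged sketch of a manual scrub, again assuming the pool is mounted at /mnt/cache:

```shell
# Start a scrub in the background; on a raid1 pool it repairs any block
# whose checksum fails by rewriting it from the good mirror:
btrfs scrub start /mnt/cache

# Check progress; once finished, a healthy pool reports
# "Error summary: no errors found":
btrfs scrub status /mnt/cache
```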
hawihoney Posted November 2, 2020 Author

Oops, I had never seen that. I had no clue what either of them does. OK, I ran a corrective scrub. Does that mean everything is OK now?
JorgeB Posted November 2, 2020

It should be, as long as all shares are set to COW; NOCOW disables the data checksum capability, as explained in the link above.
hawihoney Posted November 2, 2020 Author

COW? Hmm, the system share on the cache is set to Auto. What does that mean? Never mind, I found it in the help text: Auto means COW on BTRFS. Thanks a lot.
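As a side note, whether files in a share are actually copy-on-write (and therefore checksummed) can be verified with `lsattr`: files created under a NOCOW share carry the `C` attribute and have no checksums, so scrub cannot detect corruption in them. A sketch; the path is an assumption based on the system share mentioned above:

```shell
# Inspect the attributes of the share's directory; a 'C' in the first
# column means NOCOW for files created inside it (no checksums):
lsattr -d /mnt/cache/system
```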