March 30, 2023
My parity drive failed. I thought it was a disk problem, so I replaced the disk and started a rebuild. The rebuild failed within 5 minutes. I took all the drives out of the enclosure, reordered them, and reset all cables. Started cache rebuild and it failed again in 5 min.
I am seeing additional strange behavior:
* Array takes a long time to start
* Array will not stop, "retry unmounting disk share(s)"
* Docker takes a long time to load applications
I recently swapped an SSD cache for an HDD cache drive, but there were no system-related files on this cache. I also recently added a new SSD to the system cache pool and converted it to RAID 1 mode. This is all I can see. Nothing stands out in Fix Common Problems. I Googled a bunch of stuff and couldn't find anything similar. Any thoughts?
unraid-diagnostics-20230329-2015.zip
March 30, 2023 Community Expert
The Marvell-based controller you are using is known to have multiple issues. If possible, I suggest replacing it with one of these:
March 31, 2023 Author
Thanks Jorge. It's been so happy for so long though!! Here's the log file after the parity build failed, in case anything stands out. Is there any way to validate that it's the card? Thanks for the help.
unraid-diagnostics-20230330-2037.zip
March 31, 2023 Community Expert
The controller appears to be the problem, but since the parity disk dropped offline there's no SMART data for it, so post new diags after a reboot to check that.
March 31, 2023 Author
It looks like you are right. I took the controller card out of the machine, reseated it, and restarted the parity build. We're 9 hours in and it seems to be operating as expected. The problems did start after reconnecting cables, so that may have jarred the PCI connection?
However, I am still having Docker issues. Docker containers are not running and the Docker page will not load. One strange thing I noticed is that the docker.img file exists on both the cache drive and disk 3, with different modification dates. It's a single file, so is this correct?
Disk 3 (2023-03-29 22:42)
cache (2023-03-30 21:44)
Edit: VMs also seem to be broken and will not start.
unraid-diagnostics-20230331-0636.zip
Edited March 31, 2023 by Kboogie
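For anyone wanting to check whether other files are duplicated the same way, here is a rough Python sketch that lists files found at the same relative path under more than one mount point. The `find_duplicates` helper is hypothetical, and the `/mnt/disk3` and `/mnt/cache` paths in the usage part are assumptions about where those devices are mounted; adjust them to your system. It only reports what it finds and does not say which copy is the right one.

```python
from datetime import datetime
from pathlib import Path


def find_duplicates(roots):
    """Map each relative file path to its copies when it exists under 2+ roots.

    Each copy is recorded as (root, modification time as a float timestamp).
    Roots that do not exist are skipped silently.
    """
    seen = {}
    for root in map(Path, roots):
        if not root.is_dir():
            continue
        for path in root.rglob("*"):
            if path.is_file():
                rel = path.relative_to(root).as_posix()
                seen.setdefault(rel, []).append((str(root), path.stat().st_mtime))
    return {rel: copies for rel, copies in seen.items() if len(copies) > 1}


if __name__ == "__main__":
    # Assumed mount points; a full scan of large disks can take a while.
    for rel, copies in sorted(find_duplicates(["/mnt/disk3", "/mnt/cache"]).items()):
        print(rel)
        for root, mtime in copies:
            print(f"  {root}  {datetime.fromtimestamp(mtime):%Y-%m-%d %H:%M}")
```

Pointing it at a single share folder on each disk instead of the whole disk keeps the scan short.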
March 31, 2023 Community Expert
Cache pool has issues because one of the devices dropped offline in the past:
Mar 30 21:26:13 unraid kernel: BTRFS info (device nvme0n1p1): bdev /dev/sdi1 errs: wr 8027649, rd 215782, flush 266, corrupt 122521, gen 0
Run a correcting scrub, more info here: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582
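If you want to pull these counters out of a syslog yourself, a minimal Python sketch follows. The `parse_bdev_errors` helper and its regular expression are illustrative assumptions modelled only on the single log line quoted above; other kernel versions may word the message differently.

```python
import re

# Matches the per-device error summary in the form quoted in this thread.
BDEV_RE = re.compile(
    r"BTRFS info \(device (?P<fs_dev>[^)]+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_bdev_errors(line):
    """Return the device names and error counters, or None if the line does not match."""
    m = BDEV_RE.search(line)
    if m is None:
        return None
    counters = {k: int(m.group(k)) for k in ("wr", "rd", "flush", "corrupt", "gen")}
    return {"fs_device": m.group("fs_dev"), "bdev": m.group("bdev"), "counters": counters}


line = (
    "Mar 30 21:26:13 unraid kernel: BTRFS info (device nvme0n1p1): "
    "bdev /dev/sdi1 errs: wr 8027649, rd 215782, flush 266, corrupt 122521, gen 0"
)
result = parse_bdev_errors(line)
print(result["bdev"], result["counters"])
```

Any non-zero counter for a pool member is what points at that device having dropped or misbehaved at some point.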
March 31, 2023 Author
That could be it. One of the cache drives is on the SAS controller, so it was acting weird too. Here are the results.
[/dev/nvme0n1p1].write_io_errs 0
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 0
[/dev/nvme0n1p1].corruption_errs 0
[/dev/nvme0n1p1].generation_errs 0
[/dev/sdi1].write_io_errs 8027649
[/dev/sdi1].read_io_errs 215782
[/dev/sdi1].flush_io_errs 266
[/dev/sdi1].corruption_errs 185739
[/dev/sdi1].generation_errs 0
Running a scrub now, although it does not seem to be progressing. I wonder if that is related to the ongoing parity rebuild.
UUID: 3e318b48-677a-4a28-8e79-9fefe817451e
Scrub started: Fri Mar 31 07:12:03 2023
Status: running
Duration: 0:02:35
Time left: 0:00:00
ETA: Fri Mar 31 07:14:38 2023
Total to scrub: 257.94GiB
Bytes scrubbed: 0.00B (0.00%)
Rate: 0.00B/s
Error summary: no errors found
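To spot the affected device in output like the above without reading every line, something along these lines might help. It is a sketch only: `parse_dev_stats` and `devices_with_errors` are hypothetical helpers, and the parser assumes the `[device].counter value` layout shown in this post.

```python
import re
from collections import defaultdict

# One "[/dev/xxx].counter_name value" entry, as pasted in this thread.
STAT_RE = re.compile(r"\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)")


def parse_dev_stats(text):
    """Group the counters by device: {device: {counter_name: value}}."""
    stats = defaultdict(dict)
    for m in STAT_RE.finditer(text):
        stats[m.group("dev")][m.group("name")] = int(m.group("value"))
    return dict(stats)


def devices_with_errors(stats):
    """Return the devices that have at least one non-zero counter, sorted by name."""
    return sorted(dev for dev, counters in stats.items() if any(v > 0 for v in counters.values()))


SAMPLE = """\
[/dev/nvme0n1p1].write_io_errs 0
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 0
[/dev/nvme0n1p1].corruption_errs 0
[/dev/nvme0n1p1].generation_errs 0
[/dev/sdi1].write_io_errs 8027649
[/dev/sdi1].read_io_errs 215782
[/dev/sdi1].flush_io_errs 266
[/dev/sdi1].corruption_errs 185739
[/dev/sdi1].generation_errs 0
"""

stats = parse_dev_stats(SAMPLE)
print(devices_with_errors(stats))
```

Because the parser scans for entries rather than lines, it also works on output that was pasted as one long line.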
March 31, 2023 Community Expert
Kboogie said: "Wonder if that is related to ongoing parity rebuild."
It won't be. If the scrub fails, you will need to back up what you can from the pool and reformat.
March 31, 2023 Author
Thanks so much for the help. Could I drop the drive on the SAS controller from the pool and then reintroduce it without reformatting?
March 31, 2023 Community Expert
You can use a different controller, but the scrub will likely still fail.