6.8.3 New cache pool not balancing

GoldStig23 · February 9, 2021

The current cache drive in my system started reporting reallocated sector count and uncorrectable error count errors a few weeks ago. I recently bought a new SSD to create a cache pool, but when I try to balance the drives, the balance fails partway through. The counters on the second drive also don't increase, as I've seen them do in other examples. I'm assuming it's failing because of the bad sectors on the primary SSD. What is my recourse? What's the easiest way to back up and restore onto the new cache drive without bringing it into a new pool?

tower-diagnostics-20210209-0740.zip

JorgeB · February 9, 2021

Current pool is just one device (sdj), and it has read errors:

Feb  9 07:26:12 Tower kernel: ata8.00: status: { DRDY ERR }
Feb  9 07:26:12 Tower kernel: ata8.00: error: { UNC }
Feb  9 07:26:12 Tower kernel: ata8.00: supports DRM functions and may not be fully accessible
Feb  9 07:26:12 Tower kernel: ata8.00: disabling queued TRIM support
Feb  9 07:26:12 Tower kernel: ata8.00: supports DRM functions and may not be fully accessible
Feb  9 07:26:12 Tower kernel: ata8.00: disabling queued TRIM support
Feb  9 07:26:12 Tower kernel: ata8.00: configured for UDMA/133
Feb  9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb  9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 Sense Key : 0x3 [current]
Feb  9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 ASC=0x11 ASCQ=0x4
Feb  9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 CDB: opcode=0x28 28 00 24 f3 91 e0 00 00 08 00
Feb  9 07:26:12 Tower kernel: print_req_error: I/O error, dev sdj, sector 619942370
Feb  9 07:26:12 Tower kernel: BTRFS error (device sdj1): bdev /dev/sdj1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0

It's logged as an actual device error; run an extended SMART test on the SSD.
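For anyone reading these lines cold: the CDB is a SCSI READ(10) command (opcode 0x28), where bytes 2 to 5 hold the starting logical block address and bytes 7 to 8 the transfer length, and sense key 0x3 with ASC/ASCQ 0x11/0x04 is a medium error (unrecovered read, auto reallocate failed). That is why it counts as a real device error rather than a cable or controller problem. Below is a minimal Python sketch that decodes them, assuming 512-byte logical sectors so the CDB address and the kernel's sector number use the same units. The function names are hypothetical, and the lookup tables cover only the codes that appear in this log.

```python
from __future__ import annotations

# Only the codes seen in the sdj log above; anything else maps to "unknown".
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {(0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED"}


def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (start LBA, block count) for a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # spaces between byte pairs are allowed
    if cdb[0] != 0x28:
        raise ValueError(f"not a READ(10) CDB: opcode 0x{cdb[0]:02x}")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length in blocks
    return lba, length


def describe_sense(key: int, asc: int, ascq: int) -> str:
    """Turn a sense key plus ASC/ASCQ pair into a readable string."""
    key_text = SENSE_KEYS.get(key, f"unknown key 0x{key:x}")
    code_text = ASC_ASCQ.get((asc, ascq), f"unknown ASC/ASCQ 0x{asc:02x}/0x{ascq:02x}")
    return f"{key_text}: {code_text}"


def parse_btrfs_errs(line: str) -> dict[str, int]:
    """Parse the 'errs: wr 0, rd 2, ...' counters from a BTRFS kernel message."""
    counters = line.split("errs:", 1)[1]
    result = {}
    for item in counters.split(","):
        name, value = item.split()
        result[name] = int(value)
    return result


if __name__ == "__main__":
    lba, count = decode_read10("28 00 24 f3 91 e0 00 00 08 00")
    print(lba, count)  # 619942368 8
    # The sector reported by print_req_error lies inside this 8-block read.
    print(lba <= 619942370 < lba + count)  # True
    print(describe_sense(0x3, 0x11, 0x4))
    print(parse_btrfs_errs(
        "BTRFS error (device sdj1): bdev /dev/sdj1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0"
    ))
```

The reported sector 619942370 falls two blocks into the read that starts at 619942368, which is consistent with a single bad spot on the SSD. BTRFS counting `rd 2` (read errors) with `corrupt 0` points the same way.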

GoldStig23 · February 9, 2021

45 minutes ago, JorgeB said:

It's logged as an actual device error; run an extended SMART test on the SSD.

Here's the extended test output. This drive is 4.5 years old and has seen a lot of work. It's probably not worth building a cache pool with it.

I think I might just take both SSDs out, clone the failing one onto the new one, and buy another drive at some point for a two-drive cache pool. Is it possible to clone it in Unraid at all?

Samsung_SSD_850_EVO_500GB_7491F-20210209-1100.txt

JorgeB · February 9, 2021

It failed the extended test, so it should be replaced. You can also try a full-device write or discard to see if it clears the errors (all data on it would be deleted).

coblck · February 9, 2021

Just move all data on the cache to the array, then pop in the new cache drive(s) and move the data back. Have a look at THIS, it may help you out.

GoldStig23 · February 10, 2021

16 hours ago, coblck said:

Just move all data on the cache to the array, then pop in the new cache drive(s) and move the data back. Have a look at THIS, it may help you out.

I started doing that and most of the data moved, but the mover keeps stopping with about 50GB left, and I get a toast notification saying that the drive's "uncorrectable error count" has incremented again. I used the built-in file explorer to figure out which files are still there; one was a large VM image, which I removed from the terminal. The last thing that really needs to be moved from there is the docker.img.

Now, when I invoke the mover, it just refreshes the page and does nothing. The same happens from the terminal: it reports started and finished almost instantly.
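To see what is still sitting on the cache after the mover gives up, a small script can list the largest remaining files. This is a hypothetical helper, not part of Unraid, sketched in Python under the assumption that the cache pool is mounted at /mnt/cache (adjust the path if yours differs). It reads only file metadata (apparent size), not file contents, and skips anything it cannot stat.

```python
from __future__ import annotations

import os


def largest_files(root: str, limit: int = 10) -> list[tuple[int, str]]:
    """Walk `root` and return up to `limit` (size_bytes, path) pairs, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat measures symlinks themselves instead of following them
                size = os.lstat(path).st_size
            except OSError:
                # Skip entries that vanished or whose metadata can't be read
                continue
            found.append((size, path))
    found.sort(reverse=True)
    return found[:limit]


if __name__ == "__main__":
    # Assumption: /mnt/cache is where the cache pool is mounted
    for size, path in largest_files("/mnt/cache"):
        print(f"{size / 2**30:8.2f} GiB  {path}")
```

Because it never opens the files, it should still run even when reading a file's data triggers new uncorrectable errors, which makes it a low-risk way to confirm that only docker.img is left.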
