6.8.3 New cache pool not balancing



The current cache drive in my system started reporting reallocated sector count and uncorrectable error count warnings a few weeks ago. I recently bought a new SSD to create a cache pool, but when I try to balance the drives, the balance fails partway through. The counters on the second drive also don't increase, as I've seen they should in other examples. I'm assuming the balance is failing because of the bad sectors on the primary SSD. What is my recourse? What's the easiest way to back up the data and restore it onto the new cache drive without bringing it into a new pool?

 

 

2021-02-09_09-45-08.png

2021-02-09_09-46-05.png

tower-diagnostics-20210209-0740.zip


Current pool is just one device (sdj), and it has read errors:

 

Feb  9 07:26:12 Tower kernel: ata8.00: status: { DRDY ERR }
Feb  9 07:26:12 Tower kernel: ata8.00: error: { UNC }
Feb  9 07:26:12 Tower kernel: ata8.00: supports DRM functions and may not be fully accessible
Feb  9 07:26:12 Tower kernel: ata8.00: disabling queued TRIM support
Feb  9 07:26:12 Tower kernel: ata8.00: supports DRM functions and may not be fully accessible
Feb  9 07:26:12 Tower kernel: ata8.00: disabling queued TRIM support
Feb  9 07:26:12 Tower kernel: ata8.00: configured for UDMA/133
Feb  9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb  9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 Sense Key : 0x3 [current]
Feb  9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 ASC=0x11 ASCQ=0x4
Feb  9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 CDB: opcode=0x28 28 00 24 f3 91 e0 00 00 08 00
Feb  9 07:26:12 Tower kernel: print_req_error: I/O error, dev sdj, sector 619942370
Feb  9 07:26:12 Tower kernel: BTRFS error (device sdj1): bdev /dev/sdj1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0

 

It's logged as an actual device error; run an extended SMART test on the SSD.
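
For anyone reading a similar log, here is a minimal Python sketch of how the two most telling lines above can be decoded. It assumes the line formats shown in this thread: the `parse_btrfs_errs` and `decode_read10` helper names are made up for illustration, and the CDB is treated as a standard SCSI READ(10) command (opcode 0x28, 4-byte big-endian LBA, 2-byte block count).

```python
import re

# Matches the per-device counter summary btrfs writes to the kernel log,
# in the format seen in this thread.
BTRFS_ERRS = re.compile(
    r"BTRFS error \(device (?P<dev>[^)]+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return (block device, counters dict), or None if the line does not match."""
    m = BTRFS_ERRS.search(line)
    if m is None:
        return None
    names = ("wr", "rd", "flush", "corrupt", "gen")
    return m.group("bdev"), {name: int(m.group(name)) for name in names}


def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (starting LBA, number of blocks requested).
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) command")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, blocks


log_line = ("Feb  9 07:26:12 Tower kernel: BTRFS error (device sdj1): "
            "bdev /dev/sdj1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0")
bdev, counters = parse_btrfs_errs(log_line)
lba, blocks = decode_read10("28 00 24 f3 91 e0 00 00 08 00")

print(bdev, counters)
print(f"read of {blocks} blocks starting at LBA {lba}")
```

The decoded read covers LBAs 619942368 through 619942375, which includes the sector 619942370 reported by `print_req_error`, and the `rd 2` counter shows the filesystem has seen two read errors on that device so far.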

45 minutes ago, JorgeB said:

It's logged as an actual device error; run an extended SMART test on the SSD.

Here's the extended test output. This drive is 4.5 years old and has seen a lot of use. It's probably not worth building a cache pool with it.

 

I think I might just take both SSDs out, clone the failing one onto the new one, and buy another drive at some point to have a two-drive cache pool. Is it possible to clone it in Unraid at all?

Samsung_SSD_850_EVO_500GB_7491F-20210209-1100.txt

16 hours ago, coblck said:

Just move all data on cache to the array, then pop in the new cache drive/drives and move the data back. Have a look at THIS, it may help you out.

I started doing that and most of the data moved, but the mover keeps stopping with about 50GB left, and I get a toast notification saying that the drive's "uncorrectable error count" incremented again. I used the built-in explorer to figure out which files are still there. There was a large VM image, which I just removed from the terminal. The last thing that really needs to be moved from there is the docker.img.

 

Now, when I invoke the mover, it just refreshes the page and does nothing. It's the same from the terminal: it reports started and finished in a snap.
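
In case it helps anyone in the same spot, here is a rough Python sketch for listing what is still left on the cache, largest files first, without going through the web explorer. The `remaining_files` helper is hypothetical, and `/mnt/cache` is an assumption about where the pool is mounted; adjust the path if your pool has a different name.

```python
import os


def remaining_files(root):
    """Return (size_bytes, path) for every file entry under root, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat only reads metadata, so it does not touch the file's data blocks.
                size = os.lstat(path).st_size
            except OSError:
                # Skip entries that vanish or cannot be stat'ed.
                continue
            found.append((size, path))
    return sorted(found, reverse=True)


if __name__ == "__main__":
    # Assumed mount point of the cache pool.
    for size, path in remaining_files("/mnt/cache"):
        print(f"{size / 1e9:8.2f} GB  {path}")
```

This only reads directory metadata, so it should be safe to run on the failing drive. It does not explain why the mover exits immediately; it only shows what is still there.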
