GoldStig23 Posted February 9, 2021

The current cache drive in my system started throwing reallocated sector count and uncorrectable error count warnings a few weeks ago. I recently purchased a new SSD to create a cache pool, but when I try to balance the drives, the balance fails after a certain point. The counters on the second drive also don't increase, as I've seen they should in other examples. I'm assuming it's failing because of the bad sectors on the primary SSD. What is my recourse? What's the easiest way to back up the data and restore it onto the new cache drive without bringing it into a new pool?

tower-diagnostics-20210209-0740.zip
JorgeB Posted February 9, 2021

The current pool is just one device (sdj), and it has read errors:

Feb 9 07:26:12 Tower kernel: ata8.00: status: { DRDY ERR }
Feb 9 07:26:12 Tower kernel: ata8.00: error: { UNC }
Feb 9 07:26:12 Tower kernel: ata8.00: supports DRM functions and may not be fully accessible
Feb 9 07:26:12 Tower kernel: ata8.00: disabling queued TRIM support
Feb 9 07:26:12 Tower kernel: ata8.00: supports DRM functions and may not be fully accessible
Feb 9 07:26:12 Tower kernel: ata8.00: disabling queued TRIM support
Feb 9 07:26:12 Tower kernel: ata8.00: configured for UDMA/133
Feb 9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 Sense Key : 0x3 [current]
Feb 9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 ASC=0x11 ASCQ=0x4
Feb 9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 CDB: opcode=0x28 28 00 24 f3 91 e0 00 00 08 00
Feb 9 07:26:12 Tower kernel: print_req_error: I/O error, dev sdj, sector 619942370
Feb 9 07:26:12 Tower kernel: BTRFS error (device sdj1): bdev /dev/sdj1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0

It's logged as an actual device error, so run an extended SMART test on the SSD.
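For anyone sifting through a longer syslog, the two lines that matter most in the excerpt above are the block-layer I/O error (which names the failing sector) and the BTRFS per-device error counters. Here is a minimal Python sketch that pulls both out. The function name `summarize_errors` and the regular expressions are illustrative assumptions written against the exact line formats shown above; they are not part of any Unraid or btrfs tool, and other kernel versions may word these messages differently.

```python
import re

# Pattern for the BTRFS per-device counter line, as it appears in the log above.
BTRFS_ERRS = re.compile(
    r"BTRFS error \(device (?P<dev>\w+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

# Pattern for the block-layer I/O error line, as it appears in the log above.
IO_ERROR = re.compile(r"I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)")

COUNTER_NAMES = ("wr", "rd", "flush", "corrupt", "gen")


def summarize_errors(lines):
    """Return (latest BTRFS counters per device, list of (dev, sector) I/O errors)."""
    counters = {}
    bad_sectors = []
    for line in lines:
        match = BTRFS_ERRS.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last seen counters win.
            counters[match["bdev"]] = {k: int(match[k]) for k in COUNTER_NAMES}
            continue
        match = IO_ERROR.search(line)
        if match:
            bad_sectors.append((match["dev"], int(match["sector"])))
    return counters, bad_sectors


# Two lines copied from the log excerpt above.
sample = [
    "Feb 9 07:26:12 Tower kernel: print_req_error: I/O error, dev sdj, sector 619942370",
    "Feb 9 07:26:12 Tower kernel: BTRFS error (device sdj1): bdev /dev/sdj1 "
    "errs: wr 0, rd 2, flush 0, corrupt 0, gen 0",
]

counters, bad_sectors = summarize_errors(sample)
print(counters)
print(bad_sectors)
```

A nonzero `rd` counter together with an I/O error on the same device, as here, points at the drive itself rather than at filesystem corruption, which is why the SMART test is the next step.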
GoldStig23 Posted February 9, 2021

45 minutes ago, JorgeB said:
It's logged as an actual device error, run an extended SMART test on the SSD.

Here's the extended test output. This drive is 4.5 years old and has seen a lot of work, so it's probably not worth building a cache pool with it. I think I might just take both SSDs out, clone the failing one onto the new one, and buy another drive at some point to have a two-drive cache pool. Is it possible to clone it within Unraid at all?

Samsung_SSD_850_EVO_500GB_7491F-20210209-1100.txt
JorgeB Posted February 9, 2021

It failed the extended test, so it should be replaced. You can also try a full device write/discard to see if that clears the errors (all data on the device would be deleted).
coblck Posted February 9, 2021

Just move all the data from the cache to the array, then pop in the new cache drive or drives and move the data back. Have a look at THIS, it may help you out.
GoldStig23 Posted February 10, 2021

16 hours ago, coblck said:
Just move all data from cache to array then pop in new cache drive/drives and then move data back, have a look at THIS may help you out.

I started doing that and most of the data moved, but the mover keeps stopping with about 50GB left, and I get a toast error saying that the drive's "uncorrectable error count" incremented again. I used the built-in explorer to figure out which files are still there. There was a large VM image, which I removed from the terminal. The last thing that really needs to be moved from there is the docker.img. Now, when I invoke the mover, it just refreshes the page and does nothing. It's the same from the terminal: it reports started and finished in a snap.
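When a single file sits on unreadable sectors and the mover gives up on it, one fallback is to copy that file by hand and accept that the damaged blocks are lost. Purpose-built tools such as GNU ddrescue handle this far more carefully; the Python sketch below only illustrates the idea. The function name `copy_skipping_errors`, the 4096-byte block size, and the optional `read_block` hook are assumptions made for illustration, and nothing here is an Unraid feature. Note that Python's buffered reads may fetch more than one block at a time, so a single bad sector can cost a slightly larger region than it strictly needs to.

```python
import os


def copy_skipping_errors(src_path, dst_path, block_size=4096, read_block=None):
    """Copy src to dst block by block, zero-filling any block that cannot be read.

    Returns the list of byte offsets whose blocks failed to read.
    `read_block(offset, length)` is an optional hook (hypothetical, mainly for
    testing) that replaces the default file read.
    """
    bad_offsets = []
    size = os.path.getsize(src_path)

    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:

        def default_read(offset, length):
            src.seek(offset)
            return src.read(length)

        reader = read_block if read_block is not None else default_read

        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            try:
                data = reader(offset, length)
            except OSError:
                # The kernel reports an unreadable sector as an I/O error.
                data = b""
                bad_offsets.append(offset)
            # Pad failed (or short) reads with zeros so offsets stay aligned.
            dst.write(data.ljust(length, b"\0"))
            offset += length

    return bad_offsets
```

It would be called with the damaged file as `src_path` and a location on a healthy disk as `dst_path`. Any offsets it returns mark zero-filled holes, so a file copied this way should be treated as possibly corrupt rather than as a clean backup.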