SSD Cache failing?

September 6, 2021

I have a cache pool made up of two 480 GB SSDs in RAID 1. Over the last few days I have noticed that my TRIM cron job has been failing. I just looked at the server and the log is nearly full. It is full of entries such as:

Sep 6 21:01:19 PCServer kernel: sd 10:0:0:0: [sdi] tag#28 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 cmd_age=0s
Sep 6 21:01:19 PCServer kernel: sd 10:0:0:0: [sdi] tag#28 CDB: opcode=0x2a 2a 00 11 52 0f 00 00 00 80 00
Sep 6 21:01:19 PCServer kernel: blk_update_request: I/O error, dev sdi, sector 290590464 op 0x1:(WRITE) flags 0x1800 phys_seg 16 prio class 0
Sep 6 21:01:19 PCServer kernel: BTRFS warning (device sdf1): lost page write due to IO error on /dev/sdi1 (-5)
Sep 6 21:01:19 PCServer kernel: BTRFS warning (device sdf1): lost page write due to IO error on /dev/sdi1 (-5)
Sep 6 21:01:19 PCServer kernel: BTRFS warning (device sdf1): lost page write due to IO error on /dev/sdi1 (-5)
Sep 6 21:01:19 PCServer kernel: BTRFS error (device sdf1): error writing primary super block to device 2
Sep 6 21:01:24 PCServer kernel: scsi_io_completion_action: 53 callbacks suppressed
Sep 6 21:01:24 PCServer kernel: sd 10:0:0:0: [sdi] tag#4 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 cmd_age=0s
Sep 6 21:01:24 PCServer kernel: sd 10:0:0:0: [sdi] tag#4 CDB: opcode=0x2a 2a 00 00 12 9c 88 00 00 18 00
Sep 6 21:01:24 PCServer kernel: print_req_error: 54 callbacks suppressed
Sep 6 21:01:24 PCServer kernel: blk_update_request: I/O error, dev sdi, sector 1219720 op 0x1:(WRITE) flags 0x0 phys_seg 3 prio class 0
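For anyone reading the log entries above: the CDB bytes are the raw SCSI command the kernel sent to sdi. Opcode 0x2a is WRITE(10), which carries a 4-byte big-endian logical block address and a 2-byte block count. Below is a minimal Python sketch that decodes the two CDBs from the log; the function name decode_write10 is made up for illustration and is not part of any Unraid or kernel tool.

```python
def decode_write10(cdb_hex: str) -> dict:
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes.

    Hypothetical helper for illustration only.
    """
    cdb = bytes.fromhex(cdb_hex)  # bytes.fromhex skips the spaces
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    return {
        # Bytes 2-5: logical block address, big-endian
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7-8: number of blocks to transfer, big-endian
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# The two CDBs copied from the kernel log lines
first = decode_write10("2a 00 11 52 0f 00 00 00 80 00")
second = decode_write10("2a 00 00 12 9c 88 00 00 18 00")
print(first)
print(second)
```

The decoded addresses come out the same as the sector numbers in the two blk_update_request lines, so these are ordinary writes to sdi being rejected at the transport level, not a filesystem-level problem on their own.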

I have a feeling that the second cache drive is failing. I also cannot run a SMART report on it.

I would say the drive is failing or has already failed, but Unraid is not showing any errors on the 'Main' tab. The drive in question does show a lot more reads and writes.

Since it is part of a cache pool, I should be able to stop the array, unassign the drive, power down the server, install a new SSD and assign it, and the pool should then rebuild. Is that right?

Any help or advice is greatly appreciated.

September 6, 2021

Community Expert

Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.

September 6, 2021

Author

Full Diagnostics file.

pcserver-diagnostics-20210906-2139.zip

September 6, 2021

Community Expert

Check connections on cache2 and post new diagnostics

September 7, 2021

Author

I checked all the connections and made sure they were all OK. After rebooting the server I can now run a SMART test. Diagnostics are attached, along with a screenshot of the disk status. I think this disk is failing.

pcserver-diagnostics-20210907-1823.zip

September 7, 2021

Community Expert

Not entirely sure how to interpret SMART for SSDs. Run an extended SMART test on the drive.
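If you would rather run the extended test from a terminal than from the GUI, smartmontools provides it through smartctl. Here is a rough Python sketch that only assembles the relevant command lines; the helper names smart_commands and run are invented for this example, and the device path is an assumption (the log showed sdi, but the letter can change after a reboot, so check it first).

```python
import subprocess


def smart_commands(device: str) -> dict:
    """Return smartctl command lines for an extended self-test workflow.

    Hypothetical helper; 'device' is a path such as /dev/sdX that you
    must confirm yourself before running anything.
    """
    return {
        # Start the extended (long) self-test; it runs on the drive itself
        "start_extended_test": ["smartctl", "-t", "long", device],
        # Show the self-test log once the test has had time to finish
        "selftest_log": ["smartctl", "-l", "selftest", device],
        # Print all SMART information for the drive
        "full_report": ["smartctl", "-a", device],
    }


def run(cmd: list) -> str:
    """Run one command and return its standard output as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


cmds = smart_commands("/dev/sdX")  # placeholder path, replace before use
for name, cmd in cmds.items():
    print(name, "->", " ".join(cmd))
```

Nothing is executed until you call run() on one of the entries, and that needs root privileges on the server.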

September 7, 2021

Author

I have run an extended SMART test and no errors were found.

September 7, 2021

Community Expert

Post new diagnostics or at least new SMART report for that disk
