RallyGallery Posted September 6, 2021

I have a cache pool made up of two 480GB SSDs in RAID 1. Over the last few days I have noticed that my trim cron job has been erroring. I just looked at the server and the log file is nearly full. It is full of entries such as:

Sep 6 21:01:19 PCServer kernel: sd 10:0:0:0: [sdi] tag#28 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 cmd_age=0s
Sep 6 21:01:19 PCServer kernel: sd 10:0:0:0: [sdi] tag#28 CDB: opcode=0x2a 2a 00 11 52 0f 00 00 00 80 00
Sep 6 21:01:19 PCServer kernel: blk_update_request: I/O error, dev sdi, sector 290590464 op 0x1:(WRITE) flags 0x1800 phys_seg 16 prio class 0
Sep 6 21:01:19 PCServer kernel: BTRFS warning (device sdf1): lost page write due to IO error on /dev/sdi1 (-5)
Sep 6 21:01:19 PCServer kernel: BTRFS warning (device sdf1): lost page write due to IO error on /dev/sdi1 (-5)
Sep 6 21:01:19 PCServer kernel: BTRFS warning (device sdf1): lost page write due to IO error on /dev/sdi1 (-5)
Sep 6 21:01:19 PCServer kernel: BTRFS error (device sdf1): error writing primary super block to device 2
Sep 6 21:01:24 PCServer kernel: scsi_io_completion_action: 53 callbacks suppressed
Sep 6 21:01:24 PCServer kernel: sd 10:0:0:0: [sdi] tag#4 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 cmd_age=0s
Sep 6 21:01:24 PCServer kernel: sd 10:0:0:0: [sdi] tag#4 CDB: opcode=0x2a 2a 00 00 12 9c 88 00 00 18 00
Sep 6 21:01:24 PCServer kernel: print_req_error: 54 callbacks suppressed
Sep 6 21:01:24 PCServer kernel: blk_update_request: I/O error, dev sdi, sector 1219720 op 0x1:(WRITE) flags 0x0 phys_seg 3 prio class 0

I have a feeling that the second cache drive is failing. I also cannot run a SMART report on it, so I would say the drive is failing or has failed. Unraid is not showing any errors in the 'Main' tab. The drive in question shows a lot more reads and writes.

As it is part of a cache pool, I should be able to stop the array, unassign the drive, power down the server, install a new SSD, assign it, and the pool should rebuild? Any help or advice is greatly appreciated.
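A note for anyone reading the kernel lines quoted in this post: the `CDB:` entries can be decoded by hand to confirm they describe the same failed writes as the `blk_update_request` lines. Below is a minimal sketch in Python, assuming the standard SCSI WRITE(10) layout (opcode 0x2a, a 4-byte big-endian logical block address in bytes 2 to 5, and a 2-byte transfer length in bytes 7 to 8). The function name `decode_write10` is made up for illustration and is not part of any Unraid or kernel tool.

```python
def decode_write10(cdb_hex: str) -> dict:
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes.

    Assumes the standard 10-byte layout; raises ValueError otherwise.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    if cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) command (opcode 0x2a)")
    return {
        # Bytes 2..5: starting logical block address, big-endian
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8: number of blocks to transfer, big-endian
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# The two CDBs from the log above
first = decode_write10("2a 00 11 52 0f 00 00 00 80 00")
second = decode_write10("2a 00 00 12 9c 88 00 00 18 00")
print(first)
print(second)
```

The decoded addresses come out as 290590464 and 1219720, matching the `sector` values in the two I/O error lines, so both message types point at the same write requests to sdi. This only shows which writes were rejected; it does not say whether the cause is the drive, the cable, or the controller.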
trurl Posted September 6, 2021
Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
RallyGallery Posted September 6, 2021 (Author)
Full Diagnostics file: pcserver-diagnostics-20210906-2139.zip
trurl Posted September 6, 2021
Check the connections on cache2 and post new diagnostics.
RallyGallery Posted September 7, 2021 (Author)
I checked all the connections and made sure they were all OK. I rebooted the server and can run a SMART test now. Diagnostics are attached, along with a screenshot of the disk status. I think this disk is failing. pcserver-diagnostics-20210907-1823.zip
trurl Posted September 7, 2021
Not entirely sure how to interpret SMART for SSDs. Run an extended SMART test on the drive.
RallyGallery Posted September 7, 2021 (Author)
I have run an extended SMART test and no errors were found.
trurl Posted September 7, 2021
Post new diagnostics, or at least a new SMART report for that disk.