February 1, 2024

Day 3 of errors continuing... diags attached.

I previously erased and formatted the drives, rebuilt Docker, and restored everything back to where it was, only for the errors to return overnight. The next step was to wipe and reformat the drives again; same issue while restoring files. I have some beefy extra files (~45GB worth) and the restore never completed.

Ran Memtest on the RAM (2 x 4GB) for ~1 hour, 2 complete test cycles, no errors.

Based on some other comments I've seen, I decided to replace the SATA cables. Erased and reformatted again, and while restoring the extra files the system froze up. I restarted and now have CRC errors on one of the drives (they had shown up before on the other drive in the pool). But the pool mounts and everything, so I'm reinstalling my containers to see what it does or doesn't do. The space used on the pool is in the right range, ~137GB out of 1TB.

Interestingly enough, it's now showing one of these drives both as part of the pool and in Unassigned Devices as not mounted. This didn't happen before I swapped the cables.

One of the SSDs is a little over a year old; the other shows ~2+ years powered on. Some of the daily stuff hitting the cache 24/7 would be the Plex transcode share and my Omada SDN working folders. Not sure if that would put excessive wear and tear on the drives. I did run SMART tests on both of them when they were working earlier, and both reported no issues.

Thoughts or suggestions for next troubleshooting steps?

A couple of things come to mind: the cables I swapped in are "older" but were never really used previously. The ones I took out, I believe, came with a motherboard from a build last year (not this PC build; this MB/CPU is ~10 years old). I did use the same ports on the board that I've been using for over a year now. Trying to change just one thing at a time where feasible.

rlyeh-diagnostics-20240201-1431.zip
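For anyone following along with the CRC errors mentioned above: the CRC count in SMART is cumulative, so what matters after a cable swap is whether the number keeps climbing, not whether it is non-zero. Below is a minimal Python sketch that pulls that counter out of `smartctl -A` text. It assumes the usual ten-column attribute table with the CRC counter at ID 199; the function name `crc_error_count` and the sample values are made up for illustration and are not taken from the attached diagnostics.

```python
def crc_error_count(smartctl_output):
    """Return the raw CRC error counter from `smartctl -A` text, or None.

    Assumes attribute rows have the usual ten columns:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute 199 is commonly the interface CRC error counter.
        if len(fields) >= 10 and fields[0] == "199" and "CRC" in fields[1]:
            return int(fields[9])
    return None


# Hypothetical sample output, NOT from the drives in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       9000
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       12
"""

if __name__ == "__main__":
    print(crc_error_count(SAMPLE))
```

Recording the value before and after a cable or port change, then comparing, is one way to tell whether the change actually helped.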
February 1, 2024 Author

FWIW, I'm thinking of putting the original cables back since they're newer, then trying to rebuild everything with just a single drive (the newer one) to see what that does or doesn't do... I also have a small 240GB drive that's been sitting on a shelf for a while; that's another option to try.

Edited February 1, 2024 by Azreal (sp)
February 2, 2024 Community Expert

Crucial SSD is dropping offline:

Feb 1 11:59:50 Rlyeh kernel: ata1: COMRESET failed (errno=-32)
Feb 1 11:59:50 Rlyeh kernel: ata1: reset failed, giving up
Feb 1 11:59:50 Rlyeh kernel: ata1.00: disable device
Feb 1 11:59:50 Rlyeh kernel: ata1: EH complete
Feb 1 11:59:50 Rlyeh kernel: sd 1:0:0:0: rejecting I/O to offline device
Feb 1 11:59:50 Rlyeh kernel: I/O error, dev sdb, sector 73832 op 0x1:(WRITE) flags 0x1800 phys_seg 88 prio class 2

Could be cables, but since it's an MX500 see here: https://forums.unraid.net/topic/137795-6115-issues-with-btrfs-cache-after-ssds-replacement-and-hw-upgrade/?do=findComment&comment=1265949

Also see here for better pool monitoring: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582
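Messages like the ones quoted above are easy to miss in a long syslog. Here is a minimal Python sketch that scans syslog lines for them, assuming the lines look like the excerpt in this post. The function name `scan_syslog` and the event labels are made up for illustration, and this is not the monitoring script from the linked FAQ.

```python
import re
from collections import defaultdict

# Patterns modelled on the kernel messages quoted in this post.
ATA_PATTERNS = {
    "comreset_failed": re.compile(r"kernel: (ata\d+): COMRESET failed"),
    "reset_gave_up": re.compile(r"kernel: (ata\d+): reset failed, giving up"),
    "device_disabled": re.compile(r"kernel: (ata\d+)\.\d+: disable device"),
}
IO_ERROR = re.compile(r"kernel: I/O error, dev (\w+), sector \d+")


def scan_syslog(lines):
    """Group drop-offline events by ATA port (ata1) or block device (sdb)."""
    events = defaultdict(list)
    for line in lines:
        for name, pattern in ATA_PATTERNS.items():
            match = pattern.search(line)
            if match:
                events[match.group(1)].append(name)
        match = IO_ERROR.search(line)
        if match:
            events[match.group(1)].append("io_error")
    return dict(events)


# The log excerpt from this post, used as sample input.
SAMPLE = [
    "Feb 1 11:59:50 Rlyeh kernel: ata1: COMRESET failed (errno=-32)",
    "Feb 1 11:59:50 Rlyeh kernel: ata1: reset failed, giving up",
    "Feb 1 11:59:50 Rlyeh kernel: ata1.00: disable device",
    "Feb 1 11:59:50 Rlyeh kernel: ata1: EH complete",
    "Feb 1 11:59:50 Rlyeh kernel: sd 1:0:0:0: rejecting I/O to offline device",
    "Feb 1 11:59:50 Rlyeh kernel: I/O error, dev sdb, sector 73832 op 0x1:(WRITE) "
    "flags 0x1800 phys_seg 88 prio class 2",
]

if __name__ == "__main__":
    for target, names in scan_syslog(SAMPLE).items():
        print(target, names)
```

On a live system the same function could be fed the lines of the syslog file instead of `SAMPLE`; the path to that file is left out here since it depends on the setup.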
February 2, 2024 Author

6 hours ago, JorgeB said:
Could be cables, but since it's an MX500 see here: https://forums.unraid.net/topic/137795-6115-issues-with-btrfs-cache-after-ssds-replacement-and-hw-upgrade/?do=findComment&comment=1265949

Thanks, I do have that firmware. I ran into issues with that immediately when I set it up and got pointed in the right direction. Reading the pooling info now.

I set up my 240GB drive as a single cache drive and got everything up and working. I also set up another cache pool with the two 1TB drives that started all of this, and I've been able to copy large amounts of files over without any errors. Left it powered on overnight; no errors this morning. Once my regularly scheduled parity check finishes, I'm going to try switching back to the drives that started all of this and see how it goes. I'm getting good at tearing it down and spinning it back up at least :).