cpthook, posted March 5, 2020 (edited March 5, 2020)

Hello Forum,

Would you mind having a look at my attached syslog and diagnostics to help me work out what caused the event logged on 3/4 at 08:24 that shows problems with disk4? This disk is in a 5-bay hot-swap drive cage, with all SATA cables connected to motherboard SATA ports. Disk4 became disabled, and the other drives in the cage went down with it (disks 9, 11, 7 and 5 dropped around 08:48 in the log).

Luckily, I was able to shut down gracefully. I then reseated all cables to the drive cage and brought the server back up, only to find that disk4 was still disabled with emulated contents and marked "unmountable no file system". So I ran xfs_repair with the -L flag, then rebuilt the disk and added it back to my array. SMART reported no errors for disk4 and the parity sync/rebuild finished with no issues, so I'm thinking this may be power related. The cage is powered through a Molex-to-SATA power adapter; it has been in place for about 5 years with no issues, but I replaced it with a new one anyway. I also ordered a new drive cage to rule out a failing cage.

I just need advice on what may have caused this event and how to prevent it from happening again. I think I got lucky that only one drive was affected and lost its XFS file system.

Thanks in advance for your help :)

PSU = Corsair 650 RTX
Drive Cage = iStarUSA BPN-DE350SS-RED

tower-diagnostics-20200304-1027.zip
JorgeB, posted March 5, 2020

This started with the disk dropping offline:

Mar 4 08:24:11 Tower kernel: ata4: link is slow to respond, please be patient (ready=0)
Mar 4 08:24:15 Tower kernel: ata4: COMRESET failed (errno=-16)
Mar 4 08:24:21 Tower kernel: ata4: link is slow to respond, please be patient (ready=0)
Mar 4 08:24:25 Tower kernel: ata4: COMRESET failed (errno=-16)
Mar 4 08:24:31 Tower kernel: ata4: link is slow to respond, please be patient (ready=0)
Mar 4 08:25:00 Tower kernel: ata4: COMRESET failed (errno=-16)
Mar 4 08:25:00 Tower kernel: ata4: limiting SATA link speed to 3.0 Gbps
Mar 4 08:25:05 Tower kernel: ata4: COMRESET failed (errno=-16)
Mar 4 08:25:05 Tower kernel: ata4: reset failed, giving up
Mar 4 08:25:05 Tower kernel: ata4.00: disabled

Then it came back online before dropping again, which suggests a power/connection issue, especially if SMART is OK. Replace or swap both cables (power and SATA), or swap slots with a different disk, to rule that out. If it happens again to the same disk after that, it could be a disk problem.
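If you want to check a syslog for the same pattern, here is a minimal Python sketch (not an Unraid tool) that counts these libata messages per ATA port and records when each port was first disabled. It assumes the line layout shown in the log excerpt in this post; the event labels and the `summarize` function are hypothetical names for illustration. Note that ataN is the kernel's port number, not the Unraid disk slot. If several ports show "disabled" within a short window, as happened here with disks 9, 11, 7 and 5, that is consistent with a shared cause such as power or the cage rather than one failing disk.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the log excerpt in this thread:
#   Mar 4 08:25:05 Tower kernel: ata4.00: disabled
# i.e. "<Mon> <day> <hh:mm:ss> <host> kernel: ataN[.DD]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel:\s+"
    r"(?P<port>ata\d+)(?:\.\d+)?:\s+(?P<msg>.*)$"
)

# Hypothetical labels for the libata messages quoted in this thread.
EVENT_PATTERNS = [
    (re.compile(r"link is slow to respond"), "slow_link"),
    (re.compile(r"COMRESET failed"), "comreset_failed"),
    (re.compile(r"limiting SATA link speed"), "speed_limited"),
    (re.compile(r"reset failed, giving up"), "gave_up"),
    (re.compile(r"^disabled$"), "disabled"),  # exact match, e.g. "ata4.00: disabled"
]


def summarize(lines):
    """Count link events per ATA port and note when each port was first disabled.

    Returns (counts, first_disabled): counts maps "ataN" to {label: count},
    first_disabled maps "ataN" to the timestamp of its first "disabled" line.
    Lines that do not match LINE_RE are ignored.
    """
    counts = defaultdict(lambda: defaultdict(int))
    first_disabled = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        port, msg = m.group("port"), m.group("msg").strip()
        for pattern, label in EVENT_PATTERNS:
            if pattern.search(msg):
                counts[port][label] += 1
                if label == "disabled":
                    first_disabled.setdefault(port, m.group("ts"))
    # Convert to plain dicts so callers get ordinary, printable mappings.
    return {p: dict(c) for p, c in counts.items()}, first_disabled
```

To use it, pass any iterable of lines, such as an open file object for a syslog copied out of the diagnostics zip (the exact file name inside the zip is not assumed here).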