ADVICE ON HOW TO PREVENT NEXT EVENT



Hello Forum

 

Would you mind having a look at my attached syslog and diagnostics to help me analyze what may have caused the event in the syslog on 3/4 @ 8:24 that shows issues with disk4? This disk is in a 5-bay hot-swap drive cage with all SATA cables connected to MOBO SATA ports. The disk became disabled along with all of the other drives in the cage (disks 9, 11, 7 and 5 went down around 08:48 in the log). Luckily, I was able to shut down gracefully. I then re-seated all cables to the drive cage and brought my server back up, only to notice that drive 4 was still disabled with emulated contents and marked "unmountable no file system". So I proceeded to run xfs_repair with the -L flag and was able to rebuild the disk and add it back to my array. No errors were found with drive 4 when running SMART, and the parity sync/rebuild finished with no issues, so I'm thinking this may be power related. I do have a Molex-to-SATA power adapter powering the cage. It has been in place for about 5 years with no issues, but I replaced it with a new one anyway. I also ordered a new drive cage to rule out a failure of the cage itself.

 

I just need advice on what may have caused this event and how to prevent it from happening again. I think I got lucky that only one drive was affected and lost its XFS file system.

 

Thanks in advance for your help :)

 

PSU = Corsair 650 RTX

Drive Cage = iStarUSA BPN-DE350SS-RED

tower-diagnostics-20200304-1027.zip

Edited by cpthook

This started with the disk dropping offline:

Mar  4 08:24:11 Tower kernel: ata4: link is slow to respond, please be patient (ready=0)
Mar  4 08:24:15 Tower kernel: ata4: COMRESET failed (errno=-16)
Mar  4 08:24:21 Tower kernel: ata4: link is slow to respond, please be patient (ready=0)
Mar  4 08:24:25 Tower kernel: ata4: COMRESET failed (errno=-16)
Mar  4 08:24:31 Tower kernel: ata4: link is slow to respond, please be patient (ready=0)
Mar  4 08:25:00 Tower kernel: ata4: COMRESET failed (errno=-16)
Mar  4 08:25:00 Tower kernel: ata4: limiting SATA link speed to 3.0 Gbps
Mar  4 08:25:05 Tower kernel: ata4: COMRESET failed (errno=-16)
Mar  4 08:25:05 Tower kernel: ata4: reset failed, giving up
Mar  4 08:25:05 Tower kernel: ata4.00: disabled

 

The disk then came back online before dropping again, which suggests a power/connection issue, especially if SMART is OK. To rule that out, replace or swap both cables, or swap slots with a different disk. If it happens again to the same disk after that, it could be a disk problem.
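If it helps with checking future logs, here is a rough Python sketch that groups link-reset messages like the ones quoted above by ATA port. The function name `summarize_ata_events` and the list of message fragments are my own assumptions, taken only from the excerpt in this thread; other kernels or controllers may word these messages differently, so treat it as a starting point rather than a complete detector.

```python
import re
from collections import defaultdict

# Matches kernel lines shaped like the excerpt above, for example:
#   Mar  4 08:24:15 Tower kernel: ata4: COMRESET failed (errno=-16)
#   Mar  4 08:25:05 Tower kernel: ata4.00: disabled
ATA_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<port>ata\d+)(?:\.\d+)?:\s+(?P<msg>.*)$"
)

# Assumption: message fragments copied from the syslog excerpt in this thread.
TROUBLE = (
    "link is slow to respond",
    "COMRESET failed",
    "limiting SATA link speed",
    "reset failed",
    "disabled",
)


def summarize_ata_events(lines):
    """Return {port: [(timestamp, message), ...]} for link-trouble lines."""
    events = defaultdict(list)
    for line in lines:
        match = ATA_LINE.match(line.strip())
        if match and any(fragment in match.group("msg") for fragment in TROUBLE):
            events[match.group("port")].append(
                (match.group("stamp"), match.group("msg"))
            )
    return dict(events)


if __name__ == "__main__":
    sample = [
        "Mar  4 08:24:11 Tower kernel: ata4: link is slow to respond, please be patient (ready=0)",
        "Mar  4 08:24:15 Tower kernel: ata4: COMRESET failed (errno=-16)",
        "Mar  4 08:25:05 Tower kernel: ata4.00: disabled",
    ]
    for port, hits in summarize_ata_events(sample).items():
        print(port, len(hits), "events, first at", hits[0][0])
```

To run it on a real log, pass an open file instead of `sample`, for example `summarize_ata_events(open(path))`, where `path` is wherever your syslog was extracted. Several ports failing within the same few minutes, as happened with the other disks in the cage, would point toward something shared such as the cage or its power feed rather than a single disk.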
