Double Disk Failure & CPU Spinning


mdoom


Hello,

I came home from work today to find quite the surprise: alerts for 2 disks being disabled. 3 disks in total had read errors, one of which has since recovered.

I didn't panic, as I know I've had issues in the past with my current SAS cards occasionally having flukes and kicking multiple drives off. (I have replacement cards, I just haven't gotten them installed yet.)

 

Anyway, before doing anything else, I figured I should stop all my Docker containers to keep anything from making it worse. When I attempted that, everything started hanging. One CPU is maxed at 100%; the process is "kworker/1:0+events". So before going any further, I thought I'd download diagnostics and get some extra sets of eyes on the best way to move forward.

 

disk2 and disk4 are the ones currently disabled. disk6 is the one that also had read errors but seems okay, as in it's 'green'. However, looking at the diagnostics, disk2, disk4 and disk6 all failed to produce SMART reports, which is also why I think it has to be something with my card or cables.
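
For anyone following along, one way to tell "dropped off the controller" apart from "failing drive" is to ask each disk for its SMART health directly. Below is a minimal Python sketch, assuming `smartctl` (from smartmontools) is installed and the script runs as root. The function names and the `/dev/sdb` path are hypothetical and not taken from the diagnostics. A disk that has dropped offline typically returns no health line at all, which the sketch reports as NO REPORT.

```python
import subprocess


def parse_health(smartctl_output: str) -> str:
    """Return 'PASSED', 'FAILED' or 'NO REPORT' from `smartctl -H` text."""
    for line in smartctl_output.splitlines():
        # ATA drives print the "overall-health" line;
        # SAS/SCSI drives print "SMART Health Status:" instead.
        if ("overall-health self-assessment test result:" in line
                or "SMART Health Status:" in line):
            result = line.split(":", 1)[1].strip()
            return "PASSED" if result in ("PASSED", "OK") else "FAILED"
    # No health line at all: the disk did not answer (e.g. it dropped offline).
    return "NO REPORT"


def check_disk(device: str) -> str:
    """Run `smartctl -H` on a device path (hypothetical example: /dev/sdb)."""
    proc = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return parse_health(proc.stdout)


# Example call (needs root and smartmontools):
#   check_disk("/dev/sdb")
```

A PASSED result only reflects the drive's own overall self-assessment, so the full attribute list from `smartctl -a` is still worth reading before trusting a disk.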

 

Any guidance is much appreciated. As I sit here, the CPU is still spinning and nothing is happening. :-)

tiger-diagnostics-20181204-1439.zip

12 minutes ago, johnnie.black said:

If you know they are a problem and you already have replacements, what are you waiting for to replace them?

 

Well, I don't know for sure that they are a problem, but yeah. I just haven't gotten around to replacing them, as the ones I have will require flashing a new BIOS first, and I've just been busy.

 

I did go ahead and get the server rebooted. disk2 and disk4 both still have their disabled status, but disk2 is good from what I can tell. disk4 is throwing some legitimate SMART errors and I'd like to replace it.

 

Are there any issues with leaving disk2 disabled and emulated while replacing disk4? (I'd be replacing it with a larger disk: 3 TB with 8 TB.)

Once I get disk4 replaced, I'm confident I can 'trust' disk2 and rebuild it from there. Although if I can get another replacement disk this week, I may rebuild disk2 onto that instead and do further testing on the old drive in the meantime. This is the first time I've really had any issues since switching to dual parity, so it's a new world for me. 🙂

 

EDIT: I see you said it was the SAS controller that crashed.  So yes, that will also motivate me a bit now to get those cards replaced asap.


You can rebuild only disk4. If disk2 is looking good (there was no SMART report for either disk in the previous diags, since they had dropped offline) and the emulated disk is mounting correctly, you can also rebuild it at the same time, though using a spare disk for disk2 would be safer. I would also recommend replacing the SASLPs before rebuilding.
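
The reasoning above can be sketched as a quick sanity check. This is only an illustration in Python, assuming the usual parity rules: the array can emulate and rebuild at most as many missing disks as it has parity disks, and a replacement data disk must not be larger than the smallest parity disk. The function names and the 8 TB parity size are hypothetical; the thread does not state the parity size.

```python
def can_rebuild(disabled_disks: int, parity_disks: int) -> bool:
    """True if every disabled disk can still be reconstructed from parity."""
    return 0 <= disabled_disks <= parity_disks


def replacement_fits(replacement_tb: float, smallest_parity_tb: float) -> bool:
    """True if the replacement data disk is no larger than the smallest parity disk."""
    return replacement_tb <= smallest_parity_tb


# Situation from this thread: dual parity, disk2 and disk4 disabled.
print(can_rebuild(disabled_disks=2, parity_disks=2))   # True
# A third failure before the rebuild finishes would not be recoverable.
print(can_rebuild(disabled_disks=3, parity_disks=2))   # False
# Hypothetical: swapping the 3 TB disk4 for 8 TB, assuming 8 TB parity.
print(replacement_fits(replacement_tb=8, smallest_parity_tb=8))  # True
```

With both disks disabled there is no redundancy left during the rebuild, which is one reason to sort out the controller first.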

2 minutes ago, johnnie.black said:

You can rebuild only disk4. If disk2 is looking good (there was no SMART report for either disk in the previous diags, since they had dropped offline) and the emulated disk is mounting correctly, you can also rebuild it at the same time, though using a spare disk for disk2 would be safer. I would also recommend replacing the SASLPs before rebuilding.

Thanks. Yep, I'm already working on getting those cards set up first. Then I'll just rebuild disk2 back onto itself, and disk4 onto a new drive I have ready to go. Thank you!
