
Updated to 6.12.11 from 6.11.5 - Drives start dropping, data might be lost.



My journey with 6.12 continues. 


I figured it was finally safe to upgrade, but I may have been too hasty: 6.12 still doesn't seem to like my system.

I followed all the upgrade steps listed online, and at first things seemed fine.


After a couple of minutes, my first parity drive and drive 6 got disabled. I don't know why; there was no initial error or anything I could see, so I figured it was just something wonky. I stopped the array, but it hung on "sync filesystem" and I had to force a shutdown.

When I started it back up, the drives were still disabled, so I unassigned them, started the array in maintenance mode, then assigned them back and started it again. Both drives had passed SMART tests, and a rebuild started, but then both drives dropped again at the same time. The sync kept running, with the remaining "good" parity drive accumulating millions of errors, so I decided to stop it. Now the system is hung again: the array won't stop and I can't generate a diagnostics file. Even pressing the shutdown button in the GUI or issuing the shutdown command over SSH does nothing.

Ugh, I feel like my system is cursed. Any suggestions on how to proceed without losing all my data?

If it shuts down and restarts, I will try to generate a diagnostics file and post it here. Ugh...


Welp, is it possible that this is caused by the HBA failing or overheating? I've had a similar issue before that I put down to the HBA: after I started a VM using the video card in the slot next to the HBA, every drive connected to the HBA dropped, but none of the drives connected to the motherboard did.

What is the right procedure for re-enabling disabled HDDs when the drives themselves seem fine (I just ran a short SMART test and it passed)?

I do have a new HBA (Lenovo 430-16i), which is supposed to run cooler and has a larger heatsink, but I realized today that I don't have the right cables, so I can't use it yet. I have ordered the Mini-SAS cables.

Edited by hsingh314

Could this be due to a failing HBA?

I noticed that the only disks getting errors were the ones attached to the HBA. In maintenance mode everything was fine, but once the parity sync started, it ran for a couple of minutes and then everything went down. After the parity sync fails, I can't even view SMART results for the drives attached to the HBA, although I can for the ones attached to the motherboard.

Could the HBA work fine until a lot of data is going through it, and then overheat or otherwise fail?


I tried downgrading to 6.11.5 to see if that would make any difference, but the same thing happened.

I generated a diagnostics file: bishop-diagnostics-20240805-2050.zip


I also started getting this syslog error:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 134045728 bytes) in /usr/local/emhttp/plugins/dynamix/include/Syslog.php on line 18


I think this is why, once the HBA disappears, I can't shut down the server without pressing the power button.

Yikes! Thanks for the confirmation. I'm not sure why I couldn't find that error myself. I have tried two different slots, and I have a small fan on the heatsink.


I've had issues with this HBA before and even repasted the heatsink, but maybe it's just been on its last legs. Hopefully the Lenovo 430-16i (I think it's a rebranded LSI 9400-16i) will work properly out of the box, because I'm not looking forward to flashing the HBA's BIOS and firmware if I have to.
