Parity and data disk disabled at the same time

September 15, 2022

Hi all,

Two days ago at night, just before going to bed, I realized that my array was off-line and that one parity and one data disk had been disabled at the same time. I was tired, so I shut down the server and decided to look into everything the next morning. I know now that that was a mistake, because shutting down wiped the syslog.

I have checked everything I could since then, but can only tell that both drives in question are fine and passed the SMART extended self-test.

My question now is: what is the safest way to restore these drives? Do I unassign the data disk first and then rebuild it onto itself? Do I start with the parity, since it is the faster drive and the rebuild will be quicker? Or am I thinking wrong and my situation calls for a completely different approach? I'll attach the diagnostics and hope they are sufficient even though the relevant syslog is overwritten.

Thanks in advance for any help.

Cheers,

Michael

deathstar-diagnostics-20220914-0949.zip deathstar-smart-20220915-0944.zip

September 15, 2022

Community Expert

Disks look OK and since the emulated disk is mounting, and assuming contents look correct, you can rebuild on top:

https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself

You can do both at the same time.

September 15, 2022

Author

Damn, that was fast. Thanks Jorge, I'll do that. I just had to check with you guys first, because this is the first rebuild for me on Unraid and I wanted to make sure.

Thanks for the blazingly fast answer.

September 15, 2022

Community Expert

You're welcome, and if it happens again don't forget to save the diags before rebooting.

September 16, 2022

Author

I am not sure if it is preferred to create a new topic or to post in this one for continuity purposes. But I need help again, this time for real.

I had Unraid rebuild the array and after 12 hours everything was back to normal. Yay!

This morning I did the update to 6.11.0-rc5 and rebooted (saving the diagnostics beforehand, just in case). Everything worked normally for a while and then suddenly at 11:00 am I got read errors on ALL 16 disks. I was out of the office, so I am only seeing this now.

Attached are both diagnostic files, the one from this morning after the update and before the reboot and one I did just now before taking the array off-line.

I hope there's no reason for panic and would appreciate any help.

Cheers,

Michael

deathstar-diagnostics-20220916-0912.zip deathstar-diagnostics-20220916-1813.zip

September 16, 2022

Community Expert

Sep 16 11:02:30 Deathstar kernel: hpsa 0000:0d:00.0: handle_ioaccel_mode2_error: device is gone!

Problem with the RAID controller. Reboot/power cycle to see if it comes back, and if it does, post new diags after array start.
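
For anyone who wants to look for the same kind of message in their own syslog, here is a minimal Python sketch. It assumes the log lines follow the same layout as the one quoted above; the regular expression, the `find_hpsa_events` helper, and the second sample line are made up for illustration and are not part of Unraid or the hpsa driver.

```python
import re

# Assumed layout: "<Mon> <day> <time> <host> kernel: hpsa <pci address>: <message>"
HPSA_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"hpsa\s+(?P<pci>[0-9a-fA-F:.]+):\s+(?P<message>.*)$"
)


def find_hpsa_events(lines):
    """Return (timestamp, pci_address, message) for each hpsa kernel line."""
    events = []
    for line in lines:
        match = HPSA_LINE.match(line.strip())
        if match:
            events.append((match["timestamp"], match["pci"], match["message"]))
    return events


sample = [
    # The line quoted in this thread.
    "Sep 16 11:02:30 Deathstar kernel: hpsa 0000:0d:00.0: "
    "handle_ioaccel_mode2_error: device is gone!",
    # Made-up unrelated line, only here to show that it is skipped.
    "Sep 16 11:02:31 Deathstar kernel: some unrelated message",
]

events = find_hpsa_events(sample)
for timestamp, pci, message in events:
    print(timestamp, pci, message)
```

To run it against a real log, pass the lines of the syslog file from the diagnostics zip instead of `sample`.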

September 16, 2022

Author

Hi Jorge, thanks for helping me again. I rebooted and it came back up. Diags enclosed.

deathstar-diagnostics-20220916-1846.zip

September 16, 2022

That's odd Orca... I also had a parity and a data disk drop from the array overnight... I rebuilt both of them with some spare disks that I have...

Both disks' SMART reports look OK...

I'm not going to say that this might be a bug unless more reports come up... but timing between your failure and mine is odd

tower-diagnostics-20220916-1203.zip tower-diagnostics-20220915-0800.zip

September 16, 2022

Community Expert
Solution

59 minutes ago, 0rca said:

I rebooted and it came back up. Diags enclosed.

Everything looks fine. Hopefully it was a one-time thing; if it happens again, I suggest going back to the last known good release to see if it's driver/kernel related.

September 16, 2022

Author

Thanks Jorge. I'll do that. It could also be a failing HBA (HP Smart Array H240) though, right? I might get another one just in case, it's good to have a spare ready anyway.

September 16, 2022

Community Expert

Just now, 0rca said:

It could also be a failing HBA (HP Smart Array H240) though, right?

It could. It could also be overheating; also check that it's well seated, or try it in a different PCIe slot.

September 16, 2022

Author

Thanks, will check all that.

September 16, 2022

Community Expert

2 hours ago, mathomas3 said:

timing between your failure and mine is odd

Timing not that odd. Every single day on this forum people have disconnected disks due to hardware problems, often bad connections.

September 16, 2022

3 minutes ago, trurl said:

Timing not that odd. Every single day on this forum people have disconnected disks due to hardware problems, often bad connections.

We had some shifty power that day... hoping that was all that it was... Ordered a much larger UPS the same day

September 17, 2022

Author

Just FYI, it happened again today: all disks showed errors and the HBA was gone. I caught it just in time, went to the basement and measured temps. On the HBA heatsink it showed close to 70 degrees Celsius, so the die temperature would be even higher. Not good.

I've added some active cooling to the HBA and booted back up. This time Parity 1 and Disk 4 were disabled. I am now rebuilding, hoping that my cooling is now sufficient.

I have two questions, to better understand the situation: Is it normal that two disks (1 parity and 1 data) get disabled, because at the moment the HBA crashes there's bound to be one data drive and one parity drive with I/O? Or am I simply lucky that it was just two, and it could easily have been more?

Theoretically, what would happen in the latter case? Assuming that only a few bytes might actually be wrong, would there be a way to restore the rest of the data, or would I have to copy the data from each drive somewhere else to save it? Having been a RAID user for decades, the whole Unraid concept is still new to me.

September 18, 2022

Community Expert

17 hours ago, 0rca said:

Is it normal that two disks (1 parity and 1 data) get disabled, because at the moment the HBA crashes there's bound to be one data drive and one parity drive with I/O? Or am I simply lucky that it was just two, and it could easily have been more?

It can disable any of the disks, parity or data; which one(s) is luck of the draw, but it won't disable more disks than there are parity drives.
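
As a rough illustration of why the number of disabled disks is tied to the number of parity drives, here is a toy Python model of a single XOR parity. It is only a sketch under that assumption: the disk contents and the `xor_blocks` helper are hypothetical, and it does not show Unraid's actual implementation or how a second parity drive is calculated.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one tiny 4-byte "block".
data_disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data_disks)

# Pretend the second disk is disabled: its contents can be recomputed
# from the parity block plus every remaining data disk.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_disks[1])
```

With one parity block there is only one equation per byte position, so only one missing disk can be solved for; that matches the idea that each parity drive covers one disabled disk.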

Solved by JorgeB
