September 15, 2022 Hi all, two days ago at night, just before going to bed, I realized that my array was off-line and that one parity and one data disk had been disabled at the same time. I was tired, so I shut down the server and decided to look into everything the next morning. I know now that that was a mistake, because that way I deleted the syslog. I have checked everything I could since then, but can only tell that both drives in question are fine and passed the SMART extended self-test. My question now is: what is the safest way to restore these drives? Do I unassign the data disk first and then rebuild it onto itself? Do I start with the parity, since it is the faster drive and the rebuild will be quicker? Or am I thinking wrong and my situation calls for a completely different approach? I'll attach the diagnostics and hope they are sufficient even though the relevant syslog is overwritten. Thanks in advance for any help. Cheers, Michael deathstar-diagnostics-20220914-0949.zip deathstar-smart-20220915-0944.zip
September 15, 2022 Disks look OK, and since the emulated disk is mounting, and assuming contents look correct, you can rebuild on top: https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself You can do both at the same time.
September 15, 2022 Author Damn, that was fast. Thanks Jorge, I'll do that. I just had to check with you guys first, because this is the first rebuild for me on Unraid and I wanted to make sure. Thanks for the blazingly fast answer.
September 15, 2022 You're welcome, and if it happens again, don't forget to save the diags before rebooting.
September 16, 2022 Author I am not sure if it is preferred to create a new topic or to post in this one for continuity purposes. But I need help again, this time for real. I had Unraid rebuild the array and after 12 hours everything was back to normal. Yay! This morning I did the update to 6.11.0-rc5 and rebooted (saving the diagnostics beforehand, just in case). Everything worked normally for a while and then suddenly at 11:00 am I got read errors on ALL 16 disks. I was out of the office, so I am seeing this only now. Attached are both diagnostic files, the one from this morning after the update and before the reboot, and one I did just now before taking the array off-line. I hope there's no reason for panic and would appreciate any help. Cheers, Michael deathstar-diagnostics-20220916-0912.zip deathstar-diagnostics-20220916-1813.zip
September 16, 2022 Sep 16 11:02:30 Deathstar kernel: hpsa 0000:0d:00.0: handle_ioaccel_mode2_error: device is gone! Problem with the RAID controller. Reboot/power cycle to see if it comes back, and if yes, post new diags after array start.
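For anyone who wants to check a saved syslog for the same symptom, here is a minimal Python sketch. It assumes the log lines look like the one quoted above (an `hpsa` driver message containing "device is gone"); the function name `find_controller_dropouts` and the second sample line are hypothetical, made up for illustration.

```python
import re

# Pattern based on the single log line quoted in this thread; other
# kernel versions or drivers may word the message differently (assumption).
DROPOUT = re.compile(r"kernel: hpsa (\S+): .*device is gone")


def find_controller_dropouts(lines):
    """Return (timestamp, pci_address) for each hpsa 'device is gone' line."""
    hits = []
    for line in lines:
        match = DROPOUT.search(line)
        if match:
            # Classic syslog timestamps occupy the first 15 characters,
            # e.g. "Sep 16 11:02:30".
            hits.append((line[:15], match.group(1)))
    return hits


sample = [
    "Sep 16 11:02:30 Deathstar kernel: hpsa 0000:0d:00.0: "
    "handle_ioaccel_mode2_error: device is gone!",
    "Sep 16 11:02:31 Deathstar kernel: some unrelated message",  # hypothetical
]
print(find_controller_dropouts(sample))
```

The function accepts any iterable of lines, so an open file handle for a syslog extracted from a diagnostics zip could be passed in directly.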
September 16, 2022 Author Hi Jorge, thanks for helping me again. I rebooted and it came back up. Diags enclosed. deathstar-diagnostics-20220916-1846.zip
September 16, 2022 That's odd, Orca... I also had a parity and a data disk drop from the array overnight... I rebuilt both of them with some spare disks that I have... The SMART reports for both disks look OK... I'm not going to say that this might be a bug unless more reports come up... but the timing between your failure and mine is odd. tower-diagnostics-20220916-1203.zip tower-diagnostics-20220915-0800.zip
September 16, 2022 Solution 59 minutes ago, 0rca said: "I rebooted and it came back up. Diags enclosed." Everything looks fine. Hopefully it was a one-time thing; if it happens again, I suggest going back to the last known good release to see if it's driver/kernel related.
September 16, 2022 Author Thanks Jorge. I'll do that. It could also be a failing HBA (HP Smart Array H240) though, right? I might get another one just in case; it's good to have a spare ready anyway.
September 16, 2022 Just now, 0rca said: "It could also be a failing HBA (HP Smart Array H240) though, right?" It could. It could also be overheating. Also check that it's well seated, or try it in a different PCIe slot.
September 16, 2022 2 hours ago, mathomas3 said: "timing between your failure and mine is odd" Timing not that odd. Every single day on this forum people have disconnected disks due to hardware problems, often bad connections.
September 16, 2022 3 minutes ago, trurl said: "Timing not that odd. Every single day on this forum people have disconnected disks due to hardware problems, often bad connections." We had some shifty power that day... hoping that was all that it was... Ordered a much larger UPS the same day.
September 17, 2022 Author Just FYI, it happened again today: all disks showed errors and the HBA was gone. I caught it just in time, went to the basement and measured temps. On the HBA heatsink it showed close to 70 degrees Celsius, so the die temperature would be even higher. Not good. I've added some active cooling to the HBA and booted back up. This time Parity 1 and Disk 4 were disabled. I am now rebuilding, hoping that my cooling is now sufficient. I have two questions, to better understand the situation: Is it normal that two disks (1 parity and 1 data) are disabled, because at the specific moment the HBA crashes there's bound to always be one data drive with I/O and one parity, or am I simply lucky that it is just two, and it could easily be more? Theoretically, what would happen in the latter case? Assuming that only a few bytes might actually be wrong, would there be a way to restore the rest of the data, or would I have to copy the data from each drive somewhere else to save it? Having been a RAID user for decades, the whole Unraid concept is still new to me....
September 18, 2022 17 hours ago, 0rca said: "Is it normal, that two disks (1 parity and 1 data) are disabled, because in that specific moment, when the HBA crashes there's bound to always be one data drive with I/O and one parity or am I simply lucky that it is just two, but could easily be more?" It can disable any of the disks, parity or data; which one(s) is luck of the draw, but it won't disable more disks than there are parity drives.
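To picture why the number of parity drives sets that limit, here is a toy Python sketch of single XOR parity. It is a simplified illustration of the general technique under stated assumptions, not Unraid's actual implementation: the "disks" are short made-up byte strings and the helper name `xor_blocks` is hypothetical. With one parity block, exactly one missing disk can be recomputed from the survivors; a second simultaneous loss would need a second, independently computed parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data disks" (made-up contents, all the same length).
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(data)

# Pretend the second disk is disabled: rebuild it from parity plus the others.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

If two of the three data strings were removed from `survivors`, the single parity block would only yield the XOR of the two missing disks, not either one individually.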