Disk 12 failed


Run a SMART report on the disk if you can.

Open http://SrvrName-IP/Main/Device?name=disk? in a browser, inserting of course your own server name or IP and the disk number, e.g.:

http://136.294.75.128/Main/Device?name=disk##

It'll tell you what's wrong.
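The same health verdict can also be pulled from a shell on the server. This is only a minimal sketch: it assumes smartmontools' `smartctl` is installed, `/dev/sdX` is a placeholder for whatever device sits behind disk 12, and the helper names are hypothetical.

```python
import subprocess
from typing import List, Optional


def smart_health_cmd(device: str) -> List[str]:
    """Build the smartctl call for the overall health self-assessment."""
    return ["smartctl", "-H", device]


def parse_health(output: str) -> Optional[str]:
    """Return the health verdict from `smartctl -H` output, or None if absent."""
    for line in output.splitlines():
        # ATA drives print "... self-assessment test result: PASSED";
        # SAS/SCSI drives print "SMART Health Status: OK".
        if "self-assessment test result:" in line or "SMART Health Status:" in line:
            return line.split(":", 1)[1].strip()
    return None


def check_disk(device: str) -> Optional[str]:
    """Run smartctl (usually needs root) and return the parsed verdict."""
    # check=False: smartctl uses a non-zero exit status to flag problems,
    # so a failing drive must not raise here.
    result = subprocess.run(
        smart_health_cmd(device), capture_output=True, text=True, check=False
    )
    return parse_health(result.stdout)
```

Calling `check_disk("/dev/sdX")` with the real device name returns the verdict string, or `None` if the drive did not report one.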

If you suspect something else, replace both the data and power cables on it and test again.

Do you have backups or a spare disk?

I personally check my array every day for errors/issues. I know it's not very cool - but it keeps you ahead of the curve, right?

Edited by KC

Did you run SMART on all drives?

If not, start now.

If they all pass SMART, do a parity check with no correction.

If they all pass? Sometimes shit happens. To be Sammy Safety, recheck all power and SAS/SATA connections and repeat.

Otherwise? Drives can and will correct themselves if they find a few bits off. However, a drive not reporting that via SMART is kind of rare.

See this older topic; SMART and bit correction still work the same on physical drives: https://forums.unraid.net/topic/27913-parity-disk-red-balled/?tab=comments#comment-260742

***Edit
Run a long SMART test, to be sure.
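For reference, a long (extended) self-test can also be started and read back from the command line. This is a rough sketch, not the only way to do it: it assumes `smartctl` from smartmontools, an ATA/SATA drive whose self-test log uses the usual `# 1  Extended offline  Completed without error  00% ...` row layout, and hypothetical helper names.

```python
import re
from typing import Dict, List


def long_test_cmd(device: str) -> List[str]:
    """Command that starts the extended self-test (it runs in the background)."""
    return ["smartctl", "-t", "long", device]


def selftest_log_cmd(device: str) -> List[str]:
    """Command that prints the drive's self-test log."""
    return ["smartctl", "-l", "selftest", device]


# One log row: "#<num>  <description>  <status>  <remaining>%  ..."
# Assumes columns are separated by at least two spaces (ATA-style log).
_ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%")


def parse_selftest_log(output: str) -> List[Dict[str, str]]:
    """Extract test type, status, and remaining percentage from each log row."""
    rows = []
    for line in output.splitlines():
        match = _ROW.match(line.strip())
        if match:
            rows.append(
                {
                    "num": match.group(1),
                    "test": match.group(2),
                    "status": match.group(3),
                    "remaining": match.group(4) + "%",
                }
            )
    return rows
```

The first row of the log is the most recent test, so `parse_selftest_log(...)[0]["status"]` is the one to look at once the test has finished.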

Edited by KC

Let me attach a new diagnostic right after running the extended SMART test. I only ran it on disk 12, but will try the other disks later.

 

I keep having this issue with Unraid. I have changed cables several times, so that's not the cause. I eventually figured out that the real issue is overheating of my controller card. Once I kept the slot next to the controller card empty, the issue was largely resolved, as the card was no longer exposed to the heat of the GPU. The issue still seems to happen when building or rebuilding parity, though (I assume that parity building puts a lot of load, and therefore heat, on the controller card). It only happens with drives connected to the controller card.

It has seemed largely under control for close to a year now, so I am wondering whether the issue this time is indeed a broken disk. If the extended SMART test shows no signs of failure, I assume that my old heat issue is back? Maybe I need to buy another fan? Any other ideas are welcome!

30 minutes ago, steve1977 said:

Would this reduce heat?

No.

 

31 minutes ago, steve1977 said:

Or any other difference this would bring?

It would use the HBA driver instead of the RAID driver. If nothing else, that's the most tested and widely used driver, and it possibly gives slightly better performance.


Did you see in the log that it is in RAID mode? I can swap it out and try the old card, which may be in IT mode. I need to check somehow whether it is, though, and I remember that doing this with the DOS boot disk was a huge pain. But maybe I am lucky and the old card is already in IT mode.

39 minutes ago, steve1977 said:

Did you see it in the log that it is in raid mode?

Yes, and it's using the megaraid driver, which is unusual for an LSI 2008 based controller. They usually still use the HBA driver even in RAID mode; the megaraid driver is usually for the more advanced RAID controllers, with BBU, etc.
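Which driver a controller is bound to can also be checked without digging through the diagnostics. This is a small sketch that parses the text printed by `lspci -k`. The kernel module names used here (`megaraid_sas` for the RAID driver, `mpt2sas`/`mpt3sas` for the HBA driver) are my assumption about what the thread means by "megaraid" and "HBA driver", and the function names are hypothetical.

```python
from typing import Dict


def drivers_in_use(lspci_k_output: str) -> Dict[str, str]:
    """Map each PCI device line to its 'Kernel driver in use' value."""
    drivers: Dict[str, str] = {}
    current = None
    for line in lspci_k_output.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new device entry.
            current = line.strip()
        elif current and "Kernel driver in use:" in line:
            drivers[current] = line.split(":", 1)[1].strip()
    return drivers


def classify(driver: str) -> str:
    """Label a driver name as the RAID driver, the HBA driver, or neither."""
    if driver == "megaraid_sas":
        return "RAID driver"
    if driver in ("mpt2sas", "mpt3sas"):
        return "HBA driver"
    return "other"
```

Feeding it the output of `lspci -k` (for example via `subprocess.run(["lspci", "-k"], capture_output=True, text=True).stdout`) and calling `classify` on the controller's entry shows which of the two drivers is active.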


Do you know if your power supply is properly sized and healthy? I've had problems in the past that would only appear during a parity check (all drives spinning): the power supply had the proper capacity on paper but had become weak over time, and it didn't show any noticeable issues until it was under a higher load.


My server has a long history of ejecting disks. I have changed many components in an attempt to make it work: the mobo, the PSU twice (!), the controller twice (!), and the cabling several times, all to stop Unraid from ejecting one disk every couple of weeks. The symptoms have always been the same. So my 1,000W PSU should not be the issue. I had thought about buying a UPS to get a "cleaner" electricity supply, but haven't done so, and I doubt that's the fix.

 

Maybe 1-2 years ago, I finally found what I thought was the answer to many years of issues: overheating of the controller. Someone in the forum had suggested this. I planned to buy a slot fan, but as a first step moved the controller card to a different slot (one slot away from the GPU, with an empty slot in between). I also thought that the issue might have been the slot itself, but it started "ejecting the disk" yet again when I added a second GPU to the now empty slot. So it only runs stable when the slot next to the controller is empty.

 

I don't know why the card is set to megaraid, but it may well be that someone suggested it a few years ago. I bought 3 or even 4 M1015 cards, as I thought that a faulty or counterfeit controller card might be the issue. I also thought it might be a FW issue, and since I had struggled to flash a card myself in the past, I bought an already flashed card. That's the current one with megaraid. I think I still have 1-2 others somewhere at home, so I can try another card and see whether it is in IT mode and more stable. Having said this, this card and setup has now finally been stable for more than a year, which is the best I have ever had in many, many years of Unraid issues.

 

I also ran Unraid without parity for a while. While this defeats the purpose, it was always rock solid. My controller overheating issue never affected the actual disks or their usage; it only affects parity, in that it ejects the disk (when I say "eject", I should probably say disable, i.e., the red cross).

 

The parity rebuild now seems to be at 40%, so maybe it will go through. Let me wait and see whether it does.

 

I am thinking of trying three more things:

1) Switch to another card, which may already be flashed to IT mode.

2) Use two controller cards with 4 disks each (instead of 8 on one). This should reduce the load on each card?

3) Buy a fan. Ideally I can find one that can be positioned so that I can even use the now empty slot for a second GPU (although my GPU passthrough has issues anyway when a second GPU is installed, even when it is not passed through, which doesn't make a lot of sense, but is a reproducible issue).

 

Thanks again for the guidance on this board. As you can tell, I really must love Unraid to keep trying to use it although I have been struggling with it so much, let alone the cost of the new mobo/CPU and the three GPUs. I hope it will one day be 100% stable, and I believe the controller card (heat!) may be what is preventing that today.

5 minutes ago, KC said:

Is the behavior the same if all plugins are removed? Just going stock for a day or week?

Looks like you've got a lot of plugins installed.

I highly doubt that this is the issue. I used to run Unraid without plugins (or maybe only with the UD plugin, which I need), so I doubt the plugins have anything to do with my issues. The dockers might, as they cause read/write activity on the array, which may put even more load on the controller during a rebuild.

7 minutes ago, WizADSL said:

Have you considered upgrading your MB BIOS? The diagnostics you posted show you are running version 0402 dated 06/13/17, and version 1902 dated 07/19/2019 is available.

I've been thinking of doing so, but it is a bit painful as I run Unraid headless (and don't have a screen). A pity that BIOS upgrades cannot be done headless. I'd need to move the server to another room where I can connect it to a television. I have been thinking of doing that so I can also play around with overclocking the CPU. I have shied away from it so far, as I am not sure whether my CPU fan is large enough for overclocking, and my CPU does not seem to benefit from it that much anyway.

 

Do you think that the BIOS upgrade could change anything about my controller issues? Why? If yes, I could make the effort and give it a try (maybe together with an attempt to OC).
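The upgrade itself needs a screen, but the installed BIOS version can at least be read headless. This is a minimal sketch, assuming a Linux system that exposes DMI data under `/sys/class/dmi/id`. The function names are hypothetical, and the plain numeric comparison only makes sense for all-digit version strings like the 0402 and 1902 mentioned above.

```python
from pathlib import Path
from typing import Dict

# Standard sysfs location for DMI/SMBIOS fields on Linux.
DMI_DIR = Path("/sys/class/dmi/id")


def read_bios_info(dmi_dir: Path = DMI_DIR) -> Dict[str, str]:
    """Read BIOS vendor, version, and date; missing fields become ''."""
    info: Dict[str, str] = {}
    for key in ("bios_vendor", "bios_version", "bios_date"):
        field = dmi_dir / key
        info[key] = field.read_text().strip() if field.exists() else ""
    return info


def is_older(installed: str, available: str) -> bool:
    """Compare two all-digit version strings numerically."""
    if not (installed.isdigit() and available.isdigit()):
        raise ValueError("only all-digit version strings are supported")
    return int(installed) < int(available)


if __name__ == "__main__":
    bios = read_bios_info()
    print(bios)
    # "1902" is the newer version number quoted in this thread.
    if bios["bios_version"].isdigit():
        print("update available:", is_older(bios["bios_version"], "1902"))
```

Run over SSH, this prints the installed vendor, version, and date, which is enough to confirm whether the board is still on the older release before moving the server to a screen.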
