6.8.3 Parity drives disabling



For the 3rd time in the 2 weeks since updating from 6.8.2 to 6.8.3, both of my parity drives have been disabled at the same time. Both are 6TB, one WD and one HGST, to prevent batch issues.

Is this a known issue and is there a fix?
I saved my diagnostics, but have rolled back to 6.8.2, to see if the update is the issue.
Kind of a pain that I spent 2 1/2 days rebuilding my parity drives, only to have them disable again after updating my dockers and plugins.

tower-diagnostics-20200516-1837.zip


Definitely not a known issue. The problem is with the Asmedia controller; it dropped both parity disks:
 

May 16 18:05:08 Tower kernel: ata7: hard resetting link
May 16 18:05:18 Tower kernel: ata8: softreset failed (1st FIS failed)
May 16 18:05:18 Tower kernel: ata8: hard resetting link
May 16 18:05:18 Tower kernel: ata7: softreset failed (1st FIS failed)
May 16 18:05:18 Tower kernel: ata7: hard resetting link
May 16 18:05:28 Tower kernel: ata8: softreset failed (1st FIS failed)
May 16 18:05:28 Tower kernel: ata8: hard resetting link
May 16 18:05:28 Tower kernel: ata7: softreset failed (1st FIS failed)
May 16 18:05:28 Tower kernel: ata7: hard resetting link
May 16 18:06:03 Tower kernel: ata8: softreset failed (1st FIS failed)
May 16 18:06:03 Tower kernel: ata8: limiting SATA link speed to 3.0 Gbps
May 16 18:06:03 Tower kernel: ata8: hard resetting link
May 16 18:06:03 Tower kernel: ata7: softreset failed (1st FIS failed)
May 16 18:06:03 Tower kernel: ata7: limiting SATA link speed to 3.0 Gbps
May 16 18:06:03 Tower kernel: ata7: hard resetting link
May 16 18:06:08 Tower kernel: ata8: softreset failed (1st FIS failed)
May 16 18:06:08 Tower kernel: ata8: reset failed, giving up
May 16 18:06:08 Tower kernel: ata8.00: disabled
...
May 16 18:06:08 Tower kernel: ata7: softreset failed (1st FIS failed)
May 16 18:06:08 Tower kernel: ata7: reset failed, giving up
May 16 18:06:08 Tower kernel: ata7.00: disabled
May 16 18:06:08 Tower kernel: ata7: EH complete

AFAIK there are no issues with Asmedia controllers and v6.8.3. If there were, I would expect many users to have problems, since it's a very commonly used controller. It might be a power issue: do both disks share a power splitter or similar? Or, since the controller is onboard and an older revision, it might also be a problem specific to that board/revision.
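
For anyone who wants to check their own syslog for the same pattern, here is a minimal Python sketch that counts link resets per ATA port and flags ports that were disabled at the same timestamp. It assumes the line format shown in the excerpt above; the function names (`summarize_ata_events`, `ports_disabled_together`) are made up for this illustration and are not part of Unraid or any existing tool. Ports dropping in the same second only suggests a shared cause such as the controller or power; it does not say which one.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the excerpt above, e.g.:
#   "May 16 18:06:08 Tower kernel: ata8.00: disabled"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<port>ata\d+)(?:\.\d+)?:\s+(?P<msg>.*)$"
)


def summarize_ata_events(lines):
    """Count resets per ATA port and record when each port was disabled."""
    summary = defaultdict(
        lambda: {"hard_resets": 0, "softreset_failures": 0, "disabled_at": None}
    )
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an ataN kernel message
        entry = summary[m["port"]]
        msg = m["msg"]
        if msg.startswith("hard resetting link"):
            entry["hard_resets"] += 1
        elif msg.startswith("softreset failed"):
            entry["softreset_failures"] += 1
        elif msg == "disabled":
            entry["disabled_at"] = m["ts"]
    return dict(summary)


def ports_disabled_together(summary):
    """Return {timestamp: [ports]} for timestamps where 2+ ports were disabled."""
    by_time = defaultdict(list)
    for port, entry in summary.items():
        if entry["disabled_at"]:
            by_time[entry["disabled_at"]].append(port)
    return {ts: sorted(ports) for ts, ports in by_time.items() if len(ports) > 1}


# A few lines copied from the log excerpt in this post.
SAMPLE = """\
May 16 18:06:03 Tower kernel: ata8: hard resetting link
May 16 18:06:03 Tower kernel: ata7: hard resetting link
May 16 18:06:08 Tower kernel: ata8: softreset failed (1st FIS failed)
May 16 18:06:08 Tower kernel: ata8: reset failed, giving up
May 16 18:06:08 Tower kernel: ata8.00: disabled
May 16 18:06:08 Tower kernel: ata7: softreset failed (1st FIS failed)
May 16 18:06:08 Tower kernel: ata7: reset failed, giving up
May 16 18:06:08 Tower kernel: ata7.00: disabled
""".splitlines()

if __name__ == "__main__":
    summary = summarize_ata_events(SAMPLE)
    for port, entry in sorted(summary.items()):
        print(port, entry)
    print("disabled together:", ports_disabled_together(summary))
```

To run it against a real log, pass the lines of the syslog file from the diagnostics zip to `summarize_ata_events` instead of `SAMPLE`.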


Thanks for looking at the log. I did, but couldn't make sense of it.
Well, I've been running the same system for 4 years and the controller may be on its way out... but I kind of doubt it, since it goes back to operating fine after I restore the drives and rebuild parity. The failures have ONLY come since I updated to .3, and only when I update a docker or plugin.
I rolled back to .2 and will monitor and report back if it stays working or not. 
Thanks again.
*edit*
Sorry, missed your question.
All drives are on a common backplane in a Silverstone CS-380. Only 2 power legs feed the 8 drives. If it were a power issue, I would lose at least 4 drives at once, I would imagine. Since only the parity drives dropped, and they are from different manufacturers, it seemed likely to be software related.
I was one of those bitten by the database corruption bug and stayed at 6.6.7 for a long time. When I did upgrade to 6.8.2, it was smooth sailing, no issues. This happened after my first docker update after upgrading to .3. It could be coincidental, but I have my doubts, and I guess running it on .2 for a while will prove it one way or another.

Edited by dojesus
4 hours ago, dojesus said:

Only 2 power legs feeding 8 drives. If it were a power issue, I would lose, at least, 4 drives at once, I would imagine.

Not necessarily. The drives need a surge of power to spin up. If power is marginal, it can affect the slowest-to-initialize or most power-hungry drive. If the drive doesn't recover in time to accept a write without error, it will get disabled.


I'm guessing the parity drives rarely spin down, is that incorrect? Why are both parity drives, which sit on the same backplane as all the other drives, the only ones being disabled? Are they doing something profoundly different that would cause both to stop, or the controller to fail, after a docker update?

 


Just a follow-up... It's been a week since I rolled back to 6.8.2. I have done numerous updates and the parity drives aren't dropping out anymore.
There is clearly something wrong in the 6.8.3 update that is negatively affecting the Asmedia SATA controller... at least my revision of it.
I'm hoping a dev sees this, as it may be a minor problem now, but it could snowball into something worse down the road on later updates.

Edited by dojesus

Well, I've been running the same gear since Jan. 2016. The only issues I've had were the database corruption one that kept me on 6.6.7 for quite some time and this one from 6.8.3. Can you think of a non-software related thing that would make both drives drop out, when they work fine on 6.8.2?
I'm open for testing to help prevent future problems.

On 5/17/2020 at 6:01 PM, dojesus said:

I'm guessing the parity drives rarely spin down, is that incorrect?

Since nobody directly answered this: parity drives should spin down unless something is writing to the array, so it depends on how frequently your array is written to. I can certainly imagine use cases where parity seldom spins up.


Makes sense. Thanks for the response, trurl.

Does 6.8.3 have any type of "sleep" commands that would shut down the SATA controller, that would fail to "wake up" on a write?
I'm trying to wrap my head around how docker or plugin upgrades would trigger this weird disabling.

 


Except it's not happening in 6.8.2... which is why I'm having a hard time wrapping my head around what the problem is.

I see that 6.9 is the same code as 6.8.3 with an upgraded kernel. I expect this problem to return when I try RC1, as it's unlikely the new kernel will "fix" the issue.

 

  • 2 weeks later...

Final update on this issue. Clearly the problem was/is hardware related, as the parity drives are being disabled again on 6.8.2. Not sure why they worked for so long before the issue returned, but it's NOT the software.
I've ordered a new SAS controller to replace the failing onboard controller, which will tide me over until my AMD upgrade later this year.
Thank you everyone for your help and advice.
