Possible Dead Parity Drive, 2nd Opinions Please.


Hi all,

 

This morning one of my 2 parity drives was disabled after several errors in the syslog as you can see below:


image.png.30ebf8359fc0e3b9d2e0b993f5870fe3.png

 

It appears this happened shortly after the mover was triggered:


image.thumb.png.eb91012046c347d49b4e1b50e7f41b7f.png

 

Ultimately, after warning for a bit, it goes into full-on write errors (1024 or so total before it disables the disk):


image.thumb.png.ddb282bc150aaaf6c9708222ebe38ebe.png

 

Syslog is also being spammed with this, even after the drive was disabled:


image.png.2d02de10dd558a103813a94e541dd3dd.png

 

Server has been running for over 2 months since my upgrade to 6.11.0 without any issues, bar the occasional plugin becoming deprecated and needing replacing.

 

The only hint I had of something going bad was last night at approx 22:30 - 00:00, when I was having issues playing LoL/watching Twitch and doing some torrenting. This is possibly relevant as I'm running pfSense inside a VM as my primary router, which is running straight off my cache (2x NVMe).

 

I've essentially concluded that the parity drive has gone bad and that some of these symptoms are just a knock-on effect of that. However, the networking blip mentioned above (which, again, all runs off the cache, which is ONLY on my 2x 1TB NVMe drives and should have no presence on the array) has me slightly concerned I could be facing a more widespread issue.

That being said neither cache device has any SMART errors or any other indicators of failure I can find (same goes for my other disks bar that disabled parity), so possibly I'm worrying over nothing.

 

In any case before I go ordering overpriced replacement parity drives I figured getting some valued community opinions couldn't hurt :) 

Diagnostics attached.

ibstorage-diagnostics-20221206-1256.zip

Edited by DaveDoesStuff
Formatting
On 12/6/2022 at 1:37 PM, trurl said:

parity2 not connected. Check connections, SATA and power, both ends, including splitters. Then post new diagnostics

 

Appreciate the help! :)

 

Just done this. Clean shutdown, power off.

 

Everything is where it should be. The case hasn't been opened since February and is in a heavy StarTech rack under our stairs. Literally no possible way anything can be disturbed. I unplugged everything connected to the problem drive and re-seated it all. No physical signs of issues with the drive, all ports look clean and straight with no damage. Also no signs of surge damage, etc.

 

Array started up no problem, but the drive is still "disabled".

 

Diags attached.

 

EDIT:

Noticed the disk reporting temperature so figured it was at last connected. Took the array offline and was able to run a short SMART test, which passed with no errors.

 

EDIT 2: 

I shut down and rebooted again as the GUI became unresponsive.

 

After booting back up I followed the wiki instructions on re-adding a disabled disk and was able to start a parity sync with no issues.

 

However, now one of my two Intel NIC interfaces can no longer be found by my pfSense VM. I've been trying to troubleshoot and noticed that the link lights for one of the ports (it's a dual NIC card) are no longer lit.

 

I was able to resolve this by pausing the rebuild and doing a full power cycle again. Then when I started the array and resumed the sync, the link lights were working again and the VM recognised the second NIC.

 

Very strange behaviour though...

 

Parity sync is in progress, ETA 8 hours. I will only run pfSense until then, which should be safe for the sync since it only runs off the NVMe cache pool.

FINAL UPDATE:
The sync completed, no issues since...I'm not super comfortable not knowing the cause of the original issue but will still mark this as resolved.

 

ibstorage-diagnostics-20221206-1759.zip

Edited by DaveDoesStuff
  • DaveDoesStuff changed the title to Possible Dead Parity Drive, 2nd Opinions Please. (SOLVED)
  • 1 month later...

@trurl This has happened again unfortunately, diagnostics attached.

 

No warning of SMART failures or any other issues...then suddenly bam, disk is disabled. Unfortunately I don't have a spare 4TB drive and a new one is over €150 with shipping to Ireland at the moment. Not ideal :(

 

Not in a position to restart and rebuild at the moment to see if it resolves.

 

EDIT: Forgot to mention that since the resolution I've had 2 full system lockups/crashes. I was unable to log in via the web GUI or the command line as everything was completely dead. I've since started logging to a syslog location. It hasn't happened again since though, so still no RCA.
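
For anyone sifting through a persisted syslog after a lockup like this, a minimal Python sketch for tallying disk-related error lines might look like the following. The keyword patterns and the names `PATTERNS`, `count_matches` and `scan_file` are illustrative assumptions, not anything provided by Unraid; adjust the patterns to whatever your own syslog actually contains.

```python
import re
from collections import Counter

# Hypothetical keyword patterns (assumptions, not an official list).
# Tune these to the messages that actually appear in your own syslog.
PATTERNS = {
    "io_error": re.compile(r"I/O error", re.IGNORECASE),
    "link_reset": re.compile(r"hard resetting link|link (is )?(down|reset)", re.IGNORECASE),
    "write_error": re.compile(r"write error", re.IGNORECASE),
}


def count_matches(lines, patterns=PATTERNS):
    """Count how many lines match each named pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in patterns.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_file(path):
    """Scan a syslog file on disk and return the per-pattern counts."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, errors="replace") as fh:
        return count_matches(fh)
```

Calling `scan_file` on the saved syslog path returns a `Counter` keyed by pattern name. A sudden jump in any of the counts around the time of a lockup would at least narrow down where to look, though it would not identify the root cause on its own.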

 

ibstorage-diagnostics-20230115-1727.zip

Edited by DaveDoesStuff

Parity2 has disconnected. Check connections, SATA and power, both ends, including splitters, to see if you can get Parity2 connected again. Can you see it in BIOS? 

On 12/6/2022 at 1:03 PM, DaveDoesStuff said:

no possible way anything can be disturbed

 

Try another SATA cable. Try to avoid more than 4 disks on a power supply cable. If you must use splitters don't split SATA power connections, split Molex, crimped Molex not molded. Don't bundle data cables. Make sure all cables have plenty of slack so plug can sit squarely on the connector with no tension.

 

After you get parity2 connected again post new diagnostics.

 

 

On 1/15/2023 at 6:57 PM, trurl said:

Parity2 has disconnected. Check connections, SATA and power, both ends, including splitters, to see if you can get Parity2 connected again. Can you see it in BIOS? 

 

Try another SATA cable. Try to avoid more than 4 disks on a power supply cable. If you must use splitters don't split SATA power connections, split Molex, crimped Molex not molded. Don't bundle data cables. Make sure all cables have plenty of slack so plug can sit squarely on the connector with no tension.

 

After you get parity2 connected again post new diagnostics.

 

 

 

After a power cycle without touching the hardware, the drive was detected by the BIOS, but I went ahead and replaced the SATA cable anyway. Did the usual: unassign the drive, start in maintenance mode, stop the array, reassign the drive and start the rebuild, which is currently still underway.

 

All SATA cables are "loose" in that they have plenty of slack either end and are not bound to each other. Same for SATA power.

 

Fingers crossed the cable replacement does it.

 

Diagnostics attached.

ibstorage-diagnostics-20230119-1916.zip

  • 3 weeks later...
  • DaveDoesStuff changed the title to Possible Dead Parity Drive, 2nd Opinions Please.

New diags attached.

 

The parity2 drive has been moved to a SATA power cable on which it is the only drive, no splitters, straight to the PSU.

 

Rebuild in progress.

If it goes again I'll try getting a replacement disk. If that also fails, then I'm considering getting an LSI SAS 9207-8i and a new motherboard (to allow me to use it). I think at that point all other options would be exhausted and it would have to be a bum onboard SATA controller.

ibstorage-diagnostics-20230207-2308.zip
