Help appreciated: Parity error

nikiforos · January 28, 2023

Hello everyone,

Thank you for your time. I hope you will be able to help me with my situation!

I have been running my Unraid server for roughly three years now without any issues. Until a couple of weeks ago...

So a couple of weeks ago I got a message that one of my array disks had an error and could not be read from. Since I had been contemplating expanding my storage anyway, I did not spend much time looking into the original error and just bought a bigger hard drive to replace the one with the error.

To do that, I

1) shut the server down (cleanly)

2) added the new drive to an empty slot in my case

3) pre-cleared the new disk (no issues/errors)

4) replaced the disks in the "Array Devices"

5) started the array and let the new disk rebuild

Everything seemed fine after that and I thought the problem was dealt with.

Sadly, after the next scheduled parity check, I got an error message that both of my parity drives have errors. Over 1000 each.

So I decided to rebuild the parity from the ground up. I'm hoping this wasn't a fatal mistake...

To rebuild the parity drives, I stopped the array and swapped the two parity drives with each other. After starting the array, Unraid started rebuilding the "new" parity drives. Btw, I also turned off Docker and the VM manager, as I thought it would be best to minimize data being written to the drives, while the parity is being rebuilt.

Once the parity was freshly rebuilt, I manually started a parity check, as I wanted to make sure that everything works fine. Which it did not! Again the parity drives reported over 1000 errors each.

I now got the option to start a "Read Check", which I did. It will take about 4 days though.

I attached a diagnostics.zip file, which I downloaded just now. I'm hoping someone will find useful information in it. I certainly have no clue what to look for.

Could you please help me with my next steps?

Should I run tests on the two parity drives, or should I wait for the "Read Check" to finish?

Did I mess up, or can the disks/data be salvaged?

Thank you very much for your support!!

Greetings from Vienna,

Nick

unraidserver-diagnostics-20230128-1727.zip

trurl · January 28, 2023

Which data disk did you replace? Disk12 has little if any data. Is that expected?

Both parity disks and both cache disks have disconnected. Looks like those are all on this controller:

02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
	Subsystem: ASRock Incorporation Device [1849:43c8]
	Kernel driver in use: ahci
	Kernel modules: ahci

Both parity disabled, and cache pool is unmountable, but maybe that will come back when the disks do.

You should always double check connections when inside the case.

Do any of your other disks show SMART warnings (thumbs down) on the Dashboard page?

No point in doing read check unless you just want to exercise and test those other disks.

Unrelated, but appdata has files on the array.

Shut down, check connections, power and SATA, both ends, including splitters.

Reboot, start the array, and post new diagnostics.

nikiforos · January 28, 2023

Hello, thank you for your reply.

I replaced data disk 5 (Z2JMJMZT).

I will do as you suggested and report back. Thank you!

trurl · January 28, 2023

13 minutes ago, trurl said:

Disk12 has little if any data. Is that expected?

13 minutes ago, trurl said:

Do any of your other disks show SMART warnings (thumbs down) on the Dashboard page?

nikiforos · January 28, 2023

- It is fine that Disk 12 has hardly any data. I had planned on using it only for a specific share.

- All the disks have green thumbs up on the dashboard (see screenshot).

- I shut down the server, opened it, checked all the connections, moved the hard drives away from the motherboard SATA connection onto a PCIe card (no room to move cache drives too), rebooted and started the array.

The cache drives have fixed themselves, but the parity drives still seem to have the same issue.

I attached a new diagnostics file.

Thanks again!

unraidserver-diagnostics-20230128-1921.zip

trurl · January 28, 2023

11 minutes ago, nikiforos said:

parity drives seem to still have the same issue

No, they are not disconnected, just disabled, and will be until rebuilt.

nikiforos · January 28, 2023

So I should rebuild the parity again?

Do I have to use the same trick as before, swapping the two drives to force a rebuild, or is there a more elegant solution?

nikiforos · January 28, 2023

Also, is my thinking correct, that I should keep Docker and the VMs inactive during the rebuild (to minimize writing to the disks), or is it fine to activate them?

trurl · January 28, 2023

55 minutes ago, nikiforos said:

use the same trick

What you did will work, but the standard way to rebuild a drive to itself, whether parity or data disk:

https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself

trurl · January 28, 2023

54 minutes ago, nikiforos said:

keep Docker and the VMs inactive during the rebuild (to minimize writing to the disks)

Read/writes of the array will slow rebuild, and rebuild will slow read/writes of the array.

nikiforos · January 28, 2023

Ok. Thank you!

I started to rebuild. I will update you in about a week when it is done!

Have a nice weekend.

trurl · January 28, 2023

11 minutes ago, nikiforos said:

about a week when it is done!

Typically 2-3 hours per TB of largest parity disk, so should only be 2 days unless you have port multipliers

nikiforos · January 28, 2023

8 minutes ago, trurl said:

Typically 2-3 hours per TB of largest parity disk, so should only be 2 days unless you have port multipliers

Hm... I don't know what "port multipliers" are, so I expect I don't have them.

The parity is being rebuilt at roughly 32MB/sec and is expected to last another 4 days and 20 hours.

I will then start a manual parity check, which also lasts 3-4 days.
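
For anyone wanting to cross-check the figures quoted in this exchange, here is a minimal Python sketch of the arithmetic. It assumes decimal units (1 TB = 1,000,000 MB) and a constant sustained speed, which a real rebuild will not have; the function names are made up for illustration and are not part of Unraid.

```python
# Rough rebuild-time arithmetic. Assumptions: decimal units
# (1 TB = 1,000,000 MB) and a constant sustained transfer speed.
# All function names here are hypothetical helpers, not Unraid tools.

MB_PER_TB = 1_000_000
SECONDS_PER_HOUR = 3600


def hours_per_tb(speed_mb_s: float) -> float:
    """Hours needed to process 1 TB at the given speed in MB/s."""
    return MB_PER_TB / speed_mb_s / SECONDS_PER_HOUR


def speed_for_hours_per_tb(hours: float) -> float:
    """Sustained speed in MB/s implied by a given hours-per-TB figure."""
    return MB_PER_TB / (hours * SECONDS_PER_HOUR)


def remaining_tb(speed_mb_s: float, hours_left: float) -> float:
    """TB still to be processed, given the speed and the time estimate."""
    return speed_mb_s * hours_left * SECONDS_PER_HOUR / MB_PER_TB


if __name__ == "__main__":
    # Reported rebuild speed in this thread: 32 MB/s.
    print(f"32 MB/s -> {hours_per_tb(32):.1f} hours per TB")
    # Rule of thumb quoted above: 2-3 hours per TB.
    print(f"2 h/TB  -> {speed_for_hours_per_tb(2):.0f} MB/s")
    print(f"3 h/TB  -> {speed_for_hours_per_tb(3):.0f} MB/s")
    # Reported estimate: 4 days and 20 hours left at 32 MB/s.
    print(f"left    -> {remaining_tb(32, 4 * 24 + 20):.1f} TB")
```

Under those assumptions, 32 MB/s works out to roughly 8.7 hours per TB, while the 2-3 hours per TB rule of thumb corresponds to roughly 93 to 139 MB/s. The reported estimate of 4 days and 20 hours at 32 MB/s implies about 13.4 TB still to go.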

trurl · January 28, 2023

1 minute ago, nikiforos said:

parity is being rebuilt at roughly 32MB/sec

That's about 1/4 the speed I get and I know many others get as much or better. Probably these controllers are to blame:

09:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215] (rev 11)
	Subsystem: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]
	Kernel driver in use: ahci
	Kernel modules: ahci
0b:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215] (rev 11)
	Subsystem: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]
	Kernel driver in use: ahci
	Kernel modules: ahci

trurl · January 28, 2023

Looks like both parity are on that first Marvell, so also possibly the reason those disks are getting dropped.
