
Help appreciated: Parity error


Solved by trurl


Hello everyone,

 

thank you for your time, I hope you will be able to help me with my situation!

 

I have been running my Unraid server for roughly three years now without any issues. Until a couple of weeks ago...

 

So a couple of weeks ago I got a message that one of my array disks had an error and could not be read from. Since I had been contemplating expanding my storage anyway, I did not spend much time looking into the original error and just bought a bigger hard drive to replace the one with the error.

To do that, I

1) shut the server down (cleanly)

2) added the new drive to an empty slot in my case

3) pre-cleared the new disk (no issues/errors)

4) replaced the disks in the "Array Devices"

5) started the array and let the new disk rebuild

 

Everything seemed fine after that and I thought the problem was dealt with.

Sadly, after the next scheduled parity check, I got an error message that both of my parity drives have errors. Over 1000 each.

 

So I decided to rebuild the parity from the ground up. I'm hoping this wasn't a fatal mistake...

To rebuild the parity drives, I stopped the array and swapped the two parity drives with each other. After starting the array, Unraid started rebuilding the "new" parity drives. Btw, I also turned off Docker and the VM manager, as I thought it would be best to minimize the data being written to the drives while the parity was being rebuilt.

 

Once the parity was freshly rebuilt, I manually started a parity check, as I wanted to make sure that everything works fine. Which it did not! Again the parity drives reported over 1000 errors each.

I now got the option to start a "Read Check", which I did. It will take about 4 days though.

 

I attached a diagnostic.zip file, which I just now downloaded. I'm hoping someone will find useful information here. I certainly have no clue what to look for :/

 

Could you please help me with my next steps?

Should I run tests on the two parity drives, or should I wait for the "Read Check" to finish?

Did I mess up, or can the disks/data be salvaged?

 

Thank you very much for your support!!

Greetings from Vienna,

Nick

unraidserver-diagnostics-20230128-1727.zip

  • Solution

Which data disk did you replace? Disk12 has little if any data. Is that expected?

 

Both parity disks and both cache disks have disconnected. Looks like those are all on this controller:

02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
	Subsystem: ASRock Incorporation Device [1849:43c8]
	Kernel driver in use: ahci
	Kernel modules: ahci
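
To double-check which disks sit behind which controller, something along these lines might help. It is only a sketch, assuming the usual Linux sysfs layout in which each `/sys/block/sdX` entry resolves to a path that contains the PCI address of its controller. The function names are made up for illustration, and sysfs prints the address with a domain prefix (`0000:02:00.1`), whereas the lspci output above shows `02:00.1`.

```python
import glob
import os
import re

# PCI addresses in sysfs look like 0000:02:00.1 (domain:bus:device.function).
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_of(sysfs_path):
    """Return the last PCI address found in a resolved sysfs path, or None."""
    matches = PCI_ADDR.findall(sysfs_path)
    return matches[-1] if matches else None


def disks_by_controller(pattern="/sys/block/sd*"):
    """Group block devices by the PCI device they are attached to."""
    groups = {}
    for link in sorted(glob.glob(pattern)):
        addr = controller_of(os.path.realpath(link))
        groups.setdefault(addr, []).append(os.path.basename(link))
    return groups


if __name__ == "__main__":
    for addr, disks in disks_by_controller().items():
        print(addr, " ".join(disks))
```

Run from a terminal on the server, this prints one line per controller followed by the `sdX` names attached to it, which can then be compared against the disks that dropped out.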

 

Both parity disabled, and cache pool is unmountable, but maybe that will come back when the disks do.

 

You should always double check connections when inside the case.

 

Do any of your other disks show SMART warnings (thumbs down) on the Dashboard page?

 

No point in doing read check unless you just want to exercise and test those other disks.

 

Unrelated, but appdata has files on the array.

 

Shut down, check connections, power and SATA, both ends, including splitters.

 

Reboot, start the array, and post new diagnostics.


- It is fine that Disk 12 has hardly any data. I had planned on using it only for a specific share

 

- All the disks have green thumbs up on the dashboard (see screenshot)

 

- I shut down the server, opened it, checked all the connections, moved the hard drives away from the motherboard SATA connection onto a PCIe card (no room to move cache drives too), rebooted and started the array.

The cache drives have fixed themselves, but the parity drives still seem to have the same issue.

I attached a new diagnostics file.

 

Thanks again!

2023-01-28_19h20_58.png

unraidserver-diagnostics-20230128-1921.zip

8 minutes ago, trurl said:

Typically 2-3 hours per TB of largest parity disk, so should only be 2 days unless you have port multipliers

Hm... I don't know what "port multipliers" are, so I expect I don't have them.

The parity is being rebuilt at roughly 32MB/sec and is expected to last another 4 days and 20 hours.

I will then start a manual parity check, which also lasts 3-4 days.
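
For comparison, the "2-3 hours per TB" rule of thumb and the 32MB/sec figure can be put on the same scale with a little arithmetic. This is a rough sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant speed, which real rebuilds do not have. The function names are illustrative only.

```python
def hours_per_tb(mb_per_s):
    """Hours needed to process 1 TB at a constant speed in MB/s."""
    return 1e12 / (mb_per_s * 1e6) / 3600


def mb_per_s_for(hours_for_one_tb):
    """Constant speed in MB/s implied by a given number of hours per TB."""
    return 1e12 / (hours_for_one_tb * 3600) / 1e6


if __name__ == "__main__":
    print(f"32 MB/s    -> {hours_per_tb(32):.1f} hours per TB")
    print(f"2 hours/TB -> {mb_per_s_for(2):.0f} MB/s")
    print(f"3 hours/TB -> {mb_per_s_for(3):.0f} MB/s")
```

Under those assumptions, 32MB/sec works out to roughly 8.7 hours per TB, while 2-3 hours per TB corresponds to roughly 93-139MB/sec.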

2023-01-28_21h49_24.png

1 minute ago, nikiforos said:

parity is being rebuilt at roughly 32MB/sec

That's about 1/4 the speed I get and I know many others get as much or better. Probably these controllers are to blame:

 

09:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215] (rev 11)
	Subsystem: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]
	Kernel driver in use: ahci
	Kernel modules: ahci
0b:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215] (rev 11)
	Subsystem: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]
	Kernel driver in use: ahci
	Kernel modules: ahci
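
As a back-of-the-envelope check on why a "PCIe 2.0 x1 4-port" card can become a bottleneck: one PCIe 2.0 lane signals at 5 GT/s with 8b/10b encoding, which gives about 500 MB/s of raw bandwidth per direction, and that is shared by every disk on the card. The sketch below assumes all disks on the card are read at the same time, as happens during a parity rebuild, and ignores protocol overhead, so real numbers will be lower. How many disks actually sit on each card is not known from this thread, and the function names are illustrative only.

```python
def lane_mb_per_s(gt_per_s=5.0, encoding=8 / 10):
    """Raw bandwidth of one PCIe lane in MB/s (defaults: PCIe 2.0, 8b/10b)."""
    bits_per_s = gt_per_s * 1e9 * encoding
    return bits_per_s / 8 / 1e6


def per_disk_ceiling(active_disks, lanes=1):
    """Upper bound per disk in MB/s when all disks share the link equally."""
    return lane_mb_per_s() * lanes / active_disks


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(f"{n} active disk(s): at most {per_disk_ceiling(n):.0f} MB/s each")
```

Even this theoretical ceiling leaves little headroom once all four ports are busy, so it is only a partial explanation and does not by itself account for a figure as low as 32MB/sec.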

 
