[Resolved] 5 Errors After Every Parity Check

April 13, 2017

Author

15 minutes ago, johnnie.black said:

That means it's past the 4TB mark; only your parity drives are larger, so they're the only ones still being checked.

Slightly off-topic, but I suspect that when unRAID writes data to the array, only the drive being changed needs to be read, along with the old parity bit(s), so the new parity bit(s) can be calculated and written back to parity; the rest of the array can remain spun down. Is this correct?
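A minimal sketch of the idea in the question, assuming simple XOR (single "P") parity. Parity2 uses different math and is not modeled here. The block contents and function names are hypothetical and only illustrate the arithmetic. XOR-ing the old data block out of the old parity and XOR-ing the new block in gives the same result as recomputing parity across every disk. That is why a normal (read/modify/write) write only has to touch the target disk and parity.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(data_blocks: list[bytes]) -> bytes:
    """Parity computed from scratch: XOR of the block at the same offset on every data disk."""
    return reduce(xor_bytes, data_blocks)


def read_modify_write_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """New parity using only the target disk's old block and the old parity block.

    XOR-ing old_data out cancels its contribution; XOR-ing new_data in adds the new one.
    No other data disk is read.
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


disks = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]  # hypothetical 2-byte blocks on 3 data disks
parity = full_parity(disks)

new_block = b"\xff\x00"  # new data written to disk index 1
updated = read_modify_write_parity(parity, disks[1], new_block)
disks[1] = new_block

# The shortcut matches a full recomputation over all disks.
print(updated == full_parity(disks))  # True
```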

April 13, 2017

7 minutes ago, Joseph said:

I'm fairly certain that if there is a problem reading a drive in the array (whether it's a physical problem or a controller problem), unRAID will knock it out of the array and let you know it needs to be rebuilt.

Yes, but the issue is in how it's handled. When a read error occurs, the first thing unRAID does is spin up all the drives, compute from parity what SHOULD be in that spot, and issue a write command to put the correct data back. A write failure at that point causes a red ball, and the problem drive is no longer accessed. If everything is working properly and the write succeeds, this has the beneficial end result of allowing a problematic logical sector to be rewritten seamlessly, and if the drive is healthy, its SMART subsystem will remap that physical sector if necessary.
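A rough sketch of the reconstruction step described above, again assuming single XOR parity. Disk contents and names are hypothetical. The unreadable block is recovered by XOR-ing parity with the same block from every other data disk, so all of them have to be read. The second check shows the concern raised next in this post: if one of those surviving reads returns bad data (for example through a flaky controller), the "reconstructed" block is silently wrong, and that wrong value is what gets written back.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def reconstruct_block(parity: bytes, surviving_blocks: list[bytes]) -> bytes:
    """Recover an unreadable block: parity XOR every other data disk's block at that offset."""
    return reduce(xor_bytes, surviving_blocks, parity)


disks = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]  # hypothetical data blocks
parity = reduce(xor_bytes, disks)

# Pretend disk 2 returned a read error; disks 0 and 1 must be read to rebuild it.
print(reconstruct_block(parity, [disks[0], disks[1]]) == disks[2])  # True

# One flipped bit in a surviving read produces a wrong rebuild with no warning.
flaky = bytes([disks[1][0] ^ 0x01, disks[1][1]])
print(reconstruct_block(parity, [disks[0], flaky]) == disks[2])  # False
```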

In this case, who knows what data will be returned when unRAID spins up all the drives? Will it be correct? Will it be corrupt? It's a gamble.

April 13, 2017

Community Expert

1 minute ago, Joseph said:

Is this correct?

Yes, with the normal write mode. With turbo write enabled, all disks need to be spun up (except when the array has disks of more than one size and the user is writing to the end of a larger disk; in that case only disks of that size or larger need to be spun up).
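A minimal sketch contrasting the two write modes, assuming single XOR parity. The mode names, disk labels and helper functions are hypothetical. Reconstruct write ("turbo write") never reads the old data or the old parity. It rebuilds parity from the new block plus every other data disk, which is why all data disks must be spinning. Read/modify/write only reads the target disk and parity, but it has to read and then rewrite the same sectors. Both paths produce the same parity.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def reconstruct_write_parity(data_blocks: list[bytes], target: int, new_data: bytes) -> bytes:
    """Turbo/reconstruct write: XOR the new block with every *other* data disk's block."""
    others = [blk for i, blk in enumerate(data_blocks) if i != target]
    return reduce(xor_bytes, others, new_data)


def read_modify_write_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Normal write: old parity XOR old data XOR new data."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


def disks_read(mode: str, n_data: int, target: int) -> list[str]:
    """Which devices each mode has to read before writing (hypothetical labels)."""
    if mode == "read_modify_write":
        return [f"disk{target}", "parity"]
    if mode == "reconstruct_write":
        return [f"disk{i}" for i in range(n_data) if i != target]
    raise ValueError(f"unknown mode: {mode}")


disks = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]  # hypothetical data blocks
parity = reduce(xor_bytes, disks)
new_block = b"\xff\x00"

print(reconstruct_write_parity(disks, 1, new_block)
      == read_modify_write_parity(parity, disks[1], new_block))  # True
print(disks_read("reconstruct_write", 3, 1))  # ['disk0', 'disk2']
```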

April 13, 2017

4 minutes ago, johnnie.black said:

except when the array has disks of more than one size and the user is writing to the end of a larger disk; in that case only disks of that size or larger need to be spun up

Is reconstruct write that intelligent? Or does it always keep everything spun up anyways?

April 13, 2017

Community Expert

2 minutes ago, jonathanm said:

Is reconstruct write that intelligent? Or does it always keep everything spun up anyways?

No intelligence required. For example, if you have an array with 4TB and 8TB disks and you're writing to the last half of one of the 8TB disks, all the 4TB disks will be idle; they will spin down when they reach the set idle time.
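A small sketch of the parity-side reasoning in this example, assuming that past the end of a smaller disk its contribution to parity is treated as zeros, so it doesn't have to be read. Sizes are in TB and the function names are hypothetical. This models the parity math only. As johnnie.black found when rechecking later in the thread, the small reads each disk gets to update the TOC after every file mean the disks don't actually stay spun down with turbo write.

```python
def disks_covering(sizes_tb: list[float], offset_tb: float) -> list[int]:
    """Indices of data disks whose capacity extends past offset_tb."""
    return [i for i, size in enumerate(sizes_tb) if size > offset_tb]


def reconstruct_write_reads(sizes_tb: list[float], target: int, offset_tb: float) -> list[int]:
    """Data disks a reconstruct (turbo) write must read for a write at offset_tb on disk `target`.

    Disks that end before offset_tb contribute only zeros at that position, so they are skipped.
    """
    return [i for i in disks_covering(sizes_tb, offset_tb) if i != target]


sizes = [4, 4, 8, 8]  # hypothetical array: two 4TB and two 8TB data disks

# Writing to the last half of the first 8TB disk (index 2): only the other 8TB disk is needed.
print(reconstruct_write_reads(sizes, target=2, offset_tb=6))  # [3]

# Writing near the start of the same disk: every other data disk is needed.
print(reconstruct_write_reads(sizes, target=2, offset_tb=1))  # [0, 1, 3]
```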

April 13, 2017

8 minutes ago, johnnie.black said:

No intelligence required. For example, if you have an array with 4TB and 8TB disks and you're writing to the last half of one of the 8TB disks, all the 4TB disks will be idle; they will spin down when they reach the set idle time.

Hmm. I guess I just assumed the table of contents was within the first few sectors, thus any writes would need all drives spun up anyway.

April 13, 2017

Community Expert

4 minutes ago, jonathanm said:

Hmm. I guess I just assumed the table of contents was within the first few sectors, thus any writes would need all drives spun up anyway.

No, they stay spun down; I've noticed this behavior on my mixed-size arrays.

April 13, 2017

Author

10 hours ago, johnnie.black said:

unRAID tunable settings also play a role.

Ahhh... I forgot about the tunables! I ran the tunables utility quite a while back, when I had the drive configuration all wrong (fast drives on a slow controller and vice versa); that was around 6.0 or 6.1. So after I move everything off the Marvell controller and recheck parity, I'll run it again. Thanks for the tip.

April 13, 2017

Community Expert

26 minutes ago, Joseph said:

Ahhh... I forgot about the tunables! I ran the tunables utility quite a while back, when I had the drive configuration all wrong (fast drives on a slow controller and vice versa); that was around 6.0 or 6.1. So after I move everything off the Marvell controller and recheck parity, I'll run it again. Thanks for the tip.

The utility to find the best tunables doesn't work optimally on v6.2/6.3, but looking at your current settings, they are no good:

md_num_stripes="6008"
md_sync_window="2704"
md_sync_thresh="192"

When using an LSI controller, md_sync_thresh needs to be closer to md_sync_window; change to these:

md_num_stripes="4096"
md_sync_window="2048"
md_sync_thresh="2000"
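A quick sketch for comparing the two sets of values above, assuming the settings are available as key="value" lines exactly as quoted in the post. The helper names are hypothetical, and the post gives no formula. It only says md_sync_thresh should be closer to md_sync_window on an LSI controller, so the script just measures that gap rather than claiming an ideal value.

```python
import re

# Matches lines such as md_sync_window="2048"
TUNABLE_LINE = re.compile(r'^(md_\w+)="(\d+)"$')


def parse_tunables(text: str) -> dict[str, int]:
    """Collect md_* settings written as key="value" lines; other lines are ignored."""
    values = {}
    for line in text.splitlines():
        match = TUNABLE_LINE.match(line.strip())
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def thresh_gap(values: dict[str, int]) -> int:
    """How far md_sync_thresh sits below md_sync_window (smaller means closer)."""
    return values["md_sync_window"] - values["md_sync_thresh"]


current = parse_tunables('''
md_num_stripes="6008"
md_sync_window="2704"
md_sync_thresh="192"
''')
suggested = parse_tunables('''
md_num_stripes="4096"
md_sync_window="2048"
md_sync_thresh="2000"
''')

print(thresh_gap(current))    # 2512
print(thresh_gap(suggested))  # 48
```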

Edited April 13, 2017 by johnnie.black

April 13, 2017

Author

1 minute ago, johnnie.black said:

When using an LSI controller, md_sync_thresh needs to be closer to md_sync_window; change to these:

md_num_stripes="4096"
md_sync_window="2048"
md_sync_thresh="2000"

Thank you so much...will do!

April 14, 2017

21 hours ago, EdgarWallace said:

@Joseph, this was at the start of the parity check; the average was about 90MB/s, which is about the same speed I was getting with my AOC-SAS2LP-MV8.

@johnnie.black thanks a lot, see the outcome below... looks pretty good, right?

Parity Check ended with 24 errors. I am going to run a Parity Check once again with the "Write corrections to parity" option.

SUCCESS

A second parity check after running xfs_repair /dev/md5 showed 14 sync errors, which were corrected. The final parity check reported 0 errors.

I consider my issues fixed by removing the AOC-SAS2LP-MV8 and installing a Dell PERC H200. Good to be able to rely on my backup server (again).

Thanks to all who were helping, mainly @johnnie.black and @Fireball3

April 14, 2017

So after my last parity check I got 4 errors. I got the Dell HV52W PERC H310 off eBay, so I was looking at flashing it. From what I'm seeing in the links here, I should be using LSI SAS2008 Controllers(P10).zip for this card? Obviously I don't want to brick the card, so I'd really appreciate confirmation. I have made a bootable USB stick with Rufus, copied the files from the zip to the USB drive, and renamed 1_sas2flash_x86_P10.exe to sas2flash_x86_P10.exe so that the .bat file executes that flash. Is this all correct, or am I missing anything?
Thanks for all the help.

April 14, 2017

See my sig; there is a toolset for the H310 with more up-to-date IT firmware.

April 14, 2017

41 minutes ago, Fireball3 said:

See my sig; there is a toolset for the H310 with more up-to-date IT firmware.

So can I flash directly to the new firmware, or do I have to flash an interim firmware first? I believe this card should be running stock firmware.

April 14, 2017

Author

1 hour ago, Fireball3 said:

See my sig; there is a toolset for the H310 with more up-to-date IT firmware.

@Fireball, can you confirm whether step 5.2 of P20 puts the interim firmware on the card, in this case P7, and then step 5.3 puts P20 on the card? Somehow I got all turned around, and that may be the source of confusion in this thread.

April 14, 2017

@Fireball, can you confirm if step 5.2 of P20 puts the interim firmware on the card, in this case P7, and then step 5.3 puts P20 on the card?

Yes, correct.
But the P7 step is mandatory on the way to P20!

April 16, 2017

Community Expert

On 13/04/2017 at 6:29 PM, jonathanm said:

Hmm. I guess I just assumed the table of contents was within the first few sectors, thus any writes would need all drives spun up anyway.

This made sense and made me doubt my earlier post. I was pretty sure I'd seen that behavior, but I just checked on one of my servers and you are right: after each file is written there's a small number of reads from each disk to update the TOC, so no disks will spin down during writes with turbo write enabled. My mistake.

April 16, 2017

Author

On 4/10/2017 at 1:36 PM, johnnie.black said:

Follow up, this is what I would recommend:

Parity - currently on onboard Intel port1 (SATA3) - move to free Intel port 3 (SATA2)

cache2 - currently on onboard Intel port2 (SATA3) - leave as is

cache - currently on Marvell - move to Intel port1 (SATA3)

parity2 - currently on Marvell - move to free Intel port4 or free LSI port

The above steps are complete... the array is online, but the syslog shows about 1000 errors on ATA4, plus btrfs errors. Any thoughts?

Parity check on hold until I can get this sorted. Thanks!!

Edited April 20, 2017 by Joseph

April 16, 2017

Community Expert

Replace SATA cable on SSD serial ... 245F.

April 16, 2017

Author

2 minutes ago, johnnie.black said:

Replace SATA cable on SSD serial ... 245F.

Working on it... you're gonna have to show me how you figured that out!

April 16, 2017

Author

On 4/16/2017 at 5:31 PM, johnnie.black said:

Replace SATA cable on SSD serial ... 245F.

OK, so the cable has been replaced. Cache & Cache 2 are online, but the pool is no longer mountable.

Also, the error count is down to 34, and they don't appear to be ATA-related, but do you think any of them need to be addressed?

FYI, I disabled the onboard Marvell controller in the BIOS.

Edited April 20, 2017 by Joseph
additional info

April 17, 2017

So I have just finished the 3rd parity check after several tweaks and finally got zero errors. Is it normal that it should take multiple parity checks to finally get zero errors?

April 17, 2017

Author

1 hour ago, crowdx42 said:

So I have just finished the 3rd parity check after several tweaks and finally got zero errors. Is it normal that it should take multiple parity checks to finally get zero errors?

My guess is there would be 2 parity checks: one to correct errors generated by the old card that's been replaced, and a second one, which should come up clean, to 'prove' there are no more errors.
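A toy sketch of the two-pass idea, assuming single XOR parity and hardware that no longer introduces new errors. Names and data are hypothetical. A correcting check counts each mismatched parity block as a sync error and overwrites it. A second, non-correcting check then finds nothing. If a controller or cable is still corrupting reads, the second pass won't come back clean, which is one way more than two checks can end up being needed.

```python
def parity_check(data_disks: list[list[int]], parity: list[int],
                 write_corrections: bool) -> tuple[int, list[int]]:
    """One pass over every block offset.

    Returns (sync_errors, parity_after). With write_corrections=True, each mismatched
    parity block is overwritten with the value computed from the data disks.
    """
    parity = list(parity)
    sync_errors = 0
    for offset in range(len(parity)):
        expected = 0
        for disk in data_disks:
            expected ^= disk[offset]
        if parity[offset] != expected:
            sync_errors += 1
            if write_corrections:
                parity[offset] = expected
    return sync_errors, parity


data = [[1, 2, 3, 4], [5, 6, 7, 8]]  # hypothetical one-byte blocks on two data disks
stale_parity = [4, 0, 4, 0]          # two blocks left wrong by a misbehaving controller

errors, fixed = parity_check(data, stale_parity, write_corrections=True)
print(errors)        # 2
errors_again, _ = parity_check(data, fixed, write_corrections=False)
print(errors_again)  # 0
```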

But then again, what do I know? I'm stuck with an unmountable cache and only just learned about a table of contents on unRAID...

Hoping someone can provide answers for both crowdx42 and me.

Edited April 17, 2017 by Joseph

April 17, 2017

Community Expert

8 hours ago, Joseph said:

Also, the error count is down to 34, and they don't appear to be ATA-related, but do you think any of them need to be addressed?

ATA errors are gone. ACPI errors can usually be ignored; a BIOS update may help with some of them.

The problem is that the bad cable corrupted your cache pool. Assuming the pool was the default raid1, try powering down, disconnecting the cable to SSD ...245F, powering back on, and seeing if it mounts. If not, use this FAQ to try and recover your cache data:

April 17, 2017

11 hours ago, Joseph said:

My guess is there would be 2 parity checks: one to correct errors generated by the old card that's been replaced, and a second one, which should come up clean, to 'prove' there are no more errors.

But then again, what do I know? I'm stuck with an unmountable cache and only just learned about a table of contents on unRAID...

Hoping someone can provide answers for both crowdx42 and me.

This makes sense to me. The change that seemed to make the difference for me was turning off VT-d (used for virtual machine support). I still have the two Dell 310 cards, which I have not installed yet; in that process I will be swapping the motherboard for one with a newer chipset and 3 PCIe x16 slots instead of the current board's 2, which lets me add my 4-port NIC.
