opentoe Posted July 1, 2015

What are the odds of doing a parity check back to back and seeing 5 errors again?

Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069768
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069776
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069784
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069792
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069800

I did a parity check a couple of days ago after the build and had 5 errors then too, but killed my syslog on a reboot. I saved it this time. I guess I'll run some tests on each drive, but the last time I checked they were all good.

EDIT: Finally got another parity check done. The same 5 errors again. Is this a bug I should report? I've individually checked every drive, run long SMART tests, and all are good. No pending sectors, no raw errors.

Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069768
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069776
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069784
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069792
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069800
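[Editor's note: one quick way to confirm that the errors really do land on identical sectors across runs is to pull the sector numbers out of each saved syslog and diff them. A minimal, self-contained sketch -- the two sample files below stand in for real copies of /var/log/syslog from each run:]

```shell
# Compare the parity-error sectors from two parity-check runs.
# Sample log excerpts stand in for the real saved syslogs.
cat > run1.log <<'EOF'
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069768
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069776
EOF
cat > run2.log <<'EOF'
Jul  2 08:20:57 SUN kernel: md: correcting parity, sector=3519069768
Jul  2 08:20:57 SUN kernel: md: correcting parity, sector=3519069776
EOF

# Extract just the sector numbers from each run
grep -o 'sector=[0-9]*' run1.log | sort > run1.sectors
grep -o 'sector=[0-9]*' run2.log | sort > run2.sectors

# No diff output means the exact same sectors failed both times
diff run1.sectors run2.sectors && echo "identical sectors"
```

If the sector lists match exactly, the errors are almost certainly not random media failures, which fits the controller/driver theory discussed later in this thread.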
garycase Posted July 1, 2015

I'd definitely run another check and see if you get the same 5 errors [I presume you ran a correcting check, so they should have been fixed -- otherwise of course you'll get the same errors]. Someone had this same issue a couple of months ago -- was that you?
opentoe Posted July 1, 2015

I'd definitely run another check and see if you get the same 5 errors [...] Someone had this same issue a couple of months ago -- was that you?

No, that wasn't me. I'll kick off another parity check before I hit the sack tonight. My parity check takes 15 hours. It's a 4TB drive and the average speed is 67MB/sec. I wonder why it's so slow; I get really good read speeds from the array.

EDIT: Can you let me know what your md_write_limit is in your disk.cfg file? That may be why parity checks are so slow.
garycase Posted July 1, 2015

My parameters are set at 3584, 1536, and 1536.
opentoe Posted July 1, 2015

My parameters are set at 3584, 1536, and 1536.

Ahh, you must have edited yours. OK, I'll re-run a parity check and see if the same errors come up.
JonathanM Posted July 1, 2015

Can you let me know what your md_write_limit is in your disk.cfg file? That may be why parity checks are so slow.

Stock, AFAIK, is this:

md_num_stripes="1280"
md_write_limit="768"
md_sync_window="384"
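[Editor's note: these tunables live in disk.cfg on the flash drive (on unRAID typically /boot/config/disk.cfg). A self-contained sketch of pulling them out for comparison, using the stock values above as sample data in place of the real file:]

```shell
# Inspect the md tuning parameters in unRAID's disk.cfg.
# A sample file with the stock values stands in for the real
# /boot/config/disk.cfg so this sketch runs anywhere.
cat > disk.cfg <<'EOF'
md_num_stripes="1280"
md_write_limit="768"
md_sync_window="384"
EOF

# Print just the three tuning parameters
grep -E '^md_(num_stripes|write_limit|sync_window)=' disk.cfg
```

Running the same grep against the real file on each server makes it easy to spot whether someone has raised the values from stock, as garycase had.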
Wally Posted July 1, 2015

Opentoe, I had the exact same problem at the exact same sectors, as mentioned in the other thread here: http://lime-technology.com/forum/index.php?topic=38359.0. The problem is caused by the flaky Marvell controllers or their Linux drivers, which cause problems mainly when using VT-d -- but in my case also without it. Try the latest unRAID 6.0.1, which seems to have fixed the problem on my system. My 5 errors were consistent, and they changed location when I upgraded to a larger parity drive.

Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069768
Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069776
Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069784
Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069792
Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069800

Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606472
Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606480
Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606488
Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606496
Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606504

Wally
Wally Posted July 1, 2015

Here's the thread that explains the problem: http://lime-technology.com/forum/index.php?topic=40683.0. If you google "VT-d Marvell" there's a lot of talk about it and the patches required. I believe that even with VT-d disabled, or not available at all, these Marvell controllers can cause problems on certain systems, as seen here.
opentoe Posted July 1, 2015

I checked my syslog and don't have any errors about a Marvell controller. I had to stop my parity check since it wasn't even half done at 15 hours, so something is wrong somewhere. I'm running a SMART test on each of my drives, then I'll reboot and try again. Hard to tell what's going on when there's nothing in the logs.
opentoe Posted July 1, 2015

Here is a paste of my syslog. I don't see any particular errors that stand out; maybe someone who knows what to look for can review it. I want to get everything right before I try another parity check, since it takes a very long time. I started one a little while ago and was only getting 50MB/sec, yet my array read speeds are really good: 113MB/sec. I must be missing something. I don't remember a parity check ever taking this long. It's a 4TB drive.

http://pastebin.com/9ykP6H57
dgaschk Posted July 2, 2015

Test in safe mode. If the problem still exists in safe mode, then post diagnostics.
Wally Posted July 2, 2015

I never had any errors in my logs until I replaced my CPU with one that supported VT-d, and then the DMA errors mentioned in the other posts showed up. I believe the Marvell controller had problems all along, as once I removed it the 5 parity check errors disappeared. Now, with unRAID version 6.0.1, the problem seems patched: even with the Marvell controller installed, the parity errors are gone.
JustinChase Posted July 2, 2015

What are the odds of doing a parity check back to back and seeing 5 errors again?

Better than you'd expect.
opentoe Posted July 2, 2015

What are the odds of doing a parity check back to back and seeing 5 errors again?

Better than you'd expect.

Haven't read your thread yet, but I'm seeing the same exact issue you are. Reposting the relevant part from the original above: two parity checks in a row flagged the exact same 5 sectors, every drive passed a long SMART test with no pending sectors and no raw errors, and I do have it set to correct errors.

Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069768
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069776
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069784
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069792
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069800

I'll check out your thread.
opentoe Posted July 3, 2015

Well, I ran a short SMART test on all my drives. All came back good: no reallocated sectors, no raw read errors, no CRC errors. I'm 100% sure the hardware is good. This system ran as a Windows machine for a little over a year with no problems; all I did was move the mainboard to a different case. I never touched the memory, but I pressed down on the modules just to make sure they were seated nice and tight. Never touched the processor; kept the CPU fan on it. I'm pretty sure it's not the hardware. In case unRAID support needs any hardware specs, here they are. I'm willing to try whatever. I've of course also checked all connections and cables, used the vacuum, and cleaned everything up. Made sure everything is in order.

I have two brand-new Supermicro add-on cards (AOC-SAS2LP-MV8). I checked out their BIOS: both are using the same version and detecting drives fine. I like the fact that you can view both cards' internal menus on the same screen, without having to hit CTRL-M to access each one, like in the old-school SCSI card days.

unRAID system: unRAID server Pro, version 6.0.1
Model: Custom
Motherboard: ASUSTeK COMPUTER INC. - SABERTOOTH X79
Processor: Intel® Core™ i7-3930K CPU @ 3.20GHz
HVM: Enabled
IOMMU: Enabled
Cache: L1-Cache = 32 kB (max. capacity 32 kB)
L2-Cache = 256 kB (max. capacity 256 kB)
L3-Cache = 12288 kB (max. capacity 12288 kB)
Memory: 32768 MB (max. installable capacity 96 GB)
ChannelA = 4096 MB, 1600 MHz
ChannelA = 4096 MB, 1600 MHz
ChannelB = 4096 MB, 1600 MHz
ChannelB = 4096 MB, 1600 MHz
ChannelC = 4096 MB, 1600 MHz
ChannelC = 4096 MB, 1600 MHz
ChannelD = 4096 MB, 1600 MHz
ChannelD = 4096 MB, 1600 MHz
Network: eth0: 1000Mb/s - Full Duplex
Kernel: Linux 4.0.4-unRAID x86_64
OpenSSL: 1.0.1o
Uptime: 0 days, 17 hours, 49 minutes, 18 seconds
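[Editor's note: the per-drive SMART checks described above are easy to script with smartctl from the smartmontools package. The device list below is hypothetical -- adjust it to your system -- and the commands are only echoed here as a dry run so the sketch doesn't touch any hardware; remove the echo to actually run them:]

```shell
# Dry-run sketch: start a short SMART self-test on each array drive,
# then pull health status and attributes (look for reallocated,
# pending, and CRC error counts). Device names are placeholders.
drives="/dev/sdb /dev/sdc /dev/sdd"

for d in $drives; do
    echo smartctl -t short "$d"    # would start a short self-test
    echo smartctl -H -A "$d"       # would print health + attributes
done
```

A short test typically finishes in a couple of minutes per drive; the long test opentoe mentions earlier in the thread uses `-t long` instead and can take hours on a 4TB drive.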
dgaschk Posted July 3, 2015

http://lime-technology.com/forum/index.php?topic=21052.0
http://lime-technology.com/wiki/index.php?title=FAQ#How_To_Troubleshoot_Recurring_Parity_Errors
opentoe Posted July 3, 2015

http://lime-technology.com/forum/index.php?topic=21052.0
http://lime-technology.com/wiki/index.php?title=FAQ#How_To_Troubleshoot_Recurring_Parity_Errors

One can really start to go nuts trying to fix parity errors when there could be a bug in the software. I had the same problem as user JustinChase: as soon as I changed my parity drive's spin-down time to "never", my parity check ran fine with no errors. This would probably have to be solved by unRAID.
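[Editor's note: if spin-down during a check really is the trigger, one way to gather evidence is to poll the parity drive's power state while a parity check is running; hdparm's -C flag reports whether a drive is active/idle or in standby. The device name is a placeholder and the call is only echoed here as a dry run:]

```shell
# Dry-run sketch: periodically check whether the parity drive is
# being spun down mid-check. /dev/sdb is a placeholder for the
# parity device; drop the echo to actually query it (needs root).
for i in 1 2 3; do
    echo hdparm -C /dev/sdb    # would report "active/idle" or "standby"
    sleep 1
done
```

Seeing "standby" show up while the check is still in progress would corroborate the spin-down theory without waiting 15 hours for the errors to appear.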
garycase Posted July 3, 2015

That's a VERY interesting result. Clearly it's a bug in the software => but it's a strange one, as I'm sure most folks have spindown settings and aren't seeing this problem. My only v6 system is my test setup, and it doesn't have this issue. Neither do my other 3 servers, but they're not on v6, so that likely doesn't count. According to your sigs, you & JustinChase do not have the same motherboard, so it's not likely a chipset issue. Nor do you have the same add-in controller cards. So it's really strange that you're both having this issue. My v6 test setup only has 80GB drives, so it may require larger drives to encounter this -- but I'd think a LOT more folks would have reported this by now if it were a universal issue. Perhaps you & JustinChase should exchange very precise details on your configurations to see if there's SOMETHING in common [exact disk makes/models; exact list of plugins/Dockers in use; etc.].
opentoe Posted July 3, 2015

That's a VERY interesting result. Clearly it's a bug in the software [...] Perhaps you & JustinChase should exchange very precise details on your configurations to see if there's SOMETHING in common.

We would definitely have to. The diagnostics option is a good place to start.
JustinChase Posted July 3, 2015

We would definitely have to. The diagnostics option is a good place to start.

Very strange. I assume that, like me, if you switch the parity spin-down setting back to the default and then run a new parity check, it will still work. I don't know yet whether it will survive a reboot, though. I'm going to post my diagnostics log in my other thread, since I know jonp is watching that one. Perhaps your log will help figure out what we might have in common. Regardless, I'm happy to finally have this working again.
Archived
This topic is now archived and is closed to further replies.