July 1, 2015
What are the odds of running two parity checks back to back and seeing the same 5 errors again?

Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069768
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069776
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069784
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069792
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069800

I ran a parity check a couple of days ago after the build and had 5 errors then too, but I lost my syslog on a reboot. I saved it this time. I guess I'll run some tests on each drive, but the last time I checked they were all good.

EDIT: Finally got a parity check done. The same 5 errors again. Is this a bug I should report? I've individually checked every drive and ran long SMART tests, and all are good. No pending sectors, no raw errors.

Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069768
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069776
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069784
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069792
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069800
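For anyone who wants to compare two checks without eyeballing the syslog, here is a minimal Python sketch. It assumes the log lines look exactly like the ones quoted above; the function name `corrected_sectors` and the variable names are made up for illustration. If a correcting check really wrote the fix, the same sector would not normally be expected to show up again, so any overlap between runs is worth noting.

```python
import re

# Matches the md driver message quoted in the post above.
CORRECTION = re.compile(r"md: correcting parity, sector=(\d+)")

def corrected_sectors(syslog_text):
    """Return the sector number from every 'correcting parity' line."""
    return [int(s) for s in CORRECTION.findall(syslog_text)]

# Log excerpts copied from the two checks reported in this thread.
run_jun30 = """\
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069768
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069776
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069784
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069792
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069800
"""
run_jul2 = """\
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069768
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069776
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069784
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069792
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069800
"""

first = corrected_sectors(run_jun30)
second = corrected_sectors(run_jul2)

# Sectors that were "corrected" in both runs.
repeated = sorted(set(first) & set(second))
print(f"{len(repeated)} of {len(first)} sectors were corrected again")
```

In practice the two strings would be read from saved syslog files instead of being pasted in.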
July 1, 2015
I'd definitely run another check and see if you get the same 5 errors [I presume you ran a correcting check, so they should have been fixed -- otherwise of course you'll get the same errors]. Someone had this same issue a couple of months ago -- was that you?
July 1, 2015 Author
No, this was not me. I'll kick off another parity check before I hit the sack tonight. My parity check takes 15 hours. It is a 4TB drive and the average speed is 67MB/sec. I wonder why it is so slow. I get really good read speeds from the array.

EDIT: Can you let me know what your md_write_limit is in your disk.cfg file? This may be why my parity checks are so slow.
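As a rough sanity check on those numbers, here is a small Python sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant average speed over the whole drive, which a real check will not have; `parity_check_hours` is a made-up helper name.

```python
def parity_check_hours(capacity_bytes, speed_mb_per_s):
    """Hours needed to read the whole parity drive at a constant speed."""
    seconds = capacity_bytes / (speed_mb_per_s * 1_000_000)
    return seconds / 3600

# Assumption: "4TB" means 4 * 10**12 bytes.
capacity = 4 * 10**12

for speed in (50, 67, 113):
    hours = parity_check_hours(capacity, speed)
    print(f"{speed} MB/sec -> {hours:.1f} hours")
```

At 67MB/sec this works out to about 16.6 hours, so a check of roughly 15 hours is in line with that average speed. The open question is the speed itself, not the arithmetic.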
July 1, 2015 Author
My parameters are set at 3584, 1536, and 1536. Ahh, you must have edited yours. Ok, I'll re-run a parity check and see if the same errors come up.
July 1, 2015
"Can you let me know what your md_write_limit is in your disk.cfg file? This may be why my parity checks are so slow."

Stock, AFAIK, is this:

md_num_stripes="1280"
md_write_limit="768"
md_sync_window="384"
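To see at a glance which tunables differ from those stock values, something like the following Python sketch could be used. It assumes disk.cfg holds plain key="value" lines as in the listing above. The sample text assigns the three values reported earlier in the thread (3584, 1536, 1536) to the keys in the same order as the stock listing, which is a guess and not something the thread confirms. The helper names are made up.

```python
import re

# Stock values as listed in the post above.
STOCK = {"md_num_stripes": 1280, "md_write_limit": 768, "md_sync_window": 384}

def parse_tunables(cfg_text):
    """Extract md_* key="value" pairs from disk.cfg-style text."""
    pairs = re.findall(r'^(md_\w+)="(\d+)"\s*$', cfg_text, flags=re.MULTILINE)
    return {key: int(value) for key, value in pairs}

def diff_from_stock(tunables):
    """Map each changed key to a (stock, current) tuple."""
    return {
        key: (STOCK[key], value)
        for key, value in tunables.items()
        if key in STOCK and value != STOCK[key]
    }

# Hypothetical disk.cfg excerpt; the key order for these values is assumed.
sample_cfg = '''md_num_stripes="3584"
md_write_limit="1536"
md_sync_window="1536"
'''

for key, (stock, current) in diff_from_stock(parse_tunables(sample_cfg)).items():
    print(f"{key}: stock {stock}, current {current}")
```

On a real server the text would be read from the disk.cfg file on the flash drive instead of the sample string.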
July 1, 2015
Opentoe, I had the exact same problem at the exact same sectors, as mentioned in the other thread here: http://lime-technology.com/forum/index.php?topic=38359.0. The problem is caused by the flaky Marvell controllers or their Linux drivers, which cause problems mainly when using VT-d but, in my case, also without it. Try the latest unRAID 6.0.1, which seems to have fixed the problem on my system. My 5 errors were consistent and changed location when I upgraded to a larger parity drive.

Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069768
Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069776
Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069784
Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069792
Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069800

Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606472
Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606480
Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606488
Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606496
Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606504

Wally
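Both sets of errors have the same shape: five entries, each 8 sectors after the previous one. The Python sketch below makes that explicit and converts the first sector of each set to a byte offset. It assumes the sector numbers in the log are 512-byte units, which the thread does not state; under that assumption each step of 8 sectors is 4096 bytes. The function names are made up.

```python
# Assumption: the md driver reports sectors in 512-byte units.
SECTOR_BYTES = 512

def sector_steps(sectors):
    """Differences between consecutive sector numbers."""
    return [b - a for a, b in zip(sectors, sectors[1:])]

def byte_offset(sector):
    """Byte position of a sector from the start of the device."""
    return sector * SECTOR_BYTES

# Sector numbers copied from the two log excerpts above.
apr4 = [3519069768, 3519069776, 3519069784, 3519069792, 3519069800]
apr7 = [1177606472, 1177606480, 1177606488, 1177606496, 1177606504]

for name, sectors in (("Apr 4", apr4), ("Apr 7", apr7)):
    steps = sector_steps(sectors)
    start_tb = byte_offset(sectors[0]) / 10**12
    print(f"{name}: steps {steps}, starts about {start_tb:.2f} TB into the drive")
```

Under the 512-byte assumption the five corrections cover one contiguous 20 KiB region in each case, which fits a single bad transfer better than scattered media errors, though that reading is only a guess.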
July 1, 2015
Here's the thread that explains the problem: http://lime-technology.com/forum/index.php?topic=40683.0. If you google "VT-d Marvell" there's a lot of talk about it and the patches required. I believe that even with VT-d disabled or not available at all, these Marvell controllers can cause problems with certain systems, as seen here.
July 1, 2015 Author
I checked my syslog and don't have any errors about a Marvell controller. I had to stop my parity check since it wasn't even half done at 15 hours. So something is wrong somewhere. I'm running a SMART test on each of my drives, then I will reboot and try again. Hard to tell when there's nothing in the logs.
July 1, 2015 Author
Here is a paste of my syslog. I don't see any particular errors that stand out. Maybe someone who knows what to look for can review it. I want to get everything right before I try another parity check, since it takes a very long time. I started one a little while ago and was only getting 50MB/sec, but my array read speeds are really good: 113MB/sec. I must be missing something. I don't remember a parity check ever taking so long. It is a 4TB drive.

http://pastebin.com/9ykP6H57
July 2, 2015
I never had any errors in my logs until I replaced my CPU with one that supported VT-d, and then the DMA errors mentioned in the other posts showed up. I believe the Marvell controller still had problems, as once I removed it the 5 parity check errors disappeared. Now with unRAID version 6.0.1 the problem seems patched, as even with the Marvell controller installed the parity errors are gone.
July 2, 2015
"What are the odds on doing a parity back to back and seeing 5 errors again?"

Better than you'd expect.
July 2, 2015 Author
Haven't read your thread yet, but I'm seeing the exact same issue you are. Here is a repost of the original above.

What are the odds on doing a parity back to back and seeing 5 errors again?

Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069768
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069776
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069784
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069792
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069800

I did a parity check a couple of days ago after the build and had 5 errors also, but killed my syslog on a reboot. I saved it this time. I guess I'll run some tests on each drive, but the last time I checked they were all good.

EDIT: Finally got a parity check done. The same 5 errors again. Is this a bug I should report? I've individually checked every drive and ran long SMART tests, and all are good. No pending sectors, no raw errors. I have it set to correct errors.

Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069768
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069776
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069784
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069792
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069800

I'll check out your thread.
July 3, 2015 Author
Well, I did run a short SMART test on all my drives. All came back good. No reallocated sectors. No raw read errors. No CRC errors. All good. I'm 100% sure the hardware is good. This is a system that has been running for a little over a year as a Windows machine with no problems. All I did was move the mainboard to a different case. I never touched the memory, other than pressing down on the modules to make sure they were all seated nice and tight. Never touched the processor. Kept the CPU fan on it. I'm pretty sure it is not the hardware. In case unRaid support needs any hardware specs, here they are. I'm willing to try whatever.

I've of course also checked all connections and cables, used the vacuum and cleaned everything up. Made sure everything is in order. I have two brand new Supermicro add-on cards, AOC-SAS2LP-MV8. I checked their BIOS: both are using the same version and detecting drives fine. I like the fact that you can view both cards' internal menus on the same screen. You don't have to hit CTRL-M to access each one, like in the old school SCSI card days.

unRAID system: unRAID server Pro, version 6.0.1
Model: Custom
Motherboard: ASUSTeK COMPUTER INC. - SABERTOOTH X79
Processor: Intel® Core™ i7-3930K CPU @ 3.20GHz
HVM: Enabled
IOMMU: Enabled
Cache: L1-Cache = 32 kB (max. capacity 32 kB)
L2-Cache = 256 kB (max. capacity 256 kB)
L3-Cache = 12288 kB (max. capacity 12288 kB)
Memory: 32768 MB (max. installable capacity 96 GB)
ChannelA = 4096 MB, 1600 MHz
ChannelA = 4096 MB, 1600 MHz
ChannelB = 4096 MB, 1600 MHz
ChannelB = 4096 MB, 1600 MHz
ChannelC = 4096 MB, 1600 MHz
ChannelC = 4096 MB, 1600 MHz
ChannelD = 4096 MB, 1600 MHz
ChannelD = 4096 MB, 1600 MHz
Network: eth0: 1000Mb/s - Full Duplex
Kernel: Linux 4.0.4-unRAID x86_64
OpenSSL: 1.0.1o
Uptime: 0 days, 17 hours, 49 minutes, 18 seconds
July 3, 2015
http://lime-technology.com/forum/index.php?topic=21052.0
http://lime-technology.com/wiki/index.php?title=FAQ#How_To_Troubleshoot_Recurring_Parity_Errors
July 3, 2015 Author
One can really start to go nuts trying to fix parity errors when there could be a bug in the software. I had the same problem as user JustinChase. As soon as I changed my parity spin down time to "never", my parity check ran fine with no errors. This would probably have to be solved by unRaid.
July 3, 2015
That's a VERY interesting result. Clearly it's a bug in the software => but it's a strange one, as I'm sure most folks have spindown settings and aren't seeing this problem. My only v6 system is my test setup, and it doesn't have this issue. Neither do my other 3 servers, but they're not on v6, so that likely doesn't count.

According to your sigs, you & JustinChase do not have the same motherboard, so it's not likely a chipset issue. Nor do you have the same add-in controller cards. So it's really strange that you're both having this issue. My v6 test setup only has 80GB drives, so it may require larger drives to encounter this -- but I'd think a LOT more folks would have reported this by now if it was a universal issue.

Perhaps you & JustinChase should exchange very precise details on your configurations to see if there's SOMETHING in common [exact disk makes/models; exact list of plugins/Dockers in use; etc.].
July 3, 2015 Author
We would definitely have to. The diagnostics option is a good place to start.
July 3, 2015
Very strange. I assume that, as it did for me, the check will now work if you switch the parity spin down setting back to default and run a new parity check. I still don't know if it will survive a reboot though.

I'm going to post my diagnostics log in my other thread, since I know jonp is watching that one. Perhaps your log will help figure out what we might have in common.

Regardless, I'm happy to finally have this working again.
This topic is now archived and is closed to further replies.