opentoe Posted July 1, 2015

What are the odds of doing a parity check back to back and seeing 5 errors again?

Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069768
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069776
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069784
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069792
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069800

I did a parity check a couple of days ago after the build and had 5 errors then too, but killed my syslog on a reboot. I saved it this time. I guess I'll run some tests on each drive, but the last time I checked they were all good.

EDIT: Finally got another parity check done. The same 5 errors again. Is this a bug I should report? I've individually checked every drive, run long SMART tests, and all are good. No pending sectors, no raw errors.

Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069768
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069776
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069784
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069792
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069800
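[Editor's note: one quick way to confirm that the errors really do land on identical sectors across runs is to pull the sector numbers out of each saved syslog and diff them. A minimal, self-contained sketch -- the two sample files below stand in for real copies of /var/log/syslog from each run:]

```shell
# Compare the parity-error sectors from two parity-check runs.
# Sample log excerpts stand in for the real saved syslogs.
cat > run1.log <<'EOF'
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069768
Jun 30 13:54:08 SUN kernel: md: correcting parity, sector=3519069776
EOF
cat > run2.log <<'EOF'
Jul  2 08:20:57 SUN kernel: md: correcting parity, sector=3519069768
Jul  2 08:20:57 SUN kernel: md: correcting parity, sector=3519069776
EOF

# Extract just the sector numbers from each run
grep -o 'sector=[0-9]*' run1.log | sort > run1.sectors
grep -o 'sector=[0-9]*' run2.log | sort > run2.sectors

# No diff output means the exact same sectors failed both times
diff run1.sectors run2.sectors && echo "identical sectors"
```

If the sector lists match exactly, the errors are almost certainly not random media failures, which fits the controller/driver theory discussed later in this thread.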
garycase Posted July 1, 2015

I'd definitely run another check and see if you get the same 5 errors [I presume you ran a correcting check, so they should have been fixed -- otherwise of course you'll get the same errors]. Someone had this same issue a couple of months ago -- was that you?
opentoe Posted July 1, 2015

I'd definitely run another check and see if you get the same 5 errors [...] Someone had this same issue a couple of months ago -- was that you?

No, that wasn't me. I'll kick off another parity check before I hit the sack tonight. My parity check takes 15 hours. It's a 4TB drive and the average speed is 67MB/sec. I wonder why it's so slow; I get really good read speeds from the array.

EDIT: Can you let me know what your md_write_limit is in your disk.cfg file? That may be why parity checks are so slow.
garycase Posted July 1, 2015

My parameters are set at 3584, 1536, and 1536.
opentoe Posted July 1, 2015

My parameters are set at 3584, 1536, and 1536.

Ahh, you must have edited yours. OK, I'll re-run a parity check and see if the same errors come up.
JonathanM Posted July 1, 2015

Can you let me know what your md_write_limit is in your disk.cfg file? That may be why parity checks are so slow.

Stock, AFAIK, is this:

md_num_stripes="1280"
md_write_limit="768"
md_sync_window="384"
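[Editor's note: these tunables live in disk.cfg on the flash drive (on unRAID typically /boot/config/disk.cfg). A self-contained sketch of pulling them out for comparison, using the stock values above as sample data in place of the real file:]

```shell
# Inspect the md tuning parameters in unRAID's disk.cfg.
# A sample file with the stock values stands in for the real
# /boot/config/disk.cfg so this sketch runs anywhere.
cat > disk.cfg <<'EOF'
md_num_stripes="1280"
md_write_limit="768"
md_sync_window="384"
EOF

# Print just the three tuning parameters
grep -E '^md_(num_stripes|write_limit|sync_window)=' disk.cfg
```

Running the same grep against the real file on each server makes it easy to spot whether someone has raised the values from stock, as garycase had.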
Wally Posted July 1, 2015

Opentoe, I had the exact same problem at the exact same sectors, as mentioned in the other thread here: http://lime-technology.com/forum/index.php?topic=38359.0. The problem is caused by the flaky Marvell controllers or their Linux drivers, which cause problems mainly when using VT-d -- but in my case also without it. Try the latest unRAID 6.0.1, which seems to have fixed the problem on my system. My 5 errors were consistent, and they changed location when I upgraded to a larger parity drive.

Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069768
Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069776
Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069784
Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069792
Apr 4 18:28:39 unRAID kernel: md: correcting parity, sector=3519069800

Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606472
Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606480
Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606488
Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606496
Apr 7 07:57:11 unRAID kernel: md: correcting parity, sector=1177606504

Wally
Wally Posted July 1, 2015

Here's the thread that explains the problem: http://lime-technology.com/forum/index.php?topic=40683.0. If you google "VT-d Marvell" there's a lot of talk about it and the patches required. I believe that even with VT-d disabled, or not available at all, these Marvell controllers can cause problems on certain systems, as seen here.
opentoe Posted July 1, 2015

I checked my syslog and don't have any errors about a Marvell controller. I had to stop my parity check since it wasn't even half done at 15 hours, so something is wrong somewhere. I'm running a SMART test on each of my drives, then I'll reboot and try again. Hard to tell what's going on when there's nothing in the logs.
opentoe Posted July 1, 2015

Here is a paste of my syslog. I don't see any particular errors that stand out; maybe someone who knows what to look for can review it. I want to get everything right before I try another parity check, since it takes a very long time. I started one a little while ago and was only getting 50MB/sec, yet my array read speeds are really good: 113MB/sec. I must be missing something. I don't remember a parity check ever taking this long. It's a 4TB drive.

http://pastebin.com/9ykP6H57
dgaschk Posted July 2, 2015

Test in safe mode. If the problem still exists in safe mode, then post diagnostics.
Wally Posted July 2, 2015

I never had any errors in my logs until I replaced my CPU with one that supported VT-d, and then the DMA errors mentioned in the other posts showed up. I believe the Marvell controller had problems all along, as once I removed it the 5 parity check errors disappeared. Now, with unRAID version 6.0.1, the problem seems patched: even with the Marvell controller installed, the parity errors are gone.
JustinChase Posted July 2, 2015

What are the odds of doing a parity check back to back and seeing 5 errors again?

Better than you'd expect.
opentoe Posted July 2, 2015

What are the odds of doing a parity check back to back and seeing 5 errors again?

Better than you'd expect.

Haven't read your thread yet, but I'm seeing the same exact issue you are. Reposting the relevant part from the original above: two parity checks in a row flagged the exact same 5 sectors, every drive passed a long SMART test with no pending sectors and no raw errors, and I do have it set to correct errors.

Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069768
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069776
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069784
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069792
Jul 2 08:20:57 SUN kernel: md: correcting parity, sector=3519069800

I'll check out your thread.
opentoe Posted July 3, 2015

Well, I ran a short SMART test on all my drives. All came back good: no reallocated sectors, no raw read errors, no CRC errors. I'm 100% sure the hardware is good. This system ran as a Windows machine for a little over a year with no problems; all I did was move the mainboard to a different case. I never touched the memory, but I pressed down on the modules just to make sure they were seated nice and tight. Never touched the processor; kept the CPU fan on it. I'm pretty sure it's not the hardware. In case unRAID support needs any hardware specs, here they are. I'm willing to try whatever. I've of course also checked all connections and cables, used the vacuum, and cleaned everything up. Made sure everything is in order.

I have two brand-new Supermicro add-on cards (AOC-SAS2LP-MV8). I checked out their BIOS: both are using the same version and detecting drives fine. I like the fact that you can view both cards' internal menus on the same screen, without having to hit CTRL-M to access each one, like in the old-school SCSI card days.

unRAID system: unRAID server Pro, version 6.0.1
Model: Custom
Motherboard: ASUSTeK COMPUTER INC. - SABERTOOTH X79
Processor: Intel® Core™ i7-3930K CPU @ 3.20GHz
HVM: Enabled
IOMMU: Enabled
Cache: L1-Cache = 32 kB (max. capacity 32 kB)
L2-Cache = 256 kB (max. capacity 256 kB)
L3-Cache = 12288 kB (max. capacity 12288 kB)
Memory: 32768 MB (max. installable capacity 96 GB)
ChannelA = 4096 MB, 1600 MHz
ChannelA = 4096 MB, 1600 MHz
ChannelB = 4096 MB, 1600 MHz
ChannelB = 4096 MB, 1600 MHz
ChannelC = 4096 MB, 1600 MHz
ChannelC = 4096 MB, 1600 MHz
ChannelD = 4096 MB, 1600 MHz
ChannelD = 4096 MB, 1600 MHz
Network: eth0: 1000Mb/s - Full Duplex
Kernel: Linux 4.0.4-unRAID x86_64
OpenSSL: 1.0.1o
Uptime: 0 days, 17 hours, 49 minutes, 18 seconds
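[Editor's note: the per-drive SMART checks described above are easy to script with smartctl from the smartmontools package. The device list below is hypothetical -- adjust it to your system -- and the commands are only echoed here as a dry run so the sketch doesn't touch any hardware; remove the echo to actually run them:]

```shell
# Dry-run sketch: start a short SMART self-test on each array drive,
# then pull health status and attributes (look for reallocated,
# pending, and CRC error counts). Device names are placeholders.
drives="/dev/sdb /dev/sdc /dev/sdd"

for d in $drives; do
    echo smartctl -t short "$d"    # would start a short self-test
    echo smartctl -H -A "$d"       # would print health + attributes
done
```

A short test typically finishes in a couple of minutes per drive; the long test opentoe mentions earlier in the thread uses `-t long` instead and can take hours on a 4TB drive.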
dgaschk Posted July 3, 2015

http://lime-technology.com/forum/index.php?topic=21052.0
http://lime-technology.com/wiki/index.php?title=FAQ#How_To_Troubleshoot_Recurring_Parity_Errors
opentoe Posted July 3, 2015

http://lime-technology.com/forum/index.php?topic=21052.0
http://lime-technology.com/wiki/index.php?title=FAQ#How_To_Troubleshoot_Recurring_Parity_Errors

One can really start to go nuts trying to fix parity errors when there could be a bug in the software. I had the same problem as user JustinChase: as soon as I changed my parity drive's spin-down time to "never", my parity check ran fine with no errors. This would probably have to be solved by unRAID.
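[Editor's note: if spin-down during a check really is the trigger, one way to gather evidence is to poll the parity drive's power state while a parity check is running; hdparm's -C flag reports whether a drive is active/idle or in standby. The device name is a placeholder and the call is only echoed here as a dry run:]

```shell
# Dry-run sketch: periodically check whether the parity drive is
# being spun down mid-check. /dev/sdb is a placeholder for the
# parity device; drop the echo to actually query it (needs root).
for i in 1 2 3; do
    echo hdparm -C /dev/sdb    # would report "active/idle" or "standby"
    sleep 1
done
```

Seeing "standby" show up while the check is still in progress would corroborate the spin-down theory without waiting 15 hours for the errors to appear.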
garycase Posted July 3, 2015

That's a VERY interesting result. Clearly it's a bug in the software => but it's a strange one, as I'm sure most folks have spindown settings and aren't seeing this problem. My only v6 system is my test setup, and it doesn't have this issue. Neither do my other 3 servers, but they're not on v6, so that likely doesn't count. According to your sigs, you & JustinChase do not have the same motherboard, so it's not likely a chipset issue. Nor do you have the same add-in controller cards. So it's really strange that you're both having this issue. My v6 test setup only has 80GB drives, so it may require larger drives to encounter this -- but I'd think a LOT more folks would have reported this by now if it were a universal issue. Perhaps you & JustinChase should exchange very precise details on your configurations to see if there's SOMETHING in common [exact disk makes/models; exact list of plugins/Dockers in use; etc.].
opentoe Posted July 3, 2015

That's a VERY interesting result. Clearly it's a bug in the software [...] Perhaps you & JustinChase should exchange very precise details on your configurations to see if there's SOMETHING in common.

We would definitely have to. The diagnostics option is a good place to start.
JustinChase Posted July 3, 2015

We would definitely have to. The diagnostics option is a good place to start.

Very strange. I assume that, like me, if you switch the parity spin-down setting back to the default and then run a new parity check, it will still work. I don't know yet whether it will survive a reboot, though. I'm going to post my diagnostics log in my other thread, since I know jonp is watching that one. Perhaps your log will help figure out what we might have in common. Regardless, I'm happy to finally have this working again.
Archived
This topic is now archived and is closed to further replies.