BinaryPatrick Posted May 19, 2020 Posted May 19, 2020 I've posted about this before and thought I had it solved. Every time I run a parity check, I have parity errors. It seems like the longer I go between checks the more errors I have, but nonetheless I have at least 6K - 90K errors every time. At this point I have checked the cabling, run Memtest86 about 4 different times (both in parallel and serial across all CPU cores), and run SMART tests on the drives. No errors are detected in any of my testing. I feel like I must have a bad drive. When I set up this server 2 months ago, I was a n00b and didn't preclear at all. Should I break apart my config, start over, and preclear all the drives / run extended SMART tests? Are there any other tests I could run at this point? I have my data backed up on other devices so I purge my unraid instance at this point if I need to. I'm running 3x 2TB drives running xfs. Quote
JorgeB Posted May 19, 2020 Posted May 19, 2020 Post diags after at least two parity checks with errors (without rebooting). Quote
BinaryPatrick Posted May 20, 2020 Author Posted May 20, 2020 22 hours ago, johnnie.black said: Post diags after at least two parity checks with errors (without rebooting). Yea I don't think I can get two without errors. There are always errors, that's my point. homeserve-diagnostics-20200520-1308.zip Quote
JorgeB Posted May 20, 2020 Posted May 20, 2020 The syslog posted is filled with crashes, but I can't see the beginning of the problem or parity checks, reboot, start a parity, let it finish, do another one, then post diags, without rebooting. Quote
BinaryPatrick Posted May 20, 2020 Author Posted May 20, 2020 Do you know what those crashes are? My googling didn't give any hints other than some Kernel problem.. Quote
JorgeB Posted May 21, 2020 Posted May 21, 2020 13 hours ago, johnnie.black said: can't see the beginning of the problem Might be easier to see what's causing them if the syslog covers the start of the issue. Quote
BinaryPatrick Posted May 22, 2020 Author Posted May 22, 2020 I ran extended SMART tests on all HDDs and it returned no errors, but running another parity check resulted in 846,120 errors. Attached are the diagnostics. Syslog was 100% full. homeserve-diagnostics-20200522-0244.zip Quote
itimpi Posted May 22, 2020 Posted May 22, 2020 Your syslog is full of errors that start with May 22 00:14:03 homeserve kernel: DMAR: ERROR: DMA PTE for vPFN 0xf2827 already set (to 200000000000 not 758ed9002) May 22 00:14:03 homeserve kernel: ------------[ cut here ]------------ May 22 00:14:03 homeserve kernel: WARNING: CPU: 2 PID: 21150 at drivers/iommu/intel-iommu.c:2300 __domain_mapping+0x205/0x2dd Not sure what causes this, but they should not be happening on a well-behaved system. Quote
JorgeB Posted May 22, 2020 Posted May 22, 2020 Looks like a hardware problem, run memtest and/or try a different board/controller if available. Quote
Trucking Geek Posted April 22, 2022 Posted April 22, 2022 I am having this same issue with a new system. (i7-12700K, MSI Z690i AUROS Ultra, 32GB of Kingston Fury DDR5, 3x 1TB Samsung 980 Pro NVMe.) System seems to run fine for a few hours, then log fills up with the kernel: DMAR: ERROR: DMA PTE for vPFN 0xf2827 already set (to 200000000000 not 758ed9002) errors. Did you ever find a resolution? I am letting it run again now after lowering the memory clocks. Quote
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.