November 5, 2025

Brand-new Unraid user, Synology refugee, one week left on my trial.

I keep getting 2 parity errors during parity checks, even after fixing parity. (I think I used the write-corrections option correctly.) Granted, 2 errors out of 26TB is de minimis, and I know parity isn't a checksum, but I'm losing faith in the integrity of the array/data.

So far:
- Ran parity multiple times, including with the fix option
- Initial parity check, before the data transfer, had 0 errors
- It's always 2 parity errors now, after transferring data from the old NAS (Synology)
- Memtest (OK), since other forum posts immediately jump to memory as the issue
- SMART extended tests, all OK, both before and after the parity errors/fixes
- Always shut down gracefully, and it's on a UPS
- Burned 3 weeks of my trial; spent way too much time waiting for this test or that test to complete

Hardware (all brand new):
- Terramaster F6-424 Max, stock 8GB non-ECC DDR5 RAM
- 4× 26TB WD NAS Pro (3 of which are in the array, 1 unassigned for now)
- 2× 2TB Samsung 990 Pro NVMe
- Array is all XFS, with drive spin-down after 15 min
- The NVMe drives are a btrfs cache pool, added/enabled after the data were transferred from the Synology

Questions:
1. Is there a log of where the parity checks failed? I suspect there's something somewhere, but I can't seem to find it. If it's the same two blocks each time, perhaps there's a way to isolate those blocks.
2. Is there a way to isolate where the errors occur? I currently have 1 parity and 2 data drives, but I could consolidate to 1 data drive and rerun parity to see if the errors recur. Repeat until there are no errors; the excluded drive is then the likely cause. I'm hoping there's an easier and quicker way, though.
3. Would BTRFS help with confidence that data aren't corrupted? I want to use Unraid's flexibility to add/change drives at any time. ZFS seems too rigid and/or wasteful with space when flexibility is wanted. This experience with parity errors shook my confidence in XFS.
4. Is my hardware just toast? How can I tell definitively?
   - After the repeated parity errors, I noticed recurring, non-periodic errors in the syslog (other posts seem to lean toward "no worry, hardware-corrected error"):

```
kernel: pcieport 0000:00:06.2: AER: Correctable error message received from 0000:00:06.2
kernel: pcieport 0000:00:06.2: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
kernel: pcieport 0000:00:06.2: device [8086:463d] error status/mask=00000001/00002000
```

   - The device is:

```
# lspci -nn | grep '8086:463d'
00:06.2 PCI bridge [0604]: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #2 [8086:463d] (rev 04)
```

Use case:
- Personal data, with cloud backup
- Local machine backup target, with cloud backup
- Media server (surprise!), no cloud backup since that would be cost-prohibitive, although I am considering a local backup of a small, select portion

History:
- Hardware (all brand new):
  - Terramaster F6-424 Max, stock 8GB non-ECC DDR5 RAM
  - 4× 26TB WD NAS Pro (3 of which are in the array, 1 unassigned for now)
  - 2× 2TB Samsung 990 Pro NVMe
- Array is all XFS, with drive spin-down after 15 min
- The NVMe drives are a btrfs cache pool, added/enabled after the data were transferred from the Synology
- Ran a parity check, all OK, 0 errors, BEFORE putting any data in the array
- Created shares/users/set permissions
- rsync'd ~8TB from my Synology, in batches based on share
- Ran a parity check, 2 errors (relatively soon after starting)
- Ran parity with corrections
- Moved more data
- Ran a parity check, 2 errors
- Ran parity with correction, 2 errors "fixed"
- Noticed recurring, non-periodic errors (details noted earlier)
- Ran a parity check, 2 errors
- Ran parity with correction, 2 errors "fixed" again
- SMART extended test on all drives, 0 errors
- Upgraded from 7.1.4 to 7.2.0 (unsure if this is exactly in the correct sequence, but close)
- Memtest again, 0 errors
- Running parity again, 2 errors so far
- Never had a disorderly shutdown; it's attached to a UPS, and I only shut it down via the GUI

Thanks for reading this far, and especially for any assistance!

depot-diagnostics-20251104-1859.zip
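On question 4, one cheap way to see whether the correctable AER messages line up with parity-check activity is to tally them per PCIe device, severity, and type from a saved syslog. Below is a minimal Python sketch, assuming the saved syslog contains lines in the same format as the ones quoted above; the function name `count_aer_errors` is hypothetical, and the sample input is just the three quoted lines.

```python
import re
from collections import Counter

# Matches the "PCIe Bus Error" line format quoted in the post, e.g.
# "kernel: pcieport 0000:00:06.2: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)"
AER_RE = re.compile(
    r"pcieport (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<sev>\w+), type=(?P<type>[^,]+)"
)


def count_aer_errors(lines):
    """Count PCIe bus errors per (PCI address, severity, type) in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        m = AER_RE.search(line)
        if m:
            counts[(m["dev"], m["sev"], m["type"])] += 1
    return counts


# The three lines quoted above; only the middle one carries severity/type.
sample = [
    "kernel: pcieport 0000:00:06.2: AER: Correctable error message received from 0000:00:06.2",
    "kernel: pcieport 0000:00:06.2: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)",
    "kernel: pcieport 0000:00:06.2: device [8086:463d] error status/mask=00000001/00002000",
]
print(count_aer_errors(sample))
```

Pointing it at a saved log (for example `count_aer_errors(open(path, errors="replace"))`, where `path` is wherever the local syslog server writes) and comparing counts from days with and without a parity check is one way to look for a correlation. A correctable error on its own does not prove any data was corrupted.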
November 5, 2025

The only parity check seen in the syslog in these diagnostics was non-correcting. (The syslog resets on reboot.)
November 5, 2025 (Author)

Thanks for your responses, trurl.

I had the syslog server set up (local, not flash), but I think that was after I ran a correcting parity run. When I saw your response 6 hours ago, I:
- cancelled the parity check (non-correcting) that was still running
- rebooted out of superstition
- ensured the syslog server was set up (perhaps I don't have it configured correctly)
- ran a correcting parity check
- about 2.5 hours in, at 10.2%, noticed it had corrected 2 parity errors
- cancelled the correcting parity run
- started a non-correcting parity run

At the time of this writing, the non-correcting run is at 3.3 hours and 13.2%, definitely farther along than the correcting run was. Unless otherwise instructed, I'll let it complete and report back, ETA 1 day.

Thanks again for your help, much appreciated.
November 6, 2025 (Author)

Well, don't I look like a fool... The non-correcting parity check finally completed with 0 errors. I'll see what happens on the next parity check.

I'd appreciate any thoughts on my initial questions, if possible. Even a pointer to a section of the docs would help. This experience, specifically the uncertainty about what might be corrupt and what I might need to restore, has me reconsidering my storage plan and backup/restore strategy. On the bright side, at least it's early in my Unraid journey.

Thanks!
November 7, 2025

On 11/5/2025 at 12:12 AM, kmk-kmk said:
> Would BTRFS help with confidence data aren't corrupted?

Yes, btrfs or zfs checksum all the data.
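To make the "parity isn't a checksum" point concrete: single XOR parity can tell you that a stripe is inconsistent, but not which disk holds the wrong bytes, whereas a per-block checksum points at the bad block directly. This is a toy Python sketch under simplifying assumptions (two 4-byte "data disks" plus one parity block, mirroring the 1 parity + 2 data layout in the first post, and SHA-256 standing in for whatever checksum algorithm the filesystem actually uses); it is not how Unraid, btrfs, or ZFS are implemented internally.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (single parity, like the P drive)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def parity_ok(data_blocks, parity):
    """A parity check can only say whether the stripe is consistent."""
    return xor_blocks(data_blocks) == parity


def bad_blocks(data_blocks, checksums):
    """A checksum check says exactly which block no longer matches."""
    return [i for i, blk in enumerate(data_blocks)
            if hashlib.sha256(blk).digest() != checksums[i]]


disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40"]
parity = xor_blocks(disks)
sums = [hashlib.sha256(d).digest() for d in disks]

# Simulate a silent single-bit flip on data disk index 1.
corrupted = [disks[0], b"\x10\x20\x31\x40"]

print(parity_ok(disks, parity))      # True
print(parity_ok(corrupted, parity))  # False: a mismatch, but on which disk?
print(bad_blocks(corrupted, sums))   # [1]
```

In this toy model, the only thing a "correcting" pass can do with a parity mismatch is recompute P from the data disks, which is why a corrected parity error says nothing about whether the data itself was right.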
November 11, 2025 (Author)

I'm baaaack... Diags attached.

Last night, the non-correcting parity check started as scheduled. This morning, I checked and it was 30% complete and had found 2 errors. I'll let it continue, unless you provide guidance otherwise. Any insight and/or assistance is appreciated!

Thanks, JorgeB. I moved my critical data to a btrfs mirror.

depot-diagnostics-20251111-0639.zip
November 11, 2025

Cancel the non-correcting check, run a correcting one, and then another one right after, without rebooting. If the 2nd correcting check still finds errors, post the diagnostics.
November 11, 2025 (Author)

Thanks, I've stopped the non-correcting check and started a correcting one. I'll report back after the second correcting parity check. Each parity check takes around 2 days or more.

I just realized my trial ends in 1 day 6 hours. I have not used any trial extensions. Should I request an extension during the parity check, or request one now and run the two correcting parity checks after I'm extended? I'm not sure what happens when I extend the trial.
November 11, 2025

If I remember correctly, the licence is only checked when you start the array. Assuming I am correct (and I am sure someone will mention it if I am not), you should be able to complete the requested checks before requesting the licence extension.
November 12, 2025 (Author)

Well, I now know that the array gets stopped when the trial expires, lol.

Regrettably, the first correcting parity check was almost complete. It did correct two errors, which I assume are the ones the previous non-correcting parity check found. I'll note that the 2 errors are detected/corrected reasonably early on (~3 hours); this has been consistent across all runs that either detect or correct errors.

I've extended my trial, saved diags, and started the second correcting parity check. I'll post when the second correcting parity check is complete, unless you have additional instructions in the meantime.
November 13, 2025

9 hours ago, kmk-kmk said:
> I've extended my trial, saved diags, and started the second correcting parity check. I'll post when the second correcting parity check is complete,

Yep, do that without rebooting.
November 14, 2025 (Author)

I see a trend here... The second correcting parity check completed with 0 errors.

My scheduled parity check (non-correcting) is set for Monday night. If past weeks are any guide, I expect to see two parity errors again.

I saved diags after each of the two most recent correcting parity checks (the first found/corrected 2 errors, the second found 0 errors). Please let me know if you'd like the diags (either or both). Otherwise, I'll continue to use the NAS normally and see what happens during/after Monday's parity check.
November 14, 2025

If the second check didn't find any, reboot now and start another one; some hardware-related sync errors only happen after a reboot.
November 14, 2025 (Author)

Ah, OK, thanks, I wouldn't have thought of that.

Rebooted and started a manual correcting parity check. I'll post back when it's complete... unless you'd like me to post when/if I see 2 errors, which is usually in the first 3-ish hours.
November 15, 2025

12 hours ago, kmk-kmk said:
> unless you'd like me to post when/if i see 2 errors, which is usually in the first 3-ish hours.

You can, and also cancel the check, if only those two are typically found.
November 15, 2025 (Author)

2 errors were corrected ~3 hours in. Per your message, and since I've only ever seen 0 or 2 errors, I cancelled the check (14 h, ~46%) and started another correcting parity check.
November 16, 2025

That suggests an issue with a controller or device, and the sync errors are likely always in the same sectors.
November 22, 2025 (Author)

Since my last reply, I've run several parity checks, with some reboots in between, keeping track of what happens when. I do believe you're on to the underlying issue, @JorgeB. Thank you!

I only see parity errors after a reboot. I believe the parity errors are always in the same two sectors, assuming I'm interpreting these syslog entries correctly:

`kernel: md: recovery thread: P corrected, sector=3537895320`

`kernel: md: recovery thread: P corrected, sector=3537895352`

Parity checks run after a correcting check, without an intervening reboot, find no errors.

Questions:
- How can I identify the underlying issue? Per your previous post, this suggests an issue with a controller or device.
- Can I differentiate between a controller and a disk? (I assume "device" implies disk.) I suppose I could swap each array drive with another and see if the same two sectors report errors.
- Would using ZFS or BTRFS within the array help? From what I can tell, doing so may help identify a problem, but I'd be stuck restoring from backup, and then I'd need to restore everything, since there's no way to tell which file or share is impacted. Perhaps I'm missing something, though?
- Does it make sense for me to just run ZFS pool(s), ignoring the array? This approach seems to be able to both detect and correct errors, but I think I'd lose the flexibility the Unraid array offers.
- What if I just ignore these two specific errors? The most I could lose is any files that include those 2 × 512-byte sectors, correct? I suppose the impact could be worse if part of a directory is stored in those two sectors.

In summary, it seems to be a hardware issue, and Unraid is working properly and has identified an intermittent problem, but I'm not sure what the best path forward is. I'm past the return windows, and the hardware diagnostics come back clean anyway.

Thank you very much for all your help so far, as well as for any guidance you can provide! I'm continuing to learn about Unraid and the various filesystems. I truly was shielded before by using Synology.
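For question 1 in the first post and the sector numbers above, the `md: recovery thread: P corrected, sector=N` lines can be extracted from saved syslogs and compared across runs. A minimal Python sketch, assuming `sector` is a 512-byte unit counted from the start of the md device (an assumption: it is not necessarily a raw-disk LBA, and mapping an offset to a particular XFS file is a separate step not shown here). The function names are hypothetical. With that assumption the two reported sectors are 32 sectors (16 KiB) apart, so they are not adjacent.

```python
import re
from collections import Counter

SECTOR_BYTES = 512  # assumption: md "sector=" values are 512-byte units from the start of the md device

# Matches lines like "kernel: md: recovery thread: P corrected, sector=3537895320".
# The word after "P" is captured generically rather than assuming what other check modes print.
SYNC_RE = re.compile(r"md: recovery thread: P (\w+), sector=(\d+)")


def sync_error_sectors(lines):
    """Return parity sync-error sector numbers in log order."""
    return [int(m.group(2)) for m in map(SYNC_RE.search, lines) if m]


def recurring_sectors(runs):
    """Given one iterable of log lines per parity-check run, count how many runs report each sector."""
    counts = Counter()
    for run in runs:
        counts.update(set(sync_error_sectors(run)))
    return counts


def describe(sector):
    offset = sector * SECTOR_BYTES
    return f"sector {sector}: byte offset {offset} (~{offset / 2**40:.2f} TiB)"


# The two lines quoted in the post.
log = [
    "kernel: md: recovery thread: P corrected, sector=3537895320",
    "kernel: md: recovery thread: P corrected, sector=3537895352",
]
sectors = sync_error_sectors(log)
for s in sectors:
    print(describe(s))
# 3537895352 - 3537895320 = 32 sectors, i.e. 16384 bytes apart
print("gap:", (sectors[1] - sectors[0]) * SECTOR_BYTES, "bytes")
```

Passing `recurring_sectors` one saved syslog per check run would show whether both sectors appear in every post-reboot run, which is what the thread suspects.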
November 23, 2025

What model board are you using? There's no model info where it would typically be.

I see that you are using an ASMedia controller. I assume it's an add-on? Are there no SATA ports on the board?

In any case, that controller would be my main suspect, even if it's onboard.
November 24, 2025 (Author)

I'm using a Terramaster F6-424 Max; I have no idea what hardware they use other than what Unraid identified. I assume the hardware is mostly custom and/or proprietary. My online searches were fruitless in identifying the hardware, other than the CPU.
November 24, 2025

If you have any other PC available, my first test would be to move all the disks there and repeat the parity check; that would quickly confirm whether the devices or the hardware is the problem. I suspect the latter. If you don't, you would basically need to replace one of the disks with a new one and retest; if the result is the same, replace another one, and so on, but I doubt that is the problem.
November 25, 2025 (Author)

Regrettably, I don't have another machine to which I can move the disks to test.

I agree with you that it's likely the hardware. Since I'm outside the return window, I'm trying to find a way to work with the likely-defective hardware I have. If it is a consistent hardware failure, where the same two sectors fail, would it be possible, or even feasible, to mark those two sectors as "bad" to avoid writing data to them? My gut tells me this is essentially asking for trouble in the future, though.

Thank you for confirming my initial thought process of swapping one disk at a time. I think I can try to isolate these variables:
- specific disk
- slot of the hot-swappable backplane
- controller
- combination of the above

I'm working out a plan to test the above efficiently, since I have 4 HDDs (3 in the array, 1 unassigned) but all my data currently fits on one.

I'm also struggling with the consistency of the errors: for a hardware error, it just doesn't feel right that the same two sectors fail every time. I would expect more randomness. To be clear, I certainly trust your judgement and experience, and agree that it's likely a hardware issue; I'm just trying to better understand the type of hardware error that consistently fails in the same two sectors.

Thank you again! I'll post back with my findings.
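The disk / backplane slot / controller plan above amounts to single-fault elimination: a post-reboot check that shows errors means the culprit was part of that run, and a clean post-reboot check clears everything it used. A minimal Python sketch, assuming exactly one faulty component, that the fault reproduces on every post-reboot check, and that every slot goes through the same controller (so the controller stays a suspect in every run with errors). The disk and slot names and the run outcomes below are hypothetical, chosen only to show the mechanics, not results from this thread.

```python
def used_components(layout, shared=("controller",)):
    """layout maps slot name -> disk name for one test run; shared parts are used by every run."""
    return set(layout) | set(layout.values()) | set(shared)


def remaining_suspects(candidates, runs):
    """Single-fault elimination over (components_used, saw_errors) pairs, one per post-reboot check."""
    suspects = set(candidates)
    for used, saw_errors in runs:
        if saw_errors:
            suspects &= used  # the culprit must have been part of this run
        else:
            suspects -= used  # the culprit cannot have been part of a clean run
    return suspects


candidates = {"diskA", "diskB", "diskC", "diskD",
              "slot1", "slot2", "slot3", "slot4", "controller"}

# Hypothetical outcomes, one variable changed per run.
runs = [
    (used_components({"slot1": "diskA", "slot2": "diskB", "slot3": "diskC"}), True),  # baseline
    (used_components({"slot1": "diskA", "slot2": "diskB", "slot3": "diskD"}), True),  # diskC -> diskD
    (used_components({"slot4": "diskA", "slot2": "diskB", "slot3": "diskD"}), True),  # diskA -> slot4
]
print(sorted(remaining_suspects(candidates, runs)))
```

Because the shared controller is in every run, runs with errors can never clear it; it is what remains once the disks and slots have been ruled out.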