KeithAbbott Posted January 18, 2021
I have unRAID 6.8.3 set up to run a monthly parity check. Every month, it finds 5 errors. It seems likely that the errors are from the parity drive itself. Are there any other options to correct this short of replacing the parity drive? I tried to search this forum for suggestions, but didn't spot anything that matched my situation. If I missed an applicable post, please reply with the link and I'll go from there. Thanks in advance!
trurl Posted January 19, 2021
Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
KeithAbbott (Author) Posted January 24, 2021
Diagnostics ZIP file attached, as requested.
Attachment: sagetv-diagnostics-20210124-0813.zip
trurl Posted January 24, 2021
Looks like you rebooted since the last parity check, so there is nothing to see in the syslog.
On 1/18/2021 at 6:42 PM, KeithAbbott said: "parity check monthly. Every month, it finds 5 errors"
If your parity checks have been noncorrecting, then they are finding the same parity errors because they haven't been corrected. You need to run a correcting parity check so the parity errors get corrected. Then you need to run a noncorrecting parity check to verify that there are zero sync errors. Exactly zero is the only acceptable result, and if you don't get that, you haven't finished diagnosing your problem.
KeithAbbott (Author) Posted January 24, 2021
Thanks for the quick response. My monthly parity check has had "Write corrections to parity disk" set to "Yes" ever since I first set it up. I will rerun the parity check tonight and repost the diagnostics ZIP file; maybe that will give a better clue as to what is happening. My parity check takes 19 hours to run, so I probably will not have anything to post until later tomorrow.
JorgeB Posted January 25, 2021
You're using a SAS2LP. Those are known, in some cases, to generate the same 5 sync errors corrupting data; IIRC it only happens in the first check after a reboot. You should replace it with an LSI.
KeithAbbott (Author) Posted January 26, 2021
New diagnostics ZIP file attached. This one was created after a parity check completed, but before any system reboot occurred. Thanks for the info about the SAS2LP; it looks like I will be shopping for an LSI controller card. I would still like someone to take a look at my diagnostics file and confirm whether that was the root cause or not, though.
Attachment: sagetv-diagnostics-20210125-1946.zip
trurl Posted January 26, 2021
Jan 24 22:28:25 SageTV kernel: mdcmd (62): check
Jan 24 22:28:25 SageTV kernel: md: recovery thread: check P ...
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004440
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004448
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004456
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004464
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004472
Now to see if the next one has the same sync errors.
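One detail in the log above is that the five reported sectors are evenly spaced. The following is a minimal Python sketch of that arithmetic. It assumes the sector numbers are in 512-byte units and that each report covers the stride up to the next one; neither assumption is stated anywhere in this thread.

```python
# Sector numbers copied from the syslog excerpt above.
sectors = [2353004440, 2353004448, 2353004456, 2353004464, 2353004472]

# Assumption (not confirmed in the thread): sector numbers are in 512-byte units.
SECTOR_BYTES = 512

# Distance, in sectors, between consecutive reports.
steps = [b - a for a, b in zip(sectors, sectors[1:])]

# Assumption: each report covers one stride, so the five reports are adjacent.
bytes_per_report = steps[0] * SECTOR_BYTES
region_bytes = len(sectors) * bytes_per_report

print("steps between reports:", steps)
print("bytes per report:", bytes_per_report)
print("contiguous region in bytes:", region_bytes)
```

Under those assumptions, the five reports would describe one small contiguous region of the parity disk rather than five unrelated locations.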
KeithAbbott (Author) Posted January 26, 2021
Well, I ran another parity check overnight, and wouldn't you know, this one came back with zero errors. First time in probably over a year. My next scheduled parity check is a week from today; I'll see how that one goes and post the results. In the meantime, I'm looking at replacing my SAS2LP with an LSI 9207-8i. Good choice? Anything I should be aware of? Anything I need to do besides unplugging the SAS2LP and plugging the 9207-8i in?
JorgeB Posted January 27, 2021
8 hours ago, KeithAbbott said: "I ran another parity check overnight, and wouldn't you know this one came back with zero errors."
Did you reboot before?
On 1/25/2021 at 9:11 AM, JorgeB said: "IIRC it only happens in the first check after a reboot"
KeithAbbott (Author) Posted January 27, 2021
No, I had not. I will reboot and rerun the parity check, and post the results.
KeithAbbott (Author) Posted February 3, 2021
I've rerun the parity check after rebooting, and attached the resulting diagnostics ZIP file. Here's the relevant snippet from the syslog:
Feb 2 01:30:01 SageTV kernel: mdcmd (42): check
Feb 2 01:30:01 SageTV kernel: md: recovery thread: check P ...
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004440
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004448
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004456
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004464
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004472
As you can see, the corrected sectors match the last parity check exactly. Hopefully, this gives a clue as to the root cause of the problem and suggestions on where to go from here. If the drive(s) are good, and this is strictly an issue with the SAS2LP controller card, I can replace the card with an LSI 9207-8i and resolve the issue. However, if I have issues with the parity drive that require replacing the drive, that will be a more costly remedy. Suggestions?
Attachment: sagetv-diagnostics-20210202-2040.zip
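The "same sectors every time" comparison can also be done mechanically instead of by eye. The following is a minimal Python sketch that pulls the sector numbers out of two syslog excerpts and compares them. The function name `reported_sectors` and the variable names are made up for illustration; the log lines are copied from the posts in this thread.

```python
import re

# Matches the two message forms that appear in this thread:
#   "md: recovery thread: P corrected, sector=N"
#   "md: recovery thread: P incorrect, sector=N"
SECTOR_RE = re.compile(r"recovery thread: P (?:corrected|incorrect), sector=(\d+)")


def reported_sectors(log_text):
    """Return the set of sector numbers reported in a syslog excerpt."""
    return {int(m.group(1)) for m in SECTOR_RE.finditer(log_text)}


# Excerpt from the check that finished on Jan 25.
jan_25 = """\
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004440
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004448
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004456
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004464
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004472
"""

# Excerpt from the check run on Feb 2, after a reboot.
feb_2 = """\
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004440
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004448
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004456
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004464
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004472
"""

same_sectors = reported_sectors(jan_25) == reported_sectors(feb_2)
print("identical sector sets:", same_sectors)
```

To use it on a real system, the two strings would be replaced with the contents of the syslog files from two diagnostics ZIPs.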
JorgeB Posted February 3, 2021
7 hours ago, KeithAbbott said: "As you can see, the corrected sectors match the last parity check exactly."
That is the SAS2LP known issue, so it should be fixed once you replace it.
KeithAbbott (Author) Posted March 7, 2021
I have replaced my SAS2LP with an LSI 9207-8i. I reran the parity check, this time without the setting to write corrections to the parity disk. It is still finding the same five errors as before. If I rerun the parity check again (except this time writing corrections to the parity disk), will it make the corrections and then behave properly (no errors) thereafter?
Mar 6 23:06:21 SageTV kernel: mdcmd (42): check nocorrect
Mar 6 23:06:21 SageTV kernel: md: recovery thread: check P ...
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004440
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004448
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004456
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004464
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004472
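The logs in this thread show two forms of the start line: "mdcmd (N): check" for the runs that logged "P corrected", and "mdcmd (N): check nocorrect" for the run that logged "P incorrect". The following is a minimal Python sketch that tells the two apart when reading a syslog. The function name `check_mode` is hypothetical, and the pattern only covers the two forms seen here; any other argument that may exist would simply return None.

```python
import re

# Only the two start-line forms seen in this thread are recognised:
#   "mdcmd (62): check"            -> correcting run
#   "mdcmd (42): check nocorrect"  -> non-correcting run
CHECK_RE = re.compile(r"mdcmd \(\d+\): check( nocorrect)?\s*$")


def check_mode(line):
    """Classify a syslog line as the start of a correcting or non-correcting check.

    Returns "correcting", "non-correcting", or None if the line is not a
    recognised check start line.
    """
    match = CHECK_RE.search(line)
    if match is None:
        return None
    return "non-correcting" if match.group(1) else "correcting"


# Lines copied from the posts in this thread.
lines = [
    "Jan 24 22:28:25 SageTV kernel: mdcmd (62): check",
    "Mar 6 23:06:21 SageTV kernel: mdcmd (42): check nocorrect",
    "Mar 6 23:06:21 SageTV kernel: md: recovery thread: check P ...",
]

modes = [check_mode(line) for line in lines]
for line, mode in zip(lines, modes):
    print(mode, "<-", line)
```

This makes it easy to confirm from the syslog alone whether a given run was supposed to write corrections, which matters because a non-correcting run will keep reporting the same sectors until a correcting run fixes them.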
itimpi Posted March 7, 2021
It should, but you need to do it to confirm.
KeithAbbott (Author) Posted March 9, 2021
OK, thanks. I've rerun the parity check once again, this time with the setting to write corrections to the parity disk. My next scheduled parity check is in early April, so I'll know then whether the new controller has solved the problem or not. With the new LSI 9207-8i controller comes a new question. I think the firmware version is 18.xx.xx.xx, so I am thinking that I should upgrade the firmware to 20.00.07.00. I am running it in IT mode, and from researching on this forum, it looks like I should probably remove/erase the BIOS, which would speed up the boot process a bit. In summary: upgrade the firmware and remove the BIOS. Any cautions against doing that? Any guidance either for or against?
JorgeB Posted March 9, 2021
7 hours ago, KeithAbbott said: "In summary, upgrade the firmware and remove the BIOS. Any cautions against doing that?"
Upgrading the firmware is very easy and can be done with Unraid. Unfortunately, the erase BIOS command doesn't work with the Linux tool, but it's still easy to do if you boot with a DOS flash drive, or a UEFI shell if the board has UEFI.
KeithAbbott (Author) Posted March 9, 2021
Thanks much for the reply, looks like I found my weekend project. I do have a spare Windows 10 workstation that I will use instead of my server. That way, I'm not messing with my "production" box, and I can then erase the BIOS also. Thanks for providing the link, that gave me exactly what I need. Oh, one other question. Is it best practice to run the monthly parity check with the setting to not write corrections to the parity disk?
Edited March 9, 2021 by KeithAbbott
JorgeB Posted March 9, 2021
16 minutes ago, KeithAbbott said: "Is it best practice to run the monthly parity check with the setting to not write corrections to the parity disk?"
Correct.
deltaexray Posted March 9, 2021
I have an issue that I think fits perfectly into this topic: my parity drive just got disabled within Unraid because of errors. The only thing is, I do not know what kind of errors, and I am in a bit of a helpless situation here.
JorgeB Posted March 9, 2021
28 minutes ago, deltaexray said: "Only thing is, I do not know what kind of error's and I am in a little bit of a helpless situation here"
Please post the diagnostics: Tools -> Diagnostics
deltaexray Posted March 9, 2021
Some quick extra info: I updated to Unraid 6.9.0 this morning and everything worked as before, no issues at all. I kept watching a bunch of series through Plex over the course of the day, no issues there either. When I wanted to copy a picture to the server and then delete it, it told me that the "medium" was read/write-protected, which was the first unusual thing. So I went into the WebGUI and there was a big, red error message regarding the parity disk, which is now disabled. I rebooted the system shortly after that because I thought it would help, and that was a mistake: it did not. The message said something like: sector issue, cannot read something. It wouldn't surprise me if the drive is dead, but I hope not. Fun fact: the parity disk also will not spin down when the array spins down. So now I'm stuck with an unprotected array and a disabled parity disk. Anyway, diagnostics are attached.
Attachment: server-diagnostics-20210309-1955.zip
Edited March 9, 2021 by deltaexray: Forgot something
JorgeB Posted March 9, 2021
11 minutes ago, deltaexray said: "Anyway, Diagnostics are attached"
There are some known issues with v6.9 and the ST8000VN004 when connected to an LSI. If it's an option, you can connect them to the onboard SATA ports; otherwise it is best to go back to v6.8 for now.
deltaexray Posted March 9, 2021
Why am I not even surprised? Nope, the onboard ports don't work; that is why there is an HBA. How do I go back to 6.8?
JorgeB Posted March 9, 2021
1 minute ago, deltaexray said: "How do I go back to 6.8?"
If you used the GUI to upgrade, there's an option to go back.