January 18, 2021
I have unRAID 6.8.3 set up to do a parity check monthly. Every month, it finds 5 errors. It seems likely that the errors are from the parity drive itself. Are there any other options to correct this short of replacing the parity drive? I tried to search this forum for suggestions, but didn't spot anything that matched my situation. If I missed an applicable post, please reply with the link and I'll go from there. Thanks in advance!
January 19, 2021 Community Expert
Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
January 24, 2021 Author
Diagnostics ZIP file attached, as requested.
sagetv-diagnostics-20210124-0813.zip
January 24, 2021 Community Expert
Looks like you rebooted since the last parity check, so there is nothing to see in the syslog.
On 1/18/2021 at 6:42 PM, KeithAbbott said: "parity check monthly. Every month, it finds 5 errors"
If your parity checks have been noncorrecting, then they are finding the same parity errors because they haven't been corrected. You need to run a correcting parity check so the parity errors get corrected. Then you need to run a noncorrecting parity check to verify that there are zero sync errors. Exactly zero is the only acceptable result, and if you don't get that, you haven't finished diagnosing your problem.
January 24, 2021 Author
Thanks for the quick response. My monthly parity check has had "Write corrections to parity disk" set to "Yes" ever since I first set it up. I will rerun the parity check tonight and repost the diagnostics ZIP file; maybe that will give a better clue as to what is happening. My parity check takes 19 hours to run, so I probably will not have anything to post until later tomorrow.
January 25, 2021 Community Expert
You're using a SAS2LP. Those are known to in some cases generate the same 5 sync errors corrupting data. IIRC it only happens in the first check after a reboot. You should replace it with an LSI.
January 26, 2021 Author
New diagnostics ZIP file attached. This one was created after a parity check completed, but before any system reboot occurred. Thanks for the info about the SAS2LP; it looks like I will be shopping for an LSI controller card. I would still like someone to take a look at my diagnostics file and confirm whether that was the root cause or not.
sagetv-diagnostics-20210125-1946.zip
January 26, 2021 Community Expert
Jan 24 22:28:25 SageTV kernel: mdcmd (62): check
Jan 24 22:28:25 SageTV kernel: md: recovery thread: check P ...
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004440
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004448
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004456
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004464
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004472
Now to see if the next one has the same sync errors.
January 26, 2021 Author
Well, I ran another parity check overnight, and wouldn't you know, this one came back with zero errors. First time in probably over a year. My next scheduled parity check is a week from today; I'll see how that one goes and post the results. In the meantime, I'm looking at replacing my SAS2LP with an LSI 9207-8i. Good choice? Anything I should be aware of? Anything I need to do besides unplugging the SAS2LP and plugging the 9207-8i in?
January 27, 2021 Community Expert
8 hours ago, KeithAbbott said: "I ran another parity check overnight, and wouldn't you know this one came back with zero errors."
Did you reboot before?
On 1/25/2021 at 9:11 AM, JorgeB said: "IIRC it only happens in the first check after a reboot"
January 27, 2021 Author
No, I had not. I will reboot and rerun the parity check, and post the results.
February 3, 2021 Author
I've rerun the parity check after rebooting, and attached the resulting diagnostics ZIP file. Here's the relevant snippet from the syslog:
Feb 2 01:30:01 SageTV kernel: mdcmd (42): check
Feb 2 01:30:01 SageTV kernel: md: recovery thread: check P ...
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004440
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004448
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004456
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004464
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004472
As you can see, the corrected sectors match the last parity check exactly. Hopefully, this gives a clue as to the root cause of the problem and suggestions on where to go from here. If the drive(s) are good, and this is strictly an issue with the SAS2LP controller card, I can replace the card with an LSI 9207-8i and resolve the issue. However, if I have issues with the parity drive that require replacing the drive, that will be a more costly remedy. Suggestions?
sagetv-diagnostics-20210202-2040.zip
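For anyone who wants to check "same sectors every time" without eyeballing the logs, here is a minimal Python sketch. It assumes only the log line format visible in the excerpts quoted in this thread; the function name `sync_error_sectors` and the two variable names are made up for illustration, and other Unraid versions may log these lines differently.

```python
import re

# Pattern for the sync-error lines quoted in this thread, e.g.
#   "md: recovery thread: P corrected, sector=2353004440"
# A noncorrecting check logs "P incorrect" instead of "P corrected".
SECTOR_RE = re.compile(r"md: recovery thread: P (?:corrected|incorrect), sector=(\d+)")


def sync_error_sectors(syslog_text):
    """Return the sorted, de-duplicated parity sectors named in a syslog excerpt."""
    return sorted({int(s) for s in SECTOR_RE.findall(syslog_text)})


# Excerpts copied from the Jan 25 and Feb 2 checks posted above.
JAN_25 = """\
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004440
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004448
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004456
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004464
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004472
"""

FEB_2 = """\
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004440
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004448
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004456
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004464
Feb 2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004472
"""

jan = sync_error_sectors(JAN_25)
feb = sync_error_sectors(FEB_2)

print(jan == feb)                             # True: both checks hit the same sectors
print([b - a for a, b in zip(jan, jan[1:])])  # [8, 8, 8, 8]: evenly spaced
```

The even spacing of 8 would correspond to consecutive 4 KiB blocks if the driver counts 512-byte sectors, but that unit is an assumption here, not something stated in the thread.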
February 3, 2021 Community Expert
7 hours ago, KeithAbbott said: "As you can see, the corrected sectors match the last parity check exactly."
That is the SAS2LP known issue, so it should be fixed once you replace it.
March 7, 2021 Author
I have replaced my SAS2LP with an LSI 9207-8i. I reran the parity check, this time without the setting to write corrections to the parity disk. It is still finding the same five errors as before. If I rerun the parity check again (except this time writing corrections to the parity disk), will it make the corrections and then behave properly (no errors) thereafter?
Mar 6 23:06:21 SageTV kernel: mdcmd (42): check nocorrect
Mar 6 23:06:21 SageTV kernel: md: recovery thread: check P ...
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004440
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004448
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004456
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004464
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004472
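The syslog excerpts in this thread also show which kind of check ran: the correcting runs log `mdcmd (...): check` followed by `P corrected` lines, while this one logs `check nocorrect` followed by `P incorrect` lines. A rough Python sketch that reads both facts out of an excerpt is below. The function name `summarize_check` is hypothetical, and the patterns are based only on the lines quoted here, so treat it as an illustration and not a complete syslog parser.

```python
import re

# The excerpts in this thread show the command that started each check:
#   "mdcmd (62): check"            -> followed by "P corrected" lines
#   "mdcmd (42): check nocorrect"  -> followed by "P incorrect" lines
CHECK_RE = re.compile(r"mdcmd \(\d+\): check( nocorrect)?$")
RESULT_RE = re.compile(r"md: recovery thread: P (corrected|incorrect), sector=\d+")


def summarize_check(syslog_text):
    """Return (mode, counts) for one parity-check excerpt.

    mode is "correcting", "noncorrecting", or None if no mdcmd line is found.
    counts maps "corrected" / "incorrect" to the number of matching lines.
    """
    mode = None
    counts = {"corrected": 0, "incorrect": 0}
    for line in syslog_text.splitlines():
        line = line.strip()
        started = CHECK_RE.search(line)
        if started:
            mode = "noncorrecting" if started.group(1) else "correcting"
        result = RESULT_RE.search(line)
        if result:
            counts[result.group(1)] += 1
    return mode, counts


# Excerpt copied from the Mar 6-7 check posted above.
MAR_7 = """\
Mar 6 23:06:21 SageTV kernel: mdcmd (42): check nocorrect
Mar 6 23:06:21 SageTV kernel: md: recovery thread: check P ...
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004440
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004448
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004456
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004464
Mar 7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004472
"""

mode, counts = summarize_check(MAR_7)
print(mode, counts)  # noncorrecting {'corrected': 0, 'incorrect': 5}
```

A noncorrecting result with a nonzero "incorrect" count means the parity errors were reported but not written, which matches the earlier advice in this thread: run a correcting check, then a noncorrecting one, and accept only exactly zero sync errors.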
March 9, 2021 Author
OK, thanks. I've rerun the parity check once again, this time with the setting to write corrections to the parity disk. My next scheduled parity check is in early April, so I'll know then whether the new controller has solved the problem or not.
With the new LSI 9207-8i controller comes a new question. I think the firmware version is 18.xx.xx.xx, so I am thinking that I should upgrade the firmware to 20.00.07.00. I am running it in IT mode, and from researching on this forum, it looks like I should probably remove/erase the BIOS, which would speed up the boot process a bit. In summary: upgrade the firmware and remove the BIOS. Any cautions against doing that? Any guidance either for or against?
March 9, 2021 Community Expert
7 hours ago, KeithAbbott said: "In summary, upgrade the firmware and remove the BIOS. Any cautions against doing that?"
Upgrading the firmware is very easy and can be done with Unraid. Unfortunately, the erase BIOS command doesn't work with the Linux tool, but it's still easy to do if you boot with a DOS flash drive, or a UEFI shell if the board has UEFI.
March 9, 2021 Author
Thanks much for the reply; looks like I found my weekend project. I do have a spare Windows 10 workstation that I will use instead of my server. That way, I'm not messing with my "production" box, and I can then erase the BIOS also. Thanks for providing the link, that gave me exactly what I need.
Oh, one other question. Is it best practice to run the monthly parity check with the setting to not write corrections to the parity disk?
March 9, 2021 Community Expert
16 minutes ago, KeithAbbott said: "Is it best practice to run the monthly parity check with the setting to not write corrections to the parity disk?"
Correct.
March 9, 2021
I have an issue that I think fits perfectly into this topic: my parity drive just got disabled within Unraid because of errors. Only thing is, I do not know what kind of errors, and I am in a little bit of a helpless situation here.
March 9, 2021 Community Expert
28 minutes ago, deltaexray said: "Only thing is, I do not know what kind of errors and I am in a little bit of a helpless situation here"
Please post the diagnostics: Tools -> Diagnostics
March 9, 2021
As a quick note on top: I did the update to Unraid 6.9.0 this morning, and everything worked as before, no issues at all. I kept watching a bunch of series through Plex over the course of the day, with no issues there either. When I wanted to copy a picture to the server and then delete it, it told me that the "medium" was read/write-protected, which was the first thing that was unusual. So I went into the WebGUI, and there was a big, red error message regarding the parity disk, which is now disabled.
I then rebooted the system shortly after that, which was a mistake, because I thought it would help. It did not. It said something like: sector issue, cannot read something. It wouldn't surprise me if the drive is dead, but I hope not.
Fun fact: the parity disk will also not spin down if the array spins down.
So now I'm stuck with an unprotected array and a disabled parity disk. Anyway, diagnostics are attached.
server-diagnostics-20210309-1955.zip
March 9, 2021 Community Expert
11 minutes ago, deltaexray said: "Anyway, Diagnostics are attached"
There are some known issues with v6.9 and the ST8000VN004 when connected to an LSI. If it's an option, you can connect them to the onboard SATA ports; otherwise, it's best to go back to v6.8 for now.
March 9, 2021
Why am I not even surprised? Nope, the onboard ports don't work; that is why there is an HBA. How do I go back to 6.8?
March 9, 2021 Community Expert
1 minute ago, deltaexray said: "How do I go back to 6.8?"
If you used the GUI to upgrade, there's an option to go back.
This topic is now archived and is closed to further replies.