TheRefugee, posted August 2, 2017:

I often get exactly 5 parity errors during my parity checks, and it is past time to diagnose the problem. I have had my share of unclean power shutdowns, which I have since mitigated, but I want to be prepared for the next time it happens. Are there any steps I should take before my next parity check that will help me narrow down the cause? I will include the diagnostics zip file, of course, but are there any specific logging settings I should enable to catch this particular problem?

Note: my system log shows recurring power failure warnings. I have a portable room A/C that causes my UPS to kick in for 2 seconds before returning to normal operation. I am guessing there is a voltage drop, but the lights in my room don't dim and all other appliances operate normally. Hopefully this isn't a problem and the UPS is just doing its job protecting my NAS.

Attachment: syslog.txt
JorgeB, posted August 2, 2017:

Known issue with the SAS2LP; more info in the threads below:
Squid, posted August 2, 2017:

TheRefugee said: "I often get exactly 5 parity errors during my parity checks and it is past time to diagnose the problem."

Can you post your diagnostics? If I'm right about why the 5 errors occur, the diagnostics will confirm it (or force me to rethink my theory).

Also, can you confirm or deny the following:
1. Between March 3 and April 10, you reset the server (or shut it down in any way).
2. Between April 10 and May 5, you did not reset the server.
3. Between May 5 and May 22, you reset the server.
4. You reset the server sometime on May 22/23, after the parity check on the 22nd but before the parity check on the 23rd.
5. After May 23, you reset the server before June 25.
6. On June 25 you reset the server and ran another parity check on the 26th.
7. Sometime between June 26 and August 2, the server was reset.
TheRefugee, posted August 2, 2017:

(Replying to Squid's questions above.)

My server isn't up all the time, so it almost certainly was rebooted between those dates. It's rare for my server to stay up for more than a month at a time; the longest I have gone is 45 days, iirc.

(Edited August 2, 2017 by TheRefugee)
Squid, posted August 2, 2017:

If at all possible, move all the 8TB drives (and the Samsung 850) off the SAS2LP and onto the motherboard ports. If you only have enough ports for the 8TBs, keep those off the SAS2LP and put the Samsung 850 on the SAS2LP instead. Then reboot and run another parity check. If my theory holds, you shouldn't get the 5 parity check errors. If you do, post another diagnostics, as I need to compare the affected sectors. In the last check they were:

Aug 1 10:47:02 Tower kernel: md: recovery thread: PQ corrected, sector=2743151176
Aug 1 10:47:02 Tower kernel: md: recovery thread: PQ corrected, sector=2743151184
Aug 1 10:47:02 Tower kernel: md: recovery thread: PQ corrected, sector=2743151192
Aug 1 10:47:02 Tower kernel: md: recovery thread: PQ corrected, sector=2743151200
Aug 1 10:47:02 Tower kernel: md: recovery thread: PQ corrected, sector=2743151208

(Edited August 2, 2017 by Squid)
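To compare the affected sectors between two checks, the md lines can be pulled out of each syslog and compared as sets. This is a minimal sketch with hypothetical helper names (nothing here is part of unRAID). It assumes the log lines keep the "md: recovery thread: PQ corrected, sector=N" shape quoted above and that the sector numbers are 512-byte units. Under that assumption, the five sectors above are 8 apart, i.e. consecutive 4 KiB chunks starting at an offset of about 1.4 TB (2743151176 x 512 bytes).

```python
import re

# Hypothetical pattern for the md driver lines quoted above, e.g.
# "Aug 1 10:47:02 Tower kernel: md: recovery thread: PQ corrected, sector=2743151176".
# \w+ accepts "PQ" (dual parity); other labels are an assumption.
SECTOR_RE = re.compile(r"md: recovery thread: \w+ corrected, sector=(\d+)")


def corrected_sectors(log_text):
    """Return the corrected sector numbers in the order they were logged."""
    return [int(m.group(1)) for m in SECTOR_RE.finditer(log_text)]


def describe(sectors, sector_bytes=512):
    """Summarize one check; sector_bytes=512 is an assumption about md's units."""
    strides = [b - a for a, b in zip(sectors, sectors[1:])]
    return {
        "count": len(sectors),
        "strides": sorted(set(strides)),
        "first_offset_bytes": sectors[0] * sector_bytes if sectors else None,
    }


def same_sectors(previous_log, current_log):
    """True if two checks corrected exactly the same set of sectors."""
    return set(corrected_sectors(previous_log)) == set(corrected_sectors(current_log))
```

Reading each check's syslog into a string and calling same_sectors(old_log, new_log) answers the "same sectors again?" question directly; describe() on the August 1 lines reports 5 sectors with a single stride of 8.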
TheRefugee, posted August 2, 2017:

(Replying to Squid's post above.)

Currently, 6 data drives (4 x 4TB and 2 x 8TB) and 2 parity drives (2 x 8TB) are on the SAS2LP, and the Samsung SSD is the only drive connected to my motherboard. I have 6 motherboard ports I can connect drives to, so moving the 2 parity 8TB drives, the 2 data 8TB drives and the SSD to the motherboard will work. Is this the preferred option? Do I need to write down which cables are connected to which drives beforehand? Does changing which drive is connected to which port interfere with my array being recognized, or is it irrelevant?
Squid, posted August 2, 2017:

TheRefugee said: "I have 6 total motherboard ports I can connect drives to so moving the 2 parity 8TB drives, 2 data 8TB drives and the SSD to the motherboard will work. Is this the preferred option?"

That would be my choice.

TheRefugee said: "Does switching which drives are connected to which port interfere with my array being recognized or is it irrelevant?"

All irrelevant. unRaid keeps track of drive assignments by serial number, so you shouldn't notice any change at all. But it is always prudent to make a note of the original cabling just in case.
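One way to make that note of the original cabling from the console: on Linux, udev's /dev/disk/by-id links name each drive by model and serial, and the /dev/disk/by-path links encode the controller and port, so grouping both by the device node they point at records which drive sits where. This is a minimal sketch, assuming unRAID exposes the standard udev layout (an assumption); the function names are hypothetical.

```python
import json
import os
from pathlib import Path


def resolve_links(directory):
    """Return {link name: device node} for whole-disk symlinks in `directory`."""
    links = {}
    d = Path(directory)
    if not d.is_dir():
        return links  # e.g. not running on Linux, or the directory is missing
    for entry in sorted(d.iterdir()):
        if "-part" in entry.name:
            continue  # skip partition links such as "...-part1"
        if entry.is_symlink():
            # udev links look like "../../sdb"; keep only the node name
            links[entry.name] = os.path.basename(os.readlink(entry))
    return links


def cabling_snapshot(by_id="/dev/disk/by-id", by_path="/dev/disk/by-path"):
    """Group identity links (model/serial) and location links
    (controller/port) that point at the same device node."""
    snapshot = {}
    for kind, directory in (("ids", by_id), ("paths", by_path)):
        for name, dev in resolve_links(directory).items():
            entry = snapshot.setdefault(dev, {"ids": [], "paths": []})
            entry[kind].append(name)
    return snapshot


if __name__ == "__main__":
    # Save this output somewhere off the array before recabling.
    print(json.dumps(cabling_snapshot(), indent=2, sort_keys=True))
```

Running it once before and once after moving the drives gives two snapshots to diff; the serials should be unchanged while the by-path entries show the new ports.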
TheRefugee, posted August 4, 2017:

(Replying to Squid's post above.)

I will be able to shut down and move my 8TB disks onto the motherboard tonight, and I will start the parity check tonight as well. To clarify the order of operations:
1. Shut down.
2. Move the 8TB disks and the SSD onto the motherboard, leaving the 4TB disks on the SAS2LP.
3. Power on and start the array.
4. Run a parity check.

Does that look correct? Is any other restart necessary once the array starts up with the disks moved? I just want to clarify, because I wasn't sure whether you were assuming I can hot-swap and then restart.

Edit: My server has been up since the last parity check. Powering down to move the disks will be the first time the server has been offline since the last parity check with 5 errors.

(Edited August 4, 2017 by TheRefugee)