mraneri Posted October 9, 2019 (edited October 19, 2019 by mraneri)

Update: After the parity rebuild, the old parity drive failed. After replacing and rebuilding that second drive, there are 0 errors.

I have a small server with 3x3TB data drives, 1x3TB parity, and 2x240GB SSD cache. It has run fine for the past couple of years, but last week one of the data drives failed. It showed SMART failures, and UnRaid kicked it out of the array and started emulating it.

The only spare I had was a 4TB drive, so I had to replace the parity drive. It passed preclear, so I set it as parity, moved the old parity drive to the array, and let UnRaid start copying the parity and rebuilding the data drive. Everything went smoothly, and the parity check after the rebuild returned 0 errors.

This morning I got a notification that a parity check had started. It wasn't scheduled and I don't know what triggered it. I let it run its course and it finished with 183141001 errors. This is the first time in the two years the server has been running that I have had a parity check with any errors, and I am at a loss for what happened or what I should do next. As of right now, everything on the server seems to be working normally.

Thank you for any insight or advice.

tower-diagnostics-20191009-2213.zip
Frank1940 Posted October 10, 2019

Your server restarted this morning. Here is the first line in the syslog:

Oct 9 02:28:12 Tower kernel: microcode: microcode updated early to revision 0x27, date = 2019-02-26

You are also getting segfaults near the end of the syslog. I believe these are usually memory related. You might want to run memtest (from the boot menu) unless you have ECC memory. I would also double-check that you didn't unlock any of the memory sticks when you were doing the drive changes.
JorgeB Posted October 10, 2019

mraneri said: "it wasn't scheduled and I don't know what triggered it."

It was triggered by an unclean shutdown:

Oct 9 02:29:05 Tower emhttpd: unclean shutdown detected

I agree with Frank1940 that you should run memtest.
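The two findings above (an unclean shutdown and segfaults) can be pulled out of a syslog with a few lines of Python. This is only a rough sketch: the `unclean shutdown detected` pattern comes from the line quoted in this thread, while the `segfault at` pattern is an assumption based on the usual Linux kernel message, since the actual segfault lines were not quoted here. The function name `scan_syslog` is made up for illustration.

```python
import re

# Assumed patterns: the first matches the emhttpd line quoted in this thread;
# the second is the generic kernel segfault message, not a line from these logs.
PATTERNS = {
    "unclean_shutdown": re.compile(r"emhttpd: unclean shutdown detected"),
    "segfault": re.compile(r"segfault at"),
}

def scan_syslog(lines):
    """Return {label: [matching lines]} for every pattern in PATTERNS."""
    hits = {label: [] for label in PATTERNS}
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[label].append(line.rstrip("\n"))
    return hits

sample = [
    "Oct 9 02:28:12 Tower kernel: microcode: microcode updated early to revision 0x27, date = 2019-02-26",
    "Oct 9 02:29:05 Tower emhttpd: unclean shutdown detected",
]
for label, matched in scan_syslog(sample).items():
    print(label, len(matched))
```

To use it on real logs, pass in the lines of the syslog file extracted from the diagnostics zip, for example `scan_syslog(open(path, errors="replace"))`.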
mraneri (Author) Posted October 10, 2019

Not ECC memory. I'm running Memtest now and will report back. Thank you both.
mraneri (Author) Posted October 10, 2019 (edited October 10, 2019 by mraneri)

Quote: "Not ECC memory. I'm running Memtest now and will report back. Thank you both."

Almost seven hours, four passes, zero errors.
Frank1940 Posted October 10, 2019

Let it run for 24 hours...
JorgeB Posted October 11, 2019

With so many sync errors, most likely something happened during the disk replacement, but it is difficult to guess what without the logs covering that. Still, run memtest for 24 hours, and if there are no errors, run another parity check to see if you get the same number of errors.
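For anyone unsure what a sync error represents: with single parity, the parity drive holds the bitwise XOR of the data drives, and a parity check flags every position where the stored parity disagrees with the value recomputed from the data. Below is a toy sketch using short byte strings as stand-ins for drives. The function names are hypothetical, the per-byte counting is an assumption made for simplicity, and this is not Unraid's actual implementation.

```python
from functools import reduce

def compute_parity(disks):
    """Bytewise XOR across equal-length 'disks' (bytes objects)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))

def count_sync_errors(disks, parity):
    """Count byte positions where stored parity differs from recomputed parity."""
    return sum(1 for p, q in zip(compute_parity(disks), parity) if p != q)

def rebuild_disk(surviving_disks, parity):
    """Reconstruct one missing disk from the surviving disks plus parity."""
    return compute_parity(list(surviving_disks) + [parity])

disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]
parity = compute_parity(disks)
print(count_sync_errors(disks, parity))             # 0
print(rebuild_disk(disks[1:], parity) == disks[0])  # True
```

The same XOR is used in both directions, so if the parity is wrong at some position during a rebuild, the rebuilt disk ends up wrong at that same position.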
mraneri (Author) Posted October 12, 2019

A little over 24 hours of memtest came back with zero errors. I re-ran the parity check and it came back with the same results. Aside from the sync errors, everything looks and functions normally.

tower-diagnostics-20191012-1120.zip
JorgeB Posted October 12, 2019

Likely the problem was during the replacement, but without the diags the best option now is probably to run a correcting check. There could be corruption on the rebuilt disk, though, so if you have checksums of your files, run a check.
mraneri (Author) Posted October 12, 2019

johnnie.black said: "Likely the problem was during the replacement, but without the diags best option now is probably to run a correcting check, but there could be corruption on the rebuilt disk, if you have checksum of your files run a check."

That was with write corrections checked. I only learned about making checksums for files while doing research for this, and I have now installed the file integrity plugin. Is there anything else I can do from here?
JorgeB Posted October 12, 2019

mraneri said: "Is there anything else I can do from here?"

Without previous diags or checksums I can't think of anything else, unless you want to check the files on the rebuilt disk one by one.
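As a rough illustration of the checksum idea discussed above, here is a minimal sketch that records a SHA-256 hash for every file under a directory and later reports which files no longer match. This is not how the file integrity plugin works internally; the function names are hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
import os

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest

def changed_files(root, manifest):
    """Return relative paths that are missing or whose hash no longer matches."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

A manifest only helps if it was built while the files were known to be good. One created after the rebuild can catch future corruption, but it cannot tell whether the rebuilt disk is already damaged.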