October 9, 2019 Update: After the parity rebuild, the old parity drive failed. After replacing and rebuilding the second drive, there are 0 errors. I have a small server with 3x3TB data drives, 1x3TB parity, and 2x240GB SSD cache. It has run fine for the past couple of years, but last week one of the data drives failed. It showed SMART failures, and UnRaid kicked it out of the array and started emulating it. The only spare I had was a 4TB drive, so I had to replace the parity drive. It passed preclear, so I set it as parity, moved the old parity drive to the array, and let UnRaid start copying the parity and rebuilding the data drive. Everything went smoothly, and the parity check after the rebuild returned 0 errors. This morning I got a notification that a parity check had started; it wasn't scheduled and I don't know what triggered it. I let it run its course and it finished with 183141001 errors. This is the first time in the two years the server has been running that I have had a parity check with any errors, and I am at a loss for what happened or what I should do next. As of right now everything on the server seems to be working normally. Thank you for any insight or advice. tower-diagnostics-20191009-2213.zip Edited October 19, 2019 by mraneri
October 10, 2019 Community Expert Your server restarted this morning. Here is the first line in the syslog: Oct 9 02:28:12 Tower kernel: microcode: microcode updated early to revision 0x27, date = 2019-02-26 You are also getting segfaults near the end of the syslog. I believe these are usually memory related. You might want to run memtest (from the boot menu) unless you have ECC memory. I would also double-check that you didn't unlock any of the memory sticks when you were doing the drive changes.
October 10, 2019 Community Expert 7 hours ago, mraneri said: "it wasn't scheduled and I don't know what triggered it." It was triggered by an unclean shutdown: Oct 9 02:29:05 Tower emhttpd: unclean shutdown detected I agree with Frank1940 that you should run memtest.
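For anyone who wants to look for the same clues in their own diagnostics, here is a minimal sketch that scans syslog lines for the two things mentioned above: the unclean shutdown message and segfaults. The search patterns are assumptions based only on the lines quoted in this thread, and `scan_syslog` is a hypothetical helper, not part of UnRaid.

```python
import re

# Assumed patterns, taken from the messages quoted in this thread.
PATTERNS = {
    "unclean_shutdown": re.compile(r"unclean shutdown detected"),
    "segfault": re.compile(r"segfault", re.IGNORECASE),
}


def scan_syslog(lines):
    """Return a dict mapping each pattern name to the log lines that match it."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # The two syslog lines quoted earlier in the thread.
    sample = [
        "Oct 9 02:28:12 Tower kernel: microcode: microcode updated early "
        "to revision 0x27, date = 2019-02-26",
        "Oct 9 02:29:05 Tower emhttpd: unclean shutdown detected",
    ]
    for name, found in scan_syslog(sample).items():
        print(name, len(found))
```

To run it against a real log, pass an open file instead of the sample list, for example the syslog text file extracted from the diagnostics zip.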
October 10, 2019 Author Not ECC memory. I'm running Memtest now and will report back. Thank you both.
October 10, 2019 Author Quote: "Not ECC memory. I'm running Memtest now and will report back. Thank you both." Almost seven hours, four passes, zero errors. Edited October 10, 2019 by mraneri
October 11, 2019 Community Expert With so many sync errors, most likely something happened during the disk replacement, but it's difficult to guess what without the logs covering that. Still, run memtest for 24 hours, and if there are no errors, run another parity check to see if you get the same number of errors.
October 12, 2019 Author A little over 24h of memtest came back with zero errors. I re-ran the parity check and it came back with the same results. Aside from the sync errors, everything looks and functions normally. tower-diagnostics-20191012-1120.zip
October 12, 2019 Community Expert Likely the problem was during the replacement, but without the diags the best option now is probably to run a correcting check. There could still be corruption on the rebuilt disk, so if you have checksums of your files, run a check.
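To illustrate why a correcting check can clear the sync errors without repairing a bad rebuild, here is a toy sketch of single parity as a byte-wise XOR across the data disks. It is a simplified model, not UnRaid's implementation: the disks are tiny byte strings, errors are counted per byte rather than per sector, and `compute_parity` and `parity_check` are hypothetical names.

```python
from functools import reduce


def compute_parity(data_disks):
    """XOR the bytes at each position across all data disks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_disks))


def parity_check(data_disks, parity, correct=False):
    """Count positions where stored parity disagrees with the data.

    Returns (error_count, parity). With correct=True the returned parity is
    recomputed from the data; the data disks themselves are never modified.
    """
    expected = compute_parity(data_disks)
    errors = sum(1 for e, p in zip(expected, parity) if e != p)
    return errors, (expected if correct else parity)


if __name__ == "__main__":
    disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = compute_parity(disks)

    # Simulate a rebuilt disk that came back with one wrong byte.
    rebuilt = [disks[0], b"\x10\x21", disks[2]]

    errors, parity = parity_check(rebuilt, parity, correct=True)
    print("errors found:", errors)
    print("errors after correcting:", parity_check(rebuilt, parity)[0])
    print("rebuilt disk still differs:", rebuilt[1] != disks[1])
```

In this model the correcting pass makes parity agree with whatever is on the data disks, so the next check is clean even though the wrong byte is still there. That is why file checksums are the only way to confirm the rebuilt data itself.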
October 12, 2019 Author 28 minutes ago, johnnie.black said: "Likely the problem was during the replacement, but without the diags the best option now is probably to run a correcting check. There could still be corruption on the rebuilt disk, so if you have checksums of your files, run a check." That was with "write corrections" checked. I just learned about making checksums for files while doing research for this and installed the file integrity plugin. Is there anything else I can do from here?
October 12, 2019 Community Expert 11 hours ago, mraneri said: "Is there anything else I can do from here?" Without previous diags or checksums I can't think of anything else, unless you want to check the files on the rebuilt disk one by one.
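For future rebuilds, a checksum manifest made while the data is known to be good gives you something to compare against. The sketch below is a generic example using SHA-256 from the Python standard library; it is not how the file integrity plugin works, and the function names are hypothetical. A manifest created after a suspect rebuild can only detect later changes, not corruption that is already on the disk.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List files from the manifest that are now missing or hash differently."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

A typical use would be to call `build_manifest` on a disk's mount point, save the result somewhere off that disk (for example as JSON), and call `changed_files` with the saved manifest after any rebuild.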