October 21, 2024 Community Expert
I ran an error-correcting parity sync yesterday that resulted in ~750 errors. I am now running a non-error-correcting sync, and the error count is up to 16. I have not yet run a memtest, but I don't think RAM is the cause; I think it's probably bad SATA cables. Attached are the diagnostics I downloaded while running the test.
pterodactyl-diagnostics-20241021-1447.zip
Edited October 21, 2024 by marionza: Confusing title
October 22, 2024 Community Expert
You appear to be using ECC RAM, so in theory it should not be bad RAM, assuming ECC is actually working. It is also unlikely to be a cable problem; it could be a disk or controller issue.
October 22, 2024 Author Community Expert
Hi @JorgeB! Do the logs show which specific disk is getting errors? If so, I could try replacing the cables to that drive. Also, I noticed one of the disks in my z-pool is reporting reallocated sectors. I am backing up the z-pool to the array using a cron job. Could this lead to parity sync errors? I am open to any suggestions you have and can do a memtest in the meantime. The attached diags were downloaded after completing a non-correcting sync with 759 errors.
pterodactyl-diagnostics-20241022-1208.zip
Edited October 22, 2024 by marionza
October 23, 2024 Community Expert
12 hours ago, marionza said: "Do the logs show which specific disk is getting errors?"
Nope, and as mentioned, it's extremely unlikely to be a cable problem. It could be a disk problem.
12 hours ago, marionza said: "Could this lead to parity sync errors?"
It should not.
October 23, 2024 Community Expert Solution
13 hours ago, marionza said: "I am open to any suggestions you have and can do a memtest in the meantime."
I would retest with half the RAM. You need to run two parity checks, since the first correcting check can still find errors. If the second one still finds some, try the other sticks; if errors persist with those as well, that will basically rule out a RAM issue.
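Why two checks are needed can be illustrated with a toy model of single XOR parity. This is only a sketch under stated assumptions: the helper names (`compute_parity`, `parity_check`) and the tiny in-memory "disks" are hypothetical, and it does not reflect how the array software is actually implemented. On healthy hardware, a correcting pass repairs stale parity, so the second pass should report zero errors; errors on the second pass point at something still corrupting data in flight.

```python
from functools import reduce
from operator import xor


def compute_parity(disks):
    """XOR the blocks at each position across all data disks."""
    return [reduce(xor, blocks) for blocks in zip(*disks)]


def parity_check(disks, parity, correct):
    """Count positions where stored parity disagrees with the data.

    When `correct` is True, mismatched parity blocks are rewritten in place.
    """
    errors = 0
    for i, expected in enumerate(compute_parity(disks)):
        if parity[i] != expected:
            errors += 1
            if correct:
                parity[i] = expected
    return errors


# Three hypothetical data disks with four blocks each.
disks = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
parity = compute_parity(disks)

# Simulate two stale parity blocks left over from an earlier fault.
parity[1] ^= 0xFF
parity[3] ^= 0x01

first_pass = parity_check(disks, parity, correct=True)    # finds and fixes the stale blocks
second_pass = parity_check(disks, parity, correct=False)  # should be clean on healthy hardware
print(first_pass, second_pass)
```

In this model the hardware is healthy, so only the first pass reports errors. A fault that corrupts reads on every pass, such as bad RAM, would keep producing mismatches on the second pass too.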
October 23, 2024 Author Community Expert
Got it. I currently have 4x16GB sticks of RAM. If I understand you correctly, I will do the following:
1. Remove two RAM sticks.
2. Run an error-correcting sync, which will result in errors.
3. Run another check. If no errors, the culprit is the removed RAM. If errors...
4. Swap the two RAM sticks.
5. Run an error-correcting sync, which will result in errors.
6. Run another check. If no errors, the culprit is the other two RAM sticks.
All of my array disks are presenting as healthy. How do I go about diagnosing a disk or controller problem? Or should we cross that bridge when we get there?
October 23, 2024 Community Expert
Correct. If the problem is a disk, it's a pain to detect. This is rare, but it has happened before; basically, you need to retest after removing one disk at a time.
October 25, 2024 Author Community Expert
Good news! I am getting 0 errors after removing half of the RAM sticks. Should I consider these unusable?
October 26, 2024 Community Expert
You can retest with the removed half to confirm whether it's really bad RAM or a board issue that shows up under more memory load.
October 27, 2024 Author Community Expert
So to test, put the RAM back in and run another parity sync?
October 27, 2024 Community Expert
Put the two removed sticks back and take out the two that tested good, then run the check again. If errors pop back up, it's the memory or the channel (motherboard).
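The isolation logic discussed in this thread can be summarized as a small decision function. This is a hedged sketch, not an official procedure: the function name `diagnose` and its return strings are hypothetical, and each input is assumed to be the error count from the second (confirming) parity check with only that half of the RAM installed.

```python
def diagnose(errors_half_a, errors_half_b):
    """Map confirming-check error counts for each RAM half to a likely suspect.

    errors_half_a: errors on the second check with only half A installed.
    errors_half_b: errors on the second check with only half B installed.
    """
    a_bad = errors_half_a > 0
    b_bad = errors_half_b > 0

    if a_bad and b_bad:
        # Errors regardless of which sticks are installed: RAM is basically
        # ruled out, so retest after removing one disk at a time.
        return "RAM unlikely; suspect a disk or controller"
    if a_bad:
        return "half A: suspect those sticks or their channel (motherboard)"
    if b_bad:
        return "half B: suspect those sticks or their channel (motherboard)"
    # Neither half fails alone, yet the full set did: possibly a board
    # issue that only shows up under more memory load.
    return "no single half fails; suspect a board issue under full memory load"


# The situation in this thread so far: half A gave 0 errors, half B untested.
# If half B then shows errors when installed alone:
print(diagnose(0, 12))
```

The counts passed in the usage line are illustrative only; the thread does not report a result for the second half.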