DieFalse, posted August 16, 2021 (edited August 20, 2021 by fmp4m: adjusted for the problem addressed):
To keep the thread clean and on point, I have narrowed it to the single issue it helped resolve: corruption on the cache pool.
I recently resolved an MCE event by replacing hardware. Everything now tests perfectly and I receive no errors; it is likely unrelated, but I wanted to mention it.
For the last three mornings I have woken to find my server frozen, offline, and completely unreachable. Even the iDRAC console yields no response, and I have to warm-cycle the server to bring it back online. Any insight would be helpful. Attached are the diagnostics pulled after it came back up this morning.
Attachment: gsa-diagnostics-20210816-0916.zip
JorgeB, posted August 16, 2021:
Enable the syslog mirror to flash, then post that log after a crash.
DieFalse, posted August 18, 2021:
Here is the syslog. I was able to SSH in this time, but the shutdown script failed. It appears I have some sort of corruption occurring, so I definitely need your advice.
Attachment: syslog
JorgeB, posted August 18, 2021:
See here for the checksum errors; you should run a scrub and monitor the pool going forward. The main issue, though, appears to be the constant call traces. I can't see what's causing them; it looks more hardware related (or your hardware doesn't like that kernel). You can try upgrading to v6.10 to see if the newer kernel helps; if the behavior is the same, it's likely hardware.
DieFalse, posted August 18, 2021:
Ok. I set up the script in User Scripts and scheduled it. I have also started a scrub now that the parity check has finished (the reboot triggered it, and it found zero errors). I will report back when the scrub finishes, and then try the beta to see if that sorts out the call traces.
DieFalse, posted August 19, 2021:
Ok, the scrub found two uncorrectable errors. Any help on how to find out what they are and resolve them?
JorgeB, posted August 19, 2021:
Quoting fmp4m (5 hours earlier): "Ok, two uncorrectables"
Check the syslog for the name of the file(s), then delete them or restore them from backup.
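A minimal sketch of how the affected file names could be pulled out of a saved syslog. It assumes the relevant kernel messages contain the text "BTRFS" and end with a "(path: ...)" suffix, which is how checksum errors found during a scrub are commonly reported; the exact message format can differ between kernel versions, so treat the pattern as an assumption. The function name corrupted_paths is made up for this example.

```python
import re

# Assumption: the BTRFS error lines of interest end with "(path: <file>)".
PATH_RE = re.compile(r"\(path: (.+)\)\s*$")


def corrupted_paths(lines):
    """Return the unique file paths named in BTRFS log lines, in first-seen order."""
    seen = []
    for line in lines:
        # Skip anything that is not a BTRFS kernel message.
        if "BTRFS" not in line:
            continue
        match = PATH_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

To use it, open the saved syslog and pass the file object (or a list of lines) to corrupted_paths. The reported paths are normally relative to the filesystem or subvolume root, not absolute, so they need to be joined with the pool's mount point before deleting or restoring anything.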
DieFalse, posted August 19, 2021:
Quoting JorgeB (7 hours earlier): "Check syslog for name of the file(s), delete them or restore from backup."
Excellent, I will do that. I loaded RC1 for 6.10 and am having an issue accessing the WebUI (it never loads), so I will have to sort that out first. That will need another thread.
DieFalse, posted August 20, 2021:
Quoting fmp4m (8/19/2021 at 9:32 AM): "Excellent I will do that. Loaded the RC1 for 6.10 and am having an issue accessing the WebUI so I will have to sort that first. (never loads) will need another thread."
Ok, I deleted the files and re-ran the scrub, and it comes back with:
    Scrub started: Thu Aug 19 12:34:41 2021
    Status: finished
    Duration: 6:06:46
    Total to scrub: 34.36TiB
    Rate: 1.60GiB/s
    Error summary: no errors found
BUT the script you linked to returns this:
    [/dev/sdb1].write_io_errs 0
    [/dev/sdb1].read_io_errs 0
    [/dev/sdb1].flush_io_errs 0
    [/dev/sdb1].corruption_errs 1137
    [/dev/sdb1].generation_errs 0
So I still have corruption that neither the scrub nor the log is showing, I think... advice?
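The counter listing above is easy to check programmatically. The following is a rough sketch, not the script from the linked FAQ, that parses text in the "[device].counter value" layout shown in the post and reports only the counters that are non-zero. The function names parse_dev_stats and nonzero_counters are invented for this example.

```python
import re

# One line per counter, in the layout shown in the post:
#   [/dev/sdb1].corruption_errs 1137
STAT_RE = re.compile(r"^\[(?P<device>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text):
    """Parse device stats text into {device: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            device_stats = stats.setdefault(match["device"], {})
            device_stats[match["counter"]] = int(match["value"])
    return stats


def nonzero_counters(stats):
    """Keep only the devices and counters whose value is not zero."""
    result = {}
    for device, counters in stats.items():
        bad = {name: value for name, value in counters.items() if value}
        if bad:
            result[device] = bad
    return result


sample = """\
[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 1137
[/dev/sdb1].generation_errs 0
"""

print(nonzero_counters(parse_dev_stats(sample)))
# prints {'/dev/sdb1': {'corruption_errs': 1137}}
```

Note that a non-zero value here only says that errors were counted at some point; it does not by itself say whether they are still present on disk.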
JorgeB, posted August 20, 2021:
Quoting fmp4m (15 minutes earlier): "BUT the script you linked to returns this:"
You have to reset the errors; the linked FAQ entry explains how.
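A hedged sketch of what the reset might look like when scripted. It assumes the reset in question is the "-z" (reset) option of "btrfs device stats", and that the cache pool is mounted at /mnt/cache; neither detail is stated in the thread itself, so check the FAQ entry and your own mount point first. The helper names are hypothetical, and the function defaults to a dry run that only returns the command it would execute.

```python
import subprocess


def reset_stats_command(mount_point):
    """Build the argv for printing and then zeroing the btrfs error counters."""
    # "-z" tells btrfs to reset the counters after printing them.
    return ["btrfs", "device", "stats", "-z", mount_point]


def reset_stats(mount_point, dry_run=True):
    """Return the command as text (dry run) or run it and return its output."""
    command = reset_stats_command(mount_point)
    if dry_run:
        return " ".join(command)
    # Needs root privileges and a mounted btrfs filesystem at mount_point.
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


# Assumption: /mnt/cache is the pool's mount point.
print(reset_stats("/mnt/cache"))
```

Because the counters accumulate until they are reset, zeroing them after the corrupt files have been dealt with and a clean scrub has completed means any later non-zero value points to a new problem.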
DieFalse, posted August 20, 2021:
Quoting JorgeB (2 minutes earlier): "You have to reset the errors, it explains how in the linked FAQ entry."
Makes sense. Sorry, I didn't want to reset until I knew it was the right thing to do, and I missed the "lifetime" note. Thanks again. The call traces apparently relate to the nvidia modules, so I opened a separate thread for that.