
Corruption on Cache Pool [SOLVED]



To keep the thread clean and on point, I have narrowed it to the single issue it helped resolve: corruption on the cache pool.

 

I recently resolved an MCE event by replacing hardware. Everything now tests fine and I receive no errors. It is likely unrelated, but I wanted to mention it.

 

For the last three mornings I have woken up to find my server frozen and completely unreachable. Even the iDRAC console gives no response, and I have to warm-cycle the server to bring it back online.

 

Any insight would be helpful. Attached are the diagnostics pulled when it came back up this morning.

gsa-diagnostics-20210816-0916.zip

Edited by fmp4m
Adjust for problem addressed

See here for the checksum errors. You should run a scrub and monitor the pool going forward, but the main issue appears to be the constant call traces. I can't see what's causing them; it looks more hardware related (or your hardware doesn't like that kernel). You can try upgrading to v6.10 to see if the newer kernel helps. If the traces persist, it's likely hardware.


Ok. I set up the script in User Scripts and scheduled it. I have also started a scrub now that the parity check has finished (the reboot triggered it, and it found zero errors). I will report back when the scrub finishes, and then I will try the beta to see if that sorts out the call traces.

On 8/19/2021 at 9:32 AM, fmp4m said:

 

Excellent, I will do that. I loaded RC1 for 6.10 and am having an issue accessing the WebUI (it never loads), so I will have to sort that out first. That will need another thread.

 

Ok, I deleted the files and re-ran the scrub, and it comes back with:

 

Scrub started: Thu Aug 19 12:34:41 2021

Status: finished

Duration: 6:06:46

Total to scrub: 34.36TiB

Rate: 1.60GiB/s

Error summary: no errors found

 

BUT the script you linked to returns this:

[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 1137
[/dev/sdb1].generation_errs 0

 

So I think I still have corruption that neither the scrub nor the log is showing... advice?

2 minutes ago, JorgeB said:

You have to reset the errors, it explains how in the linked FAQ entry.

 

Makes sense. Sorry, I didn't want to reset until I knew it was the right thing to do, and I missed the "lifetime" note. Thanks again. The call traces apparently relate to the Nvidia modules, so I opened a thread specific to that.
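
For anyone landing here later: because those counters are lifetime totals, a nonzero value only shows that errors happened at some point, not that they are still happening. Below is a minimal Python sketch, assuming the script's output keeps the `[device].counter value` format shown above, that parses two readings and reports only the counters that grew between them. The names `parse_device_stats` and `new_errors` are hypothetical helpers made up for illustration; they are not part of any btrfs or Unraid tooling.

```python
import re

# Matches lines such as "[/dev/sdb1].corruption_errs 1137"
STAT_LINE = re.compile(r"^\[(?P<device>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Return {device: {counter: value}} from device-stats style output."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            device = stats.setdefault(match["device"], {})
            device[match["counter"]] = int(match["value"])
    return stats


def new_errors(previous, current):
    """Return {(device, counter): delta} for counters that grew between readings."""
    grown = {}
    for device, counters in current.items():
        for counter, value in counters.items():
            # A counter missing from the earlier reading is treated as 0.
            delta = value - previous.get(device, {}).get(counter, 0)
            if delta > 0:
                grown[(device, counter)] = delta
    return grown


# The reading posted earlier in this thread.
SAMPLE = """\
[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 1137
[/dev/sdb1].generation_errs 0
"""

if __name__ == "__main__":
    baseline = parse_device_stats(SAMPLE)
    # Comparing a reading with itself reports nothing: the old errors are not new.
    print(new_errors(baseline, baseline))
```

Resetting the counters as the FAQ describes gets the same result more simply, since any nonzero value after a reset is by definition new. Comparing readings is only useful if you would rather keep the history.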

  • DieFalse changed the title to Corruption on Cache Pool [SOLVED]
