Identifying which slot the bad RAM is in



Is there a command to identify which memory slot this error refers to, or to list all of the slots? I'm assuming it's likely A1 of the 8 slots.

I'm not sure whether the numbering starts at channel #1 or channel #0, so it might be A2 instead (the motherboard manual calls A1 channel A).
 

Jul 14 21:33:07 unRAID kernel: mce: [Hardware Error]: Machine check events logged
Jul 14 21:33:07 unRAID kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Jul 14 21:33:07 unRAID kernel: EDAC MC1: 1 CE memory scrubbing error on CPU_SrcID#0_Ha#1_Chan#1_DIMM#0 (channel:1 slot:0 page:0x109a826 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0008:00c0 socket:0 ha:1 channel_mask:2 rank:0)
Jul 15 04:21:17 unRAID kernel: mce: [Hardware Error]: Machine check events logged
Jul 15 04:21:17 unRAID kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Jul 15 04:21:17 unRAID kernel: EDAC MC1: 1 CE memory scrubbing error on CPU_SrcID#0_Ha#1_Chan#1_DIMM#0 (channel:1 slot:0 page:0x109a826 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0008:00c0 socket:0 ha:1 channel_mask:2 rank:0)
Jul 15 04:40:06 unRAID root: Fix Common Problems: Error: Machine Check Events detected on your server
Jul 15 11:31:07 unRAID kernel: mce: [Hardware Error]: Machine check events logged
Jul 15 11:31:07 unRAID kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Jul 15 11:31:07 unRAID kernel: EDAC MC1: 1 CE memory scrubbing error on CPU_SrcID#0_Ha#1_Chan#1_DIMM#0 (channel:1 slot:0 page:0x109a826 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0008:00c0 socket:0 ha:1 channel_mask:2 rank:0)
Jul 15 18:31:29 unRAID kernel: mce: [Hardware Error]: Machine check events logged
Jul 15 18:31:29 unRAID kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Jul 15 18:31:29 unRAID kernel: EDAC MC1: 1 CE memory scrubbing error on CPU_SrcID#0_Ha#1_Chan#1_DIMM#0 (channel:1 slot:0 page:0x109a826 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0008:00c0 socket:0 ha:1 channel_mask:2 rank:0)
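
For anyone wanting to tally these per DIMM rather than read them by eye, here is a minimal Python sketch. It assumes the sb_edac line format shown above; the regex, the function name `count_edac_errors`, and the default `/var/log/syslog` path are illustrative assumptions, not part of any Unraid tool.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
# "EDAC MC1: 1 CE memory scrubbing error on CPU_SrcID#0_Ha#1_Chan#1_DIMM#0 (...)"
# The pattern is an assumption based on the log excerpt in this post.
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) .*? on "
    r"CPU_SrcID#(?P<socket>\d+)_Ha#(?P<ha>\d+)"
    r"_Chan#(?P<chan>\d+)_DIMM#(?P<dimm>\d+)"
)


def count_edac_errors(lines):
    """Sum reported errors per (kind, socket, ha, channel, dimm) tuple."""
    totals = Counter()
    for line in lines:
        match = EDAC_RE.search(line)
        if match:
            key = (
                match["kind"],
                int(match["socket"]),
                int(match["ha"]),
                int(match["chan"]),
                int(match["dimm"]),
            )
            totals[key] += int(match["count"])
    return totals


if __name__ == "__main__":
    # Log path is an assumption; pass a different file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
    with open(path, errors="replace") as handle:
        for key, total in sorted(count_edac_errors(handle).items()):
            print(key, total)
```

The numbers it reports are the kernel's zero-based indices as printed in the log; they still have to be matched against the board's own slot names.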

 

Edited by Spartacus09

You are not alone in this problem. I would suggest you start by googling: CE memory scrubbing error on CPU_SrcID#0_Ha#1_Chan#1_DIMM#0

 

That will get you started. Apparently, you already have your MB manual for reference. You could try modifying the search to include your MB model and see if that gives you more specific help. Probably, by applying logic and knowledge to your particular situation, some type of pattern will become obvious. (Nobody intentionally made the information in the syslog confusing, but each manufacturer seems to have its own slot-numbering nomenclature ...)
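
One way to see the manufacturer's own slot names next to the firmware's channel/DIMM description is the SMBIOS memory table, which `dmidecode --type memory` prints (root is normally required). The sketch below is only a rough parser for that output; the function name `parse_memory_devices` is made up for illustration, and whether the "Locator" and "Bank Locator" strings are meaningful depends entirely on the board vendor.

```python
import subprocess


def parse_memory_devices(dmi_text):
    """Split `dmidecode --type memory` output into one dict per Memory Device."""
    devices = []
    current = None
    for raw in dmi_text.splitlines():
        line = raw.strip()
        if line == "Memory Device":
            # Start of a new per-slot record.
            current = {}
            devices.append(current)
        elif not line:
            # A blank line ends the current record.
            current = None
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return devices


if __name__ == "__main__":
    # Assumes dmidecode is installed and this is run as root.
    output = subprocess.run(
        ["dmidecode", "--type", "memory"],
        capture_output=True, text=True, check=True,
    ).stdout
    for device in parse_memory_devices(output):
        print(device.get("Locator"), "|",
              device.get("Bank Locator"), "|",
              device.get("Size"))
```

If the board fills these fields in sensibly, the "Locator" column is the silkscreen name and "Bank Locator" is the firmware's node/channel/DIMM description, which can help line up the syslog numbering with the manual.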

57 minutes ago, Frank1940 said:

You are not alone in this problem. I would suggest you start by googling: CE memory scrubbing error on CPU_SrcID#0_Ha#1_Chan#1_DIMM#0

 

That will get you started. Apparently, you already have your MB manual for reference. You could try modifying the search to include your MB model and see if that gives you more specific help. Probably, by applying logic and knowledge to your particular situation, some type of pattern will become obvious. (Nobody intentionally made the information in the syslog confusing, but each manufacturer seems to have its own slot-numbering nomenclature ...)

Thanks. A guy here was also getting a channel 0 DIMM 0 error with a Supermicro board, and it sounds like SM labels start at 0: https://serverfault.com/questions/792225/how-to-find-which-memory-has-ce-error

 

Looks like it's likely slot A2 then; I'll give that a shot.

What are the steps to clear that hardware error out of the logs so I can see if it comes back? (I updated Unraid versions previously and that cleared it, and the error didn't recur until a week or so later.)
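
On watching for a recurrence without relying on the syslog: the kernel's EDAC driver also exposes per-DIMM labels and corrected-error counters under sysfs. Below is a minimal sketch that lists them. It assumes the `mc*/dimm*` layout and the `dimm_label`, `dimm_location` and `dimm_ce_count` attribute names from the kernel's EDAC documentation; older kernels may lay this out differently, and the helper names are hypothetical.

```python
from pathlib import Path

# Standard EDAC sysfs location; the per-DIMM layout below is an assumption
# that may not hold on every kernel version.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_text(path):
    """Return the stripped file contents, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def list_edac_dimms(root=EDAC_ROOT):
    """Collect label, location and corrected-error count for each DIMM."""
    rows = []
    for dimm_dir in sorted(Path(root).glob("mc*/dimm*")):
        if not dimm_dir.is_dir():
            continue
        rows.append({
            "mc": dimm_dir.parent.name,
            "dimm": dimm_dir.name,
            "label": read_text(dimm_dir / "dimm_label"),
            "location": read_text(dimm_dir / "dimm_location"),
            "ce_count": read_text(dimm_dir / "dimm_ce_count"),
        })
    return rows


if __name__ == "__main__":
    for row in list_edac_dimms():
        print(row["mc"], row["dimm"], row["label"],
              row["location"], "CE:", row["ce_count"])
```

Running it before and after a few days of uptime shows whether a DIMM's corrected-error count has moved. These counters live in kernel memory, so they start again from zero after a reboot. The kernel EDAC documentation also describes a per-controller `reset_counters` file, which is untested here.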

Edited by Spartacus09
On 7/16/2019 at 11:10 AM, Spartacus09 said:

What are the steps to clear that hardware error out of the logs so I can see if it comes back? (I updated Unraid versions previously and that cleared it, and the error didn't recur until a week or so later.)

Restarting apparently clears the associated errors, or at least there are no errors now after replacing the RAM in A1/A2.
