[SOLVED] Encountered a hardware error 'mcelog' follows



I was setting up some Docker applications when the server started to act funny. I lost communication with all my dockers and the main Unraid UI was broken. I was able to log out and back in to fix that issue and restart the array, and BAM, everything was back. Or so I thought: about 20 minutes later the same thing happened again. Any help would be great.

jarvlos-diagnostics-20200901-0143.zip

Link to comment

Why do you have 500G allocated to docker.img?

 

20G should be more than enough. You must have one or more applications misconfigured and writing into docker.img instead of to mapped host storage.

 

I am running 17 dockers and they are using less than half of 20G docker.img. But I see you are already using much more than 20G even after restarting.

 

Making docker.img larger won't fix anything, it will just make it take longer to fill.

 

The usual reason for filling docker.img is specifying paths in your applications that don't exactly match the container path in your mappings. Common mistakes are using different upper/lower case than in your container mappings (Linux is case-sensitive) or specifying a relative path (a path not beginning with /).
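To make that concrete, here is a rough Python sketch of the check. It is only an illustration, not an Unraid or Docker tool: the function name `is_mapped` and the example paths are made up, and it assumes you have copied the container paths from your template by hand.

```python
from posixpath import normpath


def is_mapped(app_path, container_paths):
    """Return True if app_path lies inside one of the mapped container paths.

    app_path:        the path typed into the application's own settings
    container_paths: the container-side paths from the docker mappings
    (Both are hypothetical inputs for this sketch.)
    """
    # A relative path is resolved inside the container's filesystem,
    # so it is never covered by a mapping.
    if not app_path.startswith("/"):
        return False

    path = normpath(app_path)
    for mount in container_paths:
        mount = normpath(mount)
        # The comparison is case-sensitive, just like Linux paths.
        if path == mount or path.startswith(mount.rstrip("/") + "/"):
            return True
    return False
```

With mappings of `/downloads` and `/config`, `is_mapped("/downloads/complete", ...)` is True, while `/Downloads/complete` (wrong case) and `downloads/complete` (relative) are both False, meaning those writes would stay inside the container and therefore in docker.img.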

 

Also, your system and domains shares are on the array instead of all on cache where they belong, like your appdata.

 

appdata, domains, system should be all on cache and set to stay on cache. If you have these on the array, your dockers / VMs will have their performance impacted by slower parity writes, and they will keep array disks spinning since these files will always be open.

 

Do you actually have any VMs?

 

Which dockers do you use?

 

 

Link to comment

Okay, I've made some changes and fixed the filling of my docker image. 😁 I reconfigured the offender and it is no longer filling the image. 🥳

That was possibly an unrelated problem I was working on when the original reason for this post, the hardware error, showed up. I've looked in the main syslog and all I can find in there is this:

 

LN# 2336 - Aug 12 02:26:22 Jarvlos kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
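For anyone who wants to pull the fields out of a line like this, a small Python sketch follows. The regular expression is an assumption based only on the single line above, so other kernel versions or EDAC drivers may format the message differently; the name `parse_edac_line` is made up for this example.

```python
import re
from typing import Optional

# Pattern inferred from the one syslog line quoted above (an assumption,
# not an official format description).
EDAC_RE = re.compile(
    r"EDAC MC(?P<controller>\d+): (?P<count>\d+) (?P<kind>CE|UE) "
    r".*? on (?P<label>\S+) "
    r"\(channel:(?P<channel>-?\d+) slot:(?P<slot>-?\d+)"
)


def parse_edac_line(line: str) -> Optional[dict]:
    """Extract controller, error count, kind, DIMM label, channel and slot.

    Returns None when the line does not look like an EDAC error report.
    """
    match = EDAC_RE.search(line)
    if match is None:
        return None
    return {
        "controller": int(match.group("controller")),
        "count": int(match.group("count")),
        "kind": match.group("kind"),  # CE = correctable, UE = uncorrectable
        "label": match.group("label"),
        "channel": int(match.group("channel")),
        "slot": int(match.group("slot")),
    }
```

Fed the line above, this reports one CE (correctable) error on memory controller 1, with label `CPU#1Channel#0_DIMM#0`, channel 0, slot 0.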

 

There are a couple of other entries after that, but nothing I know how to read into, other than that maybe one (or more) of my DIMMs might be on the way out.

I'm not super knowledgeable when it comes to ECC memory, so I would love to be wrong and for this to be just normal.

Link to comment
15 hours ago, JorgeB said:

That looks like a correctable memory error. Check the event log in the board BIOS; it might have more info and identify the slot, e.g., this is from one of my Supermicro boards:

 

[Image: Capture3.PNG]

Unfortunately I am using an old HP Z600 workstation I found online for a good price. I do, however, have a new server, not yet in production, that will be replacing the Z600. It's a Supermicro CSE-847 w/ X9DRD-7LN4F-JBOD. I just need to source the active coolers to keep it cool and quiet after I replace the 7k RPM fans with Noctua IPPC 120/140mm fans. Provided that I don't stretch my wallet too thin. 😁

Link to comment
  • skippy4hammypc changed the title to [SOLVED] Encountered a hardware error 'mcelog' follows
