[SOLVED] Encountered a hardware error 'mcelog' follows

skippy4hammypc · September 1, 2020

I was setting up some Docker applications when the server started to act funny. I lost communication with all my dockers and the main Unraid UI was broken. I was able to log out and back in to fix that issue and restart the array, and BAM, everything was back. Or so I thought: about 20 minutes passed and the same thing happened again. Any help would be great.

jarvlos-diagnostics-20200901-0143.zip

trurl · September 1, 2020

Why do you have 500G allocated to docker.img?

20G should be more than enough. You must have one or more applications misconfigured and writing into docker.img instead of to mapped host storage.

I am running 17 dockers and they are using less than half of 20G docker.img. But I see you are already using much more than 20G even after restarting.

Making docker.img larger won't fix anything, it will just make it take longer to fill.

The usual reason for filling docker.img is specifying paths in your applications that don't exactly match the container path in your mappings. Common mistakes are using different upper/lower case than in your container mappings (Linux is case-sensitive) or specifying a relative path (a path not beginning with /).
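As a rough illustration of those two mistakes, here is a minimal Python sketch that compares a path entered inside an application against the container paths from its mappings. The function name `check_app_path` and the example paths are hypothetical and not part of Unraid or Docker; it only mirrors the reasoning above.

```python
from pathlib import PurePosixPath


def check_app_path(app_path, container_paths):
    """Describe how app_path relates to the mapped container paths.

    app_path: the path typed into the application's own settings.
    container_paths: the container-side paths from the template mappings.
    """
    p = PurePosixPath(app_path)

    # Mistake 1: a relative path (not beginning with /).
    if not p.is_absolute():
        return "relative path: does not begin with /"

    # Exact match: the path is a mapped container path or lies below one.
    for mapped in container_paths:
        m = PurePosixPath(mapped)
        if p == m or m in p.parents:
            return "ok: under mapped container path " + mapped

    # Mistake 2: same path apart from upper/lower case.
    p_parts = [part.lower() for part in p.parts]
    for mapped in container_paths:
        m_parts = [part.lower() for part in PurePosixPath(mapped).parts]
        if p_parts[:len(m_parts)] == m_parts:
            return "case mismatch with mapped container path " + mapped

    return "not under any mapped container path"


if __name__ == "__main__":
    mappings = ["/downloads", "/config"]  # hypothetical container paths
    for candidate in ["/downloads/complete", "/Downloads/complete", "downloads/complete"]:
        print(candidate, "->", check_app_path(candidate, mappings))
```

Anything other than the "ok" result means the application is probably writing inside the container itself, which is what ends up in docker.img.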

Also, your system and domains shares are on the array instead of all on cache where they belong, like your appdata.

appdata, domains, system should be all on cache and set to stay on cache. If you have these on the array, your dockers / VMs will have their performance impacted by slower parity writes, and they will keep array disks spinning since these files will always be open.

Do you actually have any VMs?

Which dockers do you use?

skippy4hammypc · September 2, 2020

Okay, I've made some changes and fixed what was filling my docker image. 😁 I reconfigured the offender and it is no longer filling the image. 🥳

That was possibly an unrelated problem I was working on at the time; the original reason for this post was the hardware error. I've looked in the main syslog and all I can find in there is this:

LN# 2336 - Aug 12 02:26:22 Jarvlos kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)

There are a couple of other entries after that, but nothing I know how to interpret, other than that maybe one (or more) of my DIMMs might be on the way out.

I'm not super knowledgeable when it comes to ECC memory, so I would love to be wrong and for this to be normal.
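For anyone wanting to pull the fields out of lines like the one above, here is a minimal Python sketch. The regular expression is an assumption modelled only on the single syslog line quoted in this post (other kernels or EDAC drivers may format the message differently), and `parse_edac_line` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern modelled on the one syslog line quoted above; this is an assumption,
# not a complete description of every EDAC message format.
EDAC_RE = re.compile(
    r"EDAC (?P<mc>MC\d+): (?P<count>\d+) (?P<kind>CE|UE) .*?on (?P<label>\S+) "
    r"\(channel:(?P<channel>\d+) slot:(?P<slot>\d+)"
)


def parse_edac_line(line):
    """Return a dict of fields from an EDAC error line, or None if no match."""
    match = EDAC_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("count", "channel", "slot"):
        fields[key] = int(fields[key])
    return fields


def count_errors_by_dimm(lines):
    """Sum reported error counts per (kind, DIMM label) over many log lines."""
    totals = Counter()
    for line in lines:
        fields = parse_edac_line(line)
        if fields is not None:
            totals[(fields["kind"], fields["label"])] += fields["count"]
    return totals


if __name__ == "__main__":
    sample = (
        "Aug 12 02:26:22 Jarvlos kernel: EDAC MC1: 1 CE error on "
        "CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 "
        "grain:8 syndrome:0x0)"
    )
    print(parse_edac_line(sample))
```

CE stands for a corrected error and UE for an uncorrected one, so the label field is the part that hints at which DIMM to look at.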

JorgeB · September 2, 2020

That looks like a correctable memory error. Check the event log in the board BIOS; it might have more info and identify the slot. For example, this is from one of my Supermicro boards:

Capture3.PNG
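If the BIOS event log is not available, the kernel's EDAC counters are another place to look. The following is a minimal Python sketch, assuming an EDAC driver is loaded and exposes the usual `ce_count` and `ue_count` files under `/sys/devices/system/edac/mc`; the function name `read_edac_counts` is hypothetical, and the layout on a given board may differ.

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Read corrected (ce) and uncorrected (ue) error counts per memory controller.

    base is assumed to contain directories named mc0, mc1, ... as created by
    the kernel EDAC subsystem. Missing files are simply skipped.
    """
    counts = {}
    for mc_dir in sorted(Path(base).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            counter_file = mc_dir / name
            if counter_file.is_file():
                entry[name] = int(counter_file.read_text().strip())
        counts[mc_dir.name] = entry
    return counts


if __name__ == "__main__":
    for controller, entry in read_edac_counts().items():
        print(controller, entry)
```

A corrected-error count that keeps climbing on one controller would point in the same direction as the syslog entry.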

trurl · September 2, 2020

8 hours ago, skippy4hammypc said:

Okay, I've made some changes and fixed what was filling my docker image. 😁 I reconfigured the offender and it is no longer filling the image. 🥳

If you want, post diagnostics and I will see if there is more that needs to be done on this.

skippy4hammypc · September 2, 2020

Sure thing. Let me know if there is anything more.

jarvlos-diagnostics-20200902-1618.zip

skippy4hammypc · September 2, 2020

15 hours ago, JorgeB said:

That looks like a correctable memory error. Check the event log in the board BIOS; it might have more info and identify the slot. For example, this is from one of my Supermicro boards:

Unfortunately, I am using an old HP Z600 workstation I found online for a good price. I do, however, have a new server that is not in production yet and will be replacing the Z600. It's a Supermicro CSE-847 w/ X9DRD-7LN4F-JBOD. I just need to source the active coolers to keep it cool and quiet after I replace the 7k RPM fans with Noctua IPPC 120/140mm fans. Provided that I don't stretch my wallet too thin. 😁
