Docker containers stop working / cache disk issues


berben
Solved by JorgeB

Hi,

 

I've run into an issue with my Docker apps: they crash at random and stop responding. It has happened a couple of times, and a reboot fixes it for a while (a couple of days). Today it happened again, but this time I had a little time to browse the logs, and I found a couple of odd errors. First, the system log shows entries like this (a new one every couple of seconds):

 

Oct 12 18:03:45 NAS kernel: pcieport 0000:00:1c.3: AER: Corrected error received: 0000:02:00.0

 

I looked it up in the device manager, and it points to this device:

 

[8086:7abb] 00:1c.3 PCI bridge: Intel Corporation Device 7abb (rev 11)
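
To get a feel for how often these entries appear and which devices they name, a minimal Python sketch like the one below could tally them. The regular expression is modelled only on the single syslog line quoted above, so other kernel versions may word the message differently; the function name `count_aer_corrected` and the variable `sample_syslog` are made up for this illustration.

```python
import re
from collections import Counter

# PCI address in domain:bus:device.function form, e.g. 0000:00:1c.3
PCI_ADDR = r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]"

# Assumption: the message format matches the one syslog line quoted in this post.
AER_LINE = re.compile(
    r"pcieport (?P<port>" + PCI_ADDR + r"): AER: Corrected error received: "
    r"(?P<source>" + PCI_ADDR + r")"
)


def count_aer_corrected(lines):
    """Count corrected AER reports per (reporting port, source device) pair."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            counts[(match.group("port"), match.group("source"))] += 1
    return counts


# The only real log line available here is the one quoted in this post.
sample_syslog = [
    "Oct 12 18:03:45 NAS kernel: pcieport 0000:00:1c.3: AER: Corrected error received: 0000:02:00.0",
]

for (port, source), n in count_aer_corrected(sample_syslog).items():
    print(f"port {port} reported {n} corrected error(s) from {source}")
```

To run it against a real log, pass an open file object instead of the sample list; the path of the syslog file depends on the system and is not shown in this post.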

 

Also, /var/log/docker.log is spammed with:

 

containerd: creating temp mount location: mkdir /var/lib/docker/containerd/daemon/tmpmounts: input/output error
time="2023-10-12T07:05:09+02:00" level=warning msg="containerd config version `1` has been deprecated and will be removed in containerd v2.0, please switch to version `2`, see https://github.com/containerd/containerd/blob/main/docs/PLUGINS.md#version-header"
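
Of these two entries, only the first one points at a storage problem; the deprecation warning is just noise. A small Python sketch like the following could separate them by keeping only the lines that mention an input/output error. The helper name `find_io_errors` and the variable `sample_docker_log` are hypothetical, and the sample lines are the two entries quoted above.

```python
def find_io_errors(lines):
    """Return the log lines that mention an input/output error."""
    return [
        line.rstrip("\n")
        for line in lines
        if "input/output error" in line.lower()
    ]


# The two docker.log entries quoted in this post.
sample_docker_log = [
    "containerd: creating temp mount location: mkdir "
    "/var/lib/docker/containerd/daemon/tmpmounts: input/output error",
    'time="2023-10-12T07:05:09+02:00" level=warning msg="containerd config '
    "version `1` has been deprecated and will be removed in containerd v2.0, "
    "please switch to version `2`, see "
    'https://github.com/containerd/containerd/blob/main/docs/PLUGINS.md#version-header"',
]

for hit in find_io_errors(sample_docker_log):
    print(hit)
```

For the real file, the same function can be fed an open handle on /var/log/docker.log.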

 

My cache disk is a SATA SSD, and I don't have any external devices connected directly to PCI slots. Everything is fine again after a reboot, except that the "AER" error is still being logged.

 

This PC ran fine for about 6 months; the problems started a couple of days ago.

 

Any help would be appreciated.

Edited by berben
Link to comment
8 minutes ago, berben said:

Do you mean a RAM test? The one I run from the boot menu? Do you think this might be RAM related?

 

Corruption of BTRFS file systems seems to be a common symptom of potential RAM problems.   

 

Note that if you boot in UEFI mode, you need to get a more recent version from memtest86.com that can boot in UEFI mode. It would not do any harm to use that version anyway, even if you boot Unraid in legacy mode.

Link to comment

Thanks for the explanation. That would be weird because the machine is almost brand new, but of course anything could happen. I'm creating a USB stick with the newest memtest and I'll leave it running for a couple of hours.

 

The SSD is not new though; it's a couple of years old.

Edited by berben
Link to comment

 

1 hour ago, berben said:

Looks like you were right. Memtest started to throw errors after a couple of seconds. Now I'll try to figure out whether this is indeed the RAM (hopefully not the CPU) and RMA it. Thanks a lot for the tips!

 

Make sure you are not overclocking the RAM (an XMP profile IS an overclock). You can also try with fewer RAM sticks installed, as sometimes it is the memory controller that cannot handle the number of installed sticks without issues. Sometimes the CPU also has a maximum RAM speed it can handle, so check the manual for that.

Link to comment
8 minutes ago, itimpi said:

 

 

Make sure you are not overclocking the RAM (an XMP profile IS an overclock). You can also try with fewer RAM sticks installed, as sometimes it is the memory controller that cannot handle the number of installed sticks without issues. Sometimes the CPU also has a maximum RAM speed it can handle, so check the manual for that.

 

I took out one of the sticks and there seem to be no errors this time. I suspect the stick I took out is faulty, because the system ran fine for a couple of months and nothing changed in the setup during that period. I'm not overclocking explicitly, but I think an XMP profile was selected. After this second test I'll put the second stick back and disable the XMP profile.

Link to comment

OK, I think I can safely say that one of the two RAM sticks is at fault here. I tried swapping RAM slots and, no matter what, one of the sticks causes errors. No XMP, no overclock. I'll send it in for RMA and run the PC with only one stick for now. Thanks once again for the tips.

Link to comment
17 minutes ago, berben said:

OK, I think I can safely say that one of the two RAM sticks is at fault here. I tried swapping RAM slots and, no matter what, one of the sticks causes errors. No XMP, no overclock. I'll send it in for RMA and run the PC with only one stick for now. Thanks once again for the tips.

At least it seems to be a reproducible error that follows a particular stick. I wonder if it is worth cleaning the contacts on the suspect stick, just in case?

Link to comment
