PrisonMike Posted January 23

Hello, unfortunately I'm writing because I'm having several problems with my Unraid server. About a week ago I checked the GUI and noticed the CPU was pinned at 100%. I waited a day and checked again; it was still at 100%. After a reboot I noticed the CPU was not pinned while the Docker service was stopped, so I concluded the problem was probably with a Docker container. I updated the containers that needed updating, but the problem persisted. In addition, I could no longer access some of my containers, such as Sonarr, Radarr, and Prowlarr, while others like Audiobookshelf and Mealie worked fine.

I then updated Unraid from 6.9.2 to 6.11.5, which changed nothing. I normally run Unraid with a static IP, but I noticed it was reporting a second, different MAC address to my router (UniFi) while still using the same IP (192.168.1.5), so two MAC addresses were effectively both trying to use 192.168.1.5. I switched Unraid to DHCP and reserved 192.168.1.5 for the original MAC address that I know belongs to the adapter. That fixed the address conflict, but I still could not reach the containers I needed.

Next I deleted one container and tried to reinstall it with Unraid backup/restore. When that failed, I resorted to deleting the Docker vdisk and restarting the server, after which I got an error that the Docker service could not be started. Some posts said this error appears when the vdisk is too full, so I increased its size from 50 GB to 100 GB. That didn't work either. I deleted the (btrfs) vdisk several times and power-cycled the server to no avail, and switching the vdisk to XFS didn't help.
As a final Hail Mary I rolled the server back to 6.9.2 via the GUI, and I am still stuck; deleting the vdisk and switching to XFS again made no difference. So I guess the first issue is figuring out why the Docker service won't start. Once that's resolved, I can see what was pinning my CPU, if it even is Docker. Here is a link to the diagnostics: https://www.mediafire.com/file/n2gh16l3fb53xbu/poseidon-diagnostics-20230121-1955.zip/file
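For anyone hitting the same wall, a minimal sketch of how to dig the actual Docker start-up error out of the syslog from a terminal. The paths here are stock Unraid defaults and are assumptions — verify the image location under Settings → Docker before relying on them:

```shell
# Check that the vdisk actually exists at the configured path and how
# big it is (default Unraid location -- adjust if yours differs):
ls -lh /mnt/user/system/docker/docker.img

# The GUI's "Docker Service failed to start" banner hides the real
# error; the syslog usually records it:
grep -i docker /var/log/syslog | tail -n 20
```

In this thread the syslog turned out to be the key evidence, so checking it before recreating the vdisk again would have saved several rebuild cycles.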
PrisonMike Posted January 25

Anyone have any suggestions? Did I post the wrong diagnostics? I'm really not sure what to do at this point. Is my server even salvageable?
trurl Posted January 25

Attach diagnostics to your NEXT post in this thread.
PrisonMike Posted January 27

Hello, here is a copy of my diagnostics attached to this post. poseidon-diagnostics-20230127-1130.zip
trurl Posted January 27

Your system share has files on the array. Ideally, the appdata, domains, and system shares should be on a fast pool (cache) and set to stay there, so Docker/VM performance isn't impacted by slower parity writes, and so array disks can spin down, since these files are always open. You can worry about that later, though.

You have completely filled log space, so nothing has been logged in a few days. But the logs you do have show problems communicating with the cache. Do you not see errors in Main → Pool Devices?

Shut down, then check all connections, SATA and power, at both ends, including splitters. A reboot will clear the logs.
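The log-space point above can be checked from a terminal in seconds. This is a generic sketch; the only Unraid-specific assumption is that /var/log is a small tmpfs that syslog stops writing to once full:

```shell
# How full is the log filesystem? At 100% use, nothing new gets logged.
df -h /var/log

# Which files are eating the space? Repeated device errors usually make
# syslog itself the biggest offender.
du -sm /var/log/* 2>/dev/null | sort -rn | head
```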
PrisonMike Posted January 31

Hello @trurl, thank you for your response. I do not see any errors in Pool Devices. It looks like my log file keeps filling up after a few days. Here is a diagnostics file, requested about 8 hours after a reboot. poseidon-diagnostics-20230130-2030.zip
trurl Posted January 31

Cache2 might be going bad. Run an extended SMART self-test on cache2.
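The same test can be started from a terminal with smartctl, which ships with Unraid. The device name /dev/sdX below is a placeholder — look up the real one under Main before running anything:

```shell
# Kick off an extended (long) offline self-test; it runs inside the
# drive's firmware in the background and can take hours.
smartctl -t long /dev/sdX

# Once it's done, read back the self-test log:
smartctl -l selftest /dev/sdX

# Pull just the last column (LBA of first error) from the most recent
# extended test result:
smartctl -l selftest /dev/sdX | awk '/Extended offline/ {print $NF}'
```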
PrisonMike Posted February 3

Hello, after trying to run an extended SMART test on cache2 I get the following error: "Errors occurred - Check SMART report". Please find the SMART report attached. Thanks for your assistance. poseidon-smart-20230202-1944.zip
trurl Posted February 3

Num  Test_Description   Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline   Completed: read failure  00%        8922             830987072

Replace the drive.
PrisonMike Posted February 6

On 2/2/2023 at 8:26 PM, trurl said:
    # 1  Extended offline  Completed: read failure  00%  8922  830987072 — replace

Hello, thanks for the help! Since I have a pool of two cache drives, can I remove the bad one and run the server off a single cache drive until I can get a new drive?
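Whether a pool can run degraded on one drive depends on its btrfs profile, which is easy to inspect before pulling anything. A sketch only — /mnt/cache is the default Unraid pool mount point (an assumption; adjust to your pool name), and these commands change nothing:

```shell
# A two-device Unraid pool defaults to btrfs RAID1, which tolerates the
# loss of one member; a "single" profile does not.
btrfs filesystem df /mnt/cache    # look for RAID1 on the Data/Metadata lines
btrfs filesystem show /mnt/cache  # lists the member devices and usage
```

If the profile is RAID1, the usual Unraid route is to stop the array, unassign the failing device from the pool, and start the array again; still, back up appdata first if you can.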