Since upgrading to 6.12.6, Unraid server freezes and has to be manually powered off when Docker is enabled



Hello folks,

 

I've been pulling my hair out over this, and I was finally able to capture at least the system logs during a freeze, so I can provide those. After I upgraded to 6.12.6 and redid my two cache pools as ZFS, I noticed my box freezing after a week or two. Eventually it got to the point where, after a single day, my Unraid server becomes inaccessible and has to be shut down manually.

 

After some trial and error I found that if I do not enable the Docker service, my Unraid box never has this problem. At that point I thought it might be one of the containers that moves or edits lots of data, like Unmanic, so I disabled almost all containers, and the problem still occurred.

 

I then noticed a trend where one of the CPU cores becomes pegged at 100% before a freeze happens, so I ran top during that time and saw the shfs process taking up 97% CPU. I also saw some entries about ZFS that were tied to shfs in the logs prior to the crash. So I wiped out the cache pools, remade them as btrfs (which had worked just fine for the past year), and tried again. It didn't help, so I deleted the docker img as well and rebuilt all my containers from scratch using the previous templates but freshly downloaded containers.
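For anyone who wants to catch the pegged process without watching top by hand, here is a rough Python sketch of what I mean. It only parses the text that `top -b -n 1` prints and lists processes at or above a CPU threshold. The function name `busy_processes` and the 90% default are just placeholders for illustration, and Python is not part of a stock Unraid install as far as I know, so treat this as an assumption-laden example rather than something I actually ran on the box.

```python
def busy_processes(top_output: str, threshold: float = 90.0):
    """Return (command, cpu_percent) pairs at or above `threshold`.

    `top_output` is expected to be the text printed by `top -b -n 1`.
    Column positions are read from the header row, so the parser does
    not hard-code where %CPU or COMMAND sit.
    """
    lines = top_output.splitlines()
    header_idx = None
    for i, line in enumerate(lines):
        cols = line.split()
        if "%CPU" in cols and "COMMAND" in cols:
            header_idx = i
            cpu_col = cols.index("%CPU")
            cmd_col = cols.index("COMMAND")
            break
    if header_idx is None:
        return []  # no process table found in the text

    hits = []
    for line in lines[header_idx + 1:]:
        # Split at most cmd_col times so a command containing spaces stays whole.
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue
        try:
            cpu = float(fields[cpu_col].replace(",", "."))
        except ValueError:
            continue  # not a process row
        if cpu >= threshold:
            hits.append((fields[cmd_col].strip(), cpu))
    return hits
```

To use it you would feed it the output of a single batch run of top, for example captured with `subprocess.run(["top", "-b", "-n", "1"], capture_output=True, text=True).stdout`, and append whatever it returns to a file on a timer so there is a record leading up to the freeze.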

 

I was stubborn about finding the cause, though, so I left an SSH session (not a web terminal this time) open. When I noticed my Unraid web GUI was inaccessible, I tested my containers. I don't remember all the ports, but they're all inaccessible now except for one container's web page I already had open, which was my FileZilla container. It is open and usable, but it can no longer connect to any FTP servers anywhere. To my surprise, my SSH session is still working as well. However, nothing is reachable on either my onboard Aquantia 10Gb card and its IP or my Mellanox 40Gb card's IP (they are on different VLANs). I now get a 500 Internal Server Error when I try to view the GUI, so that's... progress?

Checking top in my SSH session, I see update_3, kcompactd0 and sadc taking 98-100% CPU. I can't run the diagnostics command: it just says it is starting diagnostics collection and then does nothing. Even after a few hours, when I Ctrl-C out of it and check /boot/logs, no new diagnostics are in there. The docker command does not work either. I even tried to kill the Docker PIDs by hand and they kept respawning (I now realize I should have stopped the service itself, but it's too late now).

I copied off the syslog using scp in my MobaXterm session, so I have it available. I then tried to ping www.google.com, because I realized I had forgotten to ping anything, and now the session is frozen. I can't cancel the process or access anything anymore, so that is done. Before this I also tried running powerdown, and the machine never shut down; I still had access to it 30 minutes after I sent that command, prior to trying the ping.

 

I hope someone can help me figure out what's going on. All the gear in this box is under a year old and was bought new, except for a single drive, my HBA card, and my Mellanox 40Gb card. They were all functioning perfectly prior to the update.

 

Attached are the syslog I grabbed and the specs of my box. The system was up for 2 days and I turned Docker on in the afternoon of the 5th, so as far as I can tell you can ignore entries from 1/4; only entries from 1/5 onward matter.

 

 

Intel i5-11600K

Gigabyte Z590 AORUS Master motherboard

Corsair Vengeance 3200 DDR4 4x16 GB

HP Mellanox 544QSFP 661685-001 MCX354A-FCBT (used, from eBay)

HP H220 (=LSI 9207-8i) (used, from eBay)

2x 1 TB Hynix Platinum NVMe

4x 14 TB IronWolf Pro SATA

1x 14 TB HGST WD Ultrastar DC HC530 (refurbished/renewed, from Amazon) SATA

2x 512 GB Teamgroup AX-2 SSD SATA

syslog.txt

Edited by PinkyD
clarify used parts, updated my ram spec, wrong size dimms originally
15 hours ago, JorgeB said:

First call trace is about the iGPU, try blacklisting the i915 driver.

I haven't had any problems with it in the previous version, so that is odd. Now that you have given me a clue, I found the thread below. Is that what it does? It seems similar to the other ways I saw it blacklisted. I started at 6.11, I believe, and I've been upgrading as new versions became available. Like I said, it worked flawlessly for the entire last year prior to this update.

https://forums.unraid.net/topic/143542-6123-using-intel-igpu-eventually-crashes-unraid-i915-related/#comment-1294347

 

Either way, I'll give both a shot.
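Since the hint above came from the first call trace in the syslog, here is a small Python sketch of how one could list which kernel modules show up in each call trace. It assumes the usual kernel log layout where a "Call Trace:" line is followed by stack frames, and frames that belong to a loadable module end with the module name in square brackets (for example `[i915]`). The function name `modules_in_call_traces` and the `max_frames` limit are made up for this example, and I have not checked every line format that can appear in a real syslog.

```python
import re
from collections import Counter

# A stack frame from a loadable module ends with the module name in brackets.
MODULE_TAG = re.compile(r"\[([A-Za-z0-9_]+)\]\s*$")


def modules_in_call_traces(syslog_lines, max_frames=40):
    """Return a list of (line_number, Counter of module names) per call trace.

    `line_number` is the 1-based position of the "Call Trace:" line.
    Frames are read until a non-kernel line, another trace header,
    or `max_frames` lines, whichever comes first.
    """
    lines = list(syslog_lines)
    traces = []
    for i, line in enumerate(lines):
        if "Call Trace:" not in line:
            continue
        counts = Counter()
        for frame in lines[i + 1 : i + 1 + max_frames]:
            if "kernel:" not in frame or "Call Trace:" in frame:
                break
            match = MODULE_TAG.search(frame)
            if match:
                counts[match.group(1)] += 1
        traces.append((i + 1, counts))
    return traces
```

Running it over the lines of the attached syslog.txt (for example `modules_in_call_traces(open("syslog.txt", errors="replace"))`) would give one entry per trace, which makes it easier to see whether the first trace really points at the iGPU driver and whether later ones point somewhere else. Frames from the core kernel carry no module tag, so an empty counter does not mean the trace is harmless.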

9 minutes ago, PinkyD said:

Leave my AI at the door ?

Oh, I looked this up. You think I used ChatGPT or something to troubleshoot the problem? That's a negative, Ghost Rider. But I'm also confused about why you used that phrase; I'm not sure whether you used it correctly or the couple of definitions I looked up are wrong. I'm not offering a solution, just explaining my troubleshooting so far and that I have no solution...
