100% CPU, Dockers unresponsive, VMs affected



Hello. I have run into a strange issue twice now. I am running 6.9.2 and attempted to upgrade to 6.10.2 last week, but certain things did not work properly, so I reverted to 6.9.2. Since then, it has happened twice that, all of a sudden, CPU usage spikes and the Docker containers become unresponsive. For instance, when I navigate to the Dashboard, the containers are not listed because the list cannot be loaded.

I notice it is happening when I start getting all sorts of notifications about Docker containers not doing their job.

Yesterday, when it first happened, I attempted a graceful reboot, but it seemed to get stuck at "stopping emhttpd". I did not wait long enough and powered the server off. When it came back up, everything seemed normal.

Today I did a graceful reboot, but after the restart the behavior was the same.

tower-zohor-diagnostics-20220613-1424.zip was taken after the reboot.

tower-zohor-diagnostics-20220613-1409.zip and tower-zohor-diagnostics-20220612-0730.zip were taken during reboot attempts while the issue was ongoing.

 

Any ideas?

Thanks for any advice.


The diagnostics taken while you say the problem was ongoing show repeating messages in the syslog of the form:

Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x6 frozen
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: cmd 61/08:00:d0:b9:40/00:00:0c:00:00/40 tag 0 ncq dma 4096 out
Jun 13 05:01:02 Tower-Zohor kernel:         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: status: { DRDY }
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: cmd 61/08:08:00:c0:30/00:00:0d:00:00/40 tag 1 ncq dma 4096 out
Jun 13 05:01:02 Tower-Zohor kernel:         res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: status: { DRDY }

ata1.00 is the cache drive. I am not sure why these errors are occurring, though.
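
To see how often these errors occur and which ATA port they hit, a saved syslog can be tallied with a short script. The following is only a minimal Python sketch, not anything shipped with Unraid; the function name `count_failed_commands` and the regular expression are assumptions based on the lines quoted above, and the sample list simply reuses some of those lines.

```python
import re
from collections import Counter

# Assumed pattern, modelled on the quoted lines, e.g.
#   "... kernel: ata1.00: failed command: WRITE FPDMA QUEUED"
FAILED_CMD = re.compile(r"kernel: (ata\d+(?:\.\d+)?): failed command: (.+)$")


def count_failed_commands(lines):
    """Return a Counter keyed by (ATA device, failed command name)."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line.rstrip())
        if match:
            device, command = match.group(1), match.group(2).strip()
            counts[(device, command)] += 1
    return counts


# Sample input: a subset of the syslog lines quoted in this thread.
sample = [
    "Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x6 frozen",
    "Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: failed command: WRITE FPDMA QUEUED",
    "Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: status: { DRDY }",
    "Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: failed command: WRITE FPDMA QUEUED",
    "Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: status: { DRDY }",
]

for (device, command), n in count_failed_commands(sample).items():
    print(f"{device}: {n} x {command}")
```

To run it against a real log, pass an open file instead of the sample list, for example `count_failed_commands(open("syslog.txt"))`, where `syslog.txt` is a placeholder for wherever the syslog from the diagnostics zip was extracted.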


Thanks. I ran "Fix Common Problems" and it indicated that the write cache was disabled on all my drives (cache, disk1 and parity). I enabled it.

I also had a few shares set to "cache only", which I changed to "prefer". CPU utilization went down and everything seems normal now.

My cache drive does not report any issues, nor does any other drive. Maybe something just got corrupted after the upgrade and subsequent rollback.
