100% CPU, Dockers unresponsive, VMs affected

stefan.tomko · June 13, 2022

Hello. I have run into a strange issue twice now. I am running 6.9.2 and attempted to upgrade to 6.10.2 last week, but certain things did not work properly, so I reverted to 6.9.2. Since then, it has happened twice that, all of a sudden, CPU usage climbs and the Docker containers become unresponsive. For instance, when I navigate to the Dashboard, the containers are not listed because the list fails to load.

I notice it is happening when I start getting all sorts of notifications about Docker containers not doing their job, etc.

Yesterday, when it first happened, I attempted a graceful reboot, but I did not wait long enough. It seemed to get stuck at "stopping emhttpd", so I powered the server off. When it came back up, everything seemed normal.

Today I did a graceful reboot, but after the restart the behavior is the same.

tower-zohor-diagnostics-20220613-1424.zip was taken after the reboot.

tower-zohor-diagnostics-20220613-1409.zip and tower-zohor-diagnostics-20220612-0730.zip were taken during the reboot attempts, while the issue was ongoing.

Any ideas?

Thanks for any advice.

itimpi · June 13, 2022

The diagnostics taken while you say the problem was ongoing show repeating messages in the syslog of the form:

Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x6 frozen
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: cmd 61/08:00:d0:b9:40/00:00:0c:00:00/40 tag 0 ncq dma 4096 out
Jun 13 05:01:02 Tower-Zohor kernel:         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: status: { DRDY }
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: cmd 61/08:08:00:c0:30/00:00:0d:00:00/40 tag 1 ncq dma 4096 out
Jun 13 05:01:02 Tower-Zohor kernel:         res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: status: { DRDY }

That device (ata1.00) is the cache drive. I am not sure why the errors are occurring, though.
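
To get a feel for how often these errors occur, the syslog can be tallied per ATA device. The following is only a minimal sketch: the function name `count_failed_commands` and the regular expression are illustrative assumptions, and `SAMPLE` simply reuses some of the lines quoted above. In practice you would feed it the lines of the syslog file extracted from the diagnostics zip.

```python
import re
from collections import Counter

# Matches kernel lines such as:
# "Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: failed command: WRITE FPDMA QUEUED"
# (pattern is an assumption based on the excerpt above, not an official format spec)
FAILED_CMD = re.compile(r"kernel: (ata\d+(?:\.\d+)?): failed command: (.+)$")

# Lines copied from the syslog excerpt quoted in this thread.
SAMPLE = """\
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x6 frozen
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: status: { DRDY }
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jun 13 05:01:02 Tower-Zohor kernel: ata1.00: status: { DRDY }
"""


def count_failed_commands(lines):
    """Count failed ATA commands per (device, command) pair."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line.rstrip("\n"))
        if match:
            device, command = match.group(1), match.group(2).strip()
            counts[(device, command)] += 1
    return counts


if __name__ == "__main__":
    totals = count_failed_commands(SAMPLE.splitlines())
    for (device, command), n in totals.most_common():
        print(f"{device}: {command} x{n}")
```

If one device accounts for nearly all of the failures, that narrows the problem down to a single drive, cable, or port rather than the controller as a whole.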

stefan.tomko · June 13, 2022

Thanks. I ran "Fix Common Problems" and it indicated that write cache is disabled for all my drives (cache, disk1 and parity). I enabled it.

I also had a few shares set to "cache only", which I changed to "prefer". CPU utilization went down and all seems normal now.

My cache drive does not report any issues, nor does any other drive. Maybe something just got corrupted after the upgrade and the subsequent rollback.
