
Random Lockups



Hello All,

 

It seems I can't get away from having issues. Recently my server has started to lock up randomly. When it happens I can't unmount the drives, can't load the Docker page, and can't export my diagnostics. It gets to my Cache drive, then just sits there and never moves past it. I'm running 6.9.2 and I've attached the diagnostics from my last reboot.

 

Let me know if there's something else that's needed.

tower-diagnostics-20210812-0026.zip


There were issues with the NVMe device just before the crash, though unclear if the crash was related:

 

Aug 13 02:26:53 Tower kernel: nvme nvme0: frozen state error detected, reset controller
Aug 13 02:26:53 Tower kernel: blk_update_request: I/O error, dev nvme0n1, sector 1595697920 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
Aug 13 02:26:54 Tower kernel: pcieport 0000:00:01.0: AER: Root Port link has been reset
Aug 13 02:26:54 Tower kernel: pcieport 0000:00:01.0: AER: device recovery successful

 

Enable the syslog server and see if the same errors show up again before the next crash.

Here it is again.

syslog


Same thing, though this time the crash took a little longer to happen after the errors:

 

Aug 13 17:54:39 Tower kernel: nvme nvme0: frozen state error detected, reset controller
Aug 13 17:54:39 Tower kernel: blk_update_request: I/O error, dev nvme0n1, sector 1528760504 op 0x0:(READ) flags 0x80700 phys_seg 17 prio class 0
Aug 13 17:54:39 Tower kernel: blk_update_request: I/O error, dev nvme0n1, sector 1528760760 op 0x0:(READ) flags 0x80700 phys_seg 18 prio class 0
Aug 13 17:54:40 Tower kernel: pcieport 0000:00:01.0: AER: Root Port link has been reset
Aug 13 17:54:40 Tower kernel: pcieport 0000:00:01.0: AER: device recovery successful
Aug 13 18:23:29 Tower kernel: microcode: microcode updated early to revision 0x71a, date = 2020-03-24

 

Still, this suggests it could be related. Try removing the NVMe device or using a different one if available (or try it in a different slot).
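If you want to check a saved syslog for the same pattern yourself, here is a rough Python sketch. It looks for the NVMe error strings quoted above and reports how long after the last one the next boot shows up. Note the assumptions: the function name `gaps_before_reboot` is made up for illustration, the `microcode updated early` line is treated as the marker for a fresh boot (an assumption based on the excerpt above, not something Unraid guarantees), and the year is supplied by hand because these syslog timestamps don't include one. The gap is time until the reboot was logged, not the exact moment of the lockup.

```python
from datetime import datetime

# Substrings copied from the kernel messages quoted in this thread.
ERROR_MARKERS = ("frozen state error detected", "I/O error, dev nvme")
# Assumption: this line is among the first logged after a reboot.
BOOT_MARKER = "microcode updated early"


def parse_time(line, year=2021):
    """Parse the 15-character 'Aug 13 17:54:39' prefix; the year is assumed."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def gaps_before_reboot(lines, year=2021):
    """Return (last_error_time, boot_time, gap) for each boot preceded by NVMe errors."""
    last_error = None
    results = []
    for line in lines:
        if any(marker in line for marker in ERROR_MARKERS):
            last_error = parse_time(line, year)
        elif BOOT_MARKER in line and last_error is not None:
            boot = parse_time(line, year)
            results.append((last_error, boot, boot - last_error))
            last_error = None  # start fresh for the next boot cycle
    return results


if __name__ == "__main__":
    sample = [
        "Aug 13 17:54:39 Tower kernel: nvme nvme0: frozen state error detected, reset controller",
        "Aug 13 17:54:39 Tower kernel: blk_update_request: I/O error, dev nvme0n1, sector 1528760504 op 0x0:(READ) flags 0x80700 phys_seg 17 prio class 0",
        "Aug 13 17:54:40 Tower kernel: pcieport 0000:00:01.0: AER: device recovery successful",
        "Aug 13 18:23:29 Tower kernel: microcode: microcode updated early to revision 0x71a, date = 2020-03-24",
    ]
    for last_error, boot, gap in gaps_before_reboot(sample):
        print(f"last NVMe error {last_error}, next boot {boot}, gap {gap}")
```

To run it against a real file, pass `open("syslog", errors="replace")` instead of the sample list. If NVMe errors consistently turn up shortly before each reboot, that supports the NVMe device (or its slot) being the culprit.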
