
Server hangs / crashes during Parity Check after installing new parity drive


Solved by JorgeB


Hi everyone.  Long time lurker...

My server keeps either hanging or crashing (I don't know which) during parity checks. The system stays completely locked up until I power cycle it, and of course nothing shows in the logs. I enabled the syslog server to catch the failure, but I don't see the parity check starting at all.

The scheduled parity check started July 2 at 1 am, but I see nothing logged until I rebooted on July 3 at 16:22.
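For anyone chasing a similar silent hang, a quick way to confirm where the log went quiet is to look for the largest gap between consecutive syslog timestamps. This is only a rough sketch: the sample lines are made-up placeholders (not taken from the attached logs), the function names are hypothetical, and the year is an assumption because syslog timestamps in this format do not include one.

```python
from datetime import datetime

# Month abbreviations as they appear in classic syslog timestamps.
MONTHS = {name: i + 1 for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])}


def parse_ts(line, year=2023):
    """Parse the leading 'Mon D HH:MM:SS' timestamp; the year is assumed."""
    month, day, clock = line.split()[:3]
    hour, minute, second = (int(part) for part in clock.split(":"))
    return datetime(year, MONTHS[month], int(day), hour, minute, second)


def largest_gap(lines, year=2023):
    """Return (gap, before, after) for the longest silence, or None."""
    stamps = [parse_ts(line, year) for line in lines if line.strip()]
    best = None
    for before, after in zip(stamps, stamps[1:]):
        gap = after - before
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best


# Hypothetical placeholder entries, invented purely for illustration.
sample = [
    "Jul  2 00:40:00 Pebbles placeholder: entry A",
    "Jul  2 00:55:00 Pebbles placeholder: entry B",
    "Jul  3 16:22:00 Pebbles placeholder: entry C",
]

gap, before, after = largest_gap(sample)
print(f"Longest silence: {gap} (from {before} to {after})")
```

A long silence that ends at a reboot suggests the machine locked up hard enough that nothing could be written or forwarded, rather than logging simply being misconfigured.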

The parity check started automatically again when I rebooted; I tried to cancel it and everything hung.

I hard rebooted, paused and then cancelled the check. I then manually initiated a check with correction disabled, and everything hung again within a few minutes.

 

I recently upgraded my parity drive to a 12TB. The rebuild went fine, no problems. I don't think this is a PSU issue (everything was perfect prior to the upgrade) but again, I can't see anything in the logs.

 

I've attached my diags, along with the syslog server logs that capture the restarts, if someone could look and give me some pointers?

I appreciate it!

Thank you so much.

 

 

pebbles-diagnostics-20230703-1718.zip

Edited by andyberry
(spell check)

Hi @JorgeB, That's correct.

I actually did the rebuild twice without problems: first when I replaced the parity drive, then a second time after removing a smaller data drive and making a new config for the array.

Parity checks hung or crashed the system each time.

 

It's been a while since I've bothered to inspect logs after a parity check, but I thought the beginning and end of the check actually got logged in syslog, so I am surprised NOT to see the start captured. The crash isn't immediate; sometimes it will run for hours before the system becomes inaccessible.

 

I have also tried running the parity check in safe mode with all dockers and VMs stopped.  Same results.

 

 


Thanks @JorgeB.  I guess I'll open the case again and see if anything jumps out at me.

I did just update to 6.12 and initiated another parity check.  It hung again quickly:

 

Here are my last log entries before everything locked up:

Jul 4 07:31:09 Pebbles kernel: fbcon: i915drmfb (fb0) is primary device
Jul 4 07:31:09 Pebbles kernel: Console: switching to colour frame buffer device 210x65
Jul 4 07:31:09 Pebbles kernel: i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
Jul 4 07:32:03 Pebbles kernel: mdcmd (40): check nocorrect
Jul 4 07:32:03 Pebbles kernel: md: recovery thread: check P ...
Jul 4 07:32:31 Pebbles flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
Jul 4 07:34:12 Pebbles kernel: DMAR: DRHD: handling fault status reg 3
Jul 4 07:34:12 Pebbles kernel: DMAR: [DMA Read NO_PASID] Request device [06:00.0] fault addr 0x0 [fault reason 0x06] PTE Read access is not set
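The last entry is the interesting one: the DMAR fault names the PCI device that issued the bad DMA request. As a rough sketch (the regular expression is written against the exact line format shown above and may not cover other kernel versions; `find_dmar_faults` is a hypothetical helper name), the device address can be pulled out of a saved syslog like this:

```python
import re

# Matches the DMAR fault line format shown in the log excerpt above.
# Other kernels may format this line differently; that is an assumption.
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)[^\]]*\] "
    r"Request device \[(?P<device>[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\] "
    r"fault addr (?P<addr>0x[0-9a-fA-F]+) "
    r"\[fault reason (?P<reason>0x[0-9a-fA-F]+)\]"
)


def find_dmar_faults(lines):
    """Return one dict (op, device, addr, reason) per DMAR fault line."""
    faults = []
    for line in lines:
        match = DMAR_FAULT.search(line)
        if match:
            faults.append(match.groupdict())
    return faults


# The two DMAR lines quoted from the log above.
sample = [
    "Jul 4 07:34:12 Pebbles kernel: DMAR: DRHD: handling fault status reg 3",
    "Jul 4 07:34:12 Pebbles kernel: DMAR: [DMA Read NO_PASID] Request device "
    "[06:00.0] fault addr 0x0 [fault reason 0x06] PTE Read access is not set",
]

for fault in find_dmar_faults(sample):
    print(f"{fault['op']} fault from PCI device {fault['device']}")
```

Running `lspci -s 06:00.0` on the server should then show which controller sits at that address.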


 

I had one of the array drives on the Asmedia card, so I moved it to the onboard SATA controller. Cache and unassigned devices are now on the Asmedia.

I don't want to jinx it but I'm at 1:40 / 7% of a parity check without a crash...

I will let you know whether it succeeds.

Thank you so much for your help.  This looks promising.

-A

