Continuous System Crash while rebuilding after HDD swap


Solved by Rayce185


Hello all

 

My system has been running well for almost half a year now with 6x3TB and 6x4TB drives, with two of the 4TB drives serving as parity.

 

As I want to gradually upgrade the 3TB drives to 4TB too, I recently purchased 2x4TB. I powered down the system, swapped the first drive, powered back up, and rebuilt with Docker and VM on, with no issues.

230305_UNRAID_Array.thumb.PNG.9505af71317a5750e1bfd92e6bfd903a.PNG

 

After the first rebuild I powered back down, swapped the second drive, powered back up and began the second rebuild.

 

This time the system crashed about 3 hours / 40% in. Trying to rebuild after the crash led to another crash immediately.

 

So I unmounted the second disk, deleted the partition, precleared the disk and began another rebuild, this time without Docker or VM. Before hitting 2% the system crashed again. I am not able to do any rebuild, emulation or parity check without the system crashing less than 5 minutes after starting the array.

 

After watching carefully I did notice a CPU error popping up once or twice before crashing, but this doesn't always occur:

 

image.thumb.png.55fe3e6a0e4526a016af0bac6e73ceee.png

 

Shortly before the crash I had started "Fix Common Problems", but its log entries are unrelated.

 

Attached are the diagnostics and the latest syslog before the crash.

 

unraid-syslog-20230305-2104.zip
unraid-diagnostics-20230305-2159.zip

 

Can someone please help me get the system running again?

 

Thanks.

 

 

EDIT: Oh yeah, the syslog is also mirrored to flash. Here's a copy of it:

syslog

Edited by Rayce185
Added content
  • Solution
On 3/6/2023 at 10:43 AM, JorgeB said:

Try again after the reboot, if you get the same errors there's possibly a hardware problem.

As I said, since the system crashes, it was rebooting anyway.

 

But I have found the reason for the crashing:

 

I had a third GPU connected via a PCIe x1 riser. It seems that either that setup or the fact that so many PCIe lanes are occupied was making the entire system unstable. After removing the x1 riser the system is running stable again.
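
For anyone landing here with similar symptoms: a minimal sketch, in Python, of how a saved syslog copy (such as the one mirrored to flash) could be filtered for hardware-level kernel messages like the CPU error mentioned above. The keyword list is an assumption based on common Linux kernel message strings (machine check and PCIe AER reports); it is not taken from the attached logs, and `find_hardware_lines` and `print_hardware_lines` are hypothetical helper names.

```python
import re
from typing import Iterable, List

# Assumed keywords: strings the Linux kernel commonly uses for CPU machine
# checks and PCIe error reports. Adjust to whatever your own syslog shows.
HARDWARE_PATTERNS = re.compile(
    r"machine check|hardware error|\bmce\b|\baer\b|pcie bus error",
    re.IGNORECASE,
)


def find_hardware_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match any of the assumed hardware keywords."""
    return [line.rstrip("\n") for line in lines if HARDWARE_PATTERNS.search(line)]


def print_hardware_lines(path: str) -> None:
    """Print matching lines from a syslog file at the given (hypothetical) path."""
    # errors="replace" keeps the scan going if the file was cut off mid-write
    # by a crash and contains invalid bytes.
    with open(path, errors="replace") as fh:
        for hit in find_hardware_lines(fh):
            print(hit)
```

Calling `print_hardware_lines("syslog")` on an extracted copy of the log would list candidate lines to look at. An empty result does not rule out a hardware problem, since a hard crash can happen before anything is written.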
