[SOLVED] Boot looping after crash



I'm pretty new to Unraid, having only set up my instance last week. All was going well, and I was well into the process of copying data from my old NAS onto my new build when today I noticed robocopy was throwing a load of errors. The share had unmounted and the web UI was inaccessible. I couldn't even ping the box when it was in this state; it appeared to be completely dead (fans were still spinning, so no power loss). At this point I assumed something catastrophic had happened, so I took the usual step of rebooting the box.

 

To cut a very long story short, I couldn't get it to boot into a reliable, stable state after that. It has only booted with the web UI accessible once, and it crashed after a few minutes. For the most part it's boot looping without showing any errors. I can see it going through the boot steps (bzimage ok, bzroot ok), then it prints a load of white text that scrolls too fast for me to read and reboots. In the half a second the text is visible I can't see any obvious errors.

 

The build is a Ryzen 3600 on an Asus B450M-A motherboard with 16GB of DDR4 (2x 8GB). It's in a Node 804. I was running it headless but have put in a GPU to try to get to the bottom of this issue (which means I've had to unplug my HBA card, as there's only one PCIe slot on the motherboard). I've got a 550W PSU, which should be plenty for the hardware. I'm not normally running a GPU, and I've tested with all disks and the GPU disconnected, so I don't think it's PSU related, as it fails even at extremely low draw.

 

I've tried booting in safe mode with and without the GUI. I've rebuilt the USB drive with Unraid on it, both copying the config file and not. I've tried removing each stick of memory in turn in case it was faulty RAM. I've tested the memory and CPU with UBCD, with everything passing (interestingly, if I try to run memtest from the Unraid boot menu it immediately reboots, whereas I can run it fine from UBCD). I've also applied all the AMD fixes I could find (power management, turning off overclocking, updating the motherboard BIOS to the latest version, setting idle power etc.). I've tried different USB drives, in different ports.

 

The hardware (mobo, CPU, memory) was taken out of another machine where it worked fine. This, in combination with the hardware tests all passing, really makes me wonder what's going on, as it appears not to be a hardware issue.

 

Am I missing anything obvious? This was working fine a day ago but now will not boot no matter what I try. I'm at the point where I'm going to have to either start replacing the motherboard and CPU (I have a CPU I could use, but no motherboard that would fit in the case, and swapping the CPU will be a massive PITA as it's in another machine) or just give up.


I've done a bit more testing on this: I created an Ubuntu live disk and tried to boot from it, and it boot loops as well. Very briefly, before rebooting, it displays the error in the attachment (sorry for the rubbish quality).

 

This appears to be a CPU error; can someone confirm? It could still be a memory issue I guess, but I've tried both sticks separately in multiple slots with no change (i.e. it still boot loops) and run memtest for several hours with no errors. I also ran CPU testing and it didn't return any errors, so I'm really uncertain, with multiple data points contradicting each other. Is there a chance it's the motherboard? I'm just looking for some next steps.

 

I'm currently looking at trying another CPU (I've got a 5600X, but it's in another rig, so it's going to be a huge hassle), so I'm really open to suggestions at this point!

IMG_0278.PNG


Is the motherboard BIOS on the latest version? Is all overclocking or XMP turned off? Even if the memory itself is rated for a specific speed, the motherboard and CPU may only be capable of significantly less aggressive timings. Try turning the memory speed down to the lowest setting and see if the symptoms change.

 

Also, make sure the CPU-to-heatsink interface is sound, i.e. the correct amount of heatsink compound and spring tension holding the heatsink evenly against the CPU.


I had overclocking turned off in the BIOS and had applied all the Ryzen-specific suggestions that people make around power management etc. In the end I swapped in another CPU, a 5600X, and all the problems went away, so it appears to be a faulty CPU. The CPU was only a year old too, so it's a shame. I think I'm going to leave the 5600X in there and pick up a new CPU for the other machine. The 3600 also failed in the other machine, so it's dead.

