
No Boot after Trying Everything...


BKTEK
Go to solution Solved by BKTEK,


Hello!

 

I decided to rebuild my server's OS, which was bogging down like crazy. So I installed 6.11 to a USB device and fired it up. Things seemed fine - I installed a few essential plug-ins, notably Nvidia Drivers. Then later in the day I rebooted the server and it wouldn't boot properly. It hasn't been able to boot since. The motherboard is a Supermicro X11SSM-F.

 

I read up a lot on possibilities about why it might not boot. I thought maybe it was hanging trying to install the Nvidia driver. But since I can't get into the system anymore and can't SSH in, I'm not able to fix it.

 

Safe mode doesn't help, etc.

 

I have since tried booting the system with: 


different RAM (ECC and non-ECC)
different DIMM slots
different USB devices (identical devices which used to work fine)
different USB ports
USB3
USB2
using a backup of my OS before the crash
installing various other Unraid builds
ethernet plugged in
ethernet unplugged
removing the video card
Various UEFI options, including EFI and Legacy boot
hard drives
no hard drives

 

EDIT (additional info): It's worth noting about the RAM...the server originally had two 16GB ECC DIMMs in it (some years ago). At some point in the distant past one of the sticks just stopped showing up, so I could only see 16GB total...even in the UEFI. While troubleshooting all of this all day, I was able to get the UEFI to see 32GB in a lot of different slot combinations. I also tried each stick in each slot separately...to no avail.

 

In every case the system hangs, usually on random stuff. I worked on this for about 10 hours today and made no progress.

Any help would be appreciated. Thank you.

Edited by BKTEK

It stopped in at least 30 different places, literally. It always gets to the blue Unraid screen, where I have tried normal boot, safe mode, GUI, and GUI safe mode. It always continues past that for about 20 seconds or so and then is totally unresponsive. It appears to draw its last line of text but not necessarily the entire "idea". For example, there's a BMC error it sometimes mentions that I've seen many times. It contains the word "compensating", but that word is split between two lines, and sometimes it stops drawing text in the middle, so the output ends with "compe" and never finishes the sentence.

 

Sometimes it stops around the Nvidia driver install and complains about kernel taint. This only happens when I try to use my old build. It often freezes right after.

 

It seems like it's usually looking for hardware when it freezes (BMC, video card, sda, sdb, etc.).

 

After unplugging all of the drives it still errors.

 

Seems like it almost certainly has to be flash drives or RAM...and I've tried every permutation of those I can think of.

 

Thank you for your response.


On another note, which might help me diagnose things more...during the first boot of a new UNRAID install, what files are written to the USB drive? Because every time I put the drive back in my computer to reformat it and reinstall UNRAID to it, it has errors...so I believe the broken install/boot is corrupting the USB drive.

 

That might prove informational.

3 minutes ago, JorgeB said:

I believe nothing is written until you start changing settings.

I just tested this while you must have been writing your reply (serendipity) and can see that it is definitely writing to the USB. I put a new install of 6.10.3 in the server and booted it up. It wrote lots of things to the config:

 

folders:
modprobe.d
plugins-error
pools
shares
ssh
ssl
wireguard

files:
network-rules.cfg
(various others)

 

This is with no drives attached...so I can't see how it's seeing this stuff, including "wireguard."
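For anyone who wants to check this on their own flash drive, here is a rough sketch of how the comparison could be scripted rather than eyeballed. It hashes every file under a folder and compares a "before first boot" snapshot with an "after first boot" one. The function names `snapshot` and `diff_snapshots` are made up for this example and are not part of Unraid; the mount point you pass in is whatever your own machine uses for the USB drive.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            with open(full, "rb") as handle:
                digests[rel] = hashlib.sha256(handle.read()).hexdigest()
    return digests


def diff_snapshots(before, after):
    """Return sorted lists of added, removed, and changed relative paths."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(
        path for path in set(before) & set(after) if before[path] != after[path]
    )
    return added, removed, changed
```

Take one snapshot right after writing the fresh install, boot the server once, plug the drive back into the desktop, take a second snapshot, and pass both to `diff_snapshots`. Note that this only tracks files, so a newly created but empty folder would not show up in the result.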

 

Weird. Thanks for responding.

7 hours ago, JorgeB said:

Where does it stop in the boot process?

Some errors (just this morning):

 

Dirty bit is set - Fs not properly unmounted and data may be corrupt. Automatically removing the dirty bit.
There are differences between the boot sector and backup (mostly harmless)

 

 

Hangs:

 

ata4: SATA max UDMA/133 abar m2048@0xdd11d000 port 0xdd11d280 irq 31
[drm] Initialized ast 0.1.0 20120228 for 0000:06:00.0 on minor 0
mpt2sas_cm0:  0 1 1
(the three above hangs were simple reboots with no hardware changes...demonstrating the randomness of the hangs)

 

mount: /dev/loop1 mounted on /lib/modules.
mpt2sas_cm0: CurrentHostPageSize is 0: Setting default page size to 4k
(removed SAS): usb 1-5: new high-speed USB device number 2 using xhci_hcd
(no SAS, 2 SSDs attached): ast 0000:06:00.0: [drm] Analog VGA only

 

This is really crazy. Any help/insight would be great. I'm at a loss.

  • 3 weeks later...
  • Solution

After trying so many things for so long, I decided to simply wait for a very long time, and in fact I was able to boot in. One of the many problems (and ultimately the primary one) was that the IPMI on my Supermicro board seems to conflict with the video card I have in the box...so it seems that the video output simply stops at random places during boot. That's why I had so many random "hangs"...it would randomly stop updating the video on my monitor during the boot process. Strange.

 

Either way, I ultimately got it to work.
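Since the monitor turned out to be an unreliable indicator, a network probe is one way to tell a frozen screen from a frozen server. Below is a minimal sketch that polls the server from another machine until a TCP port answers. The port choices (22 for SSH, 80 for the web GUI), the retry counts, and the names `port_open` and `wait_for_boot` are all assumptions for illustration, so adjust them to whatever your server actually exposes.

```python
import socket
import time


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def wait_for_boot(host, ports=(22, 80), attempts=60, delay=10.0):
    """Poll until any port answers; return that port, or None if none ever does."""
    for attempt in range(attempts):
        for port in ports:
            if port_open(host, port):
                return port
        if attempt < attempts - 1:
            time.sleep(delay)
    return None
```

Calling something like `wait_for_boot("192.168.1.50")` (a placeholder address) from a desktop on the same network would return the first port that responds, which suggests the server finished booting even though the screen stopped updating. A `None` result after all attempts points toward a real hang, or toward the network not coming up.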

1 hour ago, BKTEK said:

After trying so many things for so long, I decided to simply wait for a very long time, and in fact I was able to boot in. One of the many problems (and ultimately the primary one) was that the IPMI on my Supermicro board seems to conflict with the video card I have in the box...so it seems that the video output simply stops at random places during boot. That's why I had so many random "hangs"...it would randomly stop updating the video on my monitor during the boot process. Strange.

 

Either way, I ultimately got it to work.

 

I have an IBM tower server and see something similar, but I don't even get the boot menu. I have it set to boot to GUI by default, so I just wait until I see the GUI on the server's video output. Since I got this server, I have not once seen the boot process on the output of the integrated video. But through IPMI I can see the entire boot process. It's quite strange, but it ultimately works.
