BKTEK, posted October 6, 2022

Hello! I decided to rebuild my server's OS, which was bogging down like crazy. So I installed 6.11 to a USB device and fired it up. Things seemed fine; I installed a few essential plug-ins, notably the Nvidia Drivers. Later in the day I rebooted the server and it wouldn't boot properly. It hasn't been able to boot since. The motherboard is a Supermicro X11SSM-F.

I read up a lot on why it might not boot. I thought it might be hanging while trying to install the Nvidia driver. But since I can't get into the system anymore and can't SSH in, I'm not able to fix it. Safe mode doesn't help, etc.

I have since tried booting the system with:
- different RAM (ECC and non-ECC)
- different DIMM slots
- different USB devices (identical devices which used to work fine)
- different USB ports
- USB3
- USB2
- a backup of my OS from before the crash
- various other Unraid builds
- ethernet plugged in
- ethernet unplugged
- the video card removed
- various UEFI options, including EFI and Legacy boot
- hard drives
- no hard drives

EDIT (additional info): It's worth noting something about the RAM. The server originally had two 16GB ECC DIMMs in it (some years ago). At some point in the distant past one of the sticks just stopped showing up, so I could only see 16GB total, even in the UEFI. While troubleshooting all day, I was able to get the UEFI to see 32GB in a lot of different slot combinations. I also tried each stick in each slot separately, to no avail.

In every case the system hangs, usually on random stuff. I worked on this for about 10 hours today and made no progress. Any help would be appreciated. Thank you.

Edited October 6, 2022 by BKTEK
JorgeB, posted October 6, 2022

Where does it stop in the boot process?
BKTEK (author), posted October 6, 2022

It has stopped in at least 30 different places, literally. It always gets to the blue Unraid boot screen, where I have tried normal boot, safe mode, GUI, and GUI safe mode. It always continues past that for about 20 seconds and then becomes totally unresponsive. It appears to draw its last line of text, but not necessarily the entire "idea". For example, there's a BMC error it sometimes mentions that I've seen many times. It contains the word "compensating", but that word is split across two lines, and sometimes it stops drawing text mid-word, so the screen ends with "compe" and never finishes the sentence.

Sometimes it stops around the Nvidia driver install and complains about kernel taint. This only happens when I try to use my old build. It often freezes right after. It seems like it's usually probing hardware when it freezes (BMC, video card, sda, sdb, etc.). Even after unplugging all of the drives it still errors. It seems like it almost certainly has to be the flash drives or the RAM, and I've tried every permutation of those I can think of.

Thank you for your response.
BKTEK (author), posted October 6, 2022

On another note, which might help me diagnose things: during the first boot of a new Unraid install, what files are written to the USB drive? Every time I put the drive back in my computer to reformat it and reinstall Unraid, it reports errors, so I believe the broken install/boot is corrupting the USB drive. That might prove informative.
JorgeB, posted October 6, 2022

Quoting BKTEK: "...during the first time a new UNRAID install gets used, what files are written to the USB drive?"

I believe nothing is written until you start changing settings.
BKTEK (author), posted October 6, 2022

Quoting JorgeB: "I believe nothing is written until you start changing settings."

I just tested this while you must have been writing your reply (serendipity), and it is definitely writing to the USB. I put a new install of 6.10.3 in the server and booted it up. It wrote lots of things to the config folder:

folders:
- modprobe.d
- plugins-error
- pools
- shares
- ssh
- ssl
- wireguard

files:
- network-rules.cfg
- (various others)

This is with no drives attached, so I can't see how it's seeing this stuff, including "wireguard." Weird. Thanks for responding.
BKTEK (author), posted October 6, 2022

Quoting JorgeB: "Where does it stop in the boot process?"

Last error (with no drives attached):
mount: /dev/loop1 mounted on /lib/modules.
BKTEK (author), posted October 6, 2022

Quoting JorgeB: "Where does it stop in the boot process?"

Some errors (just this morning):
Dirty bit is set - Fs not properly unmounted and data may be corrupt. Automatically removing the dirty bit.
There are differences between the boot sector and backup (mostly harmless)

Hangs:
ata4: SATA max UDMA/133 abar m2048@0xdd11d000 port 0xdd11d280 irq 31
[drm] Initialized ast 0.1.0 20120228 for 0000:06:00.0 on minor 0
mpt2sas_cm0: 0 1 1
(the three hangs above were simple reboots with no hardware changes, demonstrating the randomness of the hangs)
mount: /dev/loop1 mounted on /lib/modules.
mpt2sas_cm0: CurrentHostPageSize is 0: Setting default page size to 4k
(SAS card removed): usb 1-5: new high-speed USB device number 2 using xhci_hcd
(no SAS, 2 SSDs attached): ast 0000:06:00.0: [drm] Analog VGA only

This is really crazy. Any help/insight would be great. I'm at a loss.
JorgeB, posted October 6, 2022

Try booting a different OS, like Ubuntu or even Windows, to rule out hardware issues.
Solution: BKTEK (author), posted October 26, 2022

After trying so many things for so long, I decided to simply wait for a very long time, and in fact I was able to boot in. One of the many problems (and ultimately the primary one) was that the IPMI on my Supermicro board seems to conflict with the video card I have in the box, so the video output simply stops at random places during boot. That's why I had so many random "hangs": the video on my monitor would randomly stop updating during the boot process while the system itself kept booting. Strange. Either way, I ultimately got it to work.
jmztaylor, posted October 26, 2022

Quoting BKTEK: "...it would randomly stop updating the video on my monitor during the boot process."

I have an IBM tower server and see something similar, but I don't even get the boot menu. I have it set to boot to the GUI by default, so I just wait until the GUI appears on the server's video output. Since I got this server I have not once seen the boot process on the integrated video output, but through IPMI I can see the entire boot process. It's quite strange, but it ultimately works.