1 SATA and 2 M.2 SATA missing and no network access?


Rhuarc
Solved by Rhuarc


Ok, not exactly sure where to start. I shut down my server today to install a UPS on it. When I booted it back up, something had gone screwy. For the array I have an LSI HBA with 8 drives, plus 4 more hooked directly to SATA on the motherboard. For cache I have 1 SATA SSD and 2 M.2 SATA SSDs that are in PCIe x1 risers plugged into the motherboard. This had all been working great! I noticed that my RAM was showing as 32GB installed but only 16GB useable (in unRAID). I started doing some digging to see why that might be. I had read that the SMBus can be weird with some LSI cards and that taping the 5th and 6th pins fixes memory not being useable, so I tried this. It did not work.

 

Now the 2 M.2 drives (cache) and 1 of the array drives (a 16TB that should be parity1) that is hooked directly to the motherboard are no longer showing in unRAID.  They still show in the BIOS though!  Also, the onboard network interface is no longer working for me to remote into it.  I am hooked directly to it with KVM in order to diagnose.  I have tried basically reverting everything to exactly the config before, but those drives and the network interface will not come back!  I'm at a complete loss as to what has gone wrong and where to begin troubleshooting.

 

I've included a Diagnostics File.

 

Help me Obi-UnRAID, you're my only hope!

galaxy-diagnostics-20221217-0215.zip

21 minutes ago, JorgeB said:

The 2 port SATA controller is failing to initialize, connect parity to the other SATA controller, the free SATA port next to where your SATA SSD is connected.

 

For the NVMe devices it's more of a mystery, the driver is not loading and there's nothing in the syslog about why, very strange.

Thank you so much for your help!

 

I switched the parity drive to the only free SATA port left. Now it does an additional weird thing: when I boot, it reports several times that it could not find eth0. Literally all I did was change that one SATA cable!

 

Another strange side effect of doing that: the useable memory went from 32GB back down to 16GB! I don't know what I had done to get it to go back up to 32GB useable, but seemingly switching that cable made the difference again. I did also try a separate NIC to see if I could get network access. No dice. I've included diagnostics again.

 

The thing I'm so confused about is that this EXACT configuration worked earlier today. I'm not sure why it suddenly isn't loading drivers for the NVMe drives, or initializing the SATA ports, or finding either of the NICs!!! Earlier today I had the onboard and a separate NIC bonded in active-backup mode and everything worked great!

galaxy-diagnostics-20221217-0257.zip

Edited by Rhuarc

Ok, so a strange update to this topic. I needed to see if the problem was hardware (mainboard, NIC, HBA, etc.) or software (unRAID config), so I did a new trial install of unRAID on a different USB key. Everything is magically showing up now! Both M.2 drives in the PCIe brackets are detected, and the NIC is working normally and has a network connection. So I think somewhere along the line I truly FUBARed my config!

 

Is there a way to do a clean install of unRAID but then get back all of my docker settings, network configs, and system customizations?  Or at least compare the two installs to see what changes might have been made that are screwing it all up?
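For the "compare the two installs" part, one rough way to do it is to copy the config folder from each USB key onto another machine and diff the two trees. Below is a minimal Python sketch using only the standard library `filecmp` module. The function name `compare_trees` and the example paths are made up for illustration and are not anything unRAID ships.

```python
import filecmp
import sys
from pathlib import Path


def compare_trees(old_root, new_root):
    """Return (only_old, only_new, differing) lists of relative paths.

    Uses filecmp.dircmp, which compares files shallowly by default
    (type, size, mtime) and only reads content when those are inconclusive.
    """
    only_old, only_new, differing = [], [], []

    def walk(cmp, rel):
        only_old.extend(str(rel / name) for name in cmp.left_only)
        only_new.extend(str(rel / name) for name in cmp.right_only)
        differing.extend(str(rel / name) for name in cmp.diff_files)
        # Recurse into directories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, rel / name)

    walk(filecmp.dircmp(old_root, new_root), Path("."))
    return sorted(only_old), sorted(only_new), sorted(differing)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_flash.py /path/to/old/config /path/to/new/config
    old_only, new_only, changed = compare_trees(sys.argv[1], sys.argv[2])
    print("Only on old key:", *old_only, sep="\n  ")
    print("Only on new key:", *new_only, sep="\n  ")
    print("Different content:", *changed, sep="\n  ")
```

The files listed under "Different content" are the ones worth opening side by side, since a changed network or disk setting would show up there.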

 

Thank you in advance!


I will post up more findings tomorrow when I have time, but quick question.  What possible reason would cause an onboard NIC to stop working only when an LSI HBA is plugged in, and for 2 of my onboard SATA ports to stop working?  As soon as I take out the HBA suddenly the network works again and the 2 drives on the sata ports show back up!

5 hours ago, Rhuarc said:

I will post up more findings tomorrow when I have time, but quick question.  What possible reason would cause an onboard NIC to stop working only when an LSI HBA is plugged in, and for 2 of my onboard SATA ports to stop working?  As soon as I take out the HBA suddenly the network works again and the 2 drives on the sata ports show back up!

I would suggest carefully checking the motherboard manual as sometimes there are limitations on what can be plugged into what expansion slots.

4 hours ago, itimpi said:

I would suggest carefully checking the motherboard manual as sometimes there are limitations on what can be plugged into what expansion slots.

This is on my list to do tomorrow. The weird thing is that this exact configuration worked perfectly for a couple of weeks without any config problems, complete with 2 M.2 drives in PCIe x1 adapters, an HBA with 8 drives, 5 more drives plugged into SATA on the motherboard, and the onboard NIC and add-on NIC working in active-backup network mode. I'm just not sure what changed that now it is so all over the place!


Ok, here are the results of a lot of my research in a spreadsheet. I have included the motherboard manual as well.

 

It has the following:

*32GB RAM (2 of these kits) -  https://www.amazon.com/gp/product/B077DWDTZR/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1 

*LSI HBA - https://www.ebay.com/itm/162958581156 (I have tried two of these cards and verified that both have the 20.0.07 firmware)

*EVGA GTX 1060

*PCI to M.2 SSD Adapter x2 - https://www.amazon.com/gp/product/B09TGW3P82/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1 (using these and the SSD for cache)

*Ryzen 1600X

*2.5GB NIC - https://www.amazon.com/gp/product/B07Y2GWVB8/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1 (I had it working with both this and the onboard NIC in active-backup mode in unRAID, but have taken it out while trying to get all this troubleshooting done)

*12 HDD (2 parity, 10 for the array)

*1 SSD (Using this for the cache along with the 2 NVMe drives)

 

I had ALL of this working until I rebooted 2 days ago to hook up a UPS and now I am having all of these problems.  

I have tried to get this all working again using both my existing unRAID installation and a new trial installation that I created on a different USB key, to make sure I didn't do something weird in my install. Same result in both.

 

I am truly at a loss why it was working and now the exact same configuration is causing so many problems...

Screenshot 2022-12-18 125029.png

 

 

According to the manual, the M2_1 slot and the PCIE4 slot (in blue) are connected, and the M2_2 slot and SATA 3_3 (in yellow) are connected. The 4 red SATA ports are on one controller that is capable of RAID etc., and the 2 green SATA ports are on a different ASMedia controller.

 

Weirdly though at no point am I using either of the actual M2 slots so I don't think that should be disabling the PCIe slots or the onboard SATA 3_3. 

Motherboard.thumb.png.9587d722e5e4f4e40b1f7bfeb236b249.png

 

Fatal1ty AB350 Gaming K4.pdf

Edited by Rhuarc

I seem to have stumped the experts! Lol... Assuming I am able to get the HBA working if nothing is plugged into the SATA ports on the MB (which I will be testing tomorrow), would a 16-port HBA work in the PCIE4 slot shown above (which according to the manual is an x8 slot)?

 

I want to make sure I wouldn't be limiting the speed of the 16-port HBA by using it in an x8 slot. Then I would have basically all drives plugged into a single HBA except the 2 PCIe M.2 NVMe risers that go into x1 slots.
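As a back-of-the-envelope check on the x8 question, the raw link rate can be divided by the number of drives that are busy at once (for example during a parity check). This is only a sketch under stated assumptions: the per-lane figures come from the PCIe signalling rate and line encoding, but the slot's actual PCIe generation and electrical width have to come from the manual, and the 250 MB/s per-HDD figure is an assumed ballpark, not a measurement. The function names are made up for illustration.

```python
# (transfer rate in GT/s, line-encoding efficiency) per PCIe generation
ENCODING = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_mb_s(gen, lanes):
    """Raw one-direction link bandwidth in MB/s, before protocol overhead."""
    gt_per_s, efficiency = ENCODING[gen]
    return gt_per_s * 1000 * efficiency / 8 * lanes


def per_drive_mb_s(gen, lanes, drives):
    """Bandwidth share per drive if all drives transfer at the same time."""
    return link_bandwidth_mb_s(gen, lanes) / drives


if __name__ == "__main__":
    ASSUMED_HDD_MB_S = 250  # assumption: rough sustained rate of a large HDD
    for gen in (2, 3):
        share = per_drive_mb_s(gen, lanes=8, drives=16)
        verdict = "ok" if share >= ASSUMED_HDD_MB_S else "bottleneck"
        print(f"PCIe {gen}.0 x8, 16 drives: {share:.0f} MB/s per drive ({verdict})")
```

Under those assumptions, a PCIe 3.0 x8 link leaves roughly 490 MB/s per drive with all 16 active, while a PCIe 2.0 x8 link leaves about 250 MB/s per drive, which is right at the edge for fast HDDs. Real throughput will be somewhat lower than these raw numbers.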

 

Thanks again everyone for all the help!  I'm pulling out my hair! lol

 

Also, is there a good, cheaper 16-port HBA that is recommended? I'm of course still hoping to get this fixed for right now using my existing hardware, so keep those suggestions coming!!!

Edited by Rhuarc
Solution

Well,

 

I have no idea what fixed it, but after unplugging everything and gradually adding things back in, all of my drives are working again. I still occasionally see useable memory showing as only half of what is installed, but that is a separate issue that only seems to happen sometimes, so I will post a new thread about that problem.

 

Thank you all!
