Sudden problems with unRaid and Hardware


corpsell


Hi everyone, first time poster.

System specs:
Chassis: Dell R710
1 SSD for cache via PCIe adapter
3 WD RED drives via the backplane
UnRaid version: 6.5.3

On Saturday I performed a clean shutdown and all seemed well. After that, I reconfigured my network with a new router and patch panel.

Come today, Monday morning, everything was ready to go back online. I ran my new Cat 6 cables to the server and booted. In the verbose output (I think this is the right term) during boot, I saw all of my HDDs and the single SSD show up, but it also showed that the first three physical DIMMs on my motherboard are now all "dead". Shortly after, the system reported that there was no boot media. Odd. I went into the boot settings to make my flash drive the boot drive, but it didn't show up. Again, very odd, so I shut down, went inside to the built-in USB hub and reseated it. I booted again and voila, I was in unRaid. I then logged in, only to find that all 4 of my drives were no longer assigned, nor were they available in any disk's drop-down. Now I'm spooked. The verbose output still shows that the drives are all there and all have data, but unRaid can't see them.

Where do I go from here?

Also note: the server no longer shows up in my router's list of connected devices. I've also bypassed the patch panel and connected directly to my switch. Other devices on the switch function perfectly.

So to sum everything up quickly: it shut down just fine, and after the reboot unRaid can't see the NICs or the HDDs, and the first three DIMMs are now "bad". The flash drive works: when I boot into unRaid it says "missing x drive", etc. It knows what it's looking for but can't find it. :(

As for my syslog, how can I pull it now that I can no longer access the server over the network (noob question, I know)?

Thanks.


There are 9 DIMM slots per CPU, and I've moved the good DIMMs into the "bad" DIMM slots. The same error persists. With that in mind, would this even cause the problems with the HDDs and NICs not being seen by unRaid?

 

Also note that unRaid boots fine, so it seems like the RAM problems wouldn't affect the other issues. Am I wrong?

31 minutes ago, trurl said:

So is there no bad RAM in the system now? Everything ultimately goes through RAM.

I think we are good with RAM now. How can I confirm this via the console GUI (i.e. by viewing system specs)? Also, how can I pull the syslog onto removable media from the console, since I can't remote in due to the networking problem (noted during boot as "Cant find bond0")? bond0 is what I was using for my network via eth0 and eth1. Now in "network settings" I only see eth0 and not the other 3 interfaces. Again, I'd love to show you the syslog, but I don't know how to pull it.

Just pulled the syslog and also confirmed all 8 DIMMs (8x4GB) are working. No more RAM issues. See the attached file for the syslog.

syslog.txt


I had similar issues with my Tyan S7012 board. It turned out to be a pin in the socket. It sometimes worked, then BAM, it stalled and could not see 3 DIMMs. You might want to check your sockets. For me, reseating 3 times made the pin pop back into place. Regarding your drive issues: is your controller showing up in unRaid? (Tools->System Log)

2 minutes ago, Alphahelix said:

I had similar issues with my Tyan S7012 board. It turned out to be a pin in the socket. It sometimes worked, then BAM, it stalled and could not see 3 DIMMs. You might want to check your sockets. For me, reseating 3 times made the pin pop back into place. Regarding your drive issues: is your controller showing up in unRaid? (Tools->System Log)

 

Do you mean the system log I just attached, or another log? If the former, I don't know what to look for in that log file. If I search for "Controller" in the syslog, I see only mentions of USB controllers (unless I'm reading it wrong).

 

14 minutes ago, Alphahelix said:

Sorry, my bad, not System Log... System Devices. Look for LSI. My 9211 registers as: RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon] (rev 03)

 

So I do have this:

[1000:0072] 05:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT [Falcon] (rev 03)

 

I also see an ATA controller, which is what shows on boot in relation to my HDDs.
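
For anyone who wants to pick the fields out of a System Devices entry like the one above, here is a minimal Python sketch. It assumes the layout seen in this thread (bracketed vendor:device IDs, PCI address, class name, description, optional revision). The pattern and the names `parse_device_line` and `mentions` are illustrative only and are not part of unRaid.

```python
import re
from typing import Dict, Optional

# Assumed layout, based only on the entry quoted in this thread:
# "[vendor:device] bus:slot.func Class name: Description (rev NN)"
DEVICE_LINE = re.compile(
    r"^\[(?P<vendor_id>[0-9a-fA-F]{4}):(?P<device_id>[0-9a-fA-F]{4})\]\s+"
    r"(?P<address>[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s+"
    r"(?P<device_class>[^:]+):\s+"
    r"(?P<description>.*?)"
    r"(?:\s+\(rev (?P<revision>[0-9a-fA-F]+)\))?$"
)


def parse_device_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Split one device entry into named fields, or return None if it does not fit."""
    match = DEVICE_LINE.match(line.strip())
    return match.groupdict() if match else None


def mentions(info: Dict[str, Optional[str]], keyword: str) -> bool:
    """Case-insensitive check for a keyword (e.g. "LSI") in the description field."""
    return keyword.lower() in (info["description"] or "").lower()
```

Passing the entry above to `parse_device_line` and then calling `mentions(info, "LSI")` is one way to confirm that an LSI controller is listed, which is what the previous post suggests looking for.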


Here are some photos from the GUI showing my problem. Hope they help.

1_nic.jpg

I have 4 NICs connected, the first two of which were previously bonded as bond0. Now I only see eth0, and during bootup, after the BIOS, I see "The device bond0 cannot be found".

 

controller.jpg

I believe this to be my controller, even though none of my drives show up.

 

 

 

1 hour ago, trurl said:

You can get us more complete diagnostics by going to Tools - Diagnostics.

 

You can also test memory by booting into memtest. It is one of the options on the unRAID boot menu.

 

The memory seems fine now: 0 errors in memtest so far. I'll confirm when it finishes and also post the diagnostics file. What should I look for in my syslog in the meantime?
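
One way to narrow a long syslog down, sketched in Python and meant to be run on any machine that has a copy of the file. The search terms are assumptions for illustration, not an official checklist: "bond0" and "eth" come from this thread, while "mpt" and "sas" are guesses at what an LSI SAS controller driver might log. The function names are hypothetical too.

```python
import re
from typing import Iterable, Iterator, Sequence, Tuple

# Assumed search terms; adjust them to whatever hardware is being chased.
DEFAULT_TERMS = ("bond0", "eth", "mpt", "sas")


def find_lines(
    lines: Iterable[str], terms: Sequence[str] = DEFAULT_TERMS
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for every line containing any term, ignoring case."""
    pattern = re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


def print_matches(path: str, terms: Sequence[str] = DEFAULT_TERMS) -> None:
    """Print the matching lines of a saved syslog file with their line numbers."""
    with open(path, errors="replace") as handle:
        for number, text in find_lines(handle, terms):
            print(f"{number}: {text}")
```

Calling `print_matches("syslog.txt")` on the attached file would list only the lines that mention the network interfaces or the storage controller, which is usually a more manageable starting point than reading the whole log.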


Great news: after nuking the BIOS and reconfiguring the SAS card, the HDDs are back online. Everything is great now except the NICs. Now there aren't even any lights on the Ethernet ports. After my parity is done I'll go back into the BIOS and double-check the NIC settings. No matter what I do, with 1 or all 4 Ethernet ports connected, it always returns a self-assigned address.

 

Will report back. 
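
A "self-assigned" address generally means a link-local one (169.254.0.0/16 on IPv4), which a host typically falls back to when it gets no answer from a DHCP server. Below is a minimal Python sketch for checking an address with the standard `ipaddress` module. The function name and the sample addresses in the note are hypothetical.

```python
import ipaddress


def is_self_assigned(address: str) -> bool:
    """Return True if the address is link-local (169.254.0.0/16 for IPv4)."""
    return ipaddress.ip_address(address).is_link_local
```

For example, `is_self_assigned("169.254.10.20")` returns `True`, while `is_self_assigned("192.168.1.50")` returns `False`. A `True` result points at the link or DHCP path (cable, port, NIC, switch) rather than at the address settings themselves.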

On 9/11/2018 at 6:51 AM, corpsell said:

Great news: after nuking the BIOS and reconfiguring the SAS card, the HDDs are back online. Everything is great now except the NICs. Now there aren't even any lights on the Ethernet ports. After my parity is done I'll go back into the BIOS and double-check the NIC settings. No matter what I do, with 1 or all 4 Ethernet ports connected, it always returns a self-assigned address.

 

Will report back. 

Sorry guys, work got crazy. Final update: everything is now working great. I just had to toggle the NICs off in the BIOS and reboot, then toggle them back on and reboot again, to get the NICs back up. Thanks guys.

