[solved] Server unresponsive, had to hard reboot


BLuFeNiX


Would anyone care to take a look at my syslog? It contains multiple boots' worth of logs, with the last section being the boot after the hard reset. As best I can figure, it started to hang immediately following an SSD trim. I have left several newlines separating the hang-time logs from the following reboot (around line 26800).

 

This server build is only a few days old. I have done 5 passes with memtest86 (0 errors) prior to this happening, and the SSDs are brand new. I did have a UDMA error previously on one of the SSDs, but I swapped the cable and SATA port, and it's been fine since then.

 

Any advice is appreciated.

syslog.watt.txt

Edited by BLuFeNiX
Link to comment
38 minutes ago, Frank1940 said:

You should provide the Diagnostics file: Tools >>> Diagnostics

 

You might also give us a rundown on the actual hardware, since it is a brand new build.

 

Is the PS new or recycled?  

Sure thing, it's attached. The build was partially cannibalized from another server I built in 2016.

Old parts from 2016:
* ASRock AM1H-ITX mobo
* 2x 2TB Seagate ST2000DM001 HDDs
* 2x Crucial 8GB DDR3 1600 MT/s PC3-12800 CL11 Unbuffered UDIMM 240-Pin Desktop Memory CT102464BA160B
* AMD Athlon 5350 AD5350JAHMBOX 2.05 GHz Quad-core

New parts:
* 2x 2TB WD AV-GP SATA III Intellipower 64 MB Cache Bulk/OEM AV Hard Drive WD20EURX
* Syba 4 Port SATA III 6.0Gb/s PCIe Controller Card (identifies as Marvell 88SE9230 Chipset via lspci)
* Chenbro SR30169 Tower Case SR30169T2-250 (came with new 250 watt PSU preinstalled)
* 1x Crucial BX500 240GB 3D NAND SATA 2.5-Inch Internal SSD - CT240BX500SSD1Z
* 1x PNY CS900 240GB 2.5” SATA III Internal Solid State Drive (SSD) - (ssd7cs900-240-rb)

Misc details:
* The HDDs are connected to the PCIe expansion SATA card
* The SSDs are plugged directly into the mobo. One of them is in a "normal" SATA port, and the other is in an ASMedia SATA port (but in regular SATA mode). I moved it there from the normal SATA port because of the UDMA error I got.
* The flash drive is from a few years ago. Not certain of age, but basically never used. I have a new one that I will be swapping in.

 

Thanks for the help. Let me know if any more information would be useful.
 

watt-diagnostics-20190917-0305.zip

Link to comment

The Marvell chipsets have been an issue for quite some time. (For more information, Google "Unraid.net Marvell 88SE9230 Chipset".) You can try turning off any virtualization options in the BIOS. This thread gives more details:

 

 

You should also check the specs on that power supply. (You may have to physically look for the ratings on the unit, as I could not find any in the published specs.) There are very few folks using 250W supplies with Unraid. The +12V rails are usually the problem. The recommendation for Unraid use is that the PS be the single-rail type. (Commonly, the less expensive power supplies are dual rail, with the second bus dedicated to the GPU card.) You also have to be concerned about the total power consumption rating. (I have seen a PS rating where, say, a 300 watt supply has a 24A rating on the +12V buses, which at 12V is 288W and leaves only 12W for everything else.) You need an 8-to-12 ampere rating on the +12V bus to allow for the peak starting current when all of the drives spin up simultaneously.
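The arithmetic behind that 300 watt example, and the spin-up estimate, can be sketched in a few lines of Python. This is only a rough illustration: the function names are made up for this post, and the 2 A per drive spin-up figure is an assumed placeholder, not a value from any datasheet. Check the label on your own PSU and the spec sheets for your own drives.

```python
def rail_watts(amps, volts=12.0):
    """Power delivered by one rail at its rated current."""
    return amps * volts


def watts_left_for_other_rails(total_watts, rail_amps, rail_volts=12.0):
    """Total PSU rating minus what the +12V rating already accounts for."""
    return total_watts - rail_watts(rail_amps, rail_volts)


def spinup_amps(drive_count, amps_per_drive=2.0):
    """Peak +12V current if every drive spins up at the same moment.

    amps_per_drive is an ASSUMED placeholder; use the datasheet value.
    """
    return drive_count * amps_per_drive


if __name__ == "__main__":
    # The 300 W / 24 A example from the post: 24 A * 12 V = 288 W.
    print(watts_left_for_other_rails(300, 24))
    # Four HDDs, as in this build, at the assumed 2 A each.
    print(spinup_amps(4))
```

With the assumed 2 A per drive, four hard drives land at the low end of the 8-to-12 ampere range mentioned above.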

 

 

Link to comment
31 minutes ago, Frank1940 said:

The Marvell chipsets have been an issue for quite some time. (For more information, Google "Unraid.net Marvell 88SE9230 Chipset".) You can try turning off any virtualization options in the BIOS.

...

You should also check the specs on that power supply.

...

Thanks! Unraid's information panel told me IOMMU was disabled, but I've just turned off SVM in the BIOS to be safe. I'll see if I can get it to run like this, and if it happens again I'll try swapping the PSU. I will update here in several weeks, in case it's able to help anyone else down the road.

Link to comment
4 minutes ago, jonathanm said:

Switching to an LSI chipset controller would likely solve most of your issues.

I thought about suggesting that, but I also saw this in the spec sheet for his MB:

 

Quote

1 x PCI Express 2.0 x16 Slot (PCIE1 @ x4 mode)

I was not sure that an LSI card would be happy with this. Perhaps you could provide more insight.

Link to comment
1 minute ago, jonathanm said:

Since it's new, can you return the Marvell-based controller? Switching to an LSI chipset controller would likely solve most of your issues.

I'm considering it. Gonna see how the stability is for a week or so, and if it doesn't freeze up I might return the card and try turning virtualization back on with a different card.

Is LSI the gold standard for Unraid?
 

1 minute ago, johnnie.black said:

x4 is fine for an LSI, less bandwidth but enough for most HDDs.
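
To put the x4 point in numbers, here is a rough Python sketch of the usable bandwidth of a PCIe 2.0 slot running in x4 mode. It relies on the PCIe 2.0 signalling rate of 5 GT/s per lane with 8b/10b encoding. The function names are made up for illustration, and the eight-drive count is an assumption (a typical 8-port HBA), not something stated in this thread.

```python
def pcie2_usable_mb_per_s(lanes):
    """Usable bandwidth of a PCIe 2.0 link in MB/s, one direction.

    5 GT/s per lane = 5000 Mbit/s raw; 8b/10b encoding keeps 8 of every
    10 bits as payload; divide by 8 to convert bits to bytes.
    """
    raw_mbit = lanes * 5000
    payload_mbit = raw_mbit * 8 // 10
    return payload_mbit // 8


def per_drive_mb_per_s(lanes, drive_count):
    """Even share of the link when all drives transfer at once."""
    return pcie2_usable_mb_per_s(lanes) // drive_count


if __name__ == "__main__":
    print(pcie2_usable_mb_per_s(4))      # whole x4 link
    print(per_drive_mb_per_s(4, 8))      # assumed 8 drives, all busy
```

Protocol overhead takes a bit more off the top in practice, so treat these as upper bounds.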


Is there a particular card you'd all recommend?

Link to comment

You can get LSI cards from many different vendors on eBay, and the cost is quite reasonable: under US$90. Read the offers carefully and vet the vendors with care, as there are some 'bad' actors selling counterfeit cards. I believe that many of the LSI cards that Johnnie listed are discontinued, which should be a clue when someone is selling 'new' ones. If you are uncomfortable with flashing (or crossflashing), some vendors will provide cards with it already done. Be careful on eBay with the LSI model numbers. Some vendors will list the card model number based on the hardware, but the LSI part number also refers to the software/firmware installed on the card, and LSI RAID software/firmware versions may not work with Unraid.

Edited by Frank1940
Link to comment
3 minutes ago, BLuFeNiX said:

I want to make sure I'm reading this correctly. You're saying that specific versions of LSI firmware might not work with Unraid? Is this wiki page adequate for checking? https://wiki.unraid.net/Hardware_Compatibility

Yes. The part numbers that @johnnie.black listed were the ones that have the LSI IT-mode software/firmware installed on them at the factory. (Note that LSI has been bought by a couple of different entities since 2014.) I (personally) would be very careful in using that compatibility list, as it is not really maintained and obsolete information is not being removed. See this notice, which is at the beginning of that wiki page:

Quote

 

Please use this page with caution! It was updated much more in the days of v4 and v5, has seen little updating since the advent of v6. That means many hardware recommendations may be obsolete.

The following list is compiled by the unRAID user community. While it is mostly accurate, it is not definitively so, as it cannot be guaranteed that users have the time, expertise or diligence to test and report back all aspects. It is recommended that if you are using this list, you do so in conjunction with heavy use of the forum.

 

 

Link to comment
7 minutes ago, Squid said:

More that some cards will ship with ancient and/or buggy versions of the firmware.  

This version seems to work fine for many people: FWVersion(20.00.07.00). The better vendors will spell out exactly what you are getting. If you are comfortable with flashing the firmware, it does not matter much what software/firmware is installed. However, you must be aware enough to check and take any appropriate action required.
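
If you want to compare a version string from a listing (or from your own card) against that one, a minimal Python sketch might look like the following. The helper names are hypothetical, and it assumes the version appears in the same `FWVersion(a.b.c.d)` form quoted above; where that string comes from on your system is up to you.

```python
import re

# The version reported in this thread as working for many people.
KNOWN_GOOD = (20, 0, 7, 0)

_FW_PATTERN = re.compile(r"FWVersion\((\d+(?:\.\d+)*)\)")


def parse_fw_version(text):
    """Return the firmware version as a tuple of ints, or None if absent."""
    match = _FW_PATTERN.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_older_than_known_good(text):
    """True if the version found in text sorts before KNOWN_GOOD."""
    version = parse_fw_version(text)
    if version is None:
        raise ValueError("no FWVersion(...) string found")
    return version < KNOWN_GOOD


if __name__ == "__main__":
    print(parse_fw_version("FWVersion(20.00.07.00)"))
    print(is_older_than_known_good("FWVersion(19.00.00.00)"))
```

Comparing tuples of integers avoids the trap of comparing the dotted strings as text. An older result only means the card is a candidate for flashing, not that it will necessarily misbehave.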

Link to comment
6 hours ago, Frank1940 said:

This version seems to work fine for many people: FWVersion(20.00.07.00). The better vendors will spell out exactly what you are getting. If you are comfortable with flashing the firmware, it does not matter much what software/firmware is installed. However, you must be aware enough to check and take any appropriate action required.

Gotcha, thanks. I'm fine with flashing firmware. If the server is stable with virtualization turned off I'll probably leave it be though :)

Link to comment
  • 3 weeks later...
