New setup help


gumby327


I had an AMD Ryzen 7 5700X and then a *G.  The problems mounted to the point where the whole thing is no longer worth experimenting on.  I went from bit-rot to CRC failures on over 10 drives, and I had drives error out with little explanation.  I changed from a Marvell controller to an HBA SAS card and I changed cables, but nothing would last more than a week.  It was a total waste of time and thousands of dollars.  I have an Intel machine across the workshop that is stable and has never had a single problem.  It even works fine with the disks that failed on either of my two AMD Ryzen chips.  The processor sensors have never been detected, I cannot load drivers... At the end of the horrible experience I have embraced change.

 

So, it is time for a clean slate.  I am thinking of building an Intel server.  My parts list is:

https://www.amazon.com/dp/B09D1HDPQT/?coliid=IPY2LP8W0O5IH&colid=2VOOCJZIV4PU5&psc=1&ref_=lv_ov_lig_dp_it

https://www.amazon.com/dp/B086MN2XYL/?coliid=I1NXKFT148T5KX&colid=2VOOCJZIV4PU5&psc=1&ref_=lv_ov_lig_dp_it oops, I need the one with the IGPU

 

Are there any problems with this combination?

4 hours ago, gumby327 said:

oops, I need the one with the IGPU

Yes, get the non-F model.  As for the board, it should work fine; I'm just not sure about the NIC.  It should be fine with v6.10-rc but might not be with v6.9.x, and it's still a Realtek NIC; Intel would be better.  Also consider getting the non-WiFi model since you won't be using that, unless you want to use it with a VM.


I run a 5700G with zero issues.  By chance, have you tested or swapped out your RAM?  Clearly the drives, CPU or disk controller were not the issue, which narrows it down quite a bit.

 

I have run an A10-6790, 2200G, 1600AF, 2600, 2700X and the 5700G on a variety of motherboards without a single issue.  I do understand the frustration and wanting to try something different.


That is haunting me, but the pattern I am seeing is that disk 1 and/or 3 start throwing CRC errors.  I take them out and put in a brand new drive, and today it lasted about 7 hours before it suddenly started going nuts with CRC errors.  I have moved the two cables around, but nothing works: whatever drive I stick in that spot fails.

 

NODE 804:

           == parity1      == disk4
           == parity2      == disk3
air --->   == disk6        == disk2   air --->
           == disk5        == disk1
                           |               |
           wires           |      PSU      |
                           |_______________|

 

Disk 1 is closest to the case cover.
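
The "whatever drive I put in that spot fails" pattern is the key diagnostic clue, and it can be written down as a small check. The sketch below is only an illustration: the function name, the serial numbers and the error counts are all made up, and it assumes you have noted which drive sat in which bay and how many new CRC errors it picked up there.

```python
from collections import defaultdict

def crc_fault_follows(observations):
    """Classify where CRC errors follow, given (slot, drive_serial, new_crc_errors) tuples.

    Returns "slot", "drive", or "unclear". All names here are hypothetical.
    """
    failing_drives_by_slot = defaultdict(set)
    failing_slots_by_drive = defaultdict(set)
    for slot, serial, new_errors in observations:
        if new_errors > 0:
            failing_drives_by_slot[slot].add(serial)
            failing_slots_by_drive[serial].add(slot)

    # Several different drives erroring in one bay points at that bay's path
    # (cable, backplane, controller port, power), not at the drives.
    slot_suspect = any(len(s) > 1 for s in failing_drives_by_slot.values())
    # One drive erroring in several bays points at the drive itself.
    drive_suspect = any(len(s) > 1 for s in failing_slots_by_drive.values())

    if slot_suspect and not drive_suspect:
        return "slot"
    if drive_suspect and not slot_suspect:
        return "drive"
    return "unclear"

# Made-up log: three different drives all picked up errors in the disk1 bay.
observations = [
    ("disk1", "SERIAL-A", 14),
    ("disk1", "SERIAL-B", 9),
    ("disk1", "SERIAL-C", 22),
    ("disk2", "SERIAL-D", 0),
]
print(crc_fault_follows(observations))
```

With a log like this the answer is the slot path, which is why swapping in new drives does not help.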

 

I don't know.  Disk1 is literally 10 hours old now and it went out right away; disk3 is still fine.  Disks 7 and 8 are on SATA and sitting on the floor in front of the case.  So... let me take the RAM out of server "Pokey" and place it in "Gumby", and take all of the RAM out of "Gumby" and set it aside.  I don't want to kill Pokey since it is my backup to the media library.  The problem I may have is the thermal pad on my CPU: the cooler has to come off to get at my RAM.

 

Did you ever get your CPU temp to report out?


CPU temps report with the MSI and ASRock boards but did not with the Gigabyte board.  Motherboard temps are spotty with the MSI: sometimes they work after a reboot and sometimes they do not.  Memory can be funny, but it should have reported an error, though that is not guaranteed.  Is the power supply new(ish), and does swapping the power from drive to drive cause the error to follow the power cables?  If it were me, even if I went with a different board and CPU, I would still be troubleshooting this one; there is too much money invested to just leave it alone.

 

FWIW, the 5700G is the best CPU I have used to date: more than enough power, and power consumption is really low.  The 2700X was a power hog in comparison.  I dropped 40 watts of total power with the CPU and cooler change, and temps run from 25-30C, whereas the 2700X would idle at 50C even with an Arctic Liquid Freezer II 240 on it, and I hated having a water cooler on it.


I wonder what would happen if I placed this array on the other board (the Intel one) and moved this dongle (flash drive) over there.  It is another chipset, much older, and has fewer cores.  But that is supposed to be the power of unRAID.

8 minutes ago, gumby327 said:

I wonder what would happen if I placed this array on the other board (the Intel one) and moved this dongle (flash drive) over there.  It is another chipset, much older, and has fewer cores.  But that is supposed to be the power of unRAID.

Basic array functions should work fine; the only issues would be if you have passed through hardware for VMs.

14 minutes ago, gumby327 said:

I wonder what would happen if I placed this array on the other board (the Intel one) and moved this dongle (flash drive) over there.  It is another chipset, much older, and has fewer cores.  But that is supposed to be the power of unRAID.

 

That is basically all I have ever done when changing CPU/motherboard/case.  Now you know the RAM is not the issue.


New update, and you are probably not going to believe this.  The NODE 804, when fully loaded with 12 hard drives, may have too much case twist.  Unfortunately I broke the IT rules and tried two things at once.  #1, I put half the questionable RAM in Pokey and the other half in Gumby.  Then I swapped the SAS connection spots and pressed the controller card down into the slot ... it was not seated correctly.  Now I look at a screenshot of my UDMA CRC errors in Unraid System Dashboard V2 and I see we are now rock steady.  For five minutes.  I am even watching 4 movies at the same time, running mover and firing up my surveillance FTP server ...  It has been going for 10 minutes now and not a single CRC error has been detected.
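
The UDMA CRC count on the dashboard is SMART attribute 199, and its raw value never goes back down, so "rock steady" means the number stops climbing between two readings. As a rough sketch, assuming you have saved the attribute table that `smartctl -A` prints for an ATA drive before and after a stress run (the sample text and values below are made up, and the helper name is hypothetical):

```python
def crc_error_count(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if it is not listed."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns:
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None

# Made-up readings in the attribute-table layout printed by `smartctl -A`.
BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       10
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       37
"""
AFTER = BEFORE.replace("       37", "       37")  # unchanged: no new errors

new_errors = crc_error_count(AFTER) - crc_error_count(BEFORE)
print("new CRC errors during the test:", new_errors)
```

A longer window than ten minutes is needed before trusting the result, since the earlier failures took hours to show up.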

 

So, to me right now it looks like the NODE 804 case is too weak to handle being carried up my ladder and placed on the shelf when I am done working on it, without checking all the cards for seating every time.


New drive is in.  What I did was take array drive8 and assign it to the slot for drive1, so it is on the motherboard SATA now.  It has been running clean for a couple of hours.  What I am not sure about is that the LSI Broadcom SAS 9300-8i is x8 and it is in an x4 slot.  I wonder if it is doing a multiplier for the 4 lanes, and if that is the case, that has been noted as a problem with unRAID.

16 hours ago, gumby327 said:

What I am not sure about is the LSI Broadcom SAS 9300-8i is x8 and it is in a x4 slot.

Except for limiting the total bandwidth, that won't be a problem, i.e., that by itself is not a reason for stability issues.  The card can only use the lanes the motherboard slot provides; it can't do anything to try to use more than the available number.
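
To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. It assumes the slot runs at PCIe 3.0 speed (8 GT/s per lane with 128b/130b encoding) and that each hard drive peaks at about 250 MB/s sequential; both figures are assumptions rather than measurements from this system, and protocol overhead is ignored.

```python
def pcie3_bandwidth_mb_s(lanes):
    """Approximate usable PCIe 3.0 bandwidth in MB/s for a given lane count."""
    transfers_per_second = 8e9   # PCIe 3.0: 8 GT/s per lane
    encoding = 128 / 130         # 128b/130b line encoding
    return lanes * transfers_per_second * encoding / 8 / 1e6

drives = 8
per_drive_mb_s = 250             # assumed peak sequential rate per HDD
demand = drives * per_drive_mb_s

x4 = pcie3_bandwidth_mb_s(4)
x8 = pcie3_bandwidth_mb_s(8)
print(f"x4 link: about {x4:.0f} MB/s, x8 link: about {x8:.0f} MB/s")
print(f"{drives} drives at {per_drive_mb_s} MB/s need about {demand} MB/s")
print("fits in the x4 link" if demand <= x4 else "x4 link is a bottleneck")
```

Under these assumptions eight hard drives stay well inside an x4 link, and even when a narrower link does saturate, the result is slower transfers rather than CRC errors.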


I have had Gumby running steady with no errors for over two days and have locked in the actual problem.  It was an HBA SAS PCIe x8 card resting in a PCIe x4 slot.  I always knew it would have restricted bandwidth in that slot, but what I had never guessed is that the strategy it was using was port multiplier.  It says half lanes, so multiply for the other SAS rail.  It was fairly OK for 7 drives, but when you placed an 8th drive in you started getting CRC errors.  Today I am going to confirm that as the problem by placing it in an x16 GPU slot, hooking up the 8th drive and doing a parity check.  That will tell me whether the machine would make a good backup server.

 

Second thing I will do is build out Gumby with more premium specs.

 

I ordered a new parts list for Gumby:

   https://www.amazon.com/gp/product/B09GP7V2W5/ref=ppx_yo_dt_b_asin_title_o03_s00?ie=UTF8&psc=1

   https://www.amazon.com/gp/product/B00Q2Z11QE/ref=ppx_yo_dt_b_asin_title_o02_s00?ie=UTF8&psc=1

   https://www.amazon.com/gp/product/B07V6132NX/ref=ppx_yo_dt_b_asin_image_o01_s00?ie=UTF8&psc=1

   https://www.amazon.com/gp/product/B00HS23QZO/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1

 

The existing Node 804 is being demoted to the backup "Pokey" server and getting one of my two HBA SAS controller cards.  Since its only second x16-size slot is wired as x4, it can never be a VM machine as well as a large array server.  So, I have a spare AMD Ryzen 5 5600X 6-core, 12-thread processor.  That will make it stable and powerful beyond its needs as a backup and reverse proxy.

12 hours ago, gumby327 said:

I really wish someone had helped me catch my mistake.  That is RAID, not HBA.  I just thank my lucky stars I did not install it and learn afterwards, once it had initialized a RAID array and all my data was gone.

 


So, the experiment to place it in an x16 slot, load it up with 8 drives and run it was a fail.  Shortly after the array rebuilt its failed drive, another one died.  So, I know that when this becomes my backup server, there will be no parity, and it will have only 5 drives.  In fact, for that case it does not even need the SAS HBA, so I will probably just set that card aside.  Either the drive allocation in the motherboard or chip is bad, or the card has a flaw.  The new motherboard has a lot more strength and it will use a full mini SAS cable, not these tiny ones.  But that board has 8 SATA ports out of the gate, so all I will need off the add-on card is 3 or 4 more lanes.

19 hours ago, gumby327 said:

It was fairly OK for 7 drives, but when you placed a 8th drive in you started getting CRC errors. 

As mentioned, using an x4 slot won't cause that kind of problem.

 

6 hours ago, gumby327 said:

So, the experiment to place it in a x16 slot, load it up with 8 drives and run it was a fail. 

Not surprising; the problem could be the HBA itself or some other issue, such as power.

On 3/6/2022 at 3:31 AM, JorgeB said:

As mentioned, using an x4 slot won't cause that kind of problem.

 

Not surprising; the problem could be the HBA itself or some other issue, such as power.

Well, brand new motherboard and same old problem, so the only remaining explanation is that the HBA controller card is bad.
