gumby327 Posted March 2, 2022 (edited) I had an AMD Ryzen 7 5700X and then a *G. The problems mounted to the point where the whole thing is no longer worth experimenting on. I went from bit-rot to CRC failures on over 10 drives, and I had drives error out with little explanation. I changed from Marvell to a SAS HBA and I changed cables, but nothing would last more than a week. It was a total waste of time and thousands of dollars. I have an Intel machine across the workshop that is stable and has never had a single problem. It even works fine with the disks that failed on either of my two AMD Ryzen chips. The processor sensors have never been detected, I cannot load drivers... At the end of this horrible experience I have embraced change. So, it is time for a clean slate. I am thinking of building an Intel server. My parts list is: https://www.amazon.com/dp/B09D1HDPQT/?coliid=IPY2LP8W0O5IH&colid=2VOOCJZIV4PU5&psc=1&ref_=lv_ov_lig_dp_it https://www.amazon.com/dp/B086MN2XYL/?coliid=I1NXKFT148T5KX&colid=2VOOCJZIV4PU5&psc=1&ref_=lv_ov_lig_dp_it oops, I need the one with the iGPU. Are there any problems with this combination? Edited March 2, 2022 by gumby327
JorgeB Posted March 2, 2022 4 hours ago, gumby327 said: "oops, I need the one with the iGPU" Yes, get the non-F model. As for the board, it should work fine. I'm just not sure about the NIC: it should be fine with v6.10-rc but might not be with v6.9.x, and it's still a Realtek NIC, Intel would be better. Also consider getting the non-WiFi model since you won't be using that, unless you want to use it with a VM.
gumby327 (Author) Posted March 2, 2022 Let me get my order in then. I really appreciate you folks.
SirReal63 Posted March 2, 2022 I run a 5700G with zero issues. By chance, have you tested or swapped out your RAM? Clearly the drives, CPU, or disk controller were not the issue, which narrows it down quite a bit. I have run an A10-6790, 2200G, 1600AF, 2600, 2700X, and the 5700G on a variety of motherboards without a single issue. I do understand the frustration and wanting to try something different.
gumby327 (Author) Posted March 2, 2022 That is haunting me, but the pattern I am seeing is that disk 1 and/or disk 3 start throwing CRC errors. I take them out and put in a brand new drive, and today it lasted about 7 hours before it suddenly started going nuts with CRC errors. I have moved the two cables around, but nothing works: whatever drive I stick in that spot fails. NODE 804: == parity1 == disk4 == parity2 == disk3 air ---> == disk6 == disk2 air ---> == disk5 == disk1 | | wires | PSU | _______________ Disk 1 is closest to the case cover. I don't know, disk1 is literally 10 hours old now and it went out right away. disk3 is still fine. 7 and 8 are SATA and on the floor in front of the case. So... let me take the RAM out of server "Pokey" and place it in "Gumby", and take all of the RAM out of "Gumby" and set it aside. I don't want to kill Pokey since it is my backup to the media library. The problem I may have is the thermal pad on my CPU: the cooler has to come off to get at my RAM. Did you ever get your CPU temp to report?
SirReal63 Posted March 2, 2022 CPU temps report with the MSI and ASRock boards but did not with the Gigabyte board. Motherboard temps are spotty with the MSI: sometimes they work after a reboot and sometimes they do not. Memory can be funny, but it should have reported an error, though that is not guaranteed. Is the power supply new(ish), and does swapping the power from drive to drive cause the error to follow the power cables? If it were me, even if I went with a different board and CPU, I would still be troubleshooting this one. There is too much money invested to just leave it alone. FWIW, the 5700G is the best CPU I have used to date, with more than enough power and really low power consumption. The 2700X was a power hog in comparison. I dropped 40 watts of total power with the CPU and cooler change, and temps run from 25-30C, whereas the 2700X would idle at 50C even with an Arctic Liquid Freezer II 240 on it, and I hated having a water cooler on it.
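One way to organize the swap testing suggested above is to log each run and look for the part that shows up in every failing run but in no clean run. A minimal Python sketch, assuming hypothetical component labels (`drive`, `data_cable`, `power_lead`, `port`) and made-up sample runs that are not taken from this thread:

```python
from typing import Dict, List, Set, Tuple

# One test run: the parts that were in use, and whether CRC errors appeared.
Observation = Tuple[Dict[str, str], bool]


def suspects(observations: List[Observation]) -> Set[Tuple[str, str]]:
    """Return parts present in every failing run and absent from every clean run."""
    failing = [set(parts.items()) for parts, errors in observations if errors]
    clean = [set(parts.items()) for parts, errors in observations if not errors]
    if not failing:
        return set()
    # Start from what all failing runs have in common...
    common = set.intersection(*failing)
    # ...then clear anything that also took part in a clean run.
    for parts in clean:
        common -= parts
    return common


# Hypothetical runs: the error stays with power lead "p1" across two drives.
runs: List[Observation] = [
    ({"drive": "A", "data_cable": "c1", "power_lead": "p1", "port": "hba0"}, True),
    ({"drive": "B", "data_cable": "c2", "power_lead": "p1", "port": "hba0"}, True),
    ({"drive": "B", "data_cable": "c2", "power_lead": "p2", "port": "hba0"}, False),
]

if __name__ == "__main__":
    print(suspects(runs))
```

This only narrows things down when one variable changes per run. An empty result means the logged runs do not single out any one part yet.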
gumby327 (Author) Posted March 2, 2022 OK, I ran it with the other server's RAM for about 10 minutes or so, and the CRC errors are still ticking up about every second or so, even with different RAM.
gumby327 (Author) Posted March 2, 2022 (edited) I wonder what would happen if I placed this array on the other board (Intel one) and placed this dongle (flash drive) over there. It is another chipset and much older, and fewer cores. But that is supposed to be the power of unRAID. Edited March 2, 2022 by gumby327
JonathanM Posted March 2, 2022 8 minutes ago, gumby327 said: "I wonder what would happen if I placed this array on the other board (Intel one) and placed this dongle (flash drive) over there. It is another chipset and much older, and fewer cores. But that is supposed to be the power of unRAID." Basic array functions should work fine. The only issues would be if you have passed through hardware for VMs.
SirReal63 Posted March 2, 2022 14 minutes ago, gumby327 said: "I wonder what would happen if I placed this array on the other board (Intel one) and placed this dongle (flash drive) over there. It is another chipset and much older, and fewer cores. But that is supposed to be the power of unRAID." That is basically all I have ever done when changing CPU/motherboard/case. Now you know the RAM is not the issue.
gumby327 (Author) Posted March 2, 2022 New update, you are probably not going to believe this. The NODE 804, when fully loaded with 12 hard drives, may have too much case twist. Unfortunately I broke the IT rules. I tried two things at once. #1, I put half the questionable RAM in Pokey and the other half in Gumby. Then I swapped the SAS connection spots and pressed the controller card down into the slot ... it was not seated correctly. Now I look at a screenshot of my UDMA CRC errors in Unraid System Dashboard V2 and I see we are now rock steady. For five minutes. I am even watching 4 movies at the same time, running mover, and firing up my surveillance FTP server ... It has been going for 10 minutes now without a single CRC error detected. So, right now it looks to me like the NODE 804 case is too weak to be carried up my ladder and placed on the shelf when I am done working on it, unless I check the seating of all the cards every time.
SirReal63 Posted March 2, 2022 An improperly seated card can really mess with things. If that is all it is, then you got lucky. Hopefully you can restore the drives, or at least make sure all of them are working correctly.
gumby327 (Author) Posted March 2, 2022 Yes, but at the half-hour mark it started counting them (CRC errors) up again.
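For anyone wanting to track the counter outside the dashboard, here is a rough Python sketch that pulls the raw value of SMART attribute 199 (UDMA_CRC_Error_Count) out of `smartctl -A` text and compares two snapshots. It assumes SATA drives that report this attribute in the usual ten-column table. The sample rows are illustrative, not taken from these disks, and the helper names are made up for this example:

```python
import re
from typing import Optional


def parse_udma_crc_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199, or None if it is not listed."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value is column 10.
        if len(fields) >= 10 and fields[0] == "199":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def crc_delta(before: str, after: str) -> Optional[int]:
    """Difference in the CRC counter between two snapshots, or None if unknown."""
    old, new = parse_udma_crc_count(before), parse_udma_crc_count(after)
    if old is None or new is None:
        return None
    return new - old


# Illustrative rows in the layout printed by `smartctl -A /dev/sdX`.
SAMPLE_BEFORE = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 42"
SAMPLE_AFTER = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 57"

if __name__ == "__main__":
    print(crc_delta(SAMPLE_BEFORE, SAMPLE_AFTER))
```

The raw counter does not reset when a cable or card is fixed, so the useful signal is whether it keeps climbing between snapshots, not its absolute value.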
gumby327 (Author) Posted March 2, 2022 I have more brand new disks here, but there comes a point where you toss in the towel.
SirReal63 Posted March 2, 2022 There is no shame in replacing.
gumby327 (Author) Posted March 2, 2022 OK, one more time.
gumby327 (Author) Posted March 3, 2022 New drive is in. What I did was take array drive8 and assign it to the slot for drive1. So, it is on the motherboard SATA now. It has been running clean for a couple of hours. What I am not sure about is that the LSI Broadcom SAS 9300-8i is x8 and it is in a x4 slot. I wonder if it is doing a multiplier for the 4 lanes, and if that is the case, that has been noted as a problem with unRAID.
gumby327 (Author) Posted March 4, 2022 So, that is the final deal. The PCIe architecture of that Micro ATX B550M has a limit and I flat out exceeded it. The machine cannot run a Windows VM and also handle HBA loads at the same time, and it seems to be about one PCI Express lane short in overall bandwidth.
JorgeB Posted March 4, 2022 16 hours ago, gumby327 said: "What I am not sure about is that the LSI Broadcom SAS 9300-8i is x8 and it is in a x4 slot." Except for limiting the total bandwidth, that won't be a problem, i.e., that by itself is not a reason for having stability issues. You can only use the available motherboard lanes, and you can't do anything to try to use more than the available number.
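The bandwidth point can be checked with some rough arithmetic. The sketch below is only an estimate under stated assumptions: the per-lane figures come from the published PCIe 2.0 and 3.0 signalling rates and line encodings, the link generation of this particular slot is not stated in the thread, and the 250 MB/s per hard drive is an assumed round number rather than a measurement:

```python
# Signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    2: (5.0, 8 / 10),     # PCIe 2.0 uses 8b/10b encoding
    3: (8.0, 128 / 130),  # PCIe 3.0 uses 128b/130b encoding
}

ASSUMED_HDD_MB_S = 250  # assumption: sequential peak of one hard drive


def pcie_bandwidth_mb_s(generation: int, lanes: int) -> float:
    """Approximate one-direction link bandwidth in MB/s, before protocol overhead."""
    gigatransfers, efficiency = PCIE_GENERATIONS[generation]
    bits_per_second = gigatransfers * 1e9 * efficiency * lanes
    return bits_per_second / 8 / 1e6


def array_demand_mb_s(drives: int, per_drive_mb_s: float = ASSUMED_HDD_MB_S) -> float:
    """Worst-case demand if every drive streams at full speed at once."""
    return drives * per_drive_mb_s


if __name__ == "__main__":
    demand = array_demand_mb_s(8)
    for generation in (2, 3):
        for lanes in (4, 8):
            link = pcie_bandwidth_mb_s(generation, lanes)
            print(f"PCIe {generation}.0 x{lanes}: {link:.0f} MB/s link, {demand:.0f} MB/s demand")
```

Under these assumptions, a narrower link would at worst slow down a parity check with all drives active. It is not expected to produce CRC errors on its own, which matches the reply above.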
gumby327 (Author) Posted March 5, 2022 I have had Gumby running steady with no errors for over two days and have locked in the actual problem. It was a SAS HBA with a PCIe x8 connector resting in a PCIe x4 slot. I always knew it would have restricted bandwidth in that slot, but what I had never guessed is that the strategy it was using was port multiplier. It says half lanes, so multiply for the other SAS rail. It was fairly OK for 7 drives, but when you placed an 8th drive in you started getting CRC errors. Today I am going to lock that in as the problem by placing it in a x16 GPU slot, hooking up the 8th drive, and doing a parity check. That will tell me whether the machine would make a good backup server. Second thing I will do is build out Gumby with more premium specs. I ordered a new parts list for Gumby: https://www.amazon.com/gp/product/B09GP7V2W5/ref=ppx_yo_dt_b_asin_title_o03_s00?ie=UTF8&psc=1 https://www.amazon.com/gp/product/B00Q2Z11QE/ref=ppx_yo_dt_b_asin_title_o02_s00?ie=UTF8&psc=1 https://www.amazon.com/gp/product/B07V6132NX/ref=ppx_yo_dt_b_asin_image_o01_s00?ie=UTF8&psc=1 https://www.amazon.com/gp/product/B00HS23QZO/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1 The existing Node 804 is being demoted to the backup "Pokey" server and getting one of my two SAS HBA controller cards. Since its only second x16-size slot runs at x4, it can never be a VM machine as well as a large array server. So, I have a spare AMD Ryzen 5 5600X 6-core, 12-thread processor. That will make it stable and powerful beyond its needs as a backup and reverse proxy.
gumby327 (Author) Posted March 6, 2022 (edited) 12 hours ago, gumby327 said: "Second thing I will do is build out Gumby with more premium specs. I ordered a new parts list for Gumby: https://www.amazon.com/gp/product/B09GP7V2W5/ref=ppx_yo_dt_b_asin_title_o03_s00?ie=UTF8&psc=1 https://www.amazon.com/gp/product/B00Q2Z11QE/ref=ppx_yo_dt_b_asin_title_o02_s00?ie=UTF8&psc=1 https://www.amazon.com/gp/product/B07V6132NX/ref=ppx_yo_dt_b_asin_image_o01_s00?ie=UTF8&psc=1 https://www.amazon.com/gp/product/B00HS23QZO/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1" I really wish someone had helped me catch my mistake. That is a RAID card, not an HBA. I just thank my lucky stars I did not install it and learn afterwards, as it initialized a RAID array and all my data was gone. Edited March 6, 2022 by gumby327
gumby327 (Author) Posted March 6, 2022 So, the experiment to place it in a x16 slot, load it up with 8 drives and run it was a fail. Shortly after the array rebuilt its failed drive, another one died. So, I know that when this becomes my backup server, there will be no parity, and it will have only 5 drives. In fact, in that case it does not even need the SAS HBA, so I probably will just set that card aside. Either the drive allocation in the motherboard or chip is bad, or the card has a flaw. The new motherboard has a lot more strength and will use a full mini SAS cable, not these tiny ones. But that board has 8 SATA ports out of the gate, so all I will need off the add-on card is 3 or 4 more lanes.
JorgeB Posted March 6, 2022 19 hours ago, gumby327 said: "It was fairly OK for 7 drives, but when you placed an 8th drive in you started getting CRC errors." Like I mentioned, using an x4 slot won't cause that kind of problem. 6 hours ago, gumby327 said: "So, the experiment to place it in a x16 slot, load it up with 8 drives and run it was a fail." Not surprising. The problem could be the HBA itself or some other issue, like power, etc.
gumby327 (Author) Posted March 8, 2022 On 3/6/2022 at 3:31 AM, JorgeB said: "Like I mentioned, using an x4 slot won't cause that kind of problem. Not surprising. The problem could be the HBA itself or some other issue, like power, etc." Well, brand new motherboard and the same old problem, so the only thing remaining is that the HBA controller card is bad.