kevschu Posted March 11, 2020 (edited)
I have been using Unraid for a few years now, and was on 6.8.0 without any issues, using an older Intel board and an i7-3990. Recently I replaced that board and CPU with an Asus B450 board and a Ryzen 3 3200G. Since then, once a week or so (sometimes longer), the system becomes unavailable on the network. It cannot be pinged, and my router shows it as offline, even though the system is turned on and there weren't any power outages or brownouts. If I turn the system off by pressing the power button and letting Unraid shut itself down, then once the system comes back up I always end up with the same data drive (the same drive each time) in a disabled status.
schumachertower-diagnostics-20200310-2115.zip
Edited March 11, 2020 by kevschu: clarify the disabled drive is a data drive.
JorgeB Posted March 11, 2020
See if this helps: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173
P.S. Ryzen 3 3200G is a second gen Ryzen, not third as the model implies.
kevschu (Author) Posted March 11, 2020
Thank you for the reply. I didn't know the 3200G was considered a 2nd gen chip. Going through the link that was shared, I will work on the following to see if it brings stability and resolves my issue:
1. Set up a syslog server, to make sure the diagnostic data is saved somewhere safe across reboots.
2. Check "Power Supply Idle Control" in the BIOS; I don't believe it is set to the suggested "Typical Current Idle".
3. Swap the memory out, as I am using DDR4 3200 and the CPU only supports up to 2933. Maybe swap it for 2400 or 2666; 2933 seems niche based on what I see available on, say, Newegg.
Am I missing anything else from that article? There have never been any errors listed on the drive that keeps showing up as disabled, and I have copied the data off that drive onto other drives, so I am not worried about cloning it.
JonathanM Posted March 11, 2020
12 minutes ago, kevschu said: "Swap the memory out as I am using DDR4 3200, and the CPU only supports up to 2933."
Typically speed ratings are a maximum, so your current RAM, if it's on the approved parts list for that motherboard, should be fine running at the speed called for by the CPU combined with the specific layout of your RAM. There should be no need for different sticks; just run them at the approved speed. It's like getting supercar-rated tires and putting them on a sedan. The only issue is that you probably overpaid a little, because you will never run them at top speed.
kevschu (Author) Posted March 11, 2020
I would suspect my single stick of memory is fine then; it is registering at 2400MHz in the BIOS.
trurl Posted March 11, 2020
11 hours ago, kevschu said: "If I turn the system off by pressing the power button and letting Unraid do its thing to turn off, once the systems comes back up, I always end up having the same data drive (same drive each time) in a disabled status."
Did you rebuild the disabled drive? It won't get enabled unless you rebuild it (recommended) or set a New Config and rebuild parity instead.
kevschu (Author) Posted March 11, 2020
I have rebuilt the drive in the past, and I will end up rebuilding it again. I would like to figure out why this keeps happening: the system becomes unavailable, and then that drive is in a disabled state when it recovers.
trurl Posted March 11, 2020
1 hour ago, trurl said: "Did you rebuild the disabled drive?"
The reason I asked is that I wondered if some problem updating your flash drive made it forget that the disk had been rebuilt. If you rebuild it and reboot on purpose, does it still think it needs to be rebuilt?
kevschu (Author) Posted March 11, 2020
Hmmm, I would need clarification on "reboot on purpose".
If I:
1. Rebuild the drive
2. Start using it again
3. Gracefully reboot using the Unraid UI
4. The system comes back up and the drive is still fine.
If:
1. Unraid becomes unavailable
2. I power the device off using the physical power button on the system
3. I power the device on
4. The system comes back up and the drive is disabled.
trurl Posted March 11, 2020
4 hours ago, kevschu said: "gracefully reboot using the unraid UI"
^this
kevschu (Author) Posted March 12, 2020
Yeah, doing that it comes up fine. It is only after an ungraceful reboot, when Unraid has become unresponsive, that the drive becomes disabled. But it is always that one drive.
trurl Posted March 12, 2020
Rebooting starts syslog over. The point of setting up syslog server is to get the syslog saved so that after a hang or crash forces a reboot, you can get the syslog from before the reboot. Maybe that will give a clue.
kevschu (Author) Posted March 13, 2020
Alright, so it didn't take long to become unresponsive this time. I had to hard power it down again, but I think adjusting the "shutdown timeout" in disk settings helped, because the disk that usually shows up as disabled wasn't disabled when the system came back online this time. I do have the syslog on the syslog server, though. Is there a way for me to anonymize it, or will running the diagnostic collector grab it from that location and do that for me?
Dissones4U Posted March 13, 2020 (edited)
25 minutes ago, kevschu said: "I do have the syslog on the syslog server though. Is there a way for me to anonymize it, or will running the diagnostic collector grab it from that location and do that for me?"
Diagnostics does not grab the syslog from the syslog server folder; the syslog it includes only covers the time since the last reboot. You'd need to upload the file from the syslog folder yourself. If you need to anonymize it (I don't think you do), open the text file, hit CTRL+F, search for the term(s) you want to delete or replace, and do so. Might as well include diagnostics too.
Finally, looking at JB's link above, did you read all of this about Global C-state Control? (It's a link within the first link.)
Edited March 13, 2020 by Dissones4U: typo and added link
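For anyone who would rather script the search-and-replace than do it by hand, here is a minimal Python sketch. It is not part of Unraid or its diagnostics tool; the function names, the masking placeholders, and the choice to mask IPv4 addresses, MAC addresses, and the hostname are all assumptions for illustration. The regexes are deliberately simple and may mask other dotted four-number strings too.

```python
import re

# Hypothetical patterns: extend or tighten these for your own log.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MAC = re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")


def anonymize_line(line, hostname=None):
    """Return the line with MAC addresses, IPv4 addresses and the hostname masked."""
    line = MAC.sub("xx:xx:xx:xx:xx:xx", line)
    line = IPV4.sub("x.x.x.x", line)
    if hostname:
        # Case-insensitive literal match on the server name.
        line = re.sub(re.escape(hostname), "HOST", line, flags=re.IGNORECASE)
    return line


def anonymize_file(src, dst, hostname=None):
    """Copy src to dst line by line, masking each line on the way."""
    with open(src, encoding="utf-8", errors="replace") as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(anonymize_line(line, hostname))
```

Usage would look like `anonymize_file("syslog-192.168.0.3.log", "syslog-anon.log", hostname="SchumacherTower")`. Review the output before posting, since a pattern-based pass can miss share names, user names, or other identifying strings.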
kevschu (Author) Posted March 13, 2020 (edited)
I looked through the thread. I've ensured that "Global C-State" is disabled and that the PSU idle setting is set to typical. I've also added "rcu_nocbs=0-3" to the syslinux configuration. The only change this time will be the rcu_nocbs setting, as the two power settings were already configured during this last lockup. Attached are the syslog and new diags. This portion of the syslog seems important.
schumachertower-diagnostics-20200312-2016.zip
syslog-192.168.0.3.log
Edited March 13, 2020 by kevschu: possibly relevant information from log
JorgeB Posted March 13, 2020
Still SATA controller problems:
Mar 12 16:34:37 SchumacherTower kernel: ahci 0000:02:00.1: AHCI controller unavailable!
Mar 12 16:34:38 SchumacherTower kernel: ata10: failed to resume link (SControl FFFFFFFF)
Mar 12 16:34:38 SchumacherTower kernel: ata10: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Mar 12 16:34:43 SchumacherTower kernel: ata10: hard resetting link
Mar 12 16:34:43 SchumacherTower kernel: ahci 0000:02:00.1: AHCI controller unavailable!
Mar 12 16:34:44 SchumacherTower kernel: ata10: failed to resume link (SControl FFFFFFFF)
Mar 12 16:34:44 SchumacherTower kernel: ata10: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Mar 12 16:34:44 SchumacherTower kernel: ata10: limiting SATA link speed to <unknown>
Mar 12 16:34:49 SchumacherTower kernel: ata10: hard resetting link
Mar 12 16:34:49 SchumacherTower kernel: ahci 0000:02:00.1: AHCI controller unavailable!
Mar 12 16:34:50 SchumacherTower kernel: ata10: failed to resume link (SControl FFFFFFFF)
Mar 12 16:34:50 SchumacherTower kernel: ata10: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Mar 12 16:34:50 SchumacherTower kernel: ata10.00: disabled
...
Mar 12 16:35:05 SchumacherTower kernel: ata9.00: disabled
...
Mar 12 16:35:59 SchumacherTower kernel: ata6.00: disabled
...
Mar 12 16:35:59 SchumacherTower kernel: ata5.00: disabled
Multiple disks are dropping offline, and there are also several NIC-related errors. If you can, I would try a different board; that one appears to have issues, either actual hardware problems or compatibility issues with Linux.
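To pull the same signals out of a long saved syslog, a short Python sketch like the one below could help. It is only an illustration: the `summarize` function and both regexes are hypothetical and are modelled purely on the message formats quoted in the post above, so other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted above (an assumption, not a spec).
DISABLED = re.compile(r"kernel: (ata\d+(?:\.\d+)?): disabled")
AHCI_DOWN = re.compile(r"kernel: ahci (\S+): AHCI controller unavailable!")


def summarize(lines):
    """Return (ATA devices reported disabled, AHCI-unavailable count per PCI address)."""
    disabled = []
    ahci = Counter()
    for line in lines:
        match = DISABLED.search(line)
        if match:
            if match.group(1) not in disabled:
                disabled.append(match.group(1))
            continue
        match = AHCI_DOWN.search(line)
        if match:
            ahci[match.group(1)] += 1
    return disabled, dict(ahci)


if __name__ == "__main__":
    import sys

    # Expects the path to a saved syslog as the first argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        dropped, controllers = summarize(fh)
    print("Disabled ATA devices:", dropped)
    print("AHCI controller unavailable counts:", controllers)
```

Several devices disabled within a minute or two of each other, all behind one PCI address, points toward the controller or board rather than a single bad disk, which is the reading given above.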
kevschu (Author) Posted March 13, 2020
I will need to order a new board and test it out. Thank you!
kevschu (Author) Posted March 17, 2020
Well, I replaced the Asus Prime B450M-A/CSM motherboard with an ASRock X570M Pro and things have been running solid for the last 3 days.
rilles Posted May 13, 2020
On 3/17/2020 at 1:20 AM, kevschu said: "Well, I replaced the Asus Prime B450M-A/CSM motherboard with an ASRock X570M Pro and things have been running solid for the last 3 days."
I have the same Asus board with a 2200G, and I see the same SATA/network dropping, but only under heavy sustained CPU load. Interesting that a motherboard swap fixed this. I wonder if it's the chipset or the manufacturer that improved things.