June 9, 2025
I know this must be hardware, but I'm polling the crowd. I have been getting hard lockups since 7.0: the monitor shows the login/terminal, but it's frozen, so I can't type anything, and I lose access to the GUI. I have to reset the server to get it to come back. What I've checked so far:
- BIOS is fully up to date.
- I've tested with C-States disabled.
- I wiped my USB and did a clean install with no plugins; same behavior.
- I did a full round of memtests with no errors found.
- The CPU is not overheating; it hovers around 60 °C at the high end, which is fine for Ryzen.
- No logs: it locks up without logging anything out of the ordinary.
Any ideas? I feel like I've exhausted all my options besides replacing the RAM just in case, or potentially a CPU issue, but I'd expect logs if that were the case. Also, I don't think the USB would go bad without throwing errors, but could a failing USB cause this?
I have a Ryzen 7950X with 2x32GB DDR5 RAM.
Edit: This was resolved by going into BIOS and disabling PBO and all Core Boost options, indicating a CPU hardware issue. I have emailed AMD to see if they will RMA it.
Attachment: unraid-diagnostics-20250609-2139.zip
Edited June 11, 2025 by Jclendineng (added diagnostics)
June 10, 2025 (Author)
Another thing: I was thinking my HBA might be overheating, but I have great airflow, so that's unlikely. It can go a while without a lockup, or it can lock up after 5 minutes. If I'm working in the UI for a bit, copying a file between disks for example, it will lock up within about a minute, but if I just leave the server alone and don't use it, it seems to be fine.
June 10, 2025 (Community Expert)
Memtest is only definitive if it finds errors. If you have multiple sticks, try using the server with just one; if it behaves the same, try the other one. That will basically rule out bad RAM.
June 10, 2025 (Author)
I'll try that next time it hangs. To try to trigger it, I moved 500 GB between drives and started a drive replacement to see what the process is like, and there have been no lockups in about a day. It's so strange; it happened maybe 10 times yesterday. I replaced my PSU with a higher-wattage one in case that was the culprit, but no dice. I just can't get it to happen consistently. I checked the slack logs as well and everything is hunky-dory; it just randomly locks up and requires a hard reset.
Edit: After the drive finishes importing and resilvering, I'll hop into the GUI and fiddle with it to try to trigger it.
Edited June 10, 2025 by Jclendineng
June 11, 2025 (Author, marked as solution)
OK, it crashed again. I'm starting by removing my Intel Arc GPU and will try to crash it again, then I'll remove a RAM stick.
Edit: It crashed without the Arc card. I then removed one RAM stick and it crashed; I swapped in the other stick and it still crashed. I'm disabling PBO and all Core Boost options in BIOS to see if the CPU is going bad.
Edit: After disabling all CPU boosts in BIOS, I transferred about 1 TB of data from multiple sources to the disks while running a couple of VMs. No crash as of this morning. I'll hammer it today and try to get it to lock up; if that "fixes" it, it means I have a bad 7950X, which isn't unheard of.
Edited June 11, 2025 by Jclendineng