Merrrp Posted July 7, 2020 (edited)

Background

I moved my Unraid box into a new case the other day and onto a new motherboard in the process, and added a new PCIe SATA card for more ports. Initially Unraid started up, found all my drives, and all seemed well. Shortly afterwards there was a phenomenal number of errors (a monitor attached to the Unraid box showed a hyperspeed stream of constant errors), the web GUI became barely responsive, and after an hour or two of being unable to get it to power off cleanly I had to force it off.

I checked cables, connectors, etc., all fine, then realised the issue was my motherboard BIOS version (MSI X470 with a 3rd Gen Ryzen BIOS version but running a 2nd Gen Ryzen CPU), so I downgraded as far back as MSI will allow, to the last 2nd Gen Ryzen version.

After starting Unraid back up, everything seemed to be running OK for about a day. The array started on boot and began a parity check, despite me setting it not to auto-start the array because I was going to run disk checks (some errors had been logged about XFS needing repair). I let it continue, and about 60% of the way through I was at 3 errors.

Current Issue(s)

When I looked at Unraid again today, my parity drive had been disabled following whatever happened with the parity check. That disk has over 20 billion read errors, the parity check finished with lots of errors, 2 more drives in the array have millions of read errors, and my log is at 100% (I think syslog is what's taking up the space). Unraid was performing a 'read check' but had stopped; initially I resumed it, but then changed my mind and told it to stop the array. I am currently unable to shut down or reboot cleanly: the array shows as stopped in the web GUI, but the status at the bottom says 'Array Stopping Sync filesystems...'.

Can anyone offer me some help in trying to fix this situation please? I'm a bit out of my depth at this point, to be honest. Diagnostics attached.

Edit: I'm using Unraid v6.8.3 (nvidia driver version).

risa-diagnostics-20200707-0949.zip

Edited July 7, 2020 by Merrrp
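For anyone hitting the same "log is at 100%" symptom, here is a minimal Python sketch for seeing how full the log filesystem is and which files are taking the space. It assumes the logs live under /var/log, as is usual on Linux; the function names `largest_files` and `percent_used` are made up for this illustration and are not part of Unraid.

```python
import os
import shutil


def largest_files(directory, limit=5):
    """Return up to `limit` (size_in_bytes, path) pairs, biggest first."""
    sizes = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # The file was rotated or removed while we were scanning.
                continue
    return sorted(sizes, reverse=True)[:limit]


def percent_used(directory):
    """Percentage of the filesystem holding `directory` that is in use."""
    usage = shutil.disk_usage(directory)
    return 100.0 * usage.used / usage.total
```

Calling `percent_used("/var/log")` and then `largest_files("/var/log")` from a console session should show whether syslog really is the file that has filled the space. This only reports sizes; it does not free anything.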
JorgeB Posted July 7, 2020
This is a problem with the onboard SATA controller, which is quite common with Ryzen boards. There are several reports that the newer kernel in the v6.9 betas fixes the problem.
Merrrp (Author) Posted July 7, 2020
Oh I see. I was previously on a B450 and never encountered any issues on that, so I just assumed X470 would also be fine, being from the same generation. I've checked, and indeed the 3 drives with read errors are the only array drives on the onboard controller. The cache pool was also running from the onboard controller; I hadn't noticed any issues with that, but I might have just missed them if there were any.

I've forced the system off for now. I'll look at swapping back to my old B450 motherboard later on, since that worked fine despite sadly having fewer PCIe slots. I wonder if trying some different BIOS versions might help, since there are so many issues with the BIOS on this MSI board, but I don't think it's worth the hassle at this point.

Thanks for your input. I'll post an update when I've swapped everything back over again this evening.
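The check described above (which drives sit on the onboard controller and which on the add-in card) can be sketched in Python by resolving the Linux sysfs links under /sys/block. This is a rough illustration, not an Unraid tool: `controller_of` and `drives_by_controller` are hypothetical names, and it assumes SATA/SAS disks show up as `sd*` devices whose resolved sysfs path contains the PCI address of the controller they hang off.

```python
import os
import re

# A PCI address looks like 0000:03:00.1 (domain:bus:device.function).
PCI_ADDRESS = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_of(sysfs_path):
    """Return the last PCI address in a resolved sysfs path, or None.

    The last address is the PCI function closest to the disk, which is
    taken here to be the storage controller.
    """
    last = None
    for part in sysfs_path.split("/"):
        if PCI_ADDRESS.fullmatch(part):
            last = part
    return last


def drives_by_controller(sys_block="/sys/block"):
    """Group sd* block devices by the PCI address of their controller."""
    groups = {}
    for name in sorted(os.listdir(sys_block)):
        if not name.startswith("sd"):
            continue  # skip loop, md, nvme and similar devices
        real = os.path.realpath(os.path.join(sys_block, name))
        groups.setdefault(controller_of(real), []).append(name)
    return groups
```

Running `drives_by_controller()` on the server gives a mapping from PCI address to drive names; comparing those addresses with the output of `lspci` should show which address belongs to the onboard controller and which to the add-in card.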
JorgeB Posted July 7, 2020
IIRC it also happens with some B450-based boards. It doesn't happen with all Ryzen boards, regardless of chipset. Your best bet is to use v6.9.
Merrrp (Author) Posted July 7, 2020
Thanks, I guess I must've got lucky with the last board (Gigabyte B450M DS3H). Rather than swap motherboards, I've installed version 6.9.0-beta22 and set off a read check, fingers crossed.

As an aside, I realised why my array was auto-starting when I thought I'd configured it not to: when the system is in a somewhat broken state (for example when I stop the array), none of the settings pages that I've tried work. The HTTP requests just hang, so I hadn't actually changed that setting.
Merrrp (Author) Posted July 8, 2020
The read check seemed to be going fine with no read errors. I grew impatient and cancelled it about two-thirds of the way through to set off a parity rebuild, which also seemed to be going fine, so I went to bed with it at about 60-70% with no errors. This morning it was unresponsive and there was a kernel panic error on the attached monitor (I may be misremembering the error), so I forced it off and booted up again.

Everything appeared fine, so I set another rebuild off this morning. It ran all day, and at about 90-95% the web UI suddenly dropped and there was a kernel panic error on the monitor again. I've forced it off again, and it's booted back up appearing as though everything's fine.

I think I'm going to have to switch back to the previous motherboard. As much as I wanted the extra slots, after the trouble I've had with this board in other builds, in hindsight I don't know what I was thinking using it in this system.

I've attached the diagnostics taken after booting up following each crash, in case that helps shed light on the situation. Thanks again for your help, hopefully swapping back will get me up and running again.

risa-diagnostics-20200708-1119.zip
risa-diagnostics-20200708-1933.zip
JorgeB Posted July 8, 2020
The first thing to do is respect the maximum supported RAM speed for your configuration; you're currently overclocking the RAM. Also check the power supply idle control setting.
Merrrp (Author) Posted July 8, 2020
Oh wow, I didn't realise it needed to be as low as 1866 MHz for 4 sticks. I'll try reconfiguring and give it one more go.
Merrrp (Author) Posted July 9, 2020
I set my RAM clock down to 1866 MHz, disabled C-states, and set power supply idle control to typical. I've now managed to complete a parity rebuild overnight and got everything started up. It looks as though all is well so far, so that may well have done the trick. Thanks once more for your help, much appreciated!