DataHearth Posted October 27, 2021 (edited)

Hello there! I've had many issues with disks being disabled this year. Roughly every week, sometimes more often, a disk gets disabled on my server and all of the server's applications (Docker, VMs, etc.) go down (nothing works at all).

To recover, I remove the disk from the array via the Main tab, start the array, stop it again, and then re-add the failing drive. If stopping the array fails (the drive can't be unmounted), I restart the server and then continue with those steps.

What is weird is that the issue is not specific to one drive. All drives are affected. I've already replaced my SATA cables, but that didn't solve the issue.

So I was wondering if someone can help me diagnose this thing :). I'm almost sure it is a hardware issue but, as I said, I've already replaced my SATA cables (which was the least expensive thing to try). Before I replace more expensive parts like the motherboard, the drives, or the power supply, I need to be sure it will solve the issue ^^. My thought is that I might have a problem with the motherboard's SATA ports. The last option is a power supply issue (but I don't think that would cause this kind of error).

I've attached the latest diagnostics, taken about 5 minutes after a drive was disabled. Thank you in advance!

cronos-diagnostics-20211027-1437.zip

Edited November 6, 2021 by DataHearth: solved
trurl Posted October 27, 2021

Since most of your disks are currently disconnected, I suspect a power problem. What is your PSU and how is it connected to the drives? Are there splitters involved?
DataHearth (Author) Posted October 27, 2021 (edited)

Thanks for the quick response :). Here's the power supply: "Corsair RM850x (v2), 850W". I'm using a simple Nvidia MSI GeForce GT 710 and an AMD Ryzen 9 3900X. Tonight I'm going to try removing the GPU and then wait another week to see if anything happens. The drives are simply connected with the stock cables that came with the power supply (like this).

Edited October 27, 2021 by DataHearth
JorgeB Posted October 27, 2021

Oct 27 14:34:08 cronos kernel: ahci 0000:02:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x9f400000 flags=0x0000]

Problem with the onboard SATA controller. This is quite common with some Ryzen boards; look for a BIOS update or use an add-on controller.
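For anyone who wants to check their own syslog for the same symptom, here is a minimal Python sketch that scans log lines for AMD-Vi IO_PAGE_FAULT events and counts them per device. The regular expression is modelled only on the single line quoted above, so other kernel versions may format the message differently. The names `FAULT_RE`, `find_page_faults`, `count_by_device`, and `SAMPLE_LINES` are made up for this illustration and are not part of Unraid or any tool.

```python
import re
from collections import Counter

# Assumption: the pattern mirrors the one log line quoted in this thread.
# Other kernels or drivers may word the event differently.
FAULT_RE = re.compile(
    r"kernel: (?P<driver>\S+) "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]"
)


def find_page_faults(lines):
    """Return one dict per line that matches the IO_PAGE_FAULT pattern."""
    faults = []
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            faults.append(match.groupdict())
    return faults


def count_by_device(faults):
    """Count faults per (driver, PCI address) pair."""
    return Counter((f["driver"], f["pci"]) for f in faults)


# Sample input: the first line is the one quoted above; the second is a
# made-up unrelated line that should be ignored.
SAMPLE_LINES = [
    "Oct 27 14:34:08 cronos kernel: ahci 0000:02:00.1: AMD-Vi: Event logged "
    "[IO_PAGE_FAULT domain=0x0001 address=0x9f400000 flags=0x0000]",
    "Oct 27 14:34:09 cronos kernel: some unrelated message",
]

for (driver, pci), count in count_by_device(find_page_faults(SAMPLE_LINES)).items():
    print(f"{driver} at {pci}: {count} IO_PAGE_FAULT event(s)")
```

To run it against a real log, pass an open file instead of `SAMPLE_LINES`, for example `find_page_faults(open(path, errors="replace"))`, where `path` points to a syslog text file extracted from the diagnostics archive. Faults reported by the `ahci` driver point toward the SATA controller, as in the diagnosis above.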
ChatNoir Posted October 27, 2021

33 minutes ago, DataHearth said: The drives are simply connected with the base cables given with the power supply (like this).

How many drives are on a single lead from the PSU?
DataHearth (Author) Posted October 27, 2021

@ChatNoir One lead has four 3.5" server-grade drives and another one has two drives (one SSD and a "normal" 2.5" HDD).
DataHearth (Author) Posted October 27, 2021

@JorgeB Argh, I was really hoping it wouldn't be motherboard related... Well, I'm going to check for BIOS updates tonight. I don't have any other SATA controllers (yet).
DataHearth (Author) Posted October 27, 2021

I've updated my BIOS. It was one major version behind. I'll see in a week whether this fixes the problem. If not, I'll troubleshoot the PSU splitter cable and, if that doesn't help, replace my motherboard (it needs to be upgraded either way). I'll keep this post up to date ^^.
DataHearth (Author) Posted November 6, 2021

Update: it's been almost 10 days without a disk failure since the BIOS update. I guess the BIOS update did solve the problem for now, but I'm still going to check out other possible issues. Thanks for the help, guys!