ZVeguillaCotto Posted August 31 (edited)

I've been struggling with my system for a couple of months now. The issue is that the array reports drive errors that bog down the system until a reboot. The errors disappear after the reboot, then come back after an inconsistent number of hours or tasks. I've looked at the disk reports and don't see anything wrong (permanent). I have re-seated all my components, including RAM, PSU, HBA, cables, and drives. I've changed HBAs, RAM, and cables.

I have 2 diagnostics that I will attach. They were taken within seconds of the errors occurring and with the system up for a short time. The system has gone weeks without issues in the past, but I thought diags for those times would be harder to parse.

Some things that come to mind to mention:
The behavior has occurred even when running in safe mode, GUI, no GUI, and any other combination.
I methodically deleted and reinstalled each docker container to avoid runaway issues.
The system ran flawlessly for around 4 months and I had 5 drives total then. I THINK the issue appeared when the array grew to 6+ drives. I managed to come back down to 6.
I finally managed to update the BIOS a couple of days ago (seemed complicated on this MB).

I'm starting to feel a little insane about this and have been glued to my computer for weeks, but evidently this is beyond my knowledge and ability to google. TIA to anybody that can help.

*HBA current 9600-24i
*drive cables current 2x (SFF-8654 8i to 2x (4x SFF-8643))

hl15-diagnostics-20240828-2034.zip
hl15-diagnostics-20240830-2050.zip
JorgeB Posted August 31 (Solution)

All disks dropped offline, most probable reasons would be a power/connection issue or the controller.
ZVeguillaCotto Posted August 31 (Author)

3 hours ago, JorgeB said:
All disks dropped offline, most probable reasons would be a power/connection issue or the controller.

The disks that produce the errors are different each time. I've changed disk bays also. I have a couple of additional theories:
Files got corrupted once upon a time, and each time one of those files is read the errors start (no idea if possible).
The backplane is damaged and thermal expansion causes the errors.

It hadn't occurred to me to connect the drives directly to the HBA to bypass the backplane. I will attach the disks directly to the HBA today and report back.
JorgeB Posted August 31

9 hours ago, ZVeguillaCotto said:
Files got corrupted once upon a time

The errors are because the disks are dropping, it has nothing to do with files or data, it's a hardware issue.
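For anyone reading a syslog from diagnostics like these, one rough way to tell a single failing disk from a shared fault (power, cabling, backplane, controller) is to count which devices report errors in each incident. Below is a minimal Python sketch, assuming the kernel logs lines of the form "I/O error, dev sdX, sector N". The function names, the threshold of 3 disks, and the sample log lines are all made up for illustration and are not taken from the attached diagnostics.

```python
import re
from collections import Counter

# Assumed message shape: "I/O error, dev sdX, sector N".
# Adjust the pattern to whatever the actual syslog contains.
IO_ERROR = re.compile(r"I/O error, dev (sd[a-z]+), sector \d+")


def devices_with_errors(log_text):
    """Count I/O error lines per device name in one log excerpt."""
    return Counter(m.group(1) for m in IO_ERROR.finditer(log_text))


def likely_shared_fault(incidents, min_devices=3):
    """True if every incident shows errors on at least min_devices disks.

    Several disks erroring together points at something they share
    rather than at any one disk. min_devices is an arbitrary threshold.
    """
    return bool(incidents) and all(
        len(devices_with_errors(text)) >= min_devices for text in incidents
    )


# Made-up excerpts, one per incident (not from the real diagnostics).
incident_a = """
Aug 28 20:31:02 host kernel: I/O error, dev sdc, sector 1024
Aug 28 20:31:02 host kernel: I/O error, dev sdd, sector 2048
Aug 28 20:31:03 host kernel: I/O error, dev sde, sector 4096
Aug 28 20:31:03 host kernel: I/O error, dev sdc, sector 1032
"""
incident_b = """
Aug 30 20:47:10 host kernel: I/O error, dev sdb, sector 512
Aug 30 20:47:10 host kernel: I/O error, dev sdf, sector 8192
Aug 30 20:47:11 host kernel: I/O error, dev sdg, sector 16384
"""
single_disk = """
Aug 30 21:00:00 host kernel: I/O error, dev sdc, sector 1024
Aug 30 21:00:05 host kernel: I/O error, dev sdc, sector 1032
"""

print(devices_with_errors(incident_a))
print(likely_shared_fault([incident_a, incident_b]))
print(likely_shared_fault([single_disk]))
```

A check like this only narrows things down. It cannot say which shared component is at fault, so swapping or bypassing parts one at a time is still needed.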
ZVeguillaCotto Posted September 2 (Author)

On 8/31/2024 at 6:30 PM, JorgeB said:
The errors are because the disks are dropping, it has nothing to do with files or data, it's a hardware issue.

I have been running the drives connected directly to the HBA and PSU, bypassing the backplane, for about a day and a half. I haven't seen any errors yet. Will keep updating as the days go on. Thanks.
ZVeguillaCotto Posted September 7 (Author)

1 week update. The issue has not returned... yet. Will mark as solved. Will keep updating if relevant. I read about connector issues being the leading cause of this problem and thought I had tried everything to rule them out. I had missed removing the backplane from the equation. Thanks to @JorgeB for the help.