My server recently crashed, and the trouble seems to involve multiple hard drives. It began with a "recertified Exos" drive that I had just added to the setup. The drive disabled itself, apparently at random, after reporting 1024 read errors. When I re-enabled it and tried to rebuild, the process halted at exactly 1024 read errors, after which the drive went into a Read Check.
During my second rebuild attempt, another, older drive in my setup reported over 1 million read errors (all of them corrected). The rebuild still stopped when the Exos drive reached 1024 read errors, just as before.
On the third rebuild attempt, the older drive's read-error count jumped to over 64 million, and a third drive began reporting problems as well, showing 5 million read errors. Despite these alarming numbers, every drive still reports a normal status in its SMART diagnostics.
With this many symptoms, I can't tell whether one faulty drive is causing read errors on the others, or why the Exos drive consistently halts at exactly 1024 read errors. I also wonder whether the problem lies in one of my two LSI 9207-8i HBA cards, or whether it's just one drive setting off a chain of events.
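One way to weigh the HBA hypothesis against the single-bad-drive hypothesis is to check whether every drive reporting errors hangs off the same card. Below is a minimal Python sketch of that check. The drive labels and the drive-to-HBA assignments are made-up placeholders (the real mapping would have to come from the diagnostics), and the error counts are rounded from the figures above.

```python
from collections import defaultdict

def errors_by_controller(drive_to_hba, read_errors):
    """Sum read errors per controller and list the drives that reported any."""
    totals = defaultdict(int)
    affected = defaultdict(list)
    for drive, count in read_errors.items():
        if count > 0:
            hba = drive_to_hba[drive]
            totals[hba] += count
            affected[hba].append(drive)
    return dict(totals), dict(affected)

def single_controller_suspect(drive_to_hba, read_errors):
    """Return the controller shared by all erroring drives, or None if they are spread out."""
    _, affected = errors_by_controller(drive_to_hba, read_errors)
    return next(iter(affected)) if len(affected) == 1 else None

# Hypothetical layout: which drive sits on which card is an assumption, not known data.
drive_to_hba = {"exos": "hba0", "older": "hba0", "third": "hba0", "healthy": "hba1"}

# Read-error counts, rounded from the third rebuild attempt described above.
read_errors = {"exos": 1024, "older": 64_000_000, "third": 5_000_000, "healthy": 0}

suspect = single_controller_suspect(drive_to_hba, read_errors)
if suspect:
    print(f"All erroring drives share {suspect}; that card and its cabling deserve a closer look.")
else:
    print("Erroring drives span both cards; a single bad HBA looks less likely.")
```

If all affected drives turn out to share one card, that is only suggestive: a shared cable, backplane, or power feed would produce the same pattern.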
I'm not sure how to proceed.
server3.0-diagnostics-20230802-1036.zip