Constant Read Errors taking down array

DarkAvernus · July 5

Hi all,

I am having real trouble narrowing down my current problem and am hoping the community can help.

I have a rather large array and I'm constantly dropping disks to read errors.
I have already replaced all the SAS cables for this server, rotated the molex power cables, reseated the HBA, updated the HBA and rotated the HBA ports.
I suspect the problem is either the backplane or the power to the backplane. I think SMART looks okay, with the exception of really high DWORD and disparity error counts, especially on Disk 8.

I did notice, though, that while Disk 5 and 8 are now disabled, Disk 6 and 7 also showed errors in the Unraid UI, but that's not reflected in the SMART files. For the record, these drives are all on the same backplane.

I'll be honest though, I now have no idea. I feel like I've tried everything short of replacing the drives.

This is my first time dealing with constant errors like this, but from what I can read in the attached logs, the SAS drives that are disabled and showing high error counts are all on the same backplane.

While I wait for some feedback, I plan to swap a spare drive into the Disk 8 slot because of the numbers I'm seeing on that drive, but I'm not holding my breath. As mentioned, I think it's that whole row that has an issue.

I'm also thinking I might move the drives out of that backplane to different physical locations. Whether the issues stay with that backplane or follow the disks, that gives me more information.
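The reasoning behind that shuffle test can be written down as a small sketch. This is only an illustration: the drive labels `DRIVE_A` and `DRIVE_B`, the slot numbers, and the function name `classify_fault` are all hypothetical and not taken from the diagnostics.

```python
def classify_fault(before, after):
    """Compare error locations before and after moving drives.

    Each argument maps a drive label to (slot, had_errors).
    Returns "slot", "disk", or "inconclusive".
    """
    def bad_sets(layout):
        # Collect the drives and the slots that showed errors.
        disks = {disk for disk, (slot, bad) in layout.items() if bad}
        slots = {slot for slot, bad in layout.values() if bad}
        return disks, slots

    disks_before, slots_before = bad_sets(before)
    disks_after, slots_after = bad_sets(after)

    follows_disk = bool(disks_before & disks_after)
    follows_slot = bool(slots_before & slots_after)

    if follows_slot and not follows_disk:
        return "slot"  # points at backplane, cabling or power to that slot
    if follows_disk and not follows_slot:
        return "disk"  # points at the drive itself
    return "inconclusive"


# Hypothetical example: the two drives trade places and the errors
# stay with slot 8 rather than moving with DRIVE_A.
before = {"DRIVE_A": (8, True), "DRIVE_B": (2, False)}
after = {"DRIVE_A": (2, False), "DRIVE_B": (8, True)}
print(classify_fault(before, after))
```

If errors show up in both the old slot and on the moved drive, the sketch deliberately returns "inconclusive" rather than guessing.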

Would love some help on this one, though.

plexserver-diagnostics-20240705-1615.zip

JorgeB · July 5

Jul  5 12:14:11 PlexServer kernel: sd 1:0:7:0: device_unblock and setting to running, handle(0x0020)
Jul  5 12:14:12 PlexServer kernel: sd 1:0:7:0: Power-on or device reset occurred
Jul  5 12:14:13 PlexServer kernel: sd 1:0:6:0: device_block, handle(0x001f)
Jul  5 12:14:13 PlexServer kernel: mpt3sas_cm0: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
Jul  5 12:14:14 PlexServer kernel: sd 1:0:6:0: device_unblock and setting to running, handle(0x001f)
Jul  5 12:14:15 PlexServer kernel: sd 1:0:6:0: Power-on or device reset occurred

These "Power-on or device reset occurred" errors happening on multiple disks usually mean a power or connection problem, so it could still be the PSU, the cables, or a bad backplane, for example.
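To see how widespread the resets are, the syslog can be tallied per device. The following is a minimal sketch that assumes the lines look like the ones quoted above; the function name `count_resets` is just an illustration.

```python
import re
from collections import Counter

# Matches the SCSI address (host:channel:target:lun) on reset lines.
RESET_RE = re.compile(
    r"sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred"
)


def count_resets(lines):
    """Count 'Power-on or device reset occurred' events per SCSI address."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The six lines quoted from the syslog above.
sample = [
    "Jul  5 12:14:11 PlexServer kernel: sd 1:0:7:0: device_unblock and setting to running, handle(0x0020)",
    "Jul  5 12:14:12 PlexServer kernel: sd 1:0:7:0: Power-on or device reset occurred",
    "Jul  5 12:14:13 PlexServer kernel: sd 1:0:6:0: device_block, handle(0x001f)",
    "Jul  5 12:14:13 PlexServer kernel: mpt3sas_cm0: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)",
    "Jul  5 12:14:14 PlexServer kernel: sd 1:0:6:0: device_unblock and setting to running, handle(0x001f)",
    "Jul  5 12:14:15 PlexServer kernel: sd 1:0:6:0: Power-on or device reset occurred",
]

for address, count in sorted(count_resets(sample).items()):
    print(address, count)
```

Resets spread across several addresses on the same backplane would fit a shared power or connection fault better than a single failing drive.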

DarkAvernus · July 5

Just so I can learn for the future, where in the diagnostics was this hidden?

I actually have redundant molex connections, so I'm using those now to try and debug, but it might be a good time to buy a new PSU anyway.

I have already replaced the SAS cables, so I doubt they are causing the problem, and because it's a power issue of sorts, I can eliminate an HBA problem.

I might purchase that PSU and just hope it's not the backplane. Thanks for the help.

itimpi · July 5

48 minutes ago, DarkAvernus said:

Just so I can learn for the future, where in the diagnostics was this hidden?

That would be the syslog file in the ‘logs’ subfolder. It is the same as the one you can view via the icon at the top right of the GUI.
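For anyone who prefers to read it straight from the archive, here is a rough sketch using Python's standard `zipfile` module. It assumes the syslog is a member whose parent folder is `logs` and whose file name starts with `syslog`; the exact member name is an assumption, so check it against your own zip.

```python
import zipfile


def read_syslog_lines(zip_path):
    """Return the lines of the first syslog-like file under a 'logs' folder.

    Assumption: the member's parent folder is named 'logs' and its file
    name starts with 'syslog'. Returns an empty list if nothing matches.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            if (
                len(parts) >= 2
                and parts[-2] == "logs"
                and parts[-1].startswith("syslog")
            ):
                with archive.open(name) as handle:
                    text = handle.read().decode("utf-8", errors="replace")
                return text.splitlines()
    return []
```

For example, `read_syslog_lines("plexserver-diagnostics-20240705-1615.zip")` would return the log lines, which can then be searched for "Power-on or device reset occurred".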
