
Constant Read Errors taking down array


Solved by JorgeB


Hi all, 


I am having real trouble narrowing down my current problem and am hoping the community can help.

I have a rather large array and I'm finding that I'm constantly dropping disks to read errors.
I have already replaced all the SAS cables for this server, rotated the Molex power cables, reseated the HBA, updated the HBA and rotated the HBA ports.
I suspect that the problem is either the backplane or power to the backplane. I think SMART looks okay, with the exception of really high DWORD and disparity error counts, especially on Disk 8.

I did notice, though, that while Disks 5 and 8 are now disabled, Disks 6 and 7 also showed errors in the Unraid UI, but that's not reflected in the SMART files. For the record, these drives are all on the same backplane.

I'll be honest though, I now have no idea. I feel like I've tried everything short of replacing the drives.

This is my first time dealing with constant errors like this, but from what I can read, the SAS drives that are disabled and experiencing high error counts are all on the same backplane, as per the attached logs.

While I wait for a response and some feedback, I plan to swap a spare drive into the Disk 8 slot because of the numbers I'm seeing on that drive, but I'm not holding my breath. As mentioned, I think it's that whole row which has an issue.

 

I'm also thinking I might shuffle the drives out of that backplane into other physical locations. Whether the issues stay with that backplane or follow the disks, that gives me more information.

Would love some help on this one though.

plexserver-diagnostics-20240705-1615.zip

Link to comment
  • Solution
Jul  5 12:14:11 PlexServer kernel: sd 1:0:7:0: device_unblock and setting to running, handle(0x0020)
Jul  5 12:14:12 PlexServer kernel: sd 1:0:7:0: Power-on or device reset occurred
Jul  5 12:14:13 PlexServer kernel: sd 1:0:6:0: device_block, handle(0x001f)
Jul  5 12:14:13 PlexServer kernel: mpt3sas_cm0: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
Jul  5 12:14:14 PlexServer kernel: sd 1:0:6:0: device_unblock and setting to running, handle(0x001f)
Jul  5 12:14:15 PlexServer kernel: sd 1:0:6:0: Power-on or device reset occurred

 

These "Power-on or device reset occurred" errors happening on multiple disks usually mean a power/connection problem, so it could still be the PSU, cables, etc., or a bad backplane, for example.
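
To see quickly whether the resets are spread across several disks, something like the following could be run over the syslog. This is only a rough sketch: the regular expression is written against the exact line format quoted above, and `count_resets` and `SAMPLE` are names made up for this example, not part of any Unraid tooling.

```python
import re
from collections import Counter

# Matches kernel lines in the format quoted above, e.g.
#   "... kernel: sd 1:0:7:0: Power-on or device reset occurred"
# Other kernel versions or drivers may word this differently (assumption).
RESET_RE = re.compile(
    r"kernel: sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred"
)


def count_resets(lines):
    """Count reset messages per SCSI address (host:channel:target:lun)."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The six syslog lines quoted in this post, used as sample input.
SAMPLE = """\
Jul  5 12:14:11 PlexServer kernel: sd 1:0:7:0: device_unblock and setting to running, handle(0x0020)
Jul  5 12:14:12 PlexServer kernel: sd 1:0:7:0: Power-on or device reset occurred
Jul  5 12:14:13 PlexServer kernel: sd 1:0:6:0: device_block, handle(0x001f)
Jul  5 12:14:13 PlexServer kernel: mpt3sas_cm0: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
Jul  5 12:14:14 PlexServer kernel: sd 1:0:6:0: device_unblock and setting to running, handle(0x001f)
Jul  5 12:14:15 PlexServer kernel: sd 1:0:6:0: Power-on or device reset occurred
""".splitlines()

if __name__ == "__main__":
    for address, total in count_resets(SAMPLE).most_common():
        print(f"{address}: {total} reset(s)")
```

If several addresses on the same HBA show resets within seconds of each other, as in the excerpt, that points more towards something shared (power, cabling, backplane) than towards a single failing drive.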

Link to comment

Just so I can learn for the future, where in the diagnostic was this hidden?

I actually have redundant Molex connections, so I'm using them now to try and debug, but it might be a good time to buy a new PSU anyway.

I have already replaced the SAS cables, so I doubt they are causing the problem, and because it's a power issue of sorts, I can rule out an HBA problem.

Might purchase that PSU and just hope it's not the backplane. Thanks for the help.

Link to comment
48 minutes ago, DarkAvernus said:

Just so I can learn for the future, where in the diagnostic was this hidden?

That would be the syslog file in the ‘logs’ subfolder. It is the same as the one you can view via the icon at the top right of the GUI.

Link to comment
