[Solved] Multiple read errors, can't figure out what is causing them



Hi guys,

 

For the past few days I have been trying to figure out what is causing the read errors I am getting on what seem to be random drives. I've attached the diagnostics file for reference.

At first I thought it was just bad disks, but the SMART test did not fail. I ran a parity sync to rebuild the array and went on my way. The very next day I got multiple disk errors, so I figured this couldn't be a bad disk. I re-created the array using the "New Config" method, but now I am getting 5 disk errors, and it seems endless. I think it may be cabling, so I have ordered replacement cables, but I'm still not sure.

I am hoping someone can help me confirm the diagnosis. Thanks in advance.

s-ephesus-diagnostics-20200630-2232.zip

Edited by lolsamsam
7 hours ago, johnnie.black said:

Problems appear to start after the spindown command; try temporarily disabling spindown to see if it makes any difference.

Thank you for the response. Per your recommendation, I disabled spindown and the errors have not occurred since. That said, I'm not sure I like the idea of keeping the drives spun up 24/7. Is that good or bad?

As for recent hardware changes, I added a SAS expander (RES2SV240), with 4 ports going to the HDD backplanes and 2 ports to the LSI 9200 HBA card. Could the SAS expander be the culprit? I have no SAS drives; all are SATA.

1 minute ago, lolsamsam said:

Would the SAS Expander be the culprit?

My main suspect would be that specific Seagate model (or that disk in combination with an LSI controller; it might spin down without problems on a different controller):

Model Family:     Seagate IronWolf
Device Model:     ST12000VN0007-2GS116

It appears only disks of that model had issues, at least this time. Also, I have multiple LSIs of the same or similar models, use that same Intel expander, and have no spindown issues. You could look for a firmware update for those disks; IIRC Seagate released new firmware for the 10TB IronWolf model because it had issues with LSI controllers, but I'm not sure it applies to the 12TB model.

  • 3 weeks later...
On 7/1/2020 at 9:25 AM, johnnie.black said:

My main suspect would be that specific Seagate model [...]

It's been a few weeks since I could test this again. In the meantime I have kept the server up by keeping all the disks spinning.

I did some research on my end and couldn't find any firmware update for this 12TB model, and the spindown issue continues. Next I'll try plugging in a second LSI HBA (the one I had slotted previously) to see if the issue persists. I'll be pretty sad if it really was the SAS expander (I'm in denial, but my mind keeps going back to it). I had seen a few pictures of people using this particular SAS expander, which inspired me to use it.

It's bizarre, though: it's ONLY when the disks spin down that everything goes haywire and I get read errors everywhere.


Quick update: I took out the SAS expander and the problem persists. Could it be the HDD backplane? I recently switched from a NORCO 4220 case to an iStarUSA E4M20 case. I am also now having trouble stopping the array without using terminal commands to force disk activity to stop.

Thank you for the help so far; I am at a loss. Could this be XFS-related? Do I need to repair it?


Consider this solved.

I made a knucklehead mistake: I had plugged in only 5 of the TEN 4-pin Molex power connectors on the case backplane. (It feels like a miracle the thing ran at all.) After fixing that, I turned spindown back on and crossed my fingers... and it made it through the night without errors!

Thanks for providing support.

