Quick, need help: array showing an absurd number of reads and writes on several drives; shares not accessible



Reads are 22,617,230,667,232 and writes are 22,617,230,667,232, and of course there are thousands of errors.


Aug 10 21:12:27 Tower kernel: md: disk1 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk2 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk3 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk4 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk7 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk8 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk9 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk0 read error, sector=39195248
Aug 10 21:12:27 Tower kernel: md: disk1 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk2 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk3 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk4 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk7 read error, sector=39195256
Aug 10 21:12:27 Tower kernel: md: disk8 read error, sector=39195256
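One quick way to see whether errors like these point at individual drives or at something they all share (HBA, cables, power) is to group the `md: diskN read error` lines by sector. Below is a minimal Python sketch, assuming the syslog lines follow exactly the format in the excerpt above; the regex, the `disks_by_sector` name and the sample lines are illustrative only, not anything Unraid provides.

```python
import re
from collections import defaultdict

# Matches lines like "... kernel: md: disk1 read error, sector=39195248".
# The format is assumed from the log excerpt in this thread.
MD_ERR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")

# A few lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "Aug 10 21:12:27 Tower kernel: md: disk1 read error, sector=39195248",
    "Aug 10 21:12:27 Tower kernel: md: disk2 read error, sector=39195248",
    "Aug 10 21:12:27 Tower kernel: md: disk0 read error, sector=39195248",
    "Aug 10 21:12:27 Tower kernel: md: disk1 read error, sector=39195256",
]

def disks_by_sector(lines):
    """Return {sector: [disks that logged a read error at that sector]}, sorted."""
    hits = defaultdict(set)
    for line in lines:
        m = MD_ERR.search(line)
        if m:
            hits[int(m.group(2))].add(m.group(1))
    # Sort sectors numerically, and disks by their number (disk2 before disk10).
    return {
        sector: sorted(disks, key=lambda d: int(d[len("disk"):]))
        for sector, disks in sorted(hits.items())
    }

if __name__ == "__main__":
    for sector, disks in disks_by_sector(SAMPLE).items():
        print(f"sector {sector}: {len(disks)} disk(s) -> {', '.join(disks)}")
```

If nearly every disk reports an error at the same sector within the same second, that pattern fits a shared path (controller, cabling, power) better than several drives dying at once, though it is a hint rather than proof.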


I'm not sure whether it's my HBA controller, which I removed and then installed again. Before that, everything was OK. I have no spare HBA. As for the SFF-8087 cables, I'm using 2 cables for 7 drives. Surely not all the drives would be failing if only 1 of the cables were defective.

tower-diagnostics-20190810-1324.zip

Edited by jang430

BTW, I forgot to mention that it starts out OK, with no errors. I can even access the shares; then, after a few minutes, those errors show up.


I've removed it and seated it properly once again, but OK, I'll move it to a different slot this time. Any other suggestions from other members? I hope this is not the end of my HBA :D Thanks, itimpi.

Edited by jang430

@itimpi, reseating the HBA controller didn't help, but unrolling my SFF-8087 breakout cable did. Now I'm able to access my array without problems. It also performed a parity check and found 0 errors. Could both my SFF-8087 cables really have failed at the same time? To be honest, I highly doubt it, but I don't have additional cables to troubleshoot with. My cables have been rolled up a bit for a long time, and they had been like that before as well. The only thing I changed was removing and reseating my HBA controller. I don't know why this happened.


No idea why unrolling the cables worked, unless they were putting some strain on the connectors so that vibration could make them momentarily break contact. Personally, I got relatively short cables to reduce clutter. I do know it is recommended that you do NOT try to tidy SATA cables by taping them together, as this tends to increase cross-talk, but I'm not sure how true this really is.

16 hours ago, itimpi said:

 I do know it is recommended that you do NOT try to tidy SATA cables by taping them together, as this tends to increase cross-talk, but I'm not sure how true this really is.

Personally, I think the issue is the extremely poor retention at the connector, and any attempt to bundle the cables is more likely to pull one of the ends out of alignment. The connector must be completely square in all dimensions to make a proper link that will stay connected during normal system vibration.


I've done some further testing. I've swapped the power connections between drives: still the same problem. I've connected the parity drive directly to the motherboard instead of to the HBA controller (using a regular SATA cable): still the same problems. I'm attaching a screenshot here.


My Drive 3 is also disabled, since I think it died first, before the parity drive.


Won't the array start without parity? Any suggestions on what to do next?

Annotation 2019-08-14 234828.png

On 8/18/2019 at 5:38 AM, jang430 said:

Could it finally be my Dell Perc H310 that is causing problems?

Looks like it:

Aug 18 09:55:38 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!

Try it in a different slot, and also make sure it's sufficiently cooled.
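A similar sketch can pull controller-level failures out of a diagnostics syslog so they can be lined up against the `md` read errors by timestamp. It assumes the mpt2sas driver logs the message exactly as in the line quoted above and that each line starts with the usual 15-character syslog timestamp (e.g. `Aug 18 09:55:38`); the `CONTROLLER_MARKERS` tuple and the `controller_failures` name are hypothetical, and only the first marker is taken from this thread.

```python
# Hypothetical marker list: only the first entry comes from the log quoted above.
# Add other messages you find in your own syslog as needed.
CONTROLLER_MARKERS = ("SAS host is non-operational",)

# Standard syslog timestamps ("Aug 18 09:55:38", or "Aug  8 09:55:38") are 15 chars.
TIMESTAMP_LEN = 15

def controller_failures(lines, markers=CONTROLLER_MARKERS):
    """Return (timestamp, line) pairs for lines containing any controller marker."""
    return [
        (line[:TIMESTAMP_LEN], line)
        for line in lines
        if any(marker in line for marker in markers)
    ]

if __name__ == "__main__":
    sample = [
        "Aug 18 09:55:38 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!",
    ]
    for ts, line in controller_failures(sample):
        print(ts, "->", line)
```

If the first controller failure appears just before the burst of `md` read errors, that supports looking at the HBA (slot, seating, cooling) before replacing any drives.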


You're also having problems with the parity disk, and that one is on the onboard SATA ports. It dropped offline, so there's no SMART report, but it looks more like a connection problem.
