
Array has 2 disks with read errors, disk 6 in error



Hello,

Two days ago, I woke up to an emulated disk. I stopped the array, mounted it in maintenance mode, and ran a disk test; everything was fine. OK, so I rebuilt the entire array.

 

I also swapped the data cable (connected to an HBA card) and shut down my server to do some inspection (and dusting).

 

This morning, same thing, but I also have a warning saying I have 2 disks with read errors on the array. I'm unsure what is going on.

Disk 5 - ST16000NM001G-2KK103_ZL2NQ6RK (sdi) (errors 313227)
Disk 6 - ST16000NM001G-2KK103_ZL2NR6GL (sdg) (errors 1882)

 

I'm unsure what to do now. Do I have failing disks?

 

Thank you

 

servraid-diagnostics-20240627-0829.zip

Link to comment

It's not reported as a disk problem for either disk, and SMART looks OK. That, and the fact that the issue started on both disks at the same time, suggests a power/connection issue. Do the disks share something other than the miniSAS cable, like a power splitter?

Link to comment
Posted (edited)

They are in a StarTech 4-disk enclosure that has 2 power connectors for 4 disks and a fan. I don't have enough space in the server for my drives, so I was looking for something external that could safely hold them and came up with that. The disk that disconnected again has a new cable that comes from the other 4-wire channel of the HBA card.

 

https://www.amazon.ca/gp/product/B00OUSU8MI/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1

 

My card is an LSI 9201.

https://www.amazon.ca/gp/product/B0BVTJPZSG/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1

 

I ran a SMART short test on both, which succeeded, and I'm now running the long test to check.

 

edit: I'm wondering if I'm running out of power from the PSU. I have a Dell Precision 5820. This thing has limited power and I think I might be running out? The power does go through some splitters because the PSU only has about 4 power headers, which is so stupid considering that thing has 12 SATA ports.

 

Xeon Gold W-2275 with 128GB ECC RAM

NVidia RTX 4000

 

2x NVMe 1.0 TB SSD

6x 16TB SATA HDD

1x 8TB SATA HDD

1x 4TB SATA HDD

 

I added some fans (one was required by Dell for the RTX 4000 installation).

 

Maybe I should try to find another HDD enclosure, but with external power, that I can connect like this one. I'm unsure if that exists.
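
For the power question, a rough back-of-the-envelope budget can help decide whether the PSU is a plausible suspect. The sketch below is only an illustration: every wattage in it, including the PSU rating, is a placeholder assumption and not a measured or datasheet value for this build, and the names `Load`, `total_peak_watts`, and `headroom` are made up for the example. Replace the numbers with figures from the actual PSU label and drive datasheets.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    count: int
    watts_each: float  # ASSUMED peak draw per unit, not a measured value


def total_peak_watts(loads):
    """Sum the assumed worst-case draw of every component."""
    return sum(load.count * load.watts_each for load in loads)


def headroom(psu_watts, loads, margin=0.2):
    """Watts left over after reserving `margin` of the PSU rating as a safety buffer."""
    usable = psu_watts * (1.0 - margin)
    return usable - total_peak_watts(loads)


# Placeholder figures only: substitute real values for your hardware.
loads = [
    Load("CPU", 1, 165.0),
    Load("GPU", 1, 160.0),
    Load("3.5in HDD at spin-up", 8, 25.0),
    Load("NVMe SSD", 2, 8.0),
]

if __name__ == "__main__":
    for psu in (425.0, 950.0):  # hypothetical PSU ratings to compare
        print(f"PSU {psu:.0f} W: headroom {headroom(psu, loads):.0f} W")
```

A negative headroom only says the total is tight under these assumptions. It does not model per-cable or per-splitter limits, which matter here because several drives hang off a few PSU headers.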

Edited by Nodiaque
Link to comment

I'm waiting for the rebuild, but already the same disk 5 has logged the same number of errors again during the rebuild. I'm waiting to see if it gets worse; it might also be a failing controller. After the rebuild, I'll try putting one of the drives outside on another power cable (hoping it doesn't split off the same cable upstream).

Link to comment
Posted (edited)

OK, so here's the "new" situation. 2 days ago, I took disk 5 and disk 6 out of the StarTech 4-bay unit, plugged them directly into a power cable and a SAS-to-SATA cable, and rebuilt. This morning, it's disk 5 that is now disconnected, and the download disk has an invalid path (like last time). The download disk is still in the StarTech bay, but I'm more concerned about the disconnected disk.

 

It usually happens during the night at backup time (well, I think so; all I know is that all backups fail during that time). I think this time it was during a parity check, because the parity check stopped at 33%.

 

I'm wondering if it's the HBA card that's failing, the power that's not enough or something else.

 

 

servraid-diagnostics-20240701-0836.zip

Edited by Nodiaque
Link to comment
34 minutes ago, Nodiaque said:

I'm wondering if it's the HBA card that's failing, the power that's not enough or something else.

 

Could be. It's still not logged as a disk problem. Also check whether it's happening right after a spin-up: some Seagate disks have issues with spin-up when used on an LSI controller, especially SAS2 models, so if it happens after a spin-up, try disabling spin-down to test.
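
One way to check the spin-up theory is to line up spin-up times against error times for the same disk. The sketch below is a minimal illustration that assumes you have already pulled both sets of timestamps out of the syslog by hand; it does not parse any real log format, and the function name `errors_after_spinup`, the 60-second window, and the sample timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def errors_after_spinup(spinups, errors, window=timedelta(seconds=60)):
    """Return the (disk, time) errors that fall within `window` after a spin-up of the same disk."""
    hits = []
    for disk, err_time in errors:
        for spin_disk, spin_time in spinups:
            delay = err_time - spin_time
            if spin_disk == disk and timedelta(0) <= delay <= window:
                hits.append((disk, err_time))
                break  # one matching spin-up is enough for this error
    return hits


# Hypothetical sample events, entered by hand for illustration only.
spinups = [
    ("disk5", datetime(2024, 7, 1, 3, 0, 0)),
    ("disk6", datetime(2024, 7, 1, 1, 0, 0)),
]
errors = [
    ("disk5", datetime(2024, 7, 1, 3, 0, 20)),  # 20 s after the disk5 spin-up
    ("disk6", datetime(2024, 7, 1, 3, 0, 25)),  # 2 hours after the disk6 spin-up
]

if __name__ == "__main__":
    for disk, when in errors_after_spinup(spinups, errors):
        print(f"{disk}: error at {when} shortly after a spin-up")
```

If most errors land inside the window, disabling spin-down is a cheap test. If they do not, power or cabling stays the more likely suspect.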

Link to comment
Posted (edited)

It would be weird that it took all this time to show up. Is there another card I could use to avoid this problem, since all my disks are Seagate?

 

Right now, I'm connecting a 2nd PSU that will power the external drives to see if it's a power issue. I'll also set disk 5, the one that disconnected this time, to never spin down.

 

I thought maybe the mainboard is having issues (since the HDD controller on the mainboard is fried), but all the disks on the HBA card would go down if it was shorting it.

 

edit: there's also the invalid path that I don't get. Now it's the download drive; last time it was the download drive and another drive. The download drive is still in the StarTech with another drive, though. Very weird.

Edited by Nodiaque
Link to comment
