Problems after moving drives from onboard SATA to PCIe Card


Solved by JorgeB

I bought an LSI SAS 9210-8i card and moved my drives off the motherboard onto it.

 

After putting the card in and cabling it up I have 2 drives that are showing as unmountable. (Disk1 & Disk4).

 

image.png.cefa47e75cee8cb9b59b259683707e98.png

 

When I start the array I see that:

 - Disk4 is also seen as an Unassigned Device (sdc)

 - Disk1 is seen as a historical device (sdc)

 

image.png.0995680f673f4f49e0af2c73704fba05.png

 

When I stop the array I see that:

 - Both Disk1 & Disk4 show up as Historical Devices.

 

image.png.0dc1a9e851cab9e2fcd10bd94d4a08ae.png

 

Not sure how to resolve it.

 

D

iwursrv-diagnostics-20220320-1116.zip

Edited by Darronn
typo
Link to comment
On 3/20/2022 at 3:18 PM, Darronn said:

I have 2 drives that are showing as unmountable. (Disk1 & Disk4).

Do you mean disks 2 and 4?

 

Disk1 is mounting in the diags posted. Disk2 is not, because it's emulated and there are read errors on disk4; the same errors also make disk4 unmountable. A disk showing as unassigned means it's dropping offline and reconnecting again; check/replace cables or try swapping ports on disk4 and disk1.

Link to comment

Disk 2 is missing, so yes, it's unmountable because of that. That's what started this for me. I had trouble replacing the drive because of slowness: it estimated 57 days to repair, and the web interface was extremely slow. I'm pretty sure the SATA controller on the board is failing.

 

I was mistaken about Disk1: it's mounted, but I'm not sure why it's showing up as a Historical device.

 

Disk4 is flipping around and showing up as sdc, sdf, sdg, sdh. I've run short & extended SMART tests and they show no errors, but the array seems to think there are 2 read errors. The Mini SAS cables are brand new and seated well. Last night I tried to put the drives back in the same order as before, so they've been moved around, and they're still in the same state.

 

I'm just trying to get to a point where I can add the new replacement Disk2 and rebuild. Then I'll be replacing the motherboard.

 

 

Link to comment

This is the only odd thing I can find.

 

image.png.d50b75d30bb309fc1337907257754685.png

 

When the array is started, Disk4 is unmountable. It's showing (sdg), which doesn't exist.

 

# hdparm  -I  /dev/sdg
/dev/sdg: No such file or directory

 

image.png.5d4ec62014bd0171b377cb3159755570.png

 

The duplicate unassigned device (sde) does.

 

# hdparm  -I  /dev/sde

/dev/sde:

ATA device, with non-removable media
        Model Number:       WDC WD40EDAZ-11SLVB0                    
        Serial Number:      WD-WX22D410YDUL

 

Any ideas what is causing that and how to fix it?
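
Since the kernel name (sdX) can change every time the disk reconnects, one way to follow a specific drive is to look it up by serial number rather than by letter. The following is only a minimal sketch, assuming the usual udev layout where `/dev/disk/by-id` holds symlinks whose names contain the model and serial; the function name `find_by_serial` is made up for illustration.

```python
import os
import re

def find_by_serial(serial, by_id_dir="/dev/disk/by-id"):
    """Return {link_name: resolved_device_path} for whole-disk links
    whose name contains the given serial number."""
    matches = {}
    for name in sorted(os.listdir(by_id_dir)):
        # Skip partition links such as "...-part1"; keep whole-disk entries.
        if re.search(r"-part\d+$", name):
            continue
        if serial in name:
            link = os.path.join(by_id_dir, name)
            # realpath follows the symlink to the current kernel device node.
            matches[name] = os.path.realpath(link)
    return matches

# Example call, using the serial from the hdparm output above:
#   for name, dev in find_by_serial("WD-WX22D410YDUL").items():
#       print(name, "->", dev)
```

Running this before and after a reconnect should show whether the same serial has moved to a different sdX name.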

 

Link to comment
18 hours ago, JorgeB said:

disk showing as unassigned means it's dropping offline and reconnecting again, check/replace cables or try swapping ports on disk4 and disk1.

 

I've already tried that. It doesn't make a difference which port that drive is on; it always shows up as an unmountable drive in the array and as an Unassigned device. I can mount that drive outside the array and read the data on it.

 

Wouldn't there be a log of this dropping offline and reconnecting? I really don't think that is happening.

 

Moving the drive from the onboard SATA to the SAS card caused something to identify this drive differently. Now there is a ghost twin that the array tries to mount, which doesn't exist and obviously fails.

 

Right now I've added new drives outside the array and am copying the data off the array. Fortunately Disk4 was recently purchased and added (less than two weeks ago) and only a small portion of data is on it.

 

Once it's all copied off, I'll New Config the array, start over, and copy it back.

 

 

Edited by Darronn
typo
Link to comment
  • Solution
4 hours ago, Darronn said:

Wouldn't there be a log of this dropping offline and reconnecting?

Yes, and there is:

 

Mar 19 20:20:49 IWURSRV kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221101000000)
Mar 19 20:20:49 IWURSRV kernel: mpt2sas_cm0: removing handle(0x0009), sas_addr(0x4433221101000000)
Mar 19 20:20:49 IWURSRV kernel: mpt2sas_cm0: enclosure logical id(0x500605b003cc0de0), slot(2)

 

It is a strange issue, since it keeps happening only with those two disks. If it's not power/cable, it could be a controller or disk problem. Can you connect them, even if temporarily, to the onboard SATA ports?
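
To see how often this happens and to which device, the "removing handle" lines can be counted per SAS address. This is just a rough sketch that assumes the syslog lines look like the three quoted above; the function name `count_removals` is hypothetical, and other driver versions may word the message differently.

```python
import re
from collections import Counter

# Matches lines like:
# "... kernel: mpt2sas_cm0: removing handle(0x0009), sas_addr(0x4433221101000000)"
REMOVE_RE = re.compile(
    r"kernel: (mpt[23]sas_cm\d+): removing handle\((0x[0-9a-fA-F]+)\), "
    r"sas_addr\((0x[0-9a-fA-F]+)\)"
)

def count_removals(lines):
    """Count device-removal events per (controller, sas_addr)."""
    counts = Counter()
    for line in lines:
        m = REMOVE_RE.search(line)
        if m:
            controller, _handle, sas_addr = m.groups()
            counts[(controller, sas_addr)] += 1
    return counts

# Example call (the log path is an assumption; the syslog inside the
# diagnostics zip can be used instead):
#   with open("/var/log/syslog", errors="replace") as fh:
#       for (ctrl, addr), n in count_removals(fh).most_common():
#           print(f"{ctrl} sas_addr={addr}: {n} removal(s)")
```

An address that shows up many times points at the port (and whatever is attached to it) that keeps dropping.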

 

Link to comment
8 hours ago, JorgeB said:

Yes, and there is:

 

Mar 19 20:20:49 IWURSRV kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221101000000)
Mar 19 20:20:49 IWURSRV kernel: mpt2sas_cm0: removing handle(0x0009), sas_addr(0x4433221101000000)
Mar 19 20:20:49 IWURSRV kernel: mpt2sas_cm0: enclosure logical id(0x500605b003cc0de0), slot(2)

 

It is a strange issue, since it keeps happening only with those two disks. If it's not power/cable, it could be a controller or disk problem. Can you connect them, even if temporarily, to the onboard SATA ports?

 

 

Ok, I see. Is that log from when it first boots up?

 

I have tried putting the drive back on the onboard controller, but it's worse there. When the drive is connected to it, it spins up/down and chirps, no matter which port.

 

It's possible that the LSI card I bought is faulty, but since the motherboard seems to be having issues I've ordered a new board to start there.

 

If I have issues after that with the LSI card, I'll try the new onboard ports which should be good.

 

Thanks for your help looking at it.

 

Link to comment
  • 2 weeks later...

Just to update on my issue. @JorgeB Thanks for all your help and time responding. I'll mark your response of "dropping offline and reconnecting" as the cause/solution.

 

After testing different drives connected in different positions, I realized that the cause was two of the 5.25" drive caddies I recently purchased.

 

My tower case had only a small number of 3.5" drive bays, so I was using the 5.25" bays with caddies. I purchased two a long time ago and have had no issues with them, but the two I purchased recently are both causing the intermittent issues.

 

I bought a Rosewill server chassis and a higher-wattage power supply, moved everything into it, and then ran new pre-clears on each drive before adding it to the array. All have been running perfectly with no errors now.

 

I've also added a second parity drive, which is currently syncing / data rebuilding. Three of the drives are well used, at between 20K and 28K power-on hours, so I thought it was a good idea to add the extra data protection.

 

I'm going to go through the previous drives that were pulled because of errors, because I think they may actually be fine.

 

image.png.205f1efd65dab5b7cd494ec7d73281fe.png

 

 

 

 

 

 

 

 
