2 data drives (connected by PCI-E SATA card) marked "missing" all of a sudden - no system config change - Unraid 6.11.5


Go to solution Solved by JorgeB,


Hi guys,

I need some advice from the "brains trust" on an issue that has emerged with no warning.

I have an Unraid array running 6.11.5 which has a total of 10 drives, 1 parity, 9 data.
This configuration has been working smoothly for ~10 years now without a hitch. 

A few days ago with no rhyme or reason, upon booting the array I noticed two disks missing - naturally the system doesn't mount the disks because the config has "changed":

[Screenshot: Unraid array disks missing]

 

A shutdown and restart didn't resolve the issue, and trying to mount the array didn't work either, so I moved on to examining the cables. Both power cables seem fine. I then noticed that these two drives are the only ones connected through a PCI Express to SATA port replicator (the rest are connected directly to the motherboard), so I turned my attention to that. The port replicator I'm using is a Highpoint Rocket 640L:

HighPoint Rocket 640L Lite Version 4-Port PCI-Express 2.0 x4 SATA 6Gb/ – DirectNine - Australia

I've had it for 10 years and it's worked perfectly for all this time. So I tried to connect the "offending" two disks to the motherboard directly by uncoupling 2 other drives just to see if there are any issues with the power cable, the SATA cables or the disks. Sure enough, they are recognised by the system immediately. 

 

Deducing that for some reason the PCI-Express port replicator is failing, I've then gone and bought a new one (not the same model):
Buy Startech PEXESAT322I 2 Port PCI Express Controller Card | KVM Hubs & Controllers | Scorptec Computers 

I've then gone and connected the new port replicator to the two hard disks, restarted the array and unfortunately, no luck.

 

I cannot understand why the system is all of a sudden not recognising these two disks given I'm convinced there's nothing wrong with them as when they are connected to the motherboard directly, they work and prior to Friday just gone (25th March 2023!) this configuration has worked flawlessly with no issues for 10 years.

 

I was wondering if anyone can shed some light on what might be going on here, and whether some change has occurred in the operating system that I'm not aware of and that is causing this. I regularly update the Unraid software and I believe it is up to date, or at least very close to it. To assist, I have attached the diagnostics file that I ran last night with this configuration. I notice the log indicates that "slot 8 and 9 are missing" but I cannot understand why this is the case.

 

To confirm I have tried the following in the interests of narrowing down the fault:

- Tried different SATA cables - no success

- Tried different PCI Express slots for the SATA card - no success

- Tried a different (brand new) SATA card - no success (in 3 different slots)

- Tried connecting the hard disk to the motherboard directly (works - implying the data and power cables are fine)

- Checked the BIOS - AHCI mode is activated and I can't see anything preventing ports from working - but again, this has worked for years and I've made no changes to the BIOS.
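
For anyone following along, the checks above all come down to one question: does the operating system see the add-in SATA controller on the PCI bus at all? Below is a rough Python sketch of how that could be checked by parsing `lspci` output. It assumes Python 3 and `lspci` are available on the machine (not a given on a stock Unraid install), and the function names and the list of device classes are my own choices for illustration, not anything from Unraid.

```python
import re
import subprocess

# Device class names as printed by plain `lspci` (no -n/-nn flags).
# This list is an assumption: extend it if your card reports another class.
STORAGE_CLASSES = (
    "SATA controller",
    "RAID bus controller",
    "Serial Attached SCSI controller",
)


def find_storage_controllers(lspci_text):
    """Return (slot, device_class, description) for each storage controller line."""
    found = []
    for line in lspci_text.splitlines():
        # Expected shape: "<slot> <device class>: <description>"
        match = re.match(r"^(\S+)\s+([^:]+):\s+(.*)$", line)
        if match and match.group(2) in STORAGE_CLASSES:
            found.append((match.group(1), match.group(2), match.group(3)))
    return found


def read_lspci():
    """Run `lspci` and return its text output (requires lspci to be installed)."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `find_storage_controllers(read_lspci())` with the card installed and again with it removed should show whether an extra controller ever appears. If the list is identical both times, the card is not being enumerated, which points at the slot, the board or the card rather than at the disks or cables.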

 

Very keen if someone can shed light on this.

 

The fallback plan, for which I'd also like guidance, is this: if the card simply fails to work, I could connect those two "offending" drives to the motherboard in place of two others, restart with a new configuration (obviously invalidating my parity), copy their contents over to the free space on the rest of the array, remove the "offending" disks, reconnect only the disks that attach to the motherboard (I will have 2 fewer disks) and then rebuild the parity drive. Is that the right workaround if the two disks on the SATA replicator simply won't work? I'm not worried about losing data - I know the disks are fine, so if I have to rebuild parity as a last resort, I'm not concerned.
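
Before committing to that plan it is worth checking that the data on the two disks actually fits in the free space on the rest. Here is a minimal Python sketch of that arithmetic. The helper names and the 5% headroom are assumptions of mine; on Unraid the array disks are mounted under paths like `/mnt/disk1`, which is where `shutil.disk_usage` would be pointed.

```python
import shutil


def used_bytes(mount_point):
    """Bytes currently used on the filesystem mounted at mount_point."""
    return shutil.disk_usage(mount_point).used


def free_bytes(mount_point):
    """Bytes currently free on the filesystem mounted at mount_point."""
    return shutil.disk_usage(mount_point).free


def can_absorb(bytes_to_move, free_bytes_per_disk, headroom=0.05):
    """True if total free space, less a safety margin, covers the data to move.

    headroom is the fraction of free space deliberately left unused.
    """
    usable = sum(free_bytes_per_disk) * (1 - headroom)
    return bytes_to_move <= usable
```

For example, `can_absorb(used_bytes("/mnt/disk8") + used_bytes("/mnt/disk9"), [free_bytes(f"/mnt/disk{n}") for n in range(1, 8)])` would answer the question for one possible layout; the disk numbers there are only illustrative. Note this compares totals only, so a single very large file could still fail to fit on any one disk.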

 

Welcome thoughts folks - I'm scratching my head on this one. 
Cheers,

Mike.

bargetower-diagnostics-20230327-2120.zip

Link to comment

Thanks JorgeB - appreciate your quick reply.

The issue I have is that this original controller has been in exactly that PCI-E slot for 10 years with no issue and all of a sudden it stopped being detected.
I bought a new one as I indicated, and tried it not only in the same PCI-E slot but in 3 other PCI-E slots (1, 4 and 8 lane), and it is not detected either.

Do you have any thoughts what might be causing this? There are no BIOS changes, updates, no firmware changes, no nothing. 
I just powered it down on Thursday, powered it up on Friday as I do frequently, and bang - two drives missing!

Thanks for all your help.
Cheers, Mike.

Link to comment

Thanks for your reply. I tried a different one as per my original post but it’s essentially the same type of product: PCI-E to SATA, except this one only has 2 ports, which is all I need.
 

I just cannot understand why it is not being read by the system or recognised anymore. I have tried several PCI ports too. I can’t imagine the motherboard is damaged - it hasn’t moved nor been exposed to power surges or anything like that. 
 

What should I do? A last resort, I guess, would be to clear some space on the remaining drives (losing my parity in the process), reconnect the two drives, copy their content over to the others, and then rebuild parity with 2 fewer drives. Can that be done, or is there a better way of getting the data off those drives if for some crazy reason I can’t access the total number of drives anymore?
 

Thoughts?

thanks again. 

Link to comment

So strange that the MB would suddenly stop reading this controller when everything else seems to be fine (and so suddenly, with no system change), but I do have my media center PC running Windows 11. I guess I’d plug in the card, assuming I have a spare PCI-E slot, connect the HDD via SATA and check whether Disk Management can see the disk even if it can’t read the file system - is that what you’re thinking?

Link to comment

I will try that - I also have another motherboard not being used and may consider that (but it's a big job). The other thing I alluded to above is - is it possible to mount the system under new config - clear some space, disconnect two drives, attach the two that currently can't be connected simultaneously, copy their data over, retain only 8 disks, then rebuild parity - or is there a better way to accomplish that task? I don't know if there is a way to connect just those two disks to another PC, copy the data over, then rebuild parity on the revised array minus two disks going forward.

Link to comment

Hi JorgeB - I really appreciate your help with this - in trying to work this situation out, (trying other PCI-E slots for the card for example) - I've now just lost another disk - Disk5 is now "missing" - that is connected directly to the Motherboard via a SATA connector - I haven't touched any of those connectors - just the two troublesome ones. 

 

What is possibly happening here - is this motherboard dying? I've never seen anything like it. Diagnostics attached - any advice appreciated!!

Thanks in advance.
Mike.

 

bargetower-diagnostics-20230330-2006.zip

Link to comment

I have managed to use a secondary PC just to make sure my data is safe on those two disks that I couldn't access, and have successfully added them to a temporary array (perhaps I shouldn't have done that, but ultimately I just want to get the data off them now). I didn't have enough RAM available immediately so couldn't install the Community Applications plugin (ugh, yet another problem I didn't anticipate). I can now see those two disks in this temporary array with no problem and all the data is safe, but I am struggling to see that content in Windows Explorer so I can start sifting through what I'd like to keep and what I'm happy to get rid of:

[Screenshot]

 

Is this what "Unassigned Devices" will help me do? I will get some additional RAM and go that pathway but is there a very simple way to ensure a PC connected to the same network can then slowly copy off content via Windows Explorer? When I go to "TOWER" on my network in Windows Explorer on another PC connected to the same network (It sees the Device), it doesn't show anything within that folder - so I clearly need to point it to those two disks just like on my proper array.

 

Thanks in advance.

Link to comment

Thanks JorgeB - that was the solution - I would never have guessed if you hadn't told me, so thank you!
I managed to fire up a temporary array - pulled all the data off those two disks (as suspected, nothing was wrong with the data on the drives).
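
In case it helps anyone doing the same rescue, one way to confirm that a copy really matches the source disk before wiping anything is to compare file hashes. This is only a sketch in Python using the standard library; the function names are mine, and hashing every file on large disks will take a long time.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare(source_root, copy_root):
    """Return (missing, changed): files absent from the copy, and files that differ."""
    source, copy = manifest(source_root), manifest(copy_root)
    missing = sorted(set(source) - set(copy))
    changed = sorted(k for k in source.keys() & copy.keys() if source[k] != copy[k])
    return missing, changed
```

An empty result for both lists from `compare(source, destination)` means every file on the source exists in the copy with identical content. Extra files that exist only in the destination are ignored on purpose, since the destination may already hold other data.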

What I have done since is fire up the main array with two fewer drives, indicated that it is a new configuration - all disks were recognised and naturally it needs to rebuild the parity disk which it started doing. One of the disks is "erroring" yet is in green status and I have never had an issue with that disk and I don't know why. When it finishes parity I will remove the disk to the temp array and check its contents as I don't believe there is anything wrong with it.

My question to you though is that my old Shares are visible via my Windows PC but there is no content within those shares. Do I need to reset the Share in order to see the files or is there some other reason why the Share is visible on the network, but has no content in it. (When I do a view of each disk I can see all the files so there is definitely content readable on all the disks). - It could be related to the erroring on the disk - what do you think?

 

[Screenshot: Unraid disk situation, 2 Apr 2023]


Thanks for all your help, I feel I'm making progress!

Link to comment

Thanks JorgeB - since then, the drive became "Red balled" with the red X indicating the drive was in an error state which is extremely strange. This is the diagnostics prior to me removing the disk from my main array, and putting it into my "trial array" where it loaded perfectly with no issue - very strange. Diagnostics are attached.

 

Restarting my main array minus that disk and it has started perfectly, I created a new configuration, it has started to rebuild parity minus the disk6 in error state, and I can access all my shares and the content as normal - it must have been the "errored disk" that caused me to be unable to read my content in those shares.

Keen to hear what you think about why that disk is in error state when putting it into the temporary array, it works and reads perfectly. 

Thanks for your guidance here, hopefully post this parity rebuild my new "slimmed down" array will be stable going forward.
Cheers,
Mike.

bargetower-diagnostics-20230403-1729.zip

Link to comment

Hi JorgeB - the array minus the 2 inaccessible disks (due to the SATA connectivity card) plus the other disk that for some reason went into error state is now fully operational back in Green status with a rebuilt parity which is great - I will need to keep an eye on it over the next few weeks just to see why some of the disk connectivity was a little dodgy for no real rhyme or reason. What I wanted to ask you is that now my Windows network mapped drives are a little out of whack at the moment.

I'd like to remap them as before, where Z:\ was mapped to the entire share, Y: to disk1, X: to disk2 and so on, but when I go into \\TOWER it doesn't show those disks anymore, just the Share. Is this because I need to go into each disk (disk"x", where x is the number of the disk) and set "Export" for it as well? i.e. this screen:

[Screenshot]

I imagine it defaults back to "no" if I have changed my configuration. Will that restore that visibility on my Windows network?
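
For reference, the mapping I'm after could be scripted rather than clicked through once the disks are visible again. The Python sketch below only builds the Windows `net use` command lines, it does not run them. The share name `Media` is a placeholder of mine, and it assumes the disk shares appear on the network as `disk1`, `disk2` and so on once they are exported.

```python
def net_use_commands(server, user_share, disk_count, first_letter="Z"):
    """Build `net use` commands: first_letter for the user share, then
    descending letters (Y, X, ...) for disk1, disk2, ...
    """
    targets = [user_share] + [f"disk{n}" for n in range(1, disk_count + 1)]
    commands = []
    for offset, target in enumerate(targets):
        letter = chr(ord(first_letter) - offset)
        if letter < "D":
            # A, B and C are normally taken by local drives on Windows.
            raise ValueError("ran out of drive letters")
        commands.append(f"net use {letter}: \\\\{server}\\{target} /persistent:yes")
    return commands
```

`net_use_commands("TOWER", "Media", 2)` gives the Z:, Y: and X: mappings described above, and each line can be pasted into a Windows command prompt. They will only succeed if the corresponding share is actually exported by the server.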

Thank you again for your support - you've been most helpful and patient!
Cheers,
Mike.

Link to comment
