Having a problem with unmountable disks



Hello Everyone.

I ran into a problem today when I accidentally pulled a drive from my array while it was still running. I quickly put it back in (hot-swap), then shut down the server and rebooted.

After the reboot it reported errors on a few disks and I could not start the array.

I refreshed the config so I could start the array and get some info.

It seems that 3 disks are having write errors, but only when the array starts.

I have tried reseating my mini SAS cables and believe it to be a problem with the HBA.

But just to make sure, I have attached my diagnostics.

The drives giving me trouble are sdq, sdr and sds.

They seem to show up and work, but then they throw errors and show up as unassigned disks.

If anyone has the time, please look at this and confirm my suspicions, or tell me if I'm totally eff'd and should just buy new drives, or be ready to kiss my media goodbye.

Whatever the case, I would really appreciate any knowledge on the subject and any guidance.

tower-diagnostics-20210622-1914.zip

6 hours ago, Gsusking2 said:

it to be a problem with the HBA.

It's possible, but first I would swap those disks with others in different slots and see if the problem follows the disks or stays with the slots.
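A minimal Python sketch of the logic behind that swap test. The function name classify_fault and the bay labels are hypothetical; it only compares which bays and which disks (by serial) logged errors before and after the swap.

```python
def classify_fault(slots_before, slots_after, bad_slots_before, bad_slots_after):
    """Decide whether errors follow the disks or stay with the slots.

    slots_before / slots_after map a slot (bay) label to the serial of the
    disk sitting in it; bad_slots_* are the bays that logged errors.
    Returns "slot", "disk" or "inconclusive".
    """
    bad_disks_before = {slots_before[s] for s in bad_slots_before}
    bad_disks_after = {slots_after[s] for s in bad_slots_after}
    same_slots = set(bad_slots_before) == set(bad_slots_after)
    same_disks = bad_disks_before == bad_disks_after
    if same_slots and not same_disks:
        return "slot"  # backplane, cabling or power feeding that bay
    if same_disks and not same_slots:
        return "disk"  # the drive itself
    return "inconclusive"  # errors moved around, or the swap changed nothing


# Example: disks A and B trade bays; bay1 still errors, now with disk B in it.
before = {"bay1": "A", "bay2": "B"}
after = {"bay1": "B", "bay2": "A"}
print(classify_fault(before, after, {"bay1"}, {"bay1"}))  # slot
```

An "inconclusive" result, where errors move around with neither the disks nor the bays, is a hint to look at something shared, such as power or cabling.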

5 hours ago, JorgeB said:

It's possible, but first I would swap those disks with others in different slots and see if the problem follows the disks or stays with the slots.

Yes, I have tried that; it seems to be the same problem. I have even changed my SAS expander, and some wires too.
Is it possible it's a power problem? I have 16 drives on one Molex line.

33 minutes ago, Gsusking2 said:

Is it possible it's a power problem? I have 16 drives on one Molex line.

That could definitely be a problem! Normally no more than about 4 drives per line would be recommended, and you want to avoid using too many splitters, as they are also often a contributor to power issues.
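A rough back-of-the-envelope check of that advice, as a Python sketch. Both current figures are assumptions for illustration only (about 2.5 A of 12 V spin-up current per 3.5" drive, and about 10 A as a comfortable limit for one Molex cable run); check your drives' datasheets and your PSU and cable ratings for real numbers.

```python
import math

# Assumed, illustrative values only; use your drives' datasheets and your PSU's
# cable ratings instead.
SPINUP_12V_AMPS_PER_DRIVE = 2.5  # assumed 12 V draw of one 3.5" HDD while spinning up
LINE_LIMIT_12V_AMPS = 10.0       # assumed comfortable 12 V limit for one Molex cable run


def peak_12v_draw(n_drives, per_drive=SPINUP_12V_AMPS_PER_DRIVE):
    """12 V current if all drives spin up at the same time."""
    return n_drives * per_drive


def max_drives_per_line(limit=LINE_LIMIT_12V_AMPS, per_drive=SPINUP_12V_AMPS_PER_DRIVE):
    """How many drives fit under the assumed per-line limit."""
    return math.floor(limit / per_drive)


print(peak_12v_draw(16))      # 40.0 (amps on a single line)
print(max_drives_per_line())  # 4
```

With these assumed numbers, 16 drives would ask a single line for roughly four times its budget, which is consistent with the "about 4" guidance above.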

  • 2 weeks later...
On 6/23/2021 at 2:23 AM, JorgeB said:

It's possible, but first I would swap those disks with others in different slots and see if the problem follows the disks or stays with the slots.

OK, so I did that, but it seemed to follow the slots in Unraid; then one would get better and others would fail.

It seems I lost a few disks. I replaced them, but I'm still having the same problems.

@itimpi I have attached the latest logs.

I lost 3-4 disks and my shares are unavailable. Some of the drives were formatted.

I think I totally effed myself.

Please give any advice. Or do you recommend I start over from scratch?

I would be sad to lose everything, but nothing was irreplaceable, just a big media collection.

No backups except for music.

tower-syslog-20210704-0327.zip

3 hours ago, Gsusking2 said:

Some of the drives were formatted.

Any data on the formatted drives would be lost. If you want more advice, for now please post the diagnostics: Tools -> Diagnostics (after array start).

3 hours ago, JorgeB said:

Any data on the formatted drives would be lost. If you want more advice, for now please post the diagnostics: Tools -> Diagnostics (after array start).

Thank you for the guidance.

Attached are the diagnostics.

When the parity rebuild starts, disks 1, 2, 7, 8, 9 and 13 start throwing millions of read errors, and disk 8 has an unmountable file system.
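To see how read errors are spread across the disks, a small Python sketch can tally them from the syslog in the diagnostics. It assumes kernel lines shaped like the "md: disk10 read error, sector=10128" entries quoted later in this thread; the function name read_errors_per_disk is hypothetical.

```python
import re
from collections import Counter

# Matches kernel lines such as "md: disk10 read error, sector=10128".
MD_READ_ERROR = re.compile(r"md: disk(\d+) read error, sector=(\d+)")


def read_errors_per_disk(lines):
    """Count md read-error lines per array disk number."""
    counts = Counter()
    for line in lines:
        match = MD_READ_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "Sep  2 07:38:55 tower kernel: md: disk10 read error, sector=10128",
    "Sep  2 07:38:55 tower kernel: md: disk8 read error, sector=12800",
]
print(read_errors_per_disk(sample).most_common())
```

For a real log, pass an open file instead of the sample list, e.g. read_errors_per_disk(open(path)), where path is the syslog extracted from the diagnostics zip. Errors spread over many disks at once, rather than concentrated on one, tend to point at something shared such as power, cabling or the backplane.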

 

tower-diagnostics-20210704-0620.zip

7 hours ago, Gsusking2 said:

but it seemed to follow the slots in unraid,

Do you mean the problem happens in certain slots? If yes, it suggests a backplane issue; it could also be a power issue, depending on how the backplane is powered.

 

Don't replace or format any more disks. Just try to do the parity sync after replacing the backplane/checking power; if there are still read errors on multiple disks, there's still a problem.

46 minutes ago, JorgeB said:

Do you mean the problem happens in certain slots? If yes, it suggests a backplane issue; it could also be a power issue, depending on how the backplane is powered.

 

Don't replace or format any more disks. Just try to do the parity sync after replacing the backplane/checking power; if there are still read errors on multiple disks, there's still a problem.

I mean I would change the disk and change the slot on my chassis.
Now that I look closer, it seems to be throwing errors indiscriminately.

I have 2 Molex lines coming from my 1300 W power supply; they are both used to power the fan wall and the 6 Molex connections on the backplane.
I am thinking about getting some SATA-to-Molex adapters to spread the power across another cable. I will buy new splitters and test with those. I also bought new SAS cables to test from the HBA to the backplane.

  • 3 weeks later...
  • 1 month later...
On 7/24/2021 at 5:15 AM, JorgeB said:

First see if this helps.

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
...Sorry, could not find valid secondary superblock
Exiting now.

 

I'm still unable to run xfs_repair on the 2 disks that have the red X on them.

 

What steps should or can I take from here?

1 hour ago, Gsusking2 said:

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
...Sorry, could not find valid secondary superblock
Exiting now.

 

I'm still unable to run xfs_repair on the 2 disks that have the red X on them.

 

What steps should or can I take from here?

Are you running xfs_repair from the GUI or the command line? If the command line, exactly what command are you using?

1 hour ago, itimpi said:

Are you running xfs_repair from the GUI or the command line? If the command line, exactly what command are you using?

I tried it from the GUI, though before that I had tried it from the command line.

Let me know a command to try and I will give it a shot.

 


You're still having multiple disk errors:

 

Sep  2 07:38:55 tower kernel: md: disk10 read error, sector=10128
Sep  2 07:38:55 tower kernel: md: disk8 read error, sector=12800

 

The disabled disks can't be correctly emulated with errors on additional disks.
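Why extra read errors break emulation can be illustrated with single parity, sketched here in Python. This assumes plain XOR parity across the data disks (the idea behind a single parity disk; a second parity disk uses a different calculation not shown here) and represents an unreadable block as None. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def emulate_missing(parity, surviving):
    """Rebuild one missing data block from parity plus every other data block.

    An unreadable block is passed as None. With single XOR parity there is only
    one equation, so a second unknown block makes the rebuild impossible.
    """
    if any(block is None for block in surviving):
        return None
    return xor_blocks([parity, *surviving])


d1, d2, d3 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
parity = xor_blocks([d1, d2, d3])
print(emulate_missing(parity, [d1, d3]) == d2)  # True: disk 2 emulated correctly
print(emulate_missing(parity, [d1, None]))      # None: disk 3 also unreadable
```

The same reasoning applies per sector: with N parity disks, at most N unknown blocks per stripe can be solved for, so read errors on other disks on top of the disabled ones leave those sectors unrecoverable.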

1 hour ago, JorgeB said:

You're still having multiple disk errors:

 

Sep  2 07:38:55 tower kernel: md: disk10 read error, sector=10128
Sep  2 07:38:55 tower kernel: md: disk8 read error, sector=12800

 

The disabled disks can't be correctly emulated with errors on additional disks.

OK, should I try a repair of disks 8 and 10?

I also have replacement drives on standby.

 

On 9/2/2021 at 12:12 PM, JorgeB said:

Those types of errors are usually caused by bad power or connections. What have you replaced so far?

OK, as of today I have replaced all the SAS cables and have tried different power supplies.

My problem has progressed. Since I changed out all the wiring, my chassis (Norco 4224) isn't recognizing a lot of disks.
I have swapped the HBA to test that.
I have swapped SAS expanders to check that.

I have hooked the HBA directly to each backplane to test.

I even have a spare chassis (Norco 4220), and still some drives don't show up.
 

Let me know if I should post diagnostics, though I can't start my array.

 

 

1 minute ago, JorgeB said:

If the drives still don't show up after replacing all that, make sure they are good by checking whether they are detected in a different computer.

Can I safely plug them into my Windows gaming PC?

Would I just check in the BIOS?

Sorry, I'm fairly new, but I do understand most concepts and instructions.

7 minutes ago, Gsusking2 said:

Can I safely plug them into my Windows gaming PC?

Would I just check in the BIOS?

Sorry, I'm fairly new, but I do understand most concepts and instructions.

Yes, Windows won't do anything with them as long as you don't format them. The BIOS should be enough to detect them.

On 9/7/2021 at 10:44 AM, trurl said:

Yes, Windows won't do anything with them as long as you don't format them. The BIOS should be enough to detect them.

OK, it seems the drives are no longer good. I think my backplane might have fried them. In this case, having lost so many disks at once, including a parity disk, is it just a start-over scenario? (No, I don't have a backup; this was mainly a media server with all replaceable content.)

 
