
Too many wrong and/or missing disks! - Have I lost data ?


mikamap
Solved by JorgeB


Hi, 

 

On my Unraid server, I currently have 2 parity drives (6TB and 4TB) and 7 disks (3TB or 4TB).

Yesterday I was getting SMART errors from a disk in my array.

 

So I ordered 2 new 4TB Seagate IronWolf drives.

 

I replaced the drive with one of the new ones. Everything seemed to work fine. Then the parity sync was halted and 2 other drives started reporting errors (1 parity drive and 1 other data drive).

 

The 2 drives with errors were disabled and the one I tried to replace was "Emulated".

 

My mistake, I think, was to unassign the "disabled" drives and start the array. Now I cannot start it again, since it is missing 3 disks.

 

 

What I'm thinking:
1- Put the drive with the errors back in the array (since the data should still be on it)

2- Do a "New config", since the 3 drives should have the data on them.

 

Edit: I also plan to change the PSU and all the SATA cables, as I have read that some of these errors (CRC) are caused by bad SATA cables or fluctuating power.
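
For anyone wanting to check the CRC count mentioned above: SMART attribute 199 (UDMA_CRC_Error_Count) is the counter usually associated with cable or link problems, and it appears in the attribute table printed by `smartctl -A`. Here is a minimal Python sketch that pulls the raw value out of that table. The function name and the sample text are made up for illustration; the sample is not taken from the diagnostics in this thread, and some drives lay out the table differently.

```python
CRC_ATTRIBUTE_ID = "199"  # UDMA_CRC_Error_Count in the SMART attribute table


def crc_error_count(smartctl_output):
    """Return the raw UDMA CRC error count from `smartctl -A` text, or None.

    Hypothetical helper: assumes the usual 10-column attribute table
    (ID#, name, flag, value, worst, thresh, type, updated, when_failed, raw).
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0] == CRC_ATTRIBUTE_ID:
            raw = fields[9]
            return int(raw) if raw.isdigit() else None
    return None


# Illustrative sample only, NOT real output from the server in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       37
"""

if __name__ == "__main__":
    print(crc_error_count(SAMPLE))
```

The counter does not reset once the cable is fixed, so what matters is whether it keeps rising after the cables or PSU are swapped, not whether it is non-zero.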

 

Is there a good chance it will work?

 

Should I first try to back up the data I currently have to an external drive before doing this?

 

 

Thank you.

Link to comment

The problem was caused by issues with the onboard SATA controller. This is quite common with some Ryzen boards, especially under load. A BIOS update might help, or you can use an add-on controller.

 

As for the current situation, first you need to reboot to clear the controller problem. Then you can re-enable the disabled disks and try to rebuild again, though the same thing might happen if the controller errors out again. We assume the disabled disks are OK, but since there's no SMART report for any of them, it's best to reboot now and post new diags. Also keep the old disk intact for now in case it's needed.

Link to comment

Here are the new diagnostics.

 

As I said in the first post, I tried to reassign the disabled drives, so I started the array without them. Now 3 of them show as "new drive".

 

 

I also forgot to mention in the first post that I currently have 6 disks plugged into the motherboard SATA ports, while the other 3 and the cache are already plugged into a SATA controller.

 

If it's a better solution to plug all of them into SATA controllers, I'll purchase more.

 

 

So the plan:

- Update the motherboard driver

- Add a SATA controller for the drives

 

 

As for the 3 drives showing as new, from what I have read, I need to do a "New config", right?

tower-diagnostics-20220222-0743.zip

Link to comment
  • Solution

Old disk4 does show some recent issues; the other ones look fine. I would suggest keeping old disk4 intact for now and reconnecting the new one, then:

 

-Tools -> New Config -> Retain current configuration: All -> Apply
-Check all assignments and assign any missing disk(s) if needed, including the new disk4, replacement disk should be same size or larger than the old one
-IMPORTANT - Check both "parity is already valid" and "maintenance mode" and start the array (note that the GUI will still show that data on parity disk(s) will be overwritten, this is normal as it doesn't account for the checkbox, but it won't be as long as it's checked)
-Stop array
-Unassign disk4
-Start array (in normal mode now); ideally the emulated disk will now mount and its contents will look correct. If it doesn't mount, you should run a filesystem check on the emulated disk
-If the emulated disk mounts and contents look correct stop the array
-Re-assign disk4 and start array to begin the rebuild.
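
To illustrate why the steps above insist on "parity is already valid" before unassigning disk4: the emulated disk is computed from parity plus the remaining data disks, so it is only correct if parity was not overwritten. The following Python sketch is a simplified model of single XOR parity only; it does not model the second parity drive, which uses a different calculation, and all names and byte values are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def emulate_missing(parity, remaining_disks):
    """Reconstruct the one missing disk from parity and all other data disks."""
    return xor_blocks([parity, *remaining_disks])


# Toy array: three tiny "disks" (hypothetical contents).
disks = {
    "disk1": b"\x10\x20\x30",
    "disk2": b"\x0f\x0e\x0d",
    "disk4": b"\xaa\xbb\xcc",
}

# Valid parity is the XOR of every data disk.
parity = xor_blocks(list(disks.values()))

# Unassign disk4: its contents are emulated from parity and the other disks.
remaining = [data for name, data in disks.items() if name != "disk4"]
emulated = emulate_missing(parity, remaining)
assert emulated == disks["disk4"]

# If parity had been rebuilt without disk4, the emulated contents would be wrong.
stale_parity = xor_blocks(remaining)
assert emulate_missing(stale_parity, remaining) != disks["disk4"]
```

In this model, a rebuild onto the new disk4 simply writes the emulated contents to it, which is why checking that the emulated disk mounts and looks correct comes before re-assigning the disk.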

 

Link to comment

Everything worked great.

The parity-Sync / Data-rebuild started.  Just need to wait a few hours now.

 

I just saw that my VM didn't start and I get the message "Libvirt Service failed to start" on this tab, but I'll check back after the process is finished.

 

Thank you.

Link to comment
Just now, mikamap said:

Not sure if everything will be ok after.

It won't. I would need to see the diags to confirm, but as mentioned above it's most likely the same controller issue; it tends to be worse under load, like during a parity check or rebuild.

 

You can cancel the current rebuild and try again later, ideally without using the onboard controller.

 

 

Link to comment
