Extremely slow Parity-Sync


aurevo


Hello everybody,

 

This morning my UnRAID told me that a hard drive was missing. After restarting the machine I was able to reassign the hard drive, but the parity sync is extremely slow:

 

Total size: 8 TB    
Elapsed time: 7 minutes    
Current position: 237 MB (0.0 %)    
Estimated speed: 504.4 KB/sec    
Estimated finish: 183 days, 3 hours, 13 minutes
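The estimated finish is just the remaining bytes divided by the current speed. Below is a minimal Python sketch of that arithmetic. It assumes decimal units (1 KB = 1000 bytes) and takes the 8 TB disk size as 8001563222016 bytes, the capacity the kernel reports for this disk later in the thread. The function names are illustrative, not part of Unraid:

```python
# Assumed size of the 8 TB disks in this array (capacity from the kernel log), in bytes.
DISK_BYTES = 8_001_563_222_016


def eta_seconds(total_bytes: int, position_bytes: int, speed_bytes_per_s: float) -> float:
    """Seconds left if the current speed holds for the rest of the disk."""
    return (total_bytes - position_bytes) / speed_bytes_per_s


def format_duration(seconds: float) -> str:
    """Render seconds as 'D days, H hours, M minutes', similar to the status line."""
    days, rem = divmod(int(seconds), 86_400)
    hours, rem = divmod(rem, 3_600)
    return f"{days} days, {hours} hours, {rem // 60} minutes"


# Figures from the status above: 237 MB done at 504.4 KB/sec (decimal units assumed).
remaining = eta_seconds(DISK_BYTES, 237 * 10**6, 504.4 * 10**3)
print(format_duration(remaining))
```

This comes out at roughly 183 days. The hours differ from the webUI figure, probably because the speed shown is a moving estimate. At the roughly 100 MB/sec reached later in this thread, the same disk finishes in under a day, so a few hundred KB/sec is far outside normal.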

 

Is the hard disk possibly defective?

 

What else can I look up?


I don't want the rebuild to run at this speed.

5 minutes ago, aurevo said:

I just waited 50 minutes, is that normal?

Due to a design consideration, most things in the webUI that don't complete within 120 seconds will never *appear* to complete. Diagnostics is one of them, and processes that can run longer than 120 seconds are slowly being reconfigured to work slightly differently.

 

In the meantime, just go to a terminal and type in

diagnostics

The file will be saved to the logs folder on the flash drive.
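If you want to pick up the newest diagnostics file programmatically, here is a small Python sketch. It assumes the file names follow the `<server>-diagnostics-YYYYMMDD-HHMM.zip` pattern seen in the attachments in this thread; `newest_diagnostics` is a hypothetical helper, not an Unraid tool:

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Name pattern assumed from attachments such as tower-diagnostics-20190915-1244.zip
DIAG_NAME = re.compile(r"^.+-diagnostics-(\d{8}-\d{4})\.zip$")


def newest_diagnostics(logs_dir: str) -> Optional[Path]:
    """Return the most recent diagnostics zip in logs_dir, judged by the timestamp in its name."""
    newest: Optional[Path] = None
    newest_time: Optional[datetime] = None
    for path in Path(logs_dir).glob("*.zip"):
        match = DIAG_NAME.match(path.name)
        if match is None:
            continue  # not a diagnostics archive
        taken = datetime.strptime(match.group(1), "%Y%m%d-%H%M")
        if newest_time is None or taken > newest_time:
            newest, newest_time = path, taken
    return newest
```

Usage would be something like `newest_diagnostics("/boot/logs")`. The `/boot/logs` path is an assumption (the flash drive is normally mounted at `/boot` on Unraid), so check it on your own server.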

58 minutes ago, Squid said:

Due to a design consideration, most things in the webUI that don't complete within 120 seconds will never *appear* to complete. Diagnostics is one of them, and processes that can run longer than 120 seconds are slowly being reconfigured to work slightly differently.

 

In the meantime, just go to a terminal and type in


diagnostics

The file will be saved to the logs folder on the flash drive.

 

Okay, that's pretty weird.

 

Attached are the last two diagnostics.

 

Thanks in advance for the help.

tower-diagnostics-20190913-1411.zip tower-diagnostics-20190915-1244.zip

On 9/15/2019 at 11:24 PM, trurl said:

Looks like most of your problems are disk connection issues. Check power and SATA connections, both ends, all disks, then try again and post a new diagnostic.

 

So I ordered new cables for all drives and installed them.

While trying to replace the possibly defective 8TB drive with a 6TB drive, I clicked New Config.

Can I undo this step? If I start the array now, won't I lose parity and the data from that 8TB disk?

Parity was valid before I clicked New Config.

I think it's better to ask before taking the next step, so the server stays off until I hear from you.

1 hour ago, aurevo said:

While trying to replace the possibly defective 8TB drive with a 6TB drive,

You can't replace an existing array disk with a smaller one.

 

SMART looks OK for disk3, though what kind of disk is this:

 

Device Model:     OOS8000G
Serial Number:    00000000

If you really want to replace it, you can use the invalid slot command, but you need another 8TB (or larger) disk.
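The size rule comes from how a rebuild works: the full contents of the old disk are reconstructed onto the new one, so the new disk needs at least as much capacity. A tiny Python sketch of the check, with a hypothetical helper name and decimal terabytes:

```python
TB = 10**12  # drive makers quote capacities in decimal terabytes


def can_replace(old_disk_bytes: int, new_disk_bytes: int) -> bool:
    """A rebuild target must be at least as large as the disk it replaces."""
    return new_disk_bytes >= old_disk_bytes


print(can_replace(8 * TB, 6 * TB))  # False: a 6TB drive cannot take over an 8TB slot
print(can_replace(8 * TB, 8 * TB))  # True
```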

 

24 minutes ago, itimpi said:

I could not see an 8TB data disk in the diagnostics you posted! Which is the 8TB drive that has failed?

 

The first 8TB hard disk is Parity Disk 1; the second 8TB hard disk disappeared from the configuration during operation.
I reinstalled it and restarted the server, and it was there again, but the parity check/rebuild was as slow as described in my original post.

Then I replaced all the data and power cables and rebooted the server with the 6TB drive in place of the 8TB drive, because I thought I could replace it with a smaller one.

 

After that I created a new config.
This is the current state.

What is the best way to keep parity? I can reinstall the old 8TB HDD, but how do I get the old config back?

 

20 minutes ago, johnnie.black said:

You can't replace an existing array disk with a smaller one.

 

SMART looks OK for disk3, though what kind of disk is this:

 


Device Model:     OOS8000G
Serial Number:    00000000

If you really want to replace it, you can use the invalid slot command, but you need another 8TB (or larger) disk.

 

 

If it looks okay, I will try to rebuild parity with this disk enabled. But as mentioned above, how do I revert the New Config change?

1 hour ago, johnnie.black said:

I would recommend using the invalid slot command if you want to use a new disk. Rebuilding on top of the old one might be a mistake; instead, run an extended test on it, and if all is OK, re-sync parity.

 

I'm just not sure I understand you right now.

My approach would be: start with the new config and reassign the disks to the same slots as before. Then set the disk I want to remove to unassigned. Is that right?

But if I start the array at this point, it will automatically rebuild parity. That overwrites the parity data currently on the parity drive, and I lose the contents of the disk I removed earlier. So I lose both that data and the parity needed to restore the removed disk. Or am I seeing this wrong?


Assuming all disks are available, and disk3 is indeed OK and passes the extended SMART test, after the new config you just need to assign all disks as before and start the array to begin a parity sync. You can even trust parity and then run a correcting check.

 

If I misunderstood and one disk is missing, this isn't the way to go. In that case, use the invalid slot command, but you need a new disk of the same size to replace the missing one.
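"Trust parity, then run a correcting check" works because a correcting check recomputes parity from the data disks and only rewrites parity where it disagrees, counting each fix. Below is a toy Python sketch for single (XOR) parity with each "disk" as a short bytearray. The function name is hypothetical, and a real check works on whole sectors rather than single bytes:

```python
from functools import reduce
from typing import List


def correcting_check(data_disks: List[bytearray], parity: bytearray) -> int:
    """Recompute XOR parity position by position, fix any mismatch in place,
    and return how many positions were corrected."""
    corrected = 0
    for i in range(len(parity)):
        # Data disks shorter than parity are treated as zeros past their end.
        expected = reduce(lambda a, b: a ^ b, (d[i] if i < len(d) else 0 for d in data_disks), 0)
        if parity[i] != expected:
            parity[i] = expected
            corrected += 1
    return corrected


disks = [bytearray(b"\x01\x02"), bytearray(b"\x04\x08")]
parity = bytearray(b"\x05\x00")          # second byte is stale
print(correcting_check(disks, parity))   # 1
print(correcting_check(disks, parity))   # 0, parity now matches the data
```

A count of 0 at the end of a full check would mean the trusted parity already matched the data.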

15 minutes ago, johnnie.black said:

Assuming all disks are available, and disk3 is indeed OK and passes the extended SMART test, after the new config you just need to assign all disks as before and start the array to begin a parity sync. You can even trust parity and then run a correcting check.

 

If I misunderstood and one disk is missing, this isn't the way to go. In that case, use the invalid slot command, but you need a new disk of the same size to replace the missing one.

 

Okay, so first I do the extended SMART test, and if it's okay, I'll assign the disks as before.

Will the parity data then be used to restore the data on disk 3, or will parity be overwritten, as the message suggests (something about starting the array overwriting data)?

 

The following hard disks are in the system:

8TB (Parity 1)
3TB
3TB
8TB (this disk disconnected during operation)
4TB
2TB
6TB
6TB
6TB

 

After reinstalling the disk, I restarted the parity process, and that is when it was so slow.

At the moment the disk is removed, but I could reinstall it. What do I do now to recover the data that was on that disk? I have a new config, but before the new config, parity should still have been intact.
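For the disk layout listed above, the single-parity rule is that the parity disk must be at least as large as the largest data disk, and the usable space is the sum of the data disks. A quick Python sketch in decimal TB (the helper name is hypothetical):

```python
from typing import List

PARITY_TB = 8
DATA_TB = [3, 3, 8, 4, 2, 6, 6, 6]  # data disks from the list above


def usable_capacity(parity_tb: int, data_tb: List[int]) -> int:
    """Return total data capacity, refusing layouts where a data disk outgrows parity."""
    largest = max(data_tb)
    if largest > parity_tb:
        raise ValueError(f"{largest} TB data disk is larger than the {parity_tb} TB parity disk")
    return sum(data_tb)


print(usable_capacity(PARITY_TB, DATA_TB))  # 38
```

With an 8TB parity disk, a replacement data disk in this array is therefore limited to 8TB unless parity is upgraded first.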

38 minutes ago, aurevo said:

Okay, so first I do the extended SMART test, and if it's okay, I'll assign the disks as before.

Correct. If the SMART test is successful, assign all disks as before, check "parity is already valid" and start the array. If all disks mount correctly, run a correcting parity check; if not, post new diags.

2 hours ago, johnnie.black said:

Correct. If the SMART test is successful, assign all disks as before, check "parity is already valid" and start the array. If all disks mount correctly, run a correcting parity check; if not, post new diags.

 

Just to be sure: the disk assignment must be exactly the same as before, and I must not add another disk until parity is restored, right?

Does it matter that the device names have changed (for example from sdc to sdf)? I changed ports and cables yesterday.

3 hours ago, johnnie.black said:

Yes, use only the same disks, though data disk order is not important with single parity.

 

Which SATA port each disk is on doesn't matter.
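Disk order does not matter with single parity because parity is a byte-by-byte XOR across the data disks, and XOR gives the same result in any order. (With dual parity, the second parity disk uses a different calculation in which slot position matters.) Below is a toy Python sketch with three 3-byte "disks"; `xor_blocks` is a hypothetical helper:

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(disks)

# Same parity whatever order the data disks are assigned in.
print(xor_blocks(disks[::-1]) == parity)  # True

# A missing disk is rebuilt by XOR-ing parity with every surviving disk.
print(xor_blocks([parity, disks[0], disks[2]]) == disks[1])  # True
```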

 

I just did what you suggested, and the process started normally.

 

After a few minutes the page refreshed (I don't know why), the array was stopped again, and the message about the missing encryption key came up.

I re-entered the password and restarted the array with the "parity correction" option. Now it's running again, but it started over from the beginning.

 

Speed shortly after start:

 

Total size: 8 TB    
Elapsed time: 1 minute    
Current position: 9.40 GB (0.1 %)    
Estimated speed: 103.9 MB/sec    
Estimated finish: 21 hours, 22 minutes    
Sync errors corrected: 0

  • 4 weeks later...

Hello,

 

the problems didn't come back for a long time.

 

Today I couldn't use one of my VMs, so I wanted to restart the server.

 

I shut it down cleanly via Power down and didn't disconnect the power.

 

After the reboot Disk 3 (OOS8000G_00000000 - 8 TB (sdk)) was only emulated.

Following your last tips, I had already replaced all cables and checked the power plugs.

 

If I see it correctly, the hard disk is still recognized by the system, isn't it?

 

Attached are the diagnostics from directly after the restart.

tower-diagnostics-20191011-2118.zip

57 minutes ago, aurevo said:

If I see it correctly, the hard disk is still recognized by the system, isn't it?

 

It was recognised but it has dropped offline. Is it connected to a SATA port multiplier?

Oct 11 21:18:23 Tower kernel: ata9: softreset failed (1st FIS failed)
Oct 11 21:18:23 Tower kernel: ata9: limiting SATA link speed to 3.0 Gbps
Oct 11 21:18:28 Tower kernel: ata9: softreset failed (device not ready)
Oct 11 21:18:28 Tower kernel: ata9: reset failed, giving up
Oct 11 21:18:28 Tower kernel: ata9.00: disabled
Oct 11 21:18:28 Tower kernel: sdk: detected capacity change from 8001563222016 to 0
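When reading the syslog in diagnostics, the telltale sequence is the one shown above: repeated resets on one ATA port, "giving up", the device disabled, and the block device's capacity dropping to 0. Below is a rough Python filter for those lines. It assumes the exact message wording shown here (other kernel versions may phrase it differently), the function names are hypothetical, and matching `ata9` to `sdk` still has to be done from the rest of the log:

```python
import re
from typing import Dict, List

# Message texts copied from the log lines above.
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.+)$")
GONE_DISK = re.compile(r"kernel: (sd[a-z]+): detected capacity change from \d+ to 0$")


def failed_ata_links(log_text: str) -> List[str]:
    """ATA ports whose link reset gave up or whose device was disabled."""
    messages: Dict[str, List[str]] = {}
    for line in log_text.splitlines():
        match = ATA_LINE.search(line)
        if match:
            messages.setdefault(match.group(1), []).append(match.group(2))
    return [port for port, msgs in messages.items()
            if any(m == "disabled" or m.startswith("reset failed") for m in msgs)]


def vanished_disks(log_text: str) -> List[str]:
    """Block devices whose reported capacity dropped to 0."""
    return [m.group(1) for m in map(GONE_DISK.search, log_text.splitlines()) if m]


LOG = """\
Oct 11 21:18:23 Tower kernel: ata9: softreset failed (1st FIS failed)
Oct 11 21:18:23 Tower kernel: ata9: limiting SATA link speed to 3.0 Gbps
Oct 11 21:18:28 Tower kernel: ata9: softreset failed (device not ready)
Oct 11 21:18:28 Tower kernel: ata9: reset failed, giving up
Oct 11 21:18:28 Tower kernel: ata9.00: disabled
Oct 11 21:18:28 Tower kernel: sdk: detected capacity change from 8001563222016 to 0
"""

print(failed_ata_links(LOG))  # ['ata9']
print(vanished_disks(LOG))    # ['sdk']
```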

 

13 hours ago, John_M said:

 

It was recognised but it has dropped offline. Is it connected to a SATA port multiplier?


Oct 11 21:18:23 Tower kernel: ata9: softreset failed (1st FIS failed)
Oct 11 21:18:23 Tower kernel: ata9: limiting SATA link speed to 3.0 Gbps
Oct 11 21:18:28 Tower kernel: ata9: softreset failed (device not ready)
Oct 11 21:18:28 Tower kernel: ata9: reset failed, giving up
Oct 11 21:18:28 Tower kernel: ata9.00: disabled
Oct 11 21:18:28 Tower kernel: sdk: detected capacity change from 8001563222016 to 0

 

 

Yes, it is connected to a port multiplier: a card with a Marvell 88SE9215 controller and an ASMedia ASM1062.

 

2 hours ago, johnnie.black said:

You should avoid Marvell controllers; Marvell plus a SATA port multiplier is a double no-no.

 

Is this information documented anywhere on this forum?

And which controller should I use instead? Then I would replace it.

It's hard to find a controller without Marvell in Germany.

43 minutes ago, aurevo said:

And which controller should I use instead? Then I would replace it.

Any LSI with a SAS2008/2308/3008/3408 chipset in IT mode, e.g., 9201-8i, 9211-8i, 9207-8i, 9300-8i, 9400-8i, etc., and clones like the Dell H200/H310 and IBM M1015; the latter need to be crossflashed.

1 hour ago, johnnie.black said:

Any LSI with a SAS2008/2308/3008/3408 chipset in IT mode, e.g., 9201-8i, 9211-8i, 9207-8i, 9300-8i, 9400-8i, etc., and clones like the Dell H200/H310 and IBM M1015; the latter need to be crossflashed.

Then I will try to buy one of the controllers and send the other one back.

I had a look, and the controllers are available at a good price, though probably only used.

But how is it that only one hard disk was affected each time? Three times it was exactly the same disk, and in between it worked for a month.
