[SOLVED] Data rebuild drops from 80MB/sec to 500KB/sec



Hi everyone, I recently did a parity upgrade and used the old parity drive to replace a dead drive that had been emulated for a few weeks. I ran a parity check before the parity sync and it passed. So I moved forward with replacing the dead drive with the old parity drive. The data rebuild was going well (around 80MB/sec), but when it hit 40% it dropped to 500KB/sec. I let it run for a few hours and it was still moving at that rate. So I rebooted the server, which restarted the data rebuild, and this time it hit 50% and dropped to 500KB/sec again. I have uploaded diagnostics, as I would like to know what is going on and fix this issue so I can get my server back up and running...

 

The order in which I did things:
1. Upgraded one of the parity disks from 10TB to 12TB

2. Replaced an emulated 8TB drive with the old 10TB parity drive

3. Started the parity sync (no Docker or VM running), and it completed successfully

4. Started the data rebuild (1st time); it ran at 80MB/sec until 40%, then dropped to 500KB/sec
5. Rebooted the server and started the data rebuild again (2nd time); it ran at 80MB/sec until 50%, then dropped to 500KB/sec

 

The data rebuild is still running as I post this; please help!

 

UPDATE:
It's currently at 50.3% and rebuilding at 100KB/sec. I'm still hoping it will speed up; otherwise I'll be forced to copy the data off the emulated drive and start a new configuration.
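To put these rates in perspective, here is a rough back-of-the-envelope estimate of the time remaining at each speed mentioned in this thread. It is only a sketch: it assumes, hypothetically, about 5 TB (decimal) still left to rebuild, roughly half of a 10TB disk, and a constant rate. The real remaining amount depends on the disk sizes involved.

```python
def eta_hours(remaining_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to process remaining_bytes at a constant rate."""
    return remaining_bytes / rate_bytes_per_s / 3600


# Hypothetical assumption: roughly half of a 10 TB disk (decimal units) left.
REMAINING_BYTES = 5e12

# Rates quoted in this thread, converted to bytes per second (decimal units).
RATES = {"80 MB/s": 80e6, "500 KB/s": 500e3, "100 KB/s": 100e3}

if __name__ == "__main__":
    for label, rate in RATES.items():
        hours = eta_hours(REMAINING_BYTES, rate)
        print(f"{label:>8}: {hours:9.1f} h ({hours / 24:6.1f} days)")
```

Under that assumption, the remainder takes well under a day at 80MB/sec but close to four months at 500KB/sec, so a rate that stays this low behaves like a stall rather than just a slow disk.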

 

UPDATE 2:

With the help of @Vr2Io, I managed to get my Unraid server up and running by resetting the configuration for the parity and data disks to force a parity sync from scratch, and everything worked. It's likely my parity had some corruption, either during the parity copy (from 10TB to 12TB) or before it (unlikely, as I ran a parity check right before I took the server offline to upgrade the drives).

zoo-diagnostics-20201210-1552.zip

Edited by jamesy829
Link to comment
14 hours ago, JorgeB said:

According to the diags it's completely stuck at the moment, but I can't see the reason for it. Reboot in safe mode, start the array in maintenance mode and start over. Also, there is some log spam from CA Backup and an unassigned disk; see if you can fix that.

Hi @JorgeB, thank you for helping me out! I was busy at work, so I didn't get a chance to respond. I started in safe mode and maintenance mode and am currently running the data rebuild (20%), so I will report back when it either passes or gets stuck (I hope not...)!

Link to comment
On 12/11/2020 at 2:55 AM, JorgeB said:

According to the diags it's completely stuck at the moment, but I can't see the reason for it. Reboot in safe mode, start the array in maintenance mode and start over. Also, there is some log spam from CA Backup and an unassigned disk; see if you can fix that.

Hi @JorgeB, bad news... the data rebuild ran in safe mode for a day at 80-100MB/s, and all of a sudden it dropped to 750KB/s... Attached are the diagnostic logs if you can help, thanks!

zoo-diagnostics-20201212-1132.zip

Link to comment
1 minute ago, Vr2Io said:

Not sure, but it is abnormal. Like JorgeB said, something (a process) is in a stalled state.

Yeah... it is very frustrating, ha... It's been 1 week of downtime. I got two 12TB drives to replace the parity drives and used the old parity drives as replacements for the emulated drive and a small drive... Fortunately I was able to back up everything on that emulated drive. Can you suggest the steps to "start fresh", where I just replace that emulated drive without a data rebuild and copy the data back once my system is back up?

Link to comment
14 minutes ago, jamesy829 said:

Can you suggest the steps to "start fresh", where I just replace that emulated drive without a data rebuild and copy the data back once my system is back up?

The correct way is to fix the fundamental problem first. You don't need to worry about the emulated disk if you already have a backup.

 

Since we can't identify whether hardware or software is causing the problem (more likely a hardware issue), please perform a memory test.

 

If no hardware issue is found, I would forget that emulated disk (since you have a backup), use Unraid's built-in backup feature to back up the USB stick, then start everything fresh:

 

- Assign the data disks

- Start the array and check that the shares are normal

- Assign the parity disk, let it sync, and check whether the same problem reproduces

 

Are there any USB disks in the array?

Edited by Vr2Io
Link to comment
Wow, thanks for these instructions. I'll list what I have done.

 

Tried to identify a hardware or software issue:

* I switched to a new motherboard/CPU combo 2 months ago. It ran fine for a month, and then one of the drives died. I left it emulated for a few weeks while I waited for my new drives to arrive.

* I did a full memory test before moving to the new system, and everything passed over 48 hours.

 

I did not know about the Unraid backup feature; I will need to look into it. But wouldn't the backup plugin do this automatically?

 

By "start fresh", do you mean resetting the config for parity and the array and reassigning the drives?

 

No USB disk is attached other than the Unraid USB stick itself.

 

Link to comment
15 minutes ago, jamesy829 said:

I did not know about the Unraid backup feature; I will need to look into it. But wouldn't the backup plugin do this automatically?

The built-in USB backup feature is very easy to use and straightforward; it can restore the image to the USB stick with a simple click.

 

15 minutes ago, jamesy829 said:

By "start fresh", do you mean resetting the config for parity and the array and reassigning the drives?

Yes. Once you identify the root cause, you can restore the USB stick backup image to return to the current state.

 

*** But once the parity disk has been modified, you can't rebuild the data disk anymore, even if you restore the USB backup ***

Edited by Vr2Io
Link to comment
Thanks @Vr2Io, I will reset the config for parity and the array. I have a feeling it's also the plugins that caused some irreversible changes to my disks, as I had quite a few plugins and I'm not sure what they all did. One of them was the File Integrity plugin; I can't remember the other one, but I saw 2 plugins flagged in red in the logs from the last data rebuild.

Link to comment

Or you could leave the parity disk unassigned (don't touch the parity disk) in the fresh setup, with just the data disks, then perform a read check of all disks to see whether the same problem happens this time.

 

9 minutes ago, jamesy829 said:

1 of them was the file integrity plugin

Probably not a plugin issue, because the same problem also happens in safe mode.

Edited by Vr2Io
Link to comment
2 minutes ago, Vr2Io said:

Or you could leave the parity disk unassigned (don't touch the parity disk) in the fresh setup, with just the data disks, then perform a check of all disks to see whether the same problem happens.

 

Probably not a plugin issue, because the same problem also happens in safe mode.

Good idea, I won't assign the parity disks (1x10TB and 1x12TB) in the fresh setup.

 

These are the kernel logs from when I clicked pause and tried to reboot the server.

20201212_132712.jpg

Edited by jamesy829
Link to comment
47 minutes ago, Vr2Io said:

A call trace always indicates something is wrong; you need to dig out the cause.

 

Good news: I set up the same mainboard, an EVGA X299 FTWK (i7-9800X), just a month ago, and have had no call traces or major problems. The BIOS was 1.24.

Oh, I didn't notice you modified the message. Yeah, the board is sick, and I'm running the 10980XE, but maybe I should update the BIOS (which I need to figure out how to do). I just did a new config with no parity and it started up fine. But I'm seeing weird kernel errors (could it be that the old Unraid Nvidia kernel I'm using caused the system to error?). And one of the errors I saw was from the backup plugin, which is in the screenshot.

20201212_141603.jpg

Edited by jamesy829
Link to comment

Those errors don't seem critical.

 

In fact, I have no idea what causes the problem. As you mentioned, this setup had no problems in the first two months, so you need to use trial and error with different methods. BTW, keeping the config simple and basic should be the first step. I think updating the BIOS can come later, because it could make things worse.

 

I started on 6.9 beta 30, then beta 35, and am currently on RC1.

Edited by Vr2Io
Link to comment
Ah ok, I will use trial and error. What do you mean by simple and basic? And maybe I should upgrade to RC1, as I'm still on 6.8.3.

Link to comment
15 hours ago, Vr2Io said:

Back up the USB stick, then recreate it and start fresh, and as a last step troubleshoot in the hardware direction.

Hey @Vr2Io, just providing some updates: after doing the no-parity setup, starting the array and using it for a few hours, there were no issues, so I decided to reboot and do the process again. Unfortunately it still got stuck, this time at 66%, so I decided to reset both the parity and the disk array and let parity rebuild instead. Knock on wood, it's going well at 66% as we speak, and parity will be done in another 13 hours... Will update as things progress, thanks!

Link to comment
22 hours ago, Vr2Io said:

recreate start in fresh

Please note that fresh means not restoring the image, just a basic Unraid OS. If this doesn't help, you need to try isolating some hardware, i.e. remove one set of data disks and its HBA, or even both, then try some disks on the onboard SATA. You need to reach a point where everything is back to normal.

Link to comment
