Parity rebuild crashes during data disk upgrade (with log)


Hi, I was in the process of replacing/upgrading a data disk (from 3TB to 12TB; my one parity drive is also 12TB). I've done this 3 or 4 times before and never had a problem, but now the parity rebuild crashes around the 2% mark. I have tried 3 times and it crashed all 3 times: I lost the connection and was forced to reboot. I even took the case apart and blew the dust out of the interior.

 

My PSU is a Seasonic 80+ Gold 550W: https://www.newegg.com/seasonic-focus-plus-550-gold-ssr-550fx-550w/p/N82E16817151189?Item=N82E16817151189

UPS monitor shows my PSU load during parity rebuild is 111W.

 

My system did a scheduled parity check a week ago. I remember the process was "interrupted" (a first for me in about 6 months of using Unraid), but I was able to resume, so it took 2 sessions to complete that scheduled parity check. I didn't think much about it and didn't think to save the syslog (not that I'd know what to do with it anyway).

 

Attached is the latest syslog. One noteworthy line is: tsc: Fast TSC calibration failed

 

Any help would be much appreciated.

SanDisk_Ultra_Fit_04014e0f09611353daac5735697dfc78bf564f546e637cf62b923c8e5a2b5f0d9cf700000000000000000000c07313f7ff1d1b18835581076c26c0d6-0-0-20200324-1104 flash (sda).txt

Edited by trurl
lots of individual attachments instead of the diagnostics zip
Link to comment
36 minutes ago, superfans124 said:

My system did a scheduled parity check a week ago. I remember the process was "interrupted" (a first for me in about 6 months of using Unraid), but I was able to resume, so it took 2 sessions to complete that scheduled parity check

Did that parity check complete without any errors? You don't want to rebuild a data disk unless your parity was completely valid.

 

If you don't know for sure there were zero parity errors on that last check, you can see by going to Main - Array Operation and click on History.

 

Do you have any SMART warnings on the Dashboard for any of your disks?

 

Syslog is after reboot, of course, so we can't see what happened. Setup Syslog Server so you can get syslog saved somewhere so you can post it if it happens again.
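For anyone who cannot get a ready-made syslog server running, here is a minimal sketch of a receiver that runs on another machine and appends every incoming message to a file, so the log survives a crash of the server. It assumes the server is configured to send its syslog over UDP to this machine; the file name `unraid-syslog.log` and the helper names are made up for illustration, and port 514 is only the conventional syslog port.

```python
import socketserver
from pathlib import Path

# Hypothetical output file on the receiving machine.
LOG_PATH = Path("unraid-syslog.log")


def append_message(raw: bytes, source_ip: str, log_path: Path = LOG_PATH) -> str:
    """Decode one syslog datagram and append it to the log file."""
    text = raw.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    line = f"{source_ip} {text}"
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self) -> None:
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        append_message(data, self.client_address[0])


def serve(host: str = "0.0.0.0", port: int = 514) -> None:
    """Listen for syslog datagrams until interrupted.

    Binding to a port below 1024 usually needs administrator rights;
    a higher port works too if the sender is configured to match.
    """
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the listener. Because each message is written as soon as it arrives, the last lines before a hard lockup should still be in the file.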

 

 

Link to comment
21 minutes ago, trurl said:

Did that parity check complete without any errors? You don't want to rebuild a data disk unless your parity was completely valid.

 

If you don't know for sure there were zero parity errors on that last check, you can see by going to Main - Array Operation and click on History.

 

Do you have any SMART warnings on the Dashboard for any of your disks?

 

Syslog is after reboot, of course, so we can't see what happened. Setup Syslog Server so you can get syslog saved somewhere so you can post it if it happens again.

 

 

Thanks, I'm setting up Kiwi Syslog Server following your instructions, let's see...

 

There were 300-ish errors during the last parity check, though I have always had auto-correction of parity errors enabled. Not sure if any of this matters.

Link to comment
2 hours ago, trurl said:

Did that parity check complete without any errors? You don't want to rebuild a data disk unless your parity was completely valid.

 

If you don't know for sure there were zero parity errors on that last check, you can see by going to Main - Array Operation and click on History.

 

Do you have any SMART warnings on the Dashboard for any of your disks?

 

Syslog is after reboot, of course, so we can't see what happened. Setup Syslog Server so you can get syslog saved somewhere so you can post it if it happens again.

 

 

Does this look like anything? I'm not sure.

 

I tried unsuccessfully to set up a remote syslog server and resorted to storing the syslog locally on the flash drive. The attachment is the only file on the flash (after the latest crash) that looks remotely like a syslog file, though it wasn't even in .txt format; I added the extension to read it.

 

I'm not sure I'm even capturing the syslog correctly; suggestions would be appreciated.

syslog.txt

Link to comment
2 hours ago, superfans124 said:

There were 300-ish errors during the last parity check, though I have always had auto-correction of parity errors enabled. Not sure if any of this matters.

There should always be exactly zero parity errors. If that is not the case you need to consider why. And if you do correct parity errors, you should run another parity check to make sure you have exactly zero parity errors as a result.

 

How can you expect to rebuild a data disk if parity isn't valid?
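To illustrate why this matters, here is a toy sketch, assuming single parity behaves like a byte-wise XOR across the data disks. The three tiny "disks" and their contents are made up; the point is only that a rebuilt disk is computed from parity plus the remaining disks, so any wrong parity byte ends up as a wrong byte on the rebuilt disk.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data disks" (made-up contents) and the parity computed from them.
disk1, disk2, disk3 = b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"
parity = xor_blocks([disk1, disk2, disk3])

# With valid parity, a missing disk2 is recovered exactly.
rebuilt = xor_blocks([disk1, disk3, parity])

# With one wrong parity byte, the same byte of the rebuilt disk is wrong.
bad_parity = bytes([parity[0] ^ 0xFF]) + parity[1:]
rebuilt_bad = xor_blocks([disk1, disk3, bad_parity])

print(rebuilt == disk2)      # valid parity: rebuild matches the original
print(rebuilt_bad == disk2)  # invalid parity: rebuild silently differs
```

Nothing in the rebuild itself can tell that the result is wrong, which is why a check with exactly zero errors beforehand is the only assurance.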

 

After you get your other problems fixed, be sure to save that original disk in case the rebuild has problems.

 

Nothing obvious in that syslog.
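For a first pass over a long syslog, a rough sketch like the following can list the lines worth a closer look. The keyword list is an arbitrary assumption, not an official set of Unraid or kernel messages, and the two sample lines are fabricated apart from the "Fast TSC calibration failed" text quoted earlier in this thread. A match is only a pointer for further reading, not a diagnosis.

```python
import re
from typing import Iterable, Iterator

# Hypothetical keyword list; extend or trim it to taste.
SUSPICIOUS = re.compile(r"error|fail|call trace|machine check|corrupt", re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> Iterator[tuple[int, str]]:
    """Yield (line number, text) for every line matching a keyword."""
    for number, line in enumerate(lines, start=1):
        if SUSPICIOUS.search(line):
            yield number, line.rstrip("\n")


# Fabricated sample lines; only the message text of the first is from this thread.
sample = [
    "Mar 24 11:04:01 Tower kernel: tsc: Fast TSC calibration failed\n",
    "Mar 24 11:04:02 Tower kernel: example line with nothing unusual\n",
]

for number, text in suspicious_lines(sample):
    print(number, text)

# To scan a saved file instead, pass an open file object:
#   with open("syslog.txt", encoding="utf-8", errors="replace") as fh:
#       for number, text in suspicious_lines(fh):
#           print(number, text)
```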

 

Have you done memtest?

 

Have you done these Ryzen tweaks?

 

 

 

Link to comment
23 hours ago, trurl said:

There should always be exactly zero parity errors. If that is not the case you need to consider why. And if you do correct parity errors, you should run another parity check to make sure you have exactly zero parity errors as a result.

 

How can you expect to rebuild a data disk if parity isn't valid?

 

After you get your other problems fixed, be sure to save that original disk in case the rebuild has problems.

 

Nothing obvious in that syslog.

 

Have you done memtest?

 

Have you done these Ryzen tweaks?

 

 

 

I just completed 24 hours of memtest on 24GB (in the awkward 3x8GB config): 7 passes, no errors. The first parity rebuild afterwards still stops at the 2% mark.

 

What do you mean by getting the problems fixed (the prior errors?) and by saving the original disk? I tried going back to the original 3TB drive (12TB out), but Unraid won't allow it; my build is now locked to the 12TB drive even though a parity rebuild never completed successfully.

 

I guess as a last resort I'll just delete my build profile and start anew; I'm fine with that. But any more pointers to avoid that would be great.

Link to comment
51 minutes ago, superfans124 said:

What do you mean by getting the problems fixed (the prior errors?) and by saving the original disk? I tried going back to the original 3TB drive (12TB out), but Unraid won't allow it; my build is now locked to the 12TB drive even though a parity rebuild never completed successfully.

That original disk can be read all by itself; just like any data disk in Unraid, it is an independent filesystem. It can be mounted with Unassigned Devices, for example, and files can be read from it.

 

What about this question?

On 3/24/2020 at 2:59 PM, trurl said:

Have you done these Ryzen tweaks?

 

Link to comment
9 minutes ago, trurl said:

That original disk can be read all by itself; just like any data disk in Unraid, it is an independent filesystem. It can be mounted with Unassigned Devices, for example, and files can be read from it.

 

What about this question?

 

Non-overclocked RAM with a 2nd-gen Ryzen 3, though the 3x8GB sticks are not all the same speed. I can't find the email receipts for them, but I believe the first 8GB DDR4 stick is 2133 MHz and the 2 later 8GB DDR4 sticks, bought as a pair when cheap, are 2666 MHz (not certain of either figure). All I can say for certain at this point is that they are rated at different clocks, but the system has been running stable at the lower clock speed for a couple of months now. Multiple prior parity checks finished with zero errors before the latest one had 300 errors, all with the current 3x8GB RAM config.

 

I'm not sure what to do with the mounted unassigned device, with my parity in limbo right now. More instructions please, thanks.

Link to comment

The reason I was talking about hanging on to the original is if for some reason the rebuild to the new disk didn't work out.

 

You could try New Config with that original disk and check the box to tell it parity is already valid, then see if you can complete a parity check.

 

Link to comment
On 3/25/2020 at 8:52 PM, trurl said:

The reason I was talking about hanging on to the original is if for some reason the rebuild to the new disk didn't work out.

 

You could try New Config with that original disk and check the box to tell it parity is already valid, then see if you can complete a parity check.

 

Thanks for your help. I think I've finally narrowed it down to XFS corruption on the new 12TB drive.

 

I don't know the whats and whys, but prior to swapping in the new drive I manually relocated every file off the old 3TB, so it was empty, and the new 12TB certainly was empty. Did it somehow become corrupted and cause the parity rebuild failure, or did the parity rebuild failure cause it to become corrupted?

 

After the New Config, a new symptom arose: I couldn't move any file to the new 12TB, and Krusader stalled. Copying or moving files from disk A to it stalled, and the same from disk B stalled, but copying or moving files between A and B worked. That led me to believe something was wrong with the new 12TB, so I checked the filesystem, and xfs_repair solved it.
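For anyone landing here with the same symptom, a rough sketch of the check-then-repair sequence follows. The post does not say how xfs_repair was run, so this is only one way to do it: the device path `/dev/md1` is a hypothetical example, the helper names are made up, and the filesystem is assumed to be unmounted (on Unraid that would typically mean starting the array in Maintenance mode). The `-n` flag is xfs_repair's "no modify" mode, which only reports problems.

```python
import subprocess


def xfs_repair_command(device: str, check_only: bool = True) -> list[str]:
    """Build an xfs_repair command line; -n means 'no modify' (report only)."""
    cmd = ["xfs_repair"]
    if check_only:
        cmd.append("-n")
    cmd.append(device)
    return cmd


def run_xfs_repair(device: str, check_only: bool = True) -> int:
    """Run xfs_repair on an unmounted filesystem and return its exit status."""
    result = subprocess.run(xfs_repair_command(device, check_only))
    return result.returncode


# Hypothetical device path; substitute the one for the affected disk.
print(xfs_repair_command("/dev/md1"))
print(xfs_repair_command("/dev/md1", check_only=False))
```

Running the report-only pass first shows what would be changed before anything is written to the disk.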

 

I still don't know the whats and whys, but everything is working fine now from what I can see. Thanks again for your help.
