nickro8303

Multiple Errors


My unRAID server has so many issues right now I'm not sure where to start.

 

1. I rebooted the other day to clear up space on my docker container. After the reboot, I started receiving an increasing number of CRC errors from my cache drive.

 - Replaced the SATA cable and moved to a different SATA port. The errors still continue.

2. After changing the cables I booted up and one of the array drives shows "Unmountable: Unsupported partition layout".

 - Stopped the array, unassigned the drive, started the array and checked the emulated disk. 

 - Stopped the array and reassigned the drive and started a Parity-Sync / Data-Rebuild.

3. Now I'm getting another error message saying a different disk has Unable to write to disk and read errors.

4. Also, /var/log is getting full (currently 100% used).
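
For item 4, a minimal sketch of one way to check how full /var/log is and which files take the most space. The helper names `percent_used` and `largest_files` are made up for this example, and the percentage may differ slightly from what `df` reports, since `df` computes it against the space available to non-root users.

```python
import os
import shutil


def percent_used(path: str) -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a total of zero; avoid dividing by it.
        return 0.0
    return 100.0 * usage.used / usage.total


def largest_files(root: str, limit: int = 5) -> list[tuple[int, str]]:
    """Return up to `limit` (size_in_bytes, path) pairs under `root`, largest first."""
    sizes = []
    for directory, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(directory, filename)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sorted(sizes, reverse=True)[:limit]
```

Calling `percent_used("/var/log")` and `largest_files("/var/log")` on the server would show whether one runaway log (for example a syslog flooded by repeated disk errors) is responsible.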

 

I'm really not sure what's causing all these issues as prior to this everything was running properly except my docker container was full.

 

Any help with these issues would be greatly appreciated.

 

Thanks in advance.

 

tower-diagnostics-20190207-0941.zip

Edited by nickro8303


Start here:

Your SSD needs a new SATA cable.

Onboard SATA ports 5 and 6 are set to IDE. This can cause serious issues on these AMD chipsets; change them to SATA/AHCI in the BIOS.

 

After doing this, reboot and post new diags.

1 hour ago, johnnie.black said:

Start here:

Your SSD needs a new SATA cable.

Onboard SATA ports 5 and 6 are set to IDE. This can cause serious issues on these AMD chipsets; change them to SATA/AHCI in the BIOS.

 

After doing this, reboot and post new diags.

Thanks for the info, I had no idea this was the case. I will change the settings when I get home from work. I already changed out the SATA cable for a new one.


There are also problems with disk3: it dropped offline, but with all the other errors it's difficult to analyze. It could be the Marvell controller it's connected to, so check the connections on it when you reboot. I forgot to say, in case you don't know: don't let the rebuild finish, as it's rebuilding garbage.

40 minutes ago, johnnie.black said:

I forgot to say, in case you don't know: don't let the rebuild finish, as it's rebuilding garbage.

Wow ok, stopped the rebuild.


Ok so I did the steps you outlined and disk 3 is back. Disk 4 is being emulated and my parity disk just went to red X. I'm not sure what to do now. I'm pretty sure I heard it clicking before it went to the red X. Posting diagnostics.

tower-diagnostics-20190208-1853.zip


Your parity isn't showing up in the SMART reports for those diagnostics. Check connections. It looked OK in previous diagnostics. I don't see how you can have a disabled parity when it already has another disk rebuilding.


 

5 hours ago, trurl said:

I don't see how you can have a disabled parity when it already has another disk rebuilding.

Yeah, I wonder if this could be improved. Unraid won't disable two disks with single parity, but it will still disable one even if there's already an invalid disk (a disk being rebuilt), resulting in two invalid disks.

 

OP:

 

ATA errors on parity and disk1, possibly cable related. Replace the SATA and power cables on both disks; when done, post new diags so we can try again after re-enabling parity.

 


Could this be due to my motherboard going bad? I just replaced all the SATA cables a few months ago and the parity drive is also new. I can't see how all these things could be going wrong at the same time.

1 minute ago, nickro8303 said:

Could this be due to my motherboard going bad?

It's possible. Replacing the cables one more time would be the easiest way to rule them out; if the issues persist, it could be the board.


I found one new SATA cable and used it to replace the one on the parity drive, and it's still showing a red X. I'm pretty sure it's dead as I know I heard a clicking sound the last time I booted up. 

 

Disk 4 is now showing up as "Unmountable: No file system". 

 

I ordered a set of new SATA cables but they won't be here for a few days. I'm almost certain this is not due to the cables though. 

 

How should I handle replacing the parity drive with Disk 4 in the state it's in? Am I just looking at losing the data on disk 4?

tower-diagnostics-20190210-0944.zip

image.thumb.png.e7a3feeac2fb7e337ec1b3ea51b212d2.png

Edited by nickro8303


Parity disk looks OK in those. I am guessing you will have to do the "invalidslot" command to rebuild disk4 again but wait till @johnnie.black replies.

3 hours ago, nickro8303 said:

I'm pretty sure it's dead as I know I heard a clicking sound the last time I booted up. 

If parity is really failing you'll have a problem rebuilding disk4, but no harm in trying:

 

-Tools -> New Config -> Retain current configuration: All -> Apply
-All disks should remain assigned but re-assign any missing disk(s) if needed
-Important - After checking the assignments leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type:

mdcmd set invalidslot 4 29

-Back on the GUI and without refreshing the page, just start the array. Do not check the "parity is already valid" box. Disk4 will start rebuilding; the disk should mount immediately, but if it's unmountable don't format it. Wait for the rebuild to finish and then run a filesystem check.

 

If there are any issues or errors during the rebuild grab and post new diags.

 

 

20 hours ago, johnnie.black said:

If parity is really failing you'll have a problem rebuilding disk4, but no harm in trying:

 

-Tools -> New Config -> Retain current configuration: All -> Apply
-All disks should remain assigned but re-assign any missing disk(s) if needed
-Important - After checking the assignments leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type:


mdcmd set invalidslot 4 29

-Back on the GUI and without refreshing the page, just start the array. Do not check the "parity is already valid" box. Disk4 will start rebuilding; the disk should mount immediately, but if it's unmountable don't format it. Wait for the rebuild to finish and then run a filesystem check.

 

If there are any issues or errors during the rebuild grab and post new diags.

 

 

Ok I followed the directions and started the array after applying the new config and running the command. It's now showing that disk 2 and 4 are unmountable. Parity-Sync/Data rebuild is in progress. Should I let it complete? 

image.png.bbafce02de6824457be3ba7c0f4d91a1.png

tower-diagnostics-20190211-1003.zip

6 minutes ago, nickro8303 said:

Should I let it complete? 

Cancel now. You didn't use the right command and are also rebuilding, i.e. overwriting, disk 2:

 

Feb 11 10:00:09 TOWER kernel: md: recovery thread: recon D2 D4 ...
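
As a rough sketch of how that log line can be checked automatically: the snippet below pulls the slot labels out of a "recovery thread: recon" line and reports any that were not meant to be rebuilt. It assumes the line format shown above, with labels such as D2 and D4 read as data disks 2 and 4; other Unraid versions may log this differently, and the function names are invented for the example.

```python
import re

# Matches the slot labels that follow "recon" in the syslog line quoted above.
RECON_PATTERN = re.compile(r"md: recovery thread: recon((?: [A-Z]\d*)+)")


def recon_targets(syslog_line: str) -> list[str]:
    """Return the slot labels being reconstructed, e.g. ['D2', 'D4']."""
    match = RECON_PATTERN.search(syslog_line)
    return match.group(1).split() if match else []


def unexpected_targets(syslog_line: str, expected: set[str]) -> list[str]:
    """Return reconstructed slots that were not in the expected set."""
    return [slot for slot in recon_targets(syslog_line) if slot not in expected]


line = "Feb 11 10:00:09 TOWER kernel: md: recovery thread: recon D2 D4 ..."
print(unexpected_targets(line, expected={"D4"}))
```

Here only D4 was supposed to be rebuilt, so D2 is flagged as unexpected, which is the sign that the rebuild should be cancelled.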

 


Looks like you typed:

 

mdcmd set invalidslot 4 2

 

instead of:

 

mdcmd set invalidslot 4 29

 

Not that, it looks like a copy/paste issue. I see in the log:

 

Feb 11 09:59:33 TOWER kernel: mdcmd (32): set invalidslot 4 29

 

but if I copy/paste from the syslog I get:
 

Feb 11 09:59:33 TOWER kernel: mdcmd (32): set invalidslot 4 2 9

 

This resulted in disks 4 and 2 being set invalid, instead of disk4 and parity2 as it should be.
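
One way to guard against this kind of paste problem is to inspect the command text before running it. The sketch below is only an illustration: the thread does not identify which character got into the pasted command, so the zero-width space in the example is an assumption, and `find_suspect_chars` and `parse_invalidslot` are hypothetical helpers, not part of Unraid.

```python
import unicodedata


def find_suspect_chars(command: str) -> list[tuple[int, str]]:
    """Return (index, description) for each character that is not printable ASCII."""
    suspects = []
    for index, char in enumerate(command):
        if not 32 <= ord(char) <= 126:
            name = unicodedata.name(char, "UNNAMED")
            suspects.append((index, f"U+{ord(char):04X} {name}"))
    return suspects


def parse_invalidslot(command: str) -> tuple[int, int]:
    """Parse 'mdcmd set invalidslot A B' strictly and return (A, B)."""
    suspects = find_suspect_chars(command)
    if suspects:
        raise ValueError(f"hidden or non-ASCII characters found: {suspects}")
    parts = command.split()
    if len(parts) != 5 or parts[:3] != ["mdcmd", "set", "invalidslot"]:
        raise ValueError(f"expected exactly two slot numbers, got: {parts[3:]}")
    return int(parts[3]), int(parts[4])


# A pasted command with an invisible character between the 2 and the 9.
pasted = "mdcmd set invalidslot 4 2\u200b9"
print(find_suspect_chars(pasted))
```

A command that passes `parse_invalidslot` and returns (4, 29) matches what was intended; anything with a stray character or an extra argument is rejected before it reaches the console. Typing the command by hand avoids the problem altogether.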

 

Edited by johnnie.black

1 hour ago, johnnie.black said:

Looks like you typed:

 

mdcmd set invalidslot 4 2

 

instead of:

 

mdcmd set invalidslot 4 29

 

Not that, it looks like a copy/paste issue. I see in the log:

 


Feb 11 09:59:33 TOWER kernel: mdcmd (32): set invalidslot 4 29

 

but if I copy/paste from the syslog I get:
 


Feb 11 09:59:33 TOWER kernel: mdcmd (32): set invalidslot 4 2 9

 

This resulted in disks 4 and 2 being set invalid, instead of disk4 and parity2 as it should be.

 

I see that. That's what I get for copying and pasting instead of typing, I guess. Can I go ahead and start the process over again seeing as the parity drive seems to be working again?


Could be johnnie has gone to test what happens when you force it to rebuild 2 data disks with only one parity. I don't recall seeing this before and the fact that both disks are unmountable doesn't seem like a good sign.

 

Do you have backups?

4 minutes ago, trurl said:

Could be johnnie has gone to test what happens when you force it to rebuild 2 data disks with only one parity. I don't recall seeing this before and the fact that both disks are unmountable doesn't seem like a good sign.

 

Do you have backups?

I do have backups of the important stuff, but the majority of my data, movies and TV shows, can be recreated from physical media. I'm not really worried about losing that data. I just want to get the server back to stable.

8 minutes ago, nickro8303 said:

Can I go ahead and start the process over again seeing as the parity drive seems to be working again?

You can try again, but data on both disks will have some (or a lot of) damage, most likely unfixable by xfs_repair. You might still be able to recover some data with a file recovery utility, like UFS Explorer.

4 minutes ago, johnnie.black said:

You can try again, but data on both disks will have some (or a lot of) damage, most likely unfixable by xfs_repair. You might still be able to recover some data with a file recovery utility, like UFS Explorer.

OK, the question is which disk do I rebuild then, 2 or 4? I guess it doesn't really matter at this point.

17 minutes ago, trurl said:

Could be johnnie has gone to test what happens when you force it to rebuild 2 data disks with only one parity.

I tried to replicate this and Unraid crashed; it didn't start the rebuild. But looking at the screenshot it did start for the OP, so depending on how long it ran I fear it overwrote part of disk2.

 

14 minutes ago, nickro8303 said:

OK, the question is which disk do I rebuild then, 2 or 4? I guess it doesn't really matter at this point.

I would try disk4, because disk2 will be OK after the overwritten part, while we don't know what state disk4 is in.
