
Server disabled Disk12 on power up after power outage - During data rebuild of disabled drive, Disk1 reported Errors



Good day everyone!

 

I would greatly appreciate help figuring out what’s up with my UNRAID server. 

 

It’s running 6.12.6 and hadn’t had any issues until today, when the server shut down due to a power outage. The server appeared to have shut down gracefully, but when I powered it back up, Disk12 of the array was “Disabled”.

 

I followed the standard procedure to get it back into the array. About 2% into the data rebuild of Disk12, Disk1 threw 168 errors and the rebuild stalled for a few minutes. Update: the error count on Disk1 is now up to 248, and the extended SMART test reports "Interrupted: Stopped by Host".

 

The rebuild was continuing and no other errors had occurred (at 3.1%), but I am wondering whether I should replace one or both of the drives. I'm attaching the full diagnostics file.

So far no SMART errors on either of the drives. I am still waiting for the extended tests to complete.

 

The drive that was originally disabled is a USB external drive; the other is an internal HDD (it may be connected through a cheap 8-port PCIe SATA card).

 

Thanks for reading!

tower-diagnostics-20240110-1417.zip

6 minutes ago, trurl said:

USB not recommended for array or pools for many reasons, including the disconnects on multiple disks you are currently experiencing.

 

 

Thank you for the heads up! I had not known that when I started my UNRAID journey a few years ago, and plan on phasing them out as they die/need to be replaced. Please correct me if I’m wrong but a faulty USB device wouldn’t cause UNRAID to report errors with an internal drive, would it? 

 

I’ve also noticed the rebuild process seems to be stopping frequently, although no additional errors or issues are reported. The average speed is only a few MB/s due to the frequent disruptions.

The rebuild is having problems because the disk interfaces are continuously resetting. Take a look at your syslog.

 

5 minutes ago, unraidersofthelostark said:

a faulty USB device wouldn’t cause UNRAID to report errors with an internal drive

No. Which internal drive are you referring to?

1 hour ago, unraidersofthelostark said:

@trurl Disk1 of the array, the ST8000DM004 (which I just found out is SMR, the same model I have as parity)

You may have to disable spindown on that disk to get it to complete extended self-test, but otherwise it looks OK.

 

I have SMR in my array, but not as parity. Probably would get better performance with CMR drive as parity, but that is the least of your trouble.

 

1 hour ago, trurl said:

Take a look at your syslog.

 


@trurl Oof, I'm assuming you're referring to how it's full of ATA bus errors, along with the link dropping to 1.5 Gbps? I also see "md:recover thread:multipledisk errors, sector=296933472", and the sector number seems to increase by 8 on each subsequent message of the same error.
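To put numbers on what the syslog shows (how often each port renegotiated its link, and which sectors the rebuild failed on), a short script can scan the syslog from the diagnostics zip. This is a minimal sketch, assuming the syslog is plain text, that link resets appear as the standard libata `ataN: SATA link up X Gbps` lines, and that the rebuild errors contain `multiple disk errors, sector=<number>` as quoted above. The exact wording varies between kernel and Unraid versions, so adjust the two patterns to match your own log. The function name `summarize` is hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed log patterns; adjust to the exact wording in your syslog.
LINK_RE = re.compile(r"\b(ata\d+): SATA link up ([\d.]+) Gbps")
SECTOR_RE = re.compile(r"multiple ?disk errors, sector=(\d+)")


def summarize(lines):
    """Count link-up events per (port, speed) and collect rebuild error sectors."""
    link_events = Counter()
    sectors = []
    for line in lines:
        m = LINK_RE.search(line)
        if m:
            link_events[(m.group(1), m.group(2))] += 1
        m = SECTOR_RE.search(line)
        if m:
            sectors.append(int(m.group(1)))
    # Gap between consecutive error sectors. Assuming 512-byte sector units,
    # a step of 8 means consecutive 4 KiB blocks are failing.
    steps = [b - a for a, b in zip(sectors, sectors[1:])]
    return link_events, sectors, steps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (placeholder path): python scan_syslog.py syslog.txt
    with open(sys.argv[1], errors="replace") as f:
        links, sectors, steps = summarize(f)
    for (port, speed), n in sorted(links.items()):
        print(f"{port}: link up at {speed} Gbps, {n} time(s)")
    print(f"{len(sectors)} rebuild error line(s); distinct steps: {sorted(set(steps))}")
```

A port that shows many link-up events, especially repeatedly at 1.5 Gbps, points toward a cable, backplane, or controller problem rather than the platters themselves.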

 

What should I do in a situation like this? Disk12 dropped out, and I didn't think much of it since it's USB and, like you said, isn't reliable. Then during the rebuild Disk1 apparently died (those errors seem plentiful and serious). Considering I was 5.8% into the rebuild of Disk12, what should my next step be? I picked up 2x used WD 8TB Red NAS drives from 2017 for $85 each, so I have drives to swap in if that's recommended. I'm going to try to preclear them in another system so they're ready to go and checked for SMART errors.

 

Thanks for the tip on disabling spindown. I had the delay set to 8 hours, up from 4, but will disable it entirely.
44 minutes ago, unraidersofthelostark said:

Disk12 dropped out, didn't think much of it since it's USB and like you said isn't reliable.

You should think much of it. In order to reliably rebuild every bit of a disk, every bit of ALL other disks must be reliably read. Parity by itself can rebuild nothing.
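A toy illustration of that point, assuming single XOR parity (the scheme Unraid's single parity disk uses): each parity byte is the XOR of the same byte position on every data disk, so rebuilding one disk means XORing parity with every *other* data disk. If any of those reads returns wrong data, the rebuilt disk is silently wrong as well. The four-byte "disks" below are made-up values, and the helper names are hypothetical.

```python
from functools import reduce


def xor_bytes(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(missing_index, disks, parity):
    """Reconstruct one data disk from parity plus every other data disk."""
    others = [d for i, d in enumerate(disks) if i != missing_index]
    return xor_bytes([parity, *others])


# Hypothetical 4-byte data disks; parity is computed while all are healthy.
disks = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
parity = xor_bytes(disks)

# Rebuilding disk index 2 (think "Disk12") with clean reads of the others works.
print(rebuild(2, disks, parity) == disks[2])  # True

# One bad read on another disk (think "Disk1") corrupts the rebuilt data.
flaky = list(disks)
flaky[0] = b"\x10\x20\x31\x40"  # a single flipped bit
print(rebuild(2, flaky, parity) == disks[2])  # False
```

Parity only records the XOR relationship, so every surviving disk has to be read correctly for a rebuild to produce correct data.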

 

Probably nothing really wrong with the disabled disk, and maybe its contents are mostly OK without rebuilding if you haven't already messed it up by trying to rebuild on top with all these other problems.

 

I often recommend not even bothering with parity when you have USB in the array because of these problems. If you have no parity, there is nothing to get out of sync, so nothing to rebuild. On the other hand, of course, nothing can be rebuilt.

 

49 minutes ago, unraidersofthelostark said:

I picked up 2x used WD 8TB RED NAS drives

Really, the best thing would be to start a new array with only the SATA disks, and copy data from the USB disks as Unassigned Devices. You could use one of those WD Reds as parity and so get CMR parity while you're at it.


@trurl I really appreciate the advice!

 

I regret leaving any USB devices in my array and have certainly learned from this experience; that 8TB drive is the last one left.

 

So it sounds like stopping the rebuild is my best bet for saving the data on Disk12. I'm not sure how to handle the array effectively having two "bad" disks. As you said, there was probably nothing wrong with the USB disk, but I had already removed and re-added it and started the rebuild.

 

If there is a way to get a list of the contents of the two drives in question, I should be able to restore the files from a backup. The really unfortunate part of this setup is that I don't have a free connection for another disk, so I can't replace the USB disk with an internal one.
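One way to get that list, assuming each disk can be mounted (for example as an Unassigned Device, or at its usual `/mnt/diskN` path while the array is started), is a short script that walks the mount point and writes every file's relative path and size to a CSV, which can then be compared against the backup. This is a sketch, not an Unraid feature; the function names, the mount path, and the output file name are all placeholders.

```python
import csv
import os
import sys


def list_files(mount_point):
    """Yield (relative_path, size_in_bytes) for every file entry under mount_point."""
    def report(err):
        # Directories that cannot be read (e.g. on a failing disk) are skipped.
        print(f"skipped directory: {err}", file=sys.stderr)

    for root, _dirs, files in os.walk(mount_point, onerror=report):
        for name in files:
            full = os.path.join(root, name)
            try:
                size = os.path.getsize(full)
            except OSError as err:
                # Still list the file so it can be restored; mark size unknown.
                print(f"unreadable: {full}: {err}", file=sys.stderr)
                size = -1
            yield os.path.relpath(full, mount_point), size


def write_manifest(mount_point, out_path):
    """Write a path/size CSV for mount_point and return the number of files listed."""
    count = 0
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "size_bytes"])
        for rel, size in list_files(mount_point):
            writer.writerow([rel, size])
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder paths): python manifest.py /mnt/disk1 disk1_files.csv
    n = write_manifest(sys.argv[1], sys.argv[2])
    print(f"listed {n} files")
```

Write the CSV to a different disk than the one being listed, so the manifest does not end up in its own listing.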

 

I would strongly prefer not to rebuild the array (50TB) from scratch, but I understand if it's come to that. Would I still be able to start the array and pull the rest of the data off the other drives if I disconnect the USB disk?

You can do a New Config with only the SATA disks and rebuild parity onto one of those WD Reds. The contents of the SATA data disks will not be affected, and their data will be included in the new array.

 

Then you can rebuild one of the smaller disks with that other WD Red and so get more capacity. Many of your disks are really too full. You should always keep some free space in case the filesystem needs to be repaired.

 

Then you can work with the USB disks as Unassigned Devices. And think about upsizing some of the other small SATA disks in the array.


Thankfully I did! It was a painful restore, though, pulling that much data from the cloud backup (I didn't have a local backup server yet).

 

One of the replacement drives threw CRC errors, which are probably caused by a cable or another issue unrelated to the drive itself. Do you think I should run additional tests after swapping the cable and running an extended SMART test?

 

Thanks! Hope your weekend is going well
