OK, this looks bad. What do I do next?


Karyudo


I am having a terrible time with unRAID. In the six months I've been running my server, I've had a new motherboard fail (just over a month ago), a new cache SSD fail (last couple of weeks), and now a new drive—which I conscientiously pre-cleared—look like it's failing.

 

I just got the SSD straightened out yesterday, and then in the evening, there seemed to be a power anomaly (which I have never noticed happening before in the five years we've lived here). My desktop stayed on, but the server reset. Seemed to boot fine, all the drives came up, but of course it hadn't been shut down properly, so a parity check was recommended. So I started that, and then went to bed.

 

This morning, one drive is marked with an 'x' and "device is disabled, contents emulated." But the parity check kept running until just now (14 hours or so later). The Main screen looks... not good. (Screenshot attached.)

 

I'm getting very, very, very, very tired of the stress associated with wondering if I'm going to lose data, and wondering what to do in what order to make sure I don't screw anything up. Could somebody please talk me through this?

 

(Oh, and before you think it: the UPS has been ordered.)

unRAID_failing.png.7c7ed56a0a941fed0828353ee3f74075.png

Link to comment

Huh. My order of operations was CA backup (December 18, when I had finished resetting everything on the new SSD), then parity check (late December 18 through almost all day December 19). Does the parity check really not write anything interesting to what turns up in the diagnostics? That's surprising; I'd have thought it would.

 

What can I try next that will definitely not screw anything up? I'd like to move forward, but I don't have a good idea of what will get done in the background to fix or ruin my array.

 

I could, for example, sort out which drives are connected to which controller cards. But I don't want to waste time gathering information on red herrings, if that's not useful.

 

Link to comment

These are the syslog's last entries:

 

Server/Media/localhost/0/d595daea9917d359763fe010abf7d7cf3bafa4c.bundle/
Dec 18 23:30:00 Shinagawa rsyncd[11388]: cd+++++++++ Plex/config/Library/Application Support/Plex Media Server/Media/localhost/0/d595daea9917d359763fe010abf7d7cf3bafa4c.bundle/Contents/
Dec 18 23:30:00 Shinagawa rsyncd[11388]: >f+++++++++ Plex/config/Library/Application Support/Plex Media Server/Media/localhost/0/d

Link to comment

Besides a couple of CRC errors at the beginning, the log is full of entries from the Community Applications backup, so I can't see what happened.

If the docker.img has errors or is completely full, rsync on its own decides to log to the syslog instead of my designated log file.

 


 

Link to comment

My main focus is to get back up and running with minimal (i.e. zero, if possible) data loss. Looks like the usual diagnostics are not available, but I hope you (Squid and johnnie.black) have some suggestions based on experience, nonetheless.

 

Can I take the array offline, reboot, and try again (as long as I'm no worse off than I am now by doing so)?

 

Looks like my controller topology is:

- all four 8TB data drives on a Vantec controller

- both 8TB parity drives on an Iocrest controller

- both 4TB WD drives on the mobo controller

- unassigned cache drive also on mobo controller

 

If a reboot doesn't change anything, then should I perhaps move drives off the Vantec controller, and onto something else (I have another Iocrest controller, plus space on the mobo and existing Iocrest controllers)?

 

What else can I provide to help make good decisions?

Link to comment

OK: progress. Powered up, and all drives (except for Disk 5) come up green on the new controller ports. Array started. Diagnostic file attached.

 

As predicted, Disk 5 is "unmountable." How do I go about rebuilding it?

 

Does Disk 5 need to be replaced? (At CA$250 per, I sure hope not. But I do have another new-but-not-precleared 8TB drive, if necessary.)

unRAID-diagnostics-20161220-1304.zip

Link to comment

Before I forget: this is unrelated, but you need to replace the disk3 SATA cable, as there are still CRC errors.

 

Disk5 needing a rebuild was expected; being unmountable was not. It's also formatted BTRFS, so it can be harder to fix.

 

So I would recommend first unassigning it and trying to mount it using the UD plugin so you can check if the actual disk is also unmountable.

Link to comment

You also have another problem. Parity1 needs replacing, and this is an actual disk problem:

 

Model Family:     Seagate Archive HDD
Device Model:     ST8000AS0002-1NA17Z
Serial Number:    Z840EA9H

197 Current_Pending_Sector  0x0012   093   093   000    Old_age   Always       -       2600
198 Offline_Uncorrectable   0x0010   093   093   000    Old_age   Offline      -       2600
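
For anyone who wants to check rows like these without reading them by eye, here is a minimal Python sketch. It assumes the ten-column layout shown in the two rows above and a plain integer in the last (raw) column; the function name `parse_smart_row` and the set `WATCHED_IDS` are made up for illustration and are not part of any tool.

```python
# Hypothetical helper: split one attribute row (ten whitespace-separated
# columns, as in the rows quoted above) into named fields.
def parse_smart_row(line):
    parts = line.split()
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "value": int(parts[3]),    # normalized value
        "worst": int(parts[4]),
        "thresh": int(parts[5]),
        "raw": int(parts[9]),      # assumes the raw column is a plain integer
    }

# Attributes 197 and 198 are the two quoted in this post.
WATCHED_IDS = {197, 198}

rows = [
    "197 Current_Pending_Sector  0x0012   093   093   000    Old_age   Always       -       2600",
    "198 Offline_Uncorrectable   0x0010   093   093   000    Old_age   Offline      -       2600",
]

parsed = [parse_smart_row(r) for r in rows]
flagged = [a for a in parsed if a["id"] in WATCHED_IDS and a["raw"] > 0]

for a in flagged:
    print(f'{a["name"]}: raw={a["raw"]} (value {a["value"]}, threshold {a["thresh"]})')
```

Note that the normalized value (93) is still well above the threshold (0) in both rows, so it is the raw count that shows the problem here.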

Link to comment

Disk5 is also not good. Without the syslog it's just guessing, but maybe its errors crashed the controller. It's still a good idea to use different ports if available.

 

Model Family:     Seagate Archive HDD
Device Model:     ST8000AS0002-1NA17Z
Serial Number:    Z840P64M
197 Current_Pending_Sector  0x0012   089   089   000    Old_age   Always       -       3888
198 Offline_Uncorrectable   0x0010   089   089   000    Old_age   Offline      -       3888

Link to comment

OK: have replaced Disk 3 SATA cable. Diagnostic attached.

 

Next, I will:

- install new 8TB drive to replace Disk 5 (and keep Disk 5 intact, just in case)

- disable Parity 1, by unplugging and disconnecting it

- see if I can stumble my way to rebuilding Disk 5 from Parity 2

 

Do I need to pre-clear the replacement Disk 5? If not, I imagine it would still be a good idea...?

 

(Mostly unrelated note: it would be nice if parity drives were explicitly labelled "Parity 1" and "Parity 2," instead of just "Parity" and "Parity 2.")

unRAID-diagnostics-20161220-1533.zip

Link to comment

You can just unassign Parity 1 in the GUI - no need to remove it physically. You don't have to pre-clear the replacement for Disk 5, but you might want to test it first. With two bad disks your array is unprotected at the moment and pre-clearing the new disk would take a long time. Personally, I'd want to get another disk in there as soon as possible but at the very least I'd run a short SMART self-test on it. The rebuild process is going to write every single bit of 8 TB to it.
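
As a rough sense of how long writing the whole disk takes, here is a back-of-the-envelope sketch in Python. The 150 MB/s sustained rate is purely an assumption for illustration (real speed varies across the platter and with the controller), and `rebuild_hours` is a made-up helper name.

```python
def rebuild_hours(capacity_bytes, mb_per_second):
    """Hours needed to write capacity_bytes at a constant rate (decimal MB/s)."""
    seconds = capacity_bytes / (mb_per_second * 1e6)
    return seconds / 3600

# 8 TB (decimal) at an assumed 150 MB/s sustained rate
hours = rebuild_hours(8e12, 150.0)
print(f"{hours:.1f} hours")
```

That lands in the same ballpark as the 14 hours or so reported for the parity check at the start of the thread.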

Link to comment

Took out both Disk 5 and Parity drives. Put two new 8TB drives in.

 

Assigned one drive (serial ending LCHC) as Disk 5. Ran Short SMART test; completed without errors; SMART report attached.

 

I've left the other drive (ending MZHK) unassigned. I see it in the choices for Parity, and also under the Unassigned Devices plugin section.

 

Now, how exactly do I go about rebuilding Disk 5? I don't see anything obvious. unRAID says I can't start the array: "Invalid configuration: Too many wrong and/or missing disks!"

 

It would be nice to get the rebuild started before about 11 PM (PST), so it can be working while I'm asleep tonight....

unRAID-smart-20161220-1658.zip

Link to comment

That SMART report looks OK. The trouble with the short self-test is that it doesn't have time to test much of the drive but at least it confirms that the electronics are working and it hasn't been catastrophically damaged in transit.

 

Strange that the array won't start as you're allowed to have two of the original disks missing. Just checking that Parity 2, Disks 1 to 4 and 6 are all present and showing a green status. Disk 5 is a new disk with blue status and Parity 1 is unassigned? And that you powered the system down to replace the drives (ie. no hot-swapping)?
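
To spell out the counting behind "two of the original disks missing", here is a deliberately simplified Python sketch. It only counts unavailable slots against the number of parity slots; the status labels and the function `within_parity_tolerance` are hypothetical and say nothing about how unRAID itself decides whether to start.

```python
# Simplified model: with two parity slots, up to two devices (parity
# included) may be missing or replaced and the data can still be rebuilt.
def within_parity_tolerance(devices, parity_slots=2):
    unavailable = [name for name, status in devices.items() if status != "ok"]
    return len(unavailable) <= parity_slots

# The situation described in this thread (status labels are hypothetical).
array = {
    "Parity 1": "unassigned",
    "Parity 2": "ok",
    "Disk 1": "ok",
    "Disk 2": "ok",
    "Disk 3": "ok",
    "Disk 4": "ok",
    "Disk 5": "new",
    "Disk 6": "ok",
}

print(within_parity_tolerance(array))
```

By this count the array has exactly two unavailable devices, which is why a refusal to start is unexpected.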

 

Perhaps the aborted parity check following the unsafe shutdown has upset it and it considers Parity 2 to be invalid too, which is not unreasonable as there are very likely to be some errors.

 

What I would do is seek johnnie.black's advice, but failing that I'd do Tools -> New Config and, using the screen grab in your first post as a guide, assign the correct drives to Parity 2, Disks 1 to 4 and 6. Then I'd assign the new drive to Disk 5 and leave Parity 1 unassigned (in other words, just as it currently is). Important: check the box labelled "Parity is already valid". Double check the assignments and then start the array. This time it should start and Disk 5 should rebuild.

 

This won't affect any of the data on your other disks and you still have the original Disk 5 as insurance.

 

If all goes to plan you'll still need to do a file system check/repair on Disk 5 once it has finished rebuilding.

 

If the array still won't start, post your diagnostics.

 

Link to comment

Just checking that Parity 2, Disks 1 to 4 and 6 are all present and showing a green status. Disk 5 is a new disk with blue status and Parity 1 is unassigned? And that you powered the system down to replace the drives (ie. no hot-swapping)?

Yup. Just like that: no hot-swapping, blue status on Disk 5, green on everything else, Parity 1 unassigned. Screenshot attached. (Man, that 320kb maximum is unnecessarily stringent: I can't even get a full screenshot at original resolution into that size!)

 

Perhaps the aborted parity check following the unsafe shutdown has upset it and it considers Parity 2 to be invalid too, which is not unreasonable as there are very likely to be some errors.

The parity check after the unsafe shutdown wasn't aborted, but it did complete with what seems to have been a SATA controller issue. The Main tab didn't seem to have any issues with the Parity drive (although I guess the logs did). To remove the Parity drive, I didn't do anything ahead of time: I just shut down, and re-started without the drive in place. (And yeah, I'm quite sure I got the right drive, by confirming the serial number on the drive with the snippet johnnie.black included in his post.) Should I have instead done something in the GUI while the drive was spun up, to make sure it was more gracefully removed from the array?

 

What I would do is seek johnnie.black's advice...

Yeah, I'm hoping he pokes his head in again soon!

 

...but failing that I'd do Tools -> New Config [...] This won't affect any of the data on your other disks and you still have the original Disk 5 as insurance.

Thanks for the outline of what to try next, and a big THANK YOU for adding some confirmation that such a move won't affect the other disks!

 

screencapture-unRAID-Main-20161220-1926.jpg.7c9c03a6375b5b2dbe32164b95c784c0.jpg

Link to comment

I would just have unassigned Parity 1 in the GUI and left it physically connected with the aim of investigating it further once my data was safe.

 

Pending sectors are bad news in unRAID because they can't be read, but a cycle of pre-clear might convert them to remapped sectors, which makes the disk usable again, with the proviso that you keep a close eye on it and reject it if it gets worse.

Link to comment

Just checking that Parity 2, Disks 1 to 4 and 6 are all present and showing a green status. Disk 5 is a new disk with blue status and Parity 1 is unassigned? And that you powered the system down to replace the drives (ie. no hot-swapping)?

Yup. Just like that: no hot-swapping, blue status on Disk 5, green on everything else, Parity 1 unassigned. Screenshot attached. (Man, that 320kb maximum is unnecessarily stringent: I can't even get a full screenshot at original resolution into that size!)

 

That's probably a bug. Try this:

- set disk5 to unassigned

- leave parity1 unassigned

- you should now be able to start the array

- stop the array

- reassign the new disk5

- start the array to begin the rebuild

Link to comment
