Disk Errors



Hey, so I know this may seem odd, but I had a power outage this morning and it caused read errors on a bunch of my drives. See, I do have a battery backup unit, but it's not a particularly nice one. Anyway, I rebooted the server and all the errors magically washed away, except for some reason one of my drives now has a red X by it and unRAID won't let me do anything with it; there was even a notification about how the drives no longer have errors. How do I ignore this red X and let the drive come back? I know 100%, without any reservations, that the hardware is fine, and the data is probably perfect still too, since the server wasn't really doing anything when the power went out. The drive that is being emulated was one of the drives that had almost literally nothing on it, save for a few random files here and there. I just want unRAID to stop emulating the drive, that's it.

 

(screenshot attached)


Currently copying data from the dead/dying drive via the command line onto the other array drives. Assuming I don't have any other failures, I should be able to save the lion's share of the data. I just need to get a few more drives so that I can have a staging area to copy stuff off of the current drives and run a preclear to confirm that they are physically intact. It may also be time to drop in a second parity drive. 😒
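
In case it helps anyone, the copy itself is nothing fancy; something along these lines (rsync is shown just as an example of the sort of tool, and the disk paths are placeholders, not my actual layout):

    # Copy everything off the failing disk onto another array disk, preserving
    # attributes and showing progress; re-running it only copies what's missing.
    rsync -av --progress /mnt/disk5/ /mnt/disk3/rescued-from-disk5/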


Just went out and bought 3 new 10TB drives. The one that's totally dead is an 8TB, so it's a small upgrade. The repaired one may or may not be good; it is also an 8TB drive. If the drive I ran xfs_repair on is actually physically good and passes a preclear, I will then look at replacing one or two of my older 4TB drives. I also really want to stick with having two parity drives now to keep this from happening again. Looks like my next week will be taken up with preclearing drives and copying data over to a new array, since this one is basically gone and I want to confirm which, if any, of my drives actually died. Though I am always open to recommendations if anyone has any.
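
For anyone who hits the same red X, the usual way to do the filesystem repair on unRAID is to start the array in maintenance mode and run xfs_repair against the emulated disk's md device; roughly like this (the device number here is just an example, use whichever disk is actually disabled):

    # Check first without changing anything, then do the real repair.
    xfs_repair -n /dev/md5    # dry run: report problems only
    xfs_repair /dev/md5       # actual repair against the emulated disk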


And now I realize I have unRAID Plus and not Pro, and you can't start the array with more than 12 drives installed, even if they aren't going into the array. Yippee! 😩 Man, my day keeps getting better by the second. Two of these drives aren't even going back into the system, which would put me under the limit of 12, except I'd have to unplug two drives first, and I won't even be able to preclear all three new drives with the array started in emulation mode. I will say it would have been nice if it had been stated more clearly that you can't have more than 12 drives installed at all, even OUTSIDE the array; otherwise I probably would have just bought Pro to begin with. After spending $1,000 on new drives and a new battery backup with pure sine wave output, since mine is obviously not working well enough to keep surges from destroying my server, what's another $50, right? 😕

 

You know what? I am going to shut my server completely off. I think I have had it for tonight and I don't think I can even stand the sight of it right now. I'll get some sleep, get back at it in the morning all fresh and bright-eyed. 🤷‍♂️


Upgraded my license, preclearing the new drives, and finishing the command-line recovery off the drive I thought was clicking; so far, nearly all the data is good and safe. No idea how it could have been clicking before yet seem fine now, but I am not complaining right now. If I can save the data, that's all I care about. I have a partial backup to a paid-for business G Suite account. The only reason the backup is outdated is my internet: even though I have 1Gb down, I only get 35Mb up. I have been uploading continuously for over 8 months now and it just hasn't finished. At least my Nextcloud data and the hardest-to-replace data I have are 100% safe and fully backed up. The only thing that is even partially losable is some of my more recent media. I did it in this order so that if something ever went wrong I could restore what is most important and irreplaceable first. All in all, I am just happy I went with unRAID over any of the other methods; with a hardware RAID or a ZFS raid using something like FreeNAS I would have lost everything in one fell swoop. Sure, it's going to take me over a week, maybe longer, to get back up and running normally, but I like the idea of the small win first.
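
If anyone wants to do the same kind of slow-drip cloud backup, a rough sketch with something like rclone would look like this (the tool, the remote name, the paths, and the bandwidth cap are all just examples, not necessarily my exact setup):

    # Keep pushing the backup share up to the cloud account, capped so it
    # doesn't saturate the limited upload; safe to re-run, it only sends changes.
    rclone sync /mnt/user/backups gdrive:unraid-backup --bwlimit 3M --transfers 2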

 

I know this has become more of a blog about what's happening to me but I started it and it might still be interesting to some out there.


Got all the new drives installed in a brand new array of their own, and I am now copying files over via the Unassigned Devices plugin onto the new drives until the old drives are empty, then running a preclear on those freshly emptied drives. I have precleared BOTH of the drives that failed before, including the one that I thought was clicking; not only did they both let me save every bit unharmed, but a preclear on each of them finished without a single error.

Could I have just heard a cable get into a fan somewhere? I mean, there are nine fans in my system and I have done the best job I could with cable management considering the 14 drives currently crammed in the front of the server chassis. I did pull and reseat literally every SATA data and power cable in the entire machine, and I can tell you at least one of those cables wasn't clicked in all the way, though it had never been a problem before. All I can do now is trust that the drives were just very confused and tagged bad by unRAID because of the power surge, and that everything is probably fine. I mean, they are both less than a year old and weren't even that heavily used really, which is why saving the data off of them wasn't actually that bad. Those two drives are just waiting until I finish transferring everything from some more of my other 10TBs onto the new 10TBs, so I can stop the array and add them, two of my older 4TB drives, and the second parity drive all at the same time.

 

When you have almost 40TB out of a 62TB array to copy over, MAYBE two drives at a time, it can take quite a LONG LONG LONG time. LOL 😁 Thank the powers that be, I haven't lost a single file so far, and I will run a file integrity check as soon as everything is copied over.
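
In case anyone wonders what that check amounts to, it's basically hashing everything and comparing the copies against the originals; a minimal manual sketch is below (the paths and the hash tool are just examples; on unRAID the Dynamix File Integrity plugin can keep these hashes for you):

    # Hash everything on the source disk, then verify the copies on the destination.
    cd /mnt/source_disk && find . -type f -exec md5sum {} + > /tmp/manifest.md5
    cd /mnt/dest_disk && md5sum -c /tmp/manifest.md5 | grep -v ': OK$'   # show only mismatches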

 

I canceled my order on Amazon for the new battery backup I was going to get. It was a line-interactive unit, way better than the POS standby unit I had, but honestly, that wasn't enough for me to feel perfectly safe that power issues wouldn't end up being a problem in the future. So I hopped on Newegg, which had a better price, and got a big ol' online/double-conversion style unit. Yes, it was more than twice the price, but the full electrical isolation from my older home's wiring was worth the investment in my opinion. The new battery is rated for 1350 watts max, and even though it is 55 pounds and absolutely massive, it is whisper quiet and performs VERY well, better than any battery unit I have ever owned. It's a real pro-level unit too, with expandable runtime if I ever feel that's necessary. It's the isolation and zero transfer time that really sold me on it. I mean, it is there to protect what has become a very large investment for me, and the peace of mind alone is probably worth the price of admission anyway.

 

Well, this won't be my last post about this, of course, but I won't be making too many for a while at least, because it is mostly just lather, rinse, repeat: copy files, preclear drives, add them back to the array. It will probably be at least another week, and likely a bit more, until I can spark everything back up and hopefully return to normal operation. Wish me luck!!! 👍


Just a quickie update. The last two drives have finally started copying their data onto the 'new' array. So far every drive, without fail, has passed the preclear with flying colors, and there have been no read/write errors getting the data back onto the array. As soon as this finishes copying, it will be time to switch back over to my pfSense VM and stop using Panicrouter. Yes, Panicrouter is actually the name I gave to my old Linksys whatever-the-heck-it-is that I am using to keep the internet from going down every time I take the array offline to do anything. I am losing a good 50% or more of my speed and my latency has been toilet level since I switched over, so that will be nice. At least I can build the second parity disk while the server is online and properly functional; I just won't be running any of those automated services until the second parity is built.

 

To anyone who is interested, the new battery backup has been working great in double-conversion/online mode. The thing is so much quieter than I expected. It is a Tripp Lite SU1500XLCD, in case anyone wants to know which model I grabbed. I don't have a proper rack set up right now, and even when/if I do, this can just sit next to it, since getting the floor-standing version saved a few bucks.

 

Anyway, hopefully I won't be posting again (which would mean I haven't had any more problems) until at least after the copy from the last two orphaned drives is done. An adventure this has been, but a learning experience as well. 😜


The epic of my server seems to finally be over... The last of all the drives finally finished preclearing this afternoon. Just a few minutes ago I added that last drive to the array and assigned one to the second parity slot. So far, every bit of data is there and accounted for. The array is now 82TB and also has that second parity disk, which is building in the background right now. When I sparked Plex back up after switching back to my pfSense VM for internet, everything just hummed back to life like nothing was wrong, and it made me so happy. It's like my server was just turned off and left to sit there for a few weeks for no reason, and nothing running on it is any the wiser that it was a bumpy ride.

 

Turns out the new battery backup somehow bugs out if I have its USB cable plugged in while the server is booting; the fix is to just run without the battery connected to the server. Of course, this is a little annoying because I won't be able to have the server shut itself off automatically in a power outage, but I'm home almost all the time, so I think the better-than-half-hour runtime is probably good enough for me to do it manually if something goes wrong. At least until I decide to bite the bullet, get the network card for the battery unit, and put a script together that can do it for me. This is the least of my worries anyway.
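
If I do eventually grab the network card, the script I have in mind would be something simple like the sketch below. Everything in it is hypothetical for now (the IP address, the community string, the polling interval), and it assumes the card answers standard RFC 1628 UPS-MIB queries over SNMP:

    #!/bin/bash
    # Hypothetical watchdog: poll the UPS network card over SNMP and power the
    # server down cleanly if the UPS reports it is running on battery.
    UPS_IP="192.168.1.50"            # placeholder address for the UPS card
    OID="1.3.6.1.2.1.33.1.4.1.0"     # upsOutputSource: 3 = normal, 5 = on battery
    while true; do
        SRC=$(snmpget -v2c -c public -Oqv "$UPS_IP" "$OID" 2>/dev/null)
        if [ "$SRC" = "5" ]; then
            logger "UPS is on battery, shutting down cleanly"
            /sbin/poweroff           # clean shutdown
            exit 0
        fi
        sleep 30                     # poll every 30 seconds
    done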

 

This will probably be my last post about this adventure I have been on here, and I leave with a smile on my face. Sure it was expensive, sure it was very sketchy for a while there, sure it took WAY longer than I expected... BUT! Not one bit of data was lost, and not one drive failed the preclear or showed errors of any kind. I'm happy.

 

Words to the wise: always keep your backups up to date. If you have expensive hardware or data you can't afford to lose, back it up and get a good battery backup; save yourself the headache I had. It's worth every penny.

