
Drive died and Parity wants to rewrite


pinion


See the screenshot. I had a drive die, and the server won't even boot with it installed. I've never heard of that, but it boots fine without the drive. Now it says parity will be overwritten if I start the array. Should I replace the drive first?

[Screenshot: ScreenShot 2017-05-04 at 8:01:56 PM]


Thanks for the help. I've attached it. I put that drive in an external enclosure, and now unRAID sees it, but parity still has a red X and says it will be rebuilt. Considering the system won't start with that drive installed (the one I moved to the enclosure), I don't trust it at all. Maybe I should hook the enclosure up to another computer and copy everything I can off of it.

unraid-diagnostics-20170505-0939.zip


Do you have backups of any irreplaceable files?

 

First some comments that are unrelated to your issue.

 

You have a lot of FSCK*.REC files on your flash. These are the result of filesystem repair on your flash. It may be repaired now, but it makes me wonder if something got lost in the repair.
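If you want to see how many of these recovery files are on the flash before cleaning up, here is a minimal sketch in Python. It assumes the flash drive is mounted at /boot (as on a standard unRAID server) and that the files sit in its root; the function name is hypothetical. It only lists files and never deletes anything.

```python
import fnmatch
from pathlib import Path


def find_fsck_rec_files(flash_root: str = "/boot") -> list[Path]:
    """List FSCK*.REC recovery files in the root of the flash drive.

    Assumes the flash is mounted at /boot. The name is matched
    case-insensitively, since FAT names may show up in either case.
    Returns an empty list if the directory does not exist.
    """
    root = Path(flash_root)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and fnmatch.fnmatch(p.name.upper(), "FSCK*.REC")
    )


if __name__ == "__main__":
    files = find_fsck_rec_files()
    total = sum(p.stat().st_size for p in files)
    print(f"{len(files)} FSCK*.REC files, {total} bytes total")
    for p in files:
        print(f"  {p.name}  {p.stat().st_size} bytes")
```

Back up the flash before removing anything; the script just reports what is there.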

 

You have a lot of things installing from /boot/extra, including some things that are also being installed by nerdpack.

 

Now to your problem.

 

Disk4 is missing from the beginning of logs/syslog.1, but it isn't missing later in syslog.1, and it isn't missing in syslog. It appears in SMART with 2 pending sectors, but it doesn't look like unRAID ever disabled it. It isn't shown as disabled in system/vars.txt.

 

Parity disk looks OK, but it is disabled, which is why unRAID wants to rebuild it. I don't see the write error that would have caused unRAID to disable the disk, but syslog.1 is truncated. It is only missing a few hours. See if there is a syslog.1 in /var/log and post it.

 

Possibly you just disturbed the parity disk's cable when you were trying to get disk4 working.

 

Obviously rebuilding parity with a missing data disk isn't a good idea. According to the screenshot disk4 is missing, but it doesn't look like it is missing now according to your diagnostics.

 

So is disk4 showing up as an array drive now because you put it in an external enclosure? Why do you say you don't trust it? Hooking it up to another system doesn't sound like a good idea. It is ReiserFS, so you would need Linux to read it anyway. If you can read it where it is, that is probably good enough.

 

Is that screenshot really showing us the current state of things? Is disk4 missing now?

 


The only irreplaceable things are my pictures, and I have backups somewhere. I guess I need to test my backups.

 

When Disk4 is installed, the system won't boot. I booted the system, then turned on the external enclosure I put Disk4 in. That's why it wasn't there before but is now. The reason I don't trust it is that the system won't boot with it installed. So the screenshot is no longer accurate; it now shows the disk as present. Is there a test I can do on that disk before doing anything?

 

And yes, between nerdpack and the trolley I was using before I need to do some cleanup. :|

5 minutes ago, pinion said:

The reason I don't trust it is that the system won't boot with it installed. So the screenshot is no longer accurate and it shows that it is there.

But if it is there and no longer missing, then it is installed as far as unRAID is concerned, and only parity needs to be rebuilt.

 

Is this a USB external enclosure, eSATA, or what?

 

When you saw that it wouldn't boot with the disk internal, did you try a different port?


Yes. I actually thought it was one of the controller cards, since the system booted with all 3 removed. So I removed the card I thought was bad, but when I put all the drives back in, it still wouldn't boot. So I basically started trying each drive one by one. That's when I discovered that drive e caused it not to boot. So I'm sure most of the drives have been moved from where they originally started. But I can try inserting that drive in a spot I know it hasn't been in, or maybe a known good spot.

 

The enclosure is USB. 


You could set parity to not installed and start the array without parity. That would let you work with it if you really needed to, but of course you wouldn't have protection.

 

Since it works fine externally, I have to wonder whether there is really anything about the disk itself that is causing the boot problem.

 

Could you try it in a motherboard port?

 

Are you sure you don't have a power problem?


If you boot and do not start the array, unRAID does NOT kick disks from the array. It just reports them missing. This is an important distinction, because if you fix the connection problem, they appear on the next boot, and all will be well.

 

Is it possible you gently nudged a cable while working inside the server? Nudging cables can cause them to lose connection, consistently or intermittently. Often the boot is fine, and then they drop during a parity check or while being written to. It is actually better if you knock the cable hard enough that it just loses connection and is offline at boot.

 

There is a setting in unRAID that will cause the array NOT to start automatically. This gives you an opportunity to investigate drive connection issues before the array starts. I have this set myself, because I always want to make sure everything looks good before the array starts. And since I only reboot once in a great while, it is not an inconvenience. Last night I installed a new controller and the first few boots I had some issues, and unRAID told me drives were missing but there was no harm. I would very much suggest enabling this setting if it is not already.

 

I would suggest shutting down and inspecting the cabling, especially to these two drives. A drive literally dying like a light bulb (blink, and it's dead) is beyond rare. I don't think I have seen it happen in 10 years on the forum, and I don't want to say how many years I've been tinkering inside computers. They just don't fail like that. When the computer reports a drive missing, it almost always means an issue in the connection chain: a broken connector on the drive, a loose cable connection (by far the most common), a bad SATA cable (second place), a broken controller, etc. Occasionally it is not the SATA cable but the power connection that is affected. This is most common with power splitters, but it can also happen with a nudged power cable. The PSU can also have issues, for example not being powerful enough. It happens, but it's not common.

 

If you have one drive that drops and you start the array anyway, unRAID officially accepts that the drive is missing, and simply fixing the connection issue won't bring it back. Often users will fix a cabling problem, but when the drive doesn't magically reappear in the array, they think they didn't fix anything, when in fact the drive is now alive and well and waiting for them to do something to get it back into the array. There is a way to rebuild a single dropped disk, and there is even a way to slip one or more disks back into the array and tell unRAID they are valid (even if they may not really be, since they were outside the array for a time), but you have to know how. It is not automatic.


In your current state, with no parity and a failed disk4, the data on disk4 is at risk. With single parity, unRAID can handle one failed disk but not two. It is good that parity is one of the affected disks, because if both disks really have failed, you only lose one drive's worth of data. If both failed disks were data disks, parity would be useless and you'd lose two data disks' worth of data. I don't want to scare you, but that's the situation. However, I think it very likely that the data on disk4 is fine. It just got kicked from the array, and you'll be able to resolve this with no or minimal loss. But be very attentive and follow the instructions provided. A mistake at this point could seal the deal and cause you to lose data.
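Single parity in unRAID works on the XOR principle: each parity bit is the XOR of the corresponding bits on all data disks. Here is a simplified sketch of why one missing disk can be rebuilt but two cannot, with a couple of bytes standing in for whole disks. The function names and disk contents are hypothetical, and this is an illustration of the arithmetic, not unRAID's actual code.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (the single-parity operation)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(out):
            raise ValueError("all blocks must be the same length")
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def rebuild_missing(surviving: list[bytes]) -> bytes:
    """Recover the single missing member by XOR-ing everything that is left.

    'surviving' may include the parity block. Because XOR is its own
    inverse, this only works when exactly one member is missing.
    """
    return xor_blocks(surviving)


# Three tiny hypothetical "data disks".
disk1 = b"\x01\x02"
disk2 = b"\x10\x20"
disk3 = b"\xff\x00"
parity = xor_blocks([disk1, disk2, disk3])

# One data disk lost: parity plus the remaining data disks rebuild it.
assert rebuild_missing([disk1, disk3, parity]) == disk2

# Parity lost (the situation in this thread): it is recomputed from the
# data disks, which only gives correct parity if every data disk is present.
assert rebuild_missing([disk1, disk2, disk3]) == parity

# Two data disks lost: what's left only yields disk2 XOR disk3 combined,
# which is not enough to separate either disk.
combined = rebuild_missing([disk1, parity])
assert combined == xor_blocks([disk2, disk3])
assert combined != disk2 and combined != disk3
```

The same arithmetic is why rebuilding parity while a data disk is missing is a bad idea: the new parity would simply leave that disk's data out.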

 

I have not reviewed your logs in detail. You are in excellent hands with trurl. But wanted to provide this background info to assist you in understanding.


Sorry, I see I've made it confusing. Disk4 was missing because I took it out. I took it out because the server won't boot with it in. After I got the server to boot, I connected it via an external USB enclosure. That's why it suddenly showed up as present.

 

Now I've put it back in, using a known working bay. See the picture I took: the bootup stops there with the drive installed. The only way to boot is to remove that one drive.

IMG_20170505_135809.jpg

14 minutes ago, bjp999 said:

There is a setting in unRAID that will cause the array NOT to start automatically. This gives you an opportunity to investigate drive connection issues before the array starts. I have this set myself, because I always want to make sure everything looks good before the array starts.

 

FWIW, the array won't start with any missing array disk even if auto-start is enabled. I do like to keep it disabled on any server with a cache pool, because the array will start with missing cache pool members, and that can create problems with your pool. For users with no cache or a single-device cache, there's no problem leaving auto-start enabled.

2 minutes ago, johnnie.black said:

 

FWIW, the array won't start with any missing array disk even if auto-start is enabled. I do like to keep it disabled on any server with a cache pool, because the array will start with missing cache pool members, and that can create problems with your pool. For users with no cache or a single-device cache, there's no problem leaving auto-start enabled.

 

Thanks for keeping me honest. I thought that if only one drive was down, unRAID would still start the array. Maybe that was how it worked back in the day. I still like being the one deciding whether to start the array. After a dirty shutdown (e.g., a power failure), I believe the server reboots and kicks off a parity check (which is now non-correcting). I guess there's no harm, but I'd rather be in control.


Another thing worth mentioning: I also tried installing the newest version of unRAID from scratch. I got the same result, so I reformatted the USB stick and copied my old config back onto it. Since the extra crap on my flash drive was brought up, I thought it was important to say that I also tried a clean install and still get the issue above; I can't boot at all with that HDD installed. Also, thanks everyone for your help!!! Sounds like after this I need to buy 2 new drives for parity.

2 minutes ago, pinion said:

I also tried with a clean install and I still get the above issue and can't boot at all with that hdd installed.

I was going to bring this up again since flash problems are probably the most common reason for boot failure. Glad you took care of this.

 

3 hours ago, trurl said:

Are you sure you don't have a power problem?

But I'm still wondering about this. When it is in an external enclosure, it isn't using the server PSU.

 

As you have been moving it around, have you also been using different power cables?

 

What is the exact model of your power supply?


There are 2 redundant Ablecom power supplies, model PWS-902-1R. They feed the backplane for the 26 bays. As you can see above, I'm using just 9 of the 26. I've moved that disk around a few times with the same results. I noticed the screen did progress, so I'll post that picture too in case something jumps out. But this was about half an hour after powering on, and it still hadn't finished booting.

 

I'm going to format and put the latest unraid on again or maybe use a fresh USB stick. 

IMG_20170505_142303.jpg


Any idea why I'd have a ton of FSCK*.REC files that are 8 KB each? Also, I tried hooking that HDD directly to the MoBo, and all the system does is show a blinking cursor. I moved that drive to the bay closest to the power supplies and got the same thing as the last screen I posted. I'm backing up the thumb drive now and will try it with a clean version of unRAID.

31 minutes ago, pinion said:

I tried hooking that hdd directly to the MoBo and all the system does is flash a blinking cursor at me.

Often the BIOS will try to "help" you by trying to boot from a new disk. Did you make sure it was booting from flash?

18 minutes ago, trurl said:

Often the BIOS will try to "help" you by trying to boot from a new disk. Did you make sure it was booting from flash?

Well, I feel like an idiot. That's what had happened. And it actually booted! Attached are the diagnostics, although since I can see the bad drive in the drop-down, I'm not sure you'll see much in them. But who knows. Now I'm going to put my files back on the flash, try to boot, and see what happens. Maybe run to Fry's...

tower-diagnostics-20170505-1255.zip

44 minutes ago, trurl said:

Often the BIOS will try to "help" you by trying to boot from a new disk. Did you make sure it was booting from flash?

OK, I sort of merged a clean 6.3.3 with my old one and got rid of the /extra and /custom directories so it wouldn't install all the extra crap. I'm booted back up, and the array automatically started without parity. The disk that had been giving me problems shows 1 read error. I attached the diagnostics, the SMART report for that drive (I ran a quick SMART test and it said it had errors), and a screenshot of what I see now.

 

I've shut it down... I don't know where to go from here. But I probably need to go buy a hard drive or two... right?

tower-diagnostics-20170505-1322.zip

tower-smart-20170505-1319.zip

Screen Shot 2017-05-05 at 4.21.34 PM.png

4 minutes ago, trurl said:

Parity looks OK; it just needs to be rebuilt. Disk4 has 2 pending sectors, but I would wait until after the parity rebuild to consider what to do about that.

Ok, when I get back home I'll turn the server back on and start a parity rebuild.


Archived

This topic is now archived and is closed to further replies.
