Possible Fatal Issue: 1 Drive Failed, 1 Drive Disabled


big.burlinski

Hello:

 

This is my first actual post, even though I have been an unRAID user since around 2006. I now have two unRAID servers, and I am having problems with my older, original one (Tower; I'm not too creative, lol). Both servers run unRAID 6.3.5.

Tower has seven 2 TB data drives and a 2 TB parity drive, from various manufacturers. I am familiar with replacing drives and maintaining the server, but I am sometimes a little lazy and frugal.

Array usage is 11.4 TB of 14 TB.

Here is my current situation (and I apologize for not having captured syslogs before I rebooted a few times):

 

Disk 5 (2 TB Seagate) has dropped out and been disabled maybe 2 or 3 times over the last 2 years. Until now I thought it was one of those "reseat the cables" problems: it comes back, then I rebuild it.

2 or 3 days ago, Disk 7 (2 TB WDC EAR) got disabled, and it is also a cabling-type issue (disks 5, 6, and 7 sit in a somewhat half-assed drive bay, with a rigged-up cooling fan, in the top slots usually used for optical drives; the case is an Antec 300). When the debacle started, Disk 5 was green and not having any issues.

 

In the process of powering down the server to troubleshoot, it appears Disk 5 has now failed completely, so now I have Disk 5 missing and Disk 7 disabled.

I have removed both drives and tested them over USB on my desktop. Disk 5 is a hard failure (it basically won't initialize). Disk 7 spins up and its data can be read on my desktop (Linux, thankfully). Disk 5 is ReiserFS and Disk 7 is XFS, if that matters.

 

Now to what I would like to do. I am retired and have plenty of time on my hands, but I would also like to downsize. I have a bunch of crap on this server that I could probably do without; with all the streaming services available, I really don't need to hoard this many movies and TV shows. However, I would like to preserve some "favorites" and my personal data, some of which is backed up, but not all. I am thinking of slowly and methodically retiring Tower and migrating the "good stuff" over to Tower2 over the next month or two, so Tower really only needs to "limp" along for that long.

So the data on Disk 7 is retrievable, but Disk 5 is "gone". I need help figuring out whether it would be possible to "trust" the data on Disk 7, since it really is "there", and emulate the data on Disk 5. I should mention that I have not started the array since this began with the 2 failed disks, so no parity checks have run. I tried reading the "trust my array" procedure, but I'm not sure what will happen with 2 failed disks, and I do know the golden rule that unRAID can really only rebuild one drive. There were obviously some minimal writes to the array, but given both the amount and the nature of that data, I can live without 100% accuracy. The very first thing I would do is get whatever I can off of Disk 5 and then remove it from the array, then do the same thing with Disk 7.

 

Any help or suggestions would be appreciated.

 

Thanks... Harry

 

Link to comment

Do you know if disk7 had anything written to it since it dropped offline, or has this disk been only read for a while?

 

To resolve this, here's what you'd need to do:

 

1 - Get or find a replacement for disk5. It should be the same size as the old disk5.

2 - Do a New Config and, in the slot where disk5 had been, assign the new disk.

3 - Start the array, trusting parity.

4 - Immediately stop the array. Remove the replacement disk5 from slot 5, leaving the slot empty.

5 - Start the array. (unRAID will now start and emulate disk5.)

6 - Look at disk5 and see if it looks generally good. Hopefully it will. But if disk7 got updates while it was being emulated, disk5 might appear unformatted. There is still hope, though, especially since you are likely using the RFS file system, which is best for recovering data in situations like this.

7 - Either way, stop the array, assign a new disk to slot 5 (it doesn't have to be the same one you used in steps 1 and 2), and restart the array.

8 - unRAID will rebuild disk5.
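Why step 3 trusts parity instead of letting unRAID rebuild it can be seen in a toy model. This is a minimal sketch, assuming single parity behaves as a plain byte-wise XOR across the data disks; the three tiny "disks", the blank placeholder and the helper names `xor_parity` and `emulate` are all made up for illustration.

```python
from functools import reduce


def xor_parity(disks):
    """Byte-wise XOR of equally sized disk images (single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def emulate(parity, others):
    """Reconstruct a missing disk as parity XOR every remaining data disk."""
    return xor_parity([parity, *others])


# Hypothetical toy array: three 3-byte data disks; disk5 is the one that died.
disk4 = b"\x10\x20\x30"
disk5 = b"\xaa\xbb\xcc"
disk6 = b"\x01\x02\x03"
old_parity = xor_parity([disk4, disk5, disk6])

# Steps 1-2: a blank placeholder stands in for disk5 in the New Config.
placeholder = bytes(3)

# Step 3 (trust parity): parity is left untouched, so disk5 stays recoverable.
assert emulate(old_parity, [disk4, disk6]) == disk5

# Had parity been rebuilt instead, it would be computed from the placeholder,
# and emulating disk5 would just return the placeholder's (blank) contents.
synced_parity = xor_parity([disk4, placeholder, disk6])
print(emulate(synced_parity, [disk4, disk6]) == disk5)  # False
```

In this model the placeholder's contents never enter the recovery as long as parity is trusted and nothing is written, which is the point of steps 3 to 5.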

 

Good luck!

 

I won't lecture you, as you probably know this already. But it is a good idea to rotate in new drives. If you're going to push the lifespan, I'd strongly recommend dual parity, which would be saving your butt right now if you had it. :)
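To see why a second parity drive would help here, below is a sketch of RAID-6-style P+Q parity, which can rebuild any two missing data disks. It assumes the Linux md RAID-6 construction (Q computed over GF(2^8) with polynomial 0x11D and generator 2); whether unRAID's parity2 uses exactly this math is an assumption, and all function names are hypothetical. Real implementations use lookup tables rather than these slow loops.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) reduced by x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Inverse of a non-zero element: the multiplicative group has order 255."""
    return gf_pow(a, 254)


def syndromes(disks):
    """Return (P, Q): P is plain XOR, Q weights disk i by 2**i in GF(2^8)."""
    size = len(disks[0])
    p, q = bytearray(size), bytearray(size)
    for i, disk in enumerate(disks):
        coef = gf_pow(2, i)
        for k, byte in enumerate(disk):
            p[k] ^= byte
            q[k] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def recover_two(disks, x, y, p, q):
    """Rebuild data disks x and y (x != y, both None in `disks`) from P and Q."""
    size = len(p)
    # Treat the lost disks as all-zero so they contribute nothing to P or Q.
    survivors = [bytes(size) if i in (x, y) else d for i, d in enumerate(disks)]
    p_rest, q_rest = syndromes(survivors)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(), bytearray()
    for k in range(size):
        pxy = p[k] ^ p_rest[k]  # D_x xor D_y
        qxy = q[k] ^ q_rest[k]  # g^x*D_x xor g^y*D_y
        bx = gf_mul(qxy ^ gf_mul(gy, pxy), inv)
        dx.append(bx)
        dy.append(pxy ^ bx)
    return bytes(dx), bytes(dy)


# Hypothetical 7-disk array; disks 5 and 7 (indices 4 and 6) both fail.
data = [f"disk{n}".encode() for n in range(1, 8)]
p, q = syndromes(data)
damaged = list(data)
damaged[4] = damaged[6] = None
print(recover_two(damaged, 4, 6, p, q))  # (b'disk5', b'disk7')
```

With only P (single parity), the same situation leaves just D5 XOR D7, which is not enough to separate the two disks.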

Link to comment

Looks like you understand unRAID well enough to not make the mistake of trying to wing it without advice. SSD has given you the steps, but I thought I would add some other advice.

 

Since you are a long time user, but not very active on the forum, it might be that you have missed some things about V6.

 

Syslogs - we don't usually want them. They are included in the diagnostics zip file, which you can get by going to Tools - Diagnostics. We would much rather have the diagnostics zip, preferably from before a reboot if possible. It might even be good to post it now, since it also includes SMART reports for all disks, if you want us to take a look.

 

Notifications - if you are going to set it and forget it, it is critical that you set up notifications so unRAID can notify you when you begin to have problems. Fixing a problem before you have another one can be the difference between losing data and saving it.

Link to comment

Thanks Trurl, I appreciate it. I did really mean the diagnostics zip file. 

 

I am going to throw one more thing out there; feel free to shoot it down. :D I do know about trying to do too many things at once, believe me.

 

Any possibility of manually getting what I want off of disk 7 external to the unRAID server, then preclearing that and using it to replace disk 5?

Basically following your procedure but not putting disk 7 into the new config? 

Link to comment
4 minutes ago, big.burlinski said:

(...)

Any possibility of manually getting what I want off of disk 7 external to the unRAID server, then preclearing that and using it to replace disk 5?

Basically following your procedure but not putting disk 7 into the new config? 

Parity plus all other disks are required to rebuild a missing disk. Without disk7 you can't rebuild disk5.
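That point can be checked with a small model of single parity: one unknown disk is solvable from parity and the rest, but two are not, because parity alone only yields the XOR of the two missing disks. A sketch assuming XOR parity; the slot numbers mirror the thread, but the contents and the helper names are invented.

```python
from functools import reduce
from typing import Dict, Optional


def xor_all(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(parity: bytes, disks: Dict[int, Optional[bytes]]) -> bytes:
    """Rebuild the one missing data disk (value None) from parity plus all others."""
    missing = [slot for slot, content in disks.items() if content is None]
    if len(missing) != 1:
        raise ValueError(f"single parity rebuilds exactly one disk; missing: {missing}")
    present = [content for content in disks.values() if content is not None]
    return xor_all([parity, *present])


# Invented contents for three of the array's data slots.
disks = {5: b"\x12\x34", 6: b"\x0f\x0f", 7: b"\xa0\x05"}
parity = xor_all(disks.values())

# Disk5 missing, disk6 and disk7 present: the rebuild works.
print(rebuild(parity, {5: None, 6: disks[6], 7: disks[7]}) == disks[5])  # True

# Disk7 withheld as well: parity XOR disk6 only gives disk5 XOR disk7.
try:
    rebuild(parity, {5: None, 6: disks[6], 7: None})
except ValueError as err:
    print(err)  # single parity rebuilds exactly one disk; missing: [5, 7]
```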

Link to comment
33 minutes ago, big.burlinski said:

Hey, thanks for the reply! I didn't want to necessarily purchase any more drives (I don't have any more 2TB drives around I know I know lol), but that is sounding like the best option unless I just want to completely lose disk 5. They're not really that expensive. Thanks !

 

If you don't want to replace disk5, there is another option. 

 

After step5 in my instructions, disk5 is being emulated. At this point, anything you wanted to salvage from disk5 can be copied to another disk (in the array or over your network to a workstation).

 

You DO need a 2TB drive for use in steps 1 and 2. Often a failing disk is alive enough to be recognized by the BIOS and serve that purpose.

 

13 minutes ago, big.burlinski said:

Any possibility of manually getting what I want off of disk 7 external to the unRAID server, then preclearing that and using it to replace disk 5?

Basically following your procedure but not putting disk 7 into the new config? 

 

You can do this in another way. You can delete / move the unneeded data from disk7 (or a combination of disks), and then copy the data you want to keep from emulated disk5 to disk7 (or combination of disks). Once everything on disk5 you want to salvage is complete, you can do a new config, redefine the disks as you like (disk7 can become disk5), and rebuild parity.
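Why that last step rebuilds parity, and why renumbering the disks is harmless, can be shown with the same kind of toy XOR model (a sketch with invented contents, assuming single parity). Dropping the dead disk5 from the config changes what parity should be, while merely moving disks between slots does not, since XOR ignores order. A RAID-6-style second parity weights each disk by its slot, so this shortcut would not carry over to dual parity.

```python
from functools import reduce


def xor_parity(disks):
    """Single parity: byte-wise XOR of equally sized disk images."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


# Invented contents; old disk5 is the dead, emulated disk.
before = {1: b"\x01\x02", 5: b"\xf0\x0f", 7: b"\x33\x44"}
parity = xor_parity(before.values())

# New Config: old disk5 dropped, disk7 reassigned to slot 5.
after = {1: before[1], 5: before[7]}
print(parity == xor_parity(after.values()))  # False -> parity must be rebuilt

# Shuffling the same disks between slots, with nothing removed, keeps XOR parity valid.
shuffled = {1: before[7], 5: before[1], 7: before[5]}
print(parity == xor_parity(shuffled.values()))  # True
```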

Link to comment

Yeah, I know it's a risk running in emulated mode, especially with another questionable disk, but given the circumstances I think I will try that. If I do buy drives in the future, they will be 3 or 4 TB for Tower2. (It presently has only a 3 TB parity drive, so that's the max without replacing the parity drive.)

 

One thing I want to clarify: regardless of what I do, in steps 1 and 2 I need a physical drive there, correct? It can't be "missing", correct?

Link to comment

I recommend copying to another system if you want to follow SSD's idea of copying off the emulated disk5, and possibly disk7; that way you won't be writing to an unprotected array. It may not be any slower copying over the network than writing to the array anyway. Either way, all disks will have to spin to read the emulated disk5, but only disk7 will need to spin to read disk7. But any writes to the array will still update parity, and the parity update will also require all disks to spin.
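The spin-up point for reads can be illustrated with a toy array that records which devices each read touches: a normal read needs only the disk itself, while an emulated read needs parity and every other data disk. A sketch assuming simple XOR parity; `ToyArray` and its contents are hypothetical, and writes are not modelled.

```python
from functools import reduce


def xor_all(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class ToyArray:
    """Single-parity toy array that records which devices each read touches."""

    def __init__(self, data):
        # The missing disk's bytes are kept only so parity can be computed up front.
        self.data = dict(data)
        self.parity = xor_all(self.data.values())
        self.missing = set()
        self.touched = set()

    def read(self, slot):
        if slot not in self.missing:
            # Normal read: only the disk itself has to spin.
            self.touched.add(f"disk{slot}")
            return self.data[slot]
        # Emulated read: parity plus every other data disk has to spin.
        self.touched.add("parity")
        blocks = [self.parity]
        for other, content in self.data.items():
            if other != slot:
                self.touched.add(f"disk{other}")
                blocks.append(content)
        return xor_all(blocks)


array = ToyArray({n: bytes([n] * 4) for n in range(1, 8)})
array.missing.add(5)  # disk5 dead and emulated

array.read(7)
print(sorted(array.touched))  # ['disk7']

array.touched.clear()
assert array.read(5) == bytes([5] * 4)
print(sorted(array.touched))
# ['disk1', 'disk2', 'disk3', 'disk4', 'disk6', 'disk7', 'parity']
```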

 

The New Config must include a disk5. Hopefully the old disk5 will work well enough for unRAID to see that it exists.

 

The less risky procedure is going to be to rebuild disk5 to a new disk and proceed with the downsizing and other changes after everything is stable again.

 

 

Link to comment
16 minutes ago, big.burlinski said:

Yeah I know it's a risk running in emulated mode especially with another questionable disk, but given the circumstances, I think I will try that and if I do buy drives in the future it will be 3 or 4 TB for Tower2 . (Presently only 3TB parity so that's the max without replacing the parity drive)

 

One thing I want to clarify, regardless of what I do, in steps 1 and 2, I need a physical drive there, correct? It can't be "missing" correct?

 

Yes - unRAID won't allow you to assign a slot to something that is not a drive. And it has to be the right size. Like I said - the failed drive should work. If you were using 6.4, you could put any disk that is the right size and unRAID would not touch it. But with 6.3.5, unRAID will partition the disk and render its contents unusable if it hadn't been already partitioned by unRAID (you can recover if that happens, but I'd find it hard to do if it were me!)

Link to comment

I'm going to buy one 2TB drive 9_9. The current disk5 only shows "missing" in unRAID, with the s/n from the config, so I do need to put something in there to emulate it. I guess I could actually rebuild it now, so for the time being the plan is to follow your entire procedure. Thanks again! Eventually I will have a few old 2TB drives for somewhat questionable backup lol

Link to comment
19 minutes ago, big.burlinski said:

I'm going to buy one 2TB drive 9_9. The current disk5 will only show "missing" in unRAID and it shows the s/n from the config. So I do need to put something in there to emulate it and I guess I could actually rebuild it now so for the time being the current plan is to follow your entire procedure. thanks again! Eventually I will have a few old  2TB drives for somewhat questionable backup lol

 

Before you do what I suggested, make a backup of your config directory. That will allow you to get back to this point in time. That reminds me, if you have a USB backup, that would also be usable at this time to recover your configuration.
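A sketch of that backup step, in case you'd rather script it than copy by hand. It assumes the flash drive is mounted at `/boot` (so the config directory is `/boot/config`) and that Python is available where it runs, which stock unRAID 6.3.5 may not provide; a plain recursive `cp` from the console does the same job. The function name and the destination path are hypothetical; preferably point the destination somewhere other than the troubled array.

```python
import shutil
import time
from pathlib import Path


def backup_config(src, dest_root):
    """Copy the flash config directory into a new timestamped folder under dest_root."""
    source = Path(src)
    if not source.is_dir():
        raise FileNotFoundError(f"config directory not found: {source}")
    target = Path(dest_root) / time.strftime("config-%Y%m%d-%H%M%S")
    # copytree refuses to write into an existing folder, so an older backup is never clobbered.
    shutil.copytree(source, target)
    # Sanity check: same number of files in the copy as in the source.
    copied = sum(1 for p in target.rglob("*") if p.is_file())
    expected = sum(1 for p in source.rglob("*") if p.is_file())
    if copied != expected:
        raise RuntimeError(f"copied {copied} files, expected {expected}")
    return target


# Example (hypothetical destination path):
# backup_config("/boot/config", "/path/to/backup")
```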

 

You only see "missing" because you haven't done a New Config yet.

 

To see if the dead disk is recognized, boot the server, go to a new slot, and click the little dropdown arrow. If you see the disk5 there, my approach should work for you.
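From the console, a rough equivalent of that dropdown check is to look for the drive's serial number under `/dev/disk/by-id`, where udev creates one entry per detected disk (plus `-part1`-style entries for its partitions). A sketch; the function name and the serial below are placeholders, so substitute the s/n unRAID shows for the missing disk5. If nothing matches, the controller most likely isn't seeing the drive at all.

```python
from pathlib import Path


def find_by_serial(serial, by_id_dir="/dev/disk/by-id"):
    """List whole-disk entries in by_id_dir whose names contain `serial`."""
    root = Path(by_id_dir)
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if serial in entry.name and "-part" not in entry.name
    )


# Placeholder serial: replace with the s/n shown for the missing disk5.
matches = find_by_serial("SERIAL-OF-OLD-DISK5")
print(matches or "not detected")
```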

Link to comment
2 minutes ago, SSD said:

 

(...)
 

You would only see the missing if you had not done a new config.

 

To see if the dead disk is recognized, boot the server, go to a new slot, and click the little dropdown arrow. If you see the disk5 there, my approach should work for you.

Oh I see. I didn't realize it would show "missing" until I do a New Config. I'll try that now, thanks!

Link to comment

There is another way to accomplish this.  Of course first make a backup of your usb flash device, at least the contents of 'config' directory.

Go ahead and yank disk5 out.

Next, use New Config tool to initialize a new config, but preserve all the assignments.  If you don't preserve, that's ok, just assign all the slots the way they were and leave disk5 unassigned.

Now go back to Main and leave the browser on that page.

Open a terminal window (webTerminal, telnet, ssh, or console) and type this command:

mdcmd set invalidslot 5 29

Next, DO NOT check the 'trust parity' box; just immediately click the Start button on the page in the browser. It's important that you don't 'refresh' the Main page after typing that command, and that you don't install or remove a physical device, or else it will undo the effect of the command. What should happen, in your case, is that the array will start and disk5 will be emulated.

 

What this does is tell the md/unraid driver which 2 device slots start out "invalid". In a normal new array case, the two numbers are 0 and 29 (0 is the parity slot, 29 is the parity2 slot). The above command tells the driver that disk5 and disk29 (the parity2 slot) are invalid but all the others are valid.
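The rule described above can be written down as a toy model. This is not the md driver's code, only a sketch of the logic as described, with hypothetical names: parity slots are 0 and 29, each invalid data slot needs one valid parity device, an invalid data slot with no disk assigned is emulated (one with a disk assigned is rebuilt), and an invalid parity slot with a disk assigned gets a parity sync.

```python
PARITY_SLOTS = {0, 29}  # 0 = parity, 29 = parity2, as described above


def plan_start(assigned, invalid):
    """Toy model of what starting the array does with each invalid slot."""
    valid_parity = [s for s in PARITY_SLOTS if s in assigned and s not in invalid]
    invalid_data = sorted(s for s in invalid if s not in PARITY_SLOTS)
    # Every invalid data slot has to be reconstructed from one valid parity device.
    if len(invalid_data) > len(valid_parity):
        raise ValueError("more invalid data slots than valid parity devices")
    return {
        "parity_sync": sorted(s for s in invalid & PARITY_SLOTS if s in assigned),
        "rebuild": [s for s in invalid_data if s in assigned],
        "emulate": [s for s in invalid_data if s not in assigned],
    }


# Normal New Config, single parity: parity is synced from the data disks.
print(plan_start(assigned={0, 1, 2, 3, 4, 5, 6, 7}, invalid={0, 29}))
# {'parity_sync': [0], 'rebuild': [], 'emulate': []}

# This thread: disk5 yanked, "mdcmd set invalidslot 5 29", single parity.
print(plan_start(assigned={0, 1, 2, 3, 4, 6, 7}, invalid={5, 29}))
# {'parity_sync': [], 'rebuild': [], 'emulate': [5]}
```

In this model, marking slot 29 invalid costs nothing on a single-parity array because nothing is assigned there, while slot 0 stays valid and is what disk5 gets emulated from.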

Link to comment
9 minutes ago, limetech said:

There is another way to accomplish this.  Of course first make a backup of your usb flash device, at least the contents of 'config' directory.

(...)

Hey, thanks! I am going to try this when I am sufficiently caffeinated. I would love not to buy another 2TB drive. Thanks again!

Link to comment

I tried this, and the array is started. I don't think it worked: Disk 5 now has a red X and says Not Installed, and also Unmountable.

I do not see the message that says something like "drive disabled, content emulated".

In the lower section it says Unmountable Disk Present: disk 5, with the format option.

However, I do not have a system share for Disk5, and that's the part I am not sure about, whether I should or not. Believe it or not, I do not really know of any specific files that were definitely on disk5 at this point.

 

Thanks

Link to comment
21 minutes ago, big.burlinski said:

I tried this, and the array is started. I do not think it worked, Disk 5 now has Red X and says Not Installed, also Unmountable. 

 

(...)

 

What version unRAID OS are you running?

Link to comment

What this means is the disk is being emulated, but the part of the emulated disk that contains the formatting details is corrupted. 

 

This is potentially explainable by writes to disk7 while it was being emulated. Each of those writes would now be corrupting parts of the emulation of disk5. (If you understand why, you understand unRaid pretty well.)
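The "why" can be reproduced with a toy XOR model. While disk7 was disabled, writes aimed at it only changed parity, so the physical disk7 kept its old bytes. Once that stale disk7 is trusted back in and disk5 is emulated as parity XOR the other disks, every byte that changed on the emulated disk7 shows up as damage in the emulated disk5. A sketch with invented contents, assuming single XOR parity.

```python
from functools import reduce


def xor_all(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Invented 4-byte contents.
disk5 = b"RFS!"
disk6 = b"\x00\x11\x22\x33"
disk7 = b"XFS."
parity = xor_all([disk5, disk6, disk7])

# Disk7 is disabled; a write to it only updates parity (new = old ^ old_d7 ^ new_d7),
# while the physical disk7 keeps its old contents.
written_disk7 = b"XFS2"
parity = xor_all([parity, disk7, written_disk7])
stale_disk7 = disk7

# Disk5 dies; the stale disk7 is trusted and disk5 is emulated from the rest.
emulated_disk5 = xor_all([parity, disk6, stale_disk7])
print(emulated_disk5)  # b'RFS=' : only the byte written to disk7 is wrong
```

If the damaged bytes happen to fall in ReiserFS metadata, the emulated disk would show as unmountable, which is consistent with what is being reported here.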

 

File system repair on emulated disk is possible, and may or may not provide usable results. Often it is partially successful. Will be interesting to see if Tom has any other ideas on how to proceed.

 

Can you confirm the file system is RFS? RFS has issues, but one of its strengths is the ability to do these types of repairs better than other file systems.

 

Stay tuned...

Link to comment
1 minute ago, SSD said:

(...)

Can you confirm the file system is RFS? RFS has issues, but one of its strengths is the ability to do these types of repairs better than other file systems.

Yes, Disk5 is RFS (ReiserFS). Don't forget that it is not even installed anymore; the last set of instructions said I could just yank it, and it's not being seen by the system anyway.

Link to comment

Archived

This topic is now archived and is closed to further replies.
