
What should I expect if I pull the power on a disk - My system is busted


vw-kombi


I am testing Unraid as a replacement for my ReadyNAS.

Nothing live as yet so I can play without risk.

 

I have it all built and working as I wish, so I thought I would try to 'break' it in possible failure scenarios.

 

I have 5 drives: 3 x 3TB, 1 x 4TB and 1 x 4TB parity.

I pulled the power out of the 4TB data drive

Waited, no message

Tried to write a text file to that drive and then got a message saying it is offline/error etc.

Re-connected the power - it did not recover

Shutdown after a while and restarted - drive still in faulty state.

The status says 'unmountable: No file system'.

So I ticked the box and clicked Format. It said formatting, then said unmountable again. I seemed to be stuck in this loop.

So I stopped the array, which does not seem to give me any more options.

I started the array again - same issue   

So I powered off the server, powered it back on, and was asked to format again. I did this

This time the disk said 'device is disabled, contents emulated' (from the parity, I know).

The dashboard still shows a cross on the disk, marking it as faulty.

In the Main tab, I don't have a format option anymore.

 

So what do I do now? Firstly, I did not expect the disk to fail like that and need a format, and now that has been done, I expect it to rebuild from the parity.

 

I am used to my ReadyNAS RAID 5 - just add a disk, and off it goes and rebuilds the array.

 

Edit - did some googling; it seems I have to manually remove, reboot and re-add the drive (with the array stopped) - rebuilding now.

I also found an option to use if you just had a cable issue - which I guess I should have found BEFORE I formatted the disk as it was asking me to do.

 

 

 

 

 

 

Link to comment

Well, the rebuild said it had 60+ days to complete!

Turns out another disk - disk 2 - now has a shedload of CRC read errors, which I guess are causing the issues with the parity rebuild - meaning it will never effectively finish.

I guess this is why some people have two parity disks!!!!!!

I pulled disk 2 also - and the array will not start - too many missing disks.

 

So much for this disk failure test - off to buy a new drive tomorrow and will have to rebuild the entire Unraid system again... only 3 days left on my Unraid eval too...

 

I guess 'don't break what is not broken' comes to mind.

 

Link to comment

Better now than later.

 

1. Unraid uses ALL the disks for protection, not just the parity disks. If you have an "unused" disk sitting in the parity protected array and it fails, you are unprotected until either that disk is rebuilt or you rebuild parity without it. Dual parity extends that to 2 disks, but the principle remains. Bottom line, ALL disks in the parity protected array should be tested fully and have no errors before they are trusted in the array.
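As a rough illustration of the point above, here is a minimal sketch of single parity using plain XOR over tiny made-up "disks". The disk names, the 4-byte contents and the `rebuild` helper are all hypothetical, and this is a toy model of the idea, not Unraid's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# Hypothetical 4-byte "disks"; the same idea applies sector by sector.
data_disks = {
    "disk1": b"\x10\x20\x30\x40",
    "disk2": b"\x01\x02\x03\x04",
    "disk3": b"\x00\x00\x00\x00",  # an "unused" disk still takes part in parity
}
parity = xor_blocks(list(data_disks.values()))


def rebuild(missing, readable, parity, disk_names):
    """Rebuild one missing disk from parity plus EVERY other data disk."""
    others = [name for name in disk_names if name != missing]
    unreadable = [name for name in others if name not in readable]
    if unreadable:
        raise RuntimeError(f"cannot rebuild {missing}: {unreadable} unreadable")
    return xor_blocks([parity] + [readable[name] for name in others])


names = list(data_disks)

# One disk lost: the other data disks plus parity are enough.
survivors = {n: d for n, d in data_disks.items() if n != "disk1"}
recovered = rebuild("disk1", survivors, parity, names)

# A second, "unused" disk also unreadable: single parity cannot cope.
try:
    rebuild("disk1", {"disk2": data_disks["disk2"]}, parity, names)
    second_failure_ok = True
except RuntimeError:
    second_failure_ok = False
```

The second call fails even though `disk3` held nothing but zeros, which is why an untested "unused" disk is still a risk to the array.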

 

2. Unraid, or ANY RAID or NAS system, is NOT a backup just because it can tolerate a drive loss. There are so many more ways to lose data. If you value your data, it needs to be backed up somewhere else; don't just rely on a disk rebuild. File system corruption is a separate issue from disk failure. Usually when a drive fails, Unraid will keep the file system intact and allow a full rebuild. But, as you found out, if you have a dodgy drive that hasn't fully failed yet, Unraid may not be able to rebuild another failed drive.

 

Notifications when SMART statistics change, and regular parity checks, can help you proactively weed out drives that are close to failure but haven't actually dropped out yet. You should strive to keep all drives in the array in perfect health, and remove any that show signs of failure. If you don't, the odd drive that fails without warning will catch you out.
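To make the idea of "notify when SMART statistics change" concrete, here is a small sketch that compares two snapshots of raw attribute values and reports any that went up. The watch list and the sample numbers are assumptions for illustration only; attribute names vary by drive vendor, and this is not how Unraid implements its notifications.

```python
# Attributes where any increase is worth a look (an assumed watch list).
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "UDMA_CRC_Error_Count",
)


def changed_attributes(previous, current, watched=WATCHED):
    """Return {attribute: (old, new)} for watched raw values that increased."""
    changes = {}
    for name in watched:
        old = previous.get(name, 0)
        new = current.get(name, 0)
        if new > old:
            changes[name] = (old, new)
    return changes


# Hypothetical snapshots taken a day apart for one drive.
yesterday = {
    "Reallocated_Sector_Ct": 0,
    "Current_Pending_Sector": 0,
    "UDMA_CRC_Error_Count": 3,
}
today = {
    "Reallocated_Sector_Ct": 0,
    "Current_Pending_Sector": 0,
    "UDMA_CRC_Error_Count": 57,
}

alerts = changed_attributes(yesterday, today)
for name, (old, new) in alerts.items():
    print(f"{name} rose from {old} to {new}")
```

A jump in the CRC counter like the one in this made-up sample would be a prompt to check cabling before trusting that disk during a rebuild.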

Link to comment

When you perform your next test, if/when you get the "Unmountable. No file system" message and are then prompted with the Format button, look and see if there is a warning icon next to the Format button and click on it.

 

You should be prompted with a message that Formatting a disk will lose all data on the disk and is not something you should do when attempting to recover a disk .... this is all from memory as it is some time since I last formatted a disk in my unRaid server.   If things have changed since I last formatted a disk, hopefully someone more knowledgeable about the current state will correct anything I have stated wrong.

 

Link to comment
5 hours ago, vw-kombi said:

disk 2 - now has a shedload of CRC read errors

CRC errors are usually indicative of poor cabling / loose connections, not usually a problem with the drive. It is quite possible that when you yanked the cable on the drive to remove it, you disturbed another drive's cabling at the same time.

 

9 hours ago, vw-kombi said:

need a format, and now that has been done, I expect it to rebuild from the parity.

Because you formatted the emulated disk, you lost the files on it. Formatting on unRaid is the same as formatting on any OS: it wipes the drive. Rebuilding from parity will result in an empty drive. Unmountable means that, because you pulled the power in the middle of a write, you caught it at exactly the wrong time, when the system was updating the file structures / metadata. You needed to run the file system checks on the drive to recover and then rebuild.
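A toy model may help show why the rebuild came back empty. It assumes single XOR parity over two made-up 8-byte data disks, and treats "format" as simply writing an empty-file-system marker to the emulated disk; the names and contents are hypothetical and this is not Unraid's actual implementation.

```python
def xor(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def emulated(parity, other_disk):
    """Contents of the disabled disk as reconstructed from parity."""
    return xor(parity, other_disk)


disk1 = b"FILES..."   # contents of the disk that later gets disabled
disk2 = b"other123"   # the healthy data disk
parity = xor(disk1, disk2)

# Disk 1 is disabled: reads are served from parity + disk 2.
before_format = emulated(parity, disk2)

# "Format" is a write to the emulated disk, so parity is updated to match.
EMPTY_FS = b"EMPTY_FS"
parity = xor(EMPTY_FS, disk2)

# The rebuild copies whatever the emulated disk now holds onto the new disk.
rebuilt = emulated(parity, disk2)
print(before_format, rebuilt)
```

Before the format, the emulated disk still held the original files; after it, parity faithfully describes an empty file system, and that is what gets rebuilt.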

 

BTW, a UPS is always recommended on any server 

Link to comment
6 minutes ago, Squid said:

Physically, hot plug is supported, but you still have to stop the array to switch out disks and start the rebuild process, so it's not a true hot plug.

Right, I meant not performing a hot plug while the array is started.

Link to comment

Cheers, thanks for all the replies. A few answers to these posts, now that the array is back and the rebuild has finished:

 

As someone stated, because I formatted the drive, I have no data on it after the parity rebuild. That was not obvious in the interface - I guess that is what the tick box is for - a warning to read all the doco before you do that.

 

I have an onsite and offsite backup of the data stored on my NAS, and this Unraid server will have the same once rolled out live.

 

I have a 1400VA UPS for it - 6% utilised in normal operation (42 W), 9% while it was doing all that parity stuff.

 

I reconnected the cable to drive 2 after a power off, and there were no more CRC errors last night, which allowed the parity rebuild to complete.

 

I will doco the exact process for when a drive fails next - NEVER FORMAT - even though it asks for one - I will look out for this warning when I do this next test - which I will have to do now as this was all very scary - 'You should be prompted with a message that Formatting a disk will lose all data on the disk and is not something you should do when attempting to recover a disk .... this is all from memory as it is some time since I last formatted a disk in my unRaid server.   If things have changed since I last formatted a disk, hopefully someone more knowledgeable about the current state will correct anything I have stated wrong.'

 

No hot plug on Unraid... well, that is a throwback to the 80s!

 

 

 

Link to comment

OK - I re-did my tests using the process below. Just asking whether there is any scenario in this test that would not have required a parity rebuild. It was just a cable pull after all.

Thanks.

 

Pulled cable

Write file to disk – disk showing as faulty

Shutdown array

Re-connect disk

Reboot

Drive has X – 'device disabled, contents emulated' is still shown

Stopped array

Unassign the disk

Start array

Stop array

Reassign the disk

Start array -> a rebuild will commence

 

 

Link to comment

No. The drive failed because a write to it failed. Therefore the contents of the physical drive are out of sync with the emulated drive. A rebuild is the only way to get them back in sync without suffering any data loss.


Link to comment

Thanks Squid - so if I had not done the write test, and instead just reconnected the cable, would all be well and good?

Or should I have pulled the cable, shut down the array, connected the cable, restarted the array, and all good?

 

Simulating a cable issue on an unused disk - as in one that was not being written to...

 

One last question - following on from this logic, does Unraid ONLY report a failed drive when something is written to it?

 

 

Link to comment
7 hours ago, vw-kombi said:

Thanks Squid - so if I had not done the write test, and instead just reconnected the cable, would all be well and good?

Or should I have pulled the cable, shut down the array, connected the cable, restarted the array, and all good?

 

Simulating a cable issue on an unused disk - as in one that was not being written to...

 

One last question - following on from this logic, does Unraid ONLY report a failed drive when something is written to it?

Here is the progression that unraid follows to deal with a drive that is not responding.

1. A read is requested from the drive, and an error is returned from the controller

2. Unraid spins up the rest of the drives, computes the value of the data that should occupy that position from all the rest of the drives

3. That data is sent to the drive. If the write fails, the drive is red-balled, and all further drive activity is emulated by the rest of the drives. If the write succeeds, the error counter for that drive is incremented and Unraid continues normally.
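The three steps above can be sketched as a small simulation. This is only a toy model under stated assumptions: single XOR parity, one block per disk, and a hypothetical `Disk` class and `handle_read` function invented for illustration. It is not Unraid's code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class Disk:
    """Toy disk holding one block; read_ok/write_ok simulate faults."""

    def __init__(self, data, read_ok=True, write_ok=True):
        self.data = data
        self.read_ok = read_ok
        self.write_ok = write_ok
        self.errors = 0        # per-disk error counter
        self.disabled = False  # True once the disk is "red-balled"

    def read(self):
        if not self.read_ok:
            raise OSError("read failed")
        return self.data

    def write(self, data):
        if not self.write_ok:
            raise OSError("write failed")
        self.data = data


def handle_read(target, others, parity):
    """Serve a read of `target`, following steps 1-3."""
    if not target.disabled:
        try:
            return target.read()          # step 1: try the disk itself
        except OSError:
            pass                          # the controller returned an error
    # Step 2: compute the block from parity and all the other disks.
    rebuilt = xor_blocks([parity] + [d.read() for d in others])
    if not target.disabled:
        try:
            target.write(rebuilt)         # step 3: write the block back
            target.errors += 1            # write worked: count the error
        except OSError:
            target.disabled = True        # write failed: disable the disk
    return rebuilt


good = Disk(b"\x0f\xf0")
pulled = Disk(b"\xaa\x55", read_ok=False, write_ok=False)  # cable pulled
parity = xor_blocks([b"\x0f\xf0", b"\xaa\x55"])

data = handle_read(pulled, [good], parity)
print(data == b"\xaa\x55", pulled.disabled)
```

In this model a disk with its cable pulled fails both the read and the write-back, so a single read is enough to disable it, while a disk with a one-off read error only has its error counter bumped.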

 

So, to answer your question, pulling the cable and reading from the drive is enough to fail the drive. Once the drive is failed, any activity destined for that drive is instead done on the emulated drive, including any file system maintenance when unmounting and shutting down. There is no scenario where Unraid will "unfail" a drive without a full rebuild, because the data on the emulated disk is different than what is on the failed disk. Remember, not all disk activity is file writes; there are housekeeping tasks that can occur when mounting and unmounting the drive, emulated or physical.

 

Unraid fails a drive when a write fails. BUT since unraid will try to reconstruct a read and write it back to the drive, a read from a non-responsive drive will fail it as well.

 

Unraid also monitors the SMART attributes of the drives, and will report health based on some of those attributes. However, it never uses the health report as a basis to fail the drive, only failed writes.

Link to comment
On 7/5/2018 at 4:52 PM, Benson said:

It seems unRAID has never said it supports hot-plugging disks. When I first tried unRAID, I also did many tests, but not hot-plug, so my results were completely different from your findings.

 

Hot-plug can mean lots of things. It's the drives + controller cards that need to handle electrical hot-plug.

 

The next thing is whether you really want a system to perform automatic recovery or not. Automatic recovery is great after a trivial error. But if the error wasn't trivial, then an automatic recovery may destroy the work-around options an intelligent user might have based on the very specific situation that user is in. So it's normally good to have systems that don't try to instantly recover from a failure but wait and give the user a chance to decide what route to take.

 

Rebuilding from parity requires that the working disks really are working and don't give read errors.

So having a drive giving CRC errors indicates that the user should look at the cabling - or possibly a different controller card port - before trying a rebuild.

This wouldn't be an option if the system decided to auto-repair.

 

It's a bit interesting with the word "formatting". Lots of users think of it as a great thing to do, because it sounds like a great method to create order. And it does. But while an empty shelf is nice and tidy, it is often better to have a shelf filled with lots of nice items :)

Link to comment

Archived

This topic is now archived and is closed to further replies.
