disk became unmountable and I lost 2 TB worth of data?



Is there anything I can do?

 

I was browsing my NAS when disk 2 suddenly disappeared. I rebooted, and the NAS came up showing disk 2 as unmountable, with a prompt to format to continue. I formatted it (probably a mistake; I assumed it would rebuild from my parity) and got nothing back, no more data...

 

The data was already gone before I formatted. I checked: 2500 movies were down to 1500...

 

Is it hosed and how the heck did this happen? I am on 6.2 beta 21.

 

My questions:

 

1. I am sure selecting Format was a bad choice, but why did the server reboot and come back "healthy" and online when it was clearly missing 2 TB of data?

2. Is there anything that can be done? Can I manually access the parity? Can the data be rebuilt from parity, or did I ruin that the minute I formatted the drive? I don't think the format caused the loss, since as I said the data was GONE before I hit Format.

 

I wish I knew what was even on that drive... this is frustrating beyond belief and making me extremely worried to even turn on the server and lose more drives...

Link to comment

in my system log:

 

May  5 23:13:29 Hades kernel: XFS (md2): Internal error XFS_WANT_CORRUPTED_GOTO at line 3156 of file fs/xfs/libxfs/xfs_btree.c.  Caller xfs_free_ag_extent+0x419/0x558
May  5 23:13:29 Hades kernel: CPU: 2 PID: 6733 Comm: mount Not tainted 4.4.6-unRAID #1
May  5 23:13:29 Hades kernel: Call Trace:
May  5 23:13:29 Hades kernel: XFS (md2): Internal error xfs_trans_cancel at line 990 of file fs/xfs/xfs_trans.c.  Caller xlog_recover_process_efi+0x148/0x155
May  5 23:13:29 Hades kernel: CPU: 2 PID: 6733 Comm: mount Not tainted 4.4.6-unRAID #1
May  5 23:13:29 Hades kernel: Call Trace:
May  5 23:13:29 Hades kernel: XFS (md2): xfs_log_force: error -5 returned.
May  5 23:13:29 Hades emhttp: mount error: No file system (32)

hades-syslog-20160505-2336.zip

Link to comment

You definitely were wrong to select Format as that is an instruction for unRAID to create an empty file system on the disk. At that point parity is updated to reflect the disk having an empty file system.

 

A disk suddenly coming up unmountable normally means some sort of file system corruption has occurred on the disk and the correct way forward is to put the array into Maintenance mode and run the repair tool appropriate to the format in use.  Often the disk becoming unmountable will be accompanied by the disk being disabled (marked with a red cross) due to a write having failed (which is why corruption has occurred).
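As a rough sketch of that advice, the helper below only prints a read-only check command for a given file system type, so nothing touches the disk until you have read what the tool reports. The function name `check_cmd` is made up for illustration, and `/dev/md2` is an assumption based on the `XFS (md2)` lines in the syslog above. The array is assumed to be started in Maintenance mode before any of the printed commands are run.

```shell
#!/bin/bash
# Print (do not run) a read-only check command for one array disk.
# check_cmd is a hypothetical helper name; the device path is an assumption.
check_cmd() {
    local fs="$1" dev="$2"
    case "$fs" in
        xfs)      echo "xfs_repair -n $dev" ;;           # -n: no-modify mode, report only
        reiserfs) echo "reiserfsck --check $dev" ;;      # --check: consistency check only
        btrfs)    echo "btrfs check --readonly $dev" ;;  # --readonly: never write to the device
        *)        echo "unknown file system: $fs" >&2; return 1 ;;
    esac
}

check_cmd xfs /dev/md2
```

Only once the read-only pass has been reviewed would you move on to an actual repair, and never to Format.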

 

As for recovering data after a mistaken format: if the disk happens to be in reiserfs format, there is an excellent chance the recovery tool can retrieve most of the files.  I do not believe the tools for other formats are as good, so you will have to wait until someone else chimes in with suggestions for those.  I think I have seen posts suggesting that for XFS there may be a tool that can be used under Windows to recover files in such a scenario.  No idea about Btrfs.

Link to comment

Ya, the format was dumb, but if the disk was dead, why didn't my parity kick in? I've lost disks before; this one was frustrating beyond belief.

Link to comment

The file system on the drive likely became corrupt. This can happen due to faulty RAM or other faulty hardware, and in some cases it just happens. ECC RAM can usually prevent the memory-related cases, but most people don't use it.

 

Parity cannot repair a file system, and that is why it didn't attempt a rebuild. unRAID couldn't see a valid partition on that drive. Your data was likely 100% recoverable by repairing the file system. However, the second you hit Format, your data went poof and parity was immediately updated to reflect this. If this happens again, you should do a file system repair on the drive. NEVER format a drive with data on it. In the future I suggest using a program that lists out every single file on the server, on a per-drive basis, in a single txt file. If you run it regularly, you will know exactly what you lost. Sorry this happened; sometimes we learn things the hard way.

 

On a side note, 1000 movies on a 2TB drive? That's only 2GB a movie? Perhaps use this as an excuse to upgrade to better quality... 2TB drive only stores about 50 movies for me (35GB per movie). I guess that's the pro of using full quality BD rips, you lose a heck of a lot less content. :P

Link to comment

Am I to assume I should not use this drive anymore? Or should it be safe, and it was just that partition that became corrupt for some reason?

 

1000 was a bit high: I lost about 1000 movies and TV episodes combined, about 440 of them movies. But yes, I am in the process of getting better quality stuff now. I have had my collection since about 2003, so I still have a lot of DVD rips.

 

I don't care about losing it, I care more about having no idea what was lost minus doing some fancy shit with Plex DBs to try to figure it out, oh and recovering tv shows sucks with Usenet.... :)

 

As a side note to the unRAID devs: when something like this happens, might I recommend at least showing a warning for dumb people like myself that formatting the drive will also nuke everything parity holds for that drive. I knew that somewhere in the back of my brain, but typically when I have had drive failures the system acted accordingly and I knew what to do. This threw me for a loop and I acted quickly, not smartly.

Link to comment

It might also be a good idea to consider keeping a second copy of your data somewhere, even if it is on external USB hard drives. Just because you have a NAS with redundant drives doesn't mean your data is always safe. Also, did you ever consider setting up a second parity drive, which is possible in the 6.2 beta? I don't know if it would have made a difference in your situation or not, but it might be worth considering if you have the space.

Link to comment

I do... or well, I have everything uber important to me backed up, i.e. photos, home videos, music. I also have all the important stuff backed up to CrashPlan. Unfortunately CrashPlan is a memory hog, and I was selective about what I backed up, since backing up 15TB to the cloud required about 16GB of RAM... So again, photos, music and home videos only.

 

I am dumb, just not completely stupid hahaha. Movies and TV Shows are a pain to recover but not an end of the world scenario. I just think there should be better wording on the format button especially in a catastrophic situation like that. Thankfully I do backup stuff that matters to me beyond unRAID, but still, not a good scenario.

 

Also, a second parity drive is planned (like you said, probably not helpful in this scenario though). I was just going to wait until 6.2 is official before setting it up.

Link to comment

Can this be correlated back to a specific drive, or is it completely unrelated? Nothing is having an issue on my system, but I just saw this pop up and I am wondering if my disk 2 is just cooked.

 

May  6 23:15:49 Hades kernel: ata1.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen
May  6 23:15:49 Hades kernel: ata1.00: irq_stat 0x08000000, interface fatal error
May  6 23:15:49 Hades kernel: ata1: SError: { UnrecovData HostInt 10B8B BadCRC }
May  6 23:15:49 Hades kernel: ata1.00: failed command: READ DMA EXT
May  6 23:15:49 Hades kernel: ata1.00: cmd 25/00:40:e0:be:5c/00:05:8c:01:00/e0 tag 10 dma 688128 in
May  6 23:15:49 Hades kernel:        res 50/00:00:df:be:5c/00:00:8c:01:00/ec Emask 0x50 (ATA bus error)
May  6 23:15:49 Hades kernel: ata1.00: status: { DRDY }
May  6 23:15:49 Hades kernel: ata1: hard resetting link
May  6 23:15:49 Hades kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May  6 23:15:49 Hades kernel: ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
May  6 23:15:49 Hades kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
May  6 23:15:49 Hades kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
May  6 23:15:49 Hades kernel: ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
May  6 23:15:49 Hades kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
May  6 23:15:49 Hades kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
May  6 23:15:49 Hades kernel: ata1.00: configured for UDMA/133
May  6 23:15:49 Hades kernel: ata1: EH complete

Link to comment

On the unRAID Main screen, there's a very tiny picture of a hard drive immediately to the left of the drive model # / serial #.  Clicking it will bring up all the ata messages for that particular drive.  Keep clicking until you find the appropriate drive.
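If you would rather do the same lookup from a shell, here is a minimal sketch. It assumes a Linux kernel whose sysfs path for each SATA disk contains the port name (`ata1`, `ata2`, ...), which is usually but not always the case, e.g. not for USB disks. `ata_port_of` is a made-up helper name.

```shell
#!/bin/bash
# Extract the ataN port name from a resolved sysfs block-device path.
# ata_port_of is a hypothetical helper; it prints nothing if no port is found.
ata_port_of() {
    # $1: a path such as the output of: readlink -f /sys/block/sdb
    echo "$1" | grep -oE 'ata[0-9]+' | head -n 1
}

# List every sdX device next to the ata port found in its sysfs path.
for blk in /sys/block/sd*; do
    [ -e "$blk" ] || continue    # no sd* devices: the glob stays literal
    echo "$(basename "$blk") $(ata_port_of "$(readlink -f "$blk")")"
done
```

Matching the `ata1` from the syslog against this list should point at one sdX device, which you can then match to a disk by serial number on the Main screen.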
Link to comment

Thanks! Oddly enough it's not the failed drive but the drive I am doing the Dolphin copy from right now. It's also my only reiserfs drive left (if that matters). I didn't have the storage until recently to convert that drive over to XFS like everything else. Could this be a serious issue, or just something that went funky when I tried to do a copy? Originally in Dolphin I tried browsing through the Samba share and copying that way, but it failed for me (I didn't see the root icon at first). Maybe that was the corresponding error?

Link to comment

Nah, nothing to do with the fs.  Could simply be a slightly loose power cable, or the power dropped a hair too low when the drive spun up, or there was a bunch of noise on the SATA cable, or the cable is not making great contact (since it is a CRC error).  If it continues throughout the transfer, then it's definitely something to check out.

 

Basically, the drive didn't respond correctly when it spun up, so the controller told the drive to reset itself
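To see whether it keeps happening during the transfer, one option is to count the `failed command` lines per ata device in the syslog. This is only a sketch: `count_ata_failures` is a made-up name, the pattern is based on the log lines posted above, and `/var/log/syslog` is an assumed location.

```shell
#!/bin/bash
# Count "failed command" lines per ata device in a syslog read from stdin.
# Output: one line per device, "<device> <count>".
count_ata_failures() {
    grep -E 'kernel: ata[0-9]+\.[0-9]+: failed command' |
        sed -E 's/.*kernel: (ata[0-9]+\.[0-9]+): failed command.*/\1/' |
        sort | uniq -c | awk '{print $2, $1}'
}

# Assumed log location; adjust if your syslog lives elsewhere.
if [ -r /var/log/syslog ]; then
    count_ata_failures < /var/log/syslog
fi
```

A count that stays at one is probably a one-off; a count that climbs while the copy runs is the cue to reseat or swap the cables.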

Link to comment

Since you have only posted a syslog and not the full diagnostics zip (as you should always do in V6 instead of syslog) we don't have much information about the health of any of your drives.

 

And no, dual parity would not have made any difference since both parities would have been written to stay in sync with the empty filesystem you told it to write.

Link to comment

In the future I suggest using a program that lists out every single file on the server, on a per-drive basis, in a single txt file.

Any recommendations of program to do this on unraid?

 

ls -altR /mnt/disk*  > /mnt/cache/filelists

Is there no program? Would be easier.

 

You could turn this into a script yourself and schedule it via cron.
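For anyone who wants a starting point, here is a minimal sketch of such a script. It writes one dated listing per disk instead of a single file. `list_disks` and both directory arguments are assumptions for illustration, and the destination is treated as a directory, so pick a name that does not clash with an existing file.

```shell
#!/bin/bash
# Write one dated file listing per data disk.
# list_disks is a hypothetical helper: $1 = parent of the diskN mounts,
# $2 = directory that receives the listings.
list_disks() {
    local src="$1" dest="$2" stamp disk
    stamp="$(date +%Y%m%d)"
    mkdir -p "$dest" || return 1
    for disk in "$src"/disk*; do
        [ -d "$disk" ] || continue    # skip if no disks matched
        ls -altR "$disk" > "$dest/$(basename "$disk")-$stamp.txt"
    done
}

# Run only when both directories are given on the command line.
if [ "$#" -ge 2 ]; then
    list_disks "$1" "$2"
fi
```

Saved as, say, `filelists.sh` (a made-up name and location), a standard crontab entry such as `0 3 * * * /path/to/filelists.sh /mnt /mnt/cache/disk-filelists` would run it nightly at 03:00. How to make a cron entry survive a reboot on unRAID is not covered here.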

 

Link to comment

BRiT, I am not THAT pro  :o hehe

Link to comment

Sorry wickedathletes, I don't want to hijack your thread, but I think I have an issue similar to yours...

 

I've just posted about it at https://lime-technology.com/forum/index.php?topic=49067.0 and I don't want to make the same "mistake" you made. I would like to fix the issue before formatting my two unmountable drives.

 

For some reason I probably got an error like the one Squid explains below:

 

Nah, nothing to do with the fs.  Could simply be a slightly loose power cable, or the power dropped a hair too low when the drive spun up, or there was a bunch of noise on the SATA cable, or the cable is not making great contact (since it is a CRC error).  If it continues throughout the transfer, then it's definitely something to check out.

 

Basically, the drive didn't respond correctly when it spun up, so the controller told the drive to reset itself

 

And I don't know how to fix this. All my drives are in reiserfs and I can't use any reiserfs -- command..

 

Link to comment
