Can I save the data on this drive?


Oddwunn


I have a 23-data-drive, 1-parity-drive server that had been running great for the last 4 or 5 years. A few days ago I upgraded from version 6.7.2 to version 6.8.2. When I rebooted the server to apply the update, I looked at the dashboard and noticed a single SMART error, "UDMA CRC error count 1", on 21 of the 23 data drives, and acknowledged the error on each drive to clear it (21 times, of course). In the web interface I checked my user shares and discovered that one of them, "Games" (spanning disks 1 - 5), was empty, though all of the data appeared intact on the individual disks. I tried rebooting the server a couple of times, and after the second reboot I found disk 3 reading as "not installed" and disabled in the array. Being the total dummy that I am, I immediately replaced the drive and let the software rebuild it. All went well and the new drive looked fine, so I thought the only problem left was to fix the still-empty "Games" user share, figuring that maybe the bad disk had somehow affected it.
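
For anyone curious how to check that counter from the console, something along these lines should show it (the device name sdX is just a placeholder for the actual drive, and I believe smartctl ships with Unraid):

smartctl -A /dev/sdX | grep -i crc    # attribute 199, UDMA_CRC_Error_Count, is the counter the dashboard flags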

 

I finally gave up trying to fix the share and figured that I would just access the "Games" drives one disk at a time until I could research a solution.

 

I started to write some data to disk2 today and found that I could not rename the directory I had just written...hmmm. I copied a couple more directories to disk2 with no issues, and then, for no apparent reason (to me, anyway), the entire contents of disk2 disappeared. I rebooted the server, and the next time it came up disk2 had a red "X" and was listed as "Unmountable: no file system". I then ran xfs_repair (since all of my disks are formatted XFS) from the console using the -nv switches. It took only 4 seconds, but the log it generated was absolutely HUGE. I am guessing that every single file on the disk has been trashed (not by the repair utility, since it was in read-only mode, but by something else unknown to me), so I shut down the server and removed the trashed disk2. I tried to mount disk2 on a Windows 10 machine using a Linux file system utility from Paragon Software, but Windows won't mount the drive either.
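
For reference, the read-only check I ran was something like this (I am assuming disk2 maps to /dev/md2, which I believe is how the array devices are named on my version):

xfs_repair -nv /dev/md2    # -n = no-modify (read-only) check, -v = verbose output, which is why the log was so huge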

 

Is there any way to recover the data on disk2, or is it trashed completely? Would reverting to 6.7.2 while the disk is missing give me the opportunity to replace the disk or otherwise recover the data? Are all of these problems that popped up since upgrading to 6.8.2 just a coincidence, or could the new version be causing problems? (I have another 23-disk server that upgraded with no issues whatsoever, but it does have a slightly newer MB, CPU, and SATA cards.)

 

I have the problem server online and am copying data as fast as I can to another server before more problems pop up. I should be done in about a month.
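
In case anyone is curious, I am pulling the data over with something along these lines, run per share (the hostname and share name here are just placeholders for my setup):

rsync -avh --progress /mnt/user/SomeShare/ root@backupserver:/mnt/user/SomeShare/    # -a preserves permissions and timestamps; -v/-h/--progress just make the output readable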

 

Anyone have any advice as to the easiest way to get out of this dilemma, even if it isn't the cheapest way to get things back to normal?


Ok, as suggested by Johnnie.Black, I performed a filesystem check on the EMULATED disk2, and sure enough, it asked for -L (to clear the log) and then the check ran. The result is that the emulated disk no longer comes up as "Unmountable: no file system", though the red "X" is still there because the disk is not physically installed. Looking at the contents of the disk (it was originally a 4TB disk filled with about 3TB of data), I found roughly 2TB of data that seems to have been recovered and about 1TB of data in the "lost+found" directory, all of it dumped into separate numerical directories with no way for me to know which data goes where.
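
For anyone finding this later, the repair amounted to something like the following from the console (again assuming the emulated disk2 is /dev/md2; the GUI check runs the same tool with whatever options you type into the box):

xfs_repair -v /dev/md2      # first pass refused to run because of a dirty log and asked for -L
xfs_repair -L -v /dev/md2   # -L zeroes the log so the repair can proceed; orphaned files end up in lost+found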

 

Unless anyone has any idea of something else I can do, I think I should just accept that I have lost disk2, and I will research the procedure in the wiki for installing a new disk2 with no data on it, simply formatted and ready to use with new data.

 

My remaining concerns, though, are along these lines:

 

1. It seems to me to be suspiciously coincidental that all of these problems started only when I updated to 6.8.2. (Read my original post to understand what I mean by "all of these problems".)

 

2. Maybe I have another hardware issue that is wreaking havoc on my server?

 

3. Is there any possibility that reverting back to 6.7.2 could clear up any of this mess, or would I be simply adding fuel to the fire?

 

4. I guess parity protection is useless when the problem disk develops a format problem. It would have been nice to be given a heads-up by UnRAID that the disk should be replaced or reformatted, while still protecting the data on it. As it stands now, once the format goes bad, UnRAID thinks that the bad format needs to be protected in addition to the data, or at least that is what it looks like to me.

 

Right now I am scared to use this server at all. Here are the latest diagnostics, in case anyone wants to see them:

tower2-diagnostics-20200222-1140.zip

15 hours ago, Oddwunn said:

3. Is there any possibility that reverting back to 6.7.2 could clear up any of this mess, or would I be simply adding fuel to the fire?

No, and it's extremely unlikely this has anything to do with upgrading Unraid.

 

15 hours ago, Oddwunn said:

4. I guess parity protection is useless when the problem disk develops a format problem

Parity can't help with filesystem corruption. It's also not a backup; it's for redundancy, so you still need backups of any irreplaceable data.

 

If you still have the old disk intact, you can try to mount it with the UD plugin; it will need a new UUID if mounted with the array started. The filesystem there might have less damage, or even no damage at all.
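
If it complains about a duplicate UUID when mounting, something like this should generate a new one (replace sdX1 with the old disk's actual partition; xfs_admin is the stock XFS tool, nothing Unraid-specific):

xfs_admin -U generate /dev/sdX1    # writes a new random UUID so the old disk no longer clashes with the rebuilt disk2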

 

 

 


Ok, thanks again, Johnnie.Black! I will try that. The data I lost simply represented a lot of work, not a lot of irreplaceable data. I am happy and thankful that parity has saved several disks in the past...I just have to remember NEVER to buy another Seagate drive again, as I have about a 90% failure rate with them, while maintaining a 0% failure rate with Hitachi/HGST (out of about 500 drives that I have bought over the years...weird considering that Seagate now owns HGST).

 

Anyway, many thanks for all of your help!


HGST (Hitachi Global Storage Technologies) was a manufacturer of hard disk drives, solid-state drives, and external storage products and services.

It was initially a subsidiary of Hitachi, formed through its acquisition of IBM's disk drive business. It was acquired by Western Digital in 2012. However, until October 2015, it was required to operate autonomously from the remainder of the company due to conditions imposed by Chinese regulators. Chinese regulators later permitted Western Digital to begin wider integration of HGST into its main business. By 2018, the HGST brand had been phased out, with its remaining products now marketed under the Western Digital name. (Wikipedia)


Hmmm.....I thought it was Seagate who bought them. Well, I told ya I am a moron.

 

Ah, crap! The finest consumer-level drives ever made, and soon we won't be able to buy them at all.

 

Edit: I should qualify my statement, as "finest" is too broad a term. I should have said "most dependable" instead, as there have always been better performers by other measures.

 

No matter...I will still stay as far away as possible from Seagate. Virtually every one of the ~75 to 80 Seagate drives that I have bought in the past has failed (most of the time catastrophically) while in use. Seagate is a failure looking for a place to happen. I still have 4 more of them in use, and now that I know HGST drives won't be sold for much longer, I will scoop up as many as I can before they disappear completely.

