Unformatted disk after data rebuild



OK, maybe we will talk about parity a "bit".

 

Parity works at the bit level. You probably know about bits, the ones and zeros that everything in computers and data is made of. Your disks are just a bunch of bits. Every bit on the disk is either a one or a zero. How those ones and zeros are interpreted as folders, files, etc. is done by the software.

 

The concept of parity is used in many places in computers and communication. It is always about the bits. Parity is simply an extra bit that allows a missing bit to be calculated. It isn't magic, and it isn't even very complicated. Here is an explanation of how Unraid uses a specific bit from the parity disk plus that same bit from each of the other disks to determine the value of that bit for a missing disk:

 

https://wiki.unraid.net/UnRAID_6/Overview#Parity-Protected_Array
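If it helps to see the idea in code, here is a minimal Python sketch of single-bit parity using XOR. It is only an illustration of the concept described above, not Unraid's actual code; the function names and the four example bits are made up for the demonstration.

```python
from functools import reduce
from operator import xor

def parity_bit(bits):
    """Parity for one bit position: the XOR of that bit from every data disk."""
    return reduce(xor, bits, 0)

def missing_bit(parity, surviving_bits):
    """Recover the bit of the missing disk from parity plus the surviving disks."""
    return reduce(xor, surviving_bits, parity)

# The same bit position on four hypothetical data disks.
data_bits = [1, 0, 1, 1]
p = parity_bit(data_bits)

# Pretend the third disk (index 2) has gone missing.
surviving = data_bits[:2] + data_bits[3:]
recovered = missing_bit(p, surviving)
```

Here `recovered` comes out equal to `data_bits[2]`, and the same calculation is simply repeated for every bit position on the disk.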

 

 

 

 

 

 

Link to comment

The way all those bits get interpreted as folders and files is what the filesystem is all about. The filesystem is the metadata, the data about the data. The filesystem tells which bits on the disk make up a file and how those files are organized into folders and subfolders.

 

Your Unraid V5 uses a filesystem called ReiserFS. One of the reasons I suggested copying your files to a new V6 server is because that would be an easy way to convert all your files to one of the newer V6 filesystems. We won't worry about that for now obviously.

 

Unraid parity is realtime. Any time any data disk in the array is written, parity is updated immediately. It is very important to recognize that, except for a simple read operation, everything else involves writing the disk, and so updating parity.

 

Everything but a read is a write operation that updates parity. When you write a file obviously that is a write operation. Copying a file is also obviously a write operation.

 

When you delete a file, the metadata of the filesystem is written to mark the file as no longer in any folder and the space for the file is available to be reused. So deleting a file is also a write operation that updates parity.

 

When you move a file to another disk, both disks are written. The file is written to the destination disk, and then the file is deleted from the source disk. And as already mentioned, that delete is a write operation. So both disks are written, and both writes update parity.
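To make "every write updates parity" concrete, here is a small Python sketch. It assumes a generic XOR read/modify/write update (new parity = old parity XOR old data XOR new data), which is the textbook technique and not necessarily how Unraid implements it internally. The disk contents and function names are made up.

```python
def xor_bytes(a, b):
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

def full_parity(disks):
    """Parity computed from scratch across all data disks."""
    result = bytes(len(disks[0]))
    for block in disks:
        result = xor_bytes(result, block)
    return result

def write_block(disks, parity, disk_index, new_data):
    """Write new_data to one disk and return the updated parity block."""
    old_data = disks[disk_index]
    disks[disk_index] = new_data
    # Flip exactly the parity bits where the old and new data differ.
    return xor_bytes(parity, xor_bytes(old_data, new_data))

disks = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = full_parity(disks)

# Any write at all (a copy, a delete rewriting metadata, one half of a move).
parity = write_block(disks, parity, 1, b"\x00\x00")
parity_still_valid = parity == full_parity(disks)
```

Because the update goes through `write_block`, `parity_still_valid` stays true after the write, whatever the reason for the write was.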

 

Now, here is the part that trips up a lot of people, because they have only a vague idea about what "format" means. As I mentioned, "format" has a very specific meaning.

 

"Format" means "write an empty filesystem to this disk". That is what it has always meant in every operating system you have ever used. That empty filesystem is simply the metadata that represents an empty top level folder ready to have folders and files added to it.

 

But note that I said "write". Format is a write operation, which updates parity.

 

If you format a disk in the parity array, parity is updated and so agrees that the disk has an empty filesystem. If you rebuild a disk after formatting it, you will get an empty filesystem. This is why I stressed that format is NEVER part of the rebuild process.
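Here is a toy Python sketch of why that happens. It assumes plain XOR parity and uses a block of zero bytes as a made-up stand-in for "an empty filesystem"; real filesystem metadata looks nothing like this, but the parity arithmetic is the same.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR any number of equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

EMPTY_FS = b"\x00\x00\x00\x00"  # hypothetical stand-in for an empty filesystem

disks = [b"FILE", b"DATA", b"MORE"]
parity = xor_blocks(disks)

# Case 1: disk 1 fails and is rebuilt from parity plus the other disks.
rebuilt = xor_blocks([parity, disks[0], disks[2]])

# Case 2: disk 1 is formatted while in the array, so parity follows the write.
formatted = [disks[0], EMPTY_FS, disks[2]]
parity_after_format = xor_blocks(formatted)
rebuilt_after_format = xor_blocks([parity_after_format, formatted[0], formatted[2]])
```

In the first case `rebuilt` is the original `b"DATA"`. In the second case `rebuilt_after_format` is just `EMPTY_FS`, because parity already agreed that the disk was empty.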

 

 

Link to comment

Repairing a filesystem is also a write operation. It writes corrections to the filesystem metadata.

 

But if you are working at the command line, there are 2 different ways to refer to the filesystem.

 

You can repair the partition on the sd device, which is what you did way back in the first post. Doing it this way leaves parity out of it and so parity becomes invalid when you take this approach.

 

Or you can repair the md device, which includes parity when writing those corrections, and so parity is maintained.
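The difference between the two can be sketched in Python like this. `ToyArray` and its method names are made up for illustration and only model the idea under the assumption of simple XOR parity: a write that goes around the array leaves parity stale, while a write that goes through it keeps parity in sync.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR any number of equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

class ToyArray:
    """A toy parity-protected array (hypothetical, for illustration only)."""

    def __init__(self, disks):
        self.disks = list(disks)
        self.parity = xor_blocks(self.disks)

    def write_raw(self, index, data):
        """Like writing to the sd partition: parity never sees the change."""
        self.disks[index] = data

    def write_via_array(self, index, data):
        """Like writing to the md device: parity is updated with the write."""
        self.disks[index] = data
        self.parity = xor_blocks(self.disks)

    def parity_valid(self):
        return self.parity == xor_blocks(self.disks)

array = ToyArray([b"\x01\x02", b"\x03\x04", b"\x05\x06"])

array.write_via_array(0, b"\xff\x02")   # a "repair" through the md device
valid_after_md_write = array.parity_valid()

array.write_raw(1, b"\x00\x04")         # a "repair" directly on the sd partition
valid_after_sd_write = array.parity_valid()
```

After the write through the array, parity is still valid. After the raw write it is not, which is the situation a repair on the sd device leaves you in.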

 

Link to comment

OK, I believe I followed that excellent explanation. Now I still don't know how to get myself out of the mess I created, but hopefully you have an answer for that as well. I was chasing a random red ball disk error and thought Unraid functioned differently. So if the repair command hadn't made any changes (and logically I should have run it read-only), would the re-assigned disk10 have been OK? It probably doesn't matter ultimately, but I'm learning tons about how this works. Do you think the original disk10 file bits are still recoverable? Thanks for all your help with my chaos! I felt like Walter White chasing the fly on this one... so I was trying crazy things, in retrospect. I don't have to try hard to sound like an idiot.

 

I'm super curious to see what disk13 contains, and how my 2 spare disks fit into the complete data set. I probably don't deserve to get lucky and lose nothing.

Edited by dogfluffy
Link to comment

It isn't clear which disk you tried to repair. And you yourself said you weren't sure if you had tried to repair disk10. One thing I think we can say for sure is you weren't repairing the emulated disk13, since you weren't working with the md device. Could be you were repairing the actual disk13, or disk10, or something else. It is always a good idea to capture the results of the filesystem repair for further review.

 

I think we will just have to find out what happens when we get to that next step.

 

I think there were some disks that you currently have not installed in the array, including the original disk13, and those might also be readable and provide some recovered files. The reason we wanted to rebuild disk13 to a new disk instead of to the same disk is because we wanted to keep the original in a safe place to give us another chance at getting its files if something about the rebuild and repair of disk13 didn't work out.

 

How is the rebuild progressing?

Link to comment

OK, the rebuild is complete. It looks like I have disk13 recovered, but there may be corruption: I can't view disk13, it just hangs and eventually times out. Also I have a lost+found of 4.26 TB on disk9. Disk10 is unformatted and disk11 is 3 TB blank. It also appears my shares are broken. Quite a mess all in all. Hmm, yes, now the console is fast-scrolling inode locked errors over and over. Should I stop the array and shut it down?

Screenshot 2019-01-31 at 5.45.18 AM.png

Edited by dogfluffy
Link to comment

^^^As johnnie said.

 

I still like this idea:

On 1/29/2019 at 5:41 PM, trurl said:

Since you let this go for another year and a half before getting back on it, maybe you aren't in any hurry to get this done.

 

The simplest and best long-term solution in my opinion would be to simply set this aside, build a new server with new disks and the new Unraid, then try to copy the data from these old disks to the new server.

Maybe a more expensive project than you have in mind, but also maybe less trouble than sorting this out, and better long-term as I said.

 

Or somewhat similar but less expensive (and less future-proof), you could

On 8/30/2017 at 4:44 PM, trurl said:

New Config with only the known (?) good (?) disks, rebuild parity, then see what if anything can be read from the assortment of other disks you have laying around.

It may be that you can't really get your array stable with all of those disks in it because of the disks themselves, or other hardware issues.

 

One of our great disadvantages, apart from the confusion, is a lack of information. New versions of Unraid have a simple way for you to download a zip file that contains a lot of information that would take you a lot of time and effort to gather on your own.

 

But, before making any further recommendations, I think maybe we better make you take that time and effort. Starting with syslog and SMART for each of your disks as explained in this "sticky" thread pinned near the top of this same subforum:

 

https://forums.unraid.net/topic/9277-how-to-report-a-defect-and-capture-syslog-and-smart-reports/

 

Link to comment

I can't really do anything because the console is spamming the error. I tried to stop the array so I could shut down and bring it up clean to pull your logs, but it is hung, and the bottom of the Unraid webpage keeps scrolling an error about unmounting share(s) and retrying the unmount....

 

It's kind of locked up. Also, does unMENU have a syslog tool? I've only been too busy to actually get to this, but I'm here trying to recover my data, so yes, it is important to me.

Link to comment

OK I ran it through a clean cycle of memtestx86 just to be sure and it passed.

 

I keep getting messages about this as it boots up:

ATA6 link is slow, waiting and resetting

Now I'm at the console login screen. I can reboot and try to copy it verbatim if this isn't already captured. How should I proceed?

Screenshot 2019-01-31 at 7.31.34 AM.png

Edited by dogfluffy
Link to comment
48 minutes ago, trurl said:

before making any further recommendations, I think maybe we better make you take that time and effort. Starting with syslog and SMART for each of your disks as explained in this "sticky" thread pinned near the top of this same subforum:

 

https://forums.unraid.net/topic/9277-how-to-report-a-defect-and-capture-syslog-and-smart-reports/

 

28 minutes ago, trurl said:

When you get the syslog and SMART for each disk zip them all up into one file and attach it to your next post.

 

Link to comment

I am not even going to look at those. Please follow instructions.

1 hour ago, trurl said:

When you get the syslog and SMART for each disk zip them all up into one file and attach it to your next post.

And all of the files I have asked for are plain text files. I only want plain text files, not a .gdoc (whatever that is) or anything else. We should not have to install any additional software on our computer just so we can help you.

 

Don't post anything else until you have collected syslog and SMART for ALL disks currently installed, as PLAIN TEXT files, and zip them all up into a single ZIP file.
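If you would rather script the zipping step from another computer, here is a minimal Python sketch. It assumes you have already saved the syslog and each SMART report as `.txt` files in one folder; the function name and the folder and file names are made up for the example, and it does not collect the reports for you.

```python
import zipfile
from pathlib import Path

def bundle_reports(report_dir, zip_path):
    """Zip every .txt file in report_dir into one archive; return the names added."""
    report_dir = Path(report_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(report_dir.glob("*.txt")):
            # Store only the file name so the zip has no nested folders.
            archive.write(path, arcname=path.name)
            names.append(path.name)
    return names
```

For example, `bundle_reports("reports", "diagnostics.zip")` would pack every plain text file in a hypothetical `reports` folder into a single zip and skip anything that is not a `.txt` file.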

 

Please try harder to work with us. We are all just unpaid volunteers here. If you think what I have asked for is a lot of trouble just think about how much trouble it is going to be for us to analyze the information. And all the trouble we have already gone to.

 

 

Link to comment

I apologize for the tone of my earlier post. I'm still looking over the files you gave us.

 

The syslog shows missing disks 8,10 and removed disk 13. I think it considers 13 removed because it is disabled currently.

 

Of course it won't let you start the array from this point.

 

Did you change anything that might have made disks 8,10 missing?

 

 

Link to comment

No, you're fine, it's just text. I didn't know I could do any of that with my Chromebook. I couldn't find a "save as", only "copy to". Anyway, I think I have a failing controller. I can also hear an odd clicking on one of the drives, not like a failure click so much as something polling it, maybe. I was actually just now thinking I should look into another one of the Supermicro PCI controller cards, assuming it's, say, $100-ish and would ultimately solve my problem here. I was just looking over the setup, and it's a nice server board with low power draw, stable, and with ECC RAM, so an add-in controller might bypass the onboard controller/driver issues and provide a clean Unraid V6 upgrade path once this mess is sorted out. When I had previously attempted to upgrade Unraid I was missing 4 disks upon booting up, so I rolled it back to my backup. There are 2 different onboard 4-port SATA controllers.

 

The Supermicro SAS2LP-MV8 is the card I was looking at, for $50 on eBay. I have one already installed.

Edited by dogfluffy
Link to comment

Marvell has been known to give issues for some people in V6. That said, both of my servers have controllers with Marvell 88SX7042 chipset and I have had no issues.

 

All of the disks we have SMART for look OK except disk2 has several pending sectors and disk8 has a few reported uncorrect. These are probably the least of your troubles right now.

 

Are the missing disks 8,10 on the same controller?

 

At this point I think our next step is to get those disks reliably connected and get SMART for them.

 

Link to comment

Yeah, I don't know, but I'm assuming one of the onboard controllers crashed and died during the rebuild last night. I had no errors at around 50%, the last time I checked before it crashed. The console was spamming and locked. I rebooted it and it was missing 4 disks. I'm also thinking it was an intermittently failing controller this whole time. There may also be a failing disk connected to a controller, and that is the clicking I hear. During the rebuild yesterday it was humming along with no noises or errors, so I've been a bit panicked and distracted this morning trying to find out what was going on. I only had to pull a few disks to ID the Western Digital, so the drives aren't labeled, and it's a tight fit to pop the rack out, unlatch the drive rail and pull the drive to read the serial number and label it. I just wanted to get the controller card shipping today if having 2 of those controlling the disks is a viable solution and won't cause a bottleneck or other issue.

Edited by dogfluffy
Link to comment

OK, I managed to get some of the drives identified and also physically ID'd the bad controller. At this point, if I disconnect the 4 drives that are connected to the other onboard controller and connect these last 3 drives, could I then pull the SMART logs without hurting anything or causing any more errors, since the bad controller isn't connected to anything? I connected 8, 10 and 13 to the other onboard SATA controller and rebooted, but they didn't appear in Unraid at all. Any idea what is going on here?

Edited by dogfluffy
Link to comment
