Rebuild completed with errors.


 

So I'm trying out Unraid, and one of the random hard drives I had chosen to use failed on me. I grabbed another random drive, checked SMART (healthy, no errors or warnings) and ran an HD Tune surface scan on a Windows machine. The drive seemed to pass, but apparently not. I put it in my Unraid box to try to rebuild, and the drive I chose to rebuild to failed about 30% into the rebuild and died (I'm having terrible luck here). So I bought a brand new drive to replace it, put that in, and the rebuild then completed fine. But now that the rebuild has actually completed, I see this message that the array has errors. And indeed, several of the files I expected to be there just aren't there at all: deleted, poof, missing. So how do I recover from this? How do I repair these errors and restore the files? Does Unraid have any way to do this?

 

To be clear: I had 4 drives in the array. The other 3 drives never failed. The only thing that failed during the rebuild was the replacement drive I was trying to rebuild to. The other 3 drives were still there before, during, and after the rebuild process. I did not lose an additional drive out of the array during the rebuild. Which I could understand would lead to total data loss but that's not what happened. I'm really sad and at a loss now. I was excited to use unraid but if there's no way out of this situation to get my files back then I don't think I'll be able to rely on it if unraid is going to randomly delete files from the array just because the replacement drive we installed dies during the rebuild process. That's a common thing that can happen.

 

EDIT: I do have all of the data stored in this Unraid system backed up on a second machine; most of it was just copied into Unraid while I explore, test, and learn about it. So I haven't actually lost anything. But at the same time... this sucks and means Unraid is unreliable for real usage as a file server, I think? Unless someone can enlighten me about something I'm missing.

image.png

Edited by AquaVixen
Link to comment

RAID controllers are not recommended, and Marvell controllers are not recommended.

 

Disk1 looks like a disk problem. Run an extended SMART test on it.

 

Disk3 isn't reporting SMART, not sure if it is due to controller or bad connections or something else.

 

Can you connect all disks without that Marvell RAID controller?

Link to comment
1 hour ago, trurl said:

RAID controllers are not recommended, and Marvell controllers are not recommended.

 

Disk1 looks like a disk problem. Run an extended SMART test on it.

 

Disk3 isn't reporting SMART, not sure if it is due to controller or bad connections or something else.

 

Can you connect all disks without that Marvell RAID controller?

That's not very helpful... I know Disk1 is a problem. It died during the rebuild. As in completely dead. I already told you that in the first post. Also I'm not sure what "Marvell" controller you are referring to. It's using a HighPoint RocketRAID 2310 adapter (I think it uses a Marvell chipset). It is NOT configured in any sort of RAID mode. I did not even touch the configuration on the thing. I just connected the drives and it exported them fine. Also, no, I cannot connect the disks in this system without it. It only has 3 onboard SATA ports and only 2 work, one of which I'm using for the SSD for a cache drive. I was trying to use a second SSD and make a 2-SSD cache pool but I never could figure out how to do that, so I abandoned that idea and eventually just went without a cache entirely. This system was working perfectly fine with no issues for about 18 days with the HighPoint RocketRAID 2310 in it. It was only when 1 of the drives died and I tried to rebuild it yesterday that I had problems.

 

Nothing you commented on has anything to do with my issue. The problem is the array is corrupted after disk1 failed during a rebuild process. I was trying to ask you: How can I recover from this? How can I recover my missing files? How can I make the array healthy again with no errors?

 

The problem is not the controller. The problem is a faulty disk that failed and that's all. 

Edited by AquaVixen
Link to comment
9 minutes ago, AquaVixen said:

Nothing you commented on has anything to do with my issue. The problem is the array is corrupted after disk1 failed during a rebuild process. I was trying to ask you: How can I recover from this? How can I recover my missing files? How can I make the array healthy again with no errors?

 

The problem is not the controller. The problem is a faulty disk that failed and that's all. 

The diagnostics are needed to get any idea of the best way forward.   What you have described should not have had the effect you described unless something else is going on.

 

The controller related statements were standard warnings often given out.  
 

The statement about Marvell controllers applies to any controller using a Marvell chipset, on any Linux-based system. It is not that they do not work at all, but that they are prone to randomly dropping drives for no apparent reason. This is almost certainly an issue at the Linux driver level: they used to work fine with 32-bit kernels (Unraid v5 and earlier), but the issue has been around for many years now, ever since Unraid went 64-bit with v6 and thus uses a 64-bit Linux kernel. It appears that Marvell has not put the effort into resolving this issue with its Linux drivers and concentrates on Windows support.

Link to comment
3 minutes ago, itimpi said:

The diagnostics are needed to get any idea of the best way forward.   What you have described should not have had the effect you described unless something else is going on.

 

The controller related statements were standard warnings often given out.  
 

The statement about Marvell controllers applies to any controller using a Marvell chipset, on any Linux-based system. It is not that they do not work at all, but that they are prone to randomly dropping drives for no apparent reason. This is almost certainly an issue at the Linux driver level: they used to work fine with 32-bit kernels (Unraid v5 and earlier), but the issue has been around for many years now, ever since Unraid went 64-bit with v6 and thus uses a 64-bit Linux kernel. It appears that Marvell has not put the effort into resolving this issue with its Linux drivers and concentrates on Windows support.

I see. So even though it appeared to work correctly for the first 18 days, it really wasn't working correctly and the issues with the drives and the corrupted array and failed rebuild and all of that is just because the Marvell SATA controller on this RocketRaid card doesn't play well with Linux / Unraid? Hrmm okay then. I don't have any other hardware that will be compatible with Unraid so I guess I'll just scrap the entire unraid project and go back to using windows on a hardware raid card. Shame. Unraid looks nice and I was hoping it would work. 

Link to comment
Just now, AquaVixen said:

I see. So even though it appeared to work correctly for the first 18 days, it really wasn't working correctly and the issues with the drives and the corrupted array and failed rebuild and all of that is just because the Marvell SATA controller on this RocketRaid card doesn't play well with Linux / Unraid? Hrmm okay then. I don't have any other hardware that will be compatible with Unraid so I guess I'll just scrap the entire unraid project and go back to using windows on a hardware raid card. Shame. Unraid looks nice and I was hoping it would work. 

No. If the system was behaving itself then there should have been no issue. Many people with Marvell-based controllers run for ages without experiencing issues.
 

It appears, however, that something went wrong during the rebuild and it is not clear exactly what.   Without the diagnostics we cannot tell exactly what happened.

Link to comment
4 minutes ago, itimpi said:

Without the diagnostics we cannot tell exactly what happened.

They were posted, and they were the basis for my earlier unhelpful comments.

 

3 hours ago, AquaVixen said:

I had 4 drives in the array. The other 3 drives never failed.

You currently have 4 drives in the array, 1 parity plus 3 data drives. All of your data drives have data, why do you think anything is missing? Do you mean you had 4 data drives in the array?

 

4 hours ago, AquaVixen said:

I did not lose an additional drive out of the array during the rebuild. Which I could understand would lead to total data loss

That is not correct. Unraid IS NOT RAID. Even if you have more failed drives than parity can rebuild, only the failed drives would be lost. If you did have 4 data drives and for some reason you don't now, that doesn't affect the other drives. Each data disk in the array is an independent filesystem. This is how Unraid allows different sized disks in the array.

 

The reason I said disk1 had problems is because some of its SMART attributes aren't good, and there are read errors for the disk in syslog:

Jun 30 02:31:13 Summer-NAS kernel: blk_update_request: I/O error, dev sdg, sector 725249934 op 0x0:(READ) flags 0x4000 phys_seg 53 prio class 0
Jun 30 02:31:13 Summer-NAS kernel: md: disk1 read error, sector=725249864
Jun 30 02:31:13 Summer-NAS kernel: md: disk1 read error, sector=725249872
...

 

The reason I looked at the controller is because you had multiple disk errors in syslog:

Jun 30 02:31:13 Summer-NAS kernel: md: recovery thread: multiple disk errors, sector=725249864
Jun 30 02:31:13 Summer-NAS kernel: md: recovery thread: multiple disk errors, sector=725249872
Jun 30 02:31:13 Summer-NAS kernel: md: recovery thread: multiple disk errors, sector=725249880
...

and because we didn't get a SMART report for disk3
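
For anyone who wants to check their own syslog for the same two kinds of entries, here is a minimal Python sketch. It only assumes the line formats shown in the excerpts above; the function name and the sample text are illustrative and are not part of any Unraid tool.

```python
import re
from collections import Counter

# Sample lines copied from the syslog excerpts quoted in this post.
SAMPLE_SYSLOG = """\
Jun 30 02:31:13 Summer-NAS kernel: blk_update_request: I/O error, dev sdg, sector 725249934 op 0x0:(READ) flags 0x4000 phys_seg 53 prio class 0
Jun 30 02:31:13 Summer-NAS kernel: md: disk1 read error, sector=725249864
Jun 30 02:31:13 Summer-NAS kernel: md: disk1 read error, sector=725249872
Jun 30 02:31:13 Summer-NAS kernel: md: recovery thread: multiple disk errors, sector=725249864
Jun 30 02:31:13 Summer-NAS kernel: md: recovery thread: multiple disk errors, sector=725249872
Jun 30 02:31:13 Summer-NAS kernel: md: recovery thread: multiple disk errors, sector=725249880
"""

# Patterns are assumptions based only on the lines above.
READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")
MULTI_ERROR = re.compile(r"md: recovery thread: multiple disk errors, sector=(\d+)")


def summarize_md_errors(text):
    """Count md read errors per disk and list 'multiple disk errors' sectors."""
    read_errors = Counter()
    multi_sectors = []
    for line in text.splitlines():
        match = READ_ERROR.search(line)
        if match:
            read_errors[match.group(1)] += 1
            continue
        match = MULTI_ERROR.search(line)
        if match:
            multi_sectors.append(int(match.group(1)))
    return read_errors, multi_sectors


read_errors, multi_sectors = summarize_md_errors(SAMPLE_SYSLOG)
print(dict(read_errors), multi_sectors)
```

Read errors confined to a single disk point at that disk. "Multiple disk errors" lines are what made me look beyond the one disk.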

 

According to vars in the system folder of your diagnostics, no disks are disabled or missing, and df in that same folder shows 3 data disks mounted with contents.

 

Rebuild typically will either work or not. It won't

4 hours ago, AquaVixen said:

randomly delete files from the array just because the replacement drive we installed dies during the rebuild process

 

Sorry, but without more information I see no evidence anything is missing.

 

Link to comment
25 minutes ago, itimpi said:

No. If the system was behaving itself then there should have been no issue. Many people with Marvell-based controllers run for ages without experiencing issues.
 

It appears, however, that something went wrong during the rebuild and it is not clear exactly what.   Without the diagnostics we cannot tell exactly what happened.

I already submitted the diagnostics earlier in this thread as an attachment.

Link to comment

Also, those syslog entries for the disk problems were made while the array was rebuilding parity, not rebuilding a data disk.

Jun 30 01:17:06 Summer-NAS kernel: md: recovery thread: recon P ...

 

Why did you rebuild parity? How did you make Unraid rebuild parity instead of rebuilding a data disk?

Link to comment

Unraid disables a disk if a write to it fails. If you did have a disabled data disk, then it would have been emulated by the parity calculation. After a disk is disabled, it isn't used anymore until rebuilt.

 

Reads of the emulated disk get its data from the parity calculation by reading all other disks. Writes to the disk are emulated by updating parity as if the disk had been written, so the emulated writes, including that initial failed write that disabled the disk, can be recovered by rebuilding it.

 

If you make Unraid rebuild parity instead of rebuilding the disabled disk, then all those emulated writes are lost and you are left with only what was on the disk before it became disabled.
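
To make the three paragraphs above concrete, here is a toy Python sketch. It assumes single parity is a plain bitwise XOR across the data disks, and it uses four-byte strings in place of real disks. The names are made up for illustration; this is not Unraid's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bitwise XOR of equal-length byte blocks, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data disks" and the parity computed from them.
disk1 = b"old!"
disk2 = b"abcd"
disk3 = b"wxyz"
parity = xor_blocks([disk1, disk2, disk3])

# disk1 gets disabled: the physical disk keeps its stale contents and is
# no longer used. A read of the emulated disk1 combines parity with every
# other data disk.
emulated_before = xor_blocks([parity, disk2, disk3])  # same as the old disk1

# An emulated write changes only parity, as if disk1 now held b"new!".
parity = xor_blocks([b"new!", disk2, disk3])
emulated_after = xor_blocks([parity, disk2, disk3])

# Option A: rebuild the data disk. The replacement receives the emulated
# contents, including the emulated write.
rebuilt_disk1 = xor_blocks([parity, disk2, disk3])

# Option B: rebuild parity from the stale physical disk instead. Parity is
# recomputed from what is physically on disk1, so the emulated write is gone.
parity_after_recon = xor_blocks([disk1, disk2, disk3])
disk1_after_parity_rebuild = xor_blocks([parity_after_recon, disk2, disk3])

print(rebuilt_disk1, disk1_after_parity_rebuild)
```

In this toy model, option A ends up with the new contents and option B ends up with only what was on the disk before it was disabled.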

 

Does this perhaps give us some clue as to why you think something is missing?

Link to comment
4 minutes ago, trurl said:

Also, those syslog entries for the disk problems were made while the array was rebuilding parity, not rebuilding a data disk.

Jun 30 01:17:06 Summer-NAS kernel: md: recovery thread: recon P ...

 

Why did you rebuild parity? How did you make Unraid rebuild parity instead of rebuilding a data disk?

I wanted to add bigger drives to the array and replace the small 160GB drives I started with with a mix of 750 and 500 GB drives to expand the array. Unraid will not allow us to replace any of the drives with a larger capacity drive unless we first replace the parity drive with a larger capacity drive to be the same or larger capacity than the other drives we install later. I had to replace the parity drive with a 1TB one or I couldn't replace the other array drives with larger drives later. 

Link to comment
2 minutes ago, AquaVixen said:

Unraid will not allow us to replace any of the drives with a larger capacity drive unless we first replace the parity drive with a larger capacity drive

Not exactly the whole story. There is the parity swap procedure which will copy parity to a larger drive then rebuild the data drive to the former parity drive.

Link to comment
1 minute ago, trurl said:

Unraid disables a disk if a write to it fails. If you did have a disabled data disk, then it would have been emulated by the parity calculation. After a disk is disabled, it isn't used anymore until rebuilt.

 

Reads of the emulated disk get its data from the parity calculation by reading all other disks. Writes to the disk are emulated by updating parity as if the disk had been written, so the emulated writes, including that initial failed write that disabled the disk, can be recovered by rebuilding it.

 

If you make Unraid rebuild parity instead of rebuilding the disabled disk, then all those emulated writes are lost and you are left with only what was on the disk before it became disabled.

 

Does this perhaps give us some clue as to why you think something is missing?

I have the array's shares mounted as network shares with a virtual drive letter on my Windows 10 machine. I have some games installed to / stored on the array. I started up one game and all of its save files and configuration files were suddenly missing, as well as some of the game's files, and I had to do a verify with Steam and re-download 1GB of data before the game would launch again. I have a MUCK client (weird I know, but whatever) whose configuration file was completely missing, as well as half the log files it had generated over the past 5 days. All of that was there before this "botched rebuild" failed, and after Unraid said it completed the rebuild with errors, those files physically do not exist on the array anymore. I live alone and I am the only person in the house accessing these two computers. No one else has touched them. Anyway.. this has been a nightmare experience and seriously soured me on Unraid in general. For years I've been relying exclusively on hardware raid cards. In the past I have had replacement drives die or fail during a rebuild on hardware raid cards and it's never led to generalized array-wide corruption and data loss like this.

Link to comment
Just now, AquaVixen said:

All of that was there before this "botched rebuild" failed

But did it happen before the disk was disabled is the question. If the disk was already disabled then any writes to the disk would have been emulated writes.

 

Did you actually complete a rebuild of the data disk?

Link to comment
3 minutes ago, AquaVixen said:

In the past I have had replacement drives die or fail during a rebuild on hardware raid cards and it's never led to generalized array-wide corruption and data loss like this.

Unraid IS NOT RAID, and there is no array-wide corruption if a data disk can't be rebuilt. According to the diagnostics you posted, you have 3 mounted data disks with contents totalling over 400G, which is about 38% of the capacity.

 

Each data disk is an independent filesystem. Each file exists completely on one single disk, there is no striping. Each data disk can be read all by itself on any Linux.

 

I wish you had asked for help before doing anything when you had a problem. Possibly some misunderstanding was the cause of whatever you think happened. We have helped many people recover from failed disks and other problems such as corrupt filesystems. But it is also possible to do the wrong things, which can make it more difficult, if not impossible, to recover everything.

 

 

Link to comment
34 minutes ago, AquaVixen said:

replace the small 160GB drives I started with with a mix of 750 and 500 GB drives to expand the array

You currently have 1x1TB parity plus 2x500GB data plus 1x160GB data drives in the array, plus 1x160GB unassigned disk.

 

Replacing data disks when you have single parity requires replacing and rebuilding one disk at a time.

 

Is that unassigned disk one that got replaced? Does it have any data on it?

Link to comment
2 hours ago, trurl said:

How did you make Unraid rebuild parity instead of rebuilding a data disk?

If you're a moderator of the Unraid forums, I'm honestly rather surprised you don't know this and a little confused as to why you're asking someone who has only used Unraid for 18 days how to do this. I would assume you of all people should know how to do that. I found it on Google. Just google "How to upgrade capacity of parity drive in an unraid array" without quotes and it should be on the first page. That's what I did. But since you actually seem not to know (somehow??) I'll inform you, I suppose: Stop array -> Disconnect parity disk with system running (helps to have a hot swap drive cage for this, just pop the tray out) -> Start array with parity missing -> Wait for array to start up with parity drive missing -> Stop array -> Replace parity drive with similar drive but of higher capacity -> Wait for replacement drive to show up as unmounted in array drive list -> Tell Unraid to assign the new higher capacity drive as the parity drive -> Start array -> Voilà -> Unraid automagically rebuilds the array with the higher capacity drive and at the same time upgrades the capacity of the parity drive -> We can then proceed to repeat this process with any other drive(s) in the array and replace them with higher capacity drives and expand the array's total capacity. Don't even have to power off the computer/server either.

 

2 hours ago, trurl said:

I wish you had asked for help before doing anything when you had a problem. Possibly some misunderstanding was the cause of whatever you think happened. We have helped many people recover from failed disks and other problems such as corrupt filesystems. But it is also possible to do the wrong things, which can make it more difficult, if not impossible, to recover everything.

I followed the documentation on the Unraid website and a video by space invader one on how to rebuild the array(s) with Unraid. Apparently something the programmers of Unraid failed to account for is the possibility that the replacement drive could die during the rebuild process and the rebuild process could fail before completing.

 

Anyway. It's already made clear to me several things:

1.) My files are completely gone forever and will never be recovered (I asked repeatedly for help. All these replies and back and forth and no one has helped me with that so I give up now).

2.) Personally I don't like unraid. I've been using hardware raid cards since the early 2000's at home and in many computers and servers in enterprise environments when I worked as a datacenter floor tech. I've never seen corruption like this in any computer or configuration just from trying to reconstruct/rebuild an array, raid or not.

 

I know you say "UNRAID IS NOT RAID"... by its very nature anything that has a "parity drive" and can "rebuild from a drive failure" is "a type of RAID". Those are the definitions of a RAID system. Let me take a moment to remind you what that means: RAID = Redundant Array of Inexpensive Disks. Unraid has an array. It's also redundant against a failure. Therefore Unraid is a type of RAID. It fits the literal definition of the term.

 

You can reply and say whatever you want but I honestly don't really care anymore. I probably won't be reading or replying to this thread after this post. This was my first unraid experience and it definitely will be my last. 

Edited by AquaVixen
Link to comment

@AquaVixen The method you quoted invalidates the ability to rebuild the contents of a failed data drive if you only have a single parity drive, which was the original problem that you asked about, and as such would not normally be the correct action to take.

 

It is normal when it is not clear exactly what a user did to try and make sure that they took the correct action for the problem they actually had.

Link to comment
53 minutes ago, itimpi said:

@AquaVixen The method you quoted invalidates the ability to rebuild the contents of a failed data drive if you only have a single parity drive, which was the original problem that you asked about, and as such would not normally be the correct action to take.

 

It is normal when it is not clear exactly what a user did to try and make sure that they took the correct action for the problem they actually had.

If I was using a real raid card instead of unraid it could recover from a failed rebuild with only 1 parity drive. If Unraid is trying to mimic a RAID system then it should be able to do it too. 

Link to comment
1 hour ago, AquaVixen said:

If I was using a real raid card instead of unraid it could recover from a failed rebuild with only 1 parity drive. If Unraid is trying to mimic a RAID system then it should be able to do it too. 

Unraid can do this as well!

 

However, if you are changing the parity drive, you are in effect treating that drive as failed. If you have a failed data drive as well, you then have 2 failed drives and have exceeded the level of protection. This is mentioned in the section on Upgrading Parity Disks in the online documentation, accessible via the Manual link at the bottom of the Unraid GUI.

 

If this happened on a traditional RAID system then you would lose the whole contents of the array.  On Unraid the contents of all the drives that have not failed are still intact so in this worst case you have lost the contents of 1 drive instead of everything.
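
The difference can be sketched in a few lines of Python. This is a simplified model, not how either system is implemented: it assumes single parity, treats "traditional RAID" as a RAID 5 style array where every file is striped across all members, and uses made-up disk and file names.

```python
def surviving_independent(files_by_disk, failed, parity_count=1):
    """Unraid-style layout: each file lives wholly on one data disk."""
    if len(failed) <= parity_count:
        # Within the protection level: failed disks can be emulated and rebuilt.
        return sorted(name for names in files_by_disk.values() for name in names)
    # Protection exceeded: only files on the failed data disks are lost.
    return sorted(
        name
        for disk, names in files_by_disk.items()
        if disk not in failed
        for name in names
    )


def surviving_striped(file_names, failed, parity_count=1):
    """Striped layout: every file has pieces on every member of the array."""
    return sorted(file_names) if len(failed) <= parity_count else []


# Hypothetical layout, for illustration only.
layout = {
    "disk1": ["saves.cfg"],
    "disk2": ["movie.mkv"],
    "disk3": ["notes.txt"],
}
all_files = [name for names in layout.values() for name in names]

# Parity removed for an upgrade while disk1 is also failed: two failures.
failed = {"parity", "disk1"}
print(surviving_independent(layout, failed))
print(surviving_striped(all_files, failed))
```

With two failures against single parity, the independent layout in this model still has everything from disk2 and disk3, while the striped layout has nothing left.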

Link to comment
