[6.2.4] Array and Automatic Parity-Sync started while Disks Unmountable [Solved]



I am running unRAID 6.2.4 with 2 parity disks, 8 data disks, and 3 disks in the cache pool.

 

My server just crashed and was totally unresponsive, so I had to do a hard power off.

 

When I powered it back up, everything looked fine (all disks showed green). I hit Start, and then the problems began. First, the cache pool wouldn't mount. Upon further inspection, the partitions had totally disappeared. I had to boot into a live Linux environment and run testdisk to find the partitions. After that, the cache mounted fine.
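
In case it helps anyone, the testdisk recovery went roughly like this (sdX is a placeholder for the affected cache device, and I'm recalling the interactive menu flow from memory, so it may differ slightly between testdisk versions):

    testdisk /dev/sdX
    # In testdisk's interactive menus:
    #   Create a new log -> select the disk -> choose the partition table type (EFI GPT or Intel)
    #   Analyse -> Quick Search to locate the lost partition(s)
    #   Write to save the recovered partition table, then reboot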

 

Next, when I booted back into unRAID, everything was showing green again, so I hit Start. The array started; however, 2 of my disks (which happened to be formatted XFS; the rest are BTRFS) showed as Unmountable in the Web UI, and when doing ls /mnt those disks were just not there.

 

I ran xfs_repair -L (it was the only repair option that worked, as it was giving a "Structure needs cleaning" error when trying to mount), and then I was able to mount the disks. I started the array again, and all disks were mounted successfully. I am now running a parity check, and it found 1300 errors right off the bat.
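
For reference, the repair commands were along these lines, run per disk with the array stopped (sdX1 is just a stand-in for the data partition of each affected disk):

    xfs_repair -n /dev/sdX1    # check-only pass first, makes no changes
    xfs_repair -L /dev/sdX1    # -L zeroes the XFS log, which can discard the most recent metadata changes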

 

So my issue is, if the array has unmountable disks, should it still be able to start? And if it does start with unmountable disks, shouldn't it use the Parity instead of just making the disks disappear? If the array starts with unmountable disks, won't that destroy the parity data?

 

I've also had this happen before with a BTRFS disk, and I ended up losing everything on that disk!

Link to comment

What device did you run the xfs_repair command on? The raw disk /dev/sd... or the parity-protected disk /dev/md.. ?

 

From your report I suspect you ran it on /dev/sd..., which is why you then had subsequent parity errors: you had made changes to the disk that were not reflected by the corresponding changes on the parity disk.

 

Had xfs_repair been run on the /dev/md.. device, the parity would have been updated with the changes to the disk.
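
To illustrate the difference, something like this (the disk number and device letter are only placeholders for your actual assignments):

    xfs_repair -L /dev/md1     # md device: writes go through the array driver, so parity is updated as the repair runs
    xfs_repair -L /dev/sdb1    # raw device: parity knows nothing about these writes, so it goes out of sync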

 

If you did run it on the /dev/md.. device then I'm not sure why you had parity errors.

 

Parity cannot be used to recover from this sort of failure, where there has been an incomplete write of data to a disk (which caused the filesystem corruption), as the same incomplete write will be reflected in the update to the parity disk.

Link to comment

I had to run the xfs_repair command on the /dev/sdx device with the array offline, because it was unmountable. I'm not worried about the parity being out of sync from that. The parity was already trashed when the array was able to start with 2 disks showing as UNMOUNTABLE, and missing from the array. That is the issue. Why was the array able to start when disks were unmountable?

 

Shouldn't the proper response be to start the array, but use the parity data instead of the actual disk? Or if that's not possible, to not allow the array to start? Allowing the array to start, while having the unmountable disks missing from the array, should not happen, unless I'm misunderstanding something.

Link to comment

I had to run the xfs_repair command on the /dev/sdx device with the array offline, because it was unmountable. I'm not worried about the parity being out of sync from that. The parity was already trashed when the array was able to start with 2 disks showing as UNMOUNTABLE, and missing from the array. That is the issue. Why was the array able to start when disks were unmountable?

 

Parity remains in sync with unmountable disks.

 

Shouldn't the proper response be to start the array, but use the parity data instead of the actual disk? Or if that's not possible, to not allow the array to start? Allowing the array to start, while having the unmountable disks missing from the array, should not happen, unless I'm misunderstanding something.

 

Unmountable disks are fixed by running the appropriate filesystem repair tools, not parity; parity is used to replace a failed disk.

Link to comment

I had to run the xfs_repair command on the /dev/sdx device with the array offline, because it was unmountable. I'm not worried about the parity being out of sync from that. The parity was already trashed when the array was able to start with 2 disks showing as UNMOUNTABLE, and missing from the array. That is the issue. Why was the array able to start when disks were unmountable?

 

Parity remains in sync with unmountable disks.

 

Shouldn't the proper response be to start the array, but use the parity data instead of the actual disk? Or if that's not possible, to not allow the array to start? Allowing the array to start, while having the unmountable disks missing from the array, should not happen, unless I'm misunderstanding something.

 

Unmountable disks are fixed by running the appropriate filesystem repair tools, not parity; parity is used to replace a failed disk.

Also, any disk that hasn't been formatted yet is unmountable, and the array must be up to format a disk.
Link to comment

Thanks for the responses.

 

Also, any disk that hasn't been formatted yet is unmountable, and the array must be up to format a disk.

To clarify, the disks were part of the array, had data on them, and were included in the parity. After a reboot, everything looked in order, but when I clicked Start, the array started up, yet 2 of my data disks that were part of the array showed as Unmountable in the Web UI, and when running "ls /mnt" those disks were missing (disk1, disk2, disk3, disk5, disk6, and disk7 showed up; disk4 and disk8 were not there). So if the array started and those disks were missing, would the parity data not get trashed?

 

Parity remains in sync with unmountable disks.

So if I have data disks 1-8 protected by parity, and I start the array, and it starts up but disk4 and disk8 become Unmountable and are missing from the array, I can continue using the system, and either format the Unmountable disks or stop the array and replace them with new disks, and they will be successfully rebuilt from parity? If that is the case, I guess there is no problem and I am mistaken.

I was under the impression that if the array started with a failed disk, the disk would still show up in the array, but the array would be unprotected because it would use Parity data instead of the data disk. Or if that was not possible, that the array would just not start.

Link to comment

So if I have data disks 1-8 protected by parity, and I start the array, and it starts up but disk4 and disk8 become Unmountable and are missing from the array, I can continue using the system, and either format the Unmountable disks or stop the array and replace them with new disks, and they will be successfully rebuilt from parity? If that is the case, I guess there is no problem and I am mistaken.

 

You're confusing a failed disk with a corrupt (unmountable) filesystem. If you format an unmountable disk you'll lose all its data and parity is updated to reflect that; rebuilding an unmountable disk results in another unmountable disk.

 

 

Unmountable disks are fixed by running the appropriate filesystem repair tools, not parity; parity is used to replace a failed disk.

Link to comment

Also, any disk that hasn't been formatted yet is unmountable, and the array must be up to format a disk.

This comment wasn't intended to be about your specific situation. I was just making the point that, in general, normal operation requires the array to be able to start with unmountable disks, because you wouldn't be able to format a disk otherwise.
Link to comment

You're confusing a failed disk with a corrupt (unmountable) filesystem. If you format an unmountable disk you'll lose all its data and parity is updated to reflect that; rebuilding an unmountable disk results in another unmountable disk.

Okay, so say I have a properly functioning array with all disks protected by parity, the system is shut down uncleanly and then powered back on, and the array is started and mounts all disks except, let's say, disk1, which shows as unmountable. What is the proper procedure? Also, the system did start a parity sync automatically after it started with the Unmountable disk, since it was the first startup after an unclean shutdown.

 

If the disk is unmountable, there won't be a /dev/md1 to use the filesystem repair tools on, correct?

 

I guess that would leave the option of what I did, which is repairing the filesystem outside of the array, then rebuilding parity.

 

Would stopping the array, removing the unmountable disk from the array, then either formatting the current disk or replacing it with a new disk, then starting the array back up and rebuilding from parity also work? I was scared to try that because I thought if the array was running and writing data while a disk that should have been there was unmountable, it would have messed up the parity.

Link to comment

Okay, thanks. I guess I should have done more reading  :-[

 

How about the issue that a parity-sync was automatically started after the array started up with Unmountable disks? I forgot to mention that in my initial post. Wouldn't that eliminate any chance of the disk being rebuilt if the filesystem repair was unsuccessful?

Link to comment

How about the issue that a parity-sync was automatically started after the array started up with Unmountable disks? I forgot to mention that in my initial post. Wouldn't that eliminate any chance of the disk being rebuilt if the filesystem repair was unsuccessful?

 

That only happens if there was an unclean shutdown, which is probably also what caused your unmountable disks, as unclean shutdowns are never good for the filesystem.

 

When that happens parity can become out of sync, so it's important to do a correcting check to re-sync it, or it can corrupt a future disk rebuild.
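
If I remember right, a check can also be started from the command line with unRAID's mdcmd helper, something along these lines (path and syntax from memory, so treat it as a sketch and double-check against your version):

    /usr/local/sbin/mdcmd check             # correcting check: parity is updated where it disagrees with the data disks
    /usr/local/sbin/mdcmd check NOCORRECT   # read-only check: sync errors are only reported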

 

Again, parity can't be used to fix a corrupt filesystem, only the respective filesystem repair tools can.

 

 

Link to comment

Again, parity can't be used to fix a corrupt filesystem, only the respective filesystem repair tools can.

 

I do realize that. However, here is the issue.

  • I was given no indication that anything was amiss with any of my disks.
  • After I clicked on Start to start the array, it started, THEN showed 2 disks as Unmountable, and automatically began a parity sync (with 2 disks missing)

 

So if I had been unable to repair the corrupt filesystem, and opted to replace the disk(s), I would not have been able to because of the parity sync that had been automatically initiated. A correcting parity sync should not be automatically initiated when all the disks are not present and mounted.

Link to comment

Again, parity can't be used to fix a corrupt filesystem, only the respective filesystem repair tools can.

 

I do realize that. However, here is the issue.

  • I was given no indication that anything was amiss with any of my disks.
  • After I clicked on Start to start the array, it started, THEN showed 2 disks as Unmountable, and automatically began a parity sync (with 2 disks missing)

 

So if I had been unable to repair the corrupt filesystem, and opted to replace the disk(s), I would not have been able to because of the parity sync that had been automatically initiated. A correcting parity sync should not be automatically initiated when all the disks are not present and mounted.

The only way unRAID could start the array (whether automatically or by the user in the webUI) with 2 missing disks is if you have dual parity. An unmountable disk is not a missing disk. And as has already been stated, a rebuild will not fix a corrupt filesystem, whether the rebuild is to the original disk, or to a replacement.
Link to comment

  • The server was running with 8 data disks, parity was correct (2 parity disks).
  • Server froze and had to be hard powered off.
  • Booted server back up, everything showed as present and working.
  • Hit Start. After array started, 2 disks showed as Unmountable. Parity sync started automatically.
  • Upon checking the /mnt directory, disk1, disk2, disk3, disk5, disk6, and disk7 were present (disk4 and disk8 were not there).

 

I guess the thing I am confused about is: would the parity check that was automatically started cause the parity to be "corrected" to only protect disks 1, 2, 3, 5, 6, and 7 in this case? Or would the Unmountable disks 4 and 8 still be covered by the parity?

Link to comment

I guess the thing I am confused about is: would the parity check that was automatically started cause the parity to be "corrected" to only protect disks 1, 2, 3, 5, 6, and 7 in this case? Or would the Unmountable disks 4 and 8 still be covered by the parity?

 

Parity always protects all disks in their current state, so unmountable disks are still protected by parity, but again, in their current state. Let's say you had one unmountable disk and it failed (red X) before you could fix the filesystem: you would use parity (and all the other disks) to rebuild the failed disk, the disk would be rebuilt exactly as it was (unmountable), and then you'd run the filesystem repair tools to fix the filesystem.
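
As a rough sketch of why that works (single parity shown with made-up byte values; the md driver does this across every block, and dual parity adds a second, independent equation), parity is just the XOR of the data disks, so whatever bytes are on a corrupt filesystem get reproduced exactly:

    d1=0xA5; d2=0x3C; d3=0xFF              # one example byte from each of three data disks
    p=$(( d1 ^ d2 ^ d3 ))                  # what the parity disk stores for that position
    rebuilt_d2=$(( p ^ d1 ^ d3 ))          # rebuilding a failed disk2 from parity plus the survivors
    printf 'original d2=0x%02X  rebuilt d2=0x%02X\n' $d2 $rebuilt_d2   # identical, corrupt or not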

 

 

Link to comment

  • The server was running with 8 data disks, parity was correct (2 parity disks).
  • Server froze and had to be hard powered off.
  • Booted server back up, everything showed as present and working.
  • Hit Start. After array started, 2 disks showed as Unmountable. Parity sync started automatically.
  • Upon checking the /mnt directory, disk1, disk2, disk3, disk5, disk6, and disk7 were present (disk4 and disk8 were not there).

 

I guess the thing I am confused about is: would the parity check that was automatically started cause the parity to be "corrected" to only protect disks 1, 2, 3, 5, 6, and 7 in this case? Or would the Unmountable disks 4 and 8 still be covered by the parity?

OK. If the disks were unmountable they were not mounted, hence they wouldn't appear in /mnt. A missing disk will be displayed in the webUI as missing, and an unmountable disk will appear in the webUI as unmountable. Very different things. A parity check includes all assigned disks, whether unmountable or not.

 

Link to comment

Hi,

 

Just for anyone who has a similar issue to what I had: I had a disk that was unmountable. The disk uses the default format of XFS. After checking the disk, I found that it had metadata corruption. The fix was to run xfs_repair with the -Lv option in maintenance mode. The -L forces the zeroing of the log on the disk. I know that the xfs_repair section of the wiki (https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#xfs_repair) mentions that it isn't one of the options we normally need, however this was the only way to fix my disk and allow it to be mountable again.
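
For completeness, the repair went roughly like this with the array started in maintenance mode (the slot number 1 is only an example; use the md device matching the affected disk):

    xfs_repair -nv /dev/md1    # check-only pass first, verbose, makes no changes
    xfs_repair -Lv /dev/md1    # -L zeroes the log, -v is verbose; this can discard the newest metadata changes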

 

Thanks,

 

Raven

Link to comment

OK. If the disks were unmountable they were not mounted, hence they wouldn't appear in /mnt. A missing disk will be displayed in the webUI as missing, and an unmountable disk will appear in the webUI as unmountable. Very different things. A parity check includes all assigned disks, whether unmountable or not.

Parity always protects all disks in their current state, so unmountable disks are still protected by parity, but again, in their current state. Let's say you had one unmountable disk and it failed (red X) before you could fix the filesystem: you would use parity (and all the other disks) to rebuild the failed disk, the disk would be rebuilt exactly as it was (unmountable), and then you'd run the filesystem repair tools to fix the filesystem.

 

Thanks guys. I had a problem a while back where a disk started failing during a parity sync, so the sync stopped partway through or something, and I can't remember what happened exactly but I ended up losing that whole disk and was unable to rebuild it, so I was afraid something like that would happen again in this instance. Thanks for setting me straight!

 

I had a disk that was unmountable. The disk uses the default format of XFS. After checking the disk, I found that it had metadata corruption. The fix was to run xfs_repair with the -Lv option in maintenance mode. The -L forces the zeroing of the log on the disk.

That's what I had to do to get my disks mountable again, in this instance.

Link to comment

Thanks guys. I had a problem a while back where a disk started failing during a parity sync, so the sync stopped partway through or something, and I can't remember what happened exactly but I ended up losing that whole disk and was unable to rebuild it, so I was afraid something like that would happen again in this instance.

 

If a disk fails during a parity sync (not a check) you can lose all its data, since parity wasn't yet valid; that is not a problem during a parity check (although some parity corruption can happen if a disk fails during a correcting check).

Link to comment
  • 1 month later...

Hi,

 

Just for anyone who has a similar issue to what I had: I had a disk that was unmountable. The disk uses the default format of XFS. After checking the disk, I found that it had metadata corruption. The fix was to run xfs_repair with the -Lv option in maintenance mode. The -L forces the zeroing of the log on the disk. I know that the xfs_repair section of the wiki (https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#xfs_repair) mentions that it isn't one of the options we normally need, however this was the only way to fix my disk and allow it to be mountable again.

 

Thanks,

 

Raven

 

Hi Raven,

 

When you ran the xfs_repair with the -Lv option, did you lose any data that was on the hard drive?

Link to comment
