2 HDDs unmountable



6 minutes ago, SSD said:

@Ron -

 

Parity is computed on the PARTITION, not the ENTIRE DISK. So if one disk's partition starts at sector 2048 and another starts at sector 64, the first parity block of the first disk would begin at physical sector 2048 and the first parity block of the other would start at physical sector 64. So if a disk's partition were updated to start at a different sector, you'd think parity would be out of sync virtually everywhere. We are only seeing 9 parity errors, very consistent with what you'd see after a hard shutdown. This is good news IMO for your ability to recover your data.


The most important question that comes out of this is how unRAID is treating the odd partitioning of disk6 and disk7. Is it assuming that the partition starts at sector 63 or 64 based on the disk size, and sort of ignoring the partition table values? Or is it assuming the partition table is accurate, and hence the first parity block for disk6 is at the 2K point? (And it must have been that way from the time they were added to the array.) Hoping that Tom ( @limetech ) will weigh in here.
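If it helps to see what the drives themselves report, here is a minimal sketch of checking a partition's starting sector from the unRAID console (sdX is just a placeholder for the device of disk6 or disk7):

fdisk -l /dev/sdX

The "Start" column of each partition line is the starting sector, which can be compared against the 63/64 vs. 2048 values discussed above.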

 

My hypothesis is that if we can correct the partition tables for disks 6 and 7, your data will be valid. The question is, what is correct?

 

See below for my interpretation of the parity check log entries.

 

Jun 22 20:11:56 Tower kernel: mdcmd (45): check correct

- This is a correcting check - so parity is updated for every parity mismatch found

Jun 22 20:11:56 Tower kernel: md: recovery thread: check P Q ...

- Dual parity (P and Q)

Jun 22 20:11:56 Tower kernel: md: using 1536k window, over a total of 5860522532 blocks.

Jun 22 20:11:56 Tower kernel: md: recovery thread: PQ corrected, sector=128

- First parity corruption. The sector listed is always the first of 8 sectors (8 x 512 bytes = 4K). So we don't know whether the mismatch was in only a single sector, or in all 8 of the sectors from 128-135.
- What is interesting is that sectors 0-127 had no corruption, and there are only 9 sectors reported corrected. If the starting sector of disk6 should have been 64 (which is what unRAID would have done), and the parity check was instead starting at 2048 due to the partition table, you'd have to believe that this would impact virtually every sector on the disk. Does this mean that the parity check was using a starting sector of 64? Or does it mean that the disk has consistently been treated as if the partition started at sector 2048? (@limetech)
Jun 22 20:11:57 Tower kernel: md: recovery thread: Q corrected, sector=65688

- Over 64K sectors of accurate parity before this. Notice that only Q parity is corrected; P parity is accurate. I'd say this is the expected type of corruption from a hard shutdown.
Jun 22 20:11:57 Tower kernel: md: recovery thread: Q corrected, sector=263384

- Look at the timestamps. This one is very close to the prior one. Again, this is typical of corruption from a hard shutdown, and again it affects only Q parity. It is logical that P parity is updated first and Q parity second, so it appears a hard shutdown happened between those two I/Os.
Jun 22 20:12:38 Tower kernel: md: recovery thread: PQ corrected, sector=10987336

- This one is 41 seconds later - typical correction from hard shutdown
Jun 23 00:24:20 Tower kernel: md: recovery thread: PQ corrected, sector=2389180464

- This one is about 4 hours later. Still typical after a hard shutdown.
Jun 23 04:18:49 Tower kernel: md: recovery thread: PQ corrected, sector=4811915336

- This one is almost 4 hours later. Again, a lot of parity is accurate. Typical.
Jun 23 04:18:49 Tower kernel: md: recovery thread: PQ corrected, sector=4811915352

- Very close to last one. Probably I/O on the same file. Typical.
Jun 23 05:50:25 Tower kernel: md: recovery thread: PQ corrected, sector=5878055056

- Over 1.5 hours later. Typical.
Jun 23 06:45:16 Tower kernel: md: recovery thread: PQ corrected, sector=6460538896

- 1 hour later. Typical.
Jun 23 14:37:24 Tower kernel: md: sync done. time=66327sec

- Parity check done, almost 8 hours after the last parity correction. A lot of valid parity.
Jun 23 14:37:24 Tower kernel: md: recovery thread: completion status: 0
- Parity check finished successfully

 

Note there is a max number of parity errors that unRAID will log. It has varied by unRAID version, but I think it is in the hundreds. There were only 9 found, so I expect all were logged.

 

These few parity errors are very consistent with corruption from a hard shutdown. And updating parity is almost always the right thing to do, meaning all of the parity is likely corrected appropriately. This is not what I'd expect with misaligned partitions.

 

If we can correct the partition tables for disks 6 and 7, I believe the disks will mount and your data will be valid. The question is, what is correct? Tom, or one of our other very technical users, may be able to advise.

I went into docker, and turned off autostart for plex. Should I spin down all hdds? Should I shut down the NAS? 

  1. Large hdd was just purchased, and supposedly had been powered up to test an array for less than 20 minutes.
  2. Smaller hdd was an external USB hdd that I had used for several years.
  3. Both hdds were installed according to what I had read. It has been about 20 years since I used Linux other than installing UnRaid.
  4. Any data that is not from Unraid is not of any concern to me. 
Edited by Ron
2 hours ago, Ron said:

I went into docker, and turned off autostart for plex. Should I spin down all hdds? Should I shut down the NAS? 

 

I would tend to shut down the server (or at least stop the array) and wait to hear something back from Tom. He may be the only one who can answer why there were so few parity errors when the partition tables on two of the drives were off.


@Ron -

 

Hopefully Tom will respond, but I think it is likely that if you remove disks 6 and 7 from the array - by unassigning the disks from those two slots and then starting the array - unRAID will simulate those two disks using parity, and you'll be able to see the data on those drives. Once confirmed, you'd be able to stop the array, reassign the two disks, and restart the array. unRAID would then rebuild those two disks based on parity.

 

I posted information about how to do this, but will give more details:

- Shut down and power off the unRAID server

- Create a copy of the entire "config" folder on the unRAID flash from a Windows computer. Call it "config_backup" or something similar. (A command-line alternative is sketched after this list.)

- Put the USB stick back into the unRAID server and power on.

- Stop array if it auto starts (I'd suggest disabling auto start if you haven't already)

- While array is stopped, unassign disks assigned to slot6 and slot7. DO NOT DO A NEW CONFIG.

- unRAID will warn you that two disks are missing, but still offer option to start the array

- Start the array

- Check to see if the files are present on disk6 and disk7 (which are now being simulated using parities and other disks in the array).

- If you see the files you can stop the array, breathe easier, and do the following. (BTW, based on my understanding of unRAID, I think you have an excellent chance of this outcome.):

   - Stop the array

   - Reassign your disks to slot6 and slot7:

   - Start the array. I've never tried to rebuild two disks, but I expect either they will rebuild in parallel, or one will rebuild and then the other.

   - When rebuilds finish, confirm you still see the files (they will be on the physical disks - they are no longer being simulated).

   - You can reboot and make sure all files are there after the reboot.

- If you do not see the files present on the simulated disks, then we'll need to keep noodling over the issue. I'll provide you with instructions to restore the config backup, but I'm not sure that will be necessary; it depends on what we learn if the files are still not visible.
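As an aside on the backup step above: if pulling the flash drive is inconvenient, the same copy can be made from the unRAID console, since the flash is normally mounted at /boot. A minimal sketch (the config_backup name is just an example):

cp -r /boot/config /boot/config_backup

Either way, make the backup before changing any drive assignments.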

 


@Ron -

 

If it were me, I would unassign the drives from the array as I laid out in my last post. With the backup config directory you could undo the operation. I would at least follow those instructions through this step:

- Check to see if the files are present on disk6 and disk7 (which are now being simulated using parities and other disks in the array). 

 

If the files are visible, you will be able to recover. And if they are not, you know you have a more serious issue. You could hold off doing the rebuild, as there may be a simpler way to accomplish the equivalent.

 

I've been giving this more thought - I believe that the super.dat file likely contains the disk partitioning information. I expect that is what the parity check is using for the starting sector and size, and that it is ignoring the partition table. That would explain why the parity check stayed correctly aligned and so few sync errors were found.
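For reference, super.dat lives in the config folder on the flash, so a quick look from the console confirms it's there and when it was last written (a minimal sketch):

ls -l /boot/config/super.dat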


Thanks, we are moving on Monday or Tuesday, so I will unassign the 2 drives and check it out tomorrow if I have time. I appreciate everyone's help, and hope this is an easy fix.

 

Question... is there a way to check the 2 hdds while they are still attached to the array with Windows to see what is showing on the hdds? Then I would know immediately if unRAID had written anything to them. All I know is that I installed the hdds... no idea if unRAID actually wrote to them. The more I think about it... since I installed the new hdds, I do not think much if anything has been written to the NAS, as I have not had my BR burner hooked up for months before the rebuild of the NAS.

Edited by Ron

 

2 hours ago, Ron said:

Question... is there a way to check the 2 hdds while they are still attached to the array with Windows to see what is showing on the hdds?

 

Not easily. You'd have to manually reconstruct the partition table. There may be some software to do this, but it could still be tricky. And there could be more than just the partition table impacted. Other early sectors on the disk may also have been trashed, and we'd have no way to know which ones or how to repair or reset them.
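If you just want to see what the start of a disk looks like right now, a read-only peek at the first sector (where the MBR/partition table lives) can be taken from the unRAID console. A minimal sketch, with sdX as a placeholder for the actual device:

dd if=/dev/sdX bs=512 count=1 2>/dev/null | hexdump -C

This only reads the sector; it won't repair anything.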

 

I still believe that removing the physical disks and letting unRAID simulate them is the best way to diagnose this. Parity will allow unRAID to simulate the partition, and unRAID has the smarts to simulate the early disk sectors, including the boot sector and partition table. And the procedure is completely reversible if you back up the config folder first. That's what I would do. The more I think about this, the more optimistic I am that this will allow you to see what was on the disks prior to the partition tables getting messed up, and then allow you to repair the physical disks.


Like I mentioned, there's no way those disks were array disks with unRAID data. If you want to see for yourself, and assuming the partitions are still valid Windows partitions, mount them manually and check the data they have. You can easily do that on the command line without unassigning the disks (with the array started or stopped):

 

mkdir /x
mount -vt ntfs /dev/sdX1 /x

Replace X with the correct letter (b and c, as of the last diags), then you can use Midnight Commander (mc) to navigate to /x and see the data.

After the first disk is done, unmount it:

umount /x

Note that you'll need to close mc or navigate away from /x to successfully unmount, then repeat the process with the other disk. If both mount and you can confirm there's no array data there, just format both disks.
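Putting the steps together, a minimal sketch of the whole check for both disks (sdb and sdc assumed, per the last diags; adjust if the letters have changed):

mkdir -p /x
mount -vt ntfs /dev/sdb1 /x
ls /x
umount /x
mount -vt ntfs /dev/sdc1 /x
ls /x
umount /x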

5 hours ago, johnnie.black said:

Also the 9 parity check errors are perfectly normal after an unclean shutdown and unrelated to this problem.

 

Yes, I agree. The most likely theory that fits all the facts ...

1 - the disks were in the array

2 - the partitions are invalid for an array disk

3 - parity was nearly perfect [meaning either the disks were always partitioned wrong (seems very unlikely, because we've never seen unRAID mispartition a disk), or the partition table got corrupted and unRAID used its own partition info (from super.dat) to guide the parity check. The latter is far more likely IMO]

 

... is that the partition table got corrupted or accidentally overwritten due to user error.

 

So removing the physical disk assignments and letting unRAID simulate the disks would allow the partitions to mount, because unRAID would place the partition in a simulated disk structure that would contain the original and valid partition table.

 

If this works, the disks can be rebuilt. If not, the whole operation can be undone and no damage would have been done in the trying. But at this point I don't see another alternative that leads to a successful recovery. And this option seems to have a decent, if not high, likelihood of success.

17 minutes ago, SSD said:

The most likely theory that fits all the facts ...

What I believe happened:

 

-Some time ago OP did a new config and added both disks to the array and resynced parity

-OP forgot to format the disks and they were never part of the array.

-OP only now noticed that both disks are unmountable

8 minutes ago, johnnie.black said:

What I believe happened:

 

-Some time ago OP did a new config and added both disks to the array and resynced parity

-OP forgot to format the disks and they were never part of the array.

-OP only now noticed that both disks are unmountable

 

This is my view too. It's easy to destroy partition tables, but I haven't seen anyone before who has, by accident, managed to create valid partitions.

 

Only dd if=old-disk of=new-disk could manage the accidental transfer of a valid partition table to an existing disk without actually running partitioning software. So this has nothing to do with any power loss.

39 minutes ago, johnnie.black said:

What I believe happened:

 

-Some time ago OP did a new config and added both disks to the array and resynced parity

-OP forgot to format the disks and they were never part of the array.

-OP only now noticed that both disks are unmountable

 

Ron did say earlier that disk7 contained a lot of data. Maybe he was mistaken. 

 

What would unRAID do in the scenario you mention? Would it rebuild parity to include the two new disks? Would it assume the partition starting sector? (My understanding is parity only protects the partition, not the entire disk.)

 

The partition structure of disk6 seems wrong for a 4T disk. Not sure Windows would have created such a partition. He did say it had come out of a USB enclosure, but it seems wrong even for that.

 

I was thinking Ron might have put the disks back in a Windows computer and inadvertently done something in Windows to repartition them.

 

Are you 100% confident that remounting of disks 6 and 7 in the array would not result in any writes to the disks or parity? Assuming both theories are possible, which would be best to try first? The simulated-disk approach would not write if there are no writes to the array, and would positively not write to the physical disk6 or disk7.

2 minutes ago, SSD said:

Are you 100% confident that remounting of disks 6 and 7 in the array would not result in any writes to the disks or parity?

He can mount the disks read-only, but IMO it doesn't matter since the disks have no unRAID data; just add ro as a mount option:

 

mount -vt ntfs -o ro /dev/sdX1 /x

 

3 minutes ago, SSD said:

The simulated-disk approach would not write if there are no writes to the array.

No problem doing this, but it will produce the same result: two unmountable disks.

1 hour ago, johnnie.black said:

What I believe happened:

 

-Some time ago OP did a new config and added both disks to the array and resynced parity

-OP forgot to format the disks and they were never part of the array.

-OP only now noticed that both disks are unmountable

Could it be that with the power outage they then became unmountable, and prior to that they were just sitting there waiting for the process to start, and I thought they were ready already?

Edited by Ron

The hdds had the green lights, and I am pretty sure I would have noticed it saying unmountable. My PC is an i3 NUC, so I have no way to hook the hdds up to it. Can I run the commands you said with PuTTY while the hdds are still in the NAS?


Since I believe, after thinking about it last night, that nothing has been written to the hdds since the rebuild of the NAS... I decided to format the HDDs in question. I did that, and now it is rebuilding #6. Does that mean there was something on 6? I am assuming it could have been as simple as a couple of data files if so.

31 minutes ago, Ron said:

Since I believe, after thinking about it last night, that nothing has been written to the hdds since the rebuild of the NAS... I decided to format the HDDs in question. I did that, and now it is rebuilding #6. Does that mean there was something on 6? I am assuming it could have been as simple as a couple of data files if so.

 

So now we'll never know. Too bad.

 

Good luck!

39 minutes ago, SSD said:

 

So now we'll never know. Too bad.

 

Good luck!

Parity is rebuilding the #6 HDD, so would that not mean that it was in the array, and will it have any data from unRAID replaced? What I am wondering now is why #7 is not being rebuilt, and still shows 60GB size after formatting. Sounds like something is still wrong with that hdd.
