
Parity issues during disk rebuild



Hello all, and thanks to the community for all the knowledge you've unselfishly shared with others. For a while I was getting UDMA CRC errors on a disk, and they kept getting worse. I finally got in another drive and ran a parity check, which completed fine, before I swapped that drive. During the drive rebuild, something went crazy with the system: the parity disk and another disk (oddly, the very next disk in the array) kept dropping offline and coming back. I checked some cables and did some reboots, but the parity disk now has a red X and says it's disabled. The array seems to have stabilized now, and no drives are dropping offline anymore. My question is this: is it likely that I can retrieve the data that was in the middle of rebuilding before the parity freak-out? I was considering doing a new config with parity, but since one disk has now just been formatted, would that even work? Is there a better way to tackle this, or should I just suck it up, rebuild parity with the current config, and try to hunt down the 6-7TB of media I lost?

Link to comment
57 minutes ago, Jon4got2 said:

UDMA CRC errors on a disk that was getting worse.  I finally got in another drive

CRC errors are connection issues and replacing the disk is not the usual solution.

 

Some things are unclear from your description about exactly what you did. Maybe diagnostics will clear some things up. Hopefully you haven't rebooted.

 

Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.

 

Link to comment
58 minutes ago, Jon4got2 said:

During the drive rebuild, something went crazy with the system and the parity disk and another disk (that oddly was the very next disk in the array) was falling on and offline.  Checked some cables and did some reboots, but the Parity disk now has a red X and says it's disabled.

If you only have single parity, and the drive rebuild hadn't completed, then parity can't go disabled. So that is one thing that isn't clear. Did the rebuild complete?

 

1 hour ago, Jon4got2 said:

seeing as there will be one disk now just formatted

Did you actually agree to Format anything? Format is not part of rebuilding and there are warnings against formatting a disk. If you format a disk in the parity array, then parity is updated (just like for any write operation) and so parity agrees the disk is formatted. Rebuilding a formatted disk just results in a formatted disk.

 

So, as you can see there are some things about your description that don't completely add up. Can you clarify or add anything you might have left out?

Link to comment
1 hour ago, Jon4got2 said:

During the drive rebuild, something went crazy with the system and the parity disk and another disk (that oddly was the very next disk in the array) was falling on and offline.

It is very common to disturb the connections of other disks when replacing a disk. You probably should have asked for advice earlier.

Link to comment

I formatted the new disk before I added it to the array. I've done this before, but it's possible I did something out of sequence. And you are 100% correct, Constructor. I should have asked earlier. I'm thinking I may temporarily reinstall the drive that was giving me the CRC errors to try to rebuild the parity. [Correction: I can't do this, since I upgraded that drive to a 10TB.] Then, if I get the errors again, I'll review the replacement procedure and try to follow it more carefully. Could a drive enclosure be causing those CRC issues as well?

Edited by Jon4got2
Link to comment
2 hours ago, Jon4got2 said:

I formatted the new disk

Many people seem to have a very vague idea what format means. Format means "write an empty filesystem (of some specific type) to this disk". That is what it has always meant in every operating system you have ever used. Not that it matters (see below), but what filesystem did you format it with?

2 hours ago, Jon4got2 said:

before I added to the array

I am guessing you mean you "replaced" a disk with the new disk. For clarity, I usually like to reserve the word "add" to mean adding a disk to a new slot in the parity array. In either case, formatting a disk before putting it in the array is completely pointless.

 

If replacing a disk for rebuilding, the disk will be completely overwritten by the rebuild, so formatting the replacement disk before doing the replacement accomplishes nothing.

 

If adding a disk to a new slot in the parity array, Unraid will clear it (writing all zeros) so parity is maintained. So formatting before adding accomplishes nothing.
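To see why Unraid clears rather than formats a newly added disk, here is a minimal sketch that assumes single parity behaves like a byte-wise XOR across the data disks. This is a simplification (real disks are handled per sector), and the 4-byte "disks" below are hypothetical. A disk of all zeros leaves the XOR unchanged, so parity stays valid without being recomputed; a disk carrying any non-zero bytes, such as a fresh filesystem, would not.

```python
from functools import reduce

def xor_parity(disks):
    """Byte-wise XOR across equal-length disks (simplified single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))

# Hypothetical 4-byte "disks" standing in for whole drives.
disk1 = bytes([0x12, 0x34, 0x56, 0x78])
disk2 = bytes([0x9A, 0xBC, 0xDE, 0xF0])
parity = xor_parity([disk1, disk2])

# A cleared disk is all zeros, and x ^ 0 == x for every byte,
# so adding it to the array leaves the existing parity valid.
cleared = bytes(4)
assert xor_parity([disk1, disk2, cleared]) == parity

# A pre-formatted disk has non-zero bytes (here "XFSB", a stand-in for
# filesystem metadata), so parity would no longer match without a resync.
formatted = bytes([0x58, 0x46, 0x53, 0x42])
print(xor_parity([disk1, disk2, formatted]) == parity)  # False
```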

 

2 hours ago, Jon4got2 said:

Constructor

Newbie😉

 

I'm afraid I'm still unclear about the state of your system and its data, and what you want to do now.

3 hours ago, trurl said:

Maybe diagnostics will clear some things up.

 

Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.

 

If all you want to do now is rebuild parity it is very easy to rebuild it to the same disk assuming that disk is OK and you don't have any other problems. Possibly your main problems are just due to bad connections.

 

But, what about this?

3 hours ago, Jon4got2 said:

is it likely that I can retrieve the data that was in the middle of rebuilding before the Parity freak-out?

I think, if nothing else, diagnostics will help me understand the current situation better than anything you have said so far, and can serve as a basis for further communication between us.

Link to comment
27 minutes ago, Jon4got2 said:

drive enclosures

I don't have any personal experience with those. The main thing is to make sure they have a separate port for each disk. Some people try to use enclosures with only one port for multiple disks, sometimes even a USB port (USB is not reliable enough for a permanent connection).

 

I have some other things to do right now. I will study your diagnostics and get back to you.

Link to comment

Looks like disk6 is newly formatted. Is this the disk you rebuilt? Did it ever tell you it was unmountable?

 

And parity is disabled as mentioned. SMART for disk6 and parity looks OK.

 

Not related to your problems, but your system share has files on the array, and it is set to be moved to the array. You want this share to be entirely on cache and set to stay on cache. Since dockers use this share and always have open files in it, your docker performance will be impacted by the slower parity array, and your dockers will keep array disks spinning. The same applies to VMs and the system/domains shares, but you don't currently have VMs enabled.

Link to comment

Like the rest of the Unraid OS, the syslog is in RAM, so it starts over when you reboot. But there was enough after the reboot for me to see some things and answer some questions.

 

ata1 is the connection to parity, ata2 is the connection to disk2

Aug 29 13:38:24 Tower kernel: ata1.00: ATA-9: WDC WD100EMAZ-00WJTA0, JEGL12UN, 83.H0A83, max UDMA/133
...
Aug 29 13:38:24 Tower kernel: ata2.00: ATA-9: WDC WD80EMAZ-00WJTA0, 7HKGB64F, 83.H0A83, max UDMA/133
...
Aug 29 13:38:33 Tower kernel: md: import disk0: (sdb) WDC_WD100EMAZ-00WJTA0_JEGL12UN size: 9766436812 
...
Aug 29 13:38:33 Tower kernel: md: import disk2: (sdc) WDC_WD80EMAZ-00WJTA0_7HKGB64F size: 7814026532 
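The port-to-slot matching above works by serial number: the ata line and the md import line for the same drive both contain it. Below is a minimal Python sketch of that matching, run on just the four lines quoted here. The regular expressions assume exactly the line formats shown; they are not an official parser and may need adjusting for other kernel or Unraid versions. In this log, md's disk0 is the 10TB parity drive, which is why ata1 turns out to be parity.

```python
import re

# Only the four lines quoted in the post; the elided lines are not reproduced.
SYSLOG = """\
Aug 29 13:38:24 Tower kernel: ata1.00: ATA-9: WDC WD100EMAZ-00WJTA0, JEGL12UN, 83.H0A83, max UDMA/133
Aug 29 13:38:24 Tower kernel: ata2.00: ATA-9: WDC WD80EMAZ-00WJTA0, 7HKGB64F, 83.H0A83, max UDMA/133
Aug 29 13:38:33 Tower kernel: md: import disk0: (sdb) WDC_WD100EMAZ-00WJTA0_JEGL12UN size: 9766436812
Aug 29 13:38:33 Tower kernel: md: import disk2: (sdc) WDC_WD80EMAZ-00WJTA0_7HKGB64F size: 7814026532
"""

# Assumed format: "ataN.00: ATA-x: <model>, <serial>, <firmware>, ..."
ATA_RE = re.compile(r"kernel: (ata\d+)\.00: ATA-\d+: [^,]+, ([^,]+),")
# Assumed format: "md: import diskN: (sdX) <model>_<serial> size: ..."
MD_RE = re.compile(r"md: import (disk\d+): \((\w+)\) (\S+) size:")

def map_ports_to_slots(log: str) -> dict[str, tuple[str, str]]:
    """Return {ata port: (md slot, sdX device)} by matching drive serials."""
    port_by_serial = {}
    slot_by_serial = {}
    for line in log.splitlines():
        if m := ATA_RE.search(line):
            port_by_serial[m.group(2)] = m.group(1)
        elif m := MD_RE.search(line):
            serial = m.group(3).rsplit("_", 1)[-1]  # ID ends in "_<serial>"
            slot_by_serial[serial] = (m.group(1), m.group(2))
    return {port_by_serial[s]: slot_by_serial[s]
            for s in port_by_serial if s in slot_by_serial}

for port, (slot, dev) in sorted(map_ports_to_slots(SYSLOG).items()):
    print(f"{port} -> {slot} ({dev})")
# ata1 -> disk0 (sdb)
# ata2 -> disk2 (sdc)
```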

 

(emulated) disk6 was unmountable

Aug 29 13:38:43 Tower emhttpd: shcmd (60): mkdir -p /mnt/disk6
Aug 29 13:38:43 Tower emhttpd: shcmd (61): mount -t xfs -o noatime,nodiratime /dev/md6 /mnt/disk6
Aug 29 13:38:43 Tower kernel: XFS (md6): Mounting V5 Filesystem
Aug 29 13:38:43 Tower kernel: XFS (md6): Corruption warning: Metadata has LSN (1:7162) ahead of current LSN (1:843). Please unmount and run xfs_repair (>= v4.3) to resolve.
...
Aug 29 13:38:44 Tower emhttpd: shcmd (62): umount /mnt/disk6
Aug 29 13:38:44 Tower root: umount: /mnt/disk6: not mounted.
Aug 29 13:38:44 Tower emhttpd: shcmd (62): exit status: 32
Aug 29 13:38:44 Tower emhttpd: shcmd (63): rmdir /mnt/disk6

 

rebuild of disk6 started but parity and disk2 were disconnected. (SMART for disk2 also OK)

Aug 29 13:39:18 Tower kernel: ata1.00: exception Emask 0x10 SAct 0x100000 SErr 0x4890000 action 0xe frozen
...
Aug 29 13:39:18 Tower kernel: ata1: hard resetting link
...
Aug 29 13:39:18 Tower kernel: ata2: hard resetting link

 

rebuild aborted and you formatted disk6

Aug 29 13:39:45 Tower kernel: md: recovery thread: exit status: -4
Aug 29 13:39:46 Tower emhttpd: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
Aug 29 13:39:46 Tower emhttpd: shcmd (107): /sbin/wipefs -a /dev/md6

 

You may recall I said

2 hours ago, trurl said:

Format means "write an empty filesystem (of some specific type) to this disk".

When you format a disk in the parity array, Unraid treats this exactly like it does any other write operation. It updates parity. After formatting a disk in the parity array, parity agrees that the disk has an empty filesystem. So rebuilding a disk that has been formatted will result in an empty filesystem.
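A minimal sketch of that point, modelling single parity as a byte-wise XOR over hypothetical, equal-length byte strings. Formatting disk6 is a write like any other, so parity is updated to match the new contents. Whichever write method is used, the resulting parity is the same; the sketch uses the read-modify-write form (new parity = old parity XOR old data XOR new data). A later rebuild of disk6 from parity and the remaining disks therefore reproduces the empty filesystem, not the original files.

```python
from functools import reduce

def xor_bytes(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (simplified single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

disk2 = b"DATA-ON-DISK2"     # hypothetical contents of a healthy disk
disk6 = b"MEDIA-FILES-6"     # hypothetical original contents of disk6
parity = xor_bytes(disk2, disk6)

# Formatting disk6 writes new bytes; parity is updated so that it now
# describes those new bytes instead of the old media.
empty_fs = b"EMPTY-FS-XXXX"  # hypothetical stand-in for an empty filesystem
parity = xor_bytes(parity, disk6, empty_fs)  # read-modify-write update

# Rebuilding disk6 from parity and the other disks yields what parity
# now describes: the empty filesystem.
rebuilt = xor_bytes(parity, disk2)
print(rebuilt)  # b'EMPTY-FS-XXXX'
```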

 

So the answer to one of your earlier questions

6 hours ago, Jon4got2 said:

is it likely that I can retrieve the data that was in the middle of rebuilding before the Parity freak-out?

is NO.

 

Do you have backups?

 

I can tell you how to rebuild parity but the connection issues you have been having will probably make this a problem.

Link to comment

I have a second Unraid machine that I use for experimenting, but it also syncs many of my more important media categories. I should be able to get some of those back. I also have a G Suite account, but I had it disconnected for a while until recently.

I haven't seen those same connection issues in the logs since the server came back online today. However, I noticed that sabnzbd and krusader keep losing connection in the GUI for a second at a time. I just tried pausing sabnzbd, and it seems to be not only losing connection but restarting, because it resumes the downloads.

Link to comment

I really wish you had asked for help very early on.

 

Since disk6 was disabled, it was being emulated by all the other disks from the parity calculation. But some of those disks were having connection issues. So it is possible disk6 wasn't really corrupt, but the emulation of disk6 was corrupted by the bad connections on the other disks.
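The emulation argument can be sketched the same way, under the same simplified XOR model with hypothetical byte values: the emulated disk6 is parity XORed with every other data disk, so a bad read from any one of those disks lands directly in the emulated contents. This models a flaky connection crudely as wrong bytes; it illustrates the reasoning above, not how Unraid's md driver actually handles read errors.

```python
from functools import reduce

def xor_bytes(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (simplified single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

disk2 = b"\x10\x20\x30\x40"  # hypothetical healthy contents
disk5 = b"\x0a\x0b\x0c\x0d"
disk6 = b"\xde\xad\xbe\xef"  # real contents of the disabled disk
parity = xor_bytes(disk2, disk5, disk6)

# Emulating disk6: XOR parity with every other data disk.
emulated = xor_bytes(parity, disk2, disk5)
assert emulated == disk6

# If a flaky link makes one read of disk2 come back wrong, that error
# flows straight into the emulated disk6.
bad_read = b"\x10\x20\x00\x40"  # third byte misread
emulated_bad = xor_bytes(parity, bad_read, disk5)
print(emulated_bad == disk6)  # False: disk6 looks corrupt though its data is intact
```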

 

In any case, instead of formatting the disk, the correct thing to do would have been to repair its filesystem.

 

Here is another recent thread that you may find educational:

 

 

Link to comment

Yeah I have a habit of trying to truly break something properly first before I ask for help.  Live and learn.  Thanks again for your help.  I will definitely read through that thread.

Do you believe it is connection issues that could be causing the restarts in sabnzbd and krusader as well?

Link to comment

At this point my recommendation is to disable docker service (Settings-Docker) and quit using the server until you successfully rebuild parity (your array is currently unprotected).

 

Then before enabling docker again we can work on this

2 hours ago, Jon4got2 said:

Good catch on the system share.

 

To rebuild to the same disk (whether parity or data)

  1. Stop array
  2. Unassign disabled disk 
  3. Start array with disabled disk unassigned 
  4. Stop array 
  5. Reassign disabled disk 
  6. Start array to begin rebuild 

 

Link to comment

Right now all the disks are mounted and everything looks good except parity. So, if I understand correctly, I should just rebuild parity using your steps above and forget about retrieving the disk 6 data? That's fine with me; I just want to make sure I'm following properly.

 

I also threw old disk 6 into a shucked USB drive case I had lying around to see if I could pull the files from it. UD wanted to format it, and I didn't see a way to mount it. I also tried using PowerISO on Windows to read those files, and PowerISO said the disk was unmountable.

Link to comment
32 minutes ago, Jon4got2 said:

I also threw old disk 6 into a shucked USB drive case I had lying around to see if I could pull the files from it. UD wanted to format it, and I didn't see a way to mount it. I also tried using PowerISO on Windows to read those files, and PowerISO said the disk was unmountable.

I don't know about that software, but Windows does not natively support any of the filesystems used by Unraid. The software often recommended is UFS Explorer.

 

You might try repairing the disk's filesystem as an Unassigned Device in Unraid.

 

https://wiki.unraid.net/Check_Disk_Filesystems#Drives_formatted_with_XFS

 

Be sure to note this part in the Additional Comments of that wiki:

Quote

If you want to test and repair a non-array drive, you would use the drive's partition symbol (e.g. sdc1, sdj1, sdx1, etc), not the array device symbol (e.g. md1, md13, etc). So the device name would be something like /dev/sdj1, /dev/sdx1, etc.

 

 

 

Link to comment
