disk 12 rebuild ends with disk unformatted

daniel.boone · December 1, 2014

Disk 12 (1TB) failed, so I bought a new 3TB disk. I backed up the entire contents of my USB thumb drive and precleared the new disk successfully.

Last night I removed the old disk, rebooted, confirmed the disk was missing, stopped the array, added the new disk, checked off the disk rebuild, started the process and walked away until this morning. Looking at it now, the disk has a green ball but says unformatted. The new disk is in the array and the array is started, but my disk 12 contents are not there.

Version 6 Beta 6

Here is the portion of the syslog that I think is relevant.

Nov 30 19:04:05 Tower kernel: mdcmd (69): check CORRECT
Nov 30 19:04:05 Tower kernel: md: recovery thread woken up ...
Nov 30 19:04:05 Tower kernel: md: recovery thread rebuilding disk12 ...
Nov 30 19:04:05 Tower kernel: md: using 1536k window, over a total of 2930266532 blocks.
Nov 30 19:04:06 Tower emhttp: shcmd (254): :>/etc/samba/smb-shares.conf
Nov 30 19:04:06 Tower avahi-daemon[3587]: Files changed, reloading.
Nov 30 19:04:06 Tower emhttp: Restart SMB...
Nov 30 19:04:06 Tower emhttp: shcmd (255): killall -HUP smbd
Nov 30 19:04:06 Tower emhttp: shcmd (256): cp /etc/avahi/services/smb.service- /etc/avahi/services/smb.service
Nov 30 19:04:06 Tower avahi-daemon[3587]: Files changed, reloading.
Nov 30 19:04:06 Tower avahi-daemon[3587]: Service group file /etc/avahi/services/smb.service changed, reloading.
Nov 30 19:04:06 Tower emhttp: shcmd (257): ps axc | grep -q rpc.mountd
Nov 30 19:04:06 Tower emhttp: _shcmd: shcmd (257): exit status: 1
Nov 30 19:04:06 Tower emhttp: shcmd (258): /usr/local/sbin/emhttp_event svcs_restarted
Nov 30 19:04:06 Tower emhttp_event: svcs_restarted
Nov 30 19:04:06 Tower emhttp: shcmd (259): /usr/local/sbin/emhttp_event started
Nov 30 19:04:06 Tower emhttp_event: started
Nov 30 19:04:07 Tower avahi-daemon[3587]: Service "Tower" (/etc/avahi/services/smb.service) successfully established.
Nov 30 19:04:19 Tower kernel: docker0: port 2(vethe19d) entered forwarding state
Nov 30 19:04:20 Tower kernel: docker0: port 3(veth4a5e) entered forwarding state
Nov 30 19:04:20 Tower kernel: docker0: port 4(vethb767) entered forwarding state
Nov 30 19:04:20 Tower kernel: docker0: port 5(vethe50b) entered forwarding state
Nov 30 19:06:17 Tower avahi-daemon[3587]: Withdrawing workstation service for veth8101.
Nov 30 19:06:17 Tower kernel: docker0: port 1(veth8101) entered disabled state
Nov 30 19:06:17 Tower kernel: device veth8101 left promiscuous mode
Nov 30 19:06:17 Tower kernel: docker0: port 1(veth8101) entered disabled state
Dec 1 01:08:20 Tower kernel: mdcmd (70): spindown 3
Dec 1 01:08:20 Tower kernel: mdcmd (71): spindown 4
Dec 1 01:08:21 Tower kernel: mdcmd (72): spindown 5
Dec 1 01:08:22 Tower kernel: mdcmd (73): spindown 6
Dec 1 01:08:23 Tower kernel: mdcmd (74): spindown 7
Dec 1 01:08:23 Tower kernel: mdcmd (75): spindown 8
Dec 1 01:08:24 Tower kernel: mdcmd (76): spindown 13
Dec 1 01:08:25 Tower kernel: mdcmd (77): spindown 15
Dec 1 04:52:58 Tower kernel: mdcmd (78): spindown 1
Dec 1 04:52:58 Tower kernel: mdcmd (79): spindown 2
Dec 1 04:52:59 Tower kernel: mdcmd (80): spindown 9
Dec 1 04:53:00 Tower kernel: mdcmd (81): spindown 10
Dec 1 04:53:00 Tower kernel: mdcmd (82): spindown 11
Dec 1 04:53:01 Tower kernel: mdcmd (83): spindown 14
Dec 1 06:14:15 Tower kernel: md: sync done. time=40210sec
Dec 1 06:14:16 Tower kernel: md: recovery thread sync completion status: 0
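Only three md lines in that excerpt really matter: the block count, the elapsed time and the completion status. A minimal Python sketch that pulls them out, assuming the md driver counts 1 KiB blocks (consistent with 2930266532 blocks being a 3TB disk). The function name `summarize_rebuild` is hypothetical, and reading status 0 as "completed normally" is an assumption about the md driver's convention.

```python
import re

# The relevant md lines, copied from the syslog excerpt above.
SYSLOG = """\
Nov 30 19:04:05 Tower kernel: md: using 1536k window, over a total of 2930266532 blocks.
Dec 1 06:14:15 Tower kernel: md: sync done. time=40210sec
Dec 1 06:14:16 Tower kernel: md: recovery thread sync completion status: 0
"""


def summarize_rebuild(log: str) -> dict:
    """Extract size, duration, throughput and status of an md rebuild.

    Assumes md reports sizes in 1 KiB blocks.
    """
    blocks = int(re.search(r"over a total of (\d+) blocks", log).group(1))
    seconds = int(re.search(r"sync done\. time=(\d+)sec", log).group(1))
    status = int(re.search(r"sync completion status: (-?\d+)", log).group(1))
    return {
        "size_tb": blocks * 1024 / 1e12,       # decimal terabytes, as drives are sold
        "hours": seconds / 3600,
        "mib_per_s": blocks / 1024 / seconds,  # KiB -> MiB, averaged over the run
        "status": status,                      # 0 presumably means normal completion
    }


print(summarize_rebuild(SYSLOG))
```

On these lines it works out to roughly 11.2 hours at about 71 MiB/s with status 0, which suggests the rebuild ran to the end rather than aborting; the problem is more likely in what was reconstructed than in the rebuild stopping early.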

Hoping I can still save the original disk 12 contents. Recommendations?

Thanks,

itimpi · December 1, 2014

Did the disk show as unformatted before the rebuild? The rebuild process would not fix such an issue. A disk suddenly showing as unformatted does not normally mean that it really is unformatted; instead it means that unRAID's attempt to mount the disk failed for some reason. The most common cause is some level of file system corruption. Luckily, the file systems used with unRAID have a level of redundancy, so they can normally be repaired as long as the physical media is OK.

If the disk is in reiserfs format, then the normal recovery action is:

1. Stop the array and then restart it in maintenance mode.
2. From a console/telnet session, run a command of the form
   reiserfsck --check /dev/md12
   (the md?? number corresponds to the disk to be checked). This command will run for some time (particularly with large disks), so if using telnet make sure you do not close the telnet window while it is running (or consider running the command under screen).
3. The output from the previous step will indicate whether there appears to be a valid reiserfs file system on the disk and, if so, the recommended recovery action. Typically this involves rerunning the command with --rebuild-tree in place of --check, but other options can be suggested. If in any doubt, check back here before proceeding, as taking the wrong action is likely to be unrecoverable.

If you are using a different file system then the basic process is the same although the name of the check/repair utility will vary (e.g. xfs_repair for XFS format).
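The only thing that changes between file systems is the tool and its check-only switch. A small Python sketch that builds the command line for either case, assuming the array is in maintenance mode and disk N maps to /dev/mdN as above. The function `check_command` is hypothetical, and mapping `repair=True` to `--fix-fixable` is a simplification: for reiserfs the right repair option is whatever the check output recommends.

```python
import shlex
from typing import List


def check_command(fs_type: str, disk_number: int, repair: bool = False) -> List[str]:
    """Build the argv to check (or, with repair=True, minimally repair) disk N.

    Assumes the array was restarted in maintenance mode, so /dev/mdN exists
    but is not mounted, and that disk N maps to /dev/mdN.
    """
    device = f"/dev/md{disk_number}"
    if fs_type == "reiserfs":
        # --check only reads; --fix-fixable is the least drastic repair.
        return ["reiserfsck", "--fix-fixable" if repair else "--check", device]
    if fs_type == "xfs":
        # -n is xfs_repair's no-modify (check-only) mode.
        return ["xfs_repair", device] if repair else ["xfs_repair", "-n", device]
    raise ValueError(f"no check tool known for {fs_type!r}")


print(shlex.join(check_command("reiserfs", 12)))  # reiserfsck --check /dev/md12
print(shlex.join(check_command("xfs", 12)))       # xfs_repair -n /dev/md12
```

The sketch deliberately only builds the command and never runs it, so the choice to modify the disk stays a separate, explicit step.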

As a side issue, do you still have the original disk that failed? If so, there is a good chance that, if anything goes wrong with the process given above, data can still be recovered from it, so keep it somewhere safe until you have recovered your system.

daniel.boone · December 1, 2014

I ran the check.

It came back right away with:

reiserfs_open: the reiserfs superblock cannot be found on /dev/md12. Failed to open the filesystem.

The recommendation is to run --rebuild-sb if the partition table has not changed.

itimpi · December 1, 2014


That is worrying. Trying to rebuild the superblock is not something I would recommend before carrying out less drastic action. I would wait to see if anyone else has suggestions for something that could be tried first.

Just to check: are you sure that the disk is in reiserfs format? That error message is something I would not be surprised to see if it was in another format. The unRAID GUI should tell you what it thinks the drive is meant to be formatted as.

You can also get the 'unformatted' state reported if there is a mismatch between the format the GUI thinks is appropriate and the real format of the underlying disk. Since the default for new disks with v6 is now XFS, it is quite possible that it is not reiserfs.
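Whether the disk holds reiserfs or XFS can be checked directly from the on-disk signatures instead of trusting the GUI. A minimal read-only Python sketch, assuming the standard layouts: XFS stores the magic `XFSB` at byte 0, and reiserfs 3.x keeps its superblock 64 KiB into the device with the magic string (`ReIsErFs`, `ReIsEr2Fs` or `ReIsEr3Fs`) 52 bytes into that superblock. The names `detect_fs` and `detect_device` are hypothetical; `blkid /dev/md12` is the usual command-line way to get the same answer.

```python
from typing import Optional

XFS_MAGIC = b"XFSB"                # at byte 0 of an XFS filesystem
REISERFS_SUPERBLOCK = 64 * 1024    # reiserfs 3.x superblock offset
REISERFS_MAGIC_FIELD = 52          # s_magic offset inside that superblock
REISERFS_MAGICS = (b"ReIsErFs", b"ReIsEr2Fs", b"ReIsEr3Fs")
PROBE_BYTES = REISERFS_SUPERBLOCK + 4096


def detect_fs(header: bytes) -> Optional[str]:
    """Guess the filesystem from the first PROBE_BYTES of a device."""
    if header[:4] == XFS_MAGIC:
        return "xfs"
    start = REISERFS_SUPERBLOCK + REISERFS_MAGIC_FIELD
    field = header[start:start + 10]
    if any(field.startswith(magic) for magic in REISERFS_MAGICS):
        return "reiserfs"
    return None  # no known signature: zeroed, damaged, or another filesystem


def detect_device(path: str) -> Optional[str]:
    """Read-only probe of a block device such as /dev/md12 (needs root)."""
    with open(path, "rb") as dev:
        return detect_fs(dev.read(PROBE_BYTES))


# Synthetic headers instead of a real disk:
reiser = bytearray(PROBE_BYTES)
pos = REISERFS_SUPERBLOCK + REISERFS_MAGIC_FIELD
reiser[pos:pos + 9] = b"ReIsEr2Fs"
print(detect_fs(bytes(reiser)))                      # reiserfs
print(detect_fs(b"XFSB" + bytes(PROBE_BYTES - 4)))   # xfs
print(detect_fs(bytes(PROBE_BYTES)))                 # None
```

A result of None on the rebuilt disk would fit the "superblock cannot be found" message: neither signature is where it should be, so the problem is not just a GUI/format mismatch.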

daniel.boone · December 1, 2014

Thanks, I'm holding off until I hear from others.

To clarify, the disk preclear was done using the Advanced Format setting as specified in that script's thread. I did not manually format the disk before starting. I was expecting the rebuild to handle all those bits.

daniel.boone · December 2, 2014

After I restored the original super.dat, I attempted another rebuild of the disk.

DD on the new formatted disk.

Here is a picture of the disk being rebuilt.

5 hours to go. I hate to say it, but I'm not too hopeful.

Thinking about it, the parity is suspect in the bad rebuild. I'm not sure if this is a version issue or just dumb luck. I will probably lose the disk 12 data and have to create a new parity with a zeroed disk 12.

daniel.boone · December 2, 2014

Just completed the rebuild with the same result :-[

From the log file:

recovery thread sync completion status: 0

The array expanded to accommodate the new drive, but the rebuilt drive remains unformatted.

New DD output

daniel.boone · December 3, 2014

Does anyone have any guidance to offer on this issue? I'd like to at least get stable. My fear is that I encounter another issue and lose another 1 to 3TB of data.

TIA

hackztor · December 4, 2014

Well, if it shows up as unformatted, rebuilding will not help any further. Checking and fixing the disk is the way. I moved away from reiserfs to XFS when this happened to a drive of mine a few months back. When your original disk failed, did it show as failed or as unformatted?

reiserfsck --check /dev/md# (easy; tells you what is needed)
reiserfsck --fix-fixable /dev/md# (hopefully this one fixes it)
reiserfsck --rebuild-tree /dev/md# (still okay, but files may end up in the lost+found folder, and chances are you might be missing something)
reiserfsck --rebuild-sb /dev/md# (last resort)

Read the link below before running the --rebuild-sb one.

http://lime-technology.com/wiki/index.php/Check_Disk_Filesystems
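The ladder above amounts to "run --check, then do the least drastic thing it asks for". A small Python sketch of that decision, assuming the --check summary names the option it wants you to run (the exact wording differs between reiserfsck versions, so treat the string matching as an assumption). The function `recommended_step` is hypothetical and only reads text; it runs nothing.

```python
from typing import Optional

# The escalation ladder from the post above, least to most drastic
# (--check itself is the read-only first step).
REPAIR_OPTIONS = ["--fix-fixable", "--rebuild-tree", "--rebuild-sb"]


def recommended_step(check_output: str) -> Optional[str]:
    """Return the most drastic repair option named in reiserfsck --check output.

    Assumption: the check summary mentions the option it wants run.
    Returns None if no repair option is named.
    """
    named = [opt for opt in REPAIR_OPTIONS if opt in check_output]
    return named[-1] if named else None


# Paraphrased from the --check result reported earlier in this thread.
sample = (
    "reiserfs_open: the reiserfs superblock cannot be found on /dev/md12. "
    "Failed to open the filesystem.\n"
    "Recommendation is to run --rebuild-sb if the partition table has not changed."
)
step = recommended_step(sample)
print(step)  # --rebuild-sb
if step in ("--rebuild-tree", "--rebuild-sb"):
    print("Drastic step: read the wiki page and ask on the forum before running it.")
```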

daniel.boone · December 4, 2014

At first the disk was failed, and it stayed failed until the replacement drive arrived.

At that point I shut down, installed the new drive so I could preclear it, checked the connections on the old drive, and restarted with both old and new in place. That's when the old disk showed as unformatted.

What's the process to move to XFS?

daniel.boone · December 5, 2014

The response to the check command is --rebuild-sb. I tried the earlier commands, but they all say the same thing: --rebuild-sb. I put the system in maintenance mode and started the process.

The last question from the rebuild command is the one I'm not so sure about. Since the drive is a 3TB replacement for the 1TB, I'm leaning toward rebuilding the journal header. I haven't changed the partition table, but unRAID may have as part of the drive expansion process. What's the recommendation?

root@Tower:~# reiserfsck --rebuild-sb /dev/md12
reiserfsck 3.6.24
Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
reiserfs_open: the reiserfs superblock cannot be found on /dev/md12.
what the version of ReiserFS do you use[1-4]
(1) 3.6.x
(2) >=3.5.9 (introduced in the middle of 1999) (if you use linux 2.2, choose this one
(3) < 3.5.9 converted to new format (don't choose if unsure)
(4) < 3.5.9 (this is very old format, don't choose if unsure)
(X) exit
1
Enter block size [4096]:
4096
No journal device was specified. (If journal is not available, re-run with --no-journal-avail
Is journal default? (y/n)[y]:
Did you use resizer(y/n)[n]:
rebuild-sb: no uuid found, a new uuid was generated (34700135-879d-439a-abee-c8a7246a7a30)
rebuild-sb: You either have a corrupted journal or have just changed
the start of the partition with some partition table editor. If you are
sure that the start of the partition is ok, rebuild the journal header.
Do you want to rebuild the journal header? (y/n)[n]:

Thanks

SSD · December 5, 2014

Just seeing this.

When the disk 12 physical disk failed, unRAID would start simulating disk 12 using the other disks in the array (including parity). Were you able to successfully access the simulated disk 12, or was it showing unformatted?

Also, when was your last parity check?

daniel.boone · December 5, 2014

At this point the replacement is in the array, so the disk is no longer being simulated.

At first, yes, it was simulating the drive. After a reboot it came up unformatted, still with an orange ball. I never had a red ball. When I added the new disk it rebuilt and still came up unformatted, but now with a green ball. The array is also showing the additional 2TB in the total, but disk 12 is not mounted (of course).

The last check was a month ago. Nothing this month due to the issues. The system says parity is valid. I have not forced a parity rewrite just yet.

If I can get the files off disk 12 from a different machine, I would remove it from the array and rewrite the parity. I just need a bit of direction to make it happen. If the data is gone, well, I can live with the fallout. I just want to get the array stable. As it is now, I run the risk of losing more data.

SSD · December 5, 2014


If you lost a drive (red-balled) and unRAID simulated it successfully, but after a rebuild it showed unformatted, either you have some type of hardware issue or this is a serious defect, because the rebuild should do, on a whole-disk basis, what simulation does ad hoc as parts of the disk are accessed.
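The point that a rebuild is just simulation applied to every block can be shown with a toy single-parity model: parity is the byte-wise XOR of all data disks, so any one missing disk equals parity XOR the surviving disks. This is a simplified Python sketch assuming one XOR parity disk and disks represented as lists of blocks, not unRAID's actual md driver; the function names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def emulated_read(disks: List[List[bytes]], parity: List[bytes], missing: int, i: int) -> bytes:
    """Simulation: reconstruct block i of the missing disk on demand."""
    others = [disk[i] for n, disk in enumerate(disks) if n != missing]
    return xor_blocks([parity[i], *others])


def full_rebuild(disks: List[List[bytes]], parity: List[bytes], missing: int) -> List[bytes]:
    """Rebuild: the same computation, applied to every block in turn."""
    return [emulated_read(disks, parity, missing, i) for i in range(len(parity))]


# Three tiny data disks of two one-byte blocks each, plus their parity.
disks = [[b"\x01", b"\x02"], [b"\x10", b"\x20"], [b"\xaa", b"\x55"]]
parity = [xor_blocks([d[i] for d in disks]) for i in range(2)]
print(full_rebuild(disks, parity, missing=1) == disks[1])  # True
```

In this model, if a block could be read correctly through simulation, the rebuild writes exactly that block. A mismatch between the two would mean the reads of the other disks or parity differed between runs (a hardware problem) or a bug, which is the distinction drawn above.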

You might want to reach out to Tom/Jonp if you feel this is what happened.

It would be interesting to do a read-only parity check just to confirm that the rebuild was accurate.

daniel.boone · December 12, 2014

Still looking for solid direction on getting the bad drive removed and re-establishing good parity. I've emailed Tom twice but haven't heard back.

Can anyone help get me back up and running? Just consider the data lost and advise how to move forward. I'd like to format disk 12. That's easy enough, but how do I get fresh parity on my drive after that? Can I remove it from the array, reboot and add it back?

Thanks

JonathanM · December 12, 2014


A correcting parity check should bring everything back in sync.
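A correcting check and the read-only check SSD suggested differ only in whether stored parity gets rewritten. A simplified Python sketch under the same single-XOR-parity assumption, with a hypothetical `parity_check` function rather than unRAID's implementation:

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_check(disks: List[List[bytes]], parity: List[bytes], correct: bool = False) -> int:
    """Count blocks whose stored parity disagrees with the data disks.

    correct=False models a read-only check (parity is left alone);
    correct=True also rewrites stored parity from the data disks.
    Data disks are never modified: they are trusted as-is.
    """
    mismatches = 0
    for i in range(len(parity)):
        expected = xor_blocks([disk[i] for disk in disks])
        if parity[i] != expected:
            mismatches += 1
            if correct:
                parity[i] = expected
    return mismatches


disks = [[b"\x01", b"\x02"], [b"\x00", b"\x00"]]  # second disk: new contents (zeros for simplicity)
parity = [b"\x11", b"\x02"]                        # block 0 parity is stale
print(parity_check(disks, parity))                 # 1 (read-only, parity unchanged)
print(parity_check(disks, parity, correct=True))   # 1 (and rewrites block 0)
print(parity_check(disks, parity))                 # 0
```

Because a correcting check trusts whatever is on the data disks, once disk 12 has been reformatted its new contents simply become part of parity; nothing from the old disk 12 is brought back.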
