Random disks keep showing as needing to be formatted: fix one, and then another appears!



Hi all, thank you for taking the time to read this post.

 

UPDATE: I have fixed the original problem, but now another disk has shown up as needing to be formatted. I fixed that one, and now a different one is showing as needing to be formatted.

 

 

 

I have had two drives fail on my system. I have dual parity, so I replaced the failed drives with new ones.

 

The rebuild is taking place right now and says it has 6 hours left, but the two disks that are being rebuilt are showing up as needing formatting. Is this correct? I'm worried that once the rebuild finishes in 6 hours the new disks will be blank and parity will be valid for two blank disks. That's 8TB of files that could be lost.

 

Is this how it's supposed to look?

 

EVMzcQsm.png

39ZAEUCm.png

Link to comment

The array has dropped in size by 8TB, so something has definitely gone wrong.

 

The first time I started the rebuild, the speed dropped so low it was going to take 800 days to complete. I canceled the rebuild, rebooted, and started it again, and then it said the disks needed to be formatted.

 

Am I screwed?  :(

Link to comment

Whatever you do, do NOT select the format option or you will definitely lose data.  A format is NEVER part of the rebuild process.

 

Since you have dual parity, what happens if you stop the rebuild, stop the array, change the two problem devices to unassigned, and then restart the array?  Ideally the two missing devices will come up as emulated and you can still see all the files.  That should mean that a successful rebuild will put everything back as it should be.

 

Another possibility is that the drives show as 'unmountable', which suggests file system level corruption that can be fixed using the appropriate recovery utility.
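If a drive does show as unmountable, a read-only check will confirm whether it is just filesystem corruption before anything gets written. A minimal sketch, assuming the affected slot is disk 5, formatted XFS, with the array started in maintenance mode (the md number is only an example):

xfs_repair -n /dev/md5    # -n = no-modify: report problems, change nothing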

 

It is quite possible that the original drives are actually fine but were just disabled due to a failed write caused by an external factor.  Therefore keep them around unchanged until you have recovered onto the new drives.  The fallback position is to try and recover the data from the drives you think have failed.
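If it ever comes to reading the old drives directly, they can usually be mounted read-only outside the array so their contents stay untouched. A sketch only, assuming the old drive appears as /dev/sdX with its data on partition 1 (device name and mount point are placeholders):

mkdir -p /mnt/olddisk
mount -o ro -t xfs /dev/sdX1 /mnt/olddisk   # read-only mount, so the original stays unchanged
ls /mnt/olddisk                             # check that the files are visible
umount /mnt/olddisk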

Link to comment


 

Thank you for the reply!

 

I have not formatted or anything. When I change the new disks to blank, it says the array cannot be started.

 

Unassigned Devices shows one as needing to be formatted and the other as XFS. But these are the new drives that have not yet had the data recovered from parity...

 

The interesting thing is that while the rebuild was taking place and the disks were showing as needing to be formatted, Unraid was writing to the new disks and reading from all the others, which is correct for a rebuild. But why would they show as needing to be formatted, with the data not available? The total size of the array according to Unraid has gone down by the sum of the two new disks.

Link to comment


With dual parity and only 2 missing data drives it should be possible to start. The screenshot in the OP is too small. Try another screenshot.
Link to comment


 

Sorry, I thought they were clickable.

 

39ZAEUCm.png

EVMzcQsm.png

 

Link to comment

Thank you for the reply. I ran:

 

xfs_repair -v /dev/md5

 

on the disk Sunday evening and ever since then it has been stuck at:

 

attempting to find secondary superblock.....

 

with more dots appearing every second for the last 5 days. I canceled it today as there is no longer any disk activity.

 

I am very confused because in maintenance mode it says "disk is disabled, contents emulated", so does that mean that xfs_repair is working or not? When I start the server and it mounts the disks, it shows them as needing to be formatted, the array size does not take the two unformatted disks into account, and the data is missing. So why does Unraid say the contents are emulated if they clearly aren't?

 

Where did the contents actually go? This all came about from doing a two-disk parity rebuild, and parity was valid before the rebuild.

 

I really don't understand what's going on with it; it doesn't seem to add up.

 

The worst part is that, due to split level, I could have lost loads of pictures, videos, etc. for projects I have not started yet that I naively thought were protected. All my old stuff is on Dropbox, but more recent stuff I dumped on Unraid waiting to be sorted.

 

I would really be so grateful to anybody that could help!  :-* :-*

Link to comment

...

I am very confused because in maintenance mode it says "disk is disabled, contents emulated", so does that mean that xfs_repair is working or not? When I start the server and it mounts the disks, it shows them as needing to be formatted, the array size does not take the two unformatted disks into account, and the data is missing. So why does Unraid say the contents are emulated if they clearly aren't? ...

The contents of the disk are a corrupted filesystem. That corrupted filesystem is why unRAID says the disk needs to be formatted: because of the corruption, it doesn't appear to be a correctly formatted disk.

 

The disk is emulated because it was disabled due to a write failure. The only way to enable a disk is to rebuild it. Anything you try to do to the disabled disk is actually done to the emulated disk, including the filesystem repair.

 

So, the filesystem repair is working on the emulated disk, because the disk is disabled. If instead you rebuild the disk, it can be enabled again if the rebuild is successful, but it will still have filesystem corruption, because a corrupt filesystem is the contents of the disk (and of the emulated disk). And after the rebuild it would still show as needing to be formatted, because the corruption makes it look like an incorrectly formatted disk.
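Put another way, while a disk is disabled the /dev/mdX device is the emulated, parity-protected view of that slot, so that is what any check or repair should be pointed at. A sketch, assuming disk 5 (do not run repairs against the raw /dev/sdX device of an array member):

xfs_repair -n /dev/md5   # check only: reports corruption on the emulated disk
xfs_repair -v /dev/md5   # actual repair, applied via the emulated disk so parity is kept in step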

 

I know this doesn't help much in proceeding to recover your files, but thought it might clear up some of your confusion.

 

You may have to see if you can get the files from backup. You should always have at least one extra copy of any file you consider irreplaceable. unRAID parity can rebuild a disk, but it cannot help recover a corrupt file, a file accidentally deleted, or some other scenario where a file is lost. The same is true of any other RAID or storage system; backups are still required.

Link to comment

Thank you again.

 

I did a rebuild of the drives previously and they still said they were being emulated after the rebuild. Anyway, I have done it a second time and the drives are showing as unformatted. I ran xfs_repair and got this on the disks:

 

Linux 4.4.15-unRAID.

root@tower:~# xfs_repair -v /dev/md4

Phase 1 - find and verify superblock...

bad primary superblock - bad magic number !!!

 

attempting to find secondary superblock...

 

<few thousand more dots>

 

..................................found candidate secondary superblock...

verified secondary superblock...

writing modified primary superblock

        - block cache size set to 1470096 entries

sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97

resetting superblock realtime bitmap ino pointer to 97

sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98

resetting superblock realtime summary ino pointer to 98

Phase 2 - using internal log

        - zero log...

zero_log: head block 3289420 tail block 3289415

ERROR: The filesystem has valuable metadata changes in a log which needs to

be replayed.  Mount the filesystem to replay the log, and unmount it before

re-running xfs_repair.  If you are unable to mount the filesystem, then use

the -L option to destroy the log and attempt a repair.

Note that destroying the log may cause corruption -- please attempt a mount

of the filesystem before doing this.

 

 

SECOND DISK:

 

Linux 4.4.15-unRAID.

root@tower:~# xfs_repair -v /dev/md8

Phase 1 - find and verify superblock...

bad primary superblock - bad CRC in superblock !!!

 

attempting to find secondary superblock...

 

<few thousand more dots>

 

.................found candidate secondary superblock...

verified secondary superblock...

writing modified primary superblock

        - block cache size set to 1470088 entries

Phase 2 - using internal log

        - zero log...

zero_log: head block 109735 tail block 109729

ERROR: The filesystem has valuable metadata changes in a log which needs to

be replayed.  Mount the filesystem to replay the log, and unmount it before

re-running xfs_repair.  If you are unable to mount the filesystem, then use

the -L option to destroy the log and attempt a repair.

Note that destroying the log may cause corruption -- please attempt a mount

of the filesystem before doing this.

 

 

I am currently looking on here and Google for the correct command to mount the filesystem; just posting the logs here before I lose them.
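For what it's worth, the sequence the error message describes would look roughly like this. A sketch only, for disk 4 in maintenance mode with a temporary mount point (the -L option is the last resort the message warns about):

mkdir -p /tmp/disk4
mount -t xfs /dev/md4 /tmp/disk4   # mounting replays the metadata log
umount /tmp/disk4
xfs_repair -v /dev/md4             # re-run the repair once the log has been replayed
# only if the mount itself fails:
# xfs_repair -vL /dev/md4          # -L zeroes the log and can lose recent metadata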

Link to comment

I have repaired the disks and got everything working, then another disk showed as needing to be formatted. I replaced it and did a rebuild, and now a different disk is showing as needing to be formatted. This is now the fourth disk. I'm starting to think there is a bigger problem here somewhere.

 

Can I post logs or something that might help, if someone could take a look?

 

thank you all

Link to comment

Tools - Diagnostics. Post complete zip.

 

What is the exact model of your power supply?

 

Have you done a memtest recently?

 

Did you test (with preclear or otherwise) any of your disks before adding them to your array?
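For any disk that was never precleared, a SMART report is at least a quick way to see whether the drive itself is complaining. A sketch, with /dev/sdb standing in for whichever drive is being checked:

smartctl -a /dev/sdb | grep -i -e reallocated -e pending -e uncorrect   # key failure indicators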

 

It's a CX750M.

 

I did a memtest around 6 months ago when I upgraded the server.

 

I have precleared most of the disks; some have not been. The disks that have been acting up are a mix of new and old ones.

tower-diagnostics-20160801-1435.zip

Link to comment

Are the problem disks on the same controller card?

 

Have you tried reseating the controller card in its slot?

 

Have you checked all cable connections, power and SATA, at both ends?

 

Different controller cards.

 

Have reseated the one card.

 

I have not checked the PSU end, but I have checked the other end.

 

I'll take everything out and reseat them all and run memtest.

 

Thank you so much for the help.

 

Anything in the logs that stands out?

Link to comment

Anything in the logs that stands out?

 

Do you have older logs from when the errors occurred? They could be more helpful.

 

I do not have any older logs, I'm sorry.

 

I got everything running again, and now another disk has gone and has loads of errors. I have stripped out all the cables and redone them, reseated the cards, etc.

 

Is it possible there is some underlying filesystem corruption on the disks? I have moved files onto the disk that has now failed; is it possible to move corrupted files and damage a new disk?

 

I have attached a new set of logs:

https://db.tt/qXVKYuoN

 

Regards

Link to comment

You have what looks like two SAS2LP controllers in the two bottom PCIe x16 slots. If that's the case, these would be my number one suspect: disks 3, 4 and 8 referenced in this thread are connected to one or the other, and there have been reports of similar issues with these controllers from a small number of users.

 

If you are not using virtualization, disable VT-d in the BIOS; it may solve the issue.
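To confirm whether VT-d is actually active before and after changing the BIOS setting, the kernel log can be checked from the console. A sketch; exact messages vary by board and firmware:

dmesg | grep -i -e dmar -e iommu   # lines mentioning DMAR/IOMMU indicate VT-d is in use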

Link to comment


 

Ok fantastic thank you. I will try that!

 

 


Link to comment
