Sandwich Posted March 11, 2021

Trying to set up an unRaid box to replace my Drobo 5N (which, thankfully, still works after 5+ years). I've got a Gigabyte GA-Z97MX-Gaming 5 motherboard (not sure what SATA controller it has, as the specs page doesn't seem to say). I have a WD Red 6Tb drive which was in the Drobo for 2.5 years and recently (supposedly) crashed; the Drobo ejected it from the array for some reason. I took it in for warranty replacement, where they did a thorough test that took a couple of days and reported the access time for each and every sector of the disk. It apparently passed with flying colors, so I'm not sure why it was ejected from the Drobo. In any case, I gave it a complete, full NTFS reformat in my Windows PC and ran `chkdsk /R`, both of which completed with no errors or issues. At this point I'm very confused as to why it was ejected from the Drobo.

So I plugged it into my otherwise-empty unRaid box to see what happens. As expected, it reported that the NTFS-formatted disk was unmountable, so I clicked to have unRaid format it. Now the "FS" column says "xfs", but it reports that the disk is "Unmountable: No file system". I have no idea what's going on. Finally, the disk log shows a lot of errors, but I don't know what's going on with those either. I'm attaching both the SMART report and the copied disk log (is there a better way to extract that log than just copy-paste?). Any ideas? Thanks so much for any help you can give me.

6tb disk log.txt
cube-smart-20210311-1119.zip
JorgeB Posted March 11, 2021

Try replacing or swapping both cables; if there are still issues, please post the complete diags: Tools -> Diagnostics
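(If it's easier from a terminal: running the `diagnostics` command should produce the same zip in the logs folder on the flash drive, and a single drive's SMART report can be saved with smartctl. A minimal sketch, where the device name is just an example:

smartctl -a /dev/sdb > /boot/smart-report.txt

That also answers the copy-paste question above: redirecting command output to a file on /boot makes it easy to grab from the flash share.)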
Sandwich Posted March 11, 2021

I've swapped SATA cables with new ones out of the bag; no immediate change. I then stopped the array to try formatting the drive again. After about 2-3 minutes it just stopped formatting, still reporting "Unmountable: No file system", with a notification on the side:

Quote: Unraid array errors: 11-03-2021 18:22 Warning [CUBE] - array has errors. Array has 1 disk with read errors

The disk log is attached below, as is the full diagnostics as requested. Thank you so much for your time and assistance with this!

6tb new cables formatting stopped errors.txt
cube-diagnostics-20210311-1824.zip
JorgeB Posted March 11, 2021

If the new cables didn't help, it's almost certainly a disk problem. Sometimes errors are weird; I just recently had a disk that passed every SMART test, and you could make a perfect copy of it with dd, but try to mount it with any filesystem and you got error after error.
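(A dd read test along those lines, as a rough sketch; replace /dev/sdX with the actual device. With of=/dev/null nothing is written to the disk, it just forces every sector to be read, so any unreadable sector shows up as an I/O error:

dd if=/dev/sdX of=/dev/null bs=1M status=progress

)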
Sandwich Posted March 11, 2021

Hmm. Ok, here's the extra wrench in the works: when the Drobo kicked out the 6Tb, I immediately bought an 8Tb "replacement" before realizing the 6Tb was still under warranty. Then, since the Drobo was down to single-disk redundancy, I ran the 8Tb through a week-long `badblocks` test (which it passed), and then installed it into my empty unRaid box. It mounted fine, I made it a share (or whatever the terminology is around creating shares on drives), and began another week-long process: using `rsync` to copy over all the data (just under 6Tb of data) from the Drobo to the 8Tb drive in unRaid. That completed successfully, and for a while (5-10 days?), unRaid was working super-fast (relative to the slowpoke Drobo) on the network with just the 8Tb drive.

Then, just in the last few days, while trying to figure out the issue with the 6Tb drive, the 8Tb drive suddenly stopped being recognized by unRaid as well. The exact same reported issue: "Unmountable: No file system".

Does any of that shed any more light on what might be going on here? Is it possible that, instead of the drive being bad, the motherboard/controller is bad? But if so, why would it copy 6Tb of data over without a hitch? And if the drives are bad, why does every test under the sun, except attempting to mount them in unRaid, say the drives are fine? 🤔
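(For reference, the rough shape of the badblocks and rsync steps above; the device name and paths here are placeholders, and note that badblocks -w is a destructive write test, only safe because the 8Tb was empty:

badblocks -b 4096 -wsv /dev/sdX
rsync -avh --progress /mnt/drobo/ /mnt/user/data/

The -b 4096 keeps the block count within badblocks' 32-bit limit on drives this large; -w is the four-pass write test, -s shows progress, -v is verbose.)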
JorgeB Posted March 11, 2021

If more drives are causing similar ATA errors then there might be some other issue; diags showing that might help.
Sandwich Posted March 11, 2021

19 minutes ago, JorgeB said: "diags showing that might help."

Ok, so tell me what to do.
JorgeB Posted March 11, 2021

In the diags posted there's only 1 disk assigned; post new ones, taken after there's been a problem mounting the other disk.
Sandwich Posted March 11, 2021

Ah, of course, sorry! Here:

cube-diagnostics-20210311-1945.zip
JorgeB Posted March 11, 2021

Disk2 only shows filesystem corruption, not ATA errors; that should be fixable by checking the filesystem: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
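(The webGui check runs xfs_repair under the hood; the rough command-line equivalent, assuming the array is started in maintenance mode and this is disk2, would be:

xfs_repair -n /dev/md2

where -n means check only, making no modifications. Running it against the md device rather than the raw disk keeps parity in sync.)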
Sandwich Posted March 11, 2021

Odd... the filesystem of the 8Tb shows as "auto".

EDIT: Ahh, but I do have a screenshot of that drive in unRaid 3 days ago, showing it as having `xfs`. I'll continue on that assumption.
Sandwich Posted March 11, 2021

It's not showing me the option to check filesystem status (presumably because the detected filesystem is "auto"). :-/
JorgeB Posted March 11, 2021

Click on that disk (with the array stopped) and change fs to xfs.
Sandwich Posted March 11, 2021 Author Share Posted March 11, 2021 Phase 1 - find and verify superblock... - block cache size set to 703632 entries Phase 2 - using internal log - zero log... zero_log: head block 2058495 tail block 2058491 ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log. - scan filesystem freespace and inode maps... - found root inode chunk Phase 3 - for each AG... - scan (but don't clear) agi unlinked lists... - process known inodes and perform inode discovery... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - agno = 4 - agno = 5 - agno = 6 - agno = 7 - process newly discovered inodes... Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - check for inodes claiming duplicate blocks... - agno = 2 - agno = 0 - agno = 3 - agno = 1 - agno = 4 - agno = 5 - agno = 6 - agno = 7 No modify flag set, skipping phase 5 Phase 6 - check inode connectivity... - traversing filesystem ... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - agno = 4 - agno = 5 - agno = 6 - agno = 7 - traversal finished ... - moving disconnected inodes to lost+found ... Phase 7 - verify link counts... No modify flag set, skipping filesystem flush and exiting. XFS_REPAIR Summary Thu Mar 11 20:28:53 2021 Phase Start End Duration Phase 1: 03/11 20:26:33 03/11 20:26:34 1 second Phase 2: 03/11 20:26:34 03/11 20:26:34 Phase 3: 03/11 20:26:34 03/11 20:27:46 1 minute, 12 seconds Phase 4: 03/11 20:27:46 03/11 20:27:46 Phase 5: Skipped Phase 6: 03/11 20:27:46 03/11 20:28:53 1 minute, 7 seconds Phase 7: 03/11 20:28:53 03/11 20:28:53 Total run time: 2 minutes, 20 seconds Does the above indicate it found issues that need to be repaired? The manual seemed to indicate there'd be a clearly stated option to use for re-running the check command, but I don't see anything that matches that description above. Quote Link to comment
itimpi Posted March 11, 2021

You need to rerun the check, removing the -n (no modify) flag so that fixing is allowed, and add the -L flag.
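(In the webGui that means replacing -n with -L in the options box; from the command line the rough equivalent, again assuming disk2, would be:

xfs_repair -L /dev/md2

Note that -L zeroes the metadata log, which can drop the most recent in-flight changes, so it's only used when the log can't be replayed by mounting.)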
Sandwich Posted March 11, 2021 Author Share Posted March 11, 2021 Geez, now I know where all that Holywood tech speak in movies comes from. O.O Phase 1 - find and verify superblock... - block cache size set to 703632 entries Phase 2 - using internal log - zero log... zero_log: head block 2058495 tail block 2058491 ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used. - scan filesystem freespace and inode maps... - found root inode chunk Phase 3 - for each AG... - scan and clear agi unlinked lists... - process known inodes and perform inode discovery... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - agno = 4 - agno = 5 - agno = 6 - agno = 7 - process newly discovered inodes... Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - check for inodes claiming duplicate blocks... - agno = 0 - agno = 3 - agno = 2 - agno = 1 - agno = 4 - agno = 5 - agno = 6 - agno = 7 Phase 5 - rebuild AG headers and trees... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - agno = 4 - agno = 5 - agno = 6 - agno = 7 - reset superblock... Phase 6 - check inode connectivity... - resetting contents of realtime bitmap and summary inodes - traversing filesystem ... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - agno = 4 - agno = 5 - agno = 6 - agno = 7 - traversal finished ... - moving disconnected inodes to lost+found ... Phase 7 - verify and correct link counts... Maximum metadata LSN (1:2058555) is ahead of log (1:2). Format log to cycle 4. XFS_REPAIR Summary Thu Mar 11 20:48:18 2021 Phase Start End Duration Phase 1: 03/11 20:45:54 03/11 20:45:54 Phase 2: 03/11 20:45:54 03/11 20:46:07 13 seconds Phase 3: 03/11 20:46:07 03/11 20:47:03 56 seconds Phase 4: 03/11 20:47:03 03/11 20:47:03 Phase 5: 03/11 20:47:03 03/11 20:47:03 Phase 6: 03/11 20:47:03 03/11 20:47:52 49 seconds Phase 7: 03/11 20:47:52 03/11 20:47:52 Total run time: 1 minute, 58 seconds done Great, so... now what? Do I stop the array from maintenance mode and restart normally? Currently, the Main screen still shows both disks as "Unmountable: No file system", although at least there's progress that the 8Tb's FS is "xfs" now instead of just "auto". ¯\_(ツ)_/¯ Also, if at any point in all this there's any indication whether the issue would be due to a failing drive vs failing MB vs random, please do let me know. Quote Link to comment
itimpi Posted March 11, 2021

Yes, stop the array and restart in normal mode. The status does not get changed until the system next tries to mount the drive, and I would expect the disk to now mount OK.
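(Since the repair output above mentioned "moving disconnected inodes to lost+found", it may also be worth checking for a lost+found folder on that disk once it mounts, in case any files were orphaned; the path here assumes the 8Tb is disk2:

ls /mnt/disk2/lost+found

)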
Sandwich Posted March 11, 2021

Ok, so that does seem to have brought the 8Tb back, although I'm still scratching my head as to how it got borked in the first place. Nevertheless, thank you!!

As for the original issue, the 6Tb drive... as best as you can tell, that does seem to be either a cable or drive issue, correct? And since I swapped out to new cables....
Sandwich Posted March 12, 2021

BTW, when I run a check on the 6Tb drive with `-n`, the result is this (canceled after a few minutes of whole-lotta-nothin):

Phase 1 - find and verify superblock...
bad primary superblock - filesystem mkfs-in-progress bit set !!!

attempting to find secondary superblock...
.......................................................................................................................................................................................

And the line with the dots just grows and grows.
itimpi Posted March 12, 2021

Did you do this from the command line? If so, what is the exact command that you used?
Sandwich Posted March 12, 2021

No, from running the array in maintenance mode, clicking the disk, and running the check.
itimpi Posted March 12, 2021

25 minutes ago, Sandwich said: "No, from running the array in maintenance mode, clicking the disk, and running the check."

I have never seen that particular error message before, so I do not know what it means. I think you are going to need to let it scan the disk to see if it can find a valid superblock (which can take hours on a large disk). Maybe someone else will have a suggestion?
JorgeB Posted March 12, 2021

That disk appears to be failing, and xfs failed to format it correctly.
Sandwich Posted March 12, 2021

Ok, that's actually good news (since it's both empty and under warranty); it means there was a reason the Drobo kicked it out, and a reason unRAID can't make use of it. Thank you all for your help, you've been great!!
Sandwich Posted March 18, 2021

Bit of an update and further puzzlement: I've gotten the 6Tb drive replaced, and the tech at the store assured me that the drive was error-free. Great! Same thing he said about the original 6Tb drive. 🙄

So I installed the replacement 6Tb and, unlike the original one, unRAID was able to format it. Yay! Then, since I need the largest drive in the array to be the parity disk, I `rsync`ed everything from the 8Tb (which had everything `rsync`ed from the Drobo 5N) to the replacement 6Tb. That process took about half a day and completed successfully. I then created a new drive config (Tools -> New Config) and assigned the 8Tb as the parity disk and the 6Tb as data. It started to rebuild the parity on the 8Tb, which it said would take about a day, so I left it like that last night.

This morning I came back to find that the process had paused partway through, with an error notification on the screen. If I click on the disk log, the modal window just loads screen after screen of log rows, with errors scattered everywhere.

At this point, I have no idea what to think. Full diagnostic attached.

cube-diagnostics-20210318-0833.zip