DannyG Posted April 20, 2020
I found two red X's in my array tonight: Disk 10 and Disk 12 (of 20). Disk 10 has data on it; Disk 12 is still empty. If the disks are defective, I can either remove them or replace them with smaller ones.

These disks are part of my NetApp shelf, which I've only owned for about two months now, but they have "disappeared" from my array before (what I would notice is all new shows missing from my Plex). A reboot usually brought everything back up, but this is the first time I've seen an actual error like this.

I removed Disk 10 and rebooted the server; that didn't do much. I put the disk back in, and now it's doing a "parity-sync/data rebuild". Not sure if this will cause data loss. What should I do?

tower-diagnostics-20200419-2208.zip
trurl Posted April 20, 2020
Rebuilding an unmountable filesystem usually results in an unmountable filesystem, so the filesystem will probably have to be repaired after the rebuild. If you had asked before starting the rebuild, we could have tried to repair the emulated filesystem before rebuilding it, and maybe we would have had other options at that point.

13 minutes ago, DannyG said: "replace them with smaller ones"
Replacing/rebuilding a disk with a smaller disk isn't possible. You could set a New Config and assign any disk you wanted, but then a rebuild wouldn't be possible.

Disk 12 will also have to be rebuilt to make it consistent with parity, but since it is empty it can simply be formatted again instead of having its filesystem repaired.

The diagnostics are from after the reboot, so there is no syslog from when the problems occurred. Also, your controller isn't passing the usual complete SMART attributes. I'm not sure how to interpret that; it does indicate the disks are OK, I just don't know whether its idea of OK is the same as ours.

Let the Disk 10 rebuild complete if it will, and post new diagnostics either way.
DannyG Posted April 20, 2020
Sounds good, thank you. Another 12 hours or so to go until 100%.
trurl Posted April 20, 2020
10 hours ago, trurl said: "your controller isn't passing the usual complete SMART attributes."
Apparently that is what you get with SAS drives: less than ideal, but it's what we have. Here is another recent post about that: https://forums.unraid.net/topic/91279-aneely-docker-image-filling-up/?do=findComment&comment=846769
DannyG Posted April 20, 2020
OK, so Disk 10 has just finished rebuilding. It's now green, but it's still displaying "Unmountable: No file system". I stopped the array and started it back up to see if it would change, but it didn't. I have attached my latest diagnostics. Disk 12 still has a red X.

tower-diagnostics-20200420-1413.zip
JorgeB Posted April 20, 2020
Check the filesystem on both: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
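For reference, the webGui check described in that wiki link runs xfs_repair in no-modify mode. A rough command-line equivalent, assuming the array is started in maintenance mode and Disk 10 maps to /dev/md10 (the device name is an assumption; confirm it on your own system before running anything):

```shell
# Read-only XFS check of Disk 10 via its parity-protected md device.
# -n = no modify, -v = verbose. Nothing is written to the disk.
xfs_repair -nv /dev/md10
```

Running against the md device rather than the raw /dev/sdX device keeps parity in sync with any later repair.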
DannyG Posted April 20, 2020
This is what I got after running the check on Disk 10 in maintenance mode with flags -nv (Disk 12 has the same results):

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.
DannyG Posted April 20, 2020
OK, I tried the repair (running just -v); this is what I got:

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
        - block cache size set to 15439760 entries
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap ino pointer to 129
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary ino pointer to 130
Phase 2 - using internal log
        - zero log...
zero_log: head block 667834 tail block 667830
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
DannyG Posted April 20, 2020
Can't I mount the drive first like it's asking (to replay the log)?
DannyG Posted April 20, 2020
I don't want to lose the data on Disk 10. How do I mount the drive? (It looks mounted to me.) -L sounds like it will destroy the data on the disk... can you confirm?
itimpi Posted April 20, 2020
36 minutes ago, DannyG said: "-L sounds like I'll destroy the data on the disk.. can you confirm?"
It normally destroys nothing. If anything does get lost, it will only be related to the file that was being written at the time the corruption occurred.
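To make that concrete: the usual order of operations, per the xfs_repair error message above, is to attempt a mount first (which replays the log) and only fall back to -L if the mount fails. A sketch, with the array in maintenance mode; the device name /dev/md10 and the mount point are both assumptions here:

```shell
# Try to replay the XFS log by mounting read-only; fall back to -L.
mkdir -p /mnt/test
if mount -o ro /dev/md10 /mnt/test; then
    umount /mnt/test        # log replayed; re-run xfs_repair -n to verify
else
    xfs_repair -L /dev/md10 # zeroes the log; may lose the last in-flight writes
fi
```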
DannyG Posted April 20, 2020
Thank you @itimpi. I used the -L flag and received the following:

Phase 1 - find and verify superblock...
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap ino pointer to 129
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary ino pointer to 130
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 3776
sb_ifree 0, counted 178
sb_fdblocks 976277679, counted 626303125
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:667824) is ahead of log (1:2).
Format log to cycle 4.
done
DannyG Posted April 20, 2020
Alright, so I stopped the array and spun it back up. Disk 10 is back, and it looks like the files are there too. Any idea why this is happening?
JorgeB Posted April 21, 2020
7 hours ago, DannyG said: "Any idea why this is happening?"
We'd need the diagnostics from before rebooting; grab them if it happens again.
DannyG Posted April 22, 2020
I'll know for next time. Thank you very much.
DannyG Posted May 4, 2020
Well... it didn't take that long. I'm getting the following errors now:

Unraid array errors: 2020-05-03 04:42
Warning [TOWER] - array has errors
Array has 12 disks with read errors

Here's my diagnostics file (without a reboot): tower-diagnostics-20200504-0925.zip
trurl Posted May 4, 2020
Disks 10-20 plus parity2 are all disconnected. Are these on the same controller?
JorgeB Posted May 4, 2020
Looks like a problem with the enclosure; there was a reset and it lost communication with all the disks:

May 3 04:41:15 Tower kernel: scsi 3:0:13:0: Enclosure NETAPP DS424IOM3 0212 PQ: 0 ANSI: 5
May 3 04:41:15 Tower kernel: scsi 3:0:13:0: Power-on or device reset occurred
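A quick way to spot events like this is to grep the syslog for reset/enclosure messages. The snippet below demonstrates the pattern against a captured sample line from the log quoted above; on a live server you would grep /var/log/syslog instead:

```shell
# Count reset/enclosure events; the sample line is from the quoted log.
log='May  3 04:41:15 Tower kernel: scsi 3:0:13:0: Power-on or device reset occurred'
printf '%s\n' "$log" | grep -icE 'device reset|enclosure'
# prints 1 (one matching line)
```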
DannyG Posted May 4, 2020
Yes, they're all on the same controller, but all the disks are still showing as online on my dashboard.
trurl Posted May 4, 2020
Have you considered going to fewer, larger disks to get the same capacity? Larger disks perform better, and fewer disks mean fewer opportunities for problems. In fact, it looks like you have enough free space to move everything off the 2TB disks and shrink the array.
DannyG Posted May 5, 2020
Hi trurl,

My server originally had 10x 2TB disks. They are internal disks running off two IBM M1015 controllers in IT mode, from my FreeNAS days. Recently I got my hands on a NetApp shelf with some NetApp disks; those are the ones showing up as Disk 10-22 plus the second parity drive, and they're the ones I'm having reliability issues with.

I believe what you're asking is why I don't migrate everything off my reliable 2TB disks onto the 4TB SAS drives, to have fewer disks and fewer problems. I'm just not convinced that would be the case.

PS: my 2TB drives are old, like 7-9 years old. The plan is to just let them die and shrink the array as that happens.
trurl Posted May 5, 2020
There is more than one way to get small old disks out of the array and shrink it. Since the disks you are having connection problems with are mostly empty, shrink the array to remove them, and then use some of those larger newer disks to rebuild the small old disks onto larger newer ones.
JonathanM Posted May 6, 2020
23 hours ago, DannyG said: "the Plan is to just let them die and shrink the array as it happens."
The issue with that is if one of your critical drives full of content unexpectedly quits, you are relying on known-failing drives to rebuild it. Unraid requires ALL drives to be read flawlessly to rebuild a failed drive. You may as well just run without parity; at least that way, when a drive dies it doesn't immediately start stressing all the other marginal drives.

I'm not being flippant here. You really do stand a better chance of keeping your data safe if you drop parity and use those two drives plus multiple other drives to keep backup copies of your data. You can set up scheduled copies with the User Scripts add-on, so that when drives die you have backups. Keeping questionable drives as members of the parity array will bite you; I lost data that way many years ago when I started with Unraid, never again.