Unmountable: No file system



I found 2 red X's in my array tonight.

Disk 10 & Disk 12 (of 20).

Disk 10 has data on it; disk 12 is still empty.
If the disks are defective, I can either remove them or replace them with smaller ones. 

These disks are part of my NetApp shelf, which I've only owned for about 2 months now, but they have "disappeared" from my array before (what I would notice is all new shows missing from my Plex). A reboot usually brought everything back up, but this is the first time I've seen an actual error like this.
 

I removed disk 10 and rebooted the server; that didn't do much. I put the disk back in, and now it's doing a "parity-sync/data rebuild". I'm not sure whether this will cause data loss.

What should I do?

tower-diagnostics-20200419-2208.zip


Rebuilding an unmountable filesystem usually results in an unmountable filesystem, so the filesystem will probably have to be repaired after the rebuild. If you had asked before starting the rebuild, we could have tried to repair the emulated filesystem before rebuilding it, and maybe we would have had other options at that point.
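
For reference, the filesystem check is run from a console with the array started in maintenance mode; a minimal sketch, assuming disk10 maps to /dev/md10 (the exact device name is an assumption and can differ between Unraid versions):

# Check only: report problems without modifying anything (-n), verbose (-v)
xfs_repair -nv /dev/md10

# Actual repair: the same command without -n
xfs_repair -v /dev/md10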

13 minutes ago, DannyG said:

replace them with smaller ones

Replacing/rebuilding a disk onto a smaller disk isn't possible. You could set a New Config and assign any disks you wanted, but then a rebuild wouldn't be possible.

 

Disk12 will also have to be rebuilt to make it consistent with parity, but since it is empty it can be formatted again instead of repairing its filesystem.

 

The diagnostics are from after the reboot, so there is no syslog from when the problems occurred. Also, your controller isn't passing the usual complete SMART attributes. I'm not sure how to interpret that; it does indicate the disks are OK, but I don't know whether its idea of OK is the same as ours.

 

Let the disk10 rebuild complete if it will, and post new diagnostics either way.

 

This is what I got after running the disk10 check in maintenance mode with the -nv flags (disk 12 has the same results):

Quote

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode. Exiting now.


OK, I tried the repair (just running -v).

This is what I got:
 

Quote

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
        - block cache size set to 15439760 entries
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap ino pointer to 129
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary ino pointer to 130
Phase 2 - using internal log
        - zero log...
zero_log: head block 667834 tail block 667830
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair. Note that destroying
the log may cause corruption -- please attempt a mount of the filesystem
before doing this.
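
For reference, the mount-and-unmount step xfs_repair is asking for could look like this from the console; a minimal sketch, assuming the emulated disk is /dev/md10 and using an illustrative temporary mount point (both are assumptions, not from the original post):

# Try to mount the filesystem so XFS can replay its journal, then unmount it
mkdir -p /mnt/xfs_test
mount /dev/md10 /mnt/xfs_test && umount /mnt/xfs_test

# Only if the mount fails: destroy the log and repair (recent metadata may be lost)
xfs_repair -Lv /dev/md10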

 


Thank you @itimpi

I used the -L flag and received the following:

 

Phase 1 - find and verify superblock...
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap ino pointer to 129
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary ino pointer to 130
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 3776
sb_ifree 0, counted 178
sb_fdblocks 976277679, counted 626303125
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:667824) is ahead of log (1:2).
Format log to cycle 4.
done
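
Since Phase 6 moves any disconnected inodes to lost+found, it's worth checking for orphaned files once the array is started normally; a quick sketch, assuming this was disk10:

# Absent or empty if nothing was actually disconnected during the repair
ls -lR /mnt/disk10/lost+found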

 

2 weeks later...

Looks like a problem with the enclosure; there was a reset and it lost communication with all disks:

 

May  3 04:41:15 Tower kernel: scsi 3:0:13:0: Enclosure         NETAPP   DS424IOM3        0212 PQ: 0 ANSI: 5
May  3 04:41:15 Tower kernel: scsi 3:0:13:0: Power-on or device reset occurred
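
A quick way to spot such events is to search the syslog included in the diagnostics; a minimal sketch, assuming the standard Unraid log location:

# Find power-on and reset events for the enclosure and attached disks
grep -i "reset" /var/log/syslog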

 


Have you considered going to fewer, larger disks to get the same capacity? Larger disks perform better, and fewer disks mean fewer opportunities for problems.

 

In fact, it looks like you have enough free space to move all that off the 2TB disks and shrink the array.
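
For reference, moving data between array disks before shrinking is commonly done with rsync from the console; an illustrative sketch only, with placeholder disk numbers:

# Copy everything from a small disk to a larger one, preserving attributes
rsync -avX /mnt/disk1/ /mnt/disk10/

# Spot-check the copy before clearing the source and shrinking the array
diff -rq /mnt/disk1 /mnt/disk10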


Hi Trurl,

 

My server originally had 10x 2TB disks. They are internal disks running off 2x IBM M1015 controllers in IT mode, from my FreeNAS days.

Recently, I got my hands on a NetApp shelf with some NetApp disks. Those disks are the ones showing up as disk10-22 plus the 2nd parity drive, and they are the ones I'm having reliability issues with.

 

I believe what you're asking me is "why don't I migrate everything off my reliable 2TB disks to my SAS 4TB drives" to have fewer disks and fewer problems. I'm just not convinced that will be the case.

 

PS: My 2TB drives are old, like 7-9 years old. The plan is to just let them die and shrink the array as that happens.


There is more than one way to get small old disks out of the array and shrink it.

 

Since the disks you are having connection problems with are mostly empty, you could shrink the array to remove them, and then use some of those larger, newer disks to rebuild the small old disks onto.

23 hours ago, DannyG said:

the Plan is to just let them die and shrink the array as it happens.

The issue with that is that if one of your critical drives, full of content, unexpectedly quits, you are relying on known-failing drives to rebuild it. Unraid requires ALL drives to be read flawlessly to rebuild a failed drive. You may as well just run without parity; at least that way, when a drive dies, it doesn't immediately start stressing all the other marginal drives.

 

I'm not being flippant here; you really do stand a better chance of keeping your data safe if you drop parity and use those two drives, plus multiple other drives, to keep backup copies of your data. You can set up scheduled copies with the User Scripts plugin; that way, when drives die, you have backups. A sketch of such a script is below.
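
A minimal sketch of such a scheduled copy, runnable from the User Scripts plugin (the share and disk names here are illustrative assumptions):

#!/bin/bash
# Nightly backup: mirror an important share onto a dedicated backup disk
rsync -av --delete /mnt/user/Media/ /mnt/disk9/Backup/Media/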

 

Keeping questionable drives as members of the parity array will bite you. I lost data that way many years ago when I started with Unraid; never again.

