Previously working cache currently "unmountable: not mounted" after hanging due to (?) full syslog file (SOLVED)



I hope I'll make sense here.

 

Uptime had been uninterrupted since upgrading to 6.9.1 (stable launch) until, all of a sudden, I couldn't access my containers remotely. Upon checking I noticed a full syslog and that Docker/VMs were down. I restarted, and the cache (xfs) was then showing as an Unassigned Device; as soon as I reassigned it to the pool I got the error shown in the title.

I attempted to follow this guide, and after mounting there's only a single folder left with some of my NC data. With the cache taken off the array, it appears as btrfs with 0 bytes used/0 bytes free until it's formatted.

 

Any idea what happened here? Any chance I can recover the data?

Thanks for your time!

tower-diagnostics-20210609-2209.zip

34 minutes ago, Tzundoku said:

I attempted to follow this guide,

That guide is for btrfs, not xfs.

 

The NVMe device dropped offline:

 

Jun  7 15:19:21 Tower kernel: nvme nvme0: I/O 998 QID 21 timeout, aborting
Jun  7 15:19:21 Tower kernel: nvme nvme0: I/O 999 QID 21 timeout, aborting
Jun  7 15:19:21 Tower kernel: nvme nvme0: I/O 1000 QID 21 timeout, aborting
Jun  7 15:19:21 Tower kernel: nvme nvme0: I/O 968 QID 5 timeout, aborting
Jun  7 15:19:21 Tower kernel: nvme nvme0: I/O 969 QID 5 timeout, aborting
Jun  7 15:19:21 Tower kernel: nvme nvme0: I/O 934 QID 12 timeout, aborting
Jun  7 15:19:21 Tower kernel: nvme nvme0: I/O 935 QID 12 timeout, aborting
Jun  7 15:19:21 Tower kernel: nvme nvme0: I/O 936 QID 12 timeout, aborting
Jun  7 15:19:51 Tower kernel: nvme nvme0: I/O 968 QID 5 timeout, reset controller
Jun  7 15:20:21 Tower kernel: nvme nvme0: I/O 12 QID 0 timeout, reset controller
Jun  7 15:21:15 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Jun  7 15:21:15 Tower kernel: nvme nvme0: Abort status: 0x371
### [PREVIOUS LINE REPEATED 7 TIMES] ###
Jun  7 15:21:36 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Jun  7 15:21:36 Tower kernel: nvme nvme0: Removing after probe failure status: -19
Jun  7 15:21:56 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Jun  7 15:21:56 Tower kernel: XFS (nvme0n1p1): log I/O error -5
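
(For anyone wanting to spot this in their own logs: NVMe timeouts and resets like the above can be pulled out with a quick grep. A minimal sketch, assuming the stock /var/log/syslog location on the server; the same search works on the syslog file inside the diagnostics zip:)

# Look for NVMe timeouts/controller resets and XFS I/O errors in the syslog
grep -iE 'nvme.*(timeout|reset|abort)|xfs.*error' /var/log/syslog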

 

Post diags after rebooting.

On 6/11/2021 at 9:13 AM, JorgeB said:

Thanks a bunch.

 

I tried it as per the guide, and it came up with this:

 

Quote

Phase 1 - find and verify superblock...
        - block cache size set to 1536600 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 277078 tail block 276881
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used. Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
agf_freeblks 18594701, counted 18594717 in ag 1
sb_icount 124608, counted 150912
sb_ifree 3051, counted 373
sb_fdblocks 75725874, counted 44572093
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
data fork in ino 270006436 claims free block 33749787
imap claims in-use inode 270006436 is free, correcting imap
data fork in ino 270060728 claims free block 33757579
        - agno = 2
bad nblocks 6502724 for inode 539059680, would reset to 6502740
bad nextents 154002 for inode 539059680, would reset to 154000
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
free space (1,185726-185741) only seen by one free space btree
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
bad nblocks 6502724 for inode 539059680, would reset to 6502740
bad nextents 154002 for inode 539059680, would reset to 154000
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 270006436, would move to lost+found
disconnected inode 270060730, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 270060728 nlinks from 1 to 2
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Sat Jun 12 13:31:34 2021

Phase           Start           End             Duration
Phase 1:        06/12 13:31:33  06/12 13:31:33
Phase 2:        06/12 13:31:33  06/12 13:31:33
Phase 3:        06/12 13:31:33  06/12 13:31:34  1 second
Phase 4:        06/12 13:31:34  06/12 13:31:34
Phase 5:        Skipped
Phase 6:        06/12 13:31:34  06/12 13:31:34
Phase 7:        06/12 13:31:34  06/12 13:31:34

Total run time: 1 second
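
For reference, a read-only check like the one quoted above is typically run with the array started in maintenance mode, either via the Check button on the device's page or from the console with something like the following (the device path is an assumption taken from the syslog earlier in the thread; adjust it to your own cache device):

# Read-only check: reports problems but writes no changes (-n)
# /dev/nvme0n1p1 is assumed from the earlier syslog - substitute your cache partition
xfs_repair -n /dev/nvme0n1p1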

 

 

I might have misled you, as I got confused over what I was seeing: the drive that I noticed was btrfs wasn't actually the NVMe cache. The real NVMe cache (a similar model to the btrfs one) wasn't appearing among any of my devices (since it had apparently dropped offline), and I didn't realize I was looking at the wrong drive.

The nvme cache is currently showing (after powering off and taking the server off power for a while), but it appears as a new device if I attempt to set it to the cache pool.

The above filesystem status corresponds to the correct nvme drive.

7 minutes ago, Tzundoku said:

The nvme cache is currently showing (after powering off and taking the server off power for a while), but it appears as a new device if I attempt to set it to the cache pool.

That's OK. As long as there's no warning on the right side that "all data on this device will be deleted at array start", you can just start the array; if it doesn't mount, run another filesystem check, but without -n.
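
(If doing it from the console rather than the GUI, a rough sketch of the same repair, with the array in maintenance mode; the device path is again assumed from the earlier syslog:)

# Repair run: same as before but without -n, so fixes are actually written
# /dev/nvme0n1p1 is assumed - substitute your cache partition
xfs_repair /dev/nvme0n1p1
# If it complains about a dirty log, mounting the filesystem once to replay the
# log is preferred; xfs_repair -L (zero the log) is the last-resort option.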

1 hour ago, JorgeB said:

That's OK. As long as there's no warning on the right side that "all data on this device will be deleted at array start", you can just start the array; if it doesn't mount, run another filesystem check, but without -n.

 

That worked!
 

