
Cache Drive FS corruption, repair?



I woke up this morning to Docker and the VM manager being down. The cache drive (XFS), which these files are stored on, is now "unmountable" (wrong or no file system).

 

I put the array into maintenance mode and attempted a repair; you can see the results below. My backup of it is from a few months ago, so I can recover most of what was on it, but if possible I would definitely like to get this working again. On the "Main" page it still reads out the temperature and shows the disk as active and "Healthy"; I'm not sure whether those readings are current or just the last ones taken before the failure. I was on 6.10.0 and pushed the update to 6.10.3 today (the disk failed before the update).

 

Any help would be greatly appreciated. I use the VM for work and have a lot of other things down that are somewhat important and time-sensitive.

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used.  Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
agi unlinked bucket 23 is 73394135 in ag 1 (inode=1147135959)
sb_icount 1021248, counted 1021376
sb_ifree 8167, counted 6971
sb_fdblocks 164425577, counted 166053595
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 1147135959, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 1147135959 nlinks from 0 to 1
No modify flag set, skipping filesystem flush and exiting.
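For reference, the output above is the read-only pass (the ALERT shows the -n flag was used, so nothing was changed). The usual sequence from the command line looks roughly like this; it's only a sketch, and the device path /dev/nvme0n1p1 is a placeholder for whatever partition Unraid lists for the cache pool, not taken from this system:

xfs_repair -n /dev/nvme0n1p1                                       # check only, writes nothing
mkdir -p /tmp/x && mount /dev/nvme0n1p1 /tmp/x && umount /tmp/x    # replay the dirty log first, as the ALERT suggests
xfs_repair /dev/nvme0n1p1                                          # actual repair, writes corrections
xfs_repair -L /dev/nvme0n1p1                                       # last resort: zero the log, recent metadata changes are lost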

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
agi unlinked bucket 23 is 73394135 in ag 1 (inode=1147135959)
sb_icount 1021248, counted 1021376
sb_ifree 8167, counted 6971
sb_fdblocks 164425577, counted 166053595
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 3
        - agno = 2
        - agno = 0
clearing reflink flag on inodes when possible
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 1147135959, moving to lost+found
Phase 7 - verify and correct link counts...
Maximum metadata LSN (72:1879323) is ahead of log (1:2).
Format log to cycle 75.
xfs_repair: Flushing the data device failed, err=61!
Cannot clear needsrepair due to flush failure, err=61.
xfs_repair: Flushing the data device failed, err=61!

fatal error -- File system metadata writeout failed, err=61.  Re-run xfs_repair.

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 1
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
xfs_repair: Flushing the data device failed, err=61!
Cannot clear needsrepair due to flush failure, err=61.
xfs_repair: Flushing the data device failed, err=61!

fatal error -- File system metadata writeout failed, err=61.  Re-run xfs_repair.

 

I ran it without any flags a couple of times and got this. It's always possible a log or something filled the drive up and caused this problem.
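When xfs_repair keeps dying on the flush like this, it's worth confirming whether the device is accepting writes at all before re-running it. A minimal sketch (the NVMe device name is a placeholder):

dmesg | tail -n 100       # look for NVMe/block-layer write errors around the repair attempts
smartctl -a /dev/nvme0    # drive health: critical warnings, available spare, media errors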

 

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.15.46-Unraid] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 980 PRO 2TB
Serial Number:                      S6B0NG0R405728R
Firmware Version:                   2B2QGXA7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization:            1,496,877,862,912 [1.49 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 b41150549f
Local Time is:                      Thu Jun 30 09:39:16 2022 PDT
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.49W       -        -    0  0  0  0        0       0
 1 +     4.48W       -        -    1  1  1  1        0     200
 2 +     3.18W       -        -    2  2  2  2        0    1000
 3 -   0.0400W       -        -    3  3  3  3     2000    1200
 4 -   0.0050W       -        -    4  4  4  4      500    9500

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- available spare has fallen below threshold
- media has been placed in read only mode

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x09
Temperature:                        36 Celsius
Available Spare:                    0%
Available Spare Threshold:          10%
Percentage Used:                    56%
Data Units Read:                    3,017,526,991 [1.54 PB]
Data Units Written:                 2,839,436,501 [1.45 PB]
Host Read Commands:                 5,464,158,312
Host Write Commands:                4,063,944,349
Controller Busy Time:               45,841
Power Cycles:                       457
Power On Hours:                     4,340
Unsafe Shutdowns:                   28
Media and Data Integrity Errors:    9,994
Error Information Log Entries:      9,994
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               36 Celsius
Temperature Sensor 2:               49 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

 

4 minutes ago, live4soccer7 said:

Could the "failed" status have anything to do with the unmountable drive/filesystem?

 

9 minutes ago, live4soccer7 said:
- media has been placed in read only mode

The device is in read-only mode; that's why xfs_repair is failing to write the corrections. You'll need to replace it and restore the data from backups if available.
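Those two bullet points in the SMART section are smartctl decoding the NVMe critical-warning byte (0x09 here: bit 0 = available spare below threshold, bit 3 = media placed in read-only mode). To re-check just that part, something like the following should do (device name is a placeholder):

smartctl -H /dev/nvme0                                 # overall health verdict
smartctl -A /dev/nvme0 | grep -i 'critical warning'    # raw critical-warning value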

6 minutes ago, live4soccer7 said:

What would cause it to go into read-only mode?

 

31 minutes ago, live4soccer7 said:
- available spare has fallen below threshold

 

 

7 minutes ago, live4soccer7 said:

If it is in read-only mode, can I still extract data off it?

It would be easy if the filesystem were still mounting. Since it's not, and it can't be repaired in place, there are basically two options: use a file-recovery utility like UFS Explorer, or clone the drive to another device and then run xfs_repair on the clone.
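A rough sketch of the clone-then-repair route, assuming GNU ddrescue is available (it may need to be installed separately on Unraid) and with both device names and the map-file path as placeholders — double-check them, since the target device gets overwritten:

ddrescue -f /dev/nvme0n1 /dev/sdX /tmp/cache-clone.map    # copy everything readable, keep a map of bad areas
xfs_repair /dev/sdX1                                      # repair the XFS partition on the clone, not the original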


Flash devices come with spare cells to replace ones that go bad. For that device, once the spare space drops below 10% you get a SMART warning; it's now at 0%, which I assume is why the device has gone read-only.

 

Also note that according to this the device was just a little past the halfway point of its rated life, but that's only an indication; I have one currently at 187% and still going strong.

 

1 hour ago, live4soccer7 said:
Percentage Used:                    56%
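If you do replace the drive, the two counters being discussed here are easy to keep an eye on going forward; a one-liner sketch, device name again a placeholder:

smartctl -A /dev/nvme0 | grep -E 'Percentage Used|Available Spare'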

 

12 hours ago, live4soccer7 said:

If using a recovery tool like UFS, would this impair the ability to clone the drive?

No, but if you have a spare device available, cloning would be my first option; no need to buy another program. Just note that if you clone to a larger device it won't mount with Unraid, but you can use UD (Unassigned Devices).

