Tzundoku — Posted June 10, 2021 (edited June 12, 2021)

I hope I'll make sense here. Uptime had been uninterrupted since upgrading to 6.9.1 (stable release) until, all of a sudden, I couldn't access my containers remotely. Upon checking I found a full syslog, with Docker and VMs down. I restarted, and the cache (XFS) was then showing as an Unassigned Device; as soon as I reassigned it to the pool I got the error shown in the title. I attempted to follow this guide, and after mounting there's only a single folder left with some of my NC data. Taking the cache off the array, it appears as btrfs with 0 bytes used / 0 bytes free until formatted. Any idea what happened here? Any chance I can recover the data? Thanks for your time!

Attachment: tower-diagnostics-20210609-2209.zip
JorgeB — Posted June 10, 2021

34 minutes ago, Tzundoku said:
  I attempted to follow this guide,

That guide is for btrfs, not XFS. The NVMe device dropped offline:

Jun 7 15:19:21 Tower kernel: nvme nvme0: I/O 998 QID 21 timeout, aborting
Jun 7 15:19:21 Tower kernel: nvme nvme0: I/O 999 QID 21 timeout, aborting
Jun 7 15:19:21 Tower kernel: nvme nvme0: I/O 1000 QID 21 timeout, aborting
Jun 7 15:19:21 Tower kernel: nvme nvme0: I/O 968 QID 5 timeout, aborting
Jun 7 15:19:21 Tower kernel: nvme nvme0: I/O 969 QID 5 timeout, aborting
Jun 7 15:19:21 Tower kernel: nvme nvme0: I/O 934 QID 12 timeout, aborting
Jun 7 15:19:21 Tower kernel: nvme nvme0: I/O 935 QID 12 timeout, aborting
Jun 7 15:19:21 Tower kernel: nvme nvme0: I/O 936 QID 12 timeout, aborting
Jun 7 15:19:51 Tower kernel: nvme nvme0: I/O 968 QID 5 timeout, reset controller
Jun 7 15:20:21 Tower kernel: nvme nvme0: I/O 12 QID 0 timeout, reset controller
Jun 7 15:21:15 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Jun 7 15:21:15 Tower kernel: nvme nvme0: Abort status: 0x371
### [PREVIOUS LINE REPEATED 7 TIMES] ###
Jun 7 15:21:36 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Jun 7 15:21:36 Tower kernel: nvme nvme0: Removing after probe failure status: -19
Jun 7 15:21:56 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Jun 7 15:21:56 Tower kernel: XFS (nvme0n1p1): log I/O error -5

Post diags after rebooting.
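For reference (not part of JorgeB's reply): NVMe controller timeouts like these are often worked around by disabling the drive's Autonomous Power State Transitions. The `nvme_core.default_ps_max_latency_us` module parameter is a real Linux kernel option; the syslinux.cfg layout below is a sketch of a typical Unraid boot entry, and whether this helps depends on the particular drive.

```shell
# Sketch only: edit the append line in /boot/syslinux/syslinux.cfg
# (Main -> Flash -> Syslinux Configuration in the webGui), then reboot.
# Setting the max power-state latency to 0 disables APST, a common
# cause of NVMe "Device not ready" / I/O timeout drop-offs.
label Unraid OS
  menu default
  kernel /bzimage
  append nvme_core.default_ps_max_latency_us=0 initrd=/bzroot
```

You can confirm the parameter took effect after reboot with `cat /sys/module/nvme_core/parameters/default_ps_max_latency_us`.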
Tzundoku — Posted June 10, 2021 (Author)

41 minutes ago, JorgeB said:
  That guide is for btrfs, not XFS. The NVMe device dropped offline: Post diags after rebooting.

Appreciate the prompt reply. I noticed it's for btrfs; I tried it to see if I could rescue the data, since the drive is currently showing as btrfs (no manual format attempted). Attaching current diags, taken after the reboot.

Attachment: tower-diagnostics-20210610-2238.zip
JorgeB — Posted June 11, 2021 (marked as Solution)

Check the filesystem to see if it can still be fixed: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
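For readers without webGui access: the check that wiki page describes runs `xfs_repair` in no-modify mode. A rough command-line equivalent, assuming the pool device is `/dev/nvme0n1p1` (an example name — substitute your own, and only run this with the array stopped or in maintenance mode so the filesystem is not mounted):

```shell
# Confirm which device holds the XFS pool first:
lsblk -f

# Read-only check: -n reports problems but writes nothing to the disk.
xfs_repair -n /dev/nvme0n1p1
```

Because `-n` changes nothing, this is safe to run as a first diagnostic before deciding on an actual repair.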
Tzundoku — Posted June 12, 2021 (Author)

On 6/11/2021 at 9:13 AM, JorgeB said:
  Check the filesystem to see if it can still be fixed: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

Thanks a bunch. Tried as per the guide, and it came up with this:

Phase 1 - find and verify superblock...
        - block cache size set to 1536600 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 277078 tail block 276881
ALERT: The filesystem has valuable metadata changes in a log which is
being ignored because the -n option was used. Expect spurious
inconsistencies which may be resolved by first mounting the filesystem
to replay the log.
        - scan filesystem freespace and inode maps...
agf_freeblks 18594701, counted 18594717 in ag 1
sb_icount 124608, counted 150912
sb_ifree 3051, counted 373
sb_fdblocks 75725874, counted 44572093
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
data fork in ino 270006436 claims free block 33749787
imap claims in-use inode 270006436 is free, correcting imap
data fork in ino 270060728 claims free block 33757579
        - agno = 2
bad nblocks 6502724 for inode 539059680, would reset to 6502740
bad nextents 154002 for inode 539059680, would reset to 154000
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
free space (1,185726-185741) only seen by one free space btree
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
bad nblocks 6502724 for inode 539059680, would reset to 6502740
bad nextents 154002 for inode 539059680, would reset to 154000
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 270006436, would move to lost+found
disconnected inode 270060730, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 270060728 nlinks from 1 to 2
No modify flag set, skipping filesystem flush and exiting.

XFS_REPAIR Summary    Sat Jun 12 13:31:34 2021

Phase     Start           End             Duration
Phase 1:  06/12 13:31:33  06/12 13:31:33
Phase 2:  06/12 13:31:33  06/12 13:31:33
Phase 3:  06/12 13:31:33  06/12 13:31:34  1 second
Phase 4:  06/12 13:31:34  06/12 13:31:34
Phase 5:  Skipped
Phase 6:  06/12 13:31:34  06/12 13:31:34
Phase 7:  06/12 13:31:34  06/12 13:31:34

Total run time: 1 second

I might have misled you, as I got confused over what I was seeing: the drive I noticed was btrfs wasn't the NVMe cache. The actual NVMe cache (a similar model to the btrfs one) wasn't appearing among any of my devices (since it had apparently dropped offline), and I didn't realize I was looking at the wrong drive. The NVMe cache is currently showing (after powering off and taking the server off power for a while), but it appears as a new device if I attempt to assign it to the cache pool. The filesystem check above was run on the correct NVMe drive.
JorgeB — Posted June 12, 2021

7 minutes ago, Tzundoku said:
  The nvme cache is currently showing (after powering off and taking the server off power for a while), but it appears as a new device if I attempt to set it to the cache pool.

That's OK. As long as there's no warning on the right side saying "all data on this device will be deleted at array start", you can just start the array. If it doesn't mount, run another filesystem check, but without -n.
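A sketch of what running the check "without -n" amounts to from the command line, assuming the same example device as before (the webGui does the equivalent in maintenance mode when you clear the -n option). The mount point is hypothetical; the ALERT in the earlier output suggests replaying the log first:

```shell
# Mount once so the kernel replays the XFS journal, then unmount.
mkdir -p /mnt/tmp_check            # example mount point
mount /dev/nvme0n1p1 /mnt/tmp_check
umount /mnt/tmp_check

# Now run the actual repair; without -n, fixes are written to disk.
xfs_repair /dev/nvme0n1p1

# Last resort only, if the log cannot be replayed or mounted: -L zeroes
# the log and may lose the most recent metadata changes.
# xfs_repair -L /dev/nvme0n1p1
```

Disconnected inodes found during repair end up in the filesystem's lost+found directory, so check there for recovered files afterwards.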
Tzundoku — Posted June 12, 2021 (Author)

1 hour ago, JorgeB said:
  That's OK. As long as there's no warning on the right side saying "all data on this device will be deleted at array start", you can just start the array. If it doesn't mount, run another filesystem check, but without -n.

That worked!