[SOLVED] Sudden sync errors and missing data, unmountable disk - help?


Hi all,

Something strange is happening with my array and I'm not sure if I should be very worried or what to do.

Edit: I'm running 6.8.3 latest stable, no changes for some time now.

 

So today I ran a monthly parity check on my 34TB array.

A little over 18 hours in, I checked the status and saw that there were 450 errors listed.

I checked the current log, and it looks like one of my disks (Disk 4 / ata7) was having issues.

 

I stopped the parity check, stopped the array and checked the log of the disk.

It looked like the disk was having some kind of initialisation error, but I foolishly didn't take a screenshot or note.

 

I brought the array back online and saw that the reported usage was the same, but Disk 4 was still having issues.

When accessing the array over LAN, I noticed many files missing, and that my VMs wouldn't start.

Many files and directories appeared to be missing, even though the reported array size was correct.

The VMs would not start because files like the GPU bios and virtio-win-0.1.173-2.iso image were missing.

 

At this point I decided to completely shut down the system and leave it for a little bit, then start up clean.

Now the array is mounted, but Disk 4 is showing "Unmountable: No file system", with the option to format the disk available further down.

 

The missing files were still missing at first, but after a short time they seem to have reappeared. I haven't verified everything.

The array now seems to report what looks like an incorrect total usage:

image.thumb.png.53d51172db98ff9603d5e5d5d1398488.png

 

image.thumb.png.b9c8294363ddd118a11f3a130a0694c2.png

 

image.png.aef36548039037d0c97fcd0571d4b3c6.png

 

Any advice on what I should or can do?

 

Thanks for any help.

 

 

blaster-diagnostics-20201001-2003.zip

Edited by KptnKMan

I ran xfs_repair on the disk in Maintenance mode, with the -nv options.

 

Results:

Phase 1 - find and verify superblock...
        - block cache size set to 3043288 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1126679 tail block 1126656
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used.  Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_fdblocks 278251709, counted 279719268
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
data fork in ino 1567417 claims free block 195322
data fork in ino 1567417 claims free block 195323
data fork in ino 1567419 claims free block 250450
data fork in ino 1567419 claims free block 250451
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
data fork in ino 12884902030 claims free block 1610613854
data fork in ino 12884902030 claims free block 1610613855
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 7
        - agno = 6
        - agno = 0
        - agno = 1
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (3:1137604) is ahead of log (3:1126679).
Would format log to cycle 6.
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Thu Oct  1 20:31:14 2020

Phase		Start		End		Duration
Phase 1:	10/01 20:30:17	10/01 20:30:18	1 second
Phase 2:	10/01 20:30:18	10/01 20:30:18
Phase 3:	10/01 20:30:18	10/01 20:30:50	32 seconds
Phase 4:	10/01 20:30:50	10/01 20:30:50
Phase 5:	Skipped
Phase 6:	10/01 20:30:50	10/01 20:31:14	24 seconds
Phase 7:	10/01 20:31:14	10/01 20:31:14

Total run time: 57 seconds
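
For anyone who wants to tally what the dry run above is complaining about, here is a minimal Python sketch. It only reads pasted output and does not touch the disk. The 4096-byte block size is an assumption (the usual XFS default, not confirmed for this disk), and `summarize` is just an illustrative name, not part of any tool. Keep in mind that the ALERT in the output warns that some of these inconsistencies may be spurious because the log had not been replayed.

```python
import re
from collections import Counter

# Problem lines copied from the xfs_repair -nv output above.
SAMPLE = """\
sb_fdblocks 278251709, counted 279719268
data fork in ino 1567417 claims free block 195322
data fork in ino 1567417 claims free block 195323
data fork in ino 1567419 claims free block 250450
data fork in ino 1567419 claims free block 250451
data fork in ino 12884902030 claims free block 1610613854
data fork in ino 12884902030 claims free block 1610613855
"""

# Assumption: default XFS block size; not confirmed for this filesystem.
ASSUMED_BLOCK_SIZE = 4096


def summarize(text, block_size=ASSUMED_BLOCK_SIZE):
    """Count 'claims free block' messages per inode and size the free-block mismatch."""
    claims = Counter(
        int(m.group(1))
        for m in re.finditer(r"data fork in ino (\d+) claims free block \d+", text)
    )
    summary = {"inodes": dict(claims), "fdblocks_delta": None, "delta_bytes": None}
    m = re.search(r"sb_fdblocks (\d+), counted (\d+)", text)
    if m:
        # Difference between the free-block count stored in the superblock
        # and the count xfs_repair arrived at while scanning.
        delta = int(m.group(2)) - int(m.group(1))
        summary["fdblocks_delta"] = delta
        summary["delta_bytes"] = delta * block_size
    return summary


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On this output it reports three affected inodes with two messages each, and a free-block counter that is off by 1467559 blocks.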

 


OK, thanks. Running it without any options produced the response below.

I'll try again with -L, as advised and as suggested in the output:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

 


Repair completed using -L.

 

Results:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_fdblocks 278251709, counted 279719268
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
data fork in ino 1567417 claims free block 195322
data fork in ino 1567417 claims free block 195323
data fork in ino 1567419 claims free block 250450
data fork in ino 1567419 claims free block 250451
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
data fork in ino 12884902030 claims free block 1610613854
data fork in ino 12884902030 claims free block 1610613855
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 1
        - agno = 5
        - agno = 4
        - agno = 7
        - agno = 3
        - agno = 6
        - agno = 0
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (3:1137604) is ahead of log (1:2).
Format log to cycle 6.
done
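
The "Maximum metadata LSN ... is ahead of log" line is easier to read once the two values are split apart. XFS log sequence numbers are printed as cycle:block, so they can be compared as pairs. A small Python sketch of that comparison follows; the function names are made up for illustration and the regular expression is only fitted to the message as it appears in this thread.

```python
import re

# Matches the warning as printed in the output above.
LSN_LINE = re.compile(
    r"Maximum metadata LSN \((\d+):(\d+)\) is ahead of log \((\d+):(\d+)\)"
)


def parse_lsn_warning(line):
    """Return ((meta_cycle, meta_block), (log_cycle, log_block)), or None if no match."""
    m = LSN_LINE.search(line)
    if not m:
        return None
    meta_cycle, meta_block, log_cycle, log_block = map(int, m.groups())
    return (meta_cycle, meta_block), (log_cycle, log_block)


def is_ahead(metadata_lsn, log_lsn):
    """Tuples compare cycle first, then block, which is the LSN ordering."""
    return metadata_lsn > log_lsn


if __name__ == "__main__":
    line = "Maximum metadata LSN (3:1137604) is ahead of log (1:2)."
    metadata_lsn, log_lsn = parse_lsn_warning(line)
    print(metadata_lsn, log_lsn, is_ahead(metadata_lsn, log_lsn))
```

In the -L run the log reads 1:2 because it had just been zeroed, which is why xfs_repair goes on to format the log to a higher cycle.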

 


Well, I ran the check again with -nv, as recommended by the documentation.

 

Result before I start the array normally:

Phase 1 - find and verify superblock...
        - block cache size set to 3043288 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 0 tail block 0
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 1
        - agno = 3
        - agno = 4
        - agno = 7
        - agno = 5
        - agno = 6
        - agno = 0
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Thu Oct  1 20:45:37 2020

Phase		Start		End		Duration
Phase 1:	10/01 20:44:38	10/01 20:44:40	2 seconds
Phase 2:	10/01 20:44:40	10/01 20:44:40
Phase 3:	10/01 20:44:40	10/01 20:45:13	33 seconds
Phase 4:	10/01 20:45:13	10/01 20:45:13
Phase 5:	Skipped
Phase 6:	10/01 20:45:13	10/01 20:45:37	24 seconds
Phase 7:	10/01 20:45:37	10/01 20:45:37

Total run time: 59 seconds
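
A quick way to confirm that this second dry run is clean is to filter the output for the kinds of lines that showed up in the first one. The Python sketch below does that on short excerpts from this thread. The pattern list is an assumption based only on the messages seen here, not a complete list of what xfs_repair can print, and `problem_lines` is an illustrative name.

```python
import re

# Assumption: patterns chosen from the messages seen in this thread only.
PROBLEM_PATTERNS = [
    re.compile(r"claims free block"),
    re.compile(r"^sb_fdblocks \d+, counted \d+"),
    re.compile(r"^(ALERT|ERROR):"),
    re.compile(r"is ahead of log"),
]

# Excerpt from the first -nv run (before the repair).
BEFORE = """\
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used.  Expect spurious inconsistencies
sb_fdblocks 278251709, counted 279719268
        - agno = 0
data fork in ino 1567417 claims free block 195322
Maximum metadata LSN (3:1137604) is ahead of log (3:1126679).
"""

# Excerpt from the second -nv run (after the repair with -L).
AFTER = """\
zero_log: head block 0 tail block 0
        - scan filesystem freespace and inode maps...
        - agno = 0
No modify flag set, skipping filesystem flush and exiting.
"""


def problem_lines(text):
    """Return the stripped lines that match any of the known problem patterns."""
    found = []
    for raw in text.splitlines():
        line = raw.strip()
        if any(p.search(line) for p in PROBLEM_PATTERNS):
            found.append(line)
    return found


if __name__ == "__main__":
    print(len(problem_lines(BEFORE)), "problem lines before the repair")
    print(len(problem_lines(AFTER)), "problem lines after the repair")
```

An empty result only means none of the listed messages appeared; it is not a guarantee that every file survived, so checking lost+found and the previously missing files is still worthwhile.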

 

  • JorgeB changed the title to [SOLVED] Sudden sync errors and missing data, unmountable disk - help?
