How screwed am I? io errors everywhere



So I had some heat issues today; some disks hit 60°C before I realised.

 

Anyway, I sorted it, but then noticed some strange behaviour, primarily IO write errors over SMB. I loaded up the logs and found a lot of issues. Took the array down and up again, no go. Rebooted into maintenance mode and found disk14 reporting xfs_check issues, but after leaving it for a while and checking the logs again, they're filled with the below.

 

[screenshot: syslog filled with IO errors]

 

 

So... how bad is it? Looking at the docs, I should run xfs_check -V /dev/sdX, which I tried on disk14, the only one that actually reported an issue in the webGUI's xfs check.

 

 

But that's been running for the past 15 minutes, trying to find secondary superblocks in the filesystem:

[screenshot: xfs_check searching for a secondary superblock]

 

 

 

So, help please :S


xfs_repair in the webGUI, after a 2nd run with -nv, basically said "just start the array up bro, it'll be good". Doubtful as I was, I tried it, and so far... it seems okay. Scrubbing the btrfs cache as well, just in case.

 

edit;

 

ye okay, spoke too soon.

 

[screenshot: errors reappearing after starting the array]

 

Thoughts on the best action? I'll try to unmount again and repair, but I doubt it'll work.

39 minutes ago, Mizerka said:

run xfs_check -V /dev/sdX

You can't repair the sdX device, and you shouldn't repair the sdX1 partition of a disk in the array, or you will invalidate parity. You must run it on the md# device.

10 minutes ago, Mizerka said:

xfs_repair in webgui

If you do it from the webUI then it will do it correctly.
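As a concrete sketch of the above (disk14 and these device names are assumptions from this thread; check your own system before running anything):

```shell
#!/bin/sh
# Hypothetical sketch for disk14; substitute your own disk number.
DISK_NUM=14

# Wrong targets:
#   xfs_repair /dev/sdX    - the filesystem lives on the partition, not the raw device
#   xfs_repair /dev/sdX1   - writes bypass Unraid's parity layer and invalidate parity
# Right target: the md device, which keeps parity updated on every write.

echo "xfs_repair -nv /dev/md${DISK_NUM}"   # dry run first: -n checks without modifying
```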

5 minutes ago, trurl said:

You can't repair the sdX device, and you shouldn't repair the sdX1 partition of a disk in the array, or you will invalidate parity. You must run it on the md# device.

If you do it from the webUI then it will do it correctly.

I see; the doc specifies either can be used. I'll try now with md# instead. To confirm: if I want to run it from the webUI, should I just change -nv to -v?

 

Trying to run it from the webUI now only displays this:

[screenshot: webGUI check output]

 

not allowing any actions

 

using md# also returns device is busy

 

[screenshot: xfs_repair reporting the device is busy]

8 minutes ago, trurl said:

Also, see this section on that same wiki page:

 

https://wiki.unraid.net/Check_Disk_Filesystems#Drive_names_and_symbols

Okay, ye, makes sense, so run it against md# instead. I've gone back into maintenance mode and I'm getting the errors from my edit above; md14 says the drive is busy and the webUI refuses to run anything beyond -n/-nv.

 

I've tried to run the repair, but it never got past saying the magic number failed and trying to find a secondary superblock.

 

Here's the full check output, if it helps:

 

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used.  Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
Metadata CRC error detected at 0x43c89d, xfs_bnobt block 0x3a381e28/0x1000
Metadata CRC error detected at 0x43c89d, xfs_bnobt block 0x74703c48/0x1000
btree block 1/1 is suspect, error -74
btree block 2/1 is suspect, error -74
bad magic # 0xdaa0086c in btbno block 1/1
bad magic # 0x2fdfba35 in btbno block 2/1
Metadata CRC error detected at 0x43c89d, xfs_cntbt block 0x3a381e30/0x1000
btree block 1/2 is suspect, error -74
bad magic # 0x419e48e9 in btcnt block 1/2
agf_freeblks 122094523, counted 0 in ag 1
agf_longest 122094523, counted 0 in ag 1
Metadata CRC error detected at 0x43c89d, xfs_cntbt block 0x74703c50/0x1000
btree block 2/2 is suspect, error -74
bad magic # 0xa8692ca5 in btcnt block 2/2
agf_freeblks 121856058, counted 0 in ag 2
agf_longest 121856058, counted 0 in ag 2
Metadata CRC error detected at 0x46ad5d, xfs_inobt block 0x3a381e38/0x1000
btree block 1/3 is suspect, error -74
Metadata CRC error detected at 0x46ad5d, xfs_inobt block 0x74703c58/0x1000
bad magic # 0x639e272e in inobt block 1/3
btree block 2/3 is suspect, error -74
bad magic # 0x796a2ce3 in inobt block 2/3
Metadata CRC error detected at 0x46ad5d, xfs_inobt block 0xaea85a78/0x1000
btree block 3/3 is suspect, error -74
bad magic # 0x15f1f03 in inobt block 3/3
sb_ifree 59, counted 44
sb_fdblocks 2926555418, counted 2681574888
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 4
        - agno = 3
        - agno = 14
        - agno = 22
        - agno = 8
        - agno = 9
        - agno = 5
        - agno = 6
        - agno = 10
        - agno = 12
        - agno = 15
        - agno = 16
        - agno = 13
        - agno = 17
        - agno = 18
        - agno = 2
        - agno = 21
        - agno = 7
        - agno = 19
        - agno = 20
        - agno = 23
        - agno = 11
No modify flag set, skipping phase 5
Inode allocation btrees are too corrupted, skipping phases 6 and 7
Maximum metadata LSN (904557511:-555599277) is ahead of log (1:6247).
Would format log to cycle 904557514.
No modify flag set, skipping filesystem flush and exiting.

 

34 minutes ago, Mizerka said:

xfs_repair in webgui, after 2nd run of -nv said, "just start the array up bro, it'll be good"

What did it actually say?

 

The -n (nomodify) flag means check but don't repair anything.
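For reference, the flags being discussed in this thread (a summary of common xfs_repair usage, not a quote from the Unraid wiki; see `man xfs_repair`):

```shell
#!/bin/sh
# xfs_repair flag summary (assumed typical usage):
#   -n  no-modify: report problems but change nothing (what the webGUI check runs)
#   -v  verbose output; without -n, xfs_repair actually writes its fixes
#   -L  zero the metadata log: last resort, may lose recently logged metadata
echo "check only:  xfs_repair -nv /dev/md14"
echo "real repair: xfs_repair -v /dev/md14"
```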

1 minute ago, trurl said:

What did it actually say?

 

The -n (nomodify) flag means check but don't repair anything.

Running the webGUI check with a -v flag gives this output:

 

Phase 1 - find and verify superblock...
        - block cache size set to 6097840 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 6247 tail block 6235
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

 

47 minutes ago, trurl said:

What did it actually say?

 

The -n (nomodify) flag means check but don't repair anything.

After looking around the forums a bit more, I came across a similar post where a mod advised running against /dev/mapper/md# if the drives are encrypted (all of mine are, btw), and then to -L it.

 

Which spits out this output, same as the webUI:

[screenshot: same log-replay error as the webUI]

 

Clearly it wants me to run with -L, but that sounds destructive? It's a 12TB disk, mostly filled; I'd really hate to lose it. At this point, would I be better off removing it, letting parity emulate it, moving the data elsewhere, then reformatting and adding it back to the array?


Okay, so I think I'm good now. I ended up booting back into the full array with md14 mounted, moved all the data off it without issues, then went back into maintenance mode and could now run -v. Once complete, I started the array again and it's been fine for the last 20 minutes or so. Crisis averted for now. If -v hadn't worked, I'd probably have run -L and just reformatted the disk if that corrupted the filesystem.


Running it from the webUI on encrypted drives should still do the correct thing; i.e., 

1 hour ago, Mizerka said:

run against /dev/mapper/md#

-L is usually necessary, since the log can't be used if the disk is unmountable.
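Putting the whole thread together, the recovery order that worked here can be sketched as follows (md14 is this thread's disk number; the function only prints the plan and touches no device):

```shell
#!/bin/sh
# Prints the recovery sequence used in this thread; purely illustrative.
repair_plan() {
    dev="/dev/md$1"
    echo "1. start the array normally so $dev mounts and XFS replays its journal"
    echo "2. stop the array / enter maintenance mode so $dev is unmounted again"
    echo "3. xfs_repair -v $dev"
    echo "4. only if the mount in step 1 fails: xfs_repair -vL $dev (zeroes the log; destructive)"
}
repair_plan 14
```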

