xfs error - Help sorely needed



Hey folks: I really need some help here.

I had a failed drive, so I bought a new one and replaced it. The drive rebuilt fine and was humming along fine for a day, and then I noticed some weird behavior in the logs:

Mar  4 03:40:10 Hog logger: *** Skipping any contents from this failed directory ***
Mar  4 03:40:10 Hog kernel: XFS (md8): Internal error XFS_WANT_CORRUPTED_RETURN at line 1137 of file fs/xfs/libxfs/xfs_ialloc.c.  Caller xfs_dialloc_ag+0x195/0x248
Mar  4 03:40:10 Hog kernel: CPU: 1 PID: 23476 Comm: shfs Not tainted 4.1.17-unRAID #1
Mar  4 03:40:10 Hog kernel: Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H-BK/Z97X-UD5H-BK, BIOS F7 04/21/2015
Mar  4 03:40:10 Hog kernel: ffff8800086f3b88 ffff8800086f3ac8 ffffffff815f1df0 ffff88041fa50a01
Mar  4 03:40:10 Hog kernel: ffff88040a53d790 ffff8800086f3ae8 ffffffff81260934 ffffffff81253028
Mar  4 03:40:10 Hog kernel: ffffffff81251b13 ffff8800086f3b38 ffffffff8125207e ffff8800cb86b000
Mar  4 03:40:10 Hog kernel: Call Trace:
Mar  4 03:40:10 Hog kernel: [<ffffffff815f1df0>] dump_stack+0x4c/0x6e
Mar  4 03:40:10 Hog kernel: [<ffffffff81260934>] xfs_error_report+0x38/0x3a
Mar  4 03:40:10 Hog kernel: [<ffffffff81253028>] ? xfs_dialloc_ag+0x195/0x248
Mar  4 03:40:10 Hog kernel: [<ffffffff81251b13>] ? xfs_inobt_lookup+0x22/0x24
Mar  4 03:40:10 Hog kernel: [<ffffffff8125207e>] xfs_dialloc_ag_update_inobt+0xbd/0xdb
Mar  4 03:40:10 Hog kernel: [<ffffffff81253028>] xfs_dialloc_ag+0x195/0x248
Mar  4 03:40:10 Hog kernel: [<ffffffff81253d0d>] xfs_dialloc+0x1d6/0x1f5
Mar  4 03:40:10 Hog kernel: [<ffffffff8126b564>] xfs_ialloc+0x4b/0x46f
Mar  4 03:40:10 Hog kernel: [<ffffffff81275097>] ? xlog_grant_head_check+0x4b/0xc7
Mar  4 03:40:10 Hog kernel: [<ffffffff8126b9e2>] xfs_dir_ialloc+0x5a/0x1fb
Mar  4 03:40:10 Hog kernel: [<ffffffff8126be24>] xfs_create+0x261/0x485
Mar  4 03:40:10 Hog kernel: [<ffffffff81269124>] xfs_generic_create+0xb2/0x237
Mar  4 03:40:10 Hog kernel: [<ffffffff8113b2e8>] ? get_acl+0x12/0x4f
Mar  4 03:40:10 Hog kernel: [<ffffffff812692ce>] xfs_vn_mknod+0xf/0x11
Mar  4 03:40:10 Hog kernel: [<ffffffff812692e1>] xfs_vn_mkdir+0x11/0x13
Mar  4 03:40:10 Hog kernel: [<ffffffff81105fd6>] vfs_mkdir+0x6e/0xa8
Mar  4 03:40:10 Hog kernel: [<ffffffff8110a72f>] SyS_mkdirat+0x6d/0xab
Mar  4 03:40:10 Hog kernel: [<ffffffff8110a781>] SyS_mkdir+0x14/0x16
Mar  4 03:40:10 Hog kernel: [<ffffffff815f74ee>] system_call_fastpath+0x12/0x71
Mar  4 03:40:10 Hog kernel: XFS (md8): Internal error xfs_trans_cancel at line 1007 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x3de/0x485
Mar  4 03:40:10 Hog kernel: CPU: 1 PID: 23476 Comm: shfs Not tainted 4.1.17-unRAID #1
Mar  4 03:40:10 Hog kernel: Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H-BK/Z97X-UD5H-BK, BIOS F7 04/21/2015
Mar  4 03:40:10 Hog kernel: 000000000000000c ffff8800086f3cf8 ffffffff815f1df0 0000000000000000
Mar  4 03:40:10 Hog kernel: ffff880098ce2cb0 ffff8800086f3d18 ffffffff81260934 ffffffff8126bfa1
Mar  4 03:40:10 Hog kernel: 00ff8800cb86b000 ffff8800086f3d48 ffffffff812744a3 ffff8800cb86b001
Mar  4 03:40:10 Hog kernel: Call Trace:
Mar  4 03:40:10 Hog kernel: [<ffffffff815f1df0>] dump_stack+0x4c/0x6e
Mar  4 03:40:10 Hog kernel: [<ffffffff81260934>] xfs_error_report+0x38/0x3a
Mar  4 03:40:10 Hog kernel: [<ffffffff8126bfa1>] ? xfs_create+0x3de/0x485
Mar  4 03:40:10 Hog kernel: [<ffffffff812744a3>] xfs_trans_cancel+0x5b/0xda
Mar  4 03:40:10 Hog kernel: [<ffffffff8126bfa1>] xfs_create+0x3de/0x485
Mar  4 03:40:10 Hog kernel: [<ffffffff81269124>] xfs_generic_create+0xb2/0x237
Mar  4 03:40:10 Hog kernel: [<ffffffff8113b2e8>] ? get_acl+0x12/0x4f
Mar  4 03:40:10 Hog kernel: [<ffffffff812692ce>] xfs_vn_mknod+0xf/0x11
Mar  4 03:40:10 Hog kernel: [<ffffffff812692e1>] xfs_vn_mkdir+0x11/0x13
Mar  4 03:40:10 Hog kernel: [<ffffffff81105fd6>] vfs_mkdir+0x6e/0xa8
Mar  4 03:40:10 Hog kernel: [<ffffffff8110a72f>] SyS_mkdirat+0x6d/0xab
Mar  4 03:40:10 Hog kernel: [<ffffffff8110a781>] SyS_mkdir+0x14/0x16
Mar  4 03:40:10 Hog kernel: [<ffffffff815f74ee>] system_call_fastpath+0x12/0x71
Mar  4 03:40:10 Hog kernel: XFS (md8): xfs_do_force_shutdown(0x8) called from line 1008 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffff812744bc
Mar  4 03:40:10 Hog kernel: XFS (md8): Corruption of in-memory data detected.  Shutting down filesystem
Mar  4 03:40:10 Hog kernel: XFS (md8): Please umount the filesystem and rectify the problem(s)
Mar  4 03:40:10 Hog logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1165) [sender=3.1.0]

And then I see a ton of these when trying to access files on the restored drive:

Mar  4 04:37:14 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.
Mar  4 04:37:44 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.
Mar  4 04:38:14 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.
Mar  4 04:38:44 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.
Mar  4 04:39:14 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.
Mar  4 04:39:44 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.

What's really weird is that if I reboot the box, it works again for a while. All Web UI elements report that the drive is healthy with no issues.

Full log attached.

hog-diagnostics-20160304-0916.zip

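The repeated `xfs_log_force: error -5 returned.` lines are what XFS keeps logging after it has shut the filesystem down: kernel functions return negative errno values, and errno 5 is EIO (input/output error), so every attempt to flush the log on md8 fails until the filesystem is unmounted. The short Python sketch below decodes such lines and measures their spacing. It is only an illustration: it assumes the syslog line layout shown in the post, and it takes the year 2016 from the diagnostics filename because syslog omits the year.

```python
import errno
import os
import re
from datetime import datetime

# Three of the repeated lines from the log in the post, copied verbatim.
LOG = """\
Mar  4 04:37:14 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.
Mar  4 04:37:44 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.
Mar  4 04:38:14 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.
"""

# Assumed layout: "<Mon> <day> <hh:mm:ss> <host> kernel: XFS (<dev>): xfs_log_force: error <-N> returned."
PATTERN = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: XFS \((\w+)\): "
    r"xfs_log_force: error (-\d+) returned\.$"
)


def decode(line, year=2016):
    """Return (timestamp, device, errno name, message), or None if the line does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    stamp, device, code = m.groups()
    num = -int(code)  # the kernel reports -errno, so "-5" means errno 5
    # syslog has no year; prepend one (2016 here, taken from the diagnostics filename).
    ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return ts, device, errno.errorcode.get(num, "UNKNOWN"), os.strerror(num)


if __name__ == "__main__":
    rows = [decode(line) for line in LOG.splitlines()]
    for prev, cur in zip(rows, rows[1:]):
        gap = (cur[0] - prev[0]).total_seconds()
        print(cur[1], cur[2], cur[3], f"+{gap:.0f}s")
```

The steady 30-second gap is consistent with XFS's periodic background log work, whose default interval (the `fs.xfs.xfssyncd_centisecs` sysctl) is 3000 centiseconds.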

Attached is the output from the check.

I am curious: is it possible that, because the drive failed while my parity drive was being rebuilt, bad data was written to parity, and then the rebuild of the replacement drive restored that bad data, making it appear that the filesystem is corrupted?

----

Inode allocation btrees are too corrupted, skipping phases 6 and 7
No modify flag set, skipping filesystem flush and exiting.

----

xfs_check.txt


Corruption is more likely to occur when a disk fails and some garbage is written to parity. It could also be that your parity was not completely in sync, for example if you don’t run regular parity checks, or any of a number of other reasons.

If the failed drive is still readable, you can check it on a test server.

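To illustrate the point about garbage in parity: with a single parity disk, each parity block is the XOR of the matching blocks on all data disks, and a replaced disk is rebuilt by XORing parity with the surviving disks. The rebuild cannot tell whether parity was correct, so any garbage in parity (or an incomplete sync) is reproduced on the new disk. A minimal sketch, assuming one XOR parity disk and made-up 4-byte blocks; the disk names and byte values are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (the single-parity rule)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 4-byte blocks on three data disks.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x01\x02\x03\x04"
disk8 = b"\xaa\xbb\xcc\xdd"  # the disk that later fails

parity = xor_blocks(disk1, disk2, disk8)

# Healthy case: parity XOR the surviving disks recovers disk8 exactly.
rebuilt_ok = xor_blocks(parity, disk1, disk2)

# Bad case: one parity byte was garbage (or never synced) before the rebuild.
bad_parity = bytearray(parity)
bad_parity[1] ^= 0xFF  # flip every bit of one byte
rebuilt_bad = xor_blocks(bytes(bad_parity), disk1, disk2)

if __name__ == "__main__":
    print("healthy rebuild matches:", rebuilt_ok == disk8)
    print("bad-parity rebuild matches:", rebuilt_bad == disk8)
```

Both rebuilds complete without any error; only a filesystem check on the rebuilt disk can reveal the damage, which fits a drive that looks healthy in the Web UI while XFS reports corruption.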
I think the failed drive is mostly readable, at least for the data I care about saving. Most of the rest I can lose.

Does anyone have a link that explains the process to basically wipe this drive, clearing both the drive and the /md8 data on the parity drive, and then copy over what I want to keep?

I am thinking that just formatting /md8, then rebuilding parity, and finally copying over what I want to keep from the previously failed drive using another box would work.

--

gs


If you don’t care about the data on disk8, you can format it while maintaining parity:

1. With the array stopped, click on disk8 and change its filesystem to reiserfs.
2. Start the array and format disk8 (this will delete all data on disk8).
3. Stop the array and change the filesystem back to xfs.
4. Start the array and re-format disk8 as xfs.
5. Copy the data over from the old disk.

Parity remains valid throughout.

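Parity stays valid in this procedure because the format is done with the disk still in the array: every write to a data disk, including the writes a format makes, also updates the parity disk. A minimal sketch of that idea, assuming one XOR parity disk, made-up 4-byte blocks, and a read-modify-write update (new parity = old parity XOR old data XOR new data). The function names are hypothetical, and this is not unRAID's actual md driver code, which may use a different write strategy.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (the single-parity rule)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_with_parity(disks: dict, parity: bytes, name: str, new_data: bytes) -> bytes:
    """Hypothetical helper: write new_data to one disk, return read-modify-write parity."""
    old_data = disks[name]
    # new parity = old parity XOR old data XOR new data; other disks are not read.
    new_parity = xor_blocks(parity, old_data, new_data)
    disks[name] = new_data
    return new_parity


# Hypothetical 4-byte blocks; disk8 still holds the corrupted filesystem.
disks = {
    "disk1": b"\x10\x20\x30\x40",
    "disk2": b"\x01\x02\x03\x04",
    "disk8": b"\xaa\xbb\xcc\xdd",
}
parity = xor_blocks(*disks.values())

# "Formatting" disk8 is just another write through the array, so parity follows it.
parity = write_with_parity(disks, parity, "disk8", bytes(4))

if __name__ == "__main__":
    print("parity still valid:", parity == xor_blocks(*disks.values()))
```

This is why no separate parity rebuild is needed after formatting a disk that is still assigned to the array.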