magmpzero Posted March 4, 2016

Hey folks: I really need some help here. I had a failed drive, so I bought a new one and replaced it. The drive rebuilt fine and was humming along fine for a day, and then I noticed some weird behavior in the logs:

Mar 4 03:40:10 Hog logger: *** Skipping any contents from this failed directory ***
Mar 4 03:40:10 Hog kernel: XFS (md8): Internal error XFS_WANT_CORRUPTED_RETURN at line 1137 of file fs/xfs/libxfs/xfs_ialloc.c. Caller xfs_dialloc_ag+0x195/0x248
Mar 4 03:40:10 Hog kernel: CPU: 1 PID: 23476 Comm: shfs Not tainted 4.1.17-unRAID #1
Mar 4 03:40:10 Hog kernel: Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H-BK/Z97X-UD5H-BK, BIOS F7 04/21/2015
Mar 4 03:40:10 Hog kernel: ffff8800086f3b88 ffff8800086f3ac8 ffffffff815f1df0 ffff88041fa50a01
Mar 4 03:40:10 Hog kernel: ffff88040a53d790 ffff8800086f3ae8 ffffffff81260934 ffffffff81253028
Mar 4 03:40:10 Hog kernel: ffffffff81251b13 ffff8800086f3b38 ffffffff8125207e ffff8800cb86b000
Mar 4 03:40:10 Hog kernel: Call Trace:
Mar 4 03:40:10 Hog kernel: [<ffffffff815f1df0>] dump_stack+0x4c/0x6e
Mar 4 03:40:10 Hog kernel: [<ffffffff81260934>] xfs_error_report+0x38/0x3a
Mar 4 03:40:10 Hog kernel: [<ffffffff81253028>] ? xfs_dialloc_ag+0x195/0x248
Mar 4 03:40:10 Hog kernel: [<ffffffff81251b13>] ? xfs_inobt_lookup+0x22/0x24
Mar 4 03:40:10 Hog kernel: [<ffffffff8125207e>] xfs_dialloc_ag_update_inobt+0xbd/0xdb
Mar 4 03:40:10 Hog kernel: [<ffffffff81253028>] xfs_dialloc_ag+0x195/0x248
Mar 4 03:40:10 Hog kernel: [<ffffffff81253d0d>] xfs_dialloc+0x1d6/0x1f5
Mar 4 03:40:10 Hog kernel: [<ffffffff8126b564>] xfs_ialloc+0x4b/0x46f
Mar 4 03:40:10 Hog kernel: [<ffffffff81275097>] ? xlog_grant_head_check+0x4b/0xc7
Mar 4 03:40:10 Hog kernel: [<ffffffff8126b9e2>] xfs_dir_ialloc+0x5a/0x1fb
Mar 4 03:40:10 Hog kernel: [<ffffffff8126be24>] xfs_create+0x261/0x485
Mar 4 03:40:10 Hog kernel: [<ffffffff81269124>] xfs_generic_create+0xb2/0x237
Mar 4 03:40:10 Hog kernel: [<ffffffff8113b2e8>] ? get_acl+0x12/0x4f
Mar 4 03:40:10 Hog kernel: [<ffffffff812692ce>] xfs_vn_mknod+0xf/0x11
Mar 4 03:40:10 Hog kernel: [<ffffffff812692e1>] xfs_vn_mkdir+0x11/0x13
Mar 4 03:40:10 Hog kernel: [<ffffffff81105fd6>] vfs_mkdir+0x6e/0xa8
Mar 4 03:40:10 Hog kernel: [<ffffffff8110a72f>] SyS_mkdirat+0x6d/0xab
Mar 4 03:40:10 Hog kernel: [<ffffffff8110a781>] SyS_mkdir+0x14/0x16
Mar 4 03:40:10 Hog kernel: [<ffffffff815f74ee>] system_call_fastpath+0x12/0x71
Mar 4 03:40:10 Hog kernel: XFS (md8): Internal error xfs_trans_cancel at line 1007 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x3de/0x485
Mar 4 03:40:10 Hog kernel: CPU: 1 PID: 23476 Comm: shfs Not tainted 4.1.17-unRAID #1
Mar 4 03:40:10 Hog kernel: Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H-BK/Z97X-UD5H-BK, BIOS F7 04/21/2015
Mar 4 03:40:10 Hog kernel: 000000000000000c ffff8800086f3cf8 ffffffff815f1df0 0000000000000000
Mar 4 03:40:10 Hog kernel: ffff880098ce2cb0 ffff8800086f3d18 ffffffff81260934 ffffffff8126bfa1
Mar 4 03:40:10 Hog kernel: 00ff8800cb86b000 ffff8800086f3d48 ffffffff812744a3 ffff8800cb86b001
Mar 4 03:40:10 Hog kernel: Call Trace:
Mar 4 03:40:10 Hog kernel: [<ffffffff815f1df0>] dump_stack+0x4c/0x6e
Mar 4 03:40:10 Hog kernel: [<ffffffff81260934>] xfs_error_report+0x38/0x3a
Mar 4 03:40:10 Hog kernel: [<ffffffff8126bfa1>] ? xfs_create+0x3de/0x485
Mar 4 03:40:10 Hog kernel: [<ffffffff812744a3>] xfs_trans_cancel+0x5b/0xda
Mar 4 03:40:10 Hog kernel: [<ffffffff8126bfa1>] xfs_create+0x3de/0x485
Mar 4 03:40:10 Hog kernel: [<ffffffff81269124>] xfs_generic_create+0xb2/0x237
Mar 4 03:40:10 Hog kernel: [<ffffffff8113b2e8>] ? get_acl+0x12/0x4f
Mar 4 03:40:10 Hog kernel: [<ffffffff812692ce>] xfs_vn_mknod+0xf/0x11
Mar 4 03:40:10 Hog kernel: [<ffffffff812692e1>] xfs_vn_mkdir+0x11/0x13
Mar 4 03:40:10 Hog kernel: [<ffffffff81105fd6>] vfs_mkdir+0x6e/0xa8
Mar 4 03:40:10 Hog kernel: [<ffffffff8110a72f>] SyS_mkdirat+0x6d/0xab
Mar 4 03:40:10 Hog kernel: [<ffffffff8110a781>] SyS_mkdir+0x14/0x16
Mar 4 03:40:10 Hog kernel: [<ffffffff815f74ee>] system_call_fastpath+0x12/0x71
Mar 4 03:40:10 Hog kernel: XFS (md8): xfs_do_force_shutdown(0x8) called from line 1008 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff812744bc
Mar 4 03:40:10 Hog kernel: XFS (md8): Corruption of in-memory data detected. Shutting down filesystem
Mar 4 03:40:10 Hog kernel: XFS (md8): Please umount the filesystem and rectify the problem(s)
Mar 4 03:40:10 Hog logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1165) [sender=3.1.0]

And then I see a ton of these when trying to access files on the restored drive:

Mar 4 04:37:14 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.
Mar 4 04:37:44 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.
Mar 4 04:38:14 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.
Mar 4 04:38:44 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.
Mar 4 04:39:14 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.
Mar 4 04:39:44 Hog kernel: XFS (md8): xfs_log_force: error -5 returned.

What's really weird is that if I reboot the box, it works again for a while. All the web UI elements report the drive as healthy with no issues. Full log attached.

hog-diagnostics-20160304-0916.zip
JorgeB Posted March 4, 2016

https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_XFS

You need to check disk8 (/dev/md8).
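For reference, the check the wiki describes boils down to running xfs_repair against the emulated disk device named in the thread. This is a sketch of the usual sequence, assuming the array has been started in maintenance mode so /dev/md8 exists but is not mounted; run it from the unRAID console or SSH:

```shell
# Start the array in maintenance mode first (Main tab, tick "Maintenance mode"),
# so /dev/md8 is available but the filesystem is not mounted.

# Dry run: -n reports problems without modifying anything on disk.
xfs_repair -n /dev/md8

# If the dry run finds damage, run the actual repair (same command without -n).
xfs_repair /dev/md8

# If xfs_repair refuses to run because of a dirty log, -L zeroes the log as a
# last resort; this can discard the most recent metadata updates.
xfs_repair -L /dev/md8
```

Because the repair runs against the md device rather than the raw disk, parity is updated as corrections are written, so parity stays in sync with the repaired filesystem.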
magmpzero Posted March 4, 2016 (Author)

Attached is the output from the check. I am curious: is it possible that, because the drive failed, bad data was written to parity, and the rebuild then restored that bad data, making it appear the filesystem is corrupted?

----
inode allocation btrees are too corrupted, skipping phases 6 and 7
No modify flag set, skipping filesystem flush and exiting.
----

xfs_check.txt
JorgeB Posted March 4, 2016

Corruption is more likely to occur when a disk fails and some garbage is written to parity. It could also be that your parity was not completely synced, if you don't run regular parity checks, or any of a number of other reasons. If the failed drive is still readable, you can check it on a test server.
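If you do inspect the old drive on a test box, mounting it read-only avoids making a marginal disk worse. A minimal sketch, assuming the drive appears as /dev/sdX with the data on the first partition (the device name and mount point here are placeholders, not anything from the thread):

```shell
# Identify the drive first; /dev/sdX below is a placeholder.
lsblk -o NAME,SIZE,FSTYPE,LABEL

# Mount the data partition read-only so nothing on the failing disk is changed.
mkdir -p /mnt/old8
mount -o ro /dev/sdX1 /mnt/old8

# Verify the files you care about are actually readable.
ls /mnt/old8
```

Reading the disk this way also tells you whether the corruption lives on the old drive itself or was introduced by the parity rebuild.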
magmpzero Posted March 4, 2016 (Author)

I think the failed drive is mostly readable for the data I care about saving; most of the rest I can afford to lose. Does anyone have a link that explains the process to basically wipe this drive, clearing both the drive and resetting the /md8 data on the parity drive, and then copy over what I want to keep? I am thinking just a format of /md8 would work, then rebuild parity, and finally copy over what I want to keep from the previous failed drive on another box.

-- gs
JorgeB Posted March 4, 2016

If you don't care about the data on disk8, you can format it while maintaining parity:

1. With the array stopped, click on disk8 and change its filesystem to reiserfs.
2. Start the array and format disk8 (this will delete all data on disk8).
3. Stop the array and change the filesystem back to xfs.
4. Start the array and re-format disk8 as xfs.
5. Copy the data over from the old disk.

Parity remains valid throughout.
magmpzero Posted March 5, 2016 (Author)

Just following up on this: the issue has been resolved after formatting /md8 and copying the data back over.