[SOLVED] already repaired xfs on unmountable disks but physical screen still shows xfs corruption detected



okay another long one from me...

 

 

sorry it's so long, but i didn't want to leave out any important information and i'm not sure what's important...

 

i had a disabled disk (disk 1) so i had to stop the array to rebuild it.

so i stopped the array and unassigned the disabled disk.

then i shut down, because i wanted to check all the sata connections, since i had been getting errors on some drives, which i understand could be due to faulty cables.

 

this time i thought i was a bit smarter because i had disabled array autostart, dockers and vms first.

 

booted up the server again and everything looked ok.

so i assigned the disk (same disk, because i was pretty sure the disk was actually still good) and started the array.

the array started up and the rebuild was running, but i then noticed that another disk (disk 3) was unmountable.

 

i decided i wanted to fix that problem first, so i stopped the rebuild (stopped the array) and went to maintenance mode to run xfs_repair on disk 3. i just clicked on check with the default -n option and it worked.

started the array again and disk 3 was mounted.

but... now disk 1 was unmountable. the same disk 1 that had earlier been disabled and was being rebuilt. weird...

 

so i stopped the array again and went to maintenance mode, but this time the check button was unresponsive.

I stumbled through some forum posts and wikis and learned how to run xfs_repair from the terminal.

ran xfs_repair -n /dev/md1 (since -n had seemed to repair disk 3 just now) and started the array, but disk 1 was still unmountable.

ran xfs_repair -v /dev/md1 (got that from the wiki); it took some time and seemed to have done something... but the disk was still not mountable...

then i ran xfs_repair /dev/md1 and that did the trick. disk 1 was now mounted and the rebuild was in progress.
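
for reference, this is roughly the sequence i ran from the terminal, with the array started in maintenance mode (md1 being the array device for disk 1):

xfs_repair -n /dev/md1    # -n = "no modify": read-only check, writes nothing
xfs_repair -v /dev/md1    # verbose run, attempts repairs and prints details
xfs_repair /dev/md1       # plain run, repairs whatever it finds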

 

i was about to call it a night but just happened to notice that the physically connected server screen (the local console) was full of these errors, repeating over and over:

XFS (md1): Corruption detected. Unmount and run xfs_repair

XFS (md1): Internal error rval !=0 .....

[photo of the server console flooded with the repeating XFS errors]

i tried going through the xfs_repair steps a few more times but the errors still appear, though only on the physical screen.

everything seems ok. VMs and dockers are running, and shares are working.

Or perhaps I should say I haven't found any problems yet...

 

How do i fix this?

 

for now unraid is still rebuilding; it will take 7 hours to complete.

 

right now i really need to sleep...

 

thanks for taking the time to read my entire post.

 

 

 

 

silometalico-diagnostics-20210617-0339.zip


1 hour remaining until data rebuild completes and so far the GUI doesn't report any errors.

I have a new lost+found folder with 1 file in it. other than that it looks like all my shares and files are there.

i don't know yet if any files have been corrupted, but i have the file integrity plugin - would that show a report if any files were corrupted?

 

unraid server screen still shows all these errors:

XFS (md1): Corruption detected. Unmount and run xfs_repair...

XFS (md1): Internal error rval !=0 .....

 

any tips on how to fix this would be very much appreciated.

 

thank you.


I'll let the guys chime in on the XFS repair stuff, however I see other potential issues in your log, on several CPUs.

I wonder if your filesystem issues are not the result of some larger problem.

 

Jun 17 03:39:15 SILOmetalico kernel: XFS (md1): Internal error rval != 0 && args->dp->i_d.di_size != args->geo->blksize at line 609 of file fs/xfs/libxfs/xfs_dir2.c.  Caller xfs_dir2_isblock+0x59/0xaf [xfs]
Jun 17 03:39:15 SILOmetalico kernel: CPU: 1 PID: 17300 Comm: find Not tainted 5.10.28-Unraid #1
Jun 17 03:39:15 SILOmetalico kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B75 Pro3, BIOS P1.80 10/01/2013
Jun 17 03:39:15 SILOmetalico kernel: Call Trace:
Jun 17 03:39:15 SILOmetalico kernel: dump_stack+0x6b/0x83
Jun 17 03:39:15 SILOmetalico kernel: xfs_corruption_error+0x5f/0x79 [xfs]
Jun 17 03:39:15 SILOmetalico kernel: xfs_dir2_isblock+0x87/0xaf [xfs]
Jun 17 03:39:15 SILOmetalico kernel: ? xfs_dir2_isblock+0x59/0xaf [xfs]
Jun 17 03:39:15 SILOmetalico kernel: xfs_readdir+0xbf/0x10c [xfs]
Jun 17 03:39:15 SILOmetalico kernel: iterate_dir+0x93/0x131
Jun 17 03:39:15 SILOmetalico kernel: __do_sys_getdents64+0x6b/0xd4
Jun 17 03:39:15 SILOmetalico kernel: ? filldir+0x17c/0x17c
Jun 17 03:39:15 SILOmetalico kernel: ? __do_sys_fcntl+0x53/0x70
Jun 17 03:39:15 SILOmetalico kernel: do_syscall_64+0x5d/0x6a
Jun 17 03:39:15 SILOmetalico kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 17 03:39:15 SILOmetalico kernel: RIP: 0033:0x14e80db06f97
Jun 17 03:39:15 SILOmetalico kernel: Code: 0f 1f 00 48 8b 47 20 c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 81 fa ff ff ff 7f b8 ff ff ff 7f 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 c9 6e 10 00 f7 d8 64 89 02 48
Jun 17 03:39:15 SILOmetalico kernel: RSP: 002b:00007ffd242a7a88 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
Jun 17 03:39:15 SILOmetalico kernel: RAX: ffffffffffffffda RBX: 0000000000455470 RCX: 000014e80db06f97
Jun 17 03:39:15 SILOmetalico kernel: RDX: 0000000000008000 RSI: 0000000000455470 RDI: 0000000000000008
Jun 17 03:39:15 SILOmetalico kernel: RBP: ffffffffffffff80 R08: 0000000000000030 R09: 0000000000000001
Jun 17 03:39:15 SILOmetalico kernel: R10: 0000000000000100 R11: 0000000000000293 R12: 0000000000455444
Jun 17 03:39:15 SILOmetalico kernel: R13: 0000000000000000 R14: 0000000000455440 R15: 0000000000449c50
Jun 17 03:39:15 SILOmetalico kernel: XFS (md1): Corruption detected. Unmount and run xfs_repair

 

57 minutes ago, ChatNoir said:

I see other potential issues in your log, on several CPUs

 😱😱

thinking about it, this isn't the first time i've had a drive get disabled and then another drive become unmountable. i always assumed those sporadic errors and disk problems were caused by wonky sata connections.

other than that i've not noticed any issues... i've been very happy with this old server of mine! still learning how to treat it with care, but it's otherwise been really good!

3 minutes ago, JorgeB said:

This does nothing, -n is the "no modify" flag, used to do a read-only check, so nothing is changed

oh that's weird... that's the default option if we run xfs_repair from the gui and i swear i only clicked the check button... somehow it seemed to fix disk 3, so i thought i'd give it a go with disk 1.

 

rebuild is done, no errors.

 

do i need to enter any other options when i run xfs_repair again?

I noticed that sometimes users are told to run xfs_repair -L
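
from what i've read, -L is only for when xfs_repair refuses to run because the metadata log can't be replayed; it zeroes the log before repairing, so it can throw away the most recent metadata changes and should be a last resort, e.g.:

xfs_repair -L /dev/md1    # zero the metadata log first, then repair; last resort, may lose recent metadata changes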

 

2 minutes ago, JorgeB said:

Run without options or -v

ah that's what i did and that got unraid to mount the disk.

but then again it only fixed the emulated disk.

i'll try again later tonight.

 

2 minutes ago, JorgeB said:

If it keeps corrupting it's also a good idea to run memtest.

i'd say that my problems don't happen too often... i usually also go for months without needing to restart or touch the server.

so far any disk problems would go away after i fiddled about with the disks, checked the cables, etc. it's an old supermicro cse-m35t cage, gets a bit temperamental...

but yeah i think i'll give that memtest a go. i've never done it so maybe it's about time.

 

 


@JorgeB ok i rebooted and ran xfs_repair on disk 1 again in maintenance mode.

 

it ran quite fast and gave these results:

[screenshot of the xfs_repair output]

 

once i started up the array the errors started flooding the screen again.

 

there were none of these errors at the login screen after the server rebooted, nor when i started the array in maintenance mode.

they only started flooding back when the array was started in normal mode (not maintenance mode).

 

the array is now started, the GUI doesn't show any errors, and all disks are green.

 

what else should i try?

 

silometalico-diagnostics-20210617-2043.zip


hi trurl, i'm confused because the array looks good and the GUI doesn't report any file system problems.

the kind of corruption i was expecting was disk errors appearing or the disk becoming unmountable again, but the server started up and has been working.

 

ok, i'll give memtest a go.

 

before I do that, can I ask, does memtest require any specific arguments or parameters for diagnosing this problem? it doesn't do any repairs, right?

i'm afraid i'll get stuck in the middle of memtest and won't know what to do next.


ok, so i ran memtest but it was taking too long... i stopped it at this point:

[screenshot of the memtest results showing errors]

looks like there were some errors... guess I really did have faulty RAM? or a faulty CPU? something doesn't look right.

i'll do it again another day. right now i'm too tired to wait for it to finish.

 

I also downloaded the latest xfsprogs and put it in the extra folder, so it got installed when unraid booted up.
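
roughly what i did (assuming the extra folder is the usual /boot/extra on the flash drive):

cp xfsprogs-5.12.0-x86_64-1.txz /boot/extra/    # packages in /boot/extra get installed automatically at boot
reboot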

but now when i run xfs_repair i get this error:

[screenshot of the error message from xfs_repair]

did i download the wrong xfsprogs?

the one i got was from here: https://slackware.pkgs.org/current/slackware-x86_64/xfsprogs-5.12.0-x86_64-1.txz.html

 

so, in the end i couldn't get xfs_repair to work and i couldn't complete memtest... ugh...

5 minutes ago, limawaken said:

guess I really did have faulty RAM?

Yes, only 0 errors are acceptable, and even that is not a guarantee there aren't issues, but any errors are a guarantee there are.

 

Looks like you need an updated GLIBC; download this one, also to the extra folder, then reboot:

 

http://ftp.riken.jp/Linux/slackware/slackware64-current/slackware64/a/aaa_glibc-solibs-2.33-x86_64-2.txz
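
something like this from the server's console should do it (assuming the flash drive's extra folder is at the usual /boot/extra):

wget -P /boot/extra http://ftp.riken.jp/Linux/slackware/slackware64-current/slackware64/a/aaa_glibc-solibs-2.33-x86_64-2.txz    # download straight into the extra folder
reboot    # packages in /boot/extra are installed on the next boot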

 

 

 

 


hi guys.

 

thanks for all your advice. i do appreciate it and i do understand what you mean. 

 

I'm going to try to fix this memory problem first. then i'll retry the xfs repair.

 

i will run memtest 1 ramstick at a time. that would break up the test into shorter segments, but would take me 4 nights of memtest to test all my ram.

 

just a question about memtest - should i change any of the options? if i'm not wrong, i noticed that before it started i was given the option to choose multiple threads. would that be better?

6 hours ago, limawaken said:

will run memtest 1 ramstick at a time. that would break up the test into shorter segments, but would take me 4 nights of memtest to test all my ram.


just to point out that you can get scenarios where all sticks test fine individually, but you still get failures when all sticks are installed at the same time.

  • JorgeB changed the title to [SOLVED] already repaired xfs on unmountable disks but physical screen still shows xfs corruption detected
