limawaken Posted June 16, 2021

okay, another long one from me... sorry it's so long, but i didn't want to leave out any important information and i'm not sure what's important.

i had a disabled disk (disk 1), so i had to stop the array to rebuild it. i stopped the array, unassigned the disabled disk, then shut down because i wanted to check all the sata connections. i had been getting errors on some drives, which i understand can be caused by faulty cables. this time i thought i was a bit smarter because i had disabled array autostart, dockers and vms first.

booted the server up again and everything looked ok, so i re-assigned the same disk (i was pretty sure the disk itself was still good) and started the array. the rebuild was running, but i noticed another disk (disk 3) was now unmountable. i wanted to fix that problem first, so i stopped the rebuild (stopped the array) and went to maintenance mode to run xfs_repair on disk 3. i just clicked Check with the default -n option and it worked. started the array again and disk 3 was mounted.

but... now disk 1 was unmountable. the same disk 1 that was earlier disabled and being rebuilt. weird... so i stopped the array again and went to maintenance mode, but this time the Check button was unresponsive. i stumbled through some forum posts and wikis and learned how to run xfs_repair from the terminal. i did xfs_repair -n /dev/md1 (since -n had repaired disk 3 just now) and started the array, but disk 1 was still unmountable. then i did xfs_repair -v /dev/md1 (got that from the wiki); it took some time and seemed to do something, but the disk still wouldn't mount. finally, xfs_repair /dev/md1 did the trick: disk 1 was mounted and the rebuild was in progress.

i was about to call it a night, but happened to notice that the physically connected server screen was full of these errors, repeating over and over:

XFS (md1): Corruption detected. Unmount and run xfs_repair
XFS (md1): Internal error rval != 0 ...

i tried going through the xfs_repair steps a few more times, but the error still appears, though only on the physical screen. everything else seems ok: vms and dockers are running, and shares are working. or perhaps i should say i haven't found any problems yet... how do i fix this? for now unraid is still rebuilding; it will take 7 hours to complete, and right now i really need to sleep... thanks for taking the time to read my entire post.

silometalico-diagnostics-20210617-0339.zip
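For reference, the terminal sequence described in the post looks roughly like this on Unraid. This is a sketch, not an exact transcript: the array must be started in Maintenance Mode first, and the mdX device number maps to the array disk number (md1 for disk 1 here).

```shell
# Sketch of the repair sequence described above. Run from the Unraid
# terminal with the array started in Maintenance Mode; adjust /dev/md1
# for the disk you are actually repairing.

# 1. Dry run first: -n is "no modify", so this only reports problems.
xfs_repair -n /dev/md1

# 2. Actual repair, verbose. This is the step that writes fixes to the
#    emulated/rebuilt disk (what finally worked in this thread).
xfs_repair -v /dev/md1
```

Note this operates on the md device (so parity stays in sync), not on the raw sdX device.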
limawaken Posted June 17, 2021 (Author)

1 hour remaining until the data rebuild completes, and so far the GUI doesn't report any errors. i have a new lost+found folder with 1 file in it; other than that it looks like all my shares and files are there. i don't know yet if any files were corrupted, but i have the File Integrity plugin - would that show a report if any files were corrupted?

the unraid server screen still shows all these errors:

XFS (md1): Corruption detected. Unmount and run xfs_repair...
XFS (md1): Internal error rval != 0 ...

any tips on how to fix this would be very much appreciated. thank you.
ChatNoir Posted June 17, 2021

I'll let the XFS guys chime in on the repair stuff, but I see other potential issues in your log, on several CPUs. I wonder if your filesystem issues are the result of some larger problem.

Jun 17 03:39:15 SILOmetalico kernel: XFS (md1): Internal error rval != 0 && args->dp->i_d.di_size != args->geo->blksize at line 609 of file fs/xfs/libxfs/xfs_dir2.c. Caller xfs_dir2_isblock+0x59/0xaf [xfs]
Jun 17 03:39:15 SILOmetalico kernel: CPU: 1 PID: 17300 Comm: find Not tainted 5.10.28-Unraid #1
Jun 17 03:39:15 SILOmetalico kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B75 Pro3, BIOS P1.80 10/01/2013
Jun 17 03:39:15 SILOmetalico kernel: Call Trace:
Jun 17 03:39:15 SILOmetalico kernel: dump_stack+0x6b/0x83
Jun 17 03:39:15 SILOmetalico kernel: xfs_corruption_error+0x5f/0x79 [xfs]
Jun 17 03:39:15 SILOmetalico kernel: xfs_dir2_isblock+0x87/0xaf [xfs]
Jun 17 03:39:15 SILOmetalico kernel: ? xfs_dir2_isblock+0x59/0xaf [xfs]
Jun 17 03:39:15 SILOmetalico kernel: xfs_readdir+0xbf/0x10c [xfs]
Jun 17 03:39:15 SILOmetalico kernel: iterate_dir+0x93/0x131
Jun 17 03:39:15 SILOmetalico kernel: __do_sys_getdents64+0x6b/0xd4
Jun 17 03:39:15 SILOmetalico kernel: ? filldir+0x17c/0x17c
Jun 17 03:39:15 SILOmetalico kernel: ? __do_sys_fcntl+0x53/0x70
Jun 17 03:39:15 SILOmetalico kernel: do_syscall_64+0x5d/0x6a
Jun 17 03:39:15 SILOmetalico kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 17 03:39:15 SILOmetalico kernel: RIP: 0033:0x14e80db06f97
Jun 17 03:39:15 SILOmetalico kernel: Code: 0f 1f 00 48 8b 47 20 c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 81 fa ff ff ff 7f b8 ff ff ff 7f 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 c9 6e 10 00 f7 d8 64 89 02 48
Jun 17 03:39:15 SILOmetalico kernel: RSP: 002b:00007ffd242a7a88 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
Jun 17 03:39:15 SILOmetalico kernel: RAX: ffffffffffffffda RBX: 0000000000455470 RCX: 000014e80db06f97
Jun 17 03:39:15 SILOmetalico kernel: RDX: 0000000000008000 RSI: 0000000000455470 RDI: 0000000000000008
Jun 17 03:39:15 SILOmetalico kernel: RBP: ffffffffffffff80 R08: 0000000000000030 R09: 0000000000000001
Jun 17 03:39:15 SILOmetalico kernel: R10: 0000000000000100 R11: 0000000000000293 R12: 0000000000455444
Jun 17 03:39:15 SILOmetalico kernel: R13: 0000000000000000 R14: 0000000000455440 R15: 0000000000449c50
Jun 17 03:39:15 SILOmetalico kernel: XFS (md1): Corruption detected. Unmount and run xfs_repair
JorgeB Posted June 17, 2021

When the rebuild is done, reboot and run xfs_repair again on disk 1.

P.S.:

11 hours ago, limawaken said:
i just clicked on check with the default -n option on and it worked.

This does nothing: -n is the "no modify" flag, used for a read-only check, so no repairs are actually made.
limawaken Posted June 17, 2021 (Author)

57 minutes ago, ChatNoir said:
I see other potential issues in your log, on several CPUs

😱😱 thinking about it, this isn't the first time i've had a drive get disabled and then another drive become unmountable. i always assumed those sporadic errors and disk problems were caused by wonky sata connections. other than that i've not noticed any issues... i've been very happy with this old server of mine! still learning how to treat it with care, but it's otherwise been really good!
limawaken Posted June 17, 2021 (Author)

3 minutes ago, JorgeB said:
This does nothing, -n is the "no modify" flag, used to do a read only check, nothing is done

oh, that's weird... that's the default option if we run xfs_repair from the gui, and i swear i only clicked the Check button... somehow it fixed disk 3, so i thought i'd give it a go with disk 1. the rebuild is done, no errors. do i need to enter any other options when i run xfs_repair again? i noticed sometimes users are told to do xfs_repair -L
JorgeB Posted June 17, 2021

Run it without options, or with -v (verbose); use -L only if xfs_repair asks for it.
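The options mentioned in this thread, in order of escalation, can be sketched like this (run from the terminal in Maintenance Mode; /dev/md1 is disk 1 here):

```shell
# xfs_repair invocations in order of escalation (sketch):

xfs_repair -n /dev/md1   # read-only check: reports problems, changes nothing

xfs_repair -v /dev/md1   # normal repair with verbose output; run without
                         # -v for the same repair with less chatter

xfs_repair -L /dev/md1   # zeroes the metadata log before repairing; only
                         # use this if xfs_repair itself tells you to,
                         # since unreplayed metadata changes are lost
```

The reason -L is a last resort: it discards the journal, so any metadata updates that were only in the log are thrown away rather than replayed.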
JorgeB Posted June 17, 2021

If it keeps corrupting, it's also a good idea to run memtest.
limawaken Posted June 17, 2021 (Author)

2 minutes ago, JorgeB said:
Run without options or -v

ah, that's what i did, and that got unraid to mount the disk. but then again it only fixed the emulated disk. i'll try again later tonight.

2 minutes ago, JorgeB said:
If it keeps corrupting also a good idea to run memtest.

i'd say my problems don't happen too often... i usually go for months without needing to restart or touch the server. so far any disk problems would go away after i fiddle about with the disks, check the cables, etc. it's an old supermicro cse-m35t cage, and it gets a bit temperamental... but yeah, i think i'll give memtest a go. i've never done it, so maybe it's about time.
limawaken Posted June 17, 2021 (Author)

@JorgeB ok, i rebooted and ran xfs_repair on disk 1 again in maintenance mode. it ran quite fast and gave these results:

once i started the array, the errors started flooding the screen again. there were none of these errors at the login screen after the server rebooted, nor when i started the array in maintenance mode; they only started flooding back when the array was started in normal mode. the array is now started, the GUI doesn't show any errors, and all disks are green. what else should i try?

silometalico-diagnostics-20210617-2043.zip
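One quick way to confirm whether the corruption messages are still being logged after a repair (rather than watching the physical console) is to grep the syslog for them. The filter below is demonstrated against two sample lines copied from this thread; on the server you would point the same grep at /var/log/syslog instead.

```shell
# Count XFS corruption messages for md1. Demonstrated here on two sample
# log lines from this thread; on Unraid you would grep /var/log/syslog,
# e.g.: grep -cE 'XFS \(md1\): (Corruption detected|Internal error)' /var/log/syslog
printf '%s\n' \
  'Jun 17 03:39:15 SILOmetalico kernel: XFS (md1): Corruption detected. Unmount and run xfs_repair' \
  'Jun 17 03:39:15 SILOmetalico kernel: XFS (md1): Internal error rval != 0' \
  | grep -cE 'XFS \(md1\): (Corruption detected|Internal error)'
```

If the count keeps growing after a repair plus a reboot, the corruption is being re-detected at every directory read, not just left over from before the repair.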
trurl Posted June 17, 2021

41 minutes ago, limawaken said:
what else should i try?

6 hours ago, JorgeB said:
If it keeps corrupting also a good idea to run memtest.
JorgeB Posted June 17, 2021

Possibly it's an xfs bug. You can try installing a newer xfsprogs package to see if that fixes it, but it's still a good idea to run a few passes of memtest first.
limawaken Posted June 17, 2021 (Author)

hi trurl, i'm confused because the array looks good and the GUI doesn't report any file system problems. the kind of corruption i was expecting was disk errors appearing, or the disk becoming unmountable again, but the server started up and kept working.

ok, i'll give memtest a go. before i do, can i ask: does memtest require any specific arguments or parameters for diagnosing this problem? it doesn't do any repairs, right? i'm afraid i'll get stuck in the middle of memtest and won't know what to do next.
JorgeB Posted June 17, 2021

You just run it; any error will appear in red, like so:

But I now think it's more likely an xfs bug.
limawaken Posted June 17, 2021 (Author)

if memtest throws up any errors, would i still be able to start the array? sorry if i'm acting all weird and paranoid. i even got kinda nervous when i had to reboot... that xfs error thing has made me kinda jumpy. it's either that or my coffee was too strong.
JorgeB Posted June 17, 2021

12 minutes ago, limawaken said:
if memtest throws up any errors would i still be able to start array?

Yes. Just running memtest won't change or fix anything; it might just confirm whether there is a RAM problem.
limawaken Posted June 17, 2021 (Author)

ok, so i ran memtest, but it was taking too long... i stopped it at this point:

looks like there were some errors... guess i really did have faulty RAM? or a faulty CPU? something doesn't look right. i'll do it again another day; right now i'm too tired to wait for it to finish.

i also downloaded the latest xfsprogs and put it in the extra folder, so it got installed when unraid booted up. but now when i run xfs_repair i get this error:

did i download the wrong xfsprogs? the one i got was from here: https://slackware.pkgs.org/current/slackware-x86_64/xfsprogs-5.12.0-x86_64-1.txz.html

so, in the end i couldn't get xfs_repair to work and i couldn't complete memtest... ugh...
JorgeB Posted June 17, 2021

5 minutes ago, limawaken said:
guess I really did have faulty RAM?

Yes. Only 0 errors are acceptable, and even that is not a guarantee there are no issues, but any errors are a guarantee there are.

Looks like you need an updated GLIBC. Download this one, also to the extra folder, then reboot:
http://ftp.riken.jp/Linux/slackware/slackware64-current/slackware64/a/aaa_glibc-solibs-2.33-x86_64-2.txz
JorgeB Posted June 17, 2021

After this is done, don't forget to clean out the extra folder, or it will keep installing those versions even after Unraid ships newer ones.
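For anyone following along, the extra-folder mechanism described above can be sketched like this. Packages dropped into /boot/extra on the flash drive are installed at every boot; the glibc URL is the one JorgeB posted, while the xfsprogs .txz has to be fetched from a Slackware mirror yourself (the pkgs.org link in the thread is the package's info page, not the file).

```shell
# Sketch: install newer xfsprogs + matching glibc via Unraid's
# /boot/extra mechanism. Anything in this folder is installed on
# every boot until you remove it.
cd /boot/extra

# glibc solibs package linked in this thread:
wget http://ftp.riken.jp/Linux/slackware/slackware64-current/slackware64/a/aaa_glibc-solibs-2.33-x86_64-2.txz

# ...plus the xfsprogs-5.12.0 .txz downloaded from a Slackware mirror,
# copied into this same folder.

# Reboot so both packages get installed, then do the repair.

# Afterwards, clean the folder so these versions don't keep shadowing
# whatever a newer Unraid release ships:
rm /boot/extra/*.txz
```

The cleanup step matters: /boot/extra is applied unconditionally at boot, so stale packages there can silently downgrade tools after a future Unraid upgrade.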
trurl Posted June 17, 2021

6 hours ago, limawaken said:
i'll do it again another day

You don't want to run a system that has any memory problems at all. Everything your system does goes through RAM: your data, the OS, EVERYTHING.
JorgeB Posted June 18, 2021

7 hours ago, trurl said:
You don't want to even run a system that has any memory problems

Yeah, I didn't mention it because it should be obvious: before attempting any more filesystem repairs, you need to fix the RAM issue.
limawaken Posted June 18, 2021 (Author)

hi guys, thanks for all your advice. i do appreciate it and i understand what you mean. i'm going to fix this memory problem first, then i'll retry the xfs repair. i will run memtest 1 ram stick at a time; that breaks the test into shorter segments, but it will take me 4 nights of memtest to test all my ram.

just a question about memtest: should i change any of the options? if i'm not wrong, i noticed before it started that i was given the option to choose multiple threads. would that be better?
JorgeB Posted June 18, 2021

53 minutes ago, limawaken said:
should i change any of the options?

Not needed.
trurl Posted June 18, 2021

5 hours ago, limawaken said:
4 nights of memtest to test all my ram.

If any error appears, that test is done; no need to let it complete.
itimpi Posted June 18, 2021

6 hours ago, limawaken said:
will run memtest 1 ramstick at a time. that would break up the test into shorter segments, but would take me 4 nights of memtest to test all my ram.

Just to point out that you can get scenarios where all sticks test fine individually, but you still get failures when all sticks are installed at the same time.