March 7, 2017
While trying to work with a directory on disk 1 I received a strange Windows error. When I try to access the disk with mc, both disk1 and user0 appear preceded by a question mark and highlighted in red. Here is the log:

Quote:
Mar 6 18:18:01 Tower2 kernel: mdcmd (527): spinup 1
Mar 6 18:18:01 Tower2 kernel: mdcmd (528): spinup 2
Mar 6 18:18:01 Tower2 kernel: mdcmd (529): spinup 3
Mar 6 18:18:01 Tower2 kernel: mdcmd (530): spinup 4
Mar 6 18:18:01 Tower2 kernel: mdcmd (531): spinup 5
Mar 6 18:18:01 Tower2 kernel: mdcmd (532): spinup 6
Mar 6 18:18:01 Tower2 kernel: mdcmd (533): spinup 7
Mar 6 18:18:01 Tower2 kernel: mdcmd (534): spinup 8
Mar 6 18:18:01 Tower2 kernel: mdcmd (535): spinup 29
Mar 6 18:21:50 Tower2 kernel: XFS (md1): Internal error XFS_WANT_CORRUPTED_GOTO at line 3504 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x43a/0x569
Mar 6 18:21:50 Tower2 kernel: CPU: 0 PID: 16528 Comm: smbd Tainted: G W I 4.9.10-unRAID #1
Mar 6 18:21:50 Tower2 kernel: Hardware name: System manufacturer System Product Name/P6X58D PREMIUM, BIOS 1501 05/10/2011
Mar 6 18:21:50 Tower2 kernel: ffffc900087cbae8 ffffffff813a353e ffff880004dab220 ffffc900087cbbec
Mar 6 18:21:50 Tower2 kernel: ffffc900087cbb00 ffffffff8129b917 ffffffff812635bd ffffc900087cbba0
Mar 6 18:21:50 Tower2 kernel: ffffffff8127a5b0 0000000000000000 00000000087cbba0 ffffffffffffffff
Mar 6 18:21:50 Tower2 kernel: Call Trace:
Mar 6 18:21:50 Tower2 kernel: [<ffffffff813a353e>] dump_stack+0x61/0x7e
Mar 6 18:21:50 Tower2 kernel: [<ffffffff8129b917>] xfs_error_report+0x32/0x35
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812635bd>] ? xfs_free_ag_extent+0x43a/0x569
Mar 6 18:21:50 Tower2 kernel: [<ffffffff8127a5b0>] xfs_btree_insert+0xe2/0x17d
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812635bd>] xfs_free_ag_extent+0x43a/0x569
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812635bd>] ? xfs_free_ag_extent+0x43a/0x569
Mar 6 18:21:50 Tower2 kernel: [<ffffffff81265244>] xfs_free_extent+0xd4/0x115
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812c028a>] xfs_trans_free_extent+0x28/0x65
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812c02e7>] xfs_extent_free_finish_item+0x20/0x32
Mar 6 18:21:50 Tower2 kernel: [<ffffffff8127e1a9>] xfs_defer_finish+0xe7/0x1eb
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812a840a>] xfs_itruncate_extents+0xea/0x191
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812a852a>] xfs_inactive_truncate+0x79/0xbf
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812a8997>] xfs_inactive+0xa1/0xc0
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812ae3a7>] xfs_fs_destroy_inode+0xc6/0x17a
Mar 6 18:21:50 Tower2 kernel: [<ffffffff81136c01>] destroy_inode+0x38/0x50
Mar 6 18:21:50 Tower2 kernel: [<ffffffff81136d7f>] evict+0x166/0x16d
Mar 6 18:21:50 Tower2 kernel: [<ffffffff811373c2>] iput+0x163/0x170
Mar 6 18:21:50 Tower2 kernel: [<ffffffff8112d1ba>] do_unlinkat+0x125/0x201
Mar 6 18:21:50 Tower2 kernel: [<ffffffff8112e60b>] SyS_unlink+0x11/0x13
Mar 6 18:21:50 Tower2 kernel: [<ffffffff8167d2b7>] entry_SYSCALL_64_fastpath+0x1a/0xa9
Mar 6 18:21:50 Tower2 kernel: XFS (md1): xfs_do_force_shutdown(0x8) called from line 236 of file fs/xfs/libxfs/xfs_defer.c. Return address = 0xffffffff8127df9d
Mar 6 18:21:53 Tower2 kernel: XFS (md1): Corruption of in-memory data detected. Shutting down filesystem
Mar 6 18:21:53 Tower2 kernel: XFS (md1): Please umount the filesystem and rectify the problem(s)
Mar 6 18:27:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/Sinai Docs statfs: Input/output error
Mar 6 18:27:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/appdata statfs: Input/output error
Mar 6 18:27:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/domains statfs: Input/output error
Mar 6 18:27:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/system statfs: Input/output error
Mar 6 18:27:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/vdisks statfs: Input/output error
Mar 6 18:28:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/Sinai Docs statfs: Input/output error
Mar 6 18:28:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/appdata statfs: Input/output error
Mar 6 18:28:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/domains statfs: Input/output error
Mar 6 18:28:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/system statfs: Input/output error
Mar 6 18:28:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/vdisks statfs: Input/output error
Mar 6 18:29:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/Sinai Docs statfs: Input/output error
Mar 6 18:29:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/appdata statfs: Input/output error
Mar 6 18:29:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/domains statfs: Input/output error
Mar 6 18:29:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/system statfs: Input/output error
Mar 6 18:29:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/vdisks statfs: Input/output error

What should I do? Reboot?
Edited March 7, 2017 by levster
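For anyone reading a log like the one above, the XFS lines are the ones that point at the filesystem problem. Here is a minimal sketch of filtering them out of a syslog. The helper name xfs_events is hypothetical, and /var/log/syslog is an assumed default location; pass another path if your log lives elsewhere.

```shell
# Hypothetical helper: print only the kernel's XFS messages for md devices.
# $1 is the syslog path; /var/log/syslog is an assumed default.
xfs_events() {
    grep -E 'XFS \(md[0-9]+\)' "${1:-/var/log/syslog}"
}
```

Running xfs_events with no argument reads the assumed default path; in the log above it would keep the four "XFS (md1)" lines and drop the spinup and emhttp entries.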
March 7, 2017
If you can get to the console, type diagnostics and upload the resulting file; otherwise do it in the GUI under Tools > Diagnostics. Post it here and smarter people than I can read it. It might need an XFS repair, BUT POST DIAGNOSTICS BEFORE DOING ANYTHING! *Steps off soapbox* Then wait for a response.
March 7, 2017
Don't reboot. Can you please post your diagnostics file? Type diagnostics from the command line and grab the file from the USB drive, OR use the GUI to grab it.
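After typing diagnostics at the console, the zip file still has to be found on the USB flash drive. A minimal sketch of locating the newest one follows. The helper name latest_diagnostics is hypothetical, and /boot/logs is an assumption about where the file lands on the flash drive; adjust the folder if yours differs.

```shell
# Hypothetical helper: print the newest diagnostics zip in a logs folder.
# $1 is the folder; /boot/logs is an assumed default, not confirmed in this thread.
latest_diagnostics() {
    ls -t "${1:-/boot/logs}"/*diagnostics*.zip 2>/dev/null | head -n 1
}
```

Typical use would be to run diagnostics first, then latest_diagnostics, and attach the file it prints to your post.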
March 7, 2017
1 minute ago, phbigred said:
If you can get to the console, type diagnostics and upload the resulting file; otherwise do it in the GUI under Tools > Diagnostics. Post it here and smarter people than I can read it. It might need an XFS repair, BUT POST DIAGNOSTICS BEFORE DOING ANYTHING! *Steps off soapbox* Then wait for a response.

SNAP.
March 7, 2017 Author
15 minutes ago, phbigred said:
If you can get to the console, type diagnostics and upload the resulting file; otherwise do it in the GUI under Tools > Diagnostics. Post it here and smarter people than I can read it. It might need an XFS repair, BUT POST DIAGNOSTICS BEFORE DOING ANYTHING! *Steps off soapbox* Then wait for a response.

I had to look up what *Steps off soapbox* meant!
March 7, 2017
My feeling is that there is a corrupt filesystem here. Then again, I have seen these errors when there is a bad disk. Some parts of the log "might" suggest memory issues too (although I'm not clear on this). In any case, now that you have grabbed diagnostics, I would say it is safe to do a clean shutdown. If it were me, I would work through the potential causes one by one to diagnose the problem.

I have only had the chance to skim your diagnostics file as I am at work, but here are my initial findings:

HUA723020ALA641 is showing a reallocated sector, which "might" indicate issues with that disk. That is a stretch, I think, since strictly speaking a reallocated sector is not a bad thing. There are no pending sectors, which I would be more worried about.

You could run xfs_repair to check the FS and fix any issues (if this is in fact due to a corrupt FS), but I would wait for further input before doing this.

In the meantime, while you wait for others to chime in, run a memtest overnight. You can do this from the initial unRAID boot menu.
March 7, 2017 Community Expert
You need to check the filesystem on disk1:
https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Redoing_a_drive_formatted_with_XFS
Also, there are several BIOS-related call traces; look for a BIOS update.
March 7, 2017 Author
So, I checked the file system and here is the output:

Quote:
Not available
Phase 1 - find and verify superblock...
        - block cache size set to 706216 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 460001 tail block 457061
        - scan filesystem freespace and inode maps...
sb_fdblocks 47014245, counted 47022437
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

XFS_REPAIR Summary    Tue Mar 7 12:25:31 2017

Phase      Start            End              Duration
Phase 1:   03/07 12:25:08   03/07 12:25:08
Phase 2:   03/07 12:25:08   03/07 12:25:16   8 seconds
Phase 3:   03/07 12:25:16   03/07 12:25:27   11 seconds
Phase 4:   03/07 12:25:27   03/07 12:25:28   1 second
Phase 5:   Skipped
Phase 6:   03/07 12:25:28   03/07 12:25:31   3 seconds
Phase 7:   03/07 12:25:31   03/07 12:25:31

Total run time: 23 seconds

Any suggestions would be great.
Edited March 7, 2017 by levster
March 7, 2017 Author
1 hour ago, johnnie.black said:
Remove the -n flag or xfs_repair won't repair the file system.

Do I just leave that field blank?
March 7, 2017 Author
Here is the latest response:

Quote:
Not available
Phase 1 - find and verify superblock...
        - block cache size set to 706216 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 460001 tail block 457061
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

So, I restarted the system and here is the log:

Quote:
Mar 7 14:55:08 Tower2 root: Updating templates... Updating info... Done.
Mar 7 14:55:08 Tower2 emhttp: shcmd (284126): set -o pipefail ; /usr/local/sbin/mount_image '/mnt/user/system/libvirt/libvirt.img' /etc/libvirt 1 |& logger
Mar 7 14:55:09 Tower2 kernel: BTRFS info (device loop1): disk space caching is enabled
Mar 7 14:55:09 Tower2 kernel: BTRFS info (device loop1): has skinny extents
Mar 7 14:55:09 Tower2 root: Resize '/etc/libvirt' of 'max'
Mar 7 14:55:09 Tower2 kernel: BTRFS info (device loop1): new size for /dev/loop1 is 1073741824
Mar 7 14:55:09 Tower2 emhttp: shcmd (284130): /etc/rc.d/rc.libvirt start |& logger
Mar 7 14:55:09 Tower2 root: Starting virtlockd...
Mar 7 14:55:09 Tower2 root: Starting virtlogd...
Mar 7 14:55:09 Tower2 root: Starting libvirtd...
Mar 7 14:55:09 Tower2 kernel: kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
Mar 7 14:55:09 Tower2 kernel: tun: Universal TUN/TAP device driver, 1.6
Mar 7 14:55:09 Tower2 kernel: tun: (C) 1999-2004 Max Krasnyansky <[email protected]>
Mar 7 14:55:09 Tower2 emhttp: nothing to sync
Mar 7 14:55:10 Tower2 kernel: virbr0: port 1(virbr0-nic) entered blocking state
Mar 7 14:55:10 Tower2 kernel: virbr0: port 1(virbr0-nic) entered disabled state
Mar 7 14:55:10 Tower2 kernel: device virbr0-nic entered promiscuous mode
Mar 7 14:55:10 Tower2 avahi-daemon[13126]: Joining mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
Mar 7 14:55:10 Tower2 avahi-daemon[13126]: New relevant interface virbr0.IPv4 for mDNS.
Mar 7 14:55:10 Tower2 avahi-daemon[13126]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
Mar 7 14:55:10 Tower2 kernel: virbr0: port 1(virbr0-nic) entered blocking state
Mar 7 14:55:10 Tower2 kernel: virbr0: port 1(virbr0-nic) entered listening state
Mar 7 14:55:10 Tower2 dnsmasq[16608]: started, version 2.76 cachesize 150
Mar 7 14:55:10 Tower2 dnsmasq[16608]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
Mar 7 14:55:10 Tower2 dnsmasq-dhcp[16608]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
Mar 7 14:55:10 Tower2 dnsmasq-dhcp[16608]: DHCP, sockets bound exclusively to interface virbr0
Mar 7 14:55:10 Tower2 dnsmasq[16608]: reading /etc/resolv.conf
Mar 7 14:55:10 Tower2 dnsmasq[16608]: using nameserver 192.168.1.1#53
Mar 7 14:55:10 Tower2 dnsmasq[16608]: read /etc/hosts - 2 addresses
Mar 7 14:55:10 Tower2 dnsmasq[16608]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Mar 7 14:55:10 Tower2 dnsmasq-dhcp[16608]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Mar 7 14:55:10 Tower2 kernel: virbr0: port 1(virbr0-nic) entered disabled state
Mar 7 14:55:10 Tower2 kernel: br0: port 2(vnet0) entered blocking state
Mar 7 14:55:10 Tower2 kernel: br0: port 2(vnet0) entered disabled state
Mar 7 14:55:10 Tower2 kernel: device vnet0 entered promiscuous mode
Mar 7 14:55:10 Tower2 kernel: br0: port 2(vnet0) entered blocking state
Mar 7 14:55:10 Tower2 kernel: br0: port 2(vnet0) entered forwarding state
Mar 7 14:55:10 Tower2 kernel: kvm: zapping shadow pages for mmio generation wraparound
Mar 7 14:55:10 Tower2 kernel: kvm: zapping shadow pages for mmio generation wraparound

Also, the disk shows as unmountable. How does this help?
Edited March 7, 2017 by levster
March 7, 2017 Author
7 minutes ago, johnnie.black said:
You need to use -L

OK. Here goes... Do you know how long these XFS repairs should take? I don't want to start panicking prematurely.
Edited March 7, 2017 by levster
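The three stages discussed so far (check with -n, repair without it, and -L as a last resort when the log cannot be replayed) can be sketched as a small helper that only prints the command for each stage rather than running it. The helper name xfs_repair_cmd is hypothetical. The /dev/mdN device naming is an assumption based on the "XFS (md1)" lines in the log for disk 1, and it also assumes the array is started in maintenance mode; the wiki page linked earlier is the authority on the exact procedure.

```shell
# Hypothetical helper: print (not run) the xfs_repair command for one stage.
# $1 = disk number, $2 = stage: check | repair | zero-log
# Assumption: disk N is exposed as /dev/mdN, as "XFS (md1)" suggests for disk 1.
xfs_repair_cmd() {
    case "$2" in
        check)    echo "xfs_repair -n /dev/md$1" ;;  # no-modify mode: report only
        repair)   echo "xfs_repair /dev/md$1" ;;     # actually fix what is found
        zero-log) echo "xfs_repair -L /dev/md$1" ;;  # destroy the log, then repair
        *)        echo "unknown stage: $2" >&2; return 1 ;;
    esac
}
```

Printing the command first gives you a chance to confirm the device before running anything, which matters most for the zero-log stage since xfs_repair itself warns that destroying the log may cause corruption.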
March 7, 2017
42 minutes ago, levster said:
OK. Here goes... Do you know how long these XFS repairs should take? I don't want to start panicking prematurely.

I don't think you are going to get an answer to that question. My understanding is that it depends on many things, including the drive specification, the extent of the file system damage and how heavily the file system is used. Reports online suggest that it can take days. Therefore my advice would be to "set and forget". Check every now and then, but prepare for the long haul and let it do its thing. It will finish when it finishes.
March 8, 2017 Author
OK, here is the output:

Quote:
Not available
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
Metadata corruption detected at xfs_agf block 0x1/0x200
flfirst 118 in agf 0 too large (max = 118)
freeblk count 5 != flcount 6 in ag 3
agi unlinked bucket 24 is 86311832 in ag 0 (inode=86311832)
sb_icount 103040, counted 100864
sb_ifree 122, counted 581
sb_fdblocks 47014245, counted 64670552
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 86311832, moving to lost+found
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:457063) is ahead of log (1:2).
Format log to cycle 4.
done

One file was created in lost+found. Does that mean one of the files was corrupted? How can I find out which one? If the system is protected by 2 parity disks, why is there any corruption at all? I know these are perhaps strange questions, but isn't the idea behind this system to have so much redundancy that losing a file should be next to impossible? Should I rerun a parity check?

Lev
Edited March 8, 2017 by levster
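One way to gauge the size of the free-space counter mismatch reported in the output above is to subtract the two sb_fdblocks figures. This is only a sketch: the two counts are taken from the output, but the 4096-byte block size is an assumption (the usual XFS default) that the output does not confirm.

```shell
# Difference between the superblock's free-block counter and what xfs_repair counted.
# Counts come from the repair output; block_size=4096 is an assumption, not a reported value.
sb_fdblocks=47014245
counted=64670552
block_size=4096
delta_blocks=$((counted - sb_fdblocks))
delta_bytes=$((delta_blocks * block_size))
echo "$delta_blocks blocks, $delta_bytes bytes"
```

Under that block-size assumption the gap works out to roughly 67 GiB of space the superblock had not yet recorded as free, which would be consistent with deletions being in flight when the filesystem shut down, although the output alone does not prove that.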
March 8, 2017
Your output indicates a successful operation. Based on the steps I believe you have taken, I feel your parity and data are fine. As the instructions in the wiki indicate:

Quote:
The xfs_repair instructions here are designed to check and fix the integrity of the XFS file system of a data drive, while maintaining its parity info.

https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Running_xfs_repair
Edited March 8, 2017 by danioj
March 8, 2017 Community Expert
File(s) in lost+found may be corrupt, so check them; all other files should be OK. Parity remains in sync, so unless you don't run regular scheduled parity checks, there is no need to run one now.
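To check what ended up in lost+found, a short listing with file sizes is a reasonable first step. A minimal sketch follows. The helper name list_lost_found is hypothetical, and /mnt/disk1/lost+found is an assumed location for disk1. Recovered files are typically named after their inode number, so the one from the repair output would be expected to appear as 86311832.

```shell
# Hypothetical helper: list regular files in a lost+found folder with sizes in bytes.
# $1 is the folder; /mnt/disk1/lost+found is an assumed default for disk1.
list_lost_found() {
    dir="${1:-/mnt/disk1/lost+found}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done
}
```

The size and a look at the contents are usually the only clues to what the file originally was, since the original name and folder are not preserved.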
March 8, 2017
31 minutes ago, johnnie.black said:
File(s) in lost+found may be corrupt, so check them; all other files should be OK. Parity remains in sync, so unless you don't run regular scheduled parity checks, there is no need to run one now.

I didn't realise this. I thought lost+found held files whose name data had been lost: xfs_repair can't recover the names, so it puts the files there, but they are not corrupt because the repair was successful. Maybe I misunderstood.

@OP either way, I'd say the better advice is just to restore from your backup if you're worried.
Edited March 8, 2017 by danioj
March 8, 2017
I don't think that anything important was affected, as I was in the process of deleting data off the drive. I'm just concerned that there could actually be data loss on a protected array.