Unable to access disks

March 7, 2017

While trying to work with a directory on disk 1, I received a strange Windows error. When I tried to access the disk with mc, both disk1 and user0 were preceded by a question mark and highlighted in red. Here is the log:

Quote

Mar 6 18:18:01 Tower2 kernel: mdcmd (527): spinup 1
Mar 6 18:18:01 Tower2 kernel: mdcmd (528): spinup 2
Mar 6 18:18:01 Tower2 kernel: mdcmd (529): spinup 3
Mar 6 18:18:01 Tower2 kernel: mdcmd (530): spinup 4
Mar 6 18:18:01 Tower2 kernel: mdcmd (531): spinup 5
Mar 6 18:18:01 Tower2 kernel: mdcmd (532): spinup 6
Mar 6 18:18:01 Tower2 kernel: mdcmd (533): spinup 7
Mar 6 18:18:01 Tower2 kernel: mdcmd (534): spinup 8
Mar 6 18:18:01 Tower2 kernel: mdcmd (535): spinup 29
Mar 6 18:21:50 Tower2 kernel: XFS (md1): Internal error XFS_WANT_CORRUPTED_GOTO at line 3504 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x43a/0x569
Mar 6 18:21:50 Tower2 kernel: CPU: 0 PID: 16528 Comm: smbd Tainted: G W I 4.9.10-unRAID #1
Mar 6 18:21:50 Tower2 kernel: Hardware name: System manufacturer System Product Name/P6X58D PREMIUM, BIOS 1501 05/10/2011
Mar 6 18:21:50 Tower2 kernel: ffffc900087cbae8 ffffffff813a353e ffff880004dab220 ffffc900087cbbec
Mar 6 18:21:50 Tower2 kernel: ffffc900087cbb00 ffffffff8129b917 ffffffff812635bd ffffc900087cbba0
Mar 6 18:21:50 Tower2 kernel: ffffffff8127a5b0 0000000000000000 00000000087cbba0 ffffffffffffffff
Mar 6 18:21:50 Tower2 kernel: Call Trace:
Mar 6 18:21:50 Tower2 kernel: [<ffffffff813a353e>] dump_stack+0x61/0x7e
Mar 6 18:21:50 Tower2 kernel: [<ffffffff8129b917>] xfs_error_report+0x32/0x35
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812635bd>] ? xfs_free_ag_extent+0x43a/0x569
Mar 6 18:21:50 Tower2 kernel: [<ffffffff8127a5b0>] xfs_btree_insert+0xe2/0x17d
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812635bd>] xfs_free_ag_extent+0x43a/0x569
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812635bd>] ? xfs_free_ag_extent+0x43a/0x569
Mar 6 18:21:50 Tower2 kernel: [<ffffffff81265244>] xfs_free_extent+0xd4/0x115
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812c028a>] xfs_trans_free_extent+0x28/0x65
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812c02e7>] xfs_extent_free_finish_item+0x20/0x32
Mar 6 18:21:50 Tower2 kernel: [<ffffffff8127e1a9>] xfs_defer_finish+0xe7/0x1eb
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812a840a>] xfs_itruncate_extents+0xea/0x191
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812a852a>] xfs_inactive_truncate+0x79/0xbf
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812a8997>] xfs_inactive+0xa1/0xc0
Mar 6 18:21:50 Tower2 kernel: [<ffffffff812ae3a7>] xfs_fs_destroy_inode+0xc6/0x17a
Mar 6 18:21:50 Tower2 kernel: [<ffffffff81136c01>] destroy_inode+0x38/0x50
Mar 6 18:21:50 Tower2 kernel: [<ffffffff81136d7f>] evict+0x166/0x16d
Mar 6 18:21:50 Tower2 kernel: [<ffffffff811373c2>] iput+0x163/0x170
Mar 6 18:21:50 Tower2 kernel: [<ffffffff8112d1ba>] do_unlinkat+0x125/0x201
Mar 6 18:21:50 Tower2 kernel: [<ffffffff8112e60b>] SyS_unlink+0x11/0x13
Mar 6 18:21:50 Tower2 kernel: [<ffffffff8167d2b7>] entry_SYSCALL_64_fastpath+0x1a/0xa9
Mar 6 18:21:50 Tower2 kernel: XFS (md1): xfs_do_force_shutdown(0x8) called from line 236 of file fs/xfs/libxfs/xfs_defer.c. Return address = 0xffffffff8127df9d
Mar 6 18:21:53 Tower2 kernel: XFS (md1): Corruption of in-memory data detected. Shutting down filesystem
Mar 6 18:21:53 Tower2 kernel: XFS (md1): Please umount the filesystem and rectify the problem(s)

Mar 6 18:27:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/Sinai Docs statfs: Input/output error
Mar 6 18:27:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/appdata statfs: Input/output error
Mar 6 18:27:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/domains statfs: Input/output error
Mar 6 18:27:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/system statfs: Input/output error
Mar 6 18:27:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/vdisks statfs: Input/output error
Mar 6 18:28:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/Sinai Docs statfs: Input/output error
Mar 6 18:28:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/appdata statfs: Input/output error
Mar 6 18:28:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/domains statfs: Input/output error
Mar 6 18:28:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/system statfs: Input/output error
Mar 6 18:28:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/vdisks statfs: Input/output error
Mar 6 18:29:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/Sinai Docs statfs: Input/output error
Mar 6 18:29:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/appdata statfs: Input/output error
Mar 6 18:29:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/domains statfs: Input/output error
Mar 6 18:29:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/system statfs: Input/output error
Mar 6 18:29:01 Tower2 emhttp: err: get_fs_sizes: /mnt/user/vdisks statfs: Input/output error

What should I do? Reboot?

Edited March 7, 2017 by levster

March 7, 2017

If you can get to the console, type diagnostics and upload the resulting file; otherwise, generate it in the GUI under Tools > Diagnostics. Post it here and smarter people than I can read it. It might need an XFS repair, BUT POST DIAGNOSTICS BEFORE DOING ANYTHING! *Steps off soapbox* Then wait for a response.

March 7, 2017

Don't reboot. Can you please post your diagnostics file?

Type diagnostics at the command line and grab the file from the USB flash drive, OR use the GUI to grab it.

March 7, 2017

1 minute ago, phbigred said:

If you can get to the console, type diagnostics and upload the resulting file; otherwise, generate it in the GUI under Tools > Diagnostics. Post it here and smarter people than I can read it. It might need an XFS repair, BUT POST DIAGNOSTICS BEFORE DOING ANYTHING! *Steps off soapbox* Then wait for a response.

SNAP.

March 7, 2017

Author

Great! Here is the diagnostics file:

tower2-diagnostics-20170306-1846.zip

March 7, 2017

Author

15 minutes ago, phbigred said:

If you can get to the console, type diagnostics and upload the resulting file; otherwise, generate it in the GUI under Tools > Diagnostics. Post it here and smarter people than I can read it. It might need an XFS repair, BUT POST DIAGNOSTICS BEFORE DOING ANYTHING! *Steps off soapbox* Then wait for a response.

I had to look up what *Steps off soapbox* meant!

March 7, 2017

Author

Any thoughts?

March 7, 2017

My feeling is that there is a corrupt filesystem here. Then again, I have seen these errors when there is a bad disk. Some parts of the log "might" suggest memory issues too (although I'm not clear on this).

In any case, now that you have grabbed diagnostics, I would say that it is safe to do a clean shutdown. If it were me, I would work through the potential causes one by one to try and diagnose.

I have only had the chance to skim your diagnostics file, as I am at work, but here are my initial findings:

HUA723020ALA641 is showing a reallocated sector, which "might" indicate issues with that disk. I think that is a stretch, though, as strictly speaking a reallocated sector is not a bad thing. There are no pending sectors, which I would be more worried about.
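For anyone wanting to check the same two SMART attributes on their own disk, here is a minimal Python sketch. It assumes the attribute table has been captured from `smartctl -A` as text; the function name and the sample rows are illustrative only (they are not copied from the diagnostics in this thread), and the column layout is assumed to be the usual ten-column table.

```python
# Attributes discussed above: reallocated sectors and pending sectors.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")

def smart_raw_values(smartctl_output, names=WATCHED):
    """Return {attribute_name: raw_value} for the watched SMART attributes."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns: ID, name, flag, value,
        # worst, thresh, type, updated, when_failed, raw value.
        if len(fields) >= 10 and fields[1] in names:
            values[fields[1]] = int(fields[9])
    return values

# Illustrative sample rows (made up for this example).
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
"""

result = smart_raw_values(sample)
print(result)
```

A non-zero pending count is the one to watch more closely; a small, stable reallocated count on its own is usually less alarming.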

You could run xfs_repair to check the FS and fix any issues (if this is in fact due to a corrupt FS), but I would wait for further input before doing this.

In the meantime, while you wait for others to chime in, run a memtest overnight. You can do this from the initial unRAID boot menu.

March 7, 2017

Community Expert

You need to check the filesystem on disk1:

https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Redoing_a_drive_formatted_with_XFS

Also, there are several BIOS-related call traces; look for a BIOS update.

March 7, 2017

Author

So, I checked the file system and here is the output:

Quote


Phase 1 - find and verify superblock...
        - block cache size set to 706216 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 460001 tail block 457061
        - scan filesystem freespace and inode maps...
sb_fdblocks 47014245, counted 47022437
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Tue Mar  7 12:25:31 2017

Phase		Start		End		Duration
Phase 1:	03/07 12:25:08	03/07 12:25:08
Phase 2:	03/07 12:25:08	03/07 12:25:16	8 seconds
Phase 3:	03/07 12:25:16	03/07 12:25:27	11 seconds
Phase 4:	03/07 12:25:27	03/07 12:25:28	1 second
Phase 5:	Skipped
Phase 6:	03/07 12:25:28	03/07 12:25:31	3 seconds
Phase 7:	03/07 12:25:31	03/07 12:25:31

Total run time: 23 seconds

Any suggestions would be great.

Edited March 7, 2017 by levster

March 7, 2017

Community Expert

Remove the -n flag or xfs_repair won't repair the file system.

March 7, 2017

Author

1 hour ago, johnnie.black said:

Remove the -n flag or xfs_repair won't repair the file system.

Do I just leave that field blank?

March 7, 2017

Community Expert

Yes, or use -v for verbose output.

March 7, 2017

Author

Here is the latest response:

Quote


Phase 1 - find and verify superblock...
        - block cache size set to 706216 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 460001 tail block 457061
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

So, I restarted the system and here is the log:

Quote

Mar 7 14:55:08 Tower2 root: Updating templates... Updating info... Done.
Mar 7 14:55:08 Tower2 emhttp: shcmd (284126): set -o pipefail ; /usr/local/sbin/mount_image '/mnt/user/system/libvirt/libvirt.img' /etc/libvirt 1 |& logger
Mar 7 14:55:09 Tower2 kernel: BTRFS info (device loop1): disk space caching is enabled
Mar 7 14:55:09 Tower2 kernel: BTRFS info (device loop1): has skinny extents
Mar 7 14:55:09 Tower2 root: Resize '/etc/libvirt' of 'max'
Mar 7 14:55:09 Tower2 kernel: BTRFS info (device loop1): new size for /dev/loop1 is 1073741824
Mar 7 14:55:09 Tower2 emhttp: shcmd (284130): /etc/rc.d/rc.libvirt start |& logger
Mar 7 14:55:09 Tower2 root: Starting virtlockd...
Mar 7 14:55:09 Tower2 root: Starting virtlogd...
Mar 7 14:55:09 Tower2 root: Starting libvirtd...
Mar 7 14:55:09 Tower2 kernel: kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
Mar 7 14:55:09 Tower2 kernel: tun: Universal TUN/TAP device driver, 1.6
Mar 7 14:55:09 Tower2 kernel: tun: (C) 1999-2004 Max Krasnyansky <[email protected]>
Mar 7 14:55:09 Tower2 emhttp: nothing to sync
Mar 7 14:55:10 Tower2 kernel: virbr0: port 1(virbr0-nic) entered blocking state
Mar 7 14:55:10 Tower2 kernel: virbr0: port 1(virbr0-nic) entered disabled state
Mar 7 14:55:10 Tower2 kernel: device virbr0-nic entered promiscuous mode
Mar 7 14:55:10 Tower2 avahi-daemon[13126]: Joining mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
Mar 7 14:55:10 Tower2 avahi-daemon[13126]: New relevant interface virbr0.IPv4 for mDNS.
Mar 7 14:55:10 Tower2 avahi-daemon[13126]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
Mar 7 14:55:10 Tower2 kernel: virbr0: port 1(virbr0-nic) entered blocking state
Mar 7 14:55:10 Tower2 kernel: virbr0: port 1(virbr0-nic) entered listening state
Mar 7 14:55:10 Tower2 dnsmasq[16608]: started, version 2.76 cachesize 150
Mar 7 14:55:10 Tower2 dnsmasq[16608]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
Mar 7 14:55:10 Tower2 dnsmasq-dhcp[16608]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
Mar 7 14:55:10 Tower2 dnsmasq-dhcp[16608]: DHCP, sockets bound exclusively to interface virbr0
Mar 7 14:55:10 Tower2 dnsmasq[16608]: reading /etc/resolv.conf
Mar 7 14:55:10 Tower2 dnsmasq[16608]: using nameserver 192.168.1.1#53
Mar 7 14:55:10 Tower2 dnsmasq[16608]: read /etc/hosts - 2 addresses
Mar 7 14:55:10 Tower2 dnsmasq[16608]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Mar 7 14:55:10 Tower2 dnsmasq-dhcp[16608]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Mar 7 14:55:10 Tower2 kernel: virbr0: port 1(virbr0-nic) entered disabled state
Mar 7 14:55:10 Tower2 kernel: br0: port 2(vnet0) entered blocking state
Mar 7 14:55:10 Tower2 kernel: br0: port 2(vnet0) entered disabled state
Mar 7 14:55:10 Tower2 kernel: device vnet0 entered promiscuous mode
Mar 7 14:55:10 Tower2 kernel: br0: port 2(vnet0) entered blocking state
Mar 7 14:55:10 Tower2 kernel: br0: port 2(vnet0) entered forwarding state
Mar 7 14:55:10 Tower2 kernel: kvm: zapping shadow pages for mmio generation wraparound
Mar 7 14:55:10 Tower2 kernel: kvm: zapping shadow pages for mmio generation wraparound

Also, the disk now shows as unmountable.

How does this help?

Edited March 7, 2017 by levster

March 7, 2017

Community Expert

You need to use -L
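The order of operations in this thread (check with -n, repair without it, fall back to -L only when the log cannot be replayed by mounting) can be sketched as follows. This is a hedged illustration in Python that only builds and prints the command lines; it does not run them. The device path /dev/md1 is an assumption based on the "XFS (md1)" lines in the log, and it presumes the array is started in maintenance mode as the wiki describes.

```python
def repair_steps(device="/dev/md1"):
    """Return the xfs_repair invocations in the order used in this thread.

    The default device path is an assumption: the syslog refers to disk1's
    filesystem as md1.
    """
    return [
        # 1. Read-only check: -n reports problems but modifies nothing.
        ["xfs_repair", "-n", device],
        # 2. Actual repair, with -v for verbose output (no -n).
        ["xfs_repair", "-v", device],
        # 3. Last resort: -L zeroes the log, which can discard recent
        #    metadata changes. Use only if mounting cannot replay the log.
        ["xfs_repair", "-L", "-v", device],
    ]

steps = repair_steps()
for number, step in enumerate(steps, start=1):
    print(f"step {number}: {' '.join(step)}")
```

Step 3 is kept separate on purpose: xfs_repair itself warns that destroying the log may cause corruption, so it is the fallback rather than the default.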

March 7, 2017

Author

7 minutes ago, johnnie.black said:

You need to use -L

OK. Here goes... Do you know how long these XFS repairs should take? I don't want to start panicking prematurely.

Edited March 7, 2017 by levster

March 7, 2017

42 minutes ago, levster said:

OK. Here goes... Do you know how long these XFS repairs should take? I don't want to start panicking prematurely.

I don't think you are going to get an answer to that question. My understanding is that it depends on many things, including the drive specification, the extent of the file system damage, and how heavily the file system is used.

Reports online suggest that it can take days. Therefore, my advice would be to "set and forget". Check every now and then, but prepare for the long haul and let it do its thing. It will finish when it finishes.

March 8, 2017

Author

OK, here is the output:

Quote


Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
Metadata corruption detected at xfs_agf block 0x1/0x200
flfirst 118 in agf 0 too large (max = 118)
freeblk count 5 != flcount 6 in ag 3
agi unlinked bucket 24 is 86311832 in ag 0 (inode=86311832)
sb_icount 103040, counted 100864
sb_ifree 122, counted 581
sb_fdblocks 47014245, counted 64670552
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 86311832, moving to lost+found
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:457063) is ahead of log (1:2).
Format log to cycle 4.
done
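As a reading aid for the "sb_... N, counted M" lines in output like the above: each one compares a counter recorded in the superblock with the value xfs_repair counted itself (sb_icount is allocated inodes, sb_ifree is free inodes, sb_fdblocks is free data blocks). The Python sketch below extracts the differences. The 4096-byte block size is an assumption (the mkfs.xfs default); the actual block size of this disk is not shown in the thread.

```python
import re

# Matches lines such as "sb_fdblocks 47014245, counted 64670552".
COUNTER = re.compile(r"^(sb_\w+) (\d+), counted (\d+)$")

def counter_deltas(repair_output):
    """Map each superblock counter to (recorded, counted, counted - recorded)."""
    deltas = {}
    for line in repair_output.splitlines():
        match = COUNTER.match(line.strip())
        if match:
            recorded, counted = int(match.group(2)), int(match.group(3))
            deltas[match.group(1)] = (recorded, counted, counted - recorded)
    return deltas

output = """\
sb_icount 103040, counted 100864
sb_ifree 122, counted 581
sb_fdblocks 47014245, counted 64670552
"""

BLOCK_SIZE = 4096  # assumption: mkfs.xfs default, not confirmed for this disk

deltas = counter_deltas(output)
for name, (recorded, counted, delta) in deltas.items():
    print(f"{name}: recorded={recorded} counted={counted} delta={delta:+d}")

extra_free_gib = deltas["sb_fdblocks"][2] * BLOCK_SIZE / 2**30
print(f"free-space difference: about {extra_free_gib:.1f} GiB")
```

One plausible reading, and only that, is that the superblock under-reported free space because deletions were still in flight when the filesystem shut down.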

One file was created in lost+found. Does that mean that one of the files was corrupted? How can I find out which one? If the system is protected with 2 parity disks, why is there actual corruption?

I know that these are perhaps strange questions, but isn't the idea behind this system to have so much redundancy that losing a file should be next to impossible? Should I rerun a parity check?

Lev

Edited March 8, 2017 by levster

March 8, 2017

Your output indicates a successful operation.

Based on the steps I believe you have taken, I think your parity and data are fine.

As the instructions in the wiki indicate:

Quote

The xfs_repair instructions here are designed to check and fix the integrity of the XFS file system of a data drive, while maintaining its parity info.

https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Running_xfs_repair

Edited March 8, 2017 by danioj

March 8, 2017

Community Expert

File(s) in lost+found may be corrupt; check them. All other files should be OK.

Parity remains in sync. Unless you don't run regularly scheduled parity checks, there is no need to do one now.
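To check the lost+found file(s) mentioned above, it helps to know that xfs_repair names orphaned files after their inode number (here, 86311832), so the original name is gone and the content has to be identified some other way. The sketch below is a hedged Python example that lists each file with its size and first few bytes, which is often enough to recognise the file type. The function name is illustrative, and the real directory (probably something like /mnt/disk1/lost+found on this system) is an assumption, so the demonstration runs on a throwaway directory instead.

```python
import os
import tempfile

def describe_orphans(lost_found_dir, head_bytes=16):
    """Return (name, size_in_bytes, first_bytes) for each regular file found."""
    rows = []
    for entry in sorted(os.scandir(lost_found_dir), key=lambda e: e.name):
        if entry.is_file(follow_symlinks=False):
            with open(entry.path, "rb") as handle:
                head = handle.read(head_bytes)
            rows.append((entry.name, entry.stat().st_size, head))
    return rows

# Demonstration on a temporary directory holding one fake orphan whose name
# mimics the inode-number naming used by xfs_repair.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "86311832"), "wb") as handle:
        handle.write(b"%PDF-1.4 example")
    demo_rows = describe_orphans(demo_dir)

for name, size, head in demo_rows:
    print(name, size, head)
```

If the leading bytes match a known format, opening the file with the matching application is the practical test of whether it survived intact.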

March 8, 2017

31 minutes ago, johnnie.black said:

File(s) in lost+found may be corrupt; check them. All other files should be OK.

Parity remains in sync. Unless you don't run regularly scheduled parity checks, there is no need to do one now.

I didn't realise this. I thought that lost+found held files whose name data was lost: xfs_repair can't restore the names, so it puts the files there, but they are not corrupt, since the repair was successful. Maybe I misunderstood.

@OP, either way, I'd say the better advice would be just to restore from your backup if you're worried.

Edited March 8, 2017 by danioj

March 8, 2017

I don't think that anything important was affected, as I was in the process of actually deleting data off the drive. I'm just concerned that there could actually be data loss on a protected array.

