After a reboot this morning my cache drive seems to be unmountable... No idea what is going on...
Syslog is attached
Error messages in the log are as below:
May 8 09:51:08 Tower kernel: ACPI: Early table checksum verification disabled
May 8 09:51:08 Tower kernel: spurious 8259A interrupt: IRQ7.
May 8 09:51:08 Tower kernel: floppy0: no floppy controllers found
May 8 09:51:08 Tower kernel: random: 7 urandom warning(s) missed due to ratelimiting
May 8 09:51:09 Tower rpc.statd[1802]: Failed to read /var/lib/nfs/state: Success
May 8 09:51:09 Tower ntpd[1832]: bind(19) AF_INET6 fe80::1c3e:aeff:fe3a:defa%13#123 flags 0x11 failed: Cannot assign requested address
May 8 09:51:09 Tower ntpd[1832]: failed to init interface for address fe80::1c3e:aeff:fe3a:defa%13
May 8 09:51:28 Tower avahi-daemon[11706]: WARNING: No NSS support for mDNS detected, consider installing nss-mdns!
May 8 09:51:40 Tower kernel: WARNING: CPU: 2 PID: 12688 at fs/btrfs/extent-tree.c:6795 __btrfs_free_extent+0x1fd/0x8e4
May 8 09:51:40 Tower kernel: CPU: 2 PID: 12688 Comm: mount Not tainted 4.19.37-Unraid #1
May 8 09:51:40 Tower kernel: Call Trace:
May 8 09:51:40 Tower kernel: BTRFS error (device nvme0n1p1): unable to find ref byte nr 1037649829888 parent 0 root 5 owner 77097 offset 230969344
May 8 09:51:40 Tower kernel: BTRFS: Transaction aborted (error -2)
May 8 09:51:40 Tower kernel: WARNING: CPU: 2 PID: 12688 at fs/btrfs/extent-tree.c:6801 __btrfs_free_extent+0x250/0x8e4
May 8 09:51:40 Tower kernel: CPU: 2 PID: 12688 Comm: mount Tainted: G W 4.19.37-Unraid #1
May 8 09:51:40 Tower kernel: Call Trace:
May 8 09:51:40 Tower kernel: BTRFS: error (device nvme0n1p1) in __btrfs_free_extent:6801: errno=-2 No such entry
May 8 09:51:40 Tower kernel: BTRFS: error (device nvme0n1p1) in btrfs_run_delayed_refs:2935: errno=-2 No such entry
May 8 09:51:40 Tower kernel: BTRFS: error (device nvme0n1p1) in btrfs_replay_log:2277: errno=-2 No such entry (Failed to recover log tree)
May 8 09:51:40 Tower kernel: BTRFS error (device nvme0n1p1): pending csums is 134717440
May 8 09:51:40 Tower root: mount: /mnt/cache: mount(2) system call failed: No such file or directory.
May 8 09:51:40 Tower emhttpd: /mnt/cache mount error: No file system
May 8 09:51:40 Tower kernel: BTRFS error (device nvme0n1p1): open_ctree failed
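The key failure in this log is the log-tree replay (`btrfs_replay_log ... Failed to recover log tree`), which aborts `open_ctree` and makes the mount fail. For this specific situation btrfs-progs ships a dedicated command that discards the unreplayable log tree so the mount no longer attempts replay. A minimal sketch, assuming the cache device is /dev/nvme0n1p1 as in the log:

```shell
# Clear the btrfs log tree so mounting no longer tries to replay it.
# Only the last few seconds of fsync'd writes can be lost; the rest of
# the filesystem is untouched. Run only while the device is unmounted.
btrfs rescue zero-log /dev/nvme0n1p1
```

This is generally a much smaller hammer than `btrfs check --repair`, which rewrites filesystem metadata.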
The cache drive is still listed as a cache drive, just with an unmountable file system. Its SMART attributes do not show anything I recognise as a problem:
- Critical warning 0x00
- Temperature 36 Celsius
- Available spare 100%
- Available spare threshold 5%
- Percentage used 4%
- Data units read 155,230,378 [79.4 TB]
- Data units written 90,224,490 [46.1 TB]
- Host read commands 464,542,688
- Host write commands 539,484,666
- Controller busy time 2,395
- Power cycles 21
- Power on hours 2,684
- Unsafe shutdowns 13
- Media and data integrity errors 0
- Error information log entries 10,922
- Warning comp. temperature time 0
- Critical comp. temperature time 0
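For reference, the same NVMe health attributes can be pulled from the command line as well; a sketch using smartmontools, assuming smartctl is available and the cache device is /dev/nvme0n1:

```shell
# Dump the full SMART/health log for the NVMe cache device,
# including critical warning flags, spare capacity and error counts.
smartctl -a /dev/nvme0n1
```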
Balance and scrub cannot be run "because array is not started" (the array is of course started and working).
I have started the array in maintenance mode so I can run the btrfs filesystem check in read-only mode; the results are as follows:
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p1
UUID: 344c37ac-26f1-4307-8451-1116b06922be
[1/7] checking root items
[2/7] checking extents
ref mismatch on [1037649817600 8192] extent item 255, found 1
data backref 1037649829888 root 5 owner 77097 offset 230969344 num_refs 0 not found in extent tree
incorrect local backref count on 1037649829888 root 5 owner 77097 offset 230969344 found 1 wanted 0 back 0xcd9f170
incorrect local backref count on 1037649829888 root 5 owner 77097 offset 17208183807669456896 found 0 wanted 4287137790 back 0x17974a30
backref disk bytenr does not match extent record, bytenr=1037649829888, ref bytenr=0
backpointer mismatch on [1037649829888 4096]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 238952861696 bytes used, error(s) found
total csum bytes: 172892316
total tree bytes: 1707900928
total fs tree bytes: 1359200256
total extent tree bytes: 124682240
btree space waste bytes: 369061441
file data blocks allocated: 1187284238336
referenced 233465798656
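For reference, the read-only check the GUI runs here corresponds to something like the following command line invocation (device path taken from the log above):

```shell
# Read-only btrfs check: reports inconsistencies but changes nothing
# on disk. The filesystem must not be mounted (maintenance mode).
btrfs check --readonly /dev/nvme0n1p1
```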
Since errors were found, I changed --readonly to --repair and started a new check, allowing btrfs to fix itself. However, it looks like the process is now sitting at a confirmation prompt waiting for input that I of course cannot give through the web page:
enabling repair mode
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p1
UUID: 344c37ac-26f1-4307-8451-1116b06922be
repair mode will force to clear out log tree, are you sure? [y/N]:
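If a prompt like this ever needs to be answered without an interactive terminal, the usual shell trick is to pipe the answer in with yes(1). A minimal sketch; the actual btrfs line is shown commented out, since auto-confirming a metadata repair should be a deliberate choice:

```shell
# yes(1) prints "y" on stdout forever; piping it into a command
# auto-answers any [y/N] prompt that command reads from stdin.
yes | head -n 3        # prints three lines of "y"
# The real (dangerous) invocation would then be:
# yes | btrfs check --repair /dev/nvme0n1p1
```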
To make sure something else is not rotten, I stopped the array, unassigned the cache drive, started the array without the cache drive, stopped the array, and re-added the cache drive. The cache drive comes back, but again without a file system.
Since the btrfs repair might still work but appears to be stuck at that prompt, I want to run it through the command line. Unfortunately the /dev/ name listed for the cache drive does not seem to work; if I run:
btrfs check --repair /dev/nvme0n1
it comes back with a remark that there is no btrfs filesystem there..
I checked the log to see how the check is run through the GUI; it uses a different /dev/ name: /dev/nvme0n1p1 (the partition rather than the whole disk).
I am now running the following command:
btrfs check --repair /dev/nvme0n1p1
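The distinction matters: the btrfs filesystem lives on the first partition (nvme0n1p1), not on the whole disk (nvme0n1). Which block device actually carries a filesystem can be confirmed with lsblk:

```shell
# List the disk, its partitions, and any detected filesystems.
# The FSTYPE column should show "btrfs" next to nvme0n1p1.
lsblk -f /dev/nvme0n1
```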
Unfortunately it aborts; the output is as follows:
root@Tower:/dev# btrfs check --repair /dev/nvme0n1p1
enabling repair mode
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p1
UUID: 344c37ac-26f1-4307-8451-1116b06922be
repair mode will force to clear out log tree, are you sure? [y/N]: Y
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
ref mismatch on [1037649817600 8192] extent item 255, found 1
repair deleting extent record: key [1037649817600,168,8192]
adding new data backref on 1037649817600 root 5 owner 77097 offset 188153856 found 1
Repaired extent references for 1037649817600
data backref 1037649829888 root 5 owner 77097 offset 230969344 num_refs 0 not found in extent tree
incorrect local backref count on 1037649829888 root 5 owner 77097 offset 230969344 found 1 wanted 0 back 0xce5cd30
incorrect local backref count on 1037649829888 root 5 owner 77097 offset 17208183807669456896 found 0 wanted 4287137790 back 0x17a32240
backref disk bytenr does not match extent record, bytenr=1037649829888, ref bytenr=0
backpointer mismatch on [1037649829888 4096]
repair deleting extent record: key [1037649829888,168,4096]
adding new data backref on 1037649829888 root 5 owner 77097 offset 230969344 found 1
Repaired extent references for 1037649829888
Failed to find [253425188864, 168, 16384]
btrfs unable to find ref byte nr 253425221632 parent 0 root 2 owner 0 offset 0
transaction.c:195: btrfs_commit_transaction: BUG_ON `ret` triggered, value -5
btrfs[0x43e9f2]
btrfs(btrfs_commit_transaction+0x1ae)[0x43efce]
btrfs[0x45d282]
btrfs(cmd_check+0xc07)[0x45fff7]
btrfs(main+0x8e)[0x40dcbe]
/lib64/libc.so.6(__libc_start_main+0xeb)[0x14f732db9b5b]
btrfs(_start+0x2a)[0x40deba]
Aborted
I have tried the same with the array not running... same result..
I ran the repair a couple more times, since the output looked slightly different on each run; maybe it was working its way through something. On the fourth try it completed without aborting. When I now boot the array in maintenance mode and do a read-only check, I get the following output:
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p1
UUID: 344c37ac-26f1-4307-8451-1116b06922be
cache and super generation don't match, space cache will be invalidated
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 238952861696 bytes used, no error found
total csum bytes: 172892316
total tree bytes: 1707900928
total fs tree bytes: 1359200256
total extent tree bytes: 124682240
btree space waste bytes: 369061441
file data blocks allocated: 1187284238336
referenced 233465798656
This basically looks error-free, I think?
The cache drive continues to appear as having no file system though... even after stopping and restarting the array.
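At a point like this it can help to check whether btrfs-progs itself still recognises the device as a btrfs filesystem, independent of what the GUI shows:

```shell
# Ask btrfs-progs which devices it sees as btrfs filesystems.
# A healthy device is listed with its label, UUID and byte counts.
btrfs filesystem show /dev/nvme0n1p1
```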
Therefore I did again:
I stopped the array, unassigned the cache drive, started the array without cache drive, stopped the array and re-added the cache drive. Then started the array in maintenance mode. There is no message relating to an unmountable file system any more..
I then stopped the array and restarted it normally (without maintenance mode).
Now the array comes back up without a missing file system.
Cache drive appears to be back in full operation, dockers are also running again...
Issue solved... but any idea what went wrong here?