GlennCottam Posted August 19, 2021

Yesterday, I got a notification that the cache pool was full, and took action to rectify it (emptied my recycle bin and invoked the mover, which cleared plenty of space off the pool). This morning, all of my dockers were down, and every docker was reporting that the file system was read-only. This meant my cache was mounted read-only.

Thinking that one of the cache SSDs might have just been full, I attempted to balance the pool. The balance exited with: "BTRFS info (device sdd1): balance: ended with status -30". I rebooted the server and attempted the balance again, with the same error. The error only shows itself in the disk log for one of the SSDs. I have also tried a scrub, which failed the same way. The other 3 SSDs have no errors in their logs, but the SSD having issues has the following errors on boot:

Aug 19 13:00:33 Unraid kernel: BTRFS error (device sdd1): unable to find ref byte nr 11987553361920 parent 0 root 5 owner 36039698 offset 1277067264
Aug 19 13:00:33 Unraid kernel: BTRFS: error (device sdd1) in __btrfs_free_extent:3092: errno=-2 No such entry
Aug 19 13:00:33 Unraid kernel: BTRFS info (device sdd1): forced readonly
Aug 19 13:00:33 Unraid kernel: BTRFS: error (device sdd1) in btrfs_run_delayed_refs:2144: errno=-2 No such entry

I then rebooted into safe mode and entered maintenance mode in order to see if more information would show itself.
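For anyone else hunting for these messages, a small sketch of filtering the kernel log down to just the BTRFS lines and the device they mention. It is shown here against a captured sample of the log lines from this post, so it is safe to run anywhere; on the live server you would pipe the real log instead (e.g. `dmesg | grep -E 'BTRFS (error|warning|info)'`):

```shell
# Two of the kernel log lines from this thread, captured as a sample; on the
# live server you would pipe the real log instead:
#   dmesg | grep -E 'BTRFS (error|warning|info)'
sample='Aug 19 13:00:33 Unraid kernel: BTRFS error (device sdd1): unable to find ref byte nr 11987553361920
Aug 19 13:00:33 Unraid kernel: BTRFS info (device sdd1): forced readonly'

# Pull out which device(s) the BTRFS messages refer to
echo "$sample" | grep -oE '\(device [a-z0-9]+\)' | sort -u
```

Here all the errors point at a single pool member (sdd1), which matches the per-disk logs described above.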
I was able to run a btrfs check on the SSD, getting the following results:

[1/7] checking root items
[2/7] checking extents
parent transid verify failed on 10084585357312 wanted 2949214 found 2949025
parent transid verify failed on 10957839237120 wanted 18446612689103786680 found 2993596
parent transid verify failed on 10957839237120 wanted 18446612689103786680 found 2993596
parent transid verify failed on 10957839237120 wanted 18446612689103786680 found 2993596
Ignoring transid failure
data backref 11987553361920 root 5 owner 36039698 offset 1277067264 num_refs 0 not found in extent tree
incorrect local backref count on 11987553361920 root 5 owner 36039698 offset 1277067264 found 1 wanted 0 back 0x11deca50
incorrect local backref count on 11987553361920 root 11763000595709952005 owner 4294936705 offset 1277067264 found 0 wanted 1 back 0x11df3a50
backref disk bytenr does not match extent record, bytenr=11987553361920, ref bytenr=0
backpointer mismatch on [11987553361920 86016]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
[4/7] checking fs roots
parent transid verify failed on 10957839237120 wanted 18446612689103786680 found 2993596
Ignoring transid failure
[5/7] checking only csums items (without verifying data)
parent transid verify failed on 10957839237120 wanted 18446612689103786680 found 2993596
Ignoring transid failure
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
ERROR: transid errors in file system
Opening filesystem to check...
Checking filesystem on /dev/sdd1
UUID: b81ae343-60cb-4303-8fac-192942135255
cache and super generation don't match, space cache will be invalidated
found 1205836599296 bytes used, error(s) found
total csum bytes: 1174204308
total tree bytes: 2446475264
total fs tree bytes: 917962752
total extent tree bytes: 130154496
btree space waste bytes: 430889587
file data blocks allocated: 1675174596608
 referenced 1200826040320

I am completely lost on what to do at this point and need assistance. The only thing I can think to do now is to copy the contents of the cache onto the array and reformat the cache disks. I appreciate any help I can get!

unraid-diagnostics-20210819-1322.zip
JorgeB Posted August 19, 2021

5 minutes ago, GlennCottam said:
The only thing I can think to do now is to copy the contents of the cache onto the array, and reformat the cache disks.

Yes, that's what you should do; there are some recovery options here if needed.
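Not from the linked post, just a generic sketch of the copy step: with the pool mounted read-only you can still copy everything off with plain `cp` (or `rsync`). The Unraid mount points shown in the comments are assumptions to adjust for your system; the runnable part below uses throwaway directories so it is safe to try anywhere:

```shell
# On the actual server the copy would look something like:
#   cp -a /mnt/cache/. /mnt/disk1/cache-backup/
# (/mnt/cache and /mnt/disk1 are Unraid's usual mount points - an assumption,
# adjust to your system.) Demonstrated on throwaway directories below so the
# commands are safe to run anywhere.
src=$(mktemp -d)    # stands in for the read-only cache pool
dst=$(mktemp -d)    # stands in for an array disk
mkdir -p "$src/appdata"
echo "container config" > "$src/appdata/settings.conf"

cp -a "$src/." "$dst/"    # -a preserves ownership, permissions, timestamps
ls "$dst/appdata"
```

The `-a` (archive) flag matters here so docker appdata keeps its ownership and permissions when it is eventually copied back to a fresh pool.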
GlennCottam Posted August 19, 2021

Thanks for the quick reply! I did see that post in my googling, but wanted to see if there was another option before I proceeded. Started the copy to the array. I'll reply if anything comes up.
ChatNoir Posted August 19, 2021

1 hour ago, GlennCottam said:
Yesterday, I got a notification that the cache pool was full, and took action to rectify it

If not already set, you should probably configure your pool notification thresholds. That would allow you to act before you have the issue. And setting a Minimum free space would allow Unraid to write to the array instead of filling your pool, for all shares set to Prefer, if I remember correctly (even though it is not indicated as such in the help tooltip).
itimpi Posted August 19, 2021

13 minutes ago, ChatNoir said:
even though it is not indicated as such in the help tooltip

I think the help text is actually incorrect, as it mentions new files resulting in an 'out of space' error. That is not true if the user share's Use Cache setting is Yes or Prefer, since then new files can instead be written to the array.
GlennCottam Posted August 20, 2021

After moving the files all day yesterday, I am ready to try to repair the disk itself using:

btrfs check --repair /dev/sdd

I am wondering: will using btrfs check --repair erase the disk?

On a side note, when I checked this morning, the server was unresponsive. I couldn't access the physical terminal (keyboard and monitor on the server) as the screen was blank, the web UI would not respond, and my SSH connection failed. I had to force-reboot the server to get control back. Unfortunately, the syslog does not show anything useful. I have had these types of crashes before, after I installed a new 10GbE networking card; I thought I had it fixed, as it hadn't done this in a few weeks. I am not sure if this problem is related to the cache disk, or is another problem I have to find a solution for.

Regardless, the single cache disk is still acting strange, still giving the same error, and still mounted read-only.
JorgeB Posted August 20, 2021

17 minutes ago, GlennCottam said:
I am wondering if using btrfs check --repair will erase the disk?

It shouldn't, but it might make things worse, or only fix it temporarily; I recommend re-formatting it once it's backed up.
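A note on the safety distinction for anyone following along: `btrfs check` without `--repair` only reads the device, so it is safe to re-run for diagnosis; `--repair` is the variant that writes (and is the risky one). A guarded sketch, where the device path from this thread is an assumption to adjust for your system:

```shell
# Wrapper that does nothing unless btrfs-progs is installed and the target
# is a real block device. `btrfs check` WITHOUT --repair is read-only, so it
# is the safe first step; only run --repair after the data is backed up.
# /dev/sdd1 (the failing pool member in this thread) is an assumption.
run_check() {
    if command -v btrfs >/dev/null 2>&1 && [ -b "$1" ]; then
        btrfs check "$1"    # read-only diagnosis, makes no changes
    else
        echo "skipping: btrfs-progs or $1 not present here"
    fi
}
run_check /dev/sdd1
```

The device must also be unmounted (maintenance mode takes care of that on Unraid) before running any form of check.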
GlennCottam Posted August 20, 2021

10 minutes ago, JorgeB said:
It shouldn't, but it might make things worse, or just fix it temporarily, recommend re-formatting it once it's backed up.

Thank you! I formatted the drive and then reformatted the cache pool. However, I now have a new problem: the pool is saying I only have 1.1TB of cache, when I have 2.1TB of SSDs installed. Everything I can look at seems to be in working order, and all the SSDs show the right allocation in their menus, so I am not sure where the issue lies.
JorgeB Posted August 20, 2021

Default mode is raid1; you can convert to other profiles.
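That matches the arithmetic: btrfs raid1 stores two copies of every block spread across the pool members, so usable space is roughly half the raw total regardless of device count. A small sketch, where the ~2.1 TB raw figure comes from the post above:

```shell
# btrfs raid1 writes every block twice, so usable capacity is about half
# of the raw total, regardless of how many devices are in the pool.
raw_gb=2100                   # ~2.1 TB of raw SSD capacity (from the thread)
usable_gb=$((raw_gb / 2))     # raid1: each block stored on two devices
echo "raid1 usable: ~${usable_gb} GB"
```

That lines up with the ~1.1 TB Unraid reports. On the server itself, `btrfs filesystem usage /mnt/cache` shows the exact raw vs. usable split per profile.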
GlennCottam Posted August 20, 2021 (edited)

5 minutes ago, JorgeB said:
Default mode is raid1, you can convert to others.

That would make sense! Thank you!

Edited August 20, 2021 by GlennCottam