PsyCl0ne Posted August 20, 2021

Out of nowhere I have lost the ability to start any dockers besides netdata. There were no updates or power outages, and everything was working fine with a server uptime of 31 days. When I tried to refresh the web interfaces for dockers that were running, I saw some SQL errors. I tried restarting the Docker service, and now nothing will start up. The last time I had issues I was advised to delete and recreate the docker image, which worked, so it's a fairly new image. I don't think there is any hardware failure, as Fix Common Problems doesn't report anything. Any advice would be appreciated. Diagnostics are attached.

tower-diagnostics-20210819-2149.zip

trurl Posted August 20, 2021

Looks like something is wrong with your cache pool, and that must have broken user shares.

    Jul 20 00:14:00 Tower emhttpd: shcmd (35802): /sbin/btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache && /sbin/btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/cache &
    Jul 20 00:14:00 Tower kernel: BTRFS error (device sdj1): balance: invalid convert data profile raid1

    Pool: cache
    Overall:
        Device size:                 953.87GiB
        Device allocated:            450.03GiB
        Device unallocated:          503.84GiB
        Device missing:                  0.00B
        Used:                        202.04GiB
        Free (estimated):            749.69GiB  (min: 749.69GiB)
        Free (statfs, df):           749.69GiB
        Data ratio:                       1.00
        Metadata ratio:                   1.00
        Global reserve:              512.00MiB  (used: 0.00B)
        Multiple profiles:                  no

                 Data      Metadata  System
    Id Path      single    single    single   Unallocated
    -- --------- --------- --------- -------- -----------
     2 /dev/sdj1 447.00GiB   3.00GiB 32.00MiB   503.84GiB
    -- --------- --------- --------- -------- -----------
       Total     447.00GiB   3.00GiB 32.00MiB   503.84GiB
       Used      201.15GiB 914.97MiB 96.00KiB

I will have to pass this off to @JorgeB; it may be a few hours before he sees it.

JorgeB Posted August 20, 2021

There's a problem with the pool: it's only using one device, and it looks like the 2nd one was never successfully added. That is unlikely to be related to your issue, but to fix it you can try this:

-Stop array
-Unassign cache1 (sdk currently)
-Start array
-Stop array
-Re-assign cache1
-Start array

and post new diags.

trurl Posted August 20, 2021

12 hours ago, trurl said: "must have broken user shares"

The reason I said that is that there is no /mnt/user in the df output.

JorgeB Posted August 20, 2021

41 minutes ago, trurl said: "because no /mnt/user in df"

Yes, missed that, but that's not because of the pool:

    Aug 18 23:08:20 Tower shfs: shfs: ../lib/fuse.c:1451: unlink_node: Assertion `node->nlookup > 1' failed.

It's this issue:

A reboot will fix it.

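The two checks discussed above (no /mnt/user in df, and the shfs assertion in the log) can be scripted. This is only a minimal sketch: `has_mount` and `shfs_crashed` are hypothetical helper names, not Unraid tools, and both simply read text on stdin, so they work equally on live output or on files from a diagnostics zip.

```shell
# Hypothetical helper: succeed if the given mount point appears as the
# last column of df-style output read from stdin.
has_mount() {
    awk -v mp="$1" '$NF == mp { found = 1 } END { exit (found ? 0 : 1) }'
}

# Hypothetical helper: succeed if syslog text on stdin contains a failed
# shfs assertion like the one quoted in this thread.
shfs_crashed() {
    grep -q 'shfs:.*Assertion.*failed'
}
```

On the server this might be used as `df | has_mount /mnt/user || echo "user shares are not mounted"` and `shfs_crashed < /var/log/syslog && echo "shfs assertion found"`. The syslog path is an assumption here; adjust it to wherever your log actually lives.
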
PsyCl0ne Posted August 21, 2021

18 hours ago, JorgeB said: "There's a problem with the pool, it's only using one device, looks like the 2nd one was never successfully added, but unlikely to be related to your issue, to fix that you can try this: -Stop array -Unassign cache1 (sdk currently) -Start array -Stop array -Re-assign cache1 -Start array and post new diags."

Thank you for your time, JorgeB. I have attached the new diagnostics after following your steps. I have not rebooted yet but will do so shortly after posting this.

tower-diagnostics-20210820-2115.zip

JorgeB Posted August 21, 2021

Pool looks OK now; you just need to wait for the balance to finish.

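One rough way to follow the conversion is to watch the "Data ratio" line of the usage report, which read 1.00 for the single profile shown earlier in the thread. The sketch below assumes the report layout quoted in this thread; `data_ratio` is a hypothetical helper name that only parses text on stdin.

```shell
# Hypothetical helper: print the "Data ratio" value from
# `btrfs filesystem usage` output read on stdin.
data_ratio() {
    awk -F: '$1 ~ /^[ \t]*Data ratio$/ { gsub(/[ \t]+/, "", $2); print $2 }'
}
```

On the server, `btrfs filesystem usage /mnt/cache | data_ratio` should move from 1.00 toward 2.00 as data is converted to raid1, and `btrfs balance status /mnt/cache` reports whether a balance is still running. Both commands normally need root, and /mnt/cache is the pool path taken from the log lines above.
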