Disaster - Dockers Corrupted? Destroyed?


Profezor


On 9/19/2021 at 5:05 PM, Squid said:

I just assumed the logs got cut off. Corruption exists on drive 4 (a partial explanation for the missing containers), and if the server hasn't been powered off and the cabling reseated, then the parity drive needs that too, which would also explain why the check is slow.

Squid - the Dockers tab started working after the Disk 4 filesystem check and repair. You guys are a lifesaver. I guess I am not totally out of the woods yet, as most dockers are giving an Execution Error 403.

 

I need some newer, better drives. I guess I should run a parity check ASAP.

galaxy-diagnostics-20210920-1756.zip

1 hour ago, Profezor said:

most dockers are giving a Execution Error 403

You will probably have to recreate docker.img. It was ridiculously large at 150G anyway. It looks like it was using 29G of that 150G, which makes me suspect you have some application writing to a path that isn't mapped. Try 20G when you recreate it.
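(As a sanity check, here is a minimal sketch for confirming how much of the image is actually allocated, assuming the default image path shown in your docker.cfg:)

# Apparent size of the image file vs. the space actually allocated on disk:
ls -lh /mnt/user/system/docker/docker.img
du -h /mnt/user/system/docker/docker.img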

 

But while you have Docker disabled and before recreating it at 20G, also disable VM Manager, then run Mover so you can get the appdata, domains, and system shares moved off the array. Having them on the array keeps drives spun up, since these shares always have open files, and it hurts the performance of dockers/VMs if they are on the slower parity array.
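(Once Mover finishes, a quick way to confirm the shares actually left the array, assuming the standard /mnt/diskN mount points:)

# Should print nothing once everything has moved to the cache:
ls -d /mnt/disk*/appdata /mnt/disk*/domains /mnt/disk*/system 2>/dev/null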

 

https://wiki.unraid.net/Manual/Troubleshooting#How_do_I_recreate_docker.img.3F

 

https://wiki.unraid.net/Manual/Troubleshooting#Restoring_your_Docker_Applications

 

 

4 hours ago, trurl said:

You will probably have to recreate docker.img. [...]


Doing this now. Will report back.


Are these the latest diagnostics? It doesn't look like you did anything I said.

On 9/20/2021 at 1:02 PM, trurl said:

while you have Docker disabled and before recreating it at 20G, also disable VM Manager, then run Mover so you can get the appdata, domains, and system shares moved off the array.

All these shares still have files on the array, and your docker.img is still 150G.


It still doesn't look like you did anything: there is still corruption on disk4, and docker.img is still 150G, though it looks like you did configure it for 20G, which won't take effect until you recreate it. Even those previous diagnostics had it reconfigured for 20G while it was still 150G.

 

And appdata, domains, and system still have files on the array. It doesn't even look like you ran Mover.

 

Let's break things down.

 

Go to Settings and disable Docker. Leave it disabled for now.

 

Then check the filesystem on disk4 and post the output.
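(For reference, a minimal sketch of the command-line equivalent, assuming the array is started in Maintenance Mode so disk4 is unmounted; the check on the disk's settings page in the GUI runs the same tool:)

# Read-only check first; -n reports problems without changing anything:
xfs_repair -n /dev/md4

# The actual repair is the same command without -n:
xfs_repair /dev/md4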

 

 


Not sure of your time zone; I am in Europe.

I assure you I have done everything you have said, TWICE.

I can't tell you what the logs say, as I am not that technical.

I need the server back up now, after a month.

So I will happily do whatever you say again 🙂 Thanks.

Off to Settings.

3 hours ago, Profezor said:

Not sure of your time zone; I am in Europe.

Irrelevant. We see lots of different time zones here. I am basing what I said entirely on the contents of your diagnostics. How exactly are you getting those diagnostics? Are you sure you are giving us the current ones?
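(If the web GUI is unreliable, the same archive can be generated from a terminal with the diagnostics command; if I recall correctly, the zip is written to the logs folder on the flash drive:)

diagnostics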

 

For example from those latest diagnostics you posted:

 

20 hours ago, trurl said:

still corruption on disk4

Though it is mounting (from logs/syslog.txt)

Sep 22 13:34:42 Galaxy emhttpd: shcmd (39): mount -t xfs -o noatime /dev/md4 /mnt/disk4
Sep 22 13:34:42 Galaxy kernel: XFS (md4): Mounting V5 Filesystem
Sep 22 13:34:42 Galaxy kernel: XFS (md4): Ending clean mount

later we see

Sep 22 13:46:23 Galaxy kernel: XFS (md4): Metadata corruption detected at xfs_dinode_verify+0xa3/0x581 [xfs], inode 0x23f721790 dinode
Sep 22 13:46:23 Galaxy kernel: XFS (md4): Unmount and run xfs_repair

Perhaps it is a recurring problem due to something like bad connections, in which case we would see the errors again even after a repair.

 

20 hours ago, trurl said:

docker.img is still 150G though it looks like you did configure it for 20G

system/df.txt

Filesystem      Size  Used Avail Use% Mounted on
...
/dev/loop2      150G   29G  120G  20% /var/lib/docker

config/docker.cfg

DOCKER_ENABLED="yes"
DOCKER_IMAGE_FILE="/mnt/user/system/docker/docker.img"
DOCKER_IMAGE_SIZE="20"

 

20 hours ago, trurl said:

appdata, domains, system still have files on the array

shares/appdata.cfg

shareUseCache="prefer"
# Share exists on cache,disk1,disk3,disk4,disk5

shares/domains.cfg

shareUseCache="prefer"
# Share exists on disk3,disk4

shares/system.cfg

shareUseCache="prefer"
# Share exists on cache,disk3,disk4

 

There are some reasons why mover might not move those files, but searching your syslog doesn't show mover ever being invoked.
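(A minimal way to check this yourself, assuming the default syslog location:)

# Any mover activity would show up here:
grep -i mover /var/log/syslog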

 


I haven't forgotten.

I can't reach the system remotely anymore, apparently, so I need to set up a monitor and then figure out how to get you the output from the disk4 check.

I did run the repair with the -n flag. I will not do that this time, as per JorgeB.

MOVER - it runs for hours and hours, but I guess it really isn't moving much, as you point out.

BTW, I mentioned the time zone just to let you know that I am not ignoring this problem; I may just be sleeping 🙂

Thanks for your continued help. I've got to get this back online.


Your screenshot of User Shares above shows appdata on cache and disks 1, 3, and 4, but your diagnostics say there is also appdata on disk5. And the diagnostics also show you had filled the log space with entries from mover.

 

Nothing can move open files, and mover won't move duplicates. Looks like you have duplicates.

 

Reboot if you haven't already to get log space cleared.
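(On Unraid, /var/log lives in a small RAM filesystem, so a flood of mover entries can fill it; you can confirm how full it is with:)

df -h /var/log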

Then go to the command line and post the results of

ls -lah /mnt/cache/appdata

and

ls -lah /mnt/user0/appdata
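(For what it's worth, /mnt/user0 is the user share as seen on the array only, with the cache excluded, so any name that appears in both listings exists in two places. A quick sketch for comparing them:)

# Prints names present in both locations, i.e. likely duplicates:
comm -12 <(ls /mnt/cache/appdata | sort) <(ls /mnt/user0/appdata | sort)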

 

11 hours ago, trurl said:

Then go to the command line and post the results of ls -lah /mnt/cache/appdata and ls -lah /mnt/user0/appdata. [...]


root@Galaxy:~# ls -lah /mnt/cache/appdata
total 232K
drwxrwxrwx 1 nobody   users 1.1K Aug 15 19:58 ./
drwxrwxrwx 1 nobody   users   40 Mar  7  2021 ../
-rw-rw-rw- 1 terrence users 8.1K Sep 17 00:46 .DS_Store
-rw-rw-rw- 1 terrence users  368 Mar 15  2021 ._.DS_Store
-rw-r--r-- 1 nobody   users 2.4K Feb 21  2021 .bashrc
drwxr-xr-x 1 nobody   users   28 Mar 12  2021 .cache/
drwxr-xr-x 1 nobody   users   42 Mar 12  2021 .config/
drwx------ 1 nobody   users   10 Feb 21  2021 .local/
drwx------ 1 nobody   users   10 Mar 12  2021 .pki/
-rw-r--r-- 1 nobody   users  27K Mar 12  2021 .xorgxrdp.10.log
-rw-r--r-- 1 nobody   users 163K Mar 12  2021 .xorgxrdp.10.log.old
drwxrwxrwx 1 nobody   users   78 Apr 30 11:38 Grafana-Unraid-Stack/
drwxrwxrwx 1 nobody   users   22 May 29 01:58 MusicBrainz-Picard/
drwxrwxrwx 1 nobody   users  334 Sep 20 17:22 NginxProxyManager/
drwxrwxrwx 1 nobody   users  102 Jun 28 01:02 bazarr/
drwxrwxr-x 1 nobody   users  546 Sep 21 13:37 binhex-delugevpn/
drwxrwxr-x 1 nobody   users  256 Aug 21 08:57 binhex-jellyfin/
drwxrwxr-x 1 nobody   users   56 Feb 18  2021 binhex-krusader/
drwxrwxr-x 1 nobody   users  416 Sep 22 13:40 binhex-lidarr/
drwxrwxr-x 1 nobody   users   58 Apr 17 22:53 binhex-nginx/
drwxrwxr-x 1 nobody   users  202 Jul  6 22:55 binhex-nzbhydra2/
drwxrwxr-x 1 nobody   users  208 Jul 30 22:47 binhex-plex/
drwxrwxr-x 1 nobody   users  332 Sep 22 13:39 binhex-radarr/
drwxrwxr-x 1 nobody   users  246 Aug 17 07:19 binhex-sabnzbdvpn/
drwxrwxr-x 1 nobody   users  388 Sep 22 13:40 binhex-sonarr/
drwxrwxrwx 1 nobody   users  222 Sep 20 17:22 bitwarden/
drwxr-xr-x 1 root     root   120 Jun 11 10:14 cloudflared/
drwxrwxrwx 1 nobody   users   12 Jul  3 09:17 dupeGuru/
drwxrwxrwx 1 root     root    82 Jul 10 14:46 macinabox/
drwxrwxrwx 1 nobody   users  114 Jun 25 01:02 mariadb/
drwxrwxrwx 1 nobody   users  160 Jun 21 01:02 nextcloud/
drwxrwxrwx 1 root     root    28 Jun 21 13:08 onlyoffice/
drwxrwxrwx 1 nobody   users  136 Jun 15 21:33 openvpn-as/
drwxrwxrwx 1 nobody   users   36 Jun 30 16:40 organizrv2/
drwxrwxr-x 1 nobody   users   50 May  2 12:29 overseerr/
drwxrwxrwx 1 root     root     8 Apr 22 20:45 paperless-ng/
drwxrwxrwx 1 nobody   users   58 May 28 20:19 papermerge/
drwxrwxrwx 1 nobody   users   88 Sep 20 17:22 photoprism/
drwxrwxr-x 1 nobody   users  154 Sep 22 13:40 prowlarr/
drwxrwxrwx 1 nobody   users    0 Apr 29 21:57 rebuild-dndc/
drwxrwxrwx 1 nobody   users    0 Aug 15 19:58 socials/
drwxr-xr-x 1      911   911  114 Jul 15 16:56 speedtest-tracker/
-rwxr-xr-x 1 nobody   users   54 Feb 21  2021 startwm.sh*
drwxrwxrwx 1 nobody   users   32 Jul 25 12:31 tailscale/
drwxr-xr-x 1 nobody   users   40 Apr  2 11:24 tubesync/
drwxr-xr-x 1 nobody   users  102 Aug 12 21:44 unmanic/
drwxrwxrwx 1 nobody   users   40 Mar 21  2021 vm_custom_icons/
drwxrwxrwx 1 root     root   264 Aug 17 13:21 windows11_uupdump/
drwxr-xr-x 1 sshd     sshd   582 Apr  2 14:34 wordpress/

 

 

root@Galaxy:~# ls -lah /mnt/user0/appdata
total 0
drwxrwxrwx 1 nobody users  37 Aug 15 19:58 ./
drwxrwxrwx 1 nobody users 160 May 28 20:17 ../
drwxrwxrwx 1 nobody users  26 Apr 30 11:38 Grafana-Unraid-Stack/
drwxrwxrwx 1 nobody users  26 Sep 20 17:22 NginxProxyManager/
drwxrwxrwx 1 nobody users  25 Jun 28 01:02 bazarr/
drwxrwxr-x 1 nobody users  35 Sep 13 18:17 binhex-delugevpn/
drwxrwxr-x 1 nobody users  26 Feb 18  2021 binhex-krusader/
drwxrwxr-x 1 nobody users  39 Jul 30 22:47 binhex-plex/
drwxrwxrwx 1 nobody users  64 Sep 13 18:17 photoprism/

 

 

But now Disk 4 seems to have a big issue on top of all of this 😞

Screenshot 2021-10-01 at 13.59.47.png

