Shares Missing, Docker Service Failed. Suspect failed drive


Solved by trurl.


Now that I'm back in the land of the living, here are the details I could not include in my haste this morning. These errors happened in the middle of the night on a system that had been humming along with no other concerns previously.

 

Shares that had been there previously are not showing. I'm sure this is a good part of the problem.

 

Unraid Version: 6.11.5. 

 

Fix Common Problems greeted me with these issues when I checked on the server after coffee:

Unable to write to cache_nvme: Drive mounted read-only or completely full.
Unable to write to cache_ssd: Drive mounted read-only or completely full.
Unable to write to Docker Image: Docker Image either full or corrupted.

 

On the Docker tab in Unraid, these errors show.

Docker Containers
APPLICATION    VERSION    NETWORK    PORT MAPPINGS (APP TO HOST)    VOLUME MAPPINGS (APP TO HOST)    AUTOSTART    UPTIME    

Warning: stream_socket_client(): unable to connect to unix:///var/run/docker.sock (Connection refused) in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 712
Couldn't create socket: [111] Connection refused
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 898

Warning: stream_socket_client(): unable to connect to unix:///var/run/docker.sock (Connection refused) in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 712
Couldn't create socket: [111] Connection refused
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 967
No Docker containers installed
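
For anyone hitting the same wall: those PHP warnings just mean nothing is answering on /var/run/docker.sock, i.e. the Docker daemon itself never started, so the page has no container list to iterate. A quick way to confirm that from the Unraid terminal, a minimal sketch assuming a standard install:

ps -ef | grep -v grep | grep dockerd     # no output = the daemon is not running
docker info                              # in that state this also fails with "Cannot connect to the Docker daemon"
ls -l /var/run/docker.sock               # the socket file may not even exist yet

Once the underlying pool is healthy and Docker is re-enabled, both commands should respond normally.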

 

And Unraid Settings>Docker mirrors these errors.

Enable Docker: Yes (One or more paths do not exist)

Docker vDisk location: /mnt/user/system/docker/docker.img (Path does not exist)

Default appdata storage location: /mnt/user/appdata/ (Path does not exist)
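
For anyone following along: the "Path does not exist" messages line up with the cache pools not being mounted at all. A quick check from the terminal, a minimal sketch using the share and pool names from this box:

ls /mnt/                                  # cache_nvme and cache_ssd should be listed here when mounted
df -h /mnt/cache_nvme /mnt/cache_ssd      # shows whether they are mounted and how full they are
ls -la /mnt/user/system/docker/           # the vDisk path Docker is complaining about

If the pools are missing from /mnt, the Docker and appdata paths built on top of them will not exist either.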

4 minutes ago, Bait Fish said:

First boot into unRAID indicates the cache drive is missing. I'll check cables next.

Post new diagnostics after.

 

Disable Docker and VM Manager in Settings till things are working well again.

 

cache_nvme is showing (XFS) corruption, check filesystem

 

docker and libvirt img are both showing corruption. Since the system share is on cache_nvme, fix that filesystem; then you will probably have to recreate them.

 

Corruption on cache_ssd (btrfs) may be more complicated to fix.
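
For reference, the checks being suggested are run with the array started in Maintenance mode, either from the GUI (click the drive on the Main tab and use the filesystem check there) or from a terminal. A minimal sketch; the device paths below are assumptions, not confirmed against this system's diagnostics:

xfs_repair -nv /dev/nvme0n1p1       # read-only check of the XFS pool (cache_nvme); -n = no modify, -v = verbose
btrfs check --readonly /dev/sdX1    # read-only check of the btrfs pool (cache_ssd); replace sdX with the real device

Only drop the -n (xfs_repair -v) once you are happy to let it write fixes.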


I started it up a second time. The cache drive (sde) that showed missing this morning now showed present and ready. Nothing was done but restarting 9 hours later.

 

I shut it down and reseated the cables for cache sde.

 

Then I started back up, and cache sde remained available. I saved diagnostics from this session and have uploaded them to this post as suggested above.

 

I'll attempt repairs now.

 

Update:

Cache_nvme repair with the default -n option results in:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used.  Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
block (0,50297084-50297193) multiply claimed by cnt space tree, state - 2
block (0,48998093-48998203) multiply claimed by cnt space tree, state - 2
block (0,50379227-50379336) multiply claimed by cnt space tree, state - 2
block (0,49633120-49633230) multiply claimed by cnt space tree, state - 2
agf_freeblks 64128684, counted 64133581 in ag 0
agf_freeblks 97418327, counted 97438471 in ag 2
sb_icount 4368768, counted 4433280
sb_ifree 44699, counted 1263236
sb_fdblocks 340342643, counted 354388165
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
data fork in ino 58802146 claims free block 7583440
data fork in ino 58802146 claims free block 7583505
data fork in ino 59896646 claims free block 7574017
data fork in ino 59896646 claims free block 7574079
        - agno = 1
        - agno = 2
bad nblocks 10397115 for inode 2175513269, would reset to 10397118
bad nextents 207685 for inode 2175513269, would reset to 207683
        - agno = 3
bad CRC for inode 3227792998
bad CRC for inode 3227792998, would rewrite
would have cleared inode 3227792998
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
free space (0,48811889-48811997) only seen by one free space btree
free space (0,50494325-50494435) only seen by one free space btree
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
bad CRC for inode 3227792998, would rewrite
would have cleared inode 3227792998
bad nblocks 10397115 for inode 2175513269, would reset to 10397118
bad nextents 207685 for inode 2175513269, would reset to 207683
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
would rebuild directory inode 1239086149
Metadata corruption detected at 0x46e010, inode 0xc0643666 dinode
couldn't map inode 3227792998, err = 117
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 3227798067, would move to lost+found
Phase 7 - verify link counts...
Metadata corruption detected at 0x46e010, inode 0xc0643666 dinode
couldn't map inode 3227792998, err = 117, can't compare link counts
No modify flag set, skipping filesystem flush and exiting.

 

homer-diagnostics-20230103-1458.zip


Having trouble on mobile modifying the code box above. I scanned with the -nv flag per the docs.


Phase 1 - find and verify superblock...
        - block cache size set to 3061336 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 417567 tail block 393702
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used.  Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
block (0,50297084-50297193) multiply claimed by cnt space tree, state - 2
block (0,48998093-48998203) multiply claimed by cnt space tree, state - 2
block (0,50379227-50379336) multiply claimed by cnt space tree, state - 2
block (0,49633120-49633230) multiply claimed by cnt space tree, state - 2
agf_freeblks 64128684, counted 64133581 in ag 0
agf_freeblks 97418327, counted 97438471 in ag 2
sb_icount 4368768, counted 4433280
sb_ifree 44699, counted 1263236
sb_fdblocks 340342643, counted 354388165
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
data fork in ino 58802146 claims free block 7583440
data fork in ino 58802146 claims free block 7583505
data fork in ino 59896646 claims free block 7574017
data fork in ino 59896646 claims free block 7574079
        - agno = 1
        - agno = 2
bad nblocks 10397115 for inode 2175513269, would reset to 10397118
bad nextents 207685 for inode 2175513269, would reset to 207683
        - agno = 3
bad CRC for inode 3227792998
bad CRC for inode 3227792998, would rewrite
would have cleared inode 3227792998
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
free space (0,48811889-48811997) only seen by one free space btree
free space (0,50494325-50494435) only seen by one free space btree
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 1
bad CRC for inode 3227792998, would rewrite
would have cleared inode 3227792998
bad nblocks 10397115 for inode 2175513269, would reset to 10397118
bad nextents 207685 for inode 2175513269, would reset to 207683
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
would rebuild directory inode 1239086149
        - agno = 2
        - agno = 3
Metadata corruption detected at 0x46e010, inode 0xc0643666 dinode
couldn't map inode 3227792998, err = 117
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 3227798067, would move to lost+found
Phase 7 - verify link counts...
Metadata corruption detected at 0x46e010, inode 0xc0643666 dinode
couldn't map inode 3227792998, err = 117, can't compare link counts
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Tue Jan  3 15:25:14 2023

Phase		Start		End		Duration
Phase 1:	01/03 15:25:04	01/03 15:25:04
Phase 2:	01/03 15:25:04	01/03 15:25:04
Phase 3:	01/03 15:25:04	01/03 15:25:09	5 seconds
Phase 4:	01/03 15:25:09	01/03 15:25:10	1 second
Phase 5:	Skipped
Phase 6:	01/03 15:25:10	01/03 15:25:14	4 seconds
Phase 7:	01/03 15:25:14	01/03 15:25:14

Total run time: 10 seconds

 


I ran through a couple of repair sessions and tried to follow its directions. I pasted all the output below, including which flag I was using, typically keeping verbose on.

 

I did not catch it telling me to run the -L option, so I have not done that.

 

Phase 1 - find and verify superblock...
        - block cache size set to 3061336 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 417567 tail block 393702
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
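
Mounting the filesystem once is enough to replay that journal, so instead of reaching for -L I just let the array do it, as described next. The console equivalent would be roughly this (device and mountpoint are assumptions; -L stays a last resort):

mkdir -p /mnt/test
mount /dev/nvme0n1p1 /mnt/test      # mounting replays the XFS journal
umount /mnt/test
xfs_repair -nv /dev/nvme0n1p1       # re-check; the log alert should be gone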


Stopped the maintenance mode array. Started the array. Stopped the array. Started the array in maintenance mode again. Re-checked with xfs_repair -nv:

Phase 1 - find and verify superblock...
        - block cache size set to 3061320 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 417613 tail block 417613
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
bad CRC for inode 3227792998
bad CRC for inode 3227792998, would rewrite
would have cleared inode 3227792998
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
bad CRC for inode 3227792998, would rewrite
would have cleared inode 3227792998
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
would rebuild directory inode 1239086149
        - agno = 2
        - agno = 3
Metadata corruption detected at 0x46e010, inode 0xc0643666 dinode
couldn't map inode 3227792998, err = 117
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 3227798067, would move to lost+found
Phase 7 - verify link counts...
Metadata corruption detected at 0x46e010, inode 0xc0643666 dinode
couldn't map inode 3227792998, err = 117, can't compare link counts
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Tue Jan  3 16:05:11 2023

Phase		Start		End		Duration
Phase 1:	01/03 16:05:02	01/03 16:05:02
Phase 2:	01/03 16:05:02	01/03 16:05:02
Phase 3:	01/03 16:05:02	01/03 16:05:07	5 seconds
Phase 4:	01/03 16:05:07	01/03 16:05:08	1 second
Phase 5:	Skipped
Phase 6:	01/03 16:05:08	01/03 16:05:11	3 seconds
Phase 7:	01/03 16:05:11	01/03 16:05:11

Total run time: 9 seconds



Ran xfs_repair -v:

Phase 1 - find and verify superblock...
        - block cache size set to 3061320 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 417613 tail block 417613
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
bad CRC for inode 3227792998
bad CRC for inode 3227792998, will rewrite
cleared inode 3227792998
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 2
        - agno = 1
clearing reflink flag on inode 2149504425
clearing reflink flag on inode 1073935201
clearing reflink flag on inode 2149760794
clearing reflink flag on inode 3223369736
clearing reflink flag on inode 2193593298
clearing reflink flag on inode 3270882671
clearing reflink flag on inode 286726471
clearing reflink flag on inode 286743730
clearing reflink flag on inode 3307207359
clearing reflink flag on inode 1148980934
clearing reflink flag on inode 1148980936
clearing reflink flag on inode 1148980938
clearing reflink flag on inode 1148980939
clearing reflink flag on inode 1148980940
clearing reflink flag on inode 1148980941
clearing reflink flag on inode 1148980942
clearing reflink flag on inode 1151756820
clearing reflink flag on inode 1151756823
clearing reflink flag on inode 325929482
clearing reflink flag on inode 325929522
clearing reflink flag on inode 325929602
clearing reflink flag on inode 325929606
clearing reflink flag on inode 325929607
clearing reflink flag on inode 325929608
clearing reflink flag on inode 3376954786
clearing reflink flag on inode 1232940612
clearing reflink flag on inode 1232940615
clearing reflink flag on inode 1232940616
clearing reflink flag on inode 1232940617
clearing reflink flag on inode 379736619
clearing reflink flag on inode 380663459
clearing reflink flag on inode 380663497
clearing reflink flag on inode 380663502
clearing reflink flag on inode 380663503
clearing reflink flag on inode 380663510
clearing reflink flag on inode 380663511
clearing reflink flag on inode 380663520
clearing reflink flag on inode 380663521
clearing reflink flag on inode 380663522
clearing reflink flag on inode 380663523
clearing reflink flag on inode 380663524
clearing reflink flag on inode 380663525
clearing reflink flag on inode 380663540
clearing reflink flag on inode 380663541
clearing reflink flag on inode 3410317497
clearing reflink flag on inode 3410343811
clearing reflink flag on inode 3410343893
clearing reflink flag on inode 3410343927
clearing reflink flag on inode 3410649145
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
rebuilding directory inode 1239086149
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Tue Jan  3 16:07:43 2023

Phase		Start		End		Duration
Phase 1:	01/03 16:07:33	01/03 16:07:33
Phase 2:	01/03 16:07:33	01/03 16:07:33
Phase 3:	01/03 16:07:33	01/03 16:07:39	6 seconds
Phase 4:	01/03 16:07:39	01/03 16:07:40	1 second
Phase 5:	01/03 16:07:40	01/03 16:07:40
Phase 6:	01/03 16:07:40	01/03 16:07:43	3 seconds
Phase 7:	01/03 16:07:43	01/03 16:07:43

Total run time: 10 seconds
done


xfs_repair -nv:

Phase 1 - find and verify superblock...
        - block cache size set to 3061320 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 417613 tail block 417613
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 2
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Tue Jan  3 16:08:50 2023

Phase		Start		End		Duration
Phase 1:	01/03 16:08:41	01/03 16:08:41
Phase 2:	01/03 16:08:41	01/03 16:08:41
Phase 3:	01/03 16:08:41	01/03 16:08:46	5 seconds
Phase 4:	01/03 16:08:46	01/03 16:08:47	1 second
Phase 5:	Skipped
Phase 6:	01/03 16:08:47	01/03 16:08:50	3 seconds
Phase 7:	01/03 16:08:50	01/03 16:08:50

Total run time: 9 seconds


xfs_repair -v:

Phase 1 - find and verify superblock...
        - block cache size set to 3061320 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 417613 tail block 417613
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 1
        - agno = 2
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Tue Jan  3 16:10:45 2023

Phase		Start		End		Duration
Phase 1:	01/03 16:10:35	01/03 16:10:35
Phase 2:	01/03 16:10:35	01/03 16:10:35
Phase 3:	01/03 16:10:35	01/03 16:10:40	5 seconds
Phase 4:	01/03 16:10:40	01/03 16:10:41	1 second
Phase 5:	01/03 16:10:41	01/03 16:10:42	1 second
Phase 6:	01/03 16:10:42	01/03 16:10:45	3 seconds
Phase 7:	01/03 16:10:45	01/03 16:10:45

Total run time: 10 seconds
done

 

 

root@Homer:~# ls -lah /mnt/cache_nvme/system
total 0
drwxrwxrwx 4 nobody users 35 Nov 16  2021 ./
drwxrwxrwx 5 nobody users 50 Jan  3 16:48 ../
drwxrwxrwx 2 nobody users 24 Oct 11 16:32 docker/
drwxrwxrwx 2 nobody users 25 Nov 16  2021 libvirt/
root@Homer:~# 

 

Contents appear OK looking through the various directories.
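
One more thing worth a glance, since the earlier -n passes mentioned a disconnected inode that would be moved to lost+found: anything xfs_repair could not reattach ends up at the root of the pool, named by inode number. A quick check, just a sketch:

ls -la /mnt/cache_nvme/lost+found/ 2>/dev/null || echo "no lost+found (nothing was orphaned)"

If it exists and has entries, the files can be identified by content and moved back by hand.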

root@Homer:/mnt/cache_nvme/system# ls -lah /mnt/cache_nvme/system/docker
total 40G
drwxrwxrwx 2 nobody users  24 Oct 11 16:32 ./
drwxrwxrwx 4 nobody users  35 Nov 16  2021 ../
-rw-rw-rw- 1 nobody users 50G Jan  1 01:39 docker.img
root@Homer:/mnt/cache_nvme/system# ls -lah /mnt/cache_nvme/system/libvirt
total 104M
drwxrwxrwx 2 nobody users   25 Nov 16  2021 ./
drwxrwxrwx 4 nobody users   35 Nov 16  2021 ../
-rw-rw-rw- 1 nobody users 1.0G Dec 31 17:00 libvirt.img
root@Homer:/mnt/cache_nvme/system# 

 


Uh oh. I have the VM auto-backup itself on a schedule, BUT I recall that the morning the system went bad, the VM backup executed.

 

I may not.

 

Edit: libvirt.img is not in the backup location... I do not recall making any backup of it elsewhere manually.

 

Edit 2: and I did not have a location set in Appdata Backup/Restore for backing up libvirt.img.

