Shares disappeared, docker folder doesn't exist.

bobobeastie · March 1, 2020

Woke up to a Fix Common Problems notification: "Unable to write to cache" and "Unable to write to Docker Image". Then I went to the Shares page and it's empty, which explains the Docker part. Through SMB I can see flash and a UD drive share, nothing else.

I briefly, uneducatedly, looked through the diagnostics files and see the shares are present, and I can also see them if I look at the files on individual drives. That makes me guess everything will be fine after a reboot. I would have rebooted already after getting diagnostics, but I am 81% through a data rebuild of a new 12TB disk, so I have 6 hours left.

I did notice some errors in my syslog, not sure if any of them are serious or related.

Running 6.8.2

nastheripper-diagnostics-20200301-0825.zip

Squid · March 1, 2020

The filesystem detected corruption on the cache drive, and basically shut itself off to protect itself.

Run the filesystem checks against the cache drive https://wiki.unraid.net/Check_Disk_Filesystems

Feb 29 21:39:21 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1011004192
Feb 29 21:39:21 NAStheRIPPER kernel: XFS (nvme0n1p1): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x3c42b2e0 len 32 error 61
Feb 29 21:39:21 NAStheRIPPER kernel: XFS (nvme0n1p1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -61.
Feb 29 21:39:21 NAStheRIPPER kernel: XFS (nvme0n1p1): xfs_do_force_shutdown(0x8) called from line 3399 of file fs/xfs/xfs_inode.c.  Return address = 00000000811a5f23
Feb 29 21:39:21 NAStheRIPPER kernel: XFS (nvme0n1p1): Corruption of in-memory data detected.  Shutting down filesystem
Feb 29 21:39:21 NAStheRIPPER kernel: print_req_error: I/O error, dev loop2, sector 23450224
Feb 29 21:39:21 NAStheRIPPER kernel: XFS (nvme0n1p1): Please umount the filesystem and rectify the problem(s)

And it probably wouldn't hurt to run an extended SMART test against the cache drive also (Main, Cache, click on Cache...)

Media and Data Integrity Errors:    6
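One way to cross-check the syslog excerpt above: the block-layer error reports a sector on the whole device (nvme0n1), while the XFS message reports a daddr on the partition (nvme0n1p1). The Python sketch below only does the arithmetic on the two numbers quoted in the log. It assumes both values count 512-byte units, and reading the difference as the partition's start offset is an assumption, not something the log states.

```python
# Values copied from the syslog lines quoted above.
device_sector = 1011004192          # print_req_error ... dev nvme0n1, sector ...
xfs_daddr = int("0x3c42b2e0", 16)   # XFS (nvme0n1p1) ... at daddr 0x3c42b2e0

# Assumption: both numbers are in 512-byte units, so they can be subtracted directly.
offset = device_sector - xfs_daddr

print(f"XFS daddr in decimal:  {xfs_daddr}")
print(f"device sector - daddr: {offset}")
```

The difference works out to 64 sectors. If the partition starts 64 sectors into the device, the medium error and the XFS metadata read failure point at the same spot on the drive.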

bobobeastie · March 1, 2020

Thank you for the response. I need to wait for the data rebuild to finish before I can use maintenance mode, as my cache drive is XFS. I tried to run the extended SMART test, but it doesn't look like it did anything. Below the button it says "No self-tests logged on this disk".

I was able to download a file that may shed light on the 6 errors, although it was probably already in my diagnostics:

Error Information (NVMe Log 0x01, max 256 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0          6     7  0x0357  0x0281      -   1011004192     1     -
  1          5     7  0x0357  0x0281      -   1011004192     1     -
  2          4     7  0x0357  0x0281      -   1011004192     1     -
  3          3     7  0x0357  0x0281      -   1011004192     1     -
  4          2     7  0x0357  0x0281      -   1011004192     1     -
  5          1     7  0x0357  0x0281      -   1011004192     1     -

I'll try both steps after the rebuild, in maintenance mode, and if they both don't work I'll reboot and try again.
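To read the NVMe error log in the post above more easily, it can help to parse it and see whether the entries point at one location or many. This is a minimal Python sketch using only the rows quoted in the post. The function name `parse_error_log` is made up for illustration.

```python
# Rows copied from the "Error Information" table quoted above.
NVME_ERROR_LOG = """\
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 6 7 0x0357 0x0281 - 1011004192 1 -
1 5 7 0x0357 0x0281 - 1011004192 1 -
2 4 7 0x0357 0x0281 - 1011004192 1 -
3 3 7 0x0357 0x0281 - 1011004192 1 -
4 2 7 0x0357 0x0281 - 1011004192 1 -
5 1 7 0x0357 0x0281 - 1011004192 1 -
"""


def parse_error_log(text):
    """Turn the whitespace-separated table into a list of dicts keyed by column name."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


entries = parse_error_log(NVME_ERROR_LOG)
distinct_lbas = {int(entry["LBA"]) for entry in entries}
print(f"{len(entries)} entries, distinct LBAs: {sorted(distinct_lbas)}")
```

All six entries carry the same LBA, 1011004192, which is also the sector number in the kernel's "critical medium error" line from Feb 29 quoted earlier in the thread.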

bobobeastie · March 1, 2020

I had to reboot because it got stuck on something like "unmounting fs" when I tried to stop the array. I ran the check with -n and got 903 lines of output, plenty of which are obvious errors like "would move to lost+found". The output includes filenames (torrent files and Plex docker files), so I don't want to attach it, at least not before obfuscating. After that, running the check with no options gives this:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed.
Mount the filesystem to replay the log, and unmount it before re-running xfs_repair.
If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

I assume this means starting the array normally, stopping it, and then starting it in maintenance mode. I have done that, and I get the same message. If that is correct, then I should run it with the -L flag next? Please let me know if I should post the original output of -n. I'm going to wait so I don't make this worse by being hasty like I have in the past.
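The order of operations described in the xfs_repair message can be written down as a small decision helper. This is only a sketch in Python: `next_step` is a hypothetical function, and it does nothing except look for the log-replay error text quoted above. It does not run xfs_repair itself.

```python
def next_step(repair_output: str, mount_attempted: bool) -> str:
    """Suggest what to try next, based on the text printed by xfs_repair.

    Hypothetical helper for illustration: it only checks for the
    log-replay error message quoted in the post above.
    """
    # Collapse line breaks so the phrase matches however the output was wrapped.
    flat = " ".join(repair_output.split())
    if "needs to be replayed" not in flat:
        return "no log-replay error: read the rest of the output"
    if not mount_attempted:
        return "mount the filesystem to replay the log, unmount it, re-run xfs_repair"
    return "mount did not clear the log: run xfs_repair -L (destroys the log)"


message = (
    "ERROR: The filesystem has valuable metadata changes in a log which needs to\n"
    "be replayed.  Mount the filesystem to replay the log, and unmount it before\n"
    "re-running xfs_repair."
)
print(next_step(message, mount_attempted=False))
print(next_step(message, mount_attempted=True))
```

The point of the ordering is that -L throws away the log, so it is the last resort after a mount attempt has failed to replay it.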

bobobeastie · March 1, 2020

Went ahead and edited the output to hide filenames, see attached.

xfs_checkfs.txt

trurl · March 1, 2020

24 minutes ago, bobobeastie said:

I should run it with the -L flag next?

Yes

bobobeastie · March 1, 2020

Thanks, that seems to have worked. The array started, and I have a lost+found folder with almost nothing in it, which I'm not sure is good or bad. Docker service won't start:

Warning: stream_socket_client(): unable to connect to unix:///var/run/docker.sock (Connection refused) in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 658
Couldn't create socket: [111] Connection refused
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 830

Warning: stream_socket_client(): unable to connect to unix:///var/run/docker.sock (Connection refused) in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 658
Couldn't create socket: [111] Connection refused
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 894

I'm pretty sure I need to delete the docker image and then hopefully the templates are still there for adding back? Easier and/or better solutions are welcome.

Squid · March 1, 2020

50 minutes ago, bobobeastie said:

I'm pretty sure I need to delete the docker image

Most likely

50 minutes ago, bobobeastie said:

hopefully the templates are still there for adding back?

They're on the flash drive. Go to Apps, Previous Apps.

51 minutes ago, bobobeastie said:

lost+found

lost+found holds file(s) that the repair recovered but couldn't reconnect to their original name or folder for one reason or another.

bobobeastie · March 2, 2020

Great, thanks. I was worried for a minute about Plex because my previous config didn't appear to be saved, but it was. This is my second -L rodeo, so I knew what lost+found was. What I meant was that the lack of files might be a bad sign, because perhaps some were lost. But maybe that's not how it works. Anyway, thank you @Squid and @trurl

...doesn't look like I'm out of the woods yet, guessing the drive is bad:

Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546720
Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546728
Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546728
Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546728
Mar 1 20:51:24 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741416
Mar 1 20:51:25 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741448
Mar 1 20:51:26 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741544
Mar 1 20:51:26 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741544
Mar 1 21:00:04 NAStheRIPPER kernel: print_req_error: critical target error, dev nvme1n1, sector 1997853383

The last error is from my former cache drive, which I left installed. It is not able to run trim for some reason. The current cache drive (nvme0n1) is an HP SSD EX950 1TB. One thing worth noting: when I last swapped NVMe drives around on this board (Designare Threadripper), I noticed that there is something like a black, rubber-feeling thermal pad on the included NVMe heatsink, and it seemed kind of wet, like from separated glue or something between the metal and the "rubber". I just wiped them off and kept using them. Rubber isn't thermally conductive, so I'm sure it's something else.

When I click either SMART button, it very quickly changes back to what it started as, without even leaving time to read that it says "stop". I had to run it on another drive to check how it normally works, so I don't think the tests are running. I'm going to switch from Prefer to Yes for cache on all applicable shares to start offloading. Should I run a SMART check in the terminal? If so, how?

edit: Not sure it matters, but I just noticed I have the first BIOS version, and I'm 90% sure I updated it at some point. Maybe it kept a base copy and reverted at some point. Looking at the change log here https://www.gigabyte.com/us/Motherboard/X399-DESIGNARE-EX-rev-10/support#support-dl-bios it only mentions NVMe in relation to RAID, which obviously I'm not using.

nastheripper-diagnostics-20200301-2132.zip

Edited March 2, 2020 by bobobeastie
new info
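The kernel messages quoted in the post above are easier to judge when grouped by device and error type. Below is a small Python sketch that does this with a regular expression. The log text is copied from the post, and the helper name `summarize` is made up for illustration.

```python
import re
from collections import defaultdict

# Lines copied from the syslog excerpt in the post above.
SYSLOG = """\
Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546720
Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546728
Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546728
Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546728
Mar 1 20:51:24 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741416
Mar 1 20:51:25 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741448
Mar 1 20:51:26 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741544
Mar 1 20:51:26 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741544
Mar 1 21:00:04 NAStheRIPPER kernel: print_req_error: critical target error, dev nvme1n1, sector 1997853383
"""

PATTERN = re.compile(
    r"print_req_error: (?P<kind>.+?), dev (?P<dev>[^,\s]+), sector (?P<sector>\d+)"
)


def summarize(text):
    """Map (device, error kind) to the set of distinct sectors reported."""
    summary = defaultdict(set)
    for match in PATTERN.finditer(text):
        summary[(match["dev"], match["kind"])].add(int(match["sector"]))
    return summary


summary = summarize(SYSLOG)
for (dev, kind), sectors in sorted(summary.items()):
    print(f"{dev}: {kind}, {len(sectors)} distinct sector(s)")
```

Here the medium errors on nvme0n1 are spread over several sectors in two separate regions, unlike the single repeated sector from Feb 29, while the one nvme1n1 line is a different error type (target error) on the other drive.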

bobobeastie · March 2, 2020

Still having the issue. I updated my BIOS, did another -L, and am waiting for the array to start. It has been stuck on starting services for a while. Looking for suggestions, please. I wasn't able to figure out smartctl on the console. This page: https://wiki.unraid.net/Console doesn't really mention NVMe, so I tried, but the commands didn't work.
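For reading NVMe health data from the console, `smartctl -a` on the NVMe device is the usual starting point. The Python sketch below wraps that call and pulls out "name: value" fields such as the "Media and Data Integrity Errors" counter quoted earlier in the thread. The device path `/dev/nvme0` and the helper names are assumptions for illustration, and this has not been checked against the smartmontools version shipped with Unraid 6.8.2.

```python
import subprocess


def smartctl_command(device="/dev/nvme0"):
    """Build the smartctl call; the default device path is an assumption."""
    return ["smartctl", "-a", device]


def parse_fields(report):
    """Collect 'name: value' lines from a smartctl text report into a dict."""
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def read_health(device="/dev/nvme0"):
    """Run smartctl and return the parsed fields (needs smartctl installed)."""
    result = subprocess.run(
        smartctl_command(device), capture_output=True, text=True, check=False
    )
    return parse_fields(result.stdout)


# Parsing the line quoted earlier in the thread, without touching any hardware.
sample = "Media and Data Integrity Errors:    6"
print(parse_fields(sample))
```

On the server itself, `read_health()` would be called with whichever device node the cache drive actually has. A rising "Media and Data Integrity Errors" value between runs would match the medium errors seen in the syslog.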

bobobeastie · March 2, 2020

I examined both nvme drives and swapped them. Attached is an image of what I was talking about, and I don't think I'm concerned about it as a cause. I rubbed one of them with my hand and it started to disintegrate, so I stopped. It only makes contact with the tops of chips or stickers, and residue on those was minimal.

Once I started it back up, everything appears to be okay, at least for now. I am letting mover run with all shares set to yes for cache so that I can set to no when done. At that point I am open to suggestions as to what to do.

JorgeB · March 2, 2020

SMART tests don't work on NVMe devices, though that does look like a failing device.

bobobeastie · March 2, 2020

Alrighty, apparently HP doesn't handle support for their SSDs. Their support gives a phone number with a full voicemail box and an outdated email address belonging to a security camera company ([email protected]). When I googled the phone number I found a mention of [email protected], which I found was actually connected to HP. The only reason I'm mentioning this here is so that someone who googles any of these things might be helped by this info. This has taught me not to be an HP customer.

In the meantime, it would be great if I could offload from cache, but it just disabled itself again after mover was running very slowly. The only thing I can think of trying, other than running mover again after fixing the fs errors, is to use backup/restore appdata to make a copy, or maybe copying over the share works just as well? Is there a way to ignore fs errors?

nastheripper-diagnostics-20200302-1534.zip

trurl · March 3, 2020

3 hours ago, bobobeastie said:

backup/restore appdata

If you have an appdata backup, that plus the templates on flash are all you need to get your dockers going again using the Previous Apps feature on the Apps page. You could do this with a different cache or even without cache.

bobobeastie · March 3, 2020

54 minutes ago, trurl said:

If you have an appdata backup, that plus the templates on flash are all you need to get your dockers going again using the Previous Apps feature on the Apps page. You could do this with a different cache or even without cache.

Yes, and it looks like mover was able to move almost all of appdata. All that's left is a single file, nzbdrone.db-wal in binhex sonarr. I'm assuming that's the kind of thing that will fix itself, or can be fixed by reinstalling the container. Everything else is torrents, which should fix themselves, and one recently transcoded, backed-up Blu-ray. Not too bad. I think I will run cache-less while I iron out the things below. Is the procedure for this as simple as stop array, select no device for cache, then start array?

The future cache drive is a 1TB ADATA_SX6000NP. I posted the issue I was having with it a while ago and it received no responses. The issue being that when the trim plugin runs, I get an email saying "fstrim: /mnt/disks/ADATA_SX6000NP_XXXXXXXXXXX: FITRIM ioctl failed: Remote I/O error". I would feel much better about using it as a cache if that problem was solved. If I can get trim working, I'm thinking about switching to BTRFS and then RAID1'ing it and the replacement NVMe drives, as I understand I would then be able to lose 1 of them.
