Shares disappeared, docker folder doesn't exist.


Woke up to a Fix Common Problems notification reporting "Unable to write to cache" and "Unable to write to Docker Image". I then went to the Shares page and it's empty, which explains the Docker part.  Through SMB I can see only flash and a UD drive share, nothing else.

 

I briefly, and uneducatedly, looked through the diagnostics files and can see that the shares are present; I can also see them if I look at the files on individual drives.  That makes me guess everything will be fine after a reboot.  I would have rebooted right after getting diagnostics, but I am 81% through a data rebuild onto a new 12 TB disk, so I have about 6 hours left.

 

I did notice some errors in my syslog, not sure if any of them are serious or related.

 

Running 6.8.2

nastheripper-diagnostics-20200301-0825.zip

Link to comment

The filesystem detected corruption on the cache drive, and basically shut itself off to protect itself.

 

Run the filesystem checks against the cache drive https://wiki.unraid.net/Check_Disk_Filesystems

 

Feb 29 21:39:21 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1011004192
Feb 29 21:39:21 NAStheRIPPER kernel: XFS (nvme0n1p1): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x3c42b2e0 len 32 error 61
Feb 29 21:39:21 NAStheRIPPER kernel: XFS (nvme0n1p1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -61.
Feb 29 21:39:21 NAStheRIPPER kernel: XFS (nvme0n1p1): xfs_do_force_shutdown(0x8) called from line 3399 of file fs/xfs/xfs_inode.c.  Return address = 00000000811a5f23
Feb 29 21:39:21 NAStheRIPPER kernel: XFS (nvme0n1p1): Corruption of in-memory data detected.  Shutting down filesystem
Feb 29 21:39:21 NAStheRIPPER kernel: print_req_error: I/O error, dev loop2, sector 23450224
Feb 29 21:39:21 NAStheRIPPER kernel: XFS (nvme0n1p1): Please umount the filesystem and rectify the problem(s)
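
The device sector in the first kernel line and the `daddr` in the XFS line appear to name the same spot on the drive. A minimal check of that arithmetic, assuming the cache partition `nvme0n1p1` starts 64 sectors into the device (the log does not state this; it is simply the offset that makes the two numbers agree):

```python
# Values copied from the syslog excerpt above.
device_sector = 1011004192  # "dev nvme0n1, sector ..." (address on the whole device)
xfs_daddr = 0x3c42b2e0      # "at daddr 0x3c42b2e0" (address inside nvme0n1p1)

# Assumption, not stated in the log: the partition starts at sector 64.
PARTITION_START = 64

# Difference between the whole-device address and the in-partition address.
offset = device_sector - xfs_daddr
print(offset)

# True when both log lines refer to the same physical sector.
same_sector = xfs_daddr + PARTITION_START == device_sector
print(same_sector)
```

If the assumption holds, the XFS metadata read failure and the NVMe medium error are one event, not two separate problems.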

 

And it probably wouldn't hurt to run an extended SMART test against the cache drive also (Main, Cache, click on Cache...)

Media and Data Integrity Errors:    6

 

Link to comment

Thank you for the response.  I need to wait for the data rebuild to finish before I can use maintenance mode, as my cache drive is XFS.  I tried to run the extended SMART test, but it doesn't look like it did anything; below the buttons it says "No self-tests logged on this disk".

 

I was able to download a file that may shed some light on the 6 errors, although it was probably already in my diagnostics:

Error Information (NVMe Log 0x01, max 256 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0          6     7  0x0357  0x0281      -   1011004192     1     -
  1          5     7  0x0357  0x0281      -   1011004192     1     -
  2          4     7  0x0357  0x0281      -   1011004192     1     -
  3          3     7  0x0357  0x0281      -   1011004192     1     -
  4          2     7  0x0357  0x0281      -   1011004192     1     -
  5          1     7  0x0357  0x0281      -   1011004192     1     -
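
A short sketch for summarizing a log like the one above: it counts how many entries refer to each LBA. The helper name `lbas_in_error_log` is made up for illustration, and the column layout is assumed to match the nine-column header shown in the post.

```python
from collections import Counter

# Rows copied from the NVMe error log above.
ERROR_LOG = """\
  0          6     7  0x0357  0x0281      -   1011004192     1     -
  1          5     7  0x0357  0x0281      -   1011004192     1     -
  2          4     7  0x0357  0x0281      -   1011004192     1     -
  3          3     7  0x0357  0x0281      -   1011004192     1     -
  4          2     7  0x0357  0x0281      -   1011004192     1     -
  5          1     7  0x0357  0x0281      -   1011004192     1     -
"""


def lbas_in_error_log(text):
    """Count log entries per LBA (hypothetical helper; LBA is the 7th column)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 9 columns and start with the entry number.
        if len(fields) == 9 and fields[0].isdigit():
            counts[int(fields[6])] += 1
    return counts


lba_counts = lbas_in_error_log(ERROR_LOG)
print(lba_counts)
```

All six entries point at LBA 1011004192, the same sector named in the "critical medium error" syslog line quoted earlier in the thread.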

 

I'll try both steps after the rebuild, in maintenance mode, and if they both don't work I'll reboot and try again.

Link to comment

I had to reboot because it was stuck on something like "unmounting fs" when I tried to stop the array.  I ran the check with -n and got 903 lines of output, plenty of which are obvious errors, like "would move to lost+found". The output includes filenames (torrent files and Plex docker files), so I don't want to attach it, at least not before obfuscating.  After that, running the check with no options gets this:

 

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

 

Assuming this means trying to start the array normally, stop the array, start in maintenance mode, I have done that, and I get the same message.  So if that is correct, then I should run it with the -L flag next? Please let me know if I should post the original output of -n. I'm going to wait so I don't make this worse by being hasty like I have in the past.
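
The decision described in that error message can be written down as a small sketch. `next_step` is a hypothetical helper for illustration only; it is not part of xfs_repair or Unraid, and it only inspects the text that xfs_repair printed.

```python
# Phrase taken from the xfs_repair error quoted above.
LOG_DIRTY_MARKER = "valuable metadata changes in a log"


def next_step(repair_output, mount_attempted):
    """Suggest the next action based on xfs_repair's text output.

    Hypothetical helper: it mirrors the advice in the error message,
    nothing more.
    """
    if LOG_DIRTY_MARKER not in repair_output:
        return "no dirty-log error: read the repair output as usual"
    if not mount_attempted:
        return "mount to replay the log, unmount, then re-run xfs_repair"
    # Mounting was tried and the same error came back.
    return "re-run with -L (destroys the log; recent changes may be lost)"


message = (
    "ERROR: The filesystem has valuable metadata changes in a log "
    "which needs to be replayed."
)
print(next_step(message, mount_attempted=False))
print(next_step(message, mount_attempted=True))
```

In the situation described in the post (a normal start was attempted and the message came back unchanged), the sketch lands on the `-L` branch, which is the fallback the error message itself names.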

Link to comment

Thanks, that seems to have worked.  The array started, I have a lost+found file with almost nothing in it, which I'm not sure is good or bad. Docker service won't start:

Warning: stream_socket_client(): unable to connect to unix:///var/run/docker.sock (Connection refused) in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 658
Couldn't create socket: [111] Connection refused
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 830

Warning: stream_socket_client(): unable to connect to unix:///var/run/docker.sock (Connection refused) in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 658
Couldn't create socket: [111] Connection refused
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 894

 

I'm pretty sure I need to delete the docker image and then hopefully the templates are still there for adding back?  Easier and/or better solutions are welcome.

Link to comment
50 minutes ago, bobobeastie said:

I'm pretty sure I need to delete the docker image

Most likely

50 minutes ago, bobobeastie said:

hopefully the templates are still there for adding back?

They're on the flash drive.  Go to apps, previous apps

 

51 minutes ago, bobobeastie said:

lost+found

lost+found holds file(s) that the repair recovered but couldn't reconnect to their original name or folder, for one reason or another.

Link to comment

Great, thanks. I was worried for a minute about Plex because my previous config didn't appear to be saved, but it was.  This is my second -L rodeo, so I knew what lost+found was.  What I meant was that the lack of files might be a bad sign, because perhaps some were lost.  But maybe that's not how it works.  Anyway, thank you @Squid and @trurl

 

...doesn't look like I'm out of the woods yet, guessing the drive is bad:

Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546720
Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546728
Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546728
Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546728
Mar 1 20:51:24 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741416
Mar 1 20:51:25 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741448
Mar 1 20:51:26 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741544
Mar 1 20:51:26 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741544
Mar 1 21:00:04 NAStheRIPPER kernel: print_req_error: critical target error, dev nvme1n1, sector 1997853383
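
To see at a glance which device and which sectors are involved, lines like these can be grouped with a short script. This is a sketch under the assumption that every relevant line follows the `print_req_error: critical ... error, dev X, sector N` shape shown above; `sectors_by_device` is a made-up name.

```python
import re
from collections import defaultdict

# Lines copied from the syslog excerpt above.
SYSLOG = """\
Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546720
Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546728
Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546728
Mar 1 20:48:32 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 1161546728
Mar 1 20:51:24 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741416
Mar 1 20:51:25 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741448
Mar 1 20:51:26 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741544
Mar 1 20:51:26 NAStheRIPPER kernel: print_req_error: critical medium error, dev nvme0n1, sector 237741544
Mar 1 21:00:04 NAStheRIPPER kernel: print_req_error: critical target error, dev nvme1n1, sector 1997853383
"""

PATTERN = re.compile(r"critical (medium|target) error, dev ([^,\s]+), sector (\d+)")


def sectors_by_device(text):
    """Map (device, error kind) to the sorted list of distinct failing sectors."""
    found = defaultdict(set)
    for kind, device, sector in PATTERN.findall(text):
        found[(device, kind)].add(int(sector))
    return {key: sorted(values) for key, values in found.items()}


summary = sectors_by_device(SYSLOG)
for (device, kind), sectors in summary.items():
    print(device, kind, sectors)
```

The medium errors on nvme0n1 now cover five distinct sectors in two separate regions, none of them the sector from the first incident (1011004192), which is at least consistent with the guess that the drive itself is failing.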

 

The last error is from my former cache drive, which I left installed; it is not able to run trim for some reason.  The current cache drive (nvme0n1) is an HP SSD EX950 1TB.  One thing worth noting: when I last swapped NVMe drives around on this board (X399 Designare, Threadripper), I noticed that there is something like a black, rubber-feeling thermal pad on the included NVMe heatsink, and it seemed kind of wet, like from separated glue or something between the metal and the "rubber". I just wiped them off and kept using them.  Rubber isn't thermally conductive, so I'm sure it's something else.

 

When I click either SMART button it very quickly changes back to what it started as, not even time to read that it says stop, I had to run it on another drive to check how it normally works, so I don't think they are running.  I'm going to switch from prefer cache to Yes for all applicable shares to start offloading.  Should I run a smart check in the terminal?  If so how?
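
On the terminal question, one hedged possibility is to read the drive's SMART data with `smartctl -a` and pull out the counter quoted earlier in the thread. In this sketch the device path `/dev/nvme0n1` and both function names are assumptions; `smartctl` must be installed and run as root, and whether a self-test can be started on this drive from the command line is not something the thread confirms.

```python
import re
import subprocess


def read_smart(device="/dev/nvme0n1"):
    """Return the text printed by `smartctl -a` for the device.

    The device path is an assumption; adjust it to the drive in question.
    """
    result = subprocess.run(
        ["smartctl", "-a", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes for warnings too
    )
    return result.stdout


def media_errors(smart_text):
    """Extract the 'Media and Data Integrity Errors' count, or None if absent."""
    match = re.search(
        r"^Media and Data Integrity Errors:\s+([\d,]+)",
        smart_text,
        re.MULTILINE,
    )
    if match is None:
        return None
    # Strip thousands separators in case the value is printed with commas.
    return int(match.group(1).replace(",", ""))
```

Calling `media_errors(read_smart())` before and after a heavy read, such as a mover run, would show whether the counter is still climbing past the 6 reported earlier.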

 

edit: Not sure it matters, but I just noticed I am on the first BIOS version, and I'm 90% sure I updated it at some point.  Maybe it kept a base copy and reverted to it.  Looking at the change log here, https://www.gigabyte.com/us/Motherboard/X399-DESIGNARE-EX-rev-10/support#support-dl-bios, it doesn't mention NVMe except in relation to RAID, which obviously I'm not using.

 

nastheripper-diagnostics-20200301-2132.zip

Edited by bobobeastie
new info
Link to comment

I examined both nvme drives and swapped them.  Attached is an image of what I was talking about, and I don't think I'm concerned about it as a cause.  I rubbed one of them with my hand and it started to disintegrate, so I stopped.  It only makes contact with the tops of chips or stickers, and residue on those was minimal.

 

Once I started it back up, everything appears to be okay, at least for now.  I am letting mover run with all shares set to yes for cache so that I can set to no when done.  At that point I am open to suggestions as to what to do.

IMG_20200302_094737.jpg

Link to comment

Alrighty, apparently HP doesn't handle support for their SSDs. Their support gives a phone number with a full voicemail box and an outdated email address belonging to a security camera company ([email protected]). When I googled the phone number I found a mention of [email protected], which I found was actually connected to HP.  The only reason I'm mentioning this here is so that anyone who googles any of these things might be helped by this info.  This has taught me not to be an HP customer.

 

In the meantime, it would be great if I could offload from cache, but it just disabled itself again after mover had been running very slowly.  The only thing I can think of trying, other than continuing to retry mover after fixing the fs errors, is to use backup/restore appdata to make a copy. Or maybe copying the share over works just as well? Is there a way to ignore fs errors?

nastheripper-diagnostics-20200302-1534.zip

Link to comment
3 hours ago, bobobeastie said:

backup/restore appdata

If you have an appdata backup, that plus the templates on flash are all you need to get your dockers going again using the Previous Apps feature on the Apps page. You could do this with a different cache or even without cache.

Link to comment
54 minutes ago, trurl said:

If you have an appdata backup, that plus the templates on flash are all you need to get your dockers going again using the Previous Apps feature on the Apps page. You could do this with a different cache or even without cache.

Yes, and it looks like mover was able to move almost all of appdata. All that's left is a single file, nzbdrone.db-wal, in binhex sonarr.  I'm assuming that's the kind of thing that will fix itself, or can be fixed by reinstalling the container.  Everything else is torrents, which should fix themselves, and one recently transcoded Blu-ray backup.  Not too bad.  I think I will run cache-less while I iron out the things below.  Is the procedure for this as simple as stop array, select no device for cache, then start array?

 

The future cache drive is a 1TB ADATA_SX6000NP.  I posted the issue I was having with it a while ago and it received no responses.  The issue being that when the trim plugin runs, I get an email saying "fstrim: /mnt/disks/ADATA_SX6000NP_XXXXXXXXXXX: FITRIM ioctl failed: Remote I/O error".  I would feel much better about using it as a cache if that problem was solved.  If I can get trim working, I'm thinking about switching to BTRFS and then RAID1'ing it and the replacement NVMe drives, as I understand I would then be able to lose 1 of them.

Link to comment
