I should preface this by saying that all was going well until I changed the structure of my array (cache pools, sizes, etc.) and moved some files with unBALANCE. Then all hell broke loose, but only for PostgreSQL!
Also, I don't have backups and I've accepted the loss, which isn't too bad: while the DB was important, it only held recent data. I am sourcing a proper backup solution and won't be changing the array again, but I'm new to all this and it has been fun.
My setup is an HP DL380p Gen8 running Unraid 6.10.0-rc4. I've got 50+ containers running, many with a mix of NFS share mounts and standard bind mounts to /mnt/user/data or /mnt/user/appdata.
I also had issues with my NFS shares: they would drop off after a single read/write, and I had to restart the container to get them back. From reading the forums, the fixes were to disable hard links, remove the cache, or use CIFS. I chose to disable hard links, assuming any of the three was a viable option.
Note: I started using NFS shares because my *arr stack, Jellyfin, nzbd, and qbit/deluge were having permission issues, even though they're mostly lsio containers running as 1000:1000 (IIRC). I think the problem was trying to make '/mnt/user/data/shows/??' available to multiple containers. Advice on this is welcome too.
This fixed my NFS issue but, I think, led to the Postgres issue, even though Postgres does not use an NFS share; it is bind mounted to /mnt/user/appdata/postgres.
Once the array was reconfigured (1 parity, 6 disks) with a new cache drive, nearly everything worked without problems. But when I tried to start the official PostgreSQL 14.2 Docker container, I started getting this error:
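For the permissions side, one way to see whether ownership is the culprit is to list everything under the shared folder that is not owned by the UID:GID the containers run as. The following is a minimal sketch, assuming Python 3 is available on the host (or in a container with the same bind mount) and assuming 1000:1000 is the right PUID/PGID; both IDs can be overridden on the command line.

```python
#!/usr/bin/env python3
"""List entries under a directory NOT owned by the expected UID:GID.

Sketch only: 1000:1000 is an assumed default taken from the PUID/PGID the
containers are believed to run as; pass other values if yours differ.
"""
import os
import sys


def find_foreign_owned(root, uid=1000, gid=1000, limit=50):
    """Return up to `limit` (path, uid, gid) tuples for entries below `root`
    whose owner or group differs from uid:gid. `root` itself is not checked."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: inspect symlinks themselves, don't follow them
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if st.st_uid != uid or st.st_gid != gid:
                found.append((path, st.st_uid, st.st_gid))
                if len(found) >= limit:
                    return found
    return found


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"usage: {sys.argv[0]} DIR [UID] [GID]")
    else:
        root = sys.argv[1]
        uid = int(sys.argv[2]) if len(sys.argv) > 2 else 1000
        gid = int(sys.argv[3]) if len(sys.argv) > 3 else 1000
        for path, u, g in find_foreign_owned(root, uid, gid):
            print(f"{u}:{g}  {path}")
```

Pointed at /mnt/user/data/shows, an empty result would suggest ownership isn't the problem and the cause lies elsewhere (e.g. file modes or how each container maps the path). The limit just keeps the output readable on a large library.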
2022-04-28 18:48:12.879 ACST [29] LOG: could not link file "pg_wal/xlogtemp.29" to "pg_wal/000000010000000000000001": Function not implemented
2022-04-28 18:48:12.882 ACST [29] FATAL: could not open file "pg_wal/000000010000000000000001": No such file or directory
Stack Overflow posts tell me my DB is f*****. Fair enough; they recommended simply reinstalling the Postgres instance and copying the /data directory back. I did this, but the issue persisted. I thought disabling hard links might have caused it (given the 'could not link file' message), but the kicker is that I had it working last night with a fresh DB (after literally dozens of reinstalls, changing nothing, it worked once), and now it's not working again.
As noted, that fresh container worked once, but simply creating another new container now fails with the same error as above, even though those files should not exist yet.
Questions:
- Could disabling hard link support in Global Share Settings be causing this issue? And
- Why would the link issue persist even when I completely reinstall the container?
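The log line "could not link file ... Function not implemented" reads like the link() system call itself failing with ENOSYS, whose standard message is exactly "Function not implemented". A minimal sketch for testing that directly, assuming Python 3 is available on the host or in a container with the same bind mount: it creates a throwaway file in the given directory and tries to hard-link it. The file names are arbitrary placeholders, and /mnt/cache/appdata below is an assumed direct pool path; substitute whatever your cache pool is actually called.

```python
#!/usr/bin/env python3
"""Check whether hard links (the link() call behind PostgreSQL's
'could not link file' error) work inside a given directory. Sketch only."""
import errno
import os
import sys
import tempfile

# errno values that mean "this filesystem refuses hard links" rather than a
# normal I/O failure. ENOSYS is what produces "Function not implemented".
NO_LINK_ERRNOS = (errno.ENOSYS, errno.EPERM, errno.EOPNOTSUPP)


def supports_hard_links(directory: str) -> bool:
    """Return True if a hard link can be created inside `directory`."""
    # Work in a private temp dir so nothing is left behind afterwards.
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        src = os.path.join(tmp, "probe.tmp")     # placeholder names
        dst = os.path.join(tmp, "probe.linked")
        with open(src, "w") as f:
            f.write("probe")
        try:
            os.link(src, dst)
        except OSError as exc:
            if exc.errno in NO_LINK_ERRNOS:
                return False
            raise  # anything else (e.g. disk full) is a different problem
        return True


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"usage: {sys.argv[0]} DIR [DIR ...]")
    else:
        for path in sys.argv[1:]:
            ok = supports_hard_links(path)
            print(f"{path}: {'hard links OK' if ok else 'hard links NOT supported'}")
```

For example, running it with /mnt/user/appdata and /mnt/cache/appdata as arguments: if the user-share path reports no hard links while the direct pool path reports OK, that would point at the share-level setting rather than the database files themselves, which would also explain why a fresh container hits the same error.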
I believe my long-term fix would be to either:
- Properly establish bind mounts for all containers, or
- Migrate all NFS shares to CIFS/SMB.
Please let me know which log files you need.