
robobub

Members
  • Content Count

    19
  • Joined

  • Last visited

Community Reputation

1 Neutral

About robobub

  • Rank
    Member


  1. Indeed, this still occurs in safe mode with all plugins disabled. Any ideas or troubleshooting steps? Diagnostics from safe mode: tower-diagnostics-20200119-2337.zip Has no one else experienced this with Btrfs-encrypted array drives and an XFS-encrypted cache drive with a cache=yes share? Importing files with many programs is a major pain, as the initial files are errored out, requiring me to manually re-import those files.
  2. So after a few tests, here's what's occurring. It's not a time delay: it's only making one directory at a time on /mnt/cache/$share to match what is on /mnt/user/$share. What could possibly be causing this? It persisted across reboots. After this next preclear, I'll run in safe mode without plugins to see if it still occurs. Test command:
       while (true); do tree /mnt/cache/Archive; touch a; tree /mnt/cache/Archive; echo "Sleeping for 5 seconds..."; sleep 5; done
     After touching a file, the first missing directory is created instantly; after 5 seconds, you can see that no more have been made. Full output of that command: 20200119_unraid_cache_write_issue.log
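     A possible interim workaround (just a sketch, assuming it is safe on your setup to write directly to /mnt/cache; the Archive share is the one from my test above) would be to pre-create the share's directory tree on the cache so writes don't trip over missing directories:
       # replicate the directory structure only, copying no files
       rsync -a --include='*/' --exclude='*' /mnt/user/Archive/ /mnt/cache/Archive/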
  3. Thanks, I will update that. It was just a temporary, lazy way of having it dynamically exported over SMB without modifying smb-extra.conf. Better ideas are welcome; I am new to unRAID. Out of curiosity, I did check whether it helped the "not supported" issue, but unfortunately it did not.
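     For reference, the conventional route would presumably be a static share block appended to /boot/config/smb-extra.conf, followed by a Samba restart; a minimal sketch, where the share name, path, and user are placeholders rather than my actual setup:
       [archive]
           path = /mnt/user/Archive
           browseable = yes
           writeable = yes
           valid users = someuser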
  4. I have a 3-drive array + 1 parity disk and 1 cache drive. For a share that has cache=yes and a directory that hasn't been written to recently (it seems to happen when the directory doesn't exist yet on /mnt/cache/), writes initially fail as "not supported" but soon after become possible. EDIT: It's not a time delay. It's only making one directory at a time on /mnt/cache/$share to match what is on /mnt/user/$share; see my test here. The array drives are Btrfs-encrypted, the cache is XFS-encrypted, there is tons of free space on both, and everything is spun up. This does not occur on shares with cache=no. I ran some tools like Docker Safe New Permissions, with no effect. It's reproducible by just going to a different directory that exists on the array but not yet in /mnt/cache. It occurs in safe mode as well; diagnostics are in my latest post. I'm on 6.8.1. Windows via SMB reports the "not supported" error. Diagnostics: tower-diagnostics-20200118-1549.zip
  5. While it was a conscious decision to exclude the cache, many others and I like to store some data longer term on the cache. Is it possible to have an option to check cache disks for integrity? It happens to be my only disk without Btrfs, so no scrub is available either.
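     In the meantime, a rough interim approach (just a sketch; the share path and manifest location are placeholders) would be to keep a checksum manifest of the cache-resident data and re-verify it periodically:
       # build a manifest of everything currently on the cache
       find /mnt/cache/Archive -type f -exec sha256sum {} + > /boot/config/cache-checksums.txt
       # later, re-verify; only mismatches and read errors are reported
       sha256sum -c --quiet /boot/config/cache-checksums.txt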
  6. I have an issue where the VPN goes down but the watchdog doesn't detect it. I've attached my supervisord.log with debug turned on. From looking at watchdog.sh, it seems that I still technically have a VPN connection (port, IP address); it's just that I can no longer access anything (e.g., pinging Google or anything else fails). Since it's likely just that my VPN provider is poor and has a unique failure case, what's the best way to add or modify a script in the Docker container (e.g., checking ping and then touching /tmp/portclosed) to check for this, yet still get updates from binhex's upstream Docker image? I could do this externally with a user script and docker exec -d, but I'm not seeing a way around having to run this regularly and checking whether the script is running, as there's no way to synchronize with Docker starting or restarting the container. 20200116_vpn_notresponding_supervisord.zip
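     For context, the external approach I have in mind would look roughly like this (only a sketch: the container name is a placeholder, ping is assumed to be available inside the container, and /tmp/portclosed is the flag file from watchdog.sh mentioned above), run on a short schedule from the host, e.g. via the User Scripts plugin:
       #!/bin/bash
       CONTAINER="binhex-delugevpn"   # placeholder: whatever the VPN container is named
       # if the tunnel can no longer reach the outside world, drop the flag file for watchdog.sh
       if ! docker exec "$CONTAINER" ping -c 3 -W 5 8.8.8.8 > /dev/null 2>&1; then
           docker exec "$CONTAINER" touch /tmp/portclosed
       fi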
  7. I had similar symptoms, using an older Samsung 830 SSD as a single Btrfs LUKS-encrypted cache. When copying very large files, iowait would hit the 80s, at some point the system became unresponsive, and write speeds were around 80 MB/s. However, moving to LUKS-encrypted XFS did not help things at all. In my case, it had to do with LUKS encryption: with a non-encrypted cache, either Btrfs or XFS, iowait was much lower and write speeds were around 200 MB/s. That's despite being on an i7-3770, which has AES acceleration, with barely any CPU utilization. One guess is that the 830's controller doesn't handle incompressible data as well, but looking at reviews, that's where it shined compared to SandForce controllers. Some searching led me to this post: Setting the IO scheduler to none for my cache drive helped a bit, but lowering nr_requests with any IO scheduler helped more, at least in my case.
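     For reference, the knobs I'm referring to live under sysfs; a sketch of the kind of thing I tried (sdX is a placeholder for the cache device, and the exact nr_requests value is something to experiment with):
       cat /sys/block/sdX/queue/scheduler            # list available schedulers; the active one is in [brackets]
       echo mq-deadline > /sys/block/sdX/queue/scheduler
       echo 8 > /sys/block/sdX/queue/nr_requests     # default is typically much higher; lower values reduced iowait here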
  8. So the issue is that one of my failing drives, which I'm running tests on, has become unresponsive and doesn't respond to smartctl. It is interesting, though, since it is still making a bit of progress on badblocks, and the hung smartctl query even has a timeout parameter, but that doesn't help. So the GUI is also querying that drive and hanging. There's nothing in dmesg about the drive. Removing that drive restores everything without rebooting.
  9. Thanks for the suggestions. I tried closing all browser sessions. I even sent SIGSTOP to preclear and badblocks and paused all Docker containers, and the issue remains. The system is very responsive via the terminal and the Docker container GUIs, even before any of the above. Preclear and badblocks had been running for days (I'm doing a lot of passes, as these are old drives and I'm paranoid) before this issue. I do have a swapfile on my SSD on this machine until more memory arrives, but nothing is being swapped in or out (according to dstat). I've captured top before and after pausing everything (36% idle / 56% iowait and 72% idle / 25% iowait, respectively); let me know if there's anything else I can provide. Is restarting the nginx service safe to do, and a potential way of recovering? Though perhaps it's worth keeping it in this state to figure out what happened. top-2020-01-14.zip
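     If restarting nginx turns out to be safe, I assume it would be via the stock init script (a guess on my part; I haven't run it on this box yet):
       /etc/rc.d/rc.nginx restart    # Slackware-style service script for the web UI's nginx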
  10. I uploaded a zip file just now. The diagnostics collection still hasn't completed and isn't using very much CPU, so it seems like another, potentially related issue; that's why I just grabbed the syslog. Let me know what other information I can provide to help debug both the GUI issue and the diagnostics collection. I'm comfortable with modifying some PHP code in /usr/local/sbin/diagnostics.
  11. unRAID's GUI stopped responding, reporting "500 Internal Server Error", with lots of "upstream timed out (110: Connection timed out)" and "auth request unexpected status: 504 while sending to client" in the syslog. All of my Docker GUIs, shares, SMB, SSH, etc. still seem to work fine. I did run diagnostics on the command line, but it seems to be taking a very long time. I am running a fair amount of stuff (Duplicati doing a large backup, downloads, preclears, and badblocks on unassigned drives, which has discovered issues), but it had been working fine for hours. I'll post my syslog for now, and the diagnostics when that process finishes. Along with finding a cause, is there a way to recover the GUI without rebooting? The issues started around Jan 14 13:30. This is a recent setup: I started with 6.8.0 a few weeks ago and upgraded to 6.8.1 a few days ago.
  12. Unraid has redundancy; it's just distributed differently. And yes, the idea is that parity would not be re-synced in this scenario, because on the first read after bitrot occurs, the individual disk's filesystem notices a mismatched checksum. Unraid would then need to read the corresponding sections from the other disks to figure out which exact bits changed in the extent. Yes, that is literally rebuilding from parity, but automatically and for just that one corrupted extent. Now, this would require more synchronization between a checksumming filesystem and Unraid, and maybe that is not easily achievable.
  13. I see where the misunderstanding is coming from. You've missed an important part of my feature request: integration with checksumming. That is what both BTRFS and Dynamix File Integrity, which I mentioned, offer for detecting errors. It tells you which drive has the error, and where. Then it's a matter of using parity to determine which bit(s) are corrupted. This is essentially how BTRFS and ZFS can do silent-corruption repair when they have parity. Does that make sense?
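     To make the arithmetic concrete, here is a toy single-byte illustration (just a sketch with made-up values): once checksumming has identified which disk holds the bad data, single parity plus the surviving disks rebuild it with XOR:
       # three "data disks" holding one byte each, plus XOR parity
       d1=0xA5; d2=0x3C; d3=0x7E
       parity=$(( d1 ^ d2 ^ d3 ))
       # the checksum mismatch told us d2 is the corrupted one, so rebuild it from the rest
       rebuilt=$(( parity ^ d1 ^ d3 ))
       printf 'original d2=%#x  rebuilt d2=%#x\n' "$d2" "$rebuilt"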
  14. Please help me understand how that statement is incorrect, unless you're being pedantic and would prefer "the data can be reconstructed from parity and the other drives".
  15. Presumably with a watt meter that the server's PSU plugs into.