_Shorty

Members
  • Posts: 89
  • Joined

  • Last visited


_Shorty's Achievements

Rookie (2/14)

8 Reputation

  1. More testing revealed this was probably just pure luck. I'm still getting crashes even when testing with /IPG:100, which appears to limit it to about 10 files per second when they're small, as one would expect. But whatever is making it crash is still making it crash, so it would seem /IPG doesn't make a dent with this one after all. I think that's likely because it only has any effect when a file transfer actually occurs. It doesn't seem to throttle any other kinds of activity, so when it is going through and checking file metadata to see if anything needs to be updated, that is probably what's slapping around whatever is getting slapped around. Maybe I should try a different util to do the mirroring and see if that sidesteps these crashes.
  2. I've since learned that robocopy has a switch, /IPG, that tells it to insert an "inter-packet gap" of a specified number of milliseconds, "to free bandwidth on slow lines." I arbitrarily tried 10 ms (/IPG:10) to see what happened, and have only had one crash since then. So whatever is going on seems to be fairly borderline, and the 10 ms gaps I've now introduced have nearly eliminated the problem. I don't know yet if it's having any appreciable effect on the bandwidth file transfers are using; I'll have to run some more thorough tests with large files to see whether transfer speeds are noticeably different. Some penalty there, if there even is any, would be fairly acceptable if it means I'm avoiding the crashes. Perhaps I'll play with different gap lengths and see if there's some value that avoids crashes altogether without causing slower-than-acceptable transfer speeds (a command sketch is included after this list). So far it has helped the routine backup scripts I use quite a bit.
  3. Wow, ok, maybe I just lucked out prior to 6.12's release. I thought the issue only started with 6.12's release, but that seems to not actually be the case. I just tested with 6.11.5 and a run that produced 11 crashes with 6.12.4 only yielded 2 crashes with 6.11.5, so maybe my backup data simply contains more small files now than it did before 6.12's release. Perhaps that's why I never noticed it before, if it was possible prior to 6.12, which it appears to be. Perhaps not so coincidentally, the data I'm testing with is something I also only started working with recently. Maybe that was also close to the time 6.12 was released. Anyway, with 6.11.5 the issue seems to be present but much less severe, and with 6.12.4 it seems to be happening much more frequently. Perhaps I was just getting close to the line with 6.11.5 and never saw any crashes happen, if any did. But since then I may have more small files, and it seems as though 6.12 might be more sensitive to it than previous versions, and I'm past that line now and it crashes quite frequently during a run.
  4. Well, it only surfaced after 6.12 was released. If you like, I think going back to the last stable version before 6.12 should be easy enough, and I can test it there now. I really doubt anything is going on with my hardware, but that should reveal whether or not that's the case, I would imagine. Since it began immediately after installing 6.12 I don't imagine anything else is going to be at fault but 6.12 itself.
  5. Alright, I finally found some time to play with this some more last night and this morning. I enabled disk shares and tried the same routine with the same test directory, only this time using the disk share for the cache drive in addition to the usual user share on the array. I have been using more than one copy of the directory in question in order to make the crashes more repeatable, and that worked rather well, with many crashes occurring during a single run. I also tried paring it back to just a single copy of the directory and ran it a few times until I had a run with no crashes using the user share. The disk share only ever needed a single run, as it never seems to trigger the crashes. Even the 32-copy run went off without a hitch when using the disk share. And it sure is faster with the disk share (a rough timing harness is sketched after this list):
     disk share, single copy: 34.200 seconds, no crashes
     user share, single copy: 3:17.384, no crashes
     disk share, eight copies: 4:48.440, no crashes
     user share, eight copies: 1:09:10.257, 11 crashes
     disk share, 32 copies: 18:42.339, no crashes
     user share, 32 copies: 2:32:28.529, 13 crashes
     So it would seem to be something in the code that handles user shares. Deleting the test batch on the Windows box and rerunning the mirror operation in order to delete all the files on the unRAID box led to some interesting problems with the crashes. Robocopy would try to delete all the files and directories on the unRAID box but would fail after the first crash happened. And after the unRAID box sorted itself out and was accessible again, I would try the robocopy mirror command again to get it to complete the deletion job, but it would have trouble deleting some of the files/directories for some reason, or would just continually crash anyway. I'd have to go into the unRAID box myself to delete the remaining files/directories before I could try another test run. Quite strange.
  6. I still don't know if you are saying that the cache counts or does not count as having already tried a disk share. I'll try disabling the cache and then just create a disk share with it to see what happens. To add further information, I had to reinstall the OS on one of my Windows machines, and after doing so I tried to restore its backup files from my array. The same error occurred when it was reading all the files from the array as happened with the earlier tests, only this is reading from it rather than writing to it. I'll report back as to whether or not anything improves when using a disk by itself.
  7. Alright, I'm confused. Are you saying that copying to a cache drive would be the same as what you're asking me to try? I have an array with parity drives. And I have a single SSD for cache. Cache is turned on for all shares. So in every case where I did not specifically turn off the cache drive it was writing all those new files only to the cache drive itself, and the issue occurred. Disabling the cache so it was writing directly to the array also saw the issue occur at pretty much the same frequency.
  8. I tried expanding the test batch to see if it would repeat the error case more often by making 16 copies of the directory and doing mirror runs with the directories in place and then moved elsewhere, so it would do copy runs and delete runs. It didn't seem to make any difference whether the cache drive was enabled or disabled; each run would trigger the error once or twice.
     Copying, no cache:
     2023/09/24 13:04:07 ERROR 53 (0x00000035) Copying File C:\Users\Clay\Documents\Joel Real Timing\trackmaps\virginia patriot\img\logo_pct.txt
     The network path was not found.
     2023/09/24 13:48:35 ERROR 53 (0x00000035) Copying File C:\Users\Clay\Documents\LabRadar data - Copy to test unRAID crash 8\SR0179\TRK\Shot0099 Track.csv
     The network path was not found.
     Deleting, no cache:
     2023/09/24 14:09:04 ERROR 53 (0x00000035) Deleting Extra File \\Tower\Backups\Docs-Clay\LabRadar data - Copy to test unRAID crash 1\SR0165\TRK\Shot0037 Track.csv
     The network path was not found.
     Copying, with cache:
     2023/09/24 15:12:23 ERROR 53 (0x00000035) Copying File C:\Users\Clay\Documents\LabRadar data - Copy to test unRAID crash 10\SR0102\SR0102 BC 0.281 (min 15 dB SNR).png
     The network path was not found.
     2023/09/24 15:18:54 ERROR 53 (0x00000035) Copying File C:\Users\Clay\Documents\Motec\i2\Workspaces\Inerters (Copy 4)\Track Maps\belleisle.mt2
     The network path was not found.
     Deleting, with cache:
     2023/09/24 16:40:37 ERROR 53 (0x00000035) Deleting Extra File \\Tower\Backups\Docs-Clay\LabRadar data - Copy to test unRAID crash 1\SR0158\SR0158.lbr
     The network path was not found.
     2023/09/24 16:44:45 ERROR 53 (0x00000035) Scanning Destination Directory \\Tower\Backups\Docs-Clay\Joel Real Timing\import - export\dashboard pages\Neil_Dashboards - default\
     The network path was not found.
     I've attached another diagnostics zip from this time period. If you still think it would be worthwhile to try it with an isolated drive, I suppose I could disable the cache again and make that drive a new share to test with. Let me know and I can do that if you'd like. Hmm, would that involve lengthy parity shuffling?
     tower-diagnostics-20230924-1653.with.and.without.cache.16.dirs.zip
  9. Rather than letting it copy to the cache as usual? I suppose the easiest way to test that would just be to turn the cache off and try, eh?
  10. AMD Phenom II X4 965
      Asus M3N72-D motherboard
      8 GB RAM
      2 parity SATA drives
      10 data SATA drives
      1 cache SATA SSD
      Dockers: binhex-krusader, qbittorrent, and a recently added Czkawka to find duplicate files, though the problem occurred before that docker was added.
      Currently running 6.12.4, but I think it has happened with every 6.12.x stable revision so far. I didn't know how to cause it before, but now I can recreate it on demand just by copying a whole bunch of 3-4 KB files at once (serially) from Windows, using robocopy to mirror a directory (a repro sketch is included after this list). The whole server does not crash, as my current uptime still shows nearly two weeks since I last restarted that box, but it stops responding to SMB traffic from the Windows machine(s), and the web UI stops responding. Whatever is going on takes about 3 minutes to resolve itself, and then the web UI and SMB traffic are responsive again and things seem normal. Normal near-idle file traffic, say an HTPC streaming a movie, never seems to have any issues. But when I start a backup of a bunch of files via robocopy and it contains a fair number of small files, something freaks out and the machine goes MIA for ~3 minutes. My current test crop is a directory containing just over 7,000 files, mostly 3-4 KB in size, which are just a bunch of CSV files from a chronograph. I'll make another copy of that directory and start the robocopy again to mirror the parent directory as part of a routine backup, thus copying the new test directory during the process. Once it starts firing off all the small files, it is only a matter of time before whatever is going on triggers and the machine becomes basically unreachable for ~3 minutes, after which it seems to be back to normal. At least, until that condition is met again and it goes MIA again, whatever that condition is. I'm thinking this only started with the initial 6.12 stable release. I don't think I was using any of the release candidates prior to that, and I don't think I ever saw similar behaviour prior to 6.12, either. At any rate, I can make it happen now with 100% certainty. Any ideas? Diagnostics file attached.
      edit: If it helps, there should be an occurrence around 11:33:44 am.
      2023/09/23 11:33:44 ERROR 53 (0x00000035) Copying File C:\Users\Clay\Documents\LabRadar data - Copy to test unRAID crash\SR0157\TRK\Shot0015 Track.csv
      The network path was not found. Waiting 30 seconds...
      tower-diagnostics-20230923-1142.zip
  11. Perhaps it is a language/communication issue. You seem to be saying specifically not to worry if it reports errors because errors are ok.
  12. Do you honestly not recognize how silly this response is? The whole point of the util is to inform you of hash mismatches if there ever is one, because a hash mismatch means you now have a corrupt file that you need to deal with. You're saying to ignore hash mismatch reports because they mean nothing and everything is actually fine. Perhaps you should take the weekend and think about why this is ridiculous to say.
  13. Are you supposing that a file that is a mere 2945 bytes was somehow hashed by the util before it finished being written, and that's why the hash was incorrect? Heh. I sincerely doubt that. Saying "A file's hash is different if you hash only a portion of the file." is kind of a silly thing to say. Of course it is going to give a different hash. It is different data. I'd be incredibly surprised if we were talking about incomplete files being hashed, especially given my initial report involving just 2945 bytes of data. By your own admission, the util does not function correctly. There's no point trying to defend it by saying it is mostly ok a lot of the time. This is something that is supposed to be 100% correct 100% of the time. If it is not, there's no point in using it.
  14. Well, in my case, nothing uses any of my shares directly except for Kodi accessing files for playback. All files that are copied to the unRAID box are either copied there manually, or backed up occasionally via a scheduled robocopy. The file in question in the screenshot I posted way back then is an image file that belongs to a game. It would have been copied over to the unRAID box once and never changed. Nothing would have updated it. Nothing would have edited it. And after initially being put there by robocopy, robocopy would not have put it there again or touched it in any way that I'm aware of. It is a file that the game devs did not change, so there would be no reason for it to be updated during one of those backup runs. I could be wrong, but I see no reason for a false positive other than something within this util itself, because as far as I'm aware the file was copied to unRAID once and never touched again.
  15. Well, I first posted about this when I saw it happening to me back in June of 2018 here: And I'm not the only one that has posted in this thread reporting that this is happening on their machines. I don't think anyone wants to know that some random percentage of their files are probably ok. Seems to me the whole point to begin with is to be able to trust that 100% of your files are ok if it reports that they are, and if there happen to actually be files that are not ok then they should be reported as not being ok. But if it is falsely reporting hash mismatches when a handful of other hashing utils (see the cross-check sketched after this list) report that the reportedly changed file has not in fact changed at all, then that points to the util not operating correctly. And if it is not operating correctly, then it can't be trusted. And if it can't be trusted, then it fails to meet its goal. I'm rather confused as to how this issue hasn't received any attention in all this time. We're talking about over four years of a major bug seemingly being ignored. Perhaps it should get some dev attention so that it can eventually be remedied, and the util then rendered actually useful. I'm afraid it isn't useful while it is not operating correctly. To be fair, perhaps it did receive some dev attention at some point, but it is apparently still exhibiting this flaw, which would indicate the root of the problem was never actually fixed. Hopefully it will be fixed at some point, because it would be nice to actually be able to use and trust such a thing.
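
For reference, a minimal sketch of the kind of robocopy mirror run discussed in posts 1 and 2 above, with the /IPG inter-packet gap added. The source and destination paths are the ones that appear in the logs above; the gap value and the other switches are assumptions to experiment with, not a recommendation:

    :: Minimal mirror run with a 10 ms inter-packet gap (Windows batch syntax).
    :: /MIR mirrors the tree, /IPG:10 inserts a 10 ms gap between packets,
    :: /R and /W control retries on errors such as ERROR 53, /NP suppresses
    :: the per-file progress output, and /LOG writes a log to review afterwards.
    robocopy "C:\Users\Clay\Documents" "\\Tower\Backups\Docs-Clay" /MIR /IPG:10 /R:2 /W:30 /NP /LOG:"C:\Temp\backup.log"

    :: To hunt for a gap that avoids the stalls without hurting throughput,
    :: repeat the run with different /IPG values (e.g. 0, 10, 50, 100) and
    :: compare the speed summaries at the end of each log.

As noted in post 1, /IPG only appears to throttle actual file transfers, so metadata-only passes over already-synced files are unlikely to be slowed by it.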
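
The user-share versus disk-share timings in post 5 could be collected with something as simple as the harness below. This is only a sketch: the disk-share path (\\Tower\cache\...) and the test directory are assumptions, since the actual disk-share name isn't given in the posts.

    @echo off
    :: Rough timing harness (Windows batch): run the same mirror against the
    :: user share and against a disk share, then compare the wall-clock times.
    echo User share start:  %time%
    robocopy "C:\Users\Clay\Documents\LabRadar data - Copy to test unRAID crash" "\\Tower\Backups\Docs-Clay\LabRadar data - Copy to test unRAID crash" /MIR /R:2 /W:30 /NP /LOG:"C:\Temp\usershare.log"
    echo User share end:    %time%

    echo Disk share start:  %time%
    robocopy "C:\Users\Clay\Documents\LabRadar data - Copy to test unRAID crash" "\\Tower\cache\Backups\Docs-Clay\LabRadar data - Copy to test unRAID crash" /MIR /R:2 /W:30 /NP /LOG:"C:\Temp\diskshare.log"
    echo Disk share end:    %time%

robocopy's own log header and footer also record start and end times, so the echo lines are just a convenience.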
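
Posts 8 and 10 describe the reproduction: duplicate the directory of small CSV files a number of times, then mirror the parent directory to the user share and watch the log for ERROR 53 entries. A sketch of that, with the directory names taken from the error logs above and the copy count (8) picked arbitrarily:

    @echo off
    setlocal
    :: Reproduction sketch (Windows batch). SRC is the directory of ~7,000
    :: small CSV files mentioned in post 10; the numbered copies match the
    :: names that show up in the error logs above.
    set "SRC=C:\Users\Clay\Documents\LabRadar data - Copy to test unRAID crash"
    set "PARENT=C:\Users\Clay\Documents"

    :: Duplicate the small-file directory to lengthen the run
    for /L %%N in (1,1,8) do (
        robocopy "%SRC%" "%SRC% %%N" /E /NP >nul
    )

    :: Mirror the parent to the user share; ERROR 53 entries in the log mark
    :: the points where the server stopped answering SMB for a few minutes
    robocopy "%PARENT%" "\\Tower\Backups\Docs-Clay" /MIR /R:2 /W:30 /NP /LOG:"C:\Temp\repro.log"

Per post 5, pointing the same mirror at a disk share instead of the user share is the control case that did not trigger the stalls.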
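
On the hash-mismatch reports discussed in posts 12 through 15, a quick independent cross-check is to hash the flagged file on the share and, if one still exists, the original local copy, and compare the two digests by eye. certutil ships with Windows; the paths below are hypothetical placeholders, and SHA256 is just an assumption, so use whichever algorithm the integrity plugin is configured for.

    :: Independent cross-check of a file the integrity plugin flagged.
    :: Both paths are hypothetical placeholders.
    certutil -hashfile "\\Tower\Backups\Docs-Clay\game\images\logo.png" SHA256
    certutil -hashfile "C:\Games\game\images\logo.png" SHA256

Running the same check on the server itself (for example with sha256sum over the file on the array) also rules out anything on the SMB path skewing the result.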