_Shorty

Everything posted by _Shorty

  1. More testing revealed this was probably just pure luck. I'm still getting crashes even testing with /IPG:100, which appears to limit it to about 10 files per second when they're small, as one would expect. But whatever is making it crash is still making it crash, so it would seem /IPG doesn't make a dent with this one after all. I think this is likely because it only has any effect when a file transfer actually occurs. It doesn't seem to throttle any other kinds of activity, so when it is going through and checking file metadata to see if anything needs to be updated this is probably what's slapping around whatever is getting slapped around. Maybe I should try a different util to do the mirroring, and maybe that will sidestep these crashes.
  2. I've since learned that robocopy has a switch, /IPG, that tells it to insert an "inter-packet gap" of a specified number of milliseconds, "to free bandwidth on slow lines." I arbitrarily tried 10 ms (/IPG:10) to see what happened, and I've only had one crash since then. So whatever is going on seems to be fairly borderline, and the 10 ms gaps I've now introduced seem to have nearly eliminated the problem. I don't know yet whether it's having any appreciable effect on the bandwidth file transfers are using; I'll have to run some more thorough tests, including with large files, to see if transfer speeds are noticeably different. Some penalty there, if there even is any, would be fairly acceptable if it means I'm avoiding the crashes. Perhaps I'll also play with different gap lengths and see if there's some value that avoids crashes altogether without causing slower-than-acceptable transfer speeds. But so far it has helped out the routine backup scripts I use quite a bit. (See the /IPG sketch after this list for the sort of command I'm running.)
  3. Wow, ok, maybe I just lucked out prior to 6.12's release. I thought the issue only started with 6.12, but that doesn't actually seem to be the case. I just tested with 6.11.5, and a run that produced 11 crashes on 6.12.4 only yielded 2 crashes on 6.11.5, so maybe my backup data simply contains more small files now than it did before 6.12's release. Perhaps that's why I never noticed it before, if it was possible prior to 6.12, which it appears to be. Not so coincidentally, perhaps, the data I'm testing with is something I only started working with recently, which may also have been close to the time 6.12 was released. Anyway, with 6.11.5 the issue seems to be present but much less severe, while with 6.12.4 it happens much more frequently. Perhaps I was just getting close to the line with 6.11.5 and never saw any crashes, if any happened at all. But since then I may have accumulated more small files, and 6.12 seems to be more sensitive to it than previous versions, so now I'm past that line and it crashes quite frequently during a run.
  4. Well, it only surfaced after 6.12 was released. If you like, I think going back to the last stable version before 6.12 should be easy enough, and I can test it there now. I really doubt anything is going on with my hardware, but that should reveal whether or not that's the case, I would imagine. Since it began immediately after installing 6.12 I don't imagine anything else is going to be at fault but 6.12 itself.
  5. Alright, I finally found some time to play with this some more last night and this morning. I enabled disk shares and tried the same routine with the same test directory, only this time using the disk share for the cache drive in addition to the usual user share on the array. I have been using more than one copy of the directory in question in order to make the crashes more repeatable, and that worked rather well, with many crashes occurring during a single run. Now I also tried paring it back to just a single copy of the directory and ran it a few times until I had a run with no crashes using the user share. The disk share only required single runs, as it never seems to trigger the crashes. Even the 32-copy run went off without a hitch when using the disk share. And it sure is faster with the disk share. (See the share-comparison sketch after this list for the two paths involved.)
     disk share, single copy: 34.200 seconds, no crashes
     user share, single copy: 3:17.384, no crashes
     disk share, eight copies: 4:48.440, no crashes
     user share, eight copies: 1:09:10.257, with 11 crashes
     disk share, 32 copies: 18:42.339, no crashes
     user share, 32 copies: 2:32:28.529, with 13 crashes
     So it would seem to be something in the code that handles user shares. Deleting the test batch on the Windows box and rerunning the mirror operation, so that it would delete all the files on the unRAID box, led to some interesting problems with the crashes. Robocopy would try to delete all the files and directories on the unRAID box but would fail after the first crash happened. And after the unRAID box sorted itself out and was accessible again I would try the robocopy mirror command again to get it to complete the deletion job, but it would have trouble deleting some of the files/directories for some reason, or would just continually crash anyway. I'd have to go into the unRAID box myself and delete the remaining files/directories before I could try another test run. Quite strange.
  6. I still don't know whether you are saying the cache counts or does not count as having already tried a disk share. I'll try disabling the cache and then just create a disk share with it to see what happens. To add further information, I had to reinstall the OS on one of my Windows machines, and after doing so I tried to restore its backup files from my array. The same error occurred while it was reading all the files from the array as happened with the earlier tests, only this time it was reading from the array rather than writing to it. I'll report back as to whether or not anything improves when using a disk by itself.
  7. Alright, I'm confused. Are you saying that copying to a cache drive would be the same as what you're asking me to try? I have an array with parity drives. And I have a single SSD for cache. Cache is turned on for all shares. So in every case where I did not specifically turn off the cache drive it was writing all those new files only to the cache drive itself, and the issue occurred. Disabling the cache so it was writing directly to the array also saw the issue occur at pretty much the same frequency.
  8. I tried expanding the test batch to see if it would repeat the error case more often, making 16 copies of the directory and doing mirror runs both with the copies in place and with them moved elsewhere, so it would do copy runs and delete runs. (See the test-copy sketch after this list.) It didn't seem to make any difference whether the cache drive was enabled or disabled; each run would trigger the error once or twice.
     Copying, no cache
     2023/09/24 13:04:07 ERROR 53 (0x00000035) Copying File C:\Users\Clay\Documents\Joel Real Timing\trackmaps\virginia patriot\img\logo_pct.txt
     The network path was not found.
     2023/09/24 13:48:35 ERROR 53 (0x00000035) Copying File C:\Users\Clay\Documents\LabRadar data - Copy to test unRAID crash 8\SR0179\TRK\Shot0099 Track.csv
     The network path was not found.
     Deleting, no cache
     2023/09/24 14:09:04 ERROR 53 (0x00000035) Deleting Extra File \\Tower\Backups\Docs-Clay\LabRadar data - Copy to test unRAID crash 1\SR0165\TRK\Shot0037 Track.csv
     The network path was not found.
     Copying, with cache
     2023/09/24 15:12:23 ERROR 53 (0x00000035) Copying File C:\Users\Clay\Documents\LabRadar data - Copy to test unRAID crash 10\SR0102\SR0102 BC 0.281 (min 15 dB SNR).png
     The network path was not found.
     2023/09/24 15:18:54 ERROR 53 (0x00000035) Copying File C:\Users\Clay\Documents\Motec\i2\Workspaces\Inerters (Copy 4)\Track Maps\belleisle.mt2
     The network path was not found.
     Deleting, with cache
     2023/09/24 16:40:37 ERROR 53 (0x00000035) Deleting Extra File \\Tower\Backups\Docs-Clay\LabRadar data - Copy to test unRAID crash 1\SR0158\SR0158.lbr
     The network path was not found.
     2023/09/24 16:44:45 ERROR 53 (0x00000035) Scanning Destination Directory \\Tower\Backups\Docs-Clay\Joel Real Timing\import - export\dashboard pages\Neil_Dashboards - default\
     The network path was not found.
     I've attached another diagnostics zip from this time period. If you still think it would be worthwhile to try it with an isolated drive, I suppose I could disable the cache again and make that drive a new share to test with. Let me know and I can do that if you'd like. Hmm, would that involve lengthy parity shuffling?
     tower-diagnostics-20230924-1653.with.and.without.cache.16.dirs.zip
  9. Rather than letting it copy to the cache as usual? I suppose the easiest way to test that would just be to turn the cache off and try, eh?
  10. AMD Phenom II X4 965
      Asus M3N72-D motherboard
      8 GB RAM
      2 parity SATA drives
      10 data SATA drives
      1 cache SATA SSD
      Dockers: binhex-krusader, qbittorrent, and recently added Czkawka to find dupe files, but the problem occurred before that docker was added.
      Currently have 6.12.4 running, but it has happened with every 6.12.x stable revision so far, I think. I didn't know how to cause it before, but now I can recreate it on demand just by copying a whole bunch of 3-4 KB files at once (serially) from Windows using robocopy to mirror a directory. The whole server does not crash, as my current uptime is still showing nearly two weeks since I last restarted that box, but it stops responding to SMB traffic from the Windows machine(s), and the web UI stops responding. Whatever is going on seems to take about 3 minutes to resolve itself, and then the web UI and SMB traffic become responsive again and things seem normal. Normal near-idle file traffic, say with an HTPC streaming a movie, never seems to have any issues. But when I start a backup of a bunch of files via robocopy and it contains a fair number of small files, something freaks out and the machine goes MIA for ~3 minutes. My current test crop is a directory containing just over 7,000 files, mostly 3-4 KB in size, which are just a bunch of CSV files from a chronograph. I'll just make another copy of that directory and start the robocopy again to get it to mirror the parent directory as part of a routine backup, thus copying the new test directory during the process. Once it starts firing off all the small files, it is only a matter of time before whatever is going on triggers and the machine becomes basically unreachable for ~3 minutes, after which it seems to be back to normal. At least, unless that condition is met again and it goes MIA again, whatever that condition is. I'm thinking this only started with the initial 6.12 stable release. I don't think I was using any of the release candidates prior to that, and I don't think I ever saw any similar behaviour prior to 6.12, either. At any rate, I can make it happen now with 100% certainty. Any ideas? Diagnostics file attached.
      edit: If it helps, there should be an occurrence around 11:33:44 am.
      2023/09/23 11:33:44 ERROR 53 (0x00000035) Copying File C:\Users\Clay\Documents\LabRadar data - Copy to test unRAID crash\SR0157\TRK\Shot0015 Track.csv
      The network path was not found. Waiting 30 seconds...
      tower-diagnostics-20230923-1142.zip
  11. Perhaps it is a language/communication issue. You seem to be saying specifically not to worry if it reports errors because errors are ok.
  12. Do you honestly not recognize how silly this response is? The whole point of the util is to inform you of hash mismatches if there ever is one, because a hash mismatch means you now have a corrupt file that you need to deal with. You're saying to ignore hash mismatch reports because they mean nothing and everything is actually fine. Perhaps you should take the weekend and think about why this is ridiculous to say.
  13. Are you supposing that a file that is a mere 2945 bytes was somehow hashed by the util before it completed writing, and that's why the hash was incorrect? Heh. I sincerely doubt that. Saying "a file's hash is different if you hash only a portion of the file" is kind of a silly thing to say. Of course it is going to give a different hash; it is different data. I'd be incredibly surprised if we were talking about incomplete files being hashed, especially given my initial report involving just 2945 bytes of data. By your own admission, the util does not function correctly. There's no point trying to defend it by saying it is mostly ok a lot of the time. This is something that is supposed to be 100% correct 100% of the time. If it is not, there's no point in using it.
  14. Well, in my case, nothing uses any of my shares directly except for Kodi accessing files for playback. All files that are copied to the unRAID box are either copied there manually, or backed up occasionally via a scheduled robocopy. The file in question in the screenshot I posted way back then is an image file that belongs to a game. It would have been copied over to the unRAID box once and never changed. Nothing would have updated it. Nothing would have edited it. And after initially being put there by robocopy, robocopy would not have put it there again or touched it in any way that I'm aware of. It is a file that the game devs did not change, so there would be no reason for it to be updated during one of those backup runs. I could be wrong, but I see no reason for a false positive other than something within this util itself, because as far as I'm aware the file was copied to unRAID once and never touched again.
  15. Well, I first posted about this when I saw it happening to me back in June of 2018 here: And I'm not the only one that has posted in this thread reporting that this is happening on their machines. I don't think anyone wants to know that some random percentage of their files are probably ok. Seems to me the whole point to begin with is to be able to trust that 100% of your files are ok if it reports that they are, and if there happen to actually be files that are not ok then they should be reported as not being ok. But if it is falsely reporting hash mismatches when a handful of other hashing utils report that the reportedly changed file has not in fact changed at all then that points to the util not operating correctly. And if it is not operating correctly then it can't be trusted. And if it can't be trusted then it fails to meet its goal. I'm rather confused as to how this issue hasn't received any attention in all this time. We're talking over four years of a major bug seemingly being ignored. Perhaps it should get some dev attention so that it can eventually be remedied, and the util then rendered actually useful. I'm afraid it isn't useful while it is not operating correctly. To be fair, perhaps it did receive some dev attention at some point, but apparently it is still exhibiting this flaw, so this would indicate the root of the problem was never actually fixed. Hopefully it will be fixed at some point, because it would be nice to actually be able to use and trust such a thing.
  16. So, is this util still broken and people are still using it? Anyway... If you only have a handful of files you'd like to do this for, just move them somewhere else, and then move them back.
  17. No, Nvidia NIC, if I remember. It's a pretty old AMD system with an Nvidia chipset. AMD Phenom™ II X4 965 CPU, ASUS M3N72-D motherboard with NVIDIA nForce 750a SLI chipset. The CPU and the Realtek 2.5 Gbps were only giving me about 800 Mbps while maxing out one of the cores, haha. So for now I went back to the Nvidia NIC, as it offloads enough from the CPU to max out its 1 Gbps. But at least it is another reason to upgrade that machine to something newer with an NVMe SSD for cache. At least I have a 2.5 Gbps NIC that I know will work if whatever I get doesn't already have a 2.5 Gbps port built in. The Realtek seems to have a larger software component that the old CPU just can't crunch through quickly enough to make it worthwhile to use in that old machine.
  18. I'm happy to say I figured out the issue. For whatever reason something wasn't happy with both the motherboard ethernet and the new NIC installed. I did manage to find an old GPU I could put in the box so I could do some investigating, and the first thing I tried was disabling the onboard ethernet to see if that might be enough. Luckily it was just the ticket. Fired up the machine and it showed 1.0 Gbps at the switch at first, but once the OS started booting and hardware drivers started getting loaded, when it got around to the driver for the NIC the light went out on the switch and shortly afterwards came back on but showed a 2.5 Gbps connection. I'm glad it was as simple as that. Thanks again for your time anyway.
  19. Without a GPU it isn't quite that simple, but thank you for trying to help. I can't remember if I have a spare video card lying around, but if I do I will look further into it. Otherwise I'll just live with it as is for now. Thanks.
  20. Yeah, my new main/gaming rig I built recently has a 2.5 Gbps NIC in the motherboard. I also upgraded my internet service to 1.5 Gbps, and that new modem has four ethernet jacks, but one is a 2.5 Gbps jack, so I figured I might as well get a 2.5 Gbps switch and NIC for the unRAID box and speed up transfers between it and my main box. Now, the unRAID box is headless. (Doesn't even have a GPU in it.) I'm not sure how I can get the diagnostics file off of it to take a look, as with the new NIC installed I can't connect to it. I imagine as soon as I take it out and connect back to the original NIC and fire it up the diagnostics will be overwritten as it is booting up, no? edit: Ah, I see I actually have to run the diagnostics tool to get that info. I can't do that without being able to connect to it over the network. Doh!
  21. Well, I'm sorry to say the card arrived and does not work for some reason. When I power on the box my switch shows that it is connected at 1.0 Gbps and then as the box continues through the boot cycle it eventually stops showing as connected at the switch. The first card I had would still show connected at the switch at 1.0 Gbps, but would never get a network connection in the OS. It's all quite puzzling. Guess I'm stuck with 1.0 Gbps via the motherboard NIC. Oh well. edit: The card looks different than the first one I had, but also looks different than the one in the picture at the link. Newer revision yet again, I suppose. This one's got a heatsink on it.
  22. Have you selected any files/directories to attempt a job yet? They're inaccessible until there's something selected, at which point they become clickable.
  23. At 2.5 Gbps? I snagged this one the other day, which looks very similar but is perhaps/probably a different revision, and it would only work at 1.0 Gbps. https://www.amazon.ca/gp/product/B08VWQFQ5Z/ I returned it since it did me no good sitting at 1.0 Gbps.
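
A rough sketch of the robocopy invocation behind the /IPG experiments in post 2 above. /MIR and /IPG:10 are the switches actually discussed; the retry and logging switches and all paths are just what I'd typically reach for, so treat them as examples and adjust to your own setup.

    rem Mirror the local documents folder to the unRAID user share, inserting a
    rem 10 ms gap between packets to ease the load on the server.
    rem The retry/wait/log switches and the log path are examples only.
    robocopy "C:\Users\Clay\Documents" "\\Tower\Backups\Docs-Clay" /MIR /IPG:10 /R:2 /W:5 /LOG+:"C:\Temp\robocopy.log"

The summary robocopy prints at the end of the log includes run times and speeds, which is enough to see whether the gap is costing anything noticeable on large-file transfers.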
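
The user share vs. disk share comparison in post 5 was just the same mirror pointed at two different UNC paths. A sketch, assuming disk shares are enabled and that the cache drive's disk share is named "cache" (as it is on my box); your share names may differ.

    rem Same test data mirrored to the user share...
    robocopy "C:\Users\Clay\Documents\LabRadar data - Copy to test unRAID crash 1" "\\Tower\Backups\Docs-Clay\LabRadar data - Copy to test unRAID crash 1" /MIR
    rem ...and then to the disk share for the cache drive (share name assumed).
    robocopy "C:\Users\Clay\Documents\LabRadar data - Copy to test unRAID crash 1" "\\Tower\cache\Backups\Docs-Clay\LabRadar data - Copy to test unRAID crash 1" /MIR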
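
And the 16-copy test batch from post 8 was nothing fancier than fanning the test directory out into numbered copies and then running the usual mirror. A batch-file sketch of the idea; the directory names follow the ones visible in my logs and are otherwise placeholders.

    @echo off
    rem Make 16 numbered copies of the small-file test directory, then run the
    rem usual mirror so they all get copied in one pass. Moving the copies back
    rem out of the source and rerunning the mirror turns it into a delete run.
    set "SRC=C:\Users\Clay\Documents\LabRadar data"
    for /L %%i in (1,1,16) do (
        robocopy "%SRC%" "%SRC% - Copy to test unRAID crash %%i" /E
    )
    robocopy "C:\Users\Clay\Documents" "\\Tower\Backups\Docs-Clay" /MIR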
  24. As has been said by more than one user, when this plugin says there are hash errors but external tools say the hashes/files have not changed, you cannot blame anything but this plugin. My server is 100% fine. If it were not, there would be other issues rearing their heads. This plugin incorrectly stating files have changed is a problem with this plugin. Again, I am not the only person that has reported this issue. And it's not like I'm here fuming mad about an issue the dev(s) are unaware of. I'm simply reporting, again, that the plugin does not properly do what it is designed to do. It really does not matter to me at all if it ever works, as the more I think about it there isn't even much purpose to it. Other hash tools report unchanged hashes. Parity does not complain. So is my file actually corrupt because the plugin states it is? Incredibly unlikely. If you take half a dozen hash tools and they all report the same hash, at some point you have to believe the hash is correct. What bothers me is person after person trickling into this thread every now and then reporting, seemingly full of anxiety and in a panic, that their file(s) are being reported as corrupt. When the fact of the matter is their files are more than likely 100% fine, and the plugin simply is malfunctioning. Reporting that it is not functioning correctly for everyone and to not trust it as a result is most certainly constructive.